Detection of Brain Tumor Using Unsupervised Enhanced K-Means, PCA and Supervised SVM Machine Learning Algorithms

A brain tumor is an abnormal growth of cells in the brain. To determine the type of a tumor and its exact location, we use magnetic resonance (MR) imaging, a tomographic imaging technique based on nuclear magnetic resonance signals. Brain tumors are of two types: benign and malignant. Benign tumors belong to grades I and II; they do not contain active cancer cells, are low grade, and have a uniform structure. Malignant tumors belong to grades III and IV; they contain active cancer cells, are high grade, and have a non-uniform structure. In the first step, the input MR image is converted into a binary image by the Otsu thresholding technique. In the second step, K-means segmentation is applied to the binary image. In the third step, the Discrete Wavelet Transform (DWT) is applied to the segmented image to extract features, and PCA reduces their high dimensionality. Finally, a Support Vector Machine (SVM) classifier labels the brain image as normal or abnormal. The proposed method was evaluated for brain tumor detection on the BraTS dataset and compared with existing methods, and it was found to outperform them.


Introduction
Nowadays, e-health care systems and information technology help clinical specialists provide better health care to patients. Magnetic Resonance Imaging (MRI) is the most powerful tool for producing images of the inside of the human body, particularly the brain. The technique is safe in comparison with other techniques such as CT and X-ray. MR brain images provide rich information for clinical diagnosis [1,2,3]. Briefly, MRI is a tomographic imaging method that produces images of the internal physical and biochemical characteristics of an object from externally measured nuclear magnetic resonance (NMR) signals. MR imaging is based on the phenomenon of NMR, which was discovered by Bloch and his colleagues at Stanford and by Purcell and his colleagues at Harvard in 1946. In 1952, Purcell and Bloch received the Nobel Prize in Physics for this discovery. A tumor is an uncontrolled growth of cancerous cells in any part of the body, whereas a brain tumor is an uncontrolled growth of cancerous cells in the brain. According to the WHO and the American Brain Tumor Association (ABTA), brain tumors are graded on a scale with four categories. Based on this scale, tumors are classified as either benign or malignant. Benign tumors fall under categories I and II; they are homogeneous in structure, do not contain active cancer cells, and are low grade. Malignant tumors fall under categories III and IV; they are non-uniform in structure and are high grade. Patients with malignant tumors require continuous monitoring by MR imaging or CT (Computed Tomography) scans every six to twelve months. A brain tumor can affect any person at any age, and its effect on the body may not be the same for every individual. Segmentation is needed to detect the affected tumor area in medical images. In image analysis, segmentation is crucial.
Segmentation is a technique for dividing an image into distinct regions that share similar characteristics such as color, texture, boundaries, and shape. Brain tumor segmentation is the process of separating tumor tissues, such as edema, dead cells, and solid tumor, from normal brain tissues, such as white matter, gray matter, and cerebrospinal fluid, using MR images or other imaging modalities [4, 5, 6]. This paper aims to develop an automated system for the enhancement, segmentation, and classification of brain tumors. The system can be used by neurosurgeons and healthcare specialists. It incorporates image processing, pattern analysis, and computer vision techniques, and is expected to improve the sensitivity, specificity, and efficiency of brain tumor screening. A suitable combination and parameterization of these stages enables the development of assistive tools that can aid in early diagnosis or in the round-the-clock monitoring of therapeutic measures.

Proposed Methodology
The main aim of this methodology is to determine whether an MRI brain scan is normal or abnormal. This section discusses the block diagram of the proposed architecture, shown in Figure 1.

Image Thresholding
Thresholding is the simplest method of image segmentation. It converts a grayscale image into a binary image. In the first stage, the MR brain image is converted into a binary image by the Otsu thresholding technique. Pixel values larger than the threshold are set to white, and all others are set to black.
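The thresholding step can be illustrated with a minimal sketch, assuming 8-bit gray levels and an image stored as a list of rows. The function names `otsu_threshold` and `binarize` and the tiny sample image are hypothetical and chosen only for illustration; this is not the implementation used in the experiments. Otsu's method picks the threshold that maximizes the between-class variance of the two pixel groups.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w_bg = 0      # number of pixels with value <= t (background class)
    sum_bg = 0    # sum of gray levels in the background class
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # no foreground pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to the constant factor 1 / total**2
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Pixels above the threshold become white (255), all others black (0)."""
    return [[255 if p > threshold else 0 for p in row] for row in image]


# Hypothetical 2x3 "image": a dark row and a bright row.
image = [[10, 12, 11], [200, 210, 205]]
t = otsu_threshold([p for row in image for p in row])
binary = binarize(image, t)
```

On the sample image the threshold falls between the dark and the bright row, so the first row maps to black and the second to white.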

Segmentation
Image segmentation is a technique for partitioning an image into a collection of connected sets of pixels, i.e., into regions, linear structures, and 2D shapes. It is a critical part of clinical diagnosis. Depending on the image, segmentation can be a very complex process. The affected regions of the MR brain image are segmented in the following stages. In the first stage, the pre-processed MR brain image is converted into a binary image by the Otsu thresholding technique [7,8,9,10]. In the next stage, segmentation is applied to the binary image. In this study, Enhanced K-means is used for effective segmentation of the brain image. K-means is one of the most popular partitioning methods. It divides the data into a finite number k of clusters and assigns a cluster index to each sample. For massive data, K-means clustering is better suited than hierarchical clustering. K-means is commonly applied as an unsupervised clustering procedure. The algorithm is easy to understand and robust in practice, but it does not always provide adequate accuracy. In this paper, we therefore propose Enhanced K-means, which selects the initial centers randomly with specific probabilities and thereby increases both the speed and the accuracy of K-means.
The steps of the Enhanced K-means algorithm are as follows:
Phase 1: For the first center c1, choose a data point at random from the dataset k.
Phase 2: Compute the distance from each data point kj in the dataset to the center ci: g = d(ci, kj).
Phase 3: Determine the second centroid with [formula missing in source].
Phase 4: Choose each subsequent centroid at random from the dataset, with probability proportional to the distance from the point to the closest center already chosen.
Phase 5: Repeat Phase 4 until the prescribed number of clusters is reached, then proceed with the standard K-means algorithm.
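The seeding phases above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `seed_centers`, `assign`, and `kmeans` and the `power` parameter are hypothetical. With `power=1.0` the selection probability is proportional to the distance, as in the text; the widely used k-means++ variant weights by the squared distance, which corresponds to `power=2.0`. For image segmentation, each point would be a pixel feature such as a 1-tuple holding the intensity.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seed_centers(points, k, power=1.0, rng=None):
    """Pick k initial centers; far-away points are more likely to be chosen."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]                      # Phase 1
    while len(centers) < k:                             # Phases 4 and 5
        # Phase 2: distance of every point to its closest chosen center
        weights = [min(sq_dist(p, c) for c in centers) ** (power / 2)
                   for p in points]
        if sum(weights) == 0:                           # all points are centers
            centers.append(rng.choice(points))
            continue
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=100, power=1.0, rng=None):
    """Standard K-means iterations started from the seeded centers."""
    centers = seed_centers(points, k, power, rng)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centers[j]              # keep an empty cluster's center
            )
        if new_centers == centers:
            break                                       # converged
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical toy data: two well-separated groups of 2D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, 2, rng=random.Random(0))
```

Spreading the initial centers apart in this way tends to reduce the number of iterations needed and the risk of a poor local solution.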

Feature Extraction
Feature extraction is the process of collecting higher-level information from an image, such as shape, texture, color, and contrast. In fact, texture analysis is an important parameter of human visual perception and of machine learning systems. It is used effectively to improve the accuracy of the diagnosis system by selecting prominent features. The primary task in feature extraction is pattern recognition, i.e., taking an input pattern and assigning it correctly to one of the possible output classes [11,12]. This process has two stages: feature selection and classification. Feature selection is critical to the whole process because the classifier cannot recognize poorly chosen features. According to Lippman, the criteria for choosing features are: "Features should contain information required to distinguish between classes, be insensitive to irrelevant variability in the input, and also be limited in number, to permit efficient computation of discriminant functions and to limit the amount of training data required." The Fourier Transform (FT) is one of the classical tools for signal analysis: it decomposes a time-domain signal into constituent sinusoids of different frequencies, thereby transforming the signal from the time domain to the frequency domain.
However, the Fourier Transform has a serious drawback: it discards the time information of the signal, as shown in Figure 2.

Fig.2. Development of the signal analysis
Gabor adapted the Fourier Transform to analyze only a small section of the signal at a time.
This method is called windowing, or the Short-Time Fourier Transform (STFT). It applies a window of a particular shape to the signal. The STFT can be regarded as a compromise between frequency information and time information: it provides information about both the frequency domain and the time domain. The Wavelet Transform (WT) introduces a windowing technique with variable window size.
The WT preserves both the frequency information and the time information of the signal. The Discrete Wavelet Transform (DWT) is an efficient implementation of the wavelet transform that uses dyadic scales and positions. We use the 2D DWT, which is applied to a 2D image along each dimension separately. Wavelets allow the image information to be represented in a simple hierarchical framework. Border distortion is a common problem in the DWT, as with digital filters in general. When an image is filtered, the mask extends beyond the image at the edges, so the pixels outside the image must be padded. Features are mainly of three types: intensity, shape, and texture features.
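The separable 2D DWT described above can be sketched with one decomposition level. The Haar wavelet is used here only because it is the simplest choice; the wavelet family used in the experiments is not stated in this section, so this is an assumption. The function names are hypothetical, the image is assumed to have even width and height, and border padding is omitted.

```python
import math


def haar_1d(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def split_columns(matrix):
    """Apply haar_1d to every column; return the low and high halves."""
    low_cols, high_cols = [], []
    for col in transpose(matrix):
        a, d = haar_1d(col)
        low_cols.append(a)
        high_cols.append(d)
    return transpose(low_cols), transpose(high_cols)


def haar_2d(image):
    """One-level 2D Haar DWT: filter the rows first, then the columns."""
    row_low, row_high = [], []
    for row in image:
        a, d = haar_1d(row)
        row_low.append(a)
        row_high.append(d)
    ll, lh = split_columns(row_low)    # low-pass rows: approximation + detail
    hl, hh = split_columns(row_high)   # high-pass rows: two detail subbands
    return ll, lh, hl, hh


# Hypothetical 4x4 image with constant intensity.
flat = [[1.0] * 4 for _ in range(4)]
ll, lh, hl, hh = haar_2d(flat)
```

Each subband has half the width and height of the input. Applying `haar_2d` again to the approximation subband `ll` yields the next, coarser level of the hierarchy, and the subband coefficients can then serve as the feature vector. Naming conventions for the detail subbands vary between references.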

Feature Reduction
A large number of features increases computation time and storage. The curse of dimensionality is a problem that arises when a dataset contains a large number of dimensions or irrelevant features, so the number of dimensions (features) has to be reduced. PCA is an effective tool for reducing the dimensionality of a dataset that consists of a large number of interrelated variables while retaining most of the variation. It transforms the dataset into a new set of variables ordered by importance. It improves computational efficiency while maintaining classification accuracy.
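The reduction step can be illustrated with a small sketch of PCA. It centers the data, forms the covariance matrix, and extracts the leading eigenvectors by power iteration with deflation. Power iteration is used here only to keep the example free of external libraries; it is an assumption of this sketch and not necessarily how PCA was computed in the experiments. All function names and the sample data are hypothetical.

```python
import math
import random


def center(data):
    """Subtract the per-feature mean from every sample."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered):
    """Sample covariance matrix of already centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def top_eigenpair(matrix, iters=500, seed=0):
    """Largest eigenvalue and its eigenvector by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in matrix]        # random start vector
    for _ in range(iters):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break                                 # matrix has no variance left
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


def pca(data, n_components):
    """Project data onto its first n_components principal components."""
    centered = center(data)
    cov = covariance(centered)
    components, variances = [], []
    for _ in range(n_components):
        val, vec = top_eigenpair(cov)
        components.append(vec)
        variances.append(val)
        # Deflation: remove the component just found from the covariance
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(len(vec))]
               for i in range(len(vec))]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in centered]
    return projected, components, variances


# Hypothetical 2D samples lying on the line y = 2x.
samples = [(1, 2), (2, 4), (3, 6), (4, 8)]
projected, components, variances = pca(samples, 1)
```

Because the sample points lie on a single line, one component captures all of their variance. In the proposed pipeline the inputs would be the wavelet feature vectors, and only the components with the largest variances would be passed to the classifier.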

Classification
Classification assigns the objects of an image to separate classes and is the concluding step. Many classification techniques have been introduced in recent years. They fall into two main categories: supervised and unsupervised machine learning. Supervised classifiers generally perform better than unsupervised classifiers. The Support Vector Machine (SVM) is a benchmark in the field of machine learning. It belongs to the class of supervised learning methods. It offers high accuracy, direct geometric interpretation, and elegant mathematical tractability [11]. The SVM finds the maximum-margin hyperplane that separates one class from another. Recently, multiple kernel SVMs (KSVMs) have been developed and have grown rapidly in popularity. They are effective and perform well compared with conventional SVMs. Their solutions are unique and avoid the convergence to local minima exhibited by other statistical learning methods such as neural networks. In this study, we used the Gaussian kernel function for the transformation. Using a kernel function, nonlinear samples can be transformed into a high-dimensional feature space in which separating them may become possible, which makes classification convenient. The Gaussian kernel function of the nonlinear SVM also gives the optimal solution for classification and generalization. We applied different SVM classifiers and compared the results. The training samples are chosen randomly from the dataset, and k-fold cross-validation is applied to verify the robustness of the proposed system. K-fold cross-validation is very helpful in avoiding overfitting.
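Two ingredients of this step, the Gaussian kernel and the k-fold split, can be sketched as follows. The sketch does not train an SVM; it only shows how the kernel value between two feature vectors is computed and how samples are divided into training and validation folds. The function names, the kernel width `sigma`, the number of folds, and the random seed are hypothetical choices for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))


def gram_matrix(samples, sigma=1.0):
    """Kernel values for all pairs of samples, as used by a kernel SVM."""
    return [[gaussian_kernel(a, b, sigma) for b in samples] for a in samples]


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# Hypothetical usage: 10 samples split into 5 folds.
splits = list(k_fold_indices(10, 5))
K = gram_matrix([(0.0, 0.0), (1.0, 0.0)])
```

Each sample appears in exactly one test fold, so every sample is used once for validation and k - 1 times for training. The accuracy averaged over the folds gives a more reliable estimate than a single random split.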

Experimental Results
The dataset consists of 160 brain MR images (140 abnormal and 20 normal). It contains T2-weighted images of the brain collected from the Harvard Medical School website and includes sample images of different patients. We chose T2-weighted images rather than T1-weighted images because T2-weighted images have higher contrast and are clearer to view than T1-weighted images and other modalities such as positron emission tomography (PET), endoscopy, tactile imaging, and thermography. Sample preliminary results obtained with the proposed technique are shown in Table 1(a) and (b) and Figure 3(i) to (iv); each step shows how the tumor is extracted. The proposed PCA+DWT+SVM method with the Gaussian kernel performs better than SVMs with linear, polynomial, and quadratic kernels. The Gaussian kernel takes the form of an exponential function, which enlarges the distance between samples to an extent that the other kernels cannot reach. The GRB kernel may therefore be applicable to other industrial fields as well. We compared SVMs with three kernels: GRB (Gaussian), LIN (linear), and HPOL (polynomial). PCA+DWT+KSVM (LIN) gives 95% accuracy, PCA+DWT+KSVM (HPOL) gives 96.1% accuracy, and the proposed method PCA+DWT+KSVM (GRB) gives 97.8% accuracy. The GRB kernel performed best among the three kernels.
Figure 3. Segmentation results: (i) input MRI image, (ii) Otsu threshold image, (iii) segmented tumor region, and (iv) silhouette graph for the clusters.