The purpose of this project was to identify and use a parameter to process grayscale mammogram images with different background tissue classifications: Fatty, Fatty Glandular, and Dense. Radiologists use this classification when analyzing mammograms because tissue density can obscure abnormalities such as lesions or tumors. Normal physiological structures, such as glandular tissue and fibrous connective tissue, and abnormal morphologies, such as calcifications and tumors, both appear very bright on a mammogram, while less dense fatty tissue appears black. Therefore, it made sense to program a classifier that manipulates pixel intensity levels to best visualize and identify masses.
Step 1: Organizing Mammogram Data
One of the first things I realized I needed to handle was organizing the data in a clear, concise, and accessible way. I extracted the following variables from the mini-MIAS database of mammograms and stored them in two arrays. The first contained four columns:
- Image Number
- x coordinate of mass
- y coordinate of mass
- Mass Radius (an approximate size for the mass)
The second array contained classification information:
- Type of Background Tissue: Fatty (F), Fatty Glandular (G), Dense (D)
- Description of Mass: Calcification (CALC), Well-defined (CIRC), Spiculated (SPIC), Ill-defined/other (MISC), Architectural distortion (ARCH), Asymmetry (ASYM), Normal (NORM)
- Diagnosis: Benign (B), Malignant (M)
Since the aim of this project was to determine the best threshold for each type of background tissue, not all of this information was necessary. However, you could expand the project to include texture analysis and test your classifier against the known mass descriptions.
Side Note: The database I got the diagnosed mammogram images from stores the information about each mammogram in a text file separate from the images. Extracting the data from the text file and organizing it into arrays was mildly difficult for me, but the textscan help page linked below was very helpful in figuring all that out. Alternatively, just adjust the code I pasted above for your purposes.
Mammogram File Format:
mdb001 G CIRC B 535 425 197
mdb002 G CIRC B 522 280 69
TextScan Help: https://www.mathworks.com/help/matlab/ref/textsca...
Mammogram Database: http://peipa.essex.ac.uk/info/mias.html
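For reference, here is a minimal sketch of one way to turn Info-file lines into the two arrays described in Step 1. It assumes the whitespace-separated layout shown in the format sample and, instead of textscan, splits each line with strsplit so that rows with fewer fields (such as NORM rows, which carry no coordinates) just leave NaN or empty entries. The names numInfo and classInfo are hypothetical, and the third sample row is a NORM-style line added only for illustration.

```matlab
% Two rows copied from the format sample above, plus a NORM-style row
% (illustrative, three fields only) to show how missing values are handled
txt = sprintf('mdb001 G CIRC B 535 425 197\nmdb002 G CIRC B 522 280 69\nmdb003 D NORM\n');
infoLines = strsplit(strtrim(txt), char(10));  % one cell per line
n = numel(infoLines);
numInfo = nan(n, 4);      % columns: image number, x, y, radius
classInfo = cell(n, 3);   % columns: tissue, mass class, severity
for k = 1:n
    f = strsplit(strtrim(infoLines{k}));       % split on whitespace
    numInfo(k, 1) = str2double(f{1}(4:end));   % 'mdb001' -> 1
    classInfo(k, 1:2) = f(2:3);                % tissue type and mass class
    if numel(f) >= 4
        classInfo{k, 3} = f{4};                % severity (B or M)
    end
    if numel(f) >= 7
        numInfo(k, 2:4) = str2double(f(5:7));  % x, y, radius
    end
end
```

Reading the real file works the same way once its contents are loaded into txt (for example with fileread).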
Step 2: Image Processing
Well, the second thing that came up while I was figuring out how to identify masses was that, for many abnormal mammograms, I could not visually tell where the abnormality was or how large it was. Obviously, since I am not an experienced radiologist, this was expected. However, the most straightforward way to find abnormalities (according to my lengthy Google searches) was to look at concentrations of bright and dark areas. I primarily used the adapthisteq function to enhance the image contrast and then imbinarize to convert the image to a binary image so I could experiment with different threshold levels.
- adapthisteq: This function transforms the intensity values of a grayscale image using contrast-limited adaptive histogram equalization (CLAHE). In other words, instead of equalizing the whole image at once, it works on small regions (tiles), adjusting each tile's histogram of intensity values toward a specified distribution while limiting how much the contrast can be amplified. The MathWorks documentation for this function has further details.
- imbinarize: creates a binary image from a grayscale image by setting all pixels above a threshold intensity to 1 and all other pixels to 0. Called without a threshold, it picks one automatically using Otsu's method; passing a value between 0 and 1 sets it explicitly. I used this function to find the threshold that best reduces background tissue noise.
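A minimal sketch of the two calls on a synthetic image: a dim background with one brighter Gaussian blob standing in for a mass. The fixed threshold of 0.6 is an arbitrary example value, not one of the thresholds from this project; imbinarize interprets it on a [0, 1] scale regardless of the image class.

```matlab
[X, Y] = meshgrid(1:64, 1:64);
% Synthetic image: background of 0.2 with a blob of peak 0.8 at row 25, column 40
I = im2uint8(0.2 + 0.6 * exp(-((X - 40).^2 + (Y - 25).^2) / (2 * 5^2)));
J = adapthisteq(I);            % CLAHE with default 8x8 tiles and ClipLimit 0.01
BWfixed = imbinarize(J, 0.6);  % explicit global threshold on a [0, 1] scale
BWotsu = imbinarize(J);        % threshold chosen automatically by Otsu's method
```

Sweeping the second argument of imbinarize is what the thresholding loop in the next step does.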
Step 3: Thresholding Code
A for loop binarizes the mammogram with a range of thresholds. To give a bigger-picture view, this loop contains the code from Steps 4 through 6, so each binary image is analyzed for abnormalities. This loop is in turn nested inside another for loop that imports a new mammogram image from the database on each iteration.
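The nesting can be sketched as follows. The two gradient images are stand-ins for mammograms (a real run would read each one with imread), the threshold sweep 0.1:0.1:0.9 is an assumption, and the foreground pixel count is only a placeholder for the Step 4 to 6 analysis.

```matlab
% Stand-ins for mammograms; a real run would load each one, e.g. with
% imread(sprintf('mdb%03d.pgm', imgNum)) (assuming mini-MIAS file names)
ramp = uint8(repmat(round(linspace(0, 255, 64)), 64, 1));
images = {ramp, ramp'};
thresholds = 0.1:0.1:0.9;                 % assumed threshold sweep
foreground = zeros(numel(images), numel(thresholds));
for i = 1:numel(images)                   % outer loop: one mammogram per pass
    J = adapthisteq(images{i});
    for t = 1:numel(thresholds)           % inner loop: one threshold per pass
        BW = imbinarize(J, thresholds(t));
        % ... clean-up, labeling, and distance check (Steps 4-6) go here ...
        foreground(i, t) = nnz(BW);       % placeholder result
    end
end
```

Because higher thresholds keep fewer pixels, each row of foreground can only stay the same or shrink as the threshold rises.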
Step 4: Finding Abnormalities for Each Binary Image
I further processed the binary images using the strel function together with imopen to remove background noise. The binary image from the previous step is inverted and then opened using the neighborhood defined by SE, the structuring element created with strel. Then I used bwlabel with 8-connectivity to label each connected region, meaning pixels are grouped together if they touch along an edge or at a corner.
The regionprops function was used to find the centroid and area of each spot labeled by bwlabel.
Then all spots larger than 500 pixels were identified using ismember. The centroids of these spots were plotted on an image that displayed only the spots with an area greater than 500 pixels.
Identified = ismember(Labeled, indices(sortedAreas > 500));  % mask of spots over 500 pixels
Spots = Identified > 0;
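Putting the Step 4 calls together, a minimal sketch on a synthetic binary image (one large square standing in for a mass, plus a 3x3 speck of noise) might look like this. The disk radius of 3 is an assumed value, the inversion step is left out because the synthetic image already has the mass as foreground, and the 500-pixel cutoff is the one described above.

```matlab
BW = false(100);
BW(20:50, 30:60) = true;           % 31x31 "mass"
BW(80:82, 80:82) = true;           % 3x3 speck of noise
SE = strel('disk', 3);             % assumed neighborhood size
BWclean = imopen(BW, SE);          % opening removes objects SE cannot fit inside
[Labeled, numSpots] = bwlabel(BWclean, 8);      % 8-connected labeling
props = regionprops(Labeled, 'Area', 'Centroid');
[sortedAreas, indices] = sort([props.Area], 'descend');
bigLabels = indices(sortedAreas > 500);         % labels of spots over 500 pixels
Identified = ismember(Labeled, bigLabels);      % logical mask of those spots
centroids = cat(1, props(bigLabels).Centroid);  % one [x y] row per spot
```

Note that regionprops reports each Centroid as [x y], that is [column row], which matters when comparing against the database coordinates.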
Step 5: Plotting the Diagnosed Mass Location and Size for Visual Comparison
I wanted to see if the spots found by bwlabel were correct. I did this in two ways. I first analyzed the accuracy of my classifier by doing a visual comparison. I simply plotted the actual size and location of the abnormality (red circle) and the location determined by the code (blue x) on the pre-processed mammogram image. The six images above show the effects of increasing the grayscale threshold value.
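A minimal sketch of the overlay, using the Image Processing Toolbox function viscircles for the red circle. The mdb001 values come from the format sample in Step 1; the blank image and the detected centroid are hypothetical stand-ins. The mini-MIAS notes place the coordinate origin at the bottom-left corner of the 1024x1024 images, so the sketch flips y into image rows; confirm this against the database documentation before relying on it.

```matlab
I = zeros(1024, 'uint8');               % stand-in for the pre-processed mammogram
x = 535; y = 425; r = 197;              % mdb001 values from the Info file
yRow = size(I, 1) - y;                  % flip: assumes a bottom-left origin
found = [560 580];                      % hypothetical centroid [x y] from regionprops
fig = figure('Visible', 'off');
imshow(I); hold on
viscircles([x yRow], r, 'Color', 'r');  % diagnosed location and size
plot(found(:, 1), found(:, 2), 'bx', 'MarkerSize', 12, 'LineWidth', 2);  % detected spot
hold off
```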
Step 6: Implementing the Second Comparison Method
The second way I tested the classifier and the threshold values was by checking whether the locations found by the classifier were within a certain distance of the diagnosed abnormality coordinates. Whenever at least one of the identified points was within 1.5r (1.5 times the diagnosed radius) of the known abnormality, I saved that threshold to a separate text file called Mammogram Data. The purpose was to find the minimum threshold my classifier needed to identify the abnormality.
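The distance test can be sketched like this. The centroids are hypothetical regionprops output, the threshold and tissue values are example inputs, and the log file location and line format are assumptions. Both coordinate sets must be in the same frame (for example, after any y-axis flip) for the distances to mean anything.

```matlab
knownXY = [535 425];              % diagnosed centre (x, y), mdb001
r = 197;                          % diagnosed radius, mdb001
imgName = 'mdb001'; tissue = 'G'; % identifiers for the log line
thresh = 0.5;                     % threshold used in this iteration (example)
centroids = [600 500; 100 100];   % hypothetical spot centroids [x y]
d = hypot(centroids(:, 1) - knownXY(1), centroids(:, 2) - knownXY(2));
hit = any(d <= 1.5 * r);          % at least one spot within 1.5r
if hit
    % Append one line per successful threshold (file location is an assumption)
    fid = fopen(fullfile(tempdir, 'MammogramData.txt'), 'a');
    fprintf(fid, '%s %s %.2f\n', imgName, tissue, thresh);
    fclose(fid);
end
```

Appending rather than overwriting lets the file accumulate results across every image and threshold in the nested loops.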
Step 7: Analyzing Collected Data
I ran the program on all the abnormal mammogram images and was left with a huge text file of data. To find the best threshold for each type of tissue, I organized the data by tissue type and plotted a histogram of the threshold values for each one. For each tissue type, I chose the threshold that gave the most accurate results, and saved these values to load into my classifier.
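One way to reduce the log to a single threshold per tissue type is sketched below. The log rows are made up purely to exercise the code (in practice they could be read back from the log file, for example with textscan(fid, '%s %s %f')), and picking the most frequent successful threshold, the histogram peak found with mode, is an assumed selection rule rather than the project's exact criterion.

```matlab
% Made-up log rows (tissue, threshold), purely to exercise the code
tissue = {'F'; 'F'; 'F'; 'G'; 'G'; 'D'; 'D'; 'D'};
thresh = [0.4; 0.5; 0.5; 0.6; 0.6; 0.7; 0.8; 0.8];
types = {'F', 'G', 'D'};
edges = 0.05:0.1:0.95;                 % one bin per 0.1 threshold step
counts = zeros(numel(types), numel(edges) - 1);
best = zeros(1, numel(types));
for k = 1:numel(types)
    t = thresh(strcmp(tissue, types{k}));   % thresholds for this tissue
    counts(k, :) = histcounts(t, edges);    % histogram(t, edges) plots the same bins
    best(k) = mode(t);                      % histogram peak (assumed rule)
end
```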
Step 8: Making Your Own Classifier!
After I found the most appropriate threshold value for each tissue type, I edited my original code so that the user inputs the image number and tissue type, and the tissue type selects the threshold for that mammogram image. I then plotted the diagnosed mass location together with the found locations on the original mammogram images. To make this more fun, I programmed a function to crop a circular region surrounding the region of interest (ROI). The user is instructed to pick a center point and several points that best encompass the ROI. I attached both MATLAB files here.
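A minimal sketch of the two pieces of Step 8: a threshold lookup by tissue type and a circular crop from user-picked points. The threshold values in the map are placeholders, not the values found in Step 7; the radius rule (distance from the center to the farthest picked point) is an assumption about how the points define the circle; and the interactive input and ginput calls are replaced with fixed values so the sketch runs on its own.

```matlab
% Placeholder thresholds; substitute the values found in Step 7
thresholdFor = containers.Map({'F', 'G', 'D'}, {0.5, 0.6, 0.8});
tissue = 'G';        % interactive: tissue = input('Tissue type (F/G/D): ', 's');
T = thresholdFor(tissue);
I = uint8(200 * ones(256));            % stand-in for the loaded mammogram
% Interactive: [cx, cy] = ginput(1); [px, py] = ginput(4);
cx = 128; cy = 100;                              % picked center (x, y)
px = [150 128 100 128]; py = [100 130 100 70];   % picked boundary points
r = max(hypot(px - cx, py - cy));      % radius reaching the farthest point
[X, Y] = meshgrid(1:size(I, 2), 1:size(I, 1));
mask = (X - cx).^2 + (Y - cy).^2 <= r^2;   % true inside the circle
roi = I;
roi(~mask) = 0;                        % black out everything outside
rowRange = max(1, floor(cy - r)):min(size(I, 1), ceil(cy + r));
colRange = max(1, floor(cx - r)):min(size(I, 2), ceil(cx + r));
roi = roi(rowRange, colRange);         % crop to the circle's bounding box
```

Clamping the row and column ranges keeps the crop valid when the picked circle runs past the image edge.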
Step 9: Improvements? Any Thoughts?
As I was writing this Instructable, I began to see many improvements I could make to the classifier, such as distinguishing between the different types of masses identified using texture analysis, or improving the accuracy-testing section of the SandBoxProject file. Since this was a project with a deadline, I had to stop somewhere, but I hope to use the image processing skills I learned in other applications. I also attached the file that was used to batch process all the abnormal mammogram images.