CAPI - Grupo de Clasificación de Patrones y Análisis de Imágenes - Microcalcification cluster diagnosis in digitized mammograms

Microcalcification cluster diagnosis in digitized mammograms

Written by Ramón Gallardo Caballero

Methodology

The current source data for our work is the mammographic database known as the Digital Database for Screening Mammography (DDSM). Developed by the Island Vision Group at the University of South Florida, it may be the most extensive and highest-quality database freely available for research purposes. It comprises about 2500 complete cases, each providing the four typical views of a mammographic study (left and right cranio-caudal and medio-lateral oblique). It also provides useful information about each case, such as age, film type and scanner. The most interesting feature for our work, however, is that it provides so-called ground truth marks when a breast presents a biopsy-proven abnormality, specifying its type and distribution in BI-RADS, the internationally accepted ACR nomenclature.

There are other databases in this field, such as MIAS or Nijmegen, but they are unavailable or their distribution is restricted. The MIAS group provides a reduced version free of charge (miniMIAS), but its low spatial and spectral resolution makes it useless for microcalcification detection problems.

Data source

Although the chosen database seems very complete, it has some characteristics that we considered worth modifying. The first, which is easily addressed, is that all the information for each case is provided in text files. Given the huge number of cases and mammograms available, this is an impractical format for our work, so the first step was to insert all the information provided by DDSM into an SQL database.
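As a rough illustration of this step, the sketch below loads the fields of one case into an SQL table using Python's built-in sqlite3 module. The table layout, the column names and the sample record are assumptions made for the example, not the schema actually used in the project.

```python
import sqlite3

def create_schema(conn):
    # Hypothetical table: one row per case, columns chosen for illustration only.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ddsm_case (
               case_id TEXT PRIMARY KEY,
               patient_age INTEGER,
               film_type TEXT,
               scanner TEXT
           )"""
    )

def insert_case(conn, case):
    # Parameterised query, so values are never interpolated into the SQL text.
    conn.execute(
        "INSERT INTO ddsm_case (case_id, patient_age, film_type, scanner) "
        "VALUES (:case_id, :patient_age, :film_type, :scanner)",
        case,
    )

def cases_by_scanner(conn, scanner):
    # Return the identifiers of all cases digitized with the given scanner.
    rows = conn.execute(
        "SELECT case_id FROM ddsm_case WHERE scanner = ? ORDER BY case_id",
        (scanner,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
create_schema(conn)
# Made-up record standing in for the fields parsed from a case's text files.
insert_case(conn, {"case_id": "case-0001", "patient_age": 60,
                   "film_type": "REGULAR", "scanner": "scanner-A"})
```

Once the data is in this form, selecting all cases from one scanner or one film type becomes a single query instead of a scan over thousands of text files.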

The second characteristic was less immediate to modify. DDSM cases come from different medical centres in the U.S.A. The digitizing process was carried out at each centre using the hardware available at the institution, which is why we have four different types of cases depending on the scanner used. DDSM cases are provided in raw mode, so the scanned grey level is given in the spectral scale of the machine. Each scanner has a different calibration range, and one of them even has a logarithmic response, so we need to convert each mammogram to a common magnitude: optical density. This parameter has physical significance and is widely used in this field, so by using the calibration parameters provided by the DDSM authors we can work with all mammograms independently of the scanning machine.
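The conversion can be sketched as a per-scanner mapping from grey level to optical density. The code below assumes that a scanner response is either linear or logarithmic in the grey level; the scanner names and coefficients are placeholders, not the calibration values published by the DDSM authors.

```python
import math

def grey_to_optical_density(grey, response, a, b):
    """Map a raw grey level to optical density.

    response == "linear":       OD = a + b * grey
    response == "logarithmic":  OD = a + b * log10(grey)
    """
    if response == "linear":
        return a + b * grey
    if response == "logarithmic":
        if grey <= 0:
            raise ValueError("logarithmic response needs a positive grey level")
        return a + b * math.log10(grey)
    raise ValueError(f"unknown response type: {response!r}")

# Placeholder calibrations (NOT the values published by the DDSM authors).
CALIBRATION = {
    "scanner-lin": ("linear", 4.0, -0.001),
    "scanner-log": ("logarithmic", 4.5, -1.0),
}

def convert_image(pixels, scanner):
    # Apply the calibration of the given scanner to every pixel of the image.
    response, a, b = CALIBRATION[scanner]
    return [[grey_to_optical_density(g, response, a, b) for g in row]
            for row in pixels]
```

After this step every later stage sees the same physical quantity, whichever machine digitized the film.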

Prototype dataset generation

This was probably one of the slowest phases of the development because we propose, as a first approximation, to carry out diagnosis at pixel level. With the help of an experienced radiologist, we labelled pixels with predefined classes over mammogram regions, once these had been converted to optical density. This set of regions corresponds to all ROIs defined in the DDSM database, but also includes some manually selected regions that contain significant mammogram structures such as vascular calcifications or artifacts.

The set of classes to study includes not only microcalcifications belonging to a cluster (hence indicative of malignancy) but also benign microcalcifications, large rod-like calcifications, round calcifications, lucent-centred calcifications, healthy tissue and several kinds of artifacts.

In total, we have inserted a training set of more than 4600 microcalcification prototypes, more than 6700 benign prototypes and more than 100000 healthy or benign prototypes.

Given the high number of available prototypes, and anticipating the subsequent training step, we decided to build several training sets that vary the percentages of the different prototypes while always including all malignant microcalcification prototypes. These training sets are used in the later training steps of the project.
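One way to build such sets is sketched below: every malignant prototype is kept, and each other class is randomly subsampled to a chosen fraction. The function name, the class names and the fractions are illustrative assumptions.

```python
import random

def build_training_set(malignant, others_by_class, fractions, seed=0):
    """Keep every malignant prototype; subsample each other class.

    fractions maps a class name to the fraction (0..1) of its prototypes
    to keep. Classes missing from `fractions` are left out entirely.
    """
    rng = random.Random(seed)
    training = [(p, "malignant") for p in malignant]
    for name, prototypes in others_by_class.items():
        k = round(len(prototypes) * fractions.get(name, 0.0))
        training.extend((p, name) for p in rng.sample(prototypes, k))
    rng.shuffle(training)
    return training
```

Calling the function with different `fractions` dictionaries yields the different training sets, each of which still contains the full malignant class.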

Obtaining the ICA basis matrices

The ICA feature extractor subsystem needs to be generated using samples from the space to be modelled (in our case, mammograms). These samples are square regions whose side length determines the dimension of the ICA analysis. The typical modelling strategy consists of taking random samples from the modelling space, but in this case, after some tests and considering the small number of abnormal zones relative to normal ones, we decided to centre our samples on pixels belonging to the prototype dataset.
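The sampling strategy can be illustrated as follows. This is a minimal sketch that assumes an image stored as a list of rows and an odd patch side; centres too close to the border are skipped here, which is a simplification and not necessarily how the project handles them.

```python
def extract_patch(image, row, col, side):
    """Return the side x side patch centred on (row, col), or None if it
    would fall outside the image. `side` is assumed to be odd."""
    half = side // 2
    if row - half < 0 or col - half < 0:
        return None
    if row + half >= len(image) or col + half >= len(image[0]):
        return None
    return [r[col - half:col + half + 1]
            for r in image[row - half:row + half + 1]]

def sample_patches(image, prototype_pixels, side):
    """Collect flattened patches centred on labelled prototype pixels."""
    samples = []
    for row, col in prototype_pixels:
        patch = extract_patch(image, row, col, side)
        if patch is not None:
            samples.append([v for patch_row in patch for v in patch_row])
    return samples
```

Each flattened patch is one observation for the ICA estimation, so a patch of side n gives an analysis of dimension n * n.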

Once the ICA matrices are generated, the feature extraction task reduces to matrix multiplication operations. As an example, the following image shows the ICA decomposition of a small microcalcification cluster.

ICA expansion
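The matrix product behind the feature extraction can be sketched in a few lines. The example assumes an unmixing matrix W with one row per independent component and a mean vector subtracted before projection; both names are illustrative, and plain Python lists are used only to keep the sketch dependency-free.

```python
def ica_features(unmixing, patch_vector, mean_vector):
    """Project a flattened patch onto the ICA basis: s = W (x - mean)."""
    centred = [x - m for x, m in zip(patch_vector, mean_vector)]
    return [sum(w * c for w, c in zip(row, centred)) for row in unmixing]
```

The resulting vector holds one coefficient per component, and these coefficients are the features passed on to the classifier.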

Neural classifiers training

Training is carried out over sets of prototypes for which a sample of at least 51 pixels per side can be obtained. Each dataset tries to model a specific characteristic, but every one of them includes all the malignant prototypes.

Training is carried out on the distributed cluster, which allows us to run around 42 simultaneous training processes. The neural structure used was a multilayer perceptron with a single hidden layer whose size varies from 50 to 200 neurons. The training process is run for each hidden layer size, increasing the number of neurons up to the foreseen maximum, and for each size several repetitions with different seeds are computed. The best network obtained in the process is stored, along with its performance parameters, for later use.
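The search over hidden layer sizes and seeds can be outlined as below. The sketch runs sequentially and takes the training routine as a caller-supplied function; `train_and_score` is a hypothetical name, and the step between sizes is left to the caller because it is not stated above.

```python
def search_hidden_size(train_and_score, sizes, seeds):
    """Try every (hidden size, seed) pair and keep the best-scoring network.

    train_and_score(hidden_size, seed) is a caller-supplied (hypothetical)
    function returning (network, score), where a higher score is better.
    """
    best = None
    for hidden_size in sizes:
        for seed in seeds:
            network, score = train_and_score(hidden_size, seed)
            if best is None or score > best["score"]:
                best = {"network": network, "score": score,
                        "hidden_size": hidden_size, "seed": seed}
    return best
```

Because each (size, seed) run is independent of the others, the runs can be spread over many cluster nodes and only the final comparison needs to be centralised.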

Performance analysis

Performance analysis is carried out at three different levels. The first is done at prototype level, evaluating the success rate both overall and for the target class (malignant microcalcification or microcalcification, depending on the chosen configuration).
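The prototype-level figures can be computed as in the sketch below. It assumes that the success rate for the target class means the fraction of target-class prototypes that are classified correctly; the function and label names are illustrative.

```python
def prototype_level_rates(true_labels, predicted_labels, target_class):
    """Return (overall success rate, success rate on the target class)."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(true_labels, predicted_labels))
    overall = sum(t == p for t, p in pairs) / len(pairs)
    target = [(t, p) for t, p in pairs if t == target_class]
    # None signals that the target class does not occur in the data.
    target_rate = sum(t == p for t, p in target) / len(target) if target else None
    return overall, target_rate
```

Reporting both numbers matters because healthy prototypes far outnumber malignant ones, so a high overall rate can hide poor performance on the target class.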

The second level evaluates the diagnostic performance of the system on previously generated overlays, comparing the results with the real diagnoses stored in the database. For this level, as well as for the third, we need to add a new element to our system: a region of interest (ROI) generator.

A region of interest is a square, circular or freehand enclosing mark that surrounds or indicates the zone over which an abnormality extends. In our case, the proposed ROI generator involves several phases:

Calcification individualization
Density map generation
Density map filtering
ROI individualization
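The four phases can be illustrated with a simplified sketch operating on a binary mask of pixels classified as calcification. The concrete choices made here (8-connected blobs, a square counting window, a plain count threshold as the filter, and bounding boxes as ROIs) are assumptions for the example, not a description of the actual generator.

```python
from collections import deque

def connected_components(mask):
    """Group 8-connected truthy cells; return a list of pixel lists."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in range(max(0, y - 1), min(rows, y + 2)):
                    for nx in range(max(0, x - 1), min(cols, x + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def generate_rois(mask, radius, min_count):
    rows, cols = len(mask), len(mask[0])
    # 1. Calcification individualization: one centroid per connected blob.
    centroids = []
    for pixels in connected_components(mask):
        cy = round(sum(y for y, _ in pixels) / len(pixels))
        cx = round(sum(x for _, x in pixels) / len(pixels))
        centroids.append((cy, cx))
    # 2. Density map: number of centroids inside a square window per cell.
    density = [[sum(abs(cy - r) <= radius and abs(cx - c) <= radius
                    for cy, cx in centroids)
                for c in range(cols)] for r in range(rows)]
    # 3. Density map filtering: keep only sufficiently dense cells.
    dense = [[d >= min_count for d in row] for row in density]
    # 4. ROI individualization: bounding box of each dense region,
    #    as (top, left, bottom, right) with inclusive coordinates.
    rois = []
    for pixels in connected_components(dense):
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        rois.append((min(ys), min(xs), max(ys), max(xs)))
    return rois
```

With these choices an isolated calcification never produces a ROI on its own, while a group of nearby calcifications yields a single enclosing box.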

Once the ROIs have been created, we must fix diagnostic criteria for ROIs built in overlays and in mammograms. For overlays we have proposed two different criteria. The first considers a ROI finding successful if the stored overlay also contains a malignant abnormality. The second requires an effective overlap between the generated and stored ROIs. The latter criterion is more restrictive than the first and, although the first is also acceptable, we have decided to use the second in our work.

Regarding the diagnostic evaluation of ROIs generated on mammograms, although the generation process is the same as the one used for overlays, the diagnostic criteria must be slightly altered. We have again proposed two criteria. The first establishes that a generated ROI is a true positive if it overlaps with a real malignant ROI stored in the database. The second requires complete inclusion of the real ROI in the generated one. This criterion can lead to problems in practice if the generated ROI grows less than expected, causing a false negative result.
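The overlap and inclusion criteria can be expressed compactly if ROIs are approximated by rectangles. The sketch below assumes boxes given as (top, left, bottom, right) with inclusive coordinates; real ROIs may be circular or freehand, so this is a simplification, and the function names are illustrative.

```python
def boxes_overlap(a, b):
    """True if two inclusive boxes share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def box_contains(outer, inner):
    """True if `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def is_true_positive(generated, real_malignant_rois, criterion="overlap"):
    """Apply the chosen criterion against every stored malignant ROI."""
    if criterion == "overlap":
        return any(boxes_overlap(generated, real)
                   for real in real_malignant_rois)
    if criterion == "inclusion":
        return any(box_contains(generated, real)
                   for real in real_malignant_rois)
    raise ValueError(f"unknown criterion: {criterion!r}")
```

A generated box that covers only part of the stored ROI passes the overlap test but fails the inclusion test, which is the false negative situation described above.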


Last Updated ( Thursday, 25 November 2010 )
