TOMPEI-CMMD

The Cancer Imaging Archive

DOI: 10.7937/wezw-bh22 | Data Citation Required | Analysis Result

Cancer Types: Breast Cancer
Location: Breast
Subjects: 1,363
Size: 25.41 MB
Supporting Data: Segmentations, Clinical
Updated: 2025/01/24

Summary

The TOMPEI-CMMD dataset adds the following image analyses to the original CMMD dataset on TCIA:

  1. Segmentations of 1,385 breast lesions with supporting clinical data.
  2. Accuracy improvements: images with obvious labeling errors in laterality (left, L, versus right, R) or in view (CC versus MLO) were corrected, resulting in a more accurate dataset (see the clinical data spreadsheet).
  3. Addition of breast cancer location information: the original data contained only per-image labels, whereas the TOMPEI-CMMD clinical data spreadsheet also includes mask information. This allows researchers to use the precise location of breast cancer within images, facilitating detailed studies of disease characteristics and treatment efficacy.

The source CMMD collection consisted of 3,728 breast images, including MLO (mediolateral oblique) and CC (craniocaudal) views, from 1,775 Chinese patients. The patients were examined between July 2012 and January 2016. Images were captured using a GE Senographe DS mammography system. Images from the left and right breasts of the patients were treated as independent breast images in the TOMPEI-CMMD dataset. The dataset exclusively used MLO views from the original CMMD dataset, excluding 1,127 CC views. This resulted in 2,601 breast images derived from 1,775 Chinese patients who were initially included for radiological assessment in the TOMPEI-CMMD dataset.

A board-certified radiologist with 20 years of experience in breast imaging interpreted all 2,601 breast images in the TOMPEI-CMMD dataset, with prior knowledge of the malignant/benign diagnoses annotated in the original CMMD dataset. For lesions recognized as breast cancer, the radiologist documented the radiological findings (mass, calcification, focal asymmetric density, and distortion) along with their specific locations.

Five radiological technologists performed the segmentation, working with the radiologist and annotating the images in accordance with the radiologist’s evaluations.

Each breast image in the TOMPEI-CMMD dataset was independently annotated using segmentation masks for the radiological findings, including tags for masses (blue), calcifications (yellow), fibroadenomas (red), distortions (green), focal asymmetric densities (purple), and lipomas (light blue). Breast images with multiple radiological findings were assigned multiple tags.
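The tag-to-color scheme described above can be written down as a small lookup table. This is only an illustrative sketch: the lowercase tag strings and the `colors_for` helper are assumptions made for the example, not names taken from the dataset files.

```python
# Colors assigned to each radiological finding, as listed in the text above.
# The exact tag strings stored in the JSON files are an assumption here.
FINDING_COLORS = {
    "mass": "blue",
    "calcification": "yellow",
    "fibroadenoma": "red",
    "distortion": "green",
    "focal asymmetric density": "purple",
    "lipoma": "light blue",
}


def colors_for(findings):
    """Return the display color for each finding tagged on one breast image."""
    return [FINDING_COLORS[f] for f in findings]


# A breast image with multiple findings carries multiple tags.
example_colors = colors_for(["mass", "calcification"])
print(example_colors)
```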

Segmentation masks for the radiological findings were annotated as JSON files. The JSON files contain tags linked to unique identifiers and attributes including ‘_id’ of the patient; ‘type’ and ‘color’ for the radiological findings; ‘cgPoints’, coordinate values surrounding the annotation masks; and ‘x’ and ‘y’ values representing the lesion’s location.
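As a rough illustration of how such an annotation might be turned into a pixel mask, the sketch below parses a made-up record and rasterizes its outline with even-odd ray casting. Only the key names (‘_id’, ‘type’, ‘color’, ‘cgPoints’, ‘x’, ‘y’) come from the description above; the nesting of the record, the representation of ‘cgPoints’ as a list of x/y pairs, the sample values, and the `polygon_to_mask` helper are all assumptions. Inspect a real file before relying on this layout.

```python
import json

# Hypothetical annotation record. Only the key names come from the dataset
# description; the structure and values are assumed for illustration.
SAMPLE = json.loads("""
{
  "_id": "example-id",
  "type": "mass",
  "color": "blue",
  "x": 2,
  "y": 2,
  "cgPoints": [{"x": 1, "y": 1}, {"x": 4, "y": 1},
               {"x": 4, "y": 4}, {"x": 1, "y": 4}]
}
""")


def polygon_to_mask(points, width, height):
    """Rasterize a closed polygon into a binary mask (even-odd ray casting).

    A pixel is set when its center (col + 0.5, row + 0.5) lies inside
    the polygon described by `points`, a list of (x, y) tuples.
    """
    mask = [[0] * width for _ in range(height)]
    n = len(points)
    for row in range(height):
        py = row + 0.5
        for col in range(width):
            px = col + 0.5
            inside = False
            for i in range(n):
                x1, y1 = points[i]
                x2, y2 = points[(i + 1) % n]
                # Does this edge cross the horizontal ray to the right of the pixel?
                if (y1 > py) != (y2 > py):
                    x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                    if px < x_cross:
                        inside = not inside
            mask[row][col] = int(inside)
    return mask


outline = [(p["x"], p["y"]) for p in SAMPLE["cgPoints"]]
mask = polygon_to_mask(outline, width=6, height=6)
area = sum(map(sum, mask))  # number of pixels inside the outline
print(SAMPLE["type"], SAMPLE["color"], area)
```

For full-resolution mammograms a library rasterizer would be far faster than this pure-Python loop; the loop is used here only to keep the example dependency-free.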

Following the radiological assessment, 140 breast images with mammographically undetectable lesions from patients with pathologically proven breast cancer were excluded from the TOMPEI-CMMD dataset. In addition, 25 breast images were excluded because of insufficient image quality. Consequently, 2,436 breast images were included in the final TOMPEI-CMMD clinical spreadsheet.
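The inclusion and exclusion counts quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those stated in the text.

```python
# Counts quoted in the summary text.
source_images = 3728          # breast images in the source CMMD collection
cc_views_excluded = 1127      # CC views not used by TOMPEI-CMMD
undetectable_excluded = 140   # mammographically undetectable lesions
low_quality_excluded = 25     # insufficient image quality

# MLO images that entered radiological assessment.
mlo_images = source_images - cc_views_excluded

# Images remaining in the final clinical spreadsheet.
final_images = mlo_images - undetectable_excluded - low_quality_excluded

print(mlo_images)    # 2601
print(final_images)  # 2436
```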

The analysis dataset comprises 1,385 breasts with lesions from 1,363 patients; breasts without lesions were not included. The 1,385 JSON annotation files contain 1,773 segmentation masks: 255 benign lesions, 1,515 malignant lesions, and 3 lesions that were subsequently excluded from the published paper. Details are provided in the clinical data spreadsheet.
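A short consistency check on these figures is sketched below. The variable names are illustrative, and the reading of the surplus of annotation files over patients as patients with lesions in both breasts is an inference from the counts, not a statement made by the dataset authors.

```python
# Mask counts by lesion class, as quoted in the text.
benign, malignant, excluded_later = 255, 1515, 3
total_masks = benign + malignant + excluded_later

annotation_files = 1385  # one JSON file per breast with lesions
patients = 1363

# Average number of segmentation masks per annotation file.
masks_per_file = total_masks / annotation_files

# Files beyond one per patient; this would correspond to patients with
# lesions annotated in both breasts (an inference, see the note above).
extra_files = annotation_files - patients

print(total_masks)               # 1773
print(round(masks_per_file, 2))  # 1.28
print(extra_files)               # 22
```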

Data Access

Version 1: Updated 2025/01/24

Title Data Type Format Access Points Subjects Studies Series Images License Metadata
Segmentations of breast lesion, MLO view Segmentation JSON and ZIP 1,363 1,385 CC BY 4.0
Clinical Data Classification, Demographic, Diagnosis, Pathology Detail XLSX 1,363 CC BY 4.0

Collections Used In This Analysis Result

Title: Source Data for Segmentations, CMMD
Data Type: MG
Format: DICOM
Subjects: 1,363
Studies: 1,363
Series: 1,363
Images: 4,162
License: CC BY 4.0

Related Collections
Related Datasets
CMMD

Citations & Data Usage Policy

Data Citation Required: Users must abide by the TCIA Data Usage Policy and Restrictions. Attribution must include the following citation, including the Digital Object Identifier:

Data Citation

Kashiwada, Y., Takaya, E., Hiroya, M., Matsuda, N., Yashima, T., Kobayashi, T., Tamiya, G., & Ueda, T. (2025). TOMPEI-CMMD Dataset (Version 1) [Dataset]. The Cancer Imaging Archive. https://doi.org/10.7937/WEZW-BH22

Acknowledgements

The volunteers from the Department of Clinical Imaging in Tohoku University Graduate School of Medicine; JSPS KAKENHI Grant Number JP20H03738; joint research funds with NEC Corporation; Tohoku University grant of clinical translational research for young investigator.

Related Publications

Publications by the Dataset Authors

The authors recommended the following as the best source of additional information about this dataset:

No other publications were recommended by dataset authors.

Research Community Publications

TCIA maintains a list of publications that leveraged this dataset. If you have a manuscript you’d like to add please contact TCIA’s Helpdesk.