TOMPEI-CMMD Dataset
DOI: 10.7937/wezw-bh22 | Data Citation Required | Analysis Result
Cancer Type | Location | Subjects | Data Types | Updated |
---|---|---|---|---|
Breast Cancer | Breast | 1,363 | Segmentations, Clinical | 2025/01/24 |
Summary
The TOMPEI-CMMD dataset adds radiological annotations and lesion segmentations to the original CMMD dataset on TCIA. The source CMMD collection consists of 3,728 breast images, including MLO (mediolateral oblique) and CC (craniocaudal) views, from 1,775 Chinese patients examined between July 2012 and January 2016. Images were acquired with a GE Senographe DS mammography system. Images of the left and right breasts of each patient were treated as independent breast images in the TOMPEI-CMMD dataset. Only MLO views from the original CMMD dataset were used; the 1,127 CC views were excluded. This left 2,601 breast images from 1,775 Chinese patients, who were initially included for radiological assessment.

A board-certified radiologist with 20 years of experience in breast imaging interpreted all 2,601 breast images with prior knowledge of the final malignant/benign diagnoses annotated in the original CMMD dataset. For lesions recognized as breast cancer, the radiologist documented the radiological findings (mass, calcification, focal asymmetric density, and distortion) together with their specific locations. Five radiological technologists then performed the segmentation in collaboration with the radiologist, following the radiologist's evaluations. Each breast image was annotated independently with segmentation masks tagged by finding: masses (blue), calcifications (yellow), fibroadenomas (red), distortions (green), focal asymmetric densities (purple), and lipomas (light blue). Breast images with multiple radiological findings received multiple tags. The segmentation masks are stored as JSON files.
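The image-selection rule above (MLO views only, with left and right breasts counted as separate images) can be sketched as a simple filter. This is a minimal sketch, not the authors' pipeline: it assumes each CMMD image has already been summarized as a record holding a patient ID, a laterality ("L"/"R"), and a view ("MLO"/"CC"). In DICOM these values would normally come from the standard ImageLaterality and ViewPosition attributes. The ImageRecord type, the select_mlo_breasts function, and the patient IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical per-image summary of one CMMD mammogram."""
    patient_id: str
    laterality: str  # "L" or "R"
    view: str        # "MLO" or "CC"


def select_mlo_breasts(records):
    """Keep MLO views only, keyed by (patient_id, laterality).

    Each side of each patient becomes an independent breast image.
    Assumes at most one MLO image per breast; a later duplicate
    would overwrite an earlier one.
    """
    breasts = {}
    for rec in records:
        if rec.view != "MLO":
            continue  # CC views are dropped
        breasts[(rec.patient_id, rec.laterality)] = rec
    return breasts


# Toy records with made-up patient IDs.
records = [
    ImageRecord("P001", "L", "MLO"),
    ImageRecord("P001", "L", "CC"),
    ImageRecord("P001", "R", "MLO"),
    ImageRecord("P002", "R", "CC"),
]
selected = select_mlo_breasts(records)
print(sorted(selected))  # [('P001', 'L'), ('P001', 'R')]

# Count check against the summary: all source images minus the CC views.
print(3728 - 1127)  # 2601
```

Keying by (patient, side) rather than by patient is what makes the two breasts independent units. This matches the per-breast counts used throughout the summary.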
Each JSON file contains tags linked to unique identifiers and attributes, including the patient’s ‘_id’; ‘type’ and ‘color’ of the radiological finding; ‘cgPoints’, the coordinate values outlining the annotation mask; and ‘x’ and ‘y’, representing the lesion’s location. After the radiological assessment, 140 breast images with mammographically undetectable lesions from patients with pathologically proven breast cancer were excluded from the TOMPEI-CMMD dataset, as were 25 breast images with insufficient image quality. Consequently, 2,436 breast images were included in the final TOMPEI-CMMD clinical spreadsheet. The analysis dataset comprises 1,385 breasts with lesions from 1,363 patients; breasts without lesions are not included. The 1,385 JSON annotation files contain 1,773 segmentation masks: 255 benign lesions, 1,515 malignant lesions, and 3 lesions that were subsequently excluded from the published paper. Details are provided in the clinical data spreadsheet.
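The JSON fields described above can be read and turned into pixel masks using only the Python standard library. This is a hedged sketch. The exact nesting of the files is not documented here, so the code assumes a hypothetical layout in which each file is a JSON list of annotation objects carrying ‘_id’, ‘type’, ‘color’, and ‘cgPoints’. Each point is assumed to be given either as {"x": ..., "y": ...} or as an [x, y] pair. Rasterization uses a standard even-odd (ray-casting) point-in-polygon test at pixel centers. This is an illustrative choice, not necessarily how the authors generated their masks. The function names are hypothetical; check the actual files, or the sample code repository listed under Additional Resources, before relying on this sketch.

```python
import json
from collections import Counter


def _to_xy(point):
    """Read a point given as {"x": .., "y": ..} or as an [x, y] pair (assumed formats)."""
    if isinstance(point, dict):
        return float(point["x"]), float(point["y"])
    x, y = point
    return float(x), float(y)


def load_annotations(text):
    """Parse one annotation file, assuming it is a JSON list of annotation objects."""
    parsed = []
    for item in json.loads(text):
        parsed.append({
            "patient_id": item.get("_id"),
            "type": item["type"],
            "color": item.get("color"),
            "polygon": [_to_xy(p) for p in item["cgPoints"]],
        })
    return parsed


def point_in_polygon(px, py, polygon):
    """Even-odd (ray-casting) test: is (px, py) inside the closed polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            # x where this edge crosses the horizontal ray through py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height):
    """Binary mask as a list of rows, sampling each pixel at its center."""
    return [
        [1 if point_in_polygon(c + 0.5, r + 0.5, polygon) else 0 for c in range(width)]
        for r in range(height)
    ]


# Toy file content (made up) with one rectangular "mass" outline.
sample = ('[{"_id": "P001", "type": "mass", "color": "blue", '
          '"cgPoints": [[1, 1], [5, 1], [5, 4], [1, 4]]}]')
annotations = load_annotations(sample)
print(Counter(a["type"] for a in annotations))  # Counter({'mass': 1})
mask = rasterize(annotations[0]["polygon"], width=8, height=6)
print(sum(map(sum, mask)))  # 12 pixels inside the 4 x 3 rectangle
```

If the assumed layout matches the real files, applying the same tally of ‘type’ to all 1,385 files should account for the 1,773 masks reported above.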
Data Access
Version 1: Updated 2025/01/24
Title | Data Type | Format | Access Points | Subjects | License | Metadata | |||
---|---|---|---|---|---|---|---|---|---|
Segmentations of breast lesion, MLO view | Segmentation | JSON and ZIP | 1,363 | 1,385 | CC BY 4.0 | — | |||
Clinical Data | Classification, Demographic, Diagnosis, Pathology Detail | XLSX | 1,363 | CC BY 4.0 | — |
Collections Used In This Analysis Result
Title | Data Type | Format | Access Points | Subjects | License | Metadata | |||
---|---|---|---|---|---|---|---|---|---|
Source Data for Segmentations, CMMD | MG | DICOM | Requires NBIA Data Retriever | 1,363 | 1,363 | 1,363 | 4,162 | CC BY 4.0 | View |
Additional Resources For This Dataset
The following resources have been made available by the data submitters. These are not hosted or supported by TCIA, but may be useful to researchers utilizing this collection.
- Sample Python code to view, overlay, and inspect selected TOMPEI-CMMD dataset files: https://github.com/javasparrows/TOMPEI-CMMD
Citations & Data Usage Policy
Data Citation Required: Users must abide by the TCIA Data Usage Policy and Restrictions. Attribution must include the following citation, including the Digital Object Identifier:
Data Citation
Kashiwada, Y., Takaya, E., Hiroya, M., Matsuda, N., Yashima, T., Kobayashi, T., Tamiya, G., & Ueda, T. (2025). TOMPEI-CMMD Dataset (Version 1) [Dataset]. The Cancer Imaging Archive. https://doi.org/10.7937/WEZW-BH22
Acknowledgements
- The volunteers from the Department of Clinical Imaging, Tohoku University Graduate School of Medicine
- JSPS KAKENHI Grant Number JP20H03738
- Joint research funds with NEC Corporation
- Tohoku University grant of clinical translational research for young investigator
Related Publications
Publications by the Dataset Authors
The authors recommended the following as the best source of additional information about this dataset:
Research Community Publications
TCIA maintains a list of publications that have used this dataset. If you have a manuscript you’d like to add, please contact TCIA’s Helpdesk.