SAROS | SAROS - A large, heterogeneous, and sparsely annotated segmentation dataset on CT imaging data
DOI: 10.25737/SZ96-ZG60 | Data Citation Required | Analysis Result
Location | Subjects | Size | Updated | |||
---|---|---|---|---|---|---|
Adenocarcinoma, Breast, Corpus Endometrial Carcinoma, COVID-19(non-cancer), Cutaneous Melanoma, Ductal Adenocarcinoma, Head and Neck Carcinomas, Head and Neck Squamous Cell Carcinoma, Healthy Controls (non-cancer), Kidney Cancer, Liver Hepatocellular Carcinoma, Lung Adenocarcinoma, Lung Cancer, Lung Squamous Cell Carcinoma, Melanoma, Non-small Cell Lung Cancer, Soft-tissue Sarcoma, Squamous Cell Carcinoma, Stomach Adenocarcinoma, Uterine Corpus Endometrial Carcinoma | Breast, Chest, Extremities, Head-Neck, Kidney, Liver, Lung, Pancreas, Skin, Stomach, Uterus | 882 |
ACRIN-NSCLC-FDG-PET
CPTAC-LSCC
Soft-tissue-Sarcoma
NSCLC Radiogenomics
Lung-PET-CT-Dx
NSCLC-Radiomics
LIDC-IDRI
TCGA-LUAD
TCGA-STAD
Anti-PD-1_MELANOMA
TCGA-UCEC
CPTAC-CM
TCGA-LUSC
ACRIN-FLT-Breast
Anti-PD-1_Lung
HNSCC
QIN-HEADNECK
CPTAC-LUAD
C4KC-KiTS
Head-Neck Cetuximab
TCGA-LIHC
CPTAC-PDA
NSCLC-Radiomics-Genomics
ACRIN-HNSCC-FDG-PET-CT
Pancreas-CT
TCGA-HNSC
COVID-19-NY-SBU
|
Segmentations | 2024/03/07 |
Summary
Sparsely Annotated Region and Organ Segmentation (SAROS) contributes a large, heterogeneous semantic segmentation annotation dataset for existing CT imaging cases on TCIA. The goal of this dataset is to provide high-quality annotations for building body composition analysis tools (References: Koitka 2020 and Haubold 2023). Existing in-house segmentation models were employed to generate annotation candidates on randomly selected cases. All generated annotations were manually reviewed and corrected by medical residents and students on every fifth axial slice, while all other slices were set to an ignore label (numeric value 255).
900 CT series from 882 patients were randomly selected from the following TCIA collections (number of CTs per collection in parentheses): ACRIN-FLT-Breast (32), ACRIN-HNSCC-FDG-PET/CT (48), ACRIN-NSCLC-FDG-PET (129), Anti-PD-1_Lung (12), Anti-PD-1_MELANOMA (2), C4KC-KiTS (175), COVID-19-NY-SBU (1), CPTAC-CM (1), CPTAC-LSCC (3), CPTAC-LUAD (1), CPTAC-PDA (8), CPTAC-UCEC (26), HNSCC (17), Head-Neck Cetuximab (12), LIDC-IDRI (133), Lung-PET-CT-Dx (17), NSCLC Radiogenomics (7), NSCLC-Radiomics (56), NSCLC-Radiomics-Genomics (20), Pancreas-CT (58), QIN-HEADNECK (94), Soft-tissue-Sarcoma (6), TCGA-HNSC (1), TCGA-LIHC (33), TCGA-LUAD (2), TCGA-LUSC (3), TCGA-STAD (2), TCGA-UCEC (1).
A script to download and resample the images is provided in our GitHub repository: https://github.com/UMEssen/saros-dataset
The annotations are provided in NIfTI format and were performed on 5 mm slice thickness. The annotation files define foreground labels on the same axial slices and align pixel for pixel. In total, 13 semantic body regions and 6 body part labels were annotated, each with an index that corresponds to a numeric value in the segmentation file.
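Because only every fifth axial slice is reviewed and all other slices carry the ignore label 255, downstream code usually needs to exclude ignored voxels before computing any metric or loss. The following is a minimal sketch of that idea using NumPy on a synthetic array. It assumes the axial direction is the last array axis and uses a hypothetical label index 1; neither is taken from the dataset documentation, so both should be checked against the actual files.

```python
import numpy as np

IGNORE_LABEL = 255  # value the dataset description assigns to non-reviewed slices


def annotated_slice_indices(seg: np.ndarray, axis: int = 2) -> np.ndarray:
    """Indices along `axis` of slices containing at least one non-ignored voxel.

    The default axis=2 is an assumption about the array layout.
    """
    other_axes = tuple(i for i in range(seg.ndim) if i != axis)
    has_annotation = np.any(seg != IGNORE_LABEL, axis=other_axes)
    return np.flatnonzero(has_annotation)


def masked_dice(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """Dice score for one label, restricted to voxels that are not ignored."""
    valid = target != IGNORE_LABEL
    p = (pred == label) & valid
    t = (target == label) & valid
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")  # label absent from both prediction and target
    return float(2.0 * (p & t).sum() / denom)


# Synthetic stand-in for a loaded segmentation volume (not real SAROS data).
target = np.full((4, 4, 11), IGNORE_LABEL, dtype=np.uint8)
target[:, :, ::5] = 0        # slices 0, 5, 10 play the role of reviewed slices
target[1:3, 1:3, ::5] = 1    # hypothetical foreground label 1

# A dense prediction that slightly over-segments label 1 on every slice.
pred = np.zeros_like(target)
pred[1:3, 1:4, :] = 1

print(annotated_slice_indices(target))
print(masked_dice(pred, target, label=1))
```

Voxels on ignored slices contribute nothing to the score, so a model is neither rewarded nor penalized for what it predicts between the reviewed slices.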
For reproducibility on downstream tasks, five cross-validation folds and a test set were pre-defined and are described in the provided spreadsheet.
Segmentation was conducted strictly in accordance with anatomical guidelines and was modified only where required to improve segmentation efficiency. The labels that were modified or require further commentary are listed and explained below:
Body Regions
Body Parts
Data Access
Version 2: Updated 2024/03/07
The segmentations of 91 cases were updated to improve the segmentation quality. In some cases, some bones (mostly the ribs) were incorrectly annotated as “muscle”. These mistakes were revised and the segmentation accuracy of these areas was improved.
Title | Data Type | Format | Access Points | Subjects | License | |||
---|---|---|---|---|---|---|---|---|
SAROS Segmentations | Segmentation | NIFTI and ZIP | 900 | 1,709 | CC BY 4.0 | |||
Segmentation Information Spreadsheet | Radiomic Feature | CSV | CC BY 4.0 |
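The segmentation information spreadsheet is described as defining five cross-validation folds and a held-out test set. A minimal sketch of how such a split file could be turned into training and validation lists follows. The column names `case_id` and `split`, the split values, and the sample rows are all hypothetical placeholders; the real CSV header should be inspected before adapting this.

```python
import csv
import io
from collections import defaultdict

# Synthetic stand-in for the spreadsheet. Column names and split values
# are hypothetical and not taken from the real file.
SAMPLE_CSV = """case_id,split
case_000,fold-1
case_001,fold-2
case_002,test
case_003,fold-1
"""


def cases_by_split(csv_text, id_column="case_id", split_column="split"):
    """Group case identifiers by the split they are assigned to."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[split_column]].append(row[id_column])
    return dict(groups)


def train_val_for_fold(groups, val_split, test_split="test"):
    """Use one fold for validation and all other non-test folds for training."""
    val = list(groups.get(val_split, []))
    train = [
        case
        for split, cases in groups.items()
        if split not in (val_split, test_split)
        for case in cases
    ]
    return train, val


groups = cases_by_split(SAMPLE_CSV)
train, val = train_val_for_fold(groups, val_split="fold-1")
print("train:", train)
print("val:", val)
```

Keeping the test split out of every training list, as done here, preserves the purpose of a pre-defined test set: results from different studies stay comparable.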
Collections Used In This Analysis Result
Title | Data Type | Format | Access Points | Subjects | Studies | Series | Images | License |
---|---|---|---|---|---|---|---|---|
Source Images ACRIN-HNSCC-FDG-PET/CT (48), Anti-PD-1_MELANOMA (2), HNSCC (17), Head-Neck Cetuximab (12), QIN-HEADNECK (94), TCGA-HNSC (1) | CT | DICOM | Requires NBIA Data Retriever | 174 | 174 | 174 | 56,400 | TCIA Restricted |
Source Images ACRIN-FLT-Breast (32), ACRIN-NSCLC-FDG-PET (129), Anti-PD-1_Lung (12), C4KC-KiTS (175), CPTAC-CM (1), CPTAC-LSCC (3), CPTAC-LUAD (1), CPTAC-PDA (8), CPTAC-UCEC (26), LIDC-IDRI (133), NSCLC Radiogenomics (7), Pancreas-CT (58), Soft-tissue-Sar | CT | DICOM | Requires NBIA Data Retriever | 614 | 626 | 632 | 126,796 | CC BY 3.0 |
Source Images NSCLC-Radiomics (56), NSCLC-Radiomics-Genomics (20) | CT | DICOM | Requires NBIA Data Retriever | 76 | 76 | 76 | 8,807 | CC BY-NC 3.0 |
Source Images COVID-19-NY-SBU (1), Lung-PET-CT-Dx (17) | CT | DICOM | Requires NBIA Data Retriever | 18 | 18 | 18 | 2,654 | CC BY 4.0 |
Additional Resources For This Dataset
- A script to download and resample the images is available in the GitHub repository: https://github.com/UMEssen/saros-dataset
Citations & Data Usage Policy
Data Citation Required: Users must abide by the TCIA Data Usage Policy and Restrictions. Attribution must include the following citation, including the Digital Object Identifier:
Data Citation
Koitka, S., Baldini, G., Kroll, L., van Landeghem, N., Haubold, J., Sung Kim, M., Kleesiek, J., Nensa, F., & Hosch, R. (2023). SAROS – A large, heterogeneous, and sparsely annotated segmentation dataset on CT imaging data (SAROS) (Version 2) [Data set]. The Cancer Imaging Archive. https://doi.org/10.25737/SZ96-ZG60
Acknowledgements
To the entire annotation lab team at the Institute for Artificial Intelligence in Medicine (IKIM, University Hospital Essen): we express our profound gratitude for your meticulous efforts in data segmentation. Your dedication ensured accuracy and efficiency and paved the way for this collection. Thank you for your invaluable contribution.
To all collections that shared their data and made it possible for us to prepare the segmentations: thank you! Your contributions made it possible to provide an openly available segmentation dataset for CT-based body composition analysis.
Related Publications
Publications by the Dataset Authors
The authors recommend the following as the best source of additional information about this dataset:
Publication Citation
Koitka, S., Baldini, G., Kroll, L., van Landeghem, N., Pollok, O. B., Haubold, J., Pelka, O., Kim, M., Kleesiek, J., Nensa, F., & Hosch, R. (2024). SAROS: A dataset for whole-body region and organ segmentation in CT imaging. In Scientific Data (Vol. 11, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1038/s41597-024-03337-6
Research Community Publications
TCIA maintains a list of publications that leveraged this dataset. If you have a manuscript you’d like to add, please contact TCIA’s Helpdesk.