Pseudo-PHI-DICOM-Data | A DICOM dataset for evaluation of medical image de-identification
DOI: 10.7937/s17z-r072 | Data Citation Required | Image Collection
Location | Species | Subjects | Data Types | Cancer Types | Size | Status | Updated |
---|---|---|---|---|---|---|---|
Various | Human | 21 | Other, MG, MR, DX, CR, CT, PT | Various | | Public, Complete | 2021/04/07 |
Summary
Open-access or shared research data must comply with patient privacy regulations (HIPAA). These regulations require datasets to be de-identified before they can be placed in the public domain. Image de-identification is time-consuming, requires significant human resources, and is prone to human error. Automated image de-identification algorithms have been developed, but the research community needs a way to evaluate such tools before they can be widely accepted, and that evaluation requires a robust dataset.
We developed a DICOM dataset that can be used to evaluate the performance of de-identification algorithms. DICOM image information objects were selected from datasets published in TCIA. Synthetic Protected Health Information (PHI) was generated and inserted into selected DICOM data elements to mimic typical clinical imaging exams. The evaluation dataset was then de-identified by a TCIA curation team using standard TCIA tools and procedures.
We are publishing the evaluation dataset (containing synthetic PHI) and the de-identified evaluation dataset (the result of TCIA curation) in advance of a potential competition, sponsored by the National Cancer Institute (NCI), for the evaluation of algorithms that de-identify medical image datasets. The evaluation dataset published here is a subset of a larger evaluation dataset that was created under contract for the NCI. This subset is being published so that researchers can test their de-identification algorithms and to promote standardized procedures for validating automated de-identification.
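
As a rough illustration of how an algorithm's output might be checked against the synthetic PHI, the sketch below flags data elements whose values still contain a known PHI string. It is not the TCIA curation procedure or the scoring method of any competition. The function name `find_leaked_phi`, the plain-dictionary representation of a DICOM header, and all sample values are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def find_leaked_phi(elements: Dict[str, str], phi_values: Iterable[str]) -> List[str]:
    """Return the keywords of elements whose value still contains a known PHI string.

    `elements` maps a DICOM keyword to its value as text (a hypothetical,
    simplified stand-in for a parsed DICOM header).
    """
    # Case-insensitive substring match; empty PHI strings are ignored.
    needles = [v.lower() for v in phi_values if v]
    leaked = []
    for keyword, value in elements.items():
        text = str(value).lower()
        if any(needle in text for needle in needles):
            leaked.append(keyword)
    return sorted(leaked)


# Hypothetical values for illustration only; not taken from the dataset.
synthetic_phi = ["DOE^JANE", "555-0100"]
candidate_output = {
    "PatientName": "Pseudo-PHI-001",
    "StudyDescription": "CHEST CT for DOE^JANE",
    "InstitutionName": "",
}

print(find_leaked_phi(candidate_output, synthetic_phi))  # ['StudyDescription']
```

A substring check like this only catches PHI that survives verbatim. It does not detect reformatted values or text burned into pixel data, so it should be read as a starting point rather than a complete evaluation.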
Data Access
Version 2: Updated 2021/04/07
Note: Removed head imaging from 8 series.
Title | Data Type | Format | Access Points | Subjects | Studies | Series | Images | License |
---|---|---|---|---|---|---|---|---|
Images, Evaluation dataset | MG, MR, DX, CR, CT, PT | DICOM | Download requires NBIA Data Retriever | 21 | 22 | 26 | 1,693 | CC BY 4.0 |
Images, De-identified Evaluation dataset | MG, MR, DX, CR, CT, PT | DICOM | Download requires NBIA Data Retriever | 21 | 22 | 26 | 1,693 | CC BY 4.0 |
Patient Mapping by Evaluation and De-identified ID | Other | CSV | | | | | | CC BY 4.0 |
UID Mapping Evaluation/De-identified | Other | CSV | | | | | | CC BY 4.0 |
Additional Resources for this Dataset
The NCI Cancer Research Data Commons (CRDC) provides access to additional data and a cloud-based data science infrastructure that connects data sets with analytics tools to allow users to share, integrate, analyze, and visualize cancer research data.
- Imaging Data Commons (IDC) (Imaging Data)
Citations & Data Usage Policy
Data Citation Required: Users must abide by the TCIA Data Usage Policy and Restrictions. Attribution must include the following citation, including the Digital Object Identifier:
Data Citation
Rutherford, M., Mun, S.K., Levine, B., Bennett, W.C., Smith, K., Farmer, P., Jarosz, J., Wagner, U., Farahani, K., Prior, F. (2021). A DICOM dataset for evaluation of medical image de-identification (Pseudo-PHI-DICOM-Data) [Data set]. The Cancer Imaging Archive. DOI: https://doi.org/10.7937/s17z-r072
Detailed Description
The collection contains 21 patients, 22 studies, and 26 series. Because the patient IDs, Study Instance UIDs, and Series Instance UIDs differ between the two datasets, counts taken across both datasets are doubled.
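
The double count can be illustrated with a small sketch that pools identifiers from both datasets and compares the result with the number of mapping rows. The column names `evaluation_id` and `deidentified_id` and the sample rows are hypothetical; check the header of the actual Patient Mapping CSV before adapting this.

```python
import csv
import io

# Hypothetical mapping rows and column names, for illustration only.
mapping_csv = io.StringIO(
    "evaluation_id,deidentified_id\n"
    "EVAL-A,DEID-1\n"
    "EVAL-B,DEID-2\n"
    "EVAL-C,DEID-3\n"
)
rows = list(csv.DictReader(mapping_csv))

# Pooling the IDs from both datasets counts every patient twice,
# because the same patient carries a different ID in each dataset.
pooled_ids = {r["evaluation_id"] for r in rows} | {r["deidentified_id"] for r in rows}
naive_count = len(pooled_ids)

# Each mapping row corresponds to one underlying patient.
true_count = len(rows)

print(naive_count, true_count)  # 6 3
```

The same reasoning applies to study and series counts when the UID mapping file is used in place of the patient mapping.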
Acknowledgements
We would like to acknowledge the National Cancer Institute for funding and actively participating in the project that generated the evaluation datasets being published here and the TCIA curation team, led by Ms. Geri Blake, who curated this data. Original data came from multiple institutions and multiple TCIA image collections.
Related Publications
Publications by the Dataset Authors
The authors recommended the following as the best source of additional information about this dataset:
Publication Citation
Rutherford, M., Mun, S.K., Levine, B., Bennett, W.C., Smith, K., Farmer, P., Jarosz, J., Wagner, U., Freyman, J., Blake, G., Tarbox, L., Farahani, K., Prior, F. (2021). A DICOM dataset for evaluation of medical image de-identification, Nature Scientific Data. DOI: 10.1038/s41597-021-00967-y.
No other publications were recommended by dataset authors.
Research Community Publications
TCIA maintains a list of publications that leveraged this dataset. If you have a manuscript you’d like to add, please contact TCIA’s Helpdesk.
Previous Versions
Version 1: Updated 2021/01/31
Title | Data Type | Format | Access Points | License |
---|---|---|---|---|
Images, Evaluation dataset | | DICOM | Download requires NBIA Data Retriever | |
Images, De-identified Evaluation dataset | | DICOM | Download requires NBIA Data Retriever | |
Patient Mapping Evaluation/De-identified | | CSV | | |
UID Mapping Evaluation/De-identified | | CSV | | |