Note: When contributing to this page please use Firefox as your browser to allow use of the "Rich Text" WYSIWYG interface for editing this page. Internet Explorer, Chrome, and Safari only allow editing the wiki markup code.
Definition of project
The goal of this project is to create a survey of publicly available in vivo medical imaging archives and their underlying software capabilities. It is generally agreed that there is a need for public medical imaging archives to provide the biomedical research community, industry, and academia with access to images that support:
- Lesion detection and classification
- Accelerated diagnostic imaging decisions
- Quantitative imaging assessment of drug response
The purpose of this project is to provide a practical guide that allows the community to:
- assess existing software and instantiations that are appropriate to their research or clinical needs
- locate relevant publicly available data for research
We encourage any feedback from the wider community that may help improve this information or correct any misconceptions stated below. The survey is divided into two sections:
- Publicly hosted biomedical imaging archives populated with actual data that researchers, teachers, industry, etc. may wish to utilize
- Image archive software solutions which one could download and use to host their own DICOM image data sets
Publicly Hosted Biomedical Imaging Archives
The following table is a list of publicly accessible DICOM based biomedical image archives. Following the table is an analysis of what was found upon reviewing each archive between August and October of 2010. Information is subsequently updated. See the wiki page history for a full log of the changes. Please notify kirbyju@mail.nih.gov or freymanj@mail.nih.gov to report errors, additions, etc.
| ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Supporting Institution(s) | Cancer Imaging Program | Cancer Imaging Program, caBIG | NIAMS, caBIG | WUSTL, BIRN | Lab of NeuroImaging UCLA (LONI) | BIRN | BIRN | Lung Cancer Alliance, Kitware | Optical Society of America, Kitware | NIH, NDAR | Kitware | NIH |
Content Type | In Vivo Cancer Imaging (see full Collection list) | In Vivo Cancer Imaging (see full collection list) | Osteoarthritis | Biomedical images and meta data | ADNI (Alzheimers), | Brain scans | ELUDE (Elderly Depression), | Patient-contributed Medical scans | Optical | Normal brain development | Biomedical images and meta data | Autism - standard phenotypic data, imaging and genomic/pedigree data related to human subjects |
Archive Software | NBIA, AIM Data Service (XML image metadata), and XNAT (coming soon) | Image Data Archive | custom | custom |
| |||||||
Login account required | Yes. Accounts are free and available to anyone. | For advanced site features or limited access data sets, but is not required for accessing public data | Yes | For accessing limited access data sets, but not for public data | Yes, via email address. Used only for providing links for downloading data. |
| For accessing limited access data sets, but not for public data | No | For accessing limited access data sets, but not for public data | Yes | For accessing limited access data sets, but not for public data |
|
Explicit data sharing policy | Yes, with options for uploading fully open or limited access data sets | Yes, with options for uploading fully open or limited access data sets | Yes, found here | No |
|
|
|
|
|
|
|
|
Number of Registered Users (or NA) | 1,341 |
|
|
|
|
|
|
|
|
|
|
|
Accepting new data | Yes (learn more) | Yes, with approval from NCI CBIIT |
| Yes, users can register accounts and upload data |
|
|
| Yes, through Lung Cancer Alliance | Yes, through the Optical Society of America |
| Yes, users can register accounts and upload data |
|
Central curation/review | Yes, a trained staff visually inspects every image before making them visible | Yes, performed by CBIIT staff |
| No |
|
|
| Yes, performed by Lung Cancer Alliance | Yes, performed by the Optical Society of America |
| Yes, performed by Kitware staff and peer reviews |
|
Availability/Uptime | ~99%, hosted on a redundant production system at WUSTL | ~99%, hosted on a redundant production system at NCI CBIIT |
|
|
|
|
| ~99%, hosted on a production server at Kitware | ~99.9%, hosted on a production server at OSA |
| ~99%, hosted on a production server at Kitware |
|
Project- or Collection- based groupings? | Yes | Yes | Yes | Yes |
|
| Yes | Yes | Yes |
| Yes |
|
Size of Current Volume | 7.3TB | ~2TB | ~7.5TB |
|
|
|
| ~2GB | ~50GB |
| ~60GB |
|
Number of patients/subjects with imaging | ~30,000 |
|
|
|
|
|
|
|
|
|
| 200 |
Number of queryable DICOM tags | ~90 via NBIA | ~90 | ~90 |
|
|
|
|
|
|
|
|
|
Data submission/download methods | Submission via DICOM or HTTPS protocols using CTP. Download via Java Webstart client | Submission via DICOM or HTTPS protocols using CTP. Download via Web (zip), FTP, Java Webstart client | Submission via DICOM or HTTPS protocols. Download via Web (zip), FTP, Java Webstart client | Submission via Web UI or DICOM protocol. Download via Web (zip) or Java applet. |
|
|
| Submission via Web UI, | Submission via Web UI, |
| Submission via Web UI, DICOM push, MIDASDesktop, WebDAV. |
|
Helpdesk Support | Yes, the TCIA Helpdesk supports both end users and submitters |
|
|
|
|
|
|
|
|
| ||
Affiliation with Journal | No | No | No | No |
|
|
|
| Yes |
|
|
|
Intended Audience(s) | Cancer researchers, engineers and developers, professors |
|
|
|
|
|
|
|
|
|
|
|
Image archive software solutions
Below is a list of image archive solutions that can be deployed by interested parties wishing to build their own DICOM based biomedical image archive. This list omits some of the archives above in cases where we could not find any information about how one might download and deploy their own instance of the software.
| |||
---|---|---|---|
Interface/GUI | Web | Web | Web/Desktop Application |
Query types/flexibility | Simple (9 parameters), Advanced (10 more parameters), Dynamic (boolean query of up to 90 DICOM tags) | Limited subset of DICOM tags out of the box but is highly configurable for adding the ability to query on just about any kind of meta data you wish to provide | Customizable, search by any tags registered in the system |
Role Based Security | Yes | Yes | Yes |
Public access option (no login req) | Yes | Yes | Yes |
Active Development | Yes, NCI CBIIT | Yes, WUSTL Neuroinformatics Research Group | Yes, Kitware |
License | Open source - NBIA License Agreement Details | Non-restrictive (BSD) open-source license - XNAT License Agreement Details | non-restrictive (BSD) open-source license |
Supports Federated Implementation | Yes, can discover other nodes on the caGrid | Not currently, but there are plans to add this functionality eventually | No |
API available | Yes, caGrid | Yes, REST | Yes, REST, OAI-PMH |
Supported image formats | DICOM | Automated import of DICOM and ECAT. Custom importers can be implemented for other formats. Any file type can be uploaded through the API and web interface. | DICOM and other ITK-based formats |
Supported metadata formats | XML, Zip | XML | XML |
Transfer protocols (import/export) | DICOM, HTTPS | DICOM, HTTPS | DICOM, HTTPS |
Controlled Vocabulary | Follows caBIG standards (caDSR/EVS) | XNAT Schema | NIH MeSH and Dublin Core |
Deployment Support | Yes, CBIIT Application Support or via NBIA User Listserv | XNAT Google Discussion group, monthly developer tcons, biannual user conference | Yes, MIDAS mailing list |
Supported Operating Systems | Linux, Windows, Mac | Linux, Windows, Mac | Linux, Windows, Mac |
Data submission options | Submission to NBIA is performed by a Java tool called CTP, developed by John Perry at the RSNA. CTP has options to import data from a hard drive or directly from a PACS or DICOM workstation. | Direct upload through the web UI, direct DICOM transfer, or scripts using the REST API. | Direct upload via web UI, direct DICOM transfer via push, MIDASDesktop transfer (includes command line tools), WebDAV support. |
Standard of De-Identification | Incorporates DICOM de-identification standards from The Attribute Confidentiality Profile (DICOM PS 3.15: Appendix E) via CTP. | Built-in de-identification language based on DICOM Browser can be configured to comply with DICOM PS 3.15: Appendix E and other standards. | No, but pre-storage filters can be run automatically |
Support for multi-site submissions | Yes | Yes | Yes |
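To make the "Standard of De-Identification" row more concrete, here is a minimal sketch of what an attribute-level de-identification pass might look like. It works on a plain Python dictionary of header values rather than a real DICOM file, and the two keyword sets are only a small illustrative subset chosen for this example (an assumption, not the full Attribute Confidentiality Profile from DICOM PS 3.15 Appendix E, and not how CTP or XNAT actually implement it).

```python
from copy import deepcopy

# Illustrative subset only (assumption): the real profile in DICOM PS 3.15
# Appendix E lists many more attributes, each with its own required action.
BLANK_KEYWORDS = {"PatientName", "PatientID", "PatientBirthDate",
                  "ReferringPhysicianName"}
REMOVE_KEYWORDS = {"InstitutionName", "InstitutionAddress", "OtherPatientIDs"}


def deidentify(header: dict) -> dict:
    """Return a copy of a header dict with identifying attributes handled.

    Attributes in BLANK_KEYWORDS are kept but emptied; attributes in
    REMOVE_KEYWORDS are dropped entirely. Everything else is left untouched.
    """
    cleaned = deepcopy(header)
    for keyword in BLANK_KEYWORDS:
        if keyword in cleaned:
            cleaned[keyword] = ""
    for keyword in REMOVE_KEYWORDS:
        cleaned.pop(keyword, None)
    return cleaned


if __name__ == "__main__":
    example = {"PatientName": "DOE^JANE", "InstitutionName": "Example Clinic",
               "Modality": "CT"}
    print(deidentify(example))
```

A real deployment would rely on the de-identification scripts shipped with the archive software (for example CTP for NBIA) rather than a hand-written list like this one.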
NBIA
https://gforge.nci.nih.gov/frs/?group_id=312
XNAT
MIDAS
http://www.kitware.com/products/midas.html
Additional Details
caBIG
NBIA
- Open source, federated model for grid based image sharing.
- Allows for "nodes" to be deployed in other institutions, which can then be connected via the grid so they are query-able from any NBIA server.
- Includes both public data access without login as well as option to build secure role-based Collections which require special access via logging in
- Simple search query options include node selection, modality, contrast enhancement, anatomical site, image slice thickness, Collections (pre-defined sets of related images), the date the images were made available, and whether or not they have associated annotations
- Advanced and Dynamic search options available to search upwards of 70 additional DICOM tag attributes
- Images can be viewed on the web site via JPG thumbnails or using the Cine functionality
- Allows for storage of annotation files and meta data, though there are plans to move most of this functionality to AIM data services in the near future
- Downloads are provided in 3 ways: via a Java Web Start client (no size limit, images are not zipped), via HTTP (zipped, for downloads <3GB in size), or via an emailed link to an FTP site (zipped, for downloads >3GB in size)
- Currently 3 major hosted nodes are available to the public:
- The Cancer Imaging Archive (TCIA)
- http://cancerimagingarchive.net
- Funded by the Cancer Imaging Program within the Division of Cancer Treatment and Diagnosis, this NBIA installation is being hosted by Washington University in St. Louis
- Publicly launched in June, 2011 with 3 purpose-built Collections (see full Collection list)
- Approximately 230GB of image data on over 1,200 patients (in publicly accessible Collections) as of July 2011
- TCIA provides a full curation support staff that reviews all incoming data to ensure proper de-identification. The staff applies the newly released DICOM Supplement 142 standard, which aims to prevent distribution of PHI while avoiding overly aggressive methods that render data useless to researchers
- CIP/CBIIT
- http://imaging.nci.nih.gov
- Contains ~25 purpose-built image collections, ~15 of which are publicly accessible to anyone. More info on the contents of each collection can be found at https://wiki.nci.nih.gov/x/lRWy
- Collections range in size from as small as a handful of patients up to larger collections such as LIDC-IDRI (1010 patients) or the colonoscopy collections (over 800 patients each)
- Approximately 2 TB of image data consisting of nearly 3,000 patients (in publicly accessible Collections) as of July 2011
- NIAMS
- https://niams-imaging.nci.nih.gov/
- Contains approximately 7.5TB of image data on nearly 5,000 osteoarthritis patients as of July 2011
- Modalities include MRI and CR
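The three NBIA download routes described above amount to a simple size-based rule, which can be sketched as follows. The function name, the return labels, and the reading of "3GB" as 3 GiB are assumptions made for illustration; the behaviour at exactly 3GB is not stated on this page, so the sketch arbitrarily sends that case to FTP.

```python
# Assumption: "3GB" is interpreted as 3 GiB; the page does not say which.
THRESHOLD_BYTES = 3 * 1024**3


def pick_download_method(total_bytes: int, want_zip: bool = True) -> str:
    """Pick a download route based on the rules described for NBIA.

    - Java Web Start client: no size limit, images are not zipped.
    - HTTP: zipped, for downloads under the threshold.
    - Emailed FTP link: zipped, for downloads over the threshold.
    """
    if not want_zip:
        return "webstart"
    if total_bytes < THRESHOLD_BYTES:
        return "http-zip"
    # Exactly-at-threshold is unspecified in the source; treated as FTP here.
    return "ftp-zip-link"


if __name__ == "__main__":
    for size in (500 * 1024**2, 4 * 1024**3):
        print(size, pick_download_method(size))
```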
LONI
LONI homepage: http://www.loni.ucla.edu/
Image Data Archive - http://www.loni.ucla.edu/Research/Databases/
- Uses the LONI Image Data Archive software. It is unclear whether this can be deployed and hosted elsewhere for other archives; I did not immediately notice any place to download the code.
- Requires secure login, has role based access to various Projects in their archive.
- Allows search based on project (prior to the simple query page) via Simple or Advanced query. Simple query contains subject ID, modality, research group, series description, weighting, sex, age, slice thickness, and acquisition plane. Advanced query offers a wide variety of 27 different search criteria pertaining to the patient, project, clinical info, study info, and image attributes. (Note: advanced query only seems to be an option with ADNI data set)
- Viewing images in the web site launches "LONI Image Viewer", a Java-based viewer which lets you scan through image slices in axial/sagittal/coronal orientations as well as zoom, pan, flip (horizontal/vertical) and adjust brightness and contrast.
- Specific series/studies can be added to custom collections, which can then be downloaded or shared. Downloads can be in native DICOM format or in NIfTI/ANALYZE/MINC formats (I've heard of NIfTI, but am not sure what the other options are). Download is managed through a Java applet and came with images and a couple of XML files containing some patient and equipment metadata.
- Data collections include:
- Alzheimer's Disease Neuroimaging Initiative (ADNI) - Consists of both image data (MRI and PET), image meta-data, and additional clinical/genetic/numeric summary data for 895 patients
- BRIN - When logged in with my "ADNI (GUEST)" access, this Project appeared to have only one 3-year-old patient; no description of this data set is on the "Available Data" page
- Cryosection Imaging (CRYO) - 3 patients, histology modality
- Public Anonymized Dataset (PAD) - 3 "normal control" patients, MRI modality
- International Consortium for Brain Mapping (ICBM) - 851 "normal control" patients, modalities including MRI, fMRI, MRA, DTI, and PET. I do not have access to this data set currently so did not review its contents.
- Australian Imaging Biomarkers & Lifestyle Flagship Study of Ageing (AIBL) - 285 patients including controls, MCI, and AD subject groups. Modalities include MRI and PET. I do not have access to this data set currently so did not review its contents.
BIRN
BIRN data portal: http://www.birncommunity.org/resources/data/
BIRN is connected with multiple institutions which host multiple archives using different software and containing different data sets.
- Function BIRN Data Repository - http://fbirnbdr.nbirn.net:8080/BDR/
- Contains 5 data sets according to the BIRN data page (but it's actually 4).
- On the home page it displays a description of each data set and then presents pre-packaged data files in a drill down tree style menu which lists what appears to be individual patients in some cases, individual image studies in others.
- Querying across the data is possible using meta data. It does not appear that querying against DICOM tags is possible.
- The site itself seems to be a custom web site, not something that could be easily used for other archives.
- Downloads happen in the browser. You choose what you want to download, it is processed on the server, and an alert notifies you when your "job" is ready for download. When testing a 100 MB patient download, I was notified pretty quickly that my job was done, but the job page took about a minute to load before I was presented with a small table of information and buttons to actually download the file. Pressing the download button saved a .TAR file to my computer. The TAR structure contained many subdirectories sorted by study, then series, and so on. Images had the DCM file extension, and a couple of XML files with some meta-data were also included.
- The data sets include the following (but the last one is not yet posted, may never be since this was done in 2007):
- BrainScape Resting State fMRI Dataset 1 - This dataset includes seventeen healthy subjects with four resting state fixation scans plus one T1 scan and one T2 scan.
- BrainScape Resting State fMRI Dataset 2 - This dataset includes ten healthy subjects scanned 3 times with 3 conditions: eyes open, eyes closed, and fixating in addition to two anatomical scans (T1 and T2).
- Neuroimaging Calibration Study (Phase II) - The FBIRN multi-site dataset of subjects with schizophrenia and controls includes functional MRI images, behavioral data, demographic, and clinical assessments on 253 subjects from around the US.
- Traveling subjects study (Phase I) - This dataset includes five healthy subjects imaged twice at each of ten FBIRN MRI scanners on successive days.
- NIRL Imaging Database - http://nirlarc.duhs.duke.edu/nirle/
- Uses XNAT for the archive software (unsure what version or if any major customizations were made)
- Contains both image data and meta-data
- Includes the following data sets:
- Efficient Longitudinal Upload of Depression in the Elderly (ELUDE) - The ELUDE dataset is a longitudinal study of late-life depression at Duke University. There are 281 depressed subjects and 154 controls included (435 total patients). An MR scan of each subject was obtained every 2 years for up to 8 years (total of 1093 scans). Clinical assessments occurred more frequently and consist of a battery of psychiatric tests including several depression-specific tests.
- Multisite Imaging Research In the Analysis of Depression (MIRIAD) - A multiple institution study of structural MRI, including raw PD and T2 MRIs and derived measures of white matter changes, basal ganglia and other regions. Demographic and extensive clinical assessment data is available for each case. (100 patients)
- XNAT Central - http://central.xnat.org/
- As of Aug 10, 2010 the site contains 169 projects and 2567 patients. However, the quality and accessibility of these appear to vary greatly, ranging from well-curated data sets to throwaway test sets created by people testing how XNAT Central works.
- A subset of the collections mentioned on the BIRN data web page (http://www.birncommunity.org/resources/data/) are hosted here including the following:
- mBIRN Calib - http://central.xnat.org/app/template/XDATScreen_report_xnat_projectData.vm/search_element/xnat:projectData/search_field/xnat:projectData.ID/search_value/Calib
- This data set consists of spoiled gradient-recalled echo magnetic resonance imaging data from 5 healthy volunteers (four males and one female) scanned twice at four sites having 1.5T systems from different vendors (Siemens, GE, Marconi Medical Systems).
- OASIS - http://central.xnat.org/app/template/XDATScreen_report_xnat_projectData.vm/search_element/xnat:projectData/search_field/xnat:projectData.ID/search_value/CENTRAL_OASIS_CS
- This set consists of a cross-sectional collection of 416 human subjects aged 18 to 96, including individuals with Alzheimer's disease; T1-weighted MRI scans obtained in single scan sessions are included. Additionally, a reliability data set is included containing 20 nondemented subjects imaged on a subsequent visit within 90 days of their initial session.
Insight Journal
http://www.insight-journal.org/midas/
A MIDAS based image archive which contains a number of data sets contributed from NAMIC, NLM, Kitware, the Insight Software Consortium, and more. "Communities" listed on the site include:
- NLM Imaging Methods Assessment and Reporting (IMAR) - According to the site, "The goal of the NLM's IMAR program is to provide data, methods, and computational resources for the quantitative comparison of image segmentation, registration, and computer-aided diagnosis methods." The main page says it contains 2 private/limited access data sets and 1 public data set.
- Retrospective Image Registration Evaluation (RIRE) - Listed as limited access (perhaps only the full data set is?), but 18 test cases of rigid multi-modality head scans and 1 training case appear to be accessible to the public.
- Public Data Standards - Contains 5 cases of "Livers and liver tumors with expert hand segmentations", 4 cases called "Zebrafish time series", and ~110 cases under "Designed Database of MR Brain Images of Healthy Volunteers" which is accompanied by some patient demographics.
- National Alliance for Medical Image Computing (NAMIC) - Provides 10 different Collections/Communities consisting of ~13 GB of data
- Full listing of their data sets can be found in this subsection of the Insight Journal archive - http://www.insight-journal.org/midas/community/view/17
- Kitware - Provides a variety of public medical images
- The site includes 7 items and indicates that "Most items in this collection consist of DICOM scans that have been anonymized and 3D reconstructions of those DICOM data in the MetaImage format."
- Full listing found at http://www.insight-journal.org/midas/collection/view/9
- Insight Software Consortium (ISC) - "Featuring software, data, and articles of the Insight Software Journal and the ISC."
- Consists of 2 Communities, one which has ISC related papers and another which contains actual image and meta data.
- Image-Guided Surgery Toolkit (IGSTK) - Contains two image collections, one of which is "CIRS Multi-modality Phantom - CT Image" and the other being "Ultrasound 3D CIRS 57 phantom data set"
- Common Toolkit (CTK) - 5 items including what appears to be 3 cases from JHU, a "Head Axial DICOM" case, and "MRI Phantom - 6 directional acquisitions".
Lung Cancer Alliance: Give A Scan
From their homepage:
*Give A Scan* is the world's first patient-powered, publicly available archive of images and clinical data on lung cancer patients. All the data has been donated by patients in order to encourage more researchers to focus on lung cancer and to accelerate progress in the early detection, diagnosis and treatment of lung cancer which is now the leading cause of cancer death worldwide.
As of August 2010 the archive contains 9 "communities" which appear to be 9 patients with lung cancer totaling approximately 1GB of data. The site provides some meta data information about the images and clinical info about the subjects. Images are hosted in DICOM format. The archive can be browsed by Community/patient/study/series or searched by modality and other image meta data.
The archive is hosted by the Kitware image archive solution called MIDAS: http://www.kitware.com/products/midas.html
Optical Society of America (OSA)
http://midas.osa.org/midaspub/
This archive is a collection of optical images. Like the Lung Cancer Alliance archive, it is hosted using Kitware's MIDAS archive software. This archive hosts 6 top level "communities" which contain anywhere from 4 to 242 items each. In this case, it seems that only some of the data is DICOM. Three of the Communities appear to be unrelated demonstration collections not tied to OSA, as they contain lesion sizing data sets of lung images.
WebMIRS
http://archive.nlm.nih.gov/proj/webmirs/
WebMIRS is the National Library of Medicine's (NLM) tool used for hosting two related datasets and related spine x-ray images which are part of the National Health and Nutrition Examination Survey (NHANES). The site requires registration and login through a Java web client to view the data set. Most of the data is text based, but there are spine x-rays for some of the patients. The client allows for searching but not really browsing: the user must enter a boolean search query in order to retrieve any patient results. It does not appear possible to actually download the images; you can only view them in the WebMIRS client.
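As a rough illustration of what a boolean search over patient records involves, here is a small generic sketch. It is not WebMIRS's query syntax, which this page does not describe; the record fields (`age`, `sex`, `has_xray`) and all function names are hypothetical and chosen only for the example.

```python
import operator
from typing import Callable, Iterable

Record = dict
Predicate = Callable[[Record], bool]

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def field(name: str, op: str, value) -> Predicate:
    """Build a predicate comparing one field; a missing field never matches."""
    compare = OPS[op]

    def predicate(record: Record) -> bool:
        return name in record and compare(record[name], value)

    return predicate


def all_of(*predicates: Predicate) -> Predicate:
    """Boolean AND over several predicates."""
    return lambda record: all(p(record) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """Boolean OR over several predicates."""
    return lambda record: any(p(record) for p in predicates)


def negate(predicate: Predicate) -> Predicate:
    """Boolean NOT of a predicate."""
    return lambda record: not predicate(record)


def search(records: Iterable[Record], predicate: Predicate) -> list:
    """Return the records that satisfy the predicate, in input order."""
    return [r for r in records if predicate(r)]


if __name__ == "__main__":
    # Hypothetical records; field names are made up for this example.
    records = [
        {"id": 1, "age": 62, "sex": "F", "has_xray": True},
        {"id": 2, "age": 45, "sex": "M", "has_xray": True},
        {"id": 3, "age": 70, "sex": "F", "has_xray": False},
    ]
    query = all_of(field("age", ">=", 60), field("has_xray", "==", True))
    print([r["id"] for r in search(records, query)])
```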
NA-MIC Image Gallery
National Alliance for Medical Image Computing (NA-MIC) Image Gallery: http://www.na-mic.org/publications/gallery
As of October 2010 this consisted of ~383 images. All images appear to be JPGs or similar compressed file types rather than actual DICOM. The purpose of this gallery seems to be to create a repository for images, charts, and figures referenced in publications submitted to NA-MIC's publication database (http://www.na-mic.org/publications). Images can be browsed by patient/study/series or searched by modality and a number of other image based features.
Pediatric MRI Data Repository
Part of the National Database for Autism Research (NDAR) program.
According to http://ndar.nih.gov/ndarpublicweb/aboutNDAR.go#federation:
The Pediatric MRI Data Repository will be the first in this series to be made available to ASD researchers, in the summer of 2010. At that time, investigators will be able to perform a single query in the NDAR portal to view results across multiple datasets.
The original Pediatric MRI Data Repository is located at https://nihpd.crbs.ucsd.edu/nihpd/info/index.html. Access to the data requires filling out multiple forms and faxing them to an office at NIH to receive permission. I have not yet requested access to find out exactly what's in the archive; however, the description of their quality control processes reveals a little about the image protocols: https://nihpd.crbs.ucsd.edu/nihpd/info/quality_control.html
NIH Image Bank
The NIH Image Bank is located at http://media.nih.gov/imagebank/index.aspx
According to http://media.nih.gov/imagebank/about.aspx:
The NIH Image Bank contains images from the collections of the 27 institutes and centers that comprise the National Institutes of Health. Contents include general biomedical and science-related images, clinicians, computers, patient care-related images, microscopy images, and various exterior images.
The image bank appears to be intended more for promotional and marketing images. I did not notice any high quality medical images of actual patients or DICOM files which might be usable for research purposes.
European Organization for Research and Treatment of Cancer (EORTC)
http://www.eortc.be/services/forms/erp/default.aspx
According to their site, investigators can request access to data collected as part of EORTC trials after the primary end point has been published. I did not see any place that outlined which trials are being conducted, which trials have been completed and had their primary end point published, or exactly what types of data are collected. However, the PDF on this page, which outlines their data sharing policy in more detail, states in the section "4.3 Data Transfer" that "data will preferentially transferred in the form of an ASCII file (with .dat extension), with associated SAS programs to load the data into SAS." I saw no mention of how they handle images in this section and thus assume they may not collect or distribute any.