...

As discussed so far, cancer research needs to work across domains. To meet this need, the National Cancer Institute Clinical and Translational Imaging Informatics Project (NCI CTIIP) team plans to create a data mashup interface, along with other software and standards, that accesses The Cancer Genome Atlas (TCGA) clinical and molecular data, The Cancer Imaging Archive (TCIA) in vivo imaging data, caMicroscope pathology data, a pilot data set of animal model data, and relevant imaging annotation and markup data.

The TCGA project is producing a comprehensive genomic characterization and analysis of more than 30 types of cancer and providing this information to the research community. TCIA and the underlying National Biomedical Image Archive (NBIA) manage well-curated, publicly available collections of medical image data. The linkages between TCGA and TCIA are valuable to researchers who want to study diagnostic images associated with the tissue samples sequenced by TCGA.

Although TCGA and TCIA comprise a rich, multi-domain data set, the infrastructure that hosts them provides limited query capability. Researchers want to query both databases together to identify cases based on all available data types. Moreover, without common data standards, it is impossible to integrate these data sets with those from digital pathology and co-clinical/small animal model environments.

To address these limitations, the CTIIP team is developing a unified query interface to make it easier to analyze data from different research domains. This interface, plus related open-source software and data standards, will then be applied to co-clinical and small animal model data and will provide a common platform and data engine for hosting “pilot challenges.” These pilot challenges will facilitate biological and clinical research across the clinical, pre-clinical, and digital pathology imaging research domains.
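The idea of querying both data sets together can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes clinical and imaging records share a case identifier, and the record types, field names, and sample values are hypothetical and do not come from TCGA, TCIA, or the CTIIP software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalRecord:
    case_id: str       # hypothetical shared case identifier
    cancer_type: str


@dataclass(frozen=True)
class ImagingRecord:
    case_id: str       # same hypothetical identifier on the imaging side
    modality: str


def find_cases(clinical, imaging, cancer_type, modality):
    """Return sorted case IDs that meet a clinical AND an imaging criterion."""
    clinical_hits = {r.case_id for r in clinical if r.cancer_type == cancer_type}
    imaging_hits = {r.case_id for r in imaging if r.modality == modality}
    # A case qualifies only if it appears in both result sets.
    return sorted(clinical_hits & imaging_hits)


# Made-up records for illustration only; not real TCGA or TCIA data.
clinical = [
    ClinicalRecord("CASE-001", "glioblastoma"),
    ClinicalRecord("CASE-002", "breast"),
    ClinicalRecord("CASE-003", "glioblastoma"),
]
imaging = [
    ImagingRecord("CASE-001", "MR"),
    ImagingRecord("CASE-002", "MR"),
    ImagingRecord("CASE-003", "CT"),
]

matches = find_cases(clinical, imaging, "glioblastoma", "MR")
print(matches)
```

A real unified query interface would push such filters down to each repository rather than loading records into memory; the set intersection here only shows the kind of cross-domain selection researchers are asking for.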

The common informatics infrastructure will provide researchers with analysis tools they can use to directly mine data from multiple high-volume information repositories, creating a foundation for research and decision support systems to better diagnose and treat patients with cancer.

CTIIP is composed of the following sub-projects. Each project is discussed on this page.

Sub-Project Name | Description
Digital Pathology and Integrated Query System | Address the interoperability of digital pathology data, improve integration and analytic capabilities between TCIA and TCGA, and raise the level of interoperability to create the foundation required for pilot demonstration projects in each of the targeted research domains: clinical imaging, pre-clinical imaging, and digital pathology imaging.
DICOM Standards for Small Animal Imaging; Use of Informatics for Co-clinical Trials | Address the need for standards in pre-clinical imaging and test the informatics created in the Digital Pathology and Integrated Query System sub-project for decision support in co-clinical trials.
Pilot Challenges | Challenges will be designed to develop knowledge-extraction tools and compare decision-support systems for the three research domains, which will now be represented as a set of integrated data from TCIA and TCGA. The pilot challenges would use limited data sets for proof-of-concept, and test the informatics infrastructure needed for more rigorous “Grand Challenges” that could later be scaled up and supported by extramural initiatives.

The Importance of Data Standards

The common infrastructure that will result from CTIIP and its sub-projects depends on data interoperability, which is greatly aided by adherence to data standards. Although standards exist for communicating image data in a common way, they are inconsistently adopted. One reason is that vendors of the image management tools required for analyzing imaging data have built these tools to accept only proprietary data formats. Researchers then make sure their data can be interpreted by these tools. The result is that images produced on different systems cannot be analyzed via the same mechanisms.

Another challenge for CTIIP, with its goal of integrating data from complementary domains, is the lack of a defined standard for co-clinical and digital pathology data. Without a data standard for these domains, it is very difficult to share and leverage such data across studies and institutions. As part of the CTIIP project, the team will extend the DICOM model to co-clinical and small animal imaging.

NCI CBIIT has worked extensively for several years in the area of data standards for both clinical research and healthcare, working with the community and Standards Development Organizations (SDOs), such as the Clinical Data Interchange Standards Consortium (CDISC), Health Level 7 (HL7), and the International Organization for Standardization (ISO). From that work, Enterprise Vocabulary Services (EVS) and Cancer Data Standards Registry and Repository (caDSR) are harmonized with the Biomedical Research Integrated Domain Group (BRIDG), Study Data Tabulation Model (SDTM), and Health Level Seven® Reference Information Model (HL7 RIM) models. Standardized Case Report Forms (CRFs), including those for imaging, have also been created. The CBIIT project work provides the bioinformatics foundation for semantic interoperability in digital pathology and co-clinical trials integrated with clinical and patient demographic data and data contained in TCIA and TCGA.

Within the three research domains that CTIIP intends to make available for integrative queries, only one, clinical imaging, has made some progress in terms of establishing a framework and standards for informatics solutions. Those standards include Annotation and Image Markup (AIM), which allows researchers to standardize annotations and markup for radiology and pathology images, and Digital Imaging and Communications in Medicine (DICOM), which is a standard for handling, storing, printing, and transmitting information in medical imaging. For pre-clinical imaging and digital pathology, no such standards exist to allow the seamless viewing, integration, and analysis of disparate data sets that would support integrated views of the data, quantitative analysis, and research or clinical decision support systems.

As part of the DICOM Standards for Small Animal Imaging; Use of Informatics for Co-clinical Trials sub-project, the long-term goal is to generate DICOM-compliant images for small animal research. Micro AIM (µAIM) is currently in development to serve the unique needs of this domain.

The following table presents the data that the CTIIP team is integrating through various means. This integration relies on the expansion of software features and on the application of data standards, as described in subsequent sections of this document.

Domain | Data Set
Clinical Imaging | The Cancer Genome Atlas (TCGA) clinical and molecular data
Clinical Imaging | The Cancer Imaging Archive (TCIA) in vivo imaging data
Pre-Clinical | Small animal models
Digital Pathology | caMicroscope

Digital Pathology and Integrated Query System

The goal of this sub-project is to create a digital pathology image server that can accept images from multiple domains and run integrative queries on that data. Using this server, data can be selected from distinct imaging sources and made accessible for image algorithms.

TCIA has released an application programming interface (API) that provides REST access to TCIA metadata and image collections. This API is built using a middleware platform called Bindaas.
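As a rough illustration of how a client might address such a REST API, the sketch below builds a query URL from an endpoint name and optional filters. The base URL is a placeholder, and the endpoint and parameter names (getSeries, Collection, Modality, format) are assumptions made for the example; consult the TCIA API documentation for the actual addresses, endpoints, and authentication requirements.

```python
from urllib.parse import urlencode, urljoin

# Placeholder base URL: an assumption for illustration, not the real TCIA address.
BASE_URL = "https://example.org/tcia-api/query/"


def build_query_url(endpoint, **params):
    """Build a GET URL for a named query endpoint with optional filters."""
    url = urljoin(BASE_URL, endpoint)
    # Drop unset filters and sort keys so the resulting URL is deterministic.
    filters = {k: v for k, v in sorted(params.items()) if v is not None}
    return f"{url}?{urlencode(filters)}" if filters else url


# Endpoint and parameter names below are illustrative assumptions.
url = build_query_url("getSeries", Collection="TCGA-GBM", Modality="MR", format="json")
print(url)
```

The sketch only constructs the request URL, so it runs without network access; an actual client would pass the URL to an HTTP library and parse the response.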

...