NIH | National Cancer Institute | NCI Wiki  

...

Another challenge for CTIIP, with its goal of integrating data from complementary domains, is the lack of a defined standard for co-clinical and digital pathology data. Without a data standard for these domains, it is very difficult to share and leverage such data across studies and institutions. As part of the CTIIP project, the team has extended the DICOM model to co-clinical and small animal imaging. The long-term goal is to generate DICOM-compliant images for small animal research.

Within the three research domains that CTIIP intends to make available for integrative queries, only one, clinical imaging, has made some progress in terms of establishing a framework and standards for informatics solutions. Those standards include Annotation and Image Markup (AIM), which allows researchers to standardize annotations and markup for radiology and pathology images, and Digital Imaging and Communications in Medicine (DICOM), which is a standard for handling, storing, printing, and transmitting information in medical imaging. For pre-clinical imaging and digital pathology, there are no such standards that allow for the seamless viewing, integration, and analysis of disparate data sets to produce integrated views of the data, quantitative analysis, data integration, and research or clinical decision support systems.

As part of the DICOM Standards for Small Animal Imaging; Use of Informatics for Co-clinical Trials sub-project, the long-term goal is to generate DICOM-compliant images for small animal research. MicroAIM (µAIM) is currently in development to serve the unique needs of this domain.

The following table presents the data that the CTIIP team is integrating through various means. This integration relies on the expansion of software features and on the application of data standards, as described in subsequent sections of this document.

Domain | Data Set | Applicable Standard
Clinical Imaging | The Cancer Genome Atlas (TCGA) clinical and molecular data | N/A
Clinical Imaging | The Cancer Imaging Archive (TCIA) in vivo imaging data | DICOM
Pre-Clinical | Small animal models | A standard exists but has not been adopted (ask Ulli)
Digital Pathology | caMicroscope | DICOM is applicable but has not been adopted (ask Ulli)
All | Annotations and markup on images | AIM; µAIM is in development

Digital Pathology and Integrated Query System

One of the goals of this foundational sub-project is to create a digital pathology image server that can accept whole slide images from multiple vendors and display them despite the proprietary formats they were created in. This is accomplished by integrating the OpenSlide libraries with caMicroscope.

Using this server, which is an extended version of caMicroscope, researchers can select data from different imaging data sets and use them in image algorithms. The first data sets that are being integrated on this image server are TCGA and TCIA.

The TCGA project finalized tissue collection with matched tumor and normal tissues from 11,000 patients, allowing for the comprehensive characterization of 33 cancer types and subtypes, including 10 rare cancers, and has provided this information to the research community. TCIA and the underlying National Biomedical Image Archive (NBIA) manage well-curated, publicly available collections of medical image data. The linkages between TCGA and TCIA are valuable to researchers who want to study diagnostic images associated with the tissue samples sequenced by TCGA. TCIA currently supports over 40 active research groups, including researchers who are exploiting these linkages.

Although TCGA and TCIA comprise a rich, complementary, multi-discipline data set, they are in an infrastructure that provides limited ability to query the data. Researchers want to query multiple databases at the same time to identify cases based on all available data types. While TCGA and TCIA are DICOM-compliant, digital pathology and co-clinical/small animal model environments do not share the same data standards or do not use them consistently.

To address these limitations, the CTIIP team is developing an Integrated Query System to make it easier to analyze data from different research disciplines represented by TCGA, TCIA, and co-clinical/small animal model data. The lack of common data standards will not be a hindrance to data analysis, since the server that hosts the unified query interface will accept whole slides without recoding. The unified query interface will also provide a common platform and data engine for the hosting of "pilot challenges," which are described in more detail below. Pilot challenges will advance biological and clinical research in a way that also integrates the clinical, co-clinical/small animal model, and digital pathology imaging disciplines.

Digital Pathology

Digital pathology, unlike its more mature radiographic counterpart, has yet to standardize on a single storage and transport medium. In addition, each pathology-imaging vendor produces its own image management system, making image analysis systems proprietary and not standardized. The result is that images produced on different systems cannot be viewed and analyzed via the same mechanisms. Not only do this lack of standards and the dominance of proprietary formats impact digital pathology, but they also prevent digital pathology data from being integrated with data from other disciplines.

The team is using OpenSlide, a vendor-neutral C library, to extend the software of caMicroscope, a digital pathology server. The extended software will support some of the common formats adopted by whole slide vendors as well as basic image analysis algorithms. With the incorporation of common whole slide formats, caMicroscope will be able to read whole slides without recoding, which often introduces additional compression artifacts, and provide a logical bridge from proprietary pathology formats to DICOM standards.
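As a rough illustration of what "vendor-neutral" means for a slide server, the sketch below sorts incoming whole slide files by the vendor format they most likely belong to. It is not how OpenSlide or caMicroscope is implemented: OpenSlide inspects file contents, while this example only looks at file extensions. The extension table follows the formats named in the OpenSlide documentation, and the function names and vendor labels are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: extension-to-vendor hints based on formats that OpenSlide
# documents as supported. ".tif"/".tiff" are left out on purpose because
# several vendors share them, so the extension alone is not conclusive.
VENDOR_BY_EXTENSION = {
    ".svs": "aperio",
    ".ndpi": "hamamatsu",
    ".vms": "hamamatsu",
    ".vmu": "hamamatsu",
    ".scn": "leica",
    ".mrxs": "mirax",
    ".svslide": "sakura",
    ".bif": "ventana",
}


def guess_vendor(slide_path):
    """Return a vendor label for the file, or None if the extension is unknown."""
    return VENDOR_BY_EXTENSION.get(Path(slide_path).suffix.lower())


def group_slides_by_vendor(slide_paths):
    """Group slide paths by guessed vendor; unknown files go under 'unsupported'."""
    groups = defaultdict(list)
    for path in slide_paths:
        groups[guess_vendor(path) or "unsupported"].append(path)
    return dict(groups)
```

A server built this way can route every recognized file to one common reader instead of requiring a separate viewer per vendor, which is the role the text assigns to OpenSlide.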

Image markups and annotations also require standards so that they can be read by different imaging disciplines along with the rest of the image data. caMicroscope will also be extended to include image annotation and markup features using micro-Annotation and Image Markup (µAIM).

With caMicroscope's support for basic image analysis algorithms, researchers can use this tool for analytic and decision support based on digital pathology images.

Integrated Query System

The purpose of the integrative query component of CTIIP is to support data mashups between images, image-derived information, and clinical, pre-clinical, and genomic data. Co-clinical data and clinical data such as patient information and outcome will also be accessible through the Integrated Query System.

To make data accessible and comparable, it must first be collected in a structured fashion. For example, TCGA relies on Common Data Elements, which are the standard elements that structure TCGA data. Second, data comparisons require common data vocabularies. For example, when a tumor is described in a human or an animal, one of a discrete number of approved vocabulary options must be used to describe the tumor.
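The two requirements above, structured collection and a common vocabulary, can be sketched as a simple record check. This is a minimal illustration only: the element names and the vocabulary values are hypothetical and are not taken from the TCGA Common Data Elements.

```python
# Assumption: a tiny stand-in vocabulary; real approved vocabularies are far larger.
TUMOR_SITE_VOCABULARY = {"lung", "breast", "brain", "kidney"}

# Assumption: hypothetical required elements that give each record its structure.
REQUIRED_ELEMENTS = ("subject_id", "species", "tumor_site")


def validate_record(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element in REQUIRED_ELEMENTS:
        if element not in record:
            problems.append(f"missing element: {element}")
    site = record.get("tumor_site")
    if site is not None and site.lower() not in TUMOR_SITE_VOCABULARY:
        problems.append(f"tumor_site not in vocabulary: {site}")
    return problems
```

Because the same check applies to human and animal records, a tumor described as "Lung" in a patient and in a mouse model ends up comparable, while a free-text description is flagged before it enters the data set.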

Data federation, a process whereby data is collected from different databases without ever copying or transferring the original data, is part of the new infrastructure. The software used to accomplish this data federation is Bindaas. Bindaas is middleware that is also used to build the backend infrastructure of caMicroscope. The team is extending Bindaas with a data federation capability that makes it possible to query data from TCIA and TCGA.

The Integrated Query System will access multiple data types in a federated fashion, meaning that the original data will reside in independent systems. The Integrated Query System will provide an interface that scientists can use to select the data types they want to combine, or "mash up," based on their own research questions.
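The federated pattern described here can be sketched in a few lines. This is an assumption-laden illustration, not the Bindaas API or the actual Integrated Query System: the `Source` class, the `mash_up` function, and the in-memory stand-ins for the independent systems are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Source:
    """One independent system, exposed only through a query function."""
    name: str
    fetch: Callable[[Iterable[str]], Dict[str, dict]]


def make_in_memory_source(name, rows):
    """Build a hypothetical source backed by a dict keyed on case ID."""
    def fetch(case_ids):
        # Only the requested cases are returned; the store itself stays put.
        return {cid: rows[cid] for cid in case_ids if cid in rows}
    return Source(name, fetch)


def mash_up(case_ids, sources):
    """Query each source at request time and combine the answers per case."""
    case_ids = list(case_ids)
    combined = {cid: {} for cid in case_ids}
    for source in sources:
        for cid, fields in source.fetch(case_ids).items():
            combined[cid][source.name] = fields
    return combined
```

For example, passing an imaging source and a genomic source to `mash_up` yields one record per case that holds whatever each system knows about it. A case missing from one source simply lacks that key, so gaps in coverage stay visible to the researcher.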

...

As part of the Small Animal/Co-clinical Improved DICOM Compliance and Data Integration sub-project of CTIIP, the NCI supported the development of a DICOM supplement for small animal imaging. The group of people contributing to it, Working Group 30, completed Supplement 187: Preclinical Small Animal Imaging Acquisition Context, in 2015. The goal of this sub-project is to directly compare data from co-clinical animal models to real-time clinical data from TCIA and TCGA. This is being accomplished by developing Supplement 187 to accommodate small animal imaging and by identifying a pilot co-clinical data set to integrate with TCIA and TCGA; the latter is in process.

Supplement 187 Data Elements

...

The Pilot Challenges sub-project of CTIIP will make a set of integrated data from TCIA and TCGA publicly available to researchers who will participate in three complementary "pilot challenge" projects. (Note: this only happened in the first challenge, to figure out which image was from which tumor; look at MICCAI.) These pilot challenges proactively address research questions that compare the decision support systems for clinical imaging, co-clinical imaging, and digital pathology. As opposed to a more rigorous "grand" challenge, each pilot challenge will function as a proof of concept to learn how to scale challenges up in the future. Each challenge will use the informatics infrastructure created in the Digital Pathology and Integrated Query System sub-project and allow participants to validate and share algorithms on a software clearinghouse platform such as HUBZero.

A team from Massachusetts General Hospital will guide the pilot challenges, using the Medical Imaging Challenge Infrastructure (MedICI), a system that supports medical imaging challenges.

...