Data repositories are important tools in cancer research, providing safe and sustainable locations to store data, providing access to input data for meta-analyses, and allowing researchers to collaborate and share information across a common resource.

The problem is that scientists are generating data...or repositories are often not flexible enough to store data that do not conform to known standards. Genomics, for example, benefits from community genomics standards groups that develop standard programmatic interfaces for managing, describing, and annotating genomic data (attribution: https://gdc.cancer.gov/about-data/data-standards). Emerging data types such as...do not yet have data storage standards.

Frederick National Lab's Data Coordinating Center (DCC) stores and manages access to data generated in support of cancer research and is supported by the NCI's Center for Strategic Scientific Initiatives. The datasets currently in the DCC conform to the standard Investigation-Study-Assay tab-delimited (ISA-TAB) format, which describes a scientific investigation, its study or studies, and each study's assay(s).
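The nesting that ISA-TAB describes (an investigation containing studies, each containing assays) can be pictured with a minimal Python sketch. The class names, fields, identifiers, and file names below are hypothetical illustrations, not the ISA-TAB specification's own field list and not real DCC data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # One assay belonging to a study (hypothetical fields).
    measurement_type: str
    file_name: str


@dataclass
class Study:
    # One study belonging to an investigation.
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # Top-level container: an investigation holds one or more studies.
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)

    def assay_files(self) -> List[str]:
        """Return every assay file name across all studies, in order."""
        return [a.file_name for s in self.studies for a in s.assays]


# Hypothetical example values, for illustration only.
investigation = Investigation(
    identifier="INV-1",
    title="Example investigation",
    studies=[
        Study("STU-1", "Example study A",
              [Assay("metabolite profiling", "a_study_a_metabolomics.txt")]),
        Study("STU-2", "Example study B",
              [Assay("protein expression profiling", "a_study_b_proteomics.txt"),
               Assay("transcription profiling", "a_study_b_expression.txt")]),
    ],
)

print(investigation.assay_files())
```

Modeling the hierarchy as nested containers makes it straightforward to walk from an investigation down to the individual assay files of each study.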

The DCC's goal is to store emerging data types in addition to those that comply with ISA-TAB. Emerging data types are those for which data storage standards do not yet exist, as they do for genomics. To enable this, the DCC was designed according to the FAIR guiding principles (Findable, Accessible, Interoperable, and Reusable), metadata standards such as ISA-TAB, and best practices of the academic cancer research community.

The Center for Strategic Scientific Initiatives (CSSI) sponsors a diverse array of projects that generate datasets that vary in content and format, yet are related across certain defining characteristics or metadata. Integrated management of the datasets across all sponsored projects makes the data more accessible and potentially reusable by the cancer research community.

The CSSI Data Coordinating Center (CSSI DCC) stores and manages access to data generated in support of cancer research funded or supported by the CSSI. This data is in the standard Investigation-Study-Assay tab-delimited (ISA-TAB) format, which describes a scientific investigation, its study or studies, and each study's assay(s). For more information on the ISA-TAB format, refer to the following section, What is ISA-TAB?, as well as the ISA-TAB specification.

The Frederick National Lab DCC team, led by Andrew Quong, developed the CSSI DCC Portal, the repository for CSSI DCC data. It serves the following purposes:

  • Provides a common location and web access to data of disparate types, including gene expression results from next-generation sequencing, microarray experiments, histopathological images, metabolomics data, and proteomics data, allowing easy access by collaborators and researchers in different geographic locations. It is flexible enough to handle new and unspecified data types.
  • Stores the data in one common location so that you can gain biological insights that would otherwise be missed if the data were spread across multiple locations.
  • Applies the information gained from one study to multiple studies and projects.
  • Allows you to search the metadata from each study to identify datasets of interest.
  • Develops data storage and data mining modules that can be applied across studies, avoiding duplication of effort and saving costs.
  • Develops and/or adopts common vocabularies, data standards, and ontologies for data representation, storage, and comparison.
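
As a rough illustration of searching study metadata to identify datasets of interest, the sketch below filters a small tab-delimited table with a case-insensitive keyword match. The column names, rows, and the functions `load_studies` and `search_studies` are hypothetical and are not part of the CSSI DCC Portal; the portal's actual search interface may work quite differently.

```python
import csv
import io

# Hypothetical tab-delimited study metadata; columns and rows are illustrative only.
STUDY_METADATA = (
    "study_id\ttitle\tdata_type\n"
    "STU-1\tExample study A\tmetabolomics\n"
    "STU-2\tExample study B\tproteomics\n"
    "STU-3\tExample study C\tmetabolomics\n"
)


def load_studies(text):
    """Parse tab-delimited metadata into a list of dicts, one per study."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def search_studies(studies, term):
    """Return studies where any metadata value contains term (case-insensitive)."""
    needle = term.lower()
    return [s for s in studies if any(needle in v.lower() for v in s.values())]


studies = load_studies(STUDY_METADATA)
matches = search_studies(studies, "Metabolomics")
print([s["study_id"] for s in matches])  # prints ['STU-1', 'STU-3']
```

Keeping metadata for all studies in one table is what allows a single query like this to span every sponsored project.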