NIH | National Cancer Institute | NCI Wiki  


Data repositories are important tools in cancer research. They provide safe and sustainable locations to store data, provide access to input data for meta-analyses, and allow researchers to collaborate and share information across a common resource.

Frederick National Lab's Data Coordinating Center (DCC) stores and manages access to data generated in support of cancer research. Currently, the datasets in the DCC are funded or supported by the NCI's Center for Strategic Scientific Initiatives. This data is in the standard Investigation-Study-Assay tab-delimited (ISA-TAB) format, which describes a scientific investigation, its study or studies, and each study's assay(s).
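Because ISA-TAB files are tab-delimited text, they can be read with general-purpose tools. The following is a minimal sketch, not the DCC's own code. The sample content, the column names, and the helper name read_tab_delimited are illustrative assumptions and are not taken from any DCC dataset.

```python
import csv
import io

# Hypothetical study-table content for illustration only.
# Columns are separated by tab characters, one record per line.
SAMPLE_STUDY = (
    "Source Name\tCharacteristics[organism]\tSample Name\n"
    "patient-01\tHomo sapiens\tsample-01\n"
    "patient-02\tHomo sapiens\tsample-02\n"
)


def read_tab_delimited(text):
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_tab_delimited(SAMPLE_STUDY)
for row in rows:
    # Show which source each sample was derived from.
    print(row["Sample Name"], "<-", row["Source Name"])
```

Keying each row by its column header keeps the code independent of column order, which helps when different studies arrange their columns differently.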

In addition to its compliance with the ISA-TAB standard, the DCC was designed according to the guiding principles of FAIR: Findable, Accessible, Interoperable, and Reusable. The team, led by Andrew Quong, interacts with the academic research community to discover and employ best practices. The team's goal is to enable repositories not only for data that comply with ISA-TAB, but also for emerging data types for which, unlike genomics, standards for data storage do not yet exist.

 


The sponsored projects are diverse and generate datasets that vary in content and format, yet are related across certain defining characteristics or metadata. Integrated management of the datasets across all sponsored projects makes the data more accessible, more easily retrieved, and potentially reusable by the cancer research community.

The problem is that repositories are often not flexible enough to store data that do not conform to known standards. Genomics, for example, benefits from community genomics standards and open industry standards. Many other fields of study have no such standards, yet generate significant amounts of data. The lack of standards keeps these datasets from being more accessible, easily retrieved, and potentially reused by the research community.

FNLCR supports the Center for Strategic Scientific Initiatives (CSSI), an innovations center within NCI, by managing science and technology initiatives in new, higher-risk areas that may question existing paradigms and lead to hypothesis testing. These areas include data management, high-definition single-cell analysis, immuno-mass cytometry, clinical proteomics, and antibody characterization, and they support development programs critical to nearly all cancer and biomedical research programs. A team of software developers in DSITP, led by Uma Mudunuri and managed by Andrew Quong, has developed the CSSI DCC Portal, a repository for CSSI DCC data and for other emerging data types that lack a standard approach to data storage.

The experimental details, or metadata, of the data currently in the DCC conform to the standard Investigation-Study-Assay tab-delimited (ISA-TAB) format. For more information on the ISA-TAB format, refer to the following section, What is ISA-TAB?, as well as the ISA-TAB specification.

The primary goal of the FNLCR CSSI Data Coordinating Center (DCC) is to facilitate access to research data for the greater cancer research community. The DCC's design approach supports this goal: it follows FAIR (Findable, Accessible, Interoperable, and Reusable) principles for scientific data management and stewardship, applies metadata standards, and involves interactions with the research community to seek out and apply its best practices.

The CSSI DCC Portal is the repository for CSSI DCC data. It serves the following purposes:

  • Provides a common location and web access for disparate data types, including gene expression results from Next Generation Sequencing, microarray experiments, histopathological images, metabolomics data, and proteomics data, allowing easy access by multiple collaborators and researchers in different geographic locations. It is flexible enough to handle new and unspecified data types.
  • Provides a mechanism to identify and search for datasets stored at different locations but generated through the same project. For example, clinical and genomics datasets from a project might be deposited in the database of Genotypes and Phenotypes (dbGaP) or the Genomic Data Commons (GDC), imaging files might be located at The Cancer Imaging Archive (TCIA), and cell motility information might be uploaded to the CSSI DCC.
  • Stores the data in one common location so that you can gain biological insights that would otherwise be missed when data is spread across multiple locations.
  • Applies the information gained from one study to multiple studies and projects.
  • Allows you to search the metadata from each study to identify datasets of interest.
  • Develops data storage and data mining modules that can be applied across projects, avoiding duplication of effort and saving costs.
  • Develops and/or adopts common vocabularies, data standards, and ontologies for data representation, storage, and comparison.
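
The metadata search described in the list above can be pictured with a minimal sketch. This is an illustration only and does not reflect how the CSSI DCC Portal is implemented. The study records, the field names, and the function name search_metadata are all hypothetical.

```python
def search_metadata(studies, term):
    """Return the studies in which any metadata value contains the term.

    The match is a case-insensitive substring test over every field.
    """
    needle = term.lower()
    return [
        study
        for study in studies
        if any(needle in str(value).lower() for value in study.values())
    ]


# Hypothetical metadata records, one dict per study.
STUDIES = [
    {"id": "study-A", "assay": "proteomics", "description": "protein abundance"},
    {"id": "study-B", "assay": "microarray", "description": "gene expression"},
    {"id": "study-C", "assay": "imaging", "description": "cell motility"},
]

matches = search_metadata(STUDIES, "Expression")
print([study["id"] for study in matches])
```

A substring scan like this is enough to show the idea. A repository holding many studies would more likely rely on an index or a database query.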

See Neo4j - Rapidly Prototyping a Semantic Graph for more information about technologies used in CSSI DCC.

 The CSSI DCC Portal project team includes:

Member              Role
Uma Mudunuri        IT Lead
Paul Donovan        Team Lead/Solutions Architect
David Mott          Developer
Mahesh Yelisetti    QA Analyst
Rajani Kuchipudi    QA Analyst
Paul Aiyetan        Bioinformatician
Carolyn Klinger     Technical Documentation
Deb Hope            Subject Matter Expert
Corinne Zeitler     Project Manager
Andrew Quong        Project Director