Insite Article Draft

Data repositories are important tools in cancer research. They provide safe and sustainable locations to store data, provide access to input data for meta-analyses, and allow researchers to collaborate and share information across a common resource.

The problem is that repositories are often not flexible enough to store data that do not conform to known standards. Genomics, for example, benefits from community standards groups and open industry standards. Many other fields of study generate significant amounts of data but lack a standard way to store it, which makes the data harder for the research community to find, access, and reuse.

The Center for Strategic Scientific Initiatives Data Coordinating Center (CSSI DCC) stores and manages access to data generated in support of cancer research funded or supported by the CSSI. The Frederick National Lab team, led by Andrew Quong (see team members below), developed the CSSI DCC Portal, a repository for CSSI DCC data and for emerging data types that lack a standard approach to data storage. The data currently in the DCC conform to the standard Investigation-Study-Assay tab-delimited (ISA-TAB) format, which describes a scientific investigation, its study or studies, and each study's assay(s).
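
The investigation, study, and assay hierarchy that ISA-TAB describes can be pictured with a minimal Python sketch. The class and field names here are hypothetical and chosen only for illustration; they are not the ISA-TAB file layout or the Portal's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # One type of measurement taken in a study (illustrative field names).
    measurement_type: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    # A study groups one or more assays.
    identifier: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # An investigation groups one or more studies.
    title: str
    studies: List[Study] = field(default_factory=list)

    def assay_count(self) -> int:
        # Total number of assays across all studies in this investigation.
        return sum(len(study.assays) for study in self.studies)


# Made-up example data, not taken from the DCC.
investigation = Investigation(
    title="Example investigation",
    studies=[
        Study("S1", [Assay("gene expression", ["a_s1_expr.txt"]), Assay("proteomics")]),
        Study("S2", [Assay("metabolomics")]),
    ],
)
```

In this sketch, an investigation owns its studies and each study owns its assays, which mirrors the nesting described above.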

The DCC's goal is to use the CSSI DCC Portal to store emerging data types in addition to ISA-TAB data. The DCC's design approach supports this goal: it follows FAIR (Findable, Accessible, Interoperable, and Reusable) principles for scientific data management and stewardship, applies metadata standards, and involves interaction with the research community to seek out and apply its best practices.

The CSSI DCC Portal serves the following purposes:

Provides a common location and web access to data of disparate types, including gene expression results from Next Generation Sequencing, microarray experiments, histopathological images, metabolomics data, and proteomics data, allowing easy access by multiple collaborators and researchers in different geographic locations.
Is flexible enough to handle new and unspecified data types.
Stores the data in one common location so that you can gain biological insights that would otherwise be missed if the data were spread across multiple locations.
Applies the information gained from one study to multiple studies and projects.
Allows you to search the metadata from each study to identify datasets of interest.
Develops data storage and data mining modules that can be applied across studies, avoiding duplication of effort and saving costs.
Develops and/or adopts common vocabularies, data standards, and ontologies for data representation, storage, and comparison.
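
The metadata search mentioned in the list above can be illustrated with a small Python sketch. The records, field names, and the `search_metadata` helper are all hypothetical; the sketch shows only the general idea of filtering studies by metadata values and is not the Portal's search implementation.

```python
from typing import Dict, List

# Made-up study metadata records, for illustration only.
studies: List[Dict[str, str]] = [
    {"id": "S1", "data_type": "proteomics", "tissue": "breast"},
    {"id": "S2", "data_type": "metabolomics", "tissue": "lung"},
    {"id": "S3", "data_type": "proteomics", "tissue": "lung"},
]


def search_metadata(records: List[Dict[str, str]], **criteria: str) -> List[str]:
    """Return the ids of records whose metadata match every criterion.

    Matching is case-insensitive; a record missing a field never matches it.
    """
    return [
        record["id"]
        for record in records
        if all(
            record.get(key, "").lower() == value.lower()
            for key, value in criteria.items()
        )
    ]


proteomics_ids = search_metadata(studies, data_type="proteomics")
lung_proteomics_ids = search_metadata(studies, data_type="Proteomics", tissue="lung")
```

Combining criteria narrows the result, which is how a researcher might move from a broad data type to a specific dataset of interest.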

See Neo4j - Rapidly Prototyping a Semantic Graph for more information about technologies used in CSSI DCC.

The CSSI DCC Portal project team includes:

Member	Role
Andrew Quong	?
Corinne Zeitler	Project manager
Uma Mudunuri	Technical lead
Paul Donovan	Lead software developer
Deb Hope	Data ?
Mahesh Yelisetti	QA lead
Rajani Kuchipudi	QA analyst
Carolyn Klinger	Technical writer
David Mott	?

(Do you want to say something about the response to the Portal thus far or future uses of it?)
