The CSSI DCC Portal is a public repository of experiment-related information describing cancer research investigations. You can use the portal to browse, search, and access data generated through CSSI-funded projects and other user-uploaded data sets. This data is in ISA-Tab format, which organizes investigation, study, and assay data according to the rules in the ISA-Tab specification.

Each data set contains at least three files (investigation, study, and assay) that conform to the ISA-Tab structure and naming conventions. Each type of file has a standard set of fields, though null values are allowed; that is, not every data set includes a value for every field. The portal lets you filter these fields interactively and view the results as a pie chart or a list.

You can search investigations, studies, and assays using any keyword. You can download selected files, the entire data archive, or only the metadata associated with a study. You can also upload and publish your own investigation data to the portal.

The following sections provide detailed instructions on how to browse, search, and download data.

Accessing Investigation Data

Browsing Investigations

Understanding the Pie Charts

Understanding the Investigations List

Adding and Removing Fields

Exploring Investigation Details

You can continue exploring investigation data by clicking a link to investigation, study, or assay details. Links to these details are on the Browse Investigations page or on the Search Investigations page, after you search on a keyword or phrase. These details include counts of studies, assays, samples, and files. The metadata available for an investigation determines whether counts are also available for other entities, such as sources and collections. From these details pages, you can visualize the structure of the investigation and download selected study files, the full archive, or only the metadata.

To explore investigation details

  1. Browse investigations or search investigations until you find an investigation, a study, or an assay in which you are interested.
    The search results appear.
    Example investigation as listed in search results.

  2. Click the link corresponding with the investigation, study, or assay you are interested in exploring.
    The respective investigation details, study details, or assay details page appears.

  3. You can do the following from the details pages:

Investigation Details Page

Study Details Page

The study details page shows the investigation name at the top followed by a visualization of the investigation filename, study filename and number of samples in the study, and assay filename and number of files (and total file size) in the assay. Below the visualization are links to download all or part of the investigation, its identifier, and its description. All of the icons on the page are clickable links.

Assay Details Page

The assay details page shows the study name at the top followed by a visualization of the investigation filename, study filename and number of samples in the study, and assay filename and number of files (and total file size) in the assay. Below the visualization are links to download all or part of the assay, its file name, measurement type, and technology type. In the Visualize and Select section, clickable icons depict the relationships of the study to its processes and of the processes to their files. Click any icon to further filter the investigation data and download only a selected portion of it.

Assay Details page

Visualizing and Filtering Data

Once you browse or search the CSSI DCC data sets and reach a selected investigation details, study details, or assay details page, you can continue exploring the data. The ISA-Tab format is hierarchical, with investigation components becoming more granular as you proceed down the hierarchy. The largest organizing entity is the investigation, which holds one or more studies. Each study includes one or more assays. Assays are composed of samples, which in turn are composed of protocols. Data files are often associated with a protocol.

The following diagram depicts the hierarchy, without the samples and protocols.

Structure of ISA data model as described in the text on this page.
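
The hierarchy described above can be sketched in code. The following Python example is only an illustration of the nesting (investigation, then studies, then assays); the class names, the counts method, and the file names i_example.txt, file_1.dat, and file_2.dat are hypothetical and are not part of the portal or of the ISA-Tab specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    filename: str  # an a_*.txt file
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    filename: str  # an s_*.txt file
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    filename: str  # an i_*.txt file
    studies: List[Study] = field(default_factory=list)

    def counts(self) -> dict:
        """Tally studies, assays, and data files below this investigation."""
        assays = [a for s in self.studies for a in s.assays]
        return {
            "studies": len(self.studies),
            "assays": len(assays),
            "files": sum(len(a.data_files) for a in assays),
        }


# Hypothetical investigation with one study and one assay.
example = Investigation(
    filename="i_example.txt",
    studies=[
        Study(
            filename="s_mrna.txt",
            assays=[
                Assay(
                    filename="a_mrna_transcription_profiling_nucleotide_sequencing.txt",
                    data_files=["file_1.dat", "file_2.dat"],
                )
            ],
        )
    ],
)

print(example.counts())
```

Like the diagram, this sketch leaves out samples and protocols.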

Visualizing and Filtering Investigations

The investigation details page shows icons that represent the relationship of the investigation to its studies and assays. In the case of PSON Cell Line Genomic Characterization - mRNA, the investigation has one study and one assay.
Investigation Details page

Currently, you cannot filter data any further at the investigation level in the CSSI DCC Portal. At this point, you can only download the investigation's full data or start exploring its studies and assays.

Selecting Multiple Objects

If any investigation has more than one study and assay, you can select which ones you want to filter by clicking the Select Multiple Objects box.

For example, the following investigation has many studies and assays, so it displays horizontally with a slider on the side to move up and down. It also has zoom out and zoom in buttons to see more or fewer objects.

Investigation with multiple studies and assays

Select the boxes next to the objects you want to visualize and filter. Any download you start afterward includes all of the selected objects.

Visualizing and Filtering Study Data

If you select the study in the PSON Cell Line Genomic Characterization - mRNA investigation or arrive at any other Study Details page through a search, you can visualize the study's file structure and filter on any field.

You can also select multiple objects to visualize and filter within a single investigation.

To visualize and select study data

  1. Open a Study Details page.

    An example Study Details page.

  2. In the Visualize and Select area, click one of the entities.

    The hierarchy of entities for studies according to the ISA-Tab standard is as follows, from less granular to more granular:

    Source > Protocol > Sample

    For example, click Source. Metadata for that source appears.

    Follow this same procedure if you want to filter on Protocol or Sample instead of Source.


    Metadata dialog box for the selected source.

  3. From the Source Name list, click the arrow to open the list of values. Each value is a Source Name from the study file, which in this case is s_mrna.txt.
    Source Name drop-down list within the Metadata dialog box.
  4. Click one or more values in the list to select them. Each value appears immediately below the Source Name box. To clear a selection, click the value again.
    Metadata for Source with multiple source names selected
  5. Click Filter Results.
    The Source Count reduces from the 40 original values in the unfiltered study file, s_mrna.txt, to the three selected in this procedure.
    Source Count 3, sample collection, Samples 3
  6. Options you have at this point include continuing to explore this data set, downloading the metadata of the three values you selected, downloading the selected data of the three values you selected (which includes the metadata), or clicking Clear Filters to filter a different entity.
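
If you have already downloaded a study file, a similar filter can be approximated offline. The following Python sketch assumes only that the study file is tab-delimited with a header row containing a Source Name column; the function name filter_rows and the sample rows are hypothetical, and this is not the portal's own implementation.

```python
import csv
import io


def filter_rows(table_text, column, wanted):
    """Return rows of a tab-delimited table whose `column` value is in `wanted`."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    wanted = set(wanted)
    return [row for row in reader if row.get(column) in wanted]


# Hypothetical, much-shortened stand-in for a study file such as s_mrna.txt.
study_text = (
    "Source Name\tSample Name\n"
    "source_A\tsample_A1\n"
    "source_B\tsample_B1\n"
    "source_C\tsample_C1\n"
    "source_D\tsample_D1\n"
)

selected = filter_rows(study_text, "Source Name", ["source_A", "source_C"])

# Number of distinct sources left after filtering.
source_count = len({row["Source Name"] for row in selected})
print(source_count)
```

Note that csv.DictReader keeps only the last column when a header name repeats, so a real study file with repeated column headers may need csv.reader and positional indexes instead.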

Visualizing and Filtering Assay Data

If you select the assay in the PSON Cell Line Genomic Characterization - mRNA investigation or arrive at any other Assay Details page through a search, you can visualize the assay's file structure and filter on any field.

You can also select multiple objects to visualize and filter within a single investigation.

To visualize and select assay data

  1. Open an Assay Details page.

    A sample Assay Details page.

  2. In the Visualize and Select area, click one of the entities. Note that some visualizations are wide enough that you must scroll by clicking the arrows at the bottom of the page.

    The hierarchy of entities for assays according to the ISA-Tab standard is as follows, from less granular to more granular:

    Sample > Protocol > Data File

    For example, click the sample. Metadata for the sample appears.

    Follow this same procedure if you want to filter on protocol instead of sample.


    Metadata dialog box for the selected sample.

  3. From the Sample Name list, click the arrow to open the list of values. Each value is a Sample Name from the assay file, which in this case is a_mrna_transcription_profiling_nucleotide_sequencing.txt.
    Drop-down list within the Metadata dialog box.
  4. Click one or more values in the list to select them. To select multiple values, click one, wait for it to appear on the Metadata for Sample page, and then click the arrow again to select another value. To clear a selection, click that value again.
    Metadata dialog box with selected sample names.

    Some entities do not have associated metadata fields on which you can filter the study or assay. In that case, when you click the icon for that entity, you see a message letting you know that no metadata fields are available.

  5. Click Filter Results.
    The Sample Count reduces from the 39 original values in the unfiltered assay file, a_mrna_transcription_profiling_nucleotide_sequencing.txt, to the 5 selected in this procedure.
    Assay Visualization, 5 Samples Selected
  6. Options you have at this point include continuing to explore this data set, downloading the metadata of the 5 values you selected, downloading the selected data of the 5 values you selected (which includes the metadata), or clicking Clear Filters to filter a different entity.

Searching Investigation Data

You can search for investigations, studies, or assays in the CSSI DCC Portal by:

Performing a Basic Search

Searching by Keyword or Phrase

Searching ISA Tab Fields

Searching by Related Terms

Viewing Detailed Search Results

In the search results, click View next to a matching term to see where in the investigation this term occurs.

Investigation listing in search result with cell as the matching term

A detailed search results window appears. The matching term, cell, is highlighted in the metadata of each file in the ISA-Tab archive.

A detailed search results window.

To close the window, click Ok at the bottom or the x in the upper-right corner of the window.

Saving Search Parameters

Managing Saved Search Parameters

Downloading Investigation Data

After reaching an investigation entity (such as investigation, study, assay, source, protocol, sample, or data file), you can download the full data, selected metadata, or selected data associated with that entity. The full data is always associated with the investigation as a whole. All of the data currently in the portal is public.

Also note that if you selected multiple objects to visualize and filter, a download includes all of those objects.

Metadata describes the structure of the data collected in an investigation and translates to the file columns, field definitions, and placeholders that appear in a spreadsheet. It is in Investigation-Study-Assay tab-delimited format (ISA-Tab), which is based on the ISA-Tab specification.

The data files often include images and spreadsheets and can be large.

Each download option has a button on the Investigation Details, Study Details, and Assay Details pages.

Download Full Data (13.48 GB zip) Download Selected Metadata  Download Selected Data

An investigation's full data file is always available for download. However, due to the processing resources required, downloads via the Add to Download button are currently limited to 30 GB.

Downloading Full Data

Downloading Selected Metadata

An entity's metadata shows how that entity is structured. Metadata files are usually small text files, so you can download them directly to your computer. You do not need to log in before downloading metadata files.

To download selected metadata

  1. Filter a study or an assay until you reach a selection of investigation data you are interested in downloading.
  2. On the Investigation Details page, click Download Selected Metadata.
    Your browser prompts you to open or save the .zip file. Follow your browser's instructions to open or save the file.
    Firefox open or save download dialog box

    It may be useful to rename the .zip file as you save it to include the name of the investigation so that you can identify it more easily. For example, miRNA_metadata.zip.

    The archive file is in compressed format. When you download it, it may be a single compressed folder or .zip file. When you open it, you see at least three text files at the root of the folder or file. An example follows.

       a_10290.txt

       i_10290.txt

       s_10290.txt

    The text files describe the investigation, study or studies, and assay or assays. In this example, 10290 is the file identifier, but in practice each file may use a different identifier, even within the same investigation. Only the a, i, or s prefix is required. If you download full data or selected data, the archive may also contain other files or folders as appropriate for the investigation; for example, images. If you download metadata, the archive includes only the a, i, and s files.
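
As a quick check after downloading, you can group the files in an archive by their prefix. The following Python sketch assumes that the prefix is followed by an underscore (as in a_10290.txt) and that the three text files sit at the root of the archive; the function name classify_members is hypothetical, and the in-memory archive only stands in for a real downloaded .zip file.

```python
import io
import zipfile
from pathlib import PurePosixPath

PREFIXES = {"i_": "investigation", "s_": "study", "a_": "assay"}


def classify_members(archive):
    """Group root-level .txt members of an open ZipFile by ISA-Tab prefix."""
    groups = {"investigation": [], "study": [], "assay": [], "other": []}
    for name in archive.namelist():
        path = PurePosixPath(name)
        kind = "other"
        # Only text files at the root of the archive count as ISA-Tab files.
        if len(path.parts) == 1 and path.suffix == ".txt":
            kind = PREFIXES.get(path.name[:2], "other")
        groups[kind].append(name)
    return groups


# Build a small in-memory archive to stand in for a downloaded metadata .zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("a_10290.txt", "i_10290.txt", "s_10290.txt"):
        archive.writestr(name, "")

with zipfile.ZipFile(buffer) as archive:
    groups = classify_members(archive)

print(groups)
```

For a real download, open the saved file with zipfile.ZipFile("miRNA_metadata.zip") (or whatever name you gave it) and pass the result to the same function. In a full data or selected data archive, images and other files fall into the "other" group.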

Downloading Selected Data

Downloading Large Files with Globus

Managing Your Downloads

Installing the DCC Download Manager

Using the DCC Download Manager

Using a Download Summary