NIH | National Cancer Institute | NCI Wiki  

The CSSI DCC Portal is a public repository of experiment-related information describing cancer research investigations. You can use the portal to browse, search, and access data generated through CSSI-funded projects and other user-uploaded data sets. This data is in ISA-Tab format, which organizes investigation, study, and assay data according to the rules in the ISA-Tab specification.

Each data set contains three files--investigation, study, and assay--that conform to the ISA-Tab structure and naming conventions. Within this structure are fields that are standard for each type of file, though null values are allowed; that is, not every data set includes values for each field. The portal allows you to filter these fields in an interactive way so that you can visualize the data in a pie chart or list.

You can also search investigations, studies, and assays using any keyword. You can download selected files, the entire data archive, or only the metadata associated with a study.

This chapter provides detailed instructions on how to browse, search, and download data.

Browsing Investigations

You can browse and explore investigation data contained in investigation, study, and assay files.

To browse investigations

  • From the CSSI Data Portal home page, click the Browse button or select Investigations > Browse.
    Browse button
    The Browse Investigations page appears, showing a list of all investigations currently included in this release of the CSSI DCC Portal, and pie charts at the top showing fields from the investigations. Note that you can control how many investigations appear in the list by selecting a value from the Max Display box.

    A subset of the page appears in the following screenshot.

    Browse Investigations page showing filters and the CTCs and PSON Cell Line Genomic Characterization - Exome investigations

Understanding the Pie Charts and Investigations List

The Browse Investigations page has two interactive components:

  • Pie charts that show fields from the investigations. When you first open the page or reset it to clear all selections, three pie charts appear at the top. These pie charts represent three of the fields, Study Protocol Name, Study Assay Technology Type, and Study Assay Measurement Type, that occur in the metadata of the 8 investigations currently in the portal (as of January 2017). The values for those fields, as well as a count for each value, appear in the pie charts. The count represents the number of times that field value occurs in all of the investigations currently in the portal.

    For example, in the default pie charts represented in the above screenshot, the Study Assay Measurement Type pie chart shows that 1 investigation used genome sequencing, 6 used imaging assay, and 2 used transcription profiling. Note that there are 9 field values and only 8 investigations because 1 investigation listed 2 values for Study Assay Measurement Type. You can determine which investigation that was by clicking on each pie slice and reviewing the details in the list below.

    Click one or more field values in the pie charts ("slices") to filter the data by that/those field value(s). The more field values you select, in one or more pie charts, the more narrowly you filter the investigation data and the fewer investigations match your selections. The investigation list refreshes each time you filter the data in this way. You can also customize which fields appear in the pie charts and how many pie charts appear by adding and removing fields.

    In the following screenshot, a user has selected at least one field value in each pie chart. The selected values are Scaning in the Study Protocol Name chart, CNV analysis in the Study Assay Measurement Type chart, and Next Generation Sequencing in the Study Assay Technology Type chart. Only one investigation matches all of these selections and appears in the list below. To reduce the amount of filtering, you can click Reset All to return to the default pie charts, or reset on an individual pie chart. You can also return to the default view by selecting Investigations > Browse.

    Browse Investigations page with values selected in each pie chart
  • A list of investigation details below the pie charts. The Investigations list shows details associated with the investigations that match your pie chart selections. For example, in the following screenshot, a user has selected one field value in the Study Protocol Name pie chart: plate cells. The number 4 next to the label plate cells means that four investigations are associated with the Study Protocol Name field value of plate cells. Those four have a null value for Study Assay Technology Type, which is the second pie chart, and all use the same Study Assay Measurement Type of imaging assay.

    The fields in the Investigations list match the pie chart selections. The list includes only those 4 investigations that have a Study Protocol Name field value of plate cells. Details for each investigation in the list include the same fields represented by the pie charts: Study Protocol Name, Study Assay Technology Type, and Study Assay Measurement Type, plus the study name and description. The list also shows additional data about each investigation, such as the other keywords used in the Study Protocol Name field. You can return to the default (full) list at any time by selecting Investigations > Browse.

    If you add or remove fields from the pie charts, the Investigations list fields immediately reflect the same changes.

    Browse Investigations page
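The counting and filtering behavior described above can be sketched in a few lines of Python. This is only an illustration, not the portal's implementation: the investigation names are hypothetical, the data is made up to reproduce the counts in the example (1, 6, and 2), and the sketch assumes that an investigation must carry every selected value to remain in the list.

```python
from collections import Counter

FIELD = "Study Assay Measurement Type"

# Hypothetical metadata for 8 investigations (names are illustrative only).
# One investigation lists two values, so 9 values are counted in total.
investigations = [
    {"name": "inv-1", FIELD: ["genome sequencing", "transcription profiling"]},
    {"name": "inv-2", FIELD: ["transcription profiling"]},
] + [{"name": f"inv-{n}", FIELD: ["imaging assay"]} for n in range(3, 9)]


def count_field_values(items, field):
    """Count how often each value of `field` occurs across all investigations."""
    counts = Counter()
    for item in items:
        counts.update(item.get(field, []))
    return counts


def filter_investigations(items, selections):
    """Keep investigations that carry every selected value (assumed AND logic).

    `selections` maps a field name to the set of selected pie slices.
    """
    return [
        item
        for item in items
        if all(
            set(values) <= set(item.get(field, []))
            for field, values in selections.items()
        )
    ]


counts = count_field_values(investigations, FIELD)
matching = filter_investigations(investigations, {FIELD: {"genome sequencing"}})
```

Here the counts add up to 9 even though there are only 8 investigations, and selecting the genome sequencing slice leaves a single investigation in the list.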

Adding and Removing Fields

You can customize which pie charts appear at the top of the page. Since the pie charts control how you filter the investigation data, you may prefer something other than, or in addition to, the three default fields of Study Protocol Name, Study Assay Technology Type, and Study Assay Measurement Type.

You can add and remove fields using the Select menu.

To add or remove fields

  1. On the Browse Investigations page, click .
    The Select menu appears to the left of the pie charts.
    Select Menu
  2. Expand fields by clicking the plus signs. Note that in the STUDY PROTOCOLS and Study Assays section, the field values of Study Protocol Name, Study Assay Technology Type, and Study Assay Measurement Type are already selected. These field values are the ones that appear in the three default pie charts on the Browse Investigations page. If you clear these check boxes, those pie charts disappear from the page. When you reset all of the pie charts or select Investigations > Browse, however, they reappear.


    Fields available for selection

  3. Click any of the field values to select them. For example, select Study Protocol Type.
    The pie charts immediately update to include one for Study Protocol Type.
    Browse Investigation page showing four pie charts

    Correspondingly, the Investigations list updates to show Study Protocol Type in the fields available for the investigation.

    Investigations List for CTCs Investigation

    You could also opt to clear all of the check boxes except for Study Protocol Type. In this case, that is the only pie chart that would appear.

    Browse Investigation page showing only one pie chart

    Each field you clear also disappears from the Investigations list.

    Study Protocol List Field Removed from Investigations List

     
  4. When you have selected or cleared as many fields as you want from the pie charts, click Hide Select Menu.
    The Select menu moves back to its original position.

Exploring Investigation Details

You can continue exploring investigation data by clicking a link to investigation, study, or assay details. Links to these details are on the Browse Investigations page or on the Search Investigations page, after you search on a keyword or phrase. These details include counts of studies, assays, samples, and files. The metadata available for an investigation determines if other entities, such as sources and collections, are available for counts. From these details pages, you can visualize the structure of the investigation and download selected study files, download the full archive, and download only the metadata.

To explore investigation details

  1. Browse investigations or search investigations until you find an investigation, a study, or an assay in which you are interested.
    The search results appear.

  2. Click the link corresponding with the investigation, study, or assay you are interested in exploring.
    The respective investigation details, study details, or assay details page appears.

  3. You can do the following from the details pages:

Investigation Details Page

The investigation details page displays information about the investigation, in the following order:
  • The investigation name is at the top. 

  • A visualization includes the following icons and information:
    • Icon and filename for the investigation node.
    • Icon, filename, and number of samples for each study node.
    • Icon, filename, number of files, number of links (external file links), and total file size for each assay node.

    All of the icons in this visualization are clickable links.

  • Below the visualization are buttons to download all or part of the investigation.

  • Below the download buttons are the investigation identifier and its description.

The currently selected node has a green box behind its icon in the visualization. When the investigation is selected, its studies and assays are also selected. In the following screenshot, the investigation is selected.

An example investigation details page

If an investigation has many studies or assays, the visualization is horizontal and can be zoomed in and out. For example, the following investigation has 29 study files with 10 assay files each, totaling 290 nodes. The zoom out and zoom in buttons are below the visualization, as indicated below.

The investigation details page with zoom out and zoom in buttons.

Study Details Page

The study details page shows the investigation name at the top followed by a visualization of the investigation filename, study filename and number of samples in the study, and assay filename and number of files (and total file size) in the assay. Below the visualization are links to download all or part of the investigation, its identifier, and its description. All of the icons on the page are clickable links.

Assay Details Page

The assay details page shows the study name at the top followed by a visualization of the investigation filename, study filename and number of samples in the study, and assay filename and number of files (and total file size) in the assay. Below the visualization are links to download all or part of the assay, its file name, measurement type, and technology type. In the Processes and Filters section, the relationship of the study to its processes, and its processes to its files, are depicted in clickable icons. Click any icon to further filter the investigation data and download only a selected portion of it.

Assay Details page

Visualizing and Filtering Data

Once you browse or search the CSSI DCC data sets and reach a selected investigation details, study details, or assay details page, you can continue exploring the data. The ISA-Tab format is hierarchical, with investigation components becoming more granular as you proceed down the hierarchy. The largest organizing entity is the investigation, which holds one or more studies. Each study includes one or more assays. Assays are composed of samples, which in turn are composed of protocols. Data files are often associated with a protocol.

The following image depicts the hierarchy, without the samples and protocols.

Structure of ISA data model as described in the text on this page.
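As a rough illustration of this hierarchy, the following Python sketch models an investigation that holds studies, which in turn hold assays. The class and attribute names are hypothetical, as is the investigation filename i_mrna.txt. The study and assay filenames are the ones used in the procedures on this page. Samples and protocols are omitted, as in the image.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    filename: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    filename: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    filename: str
    studies: List[Study] = field(default_factory=list)

    def assay_count(self) -> int:
        """Total number of assays across all studies in the investigation."""
        return sum(len(study.assays) for study in self.studies)


# One investigation with one study and one assay.
# "i_mrna.txt" is an assumed name used only for this example.
mrna = Investigation(
    filename="i_mrna.txt",
    studies=[
        Study(
            filename="s_mrna.txt",
            assays=[
                Assay("a_mrna_transcription_profiling_nucleotide_sequencing.txt")
            ],
        )
    ],
)
```

Walking this structure from the top down mirrors how the portal moves from the investigation details page to the study and assay details pages.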

Visualizing and Filtering Investigations

The investigation details page shows icons that represent the relationship of the investigation to its studies and assays. In the case of PSON Cell Line Genomic Characterization - mRNA, the investigation has one study and one assay.

Currently, you cannot filter data at the investigation level any further in the CSSI DCC Portal. At this point, you can only download the investigation's full data or start exploring its studies and assays.

Visualizing and Filtering Study Data

If you select the study in the PSON Cell Line Genomic Characterization - mRNA investigation or arrive at any other Study Details page through a search, you can visualize the study's file structure and filter on any field.

To visualize and select study data

  1. Open a Study Details page.

  2. In the Visualize and Select area, click one of the entities.

    Study Hierarchy

    The hierarchy of entities for studies according to the ISA-Tab standard is as follows, from less granular to more granular:

    Source > Protocol > Sample

    For example, click Source. Metadata for that source appears.

    Follow this same procedure if you want to filter on Protocol or Sample instead of Source.


  3. From the Source Name list, click the arrow to open the list of values. Each value is a Source Name from the study file, which in this case is s_mrna.txt.
  4. Click one or more values in the list to select them. Each value appears immediately below the Source Name box. To clear a selection, click the value again.
    Metadata for Source with multiple source names selected
  5. Click Filter Results.
    The Source Count reduces from the 40 original values in the unfiltered study file, s_mrna.txt, to the three selected in this procedure.
    Source Count 3, sample collection, Samples 3
  6. Options you have at this point include continuing to explore this data set, downloading the metadata of the three values you selected, downloading the selected data of the three values you selected (which includes the metadata), or clicking Clear Filters to filter a different entity.
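If you download a study file and want to reproduce this kind of filter outside the portal, a minimal Python sketch follows. It assumes the study file is tab-delimited with a Source Name column, as ISA-Tab study files are. The rows and source names shown are invented for illustration and the function name is hypothetical.

```python
import csv
import io

# Hypothetical excerpt of a study file; real files have more columns and rows.
STUDY_TSV = (
    "Source Name\tProtocol REF\tSample Name\n"
    "source-A\tsample collection\tsample-A\n"
    "source-B\tsample collection\tsample-B\n"
    "source-C\tsample collection\tsample-C\n"
    "source-D\tsample collection\tsample-D\n"
)


def filter_rows(tsv_text, column, selected):
    """Return the rows whose value in `column` is one of the selected values."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row[column] in selected]


selected_rows = filter_rows(STUDY_TSV, "Source Name", {"source-A", "source-C"})
```

The same function works for the Sample Name column of an assay file, since only the column name and the selected values change.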

Visualizing and Filtering Assay Data

If you select the assay in the PSON Cell Line Genomic Characterization - mRNA investigation or arrive at any other Assay Details page through a search, you can visualize the assay's file structure and filter on any field.

To visualize and select assay data

  1. Open an Assay Details page.

  2. In the Visualize and Select area, click one of the entities. Note that the width of some visualizations requires you to scroll by clicking the arrows at the bottom of the page.

    Assay Hierarchy

    The hierarchy of entities for assays according to the ISA-Tab standard is as follows, from less granular to more granular:

    Sample > Protocol > Data File

    For example, click the sample. Metadata for the sample appears.

    Follow this same procedure if you want to filter on protocol instead of sample.


  3. From the Sample Name list, click the arrow to open the list of values. Each value is a Sample Name from the assay file, which in this case is a_mrna_transcription_profiling_nucleotide_sequencing.txt.
  4. Click one or more values in the list to select them. To select multiple values, click one, wait for it to appear on the Metadata for Sample page, and then click the arrow again to select another value. To clear a selection, click that value again.

    Some entities do not have associated metadata fields on which you can filter the study or assay. In that case, when you click the icon for that entity, you see a message letting you know that no metadata fields are available.

  5. Click Filter Results.
    The Sample Count reduces from the 39 original values in the unfiltered assay file, a_mrna_transcription_profiling_nucleotide_sequencing.txt, to the 5 selected in this procedure.
    Assay Visualization, 5 Samples Selected
  6. Options you have at this point include continuing to explore this data set, downloading the metadata of the 5 values you selected, downloading the selected data of the 5 values you selected (which includes the metadata), or clicking Clear Filters to filter a different entity.

Searching Investigation Data

You can search for investigations, studies, or assays in the CSSI DCC Portal by keyword or phrase, or by related terms.

Search by Keyword or Phrase

You can search all investigations, studies, and assays in the CSSI DCC Portal by keyword or phrase. Search words with fewer than 4 characters are treated as full words. Search words with more than 4 characters are treated as both full words and partial words, if applicable. A search looks for matches in all file columns, field definitions, and placeholders in those investigations, studies, and assays.
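The full-word and partial-word rule can be pictured with the following Python sketch. It is not the portal's search code and rests on several assumptions: matching is case-insensitive, a partial match means the search word occurs anywhere inside a word, and a search word of exactly 4 characters (which the rule above does not cover) is treated as a full word only.

```python
import re


def matches(search_word, text):
    """Return True if `search_word` matches `text` under the assumed rules."""
    words = re.findall(r"\w+", text.lower())
    term = search_word.lower()
    if len(term) > 4:
        # Longer search words may match a whole word or part of a word.
        return any(term in word for word in words)
    # Shorter search words must match a whole word.
    # Treating exactly 4 characters this way is an assumption.
    return term in words


full_word_hit = matches("cell", "PSON Cell Line Genomic Characterization")
partial_hit = matches("sequenc", "genome sequencing")
short_word_miss = matches("RNA", "mRNA profile")
```

Under these assumptions, cell matches the whole word Cell, sequenc matches part of sequencing, and RNA does not match mRNA because short search words must match an entire word.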

To search investigation data by keyword or phrase

  1. From the CSSI Data Portal home page, click the Search button or select Investigations > Search.
    Search button

    The Search Investigations page appears.
    Search Investigations
  2. In the Search box, enter a keyword or phrase.

    In the Search for box, select the context of the search. Options include Investigations, Studies, or Assays. Your search will be restricted to the context you select.

    Search results appear below the search criteria area.

    In the following example of search results, the keyword for the search is cell and the context is Investigations. All investigations that include the word cell anywhere in the investigation metadata appear in these search results.

    Search Investigations page, search results
    If the context were Studies or Assays, the results would include information specific to these components, as follows:

    • Search results from a search with a context of Studies display the study name, name of the investigation the study is associated with, number of assays, number of files, and description. In these cases, the investigation name and study names are identical.

    • Search results from a search with a context of Assays display the assay filename, name of the study that the assay is associated with, name of the investigation the assay is associated with, number of files (such as image files), and description.

  3. To further narrow your results, you can filter by Investigation Public Release Date. Click Filters to show the From and To date selection boxes, select the dates, and then click Search. Note that if you select dates in this filter without entering a search term, the CSSI DCC Portal finds all investigations that match the dates you selected.
    Investigations matching your search appear on the page.

Search by Related Terms

You can search all investigations, studies, and assays in the CSSI DCC Portal by terms found in related ontologies, for example, the Ontology for Biomedical Investigations and NCI Thesaurus (NCIt). A search looks for synonyms and subclasses of the keyword you entered in the ontology or ontologies you selected and matches them to investigations, studies, and assays in the CSSI DCC Portal. The matching ontological terms can exist in file columns, field definitions, and placeholders of the investigations, studies, or assays.

To search investigation data by related terms

  1. From the CSSI Data Portal home page, click the Search button or select Investigations > Search.
    Search button

    The Search Investigations page appears.
    Search Investigations
  2. In the Search box, enter a keyword or phrase. Search results appear immediately, using the default context of Investigations.
  3. To change the context, from the Search for list, select Studies or Assays.
  4. To further narrow your results, you can filter by Investigation Public Release Date. Click Filters to show the From and To date selection boxes, select the dates, and then click Search. Note that if you select dates in this filter without entering a search term, the CSSI DCC Portal finds all investigations that match the dates you selected.
    Investigations matching your search appear on the page.
  5. Under Search for related terms, select Ontology for Biomedical Investigations, NCI Thesaurus, or both. By default, both are selected but if you change the selection, the search results refresh.
  6. Under Include, select either Synonyms Only or Synonyms and Subclasses.
    • Synonyms include terms in an ontology that are closely related to the keyword or phrase you entered.
    • Subclasses are lower categories associated with the ontology term.

    If any ontological synonyms or subclasses of the keyword you selected exist in the CSSI DCC Portal, those terms appear in the Synonyms or Subclasses box, respectively.

    In the following example, the keyword is neoplasm and its synonym found in one or both of the selected ontologies is tumor. The subclasses of tumor that appear in the search results appear in the Subclasses box. You can click any of those terms and refresh the search results to show only those investigations, studies, or assays that contain those terms. In the following example, only cc is selected. Note that the selected term has a solid black outline while terms that are not selected have a dotted line.

    Search Investigations page, subclass cc selected
    To expand your ontological search to include all synonyms and subclasses, even if they do not appear in the search results, click the click here link. 
    Search Investigations page showing neoplasm as the keyword and tumor as the synonym, along with several subclasses
    The page refreshes and shows all synonyms and subclasses, even if they're not in the search results. You can return to the view of synonyms and subclasses only in the search results by clicking click here again.
    Search Investigations page showing the keyword of neoplasm and all synonyms and subclasses, even if not in the search results

  7. Click any investigation name, study name, or assay filename to explore the data further.
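The difference between Synonyms Only and Synonyms and Subclasses can be sketched as a simple term expansion. The following Python example is illustrative only: the tiny ontology dictionary is hypothetical and stands in for the Ontology for Biomedical Investigations and NCI Thesaurus, and the subclass names are placeholders chosen for the example. Only the neoplasm and tumor pairing comes from the example above.

```python
# Hypothetical stand-in for an ontology lookup; not real portal or ontology data.
ONTOLOGY = {
    "neoplasm": {
        "synonyms": ["tumor"],
        "subclasses": ["carcinoma", "sarcoma"],  # placeholder subclasses
    },
}


def expand_terms(keyword, include_subclasses):
    """Return the keyword plus its synonyms and, optionally, its subclasses."""
    key = keyword.lower()
    entry = ONTOLOGY.get(key, {})
    terms = [key] + entry.get("synonyms", [])
    if include_subclasses:
        terms += entry.get("subclasses", [])
    return terms


synonyms_only = expand_terms("neoplasm", include_subclasses=False)
with_subclasses = expand_terms("neoplasm", include_subclasses=True)
```

A search would then look for any of the expanded terms in the investigation, study, or assay metadata.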

Downloading Investigation Data

After reaching an investigation entity (such as investigation, study, assay, source, protocol, sample, or data file), you can download the full data, selected metadata, or selected data associated with that entity. The full data is always associated with the investigation as a whole. All of the data currently in the portal is public.

Metadata describes the structure of the data collected in an investigation and translates to the file columns, field definitions, and placeholders that appear in a spreadsheet. It is in Investigation-Study-Assay tab-delimited format (ISA-Tab), which is based on the ISA-Tab specification.

The data files often include image files and spreadsheets and can be large.

Each download option has a button on the Investigation Details, Study Details, and Assay Details pages.

Download Full Data (13.48 GB zip) Download Selected Metadata  Download Selected Data

  • Download Full Data downloads metadata and data files for the entire investigation. Since this can be a large file size, the file size appears on the button.
  • Download Selected Metadata downloads only the metadata of a selection you make after filtering a study or an assay.
  • Download Selected Data downloads both the metadata and the data files of a selection you make after filtering a study or an assay.

An investigation's full data file is always available for download. However, due to the processing resources required, Selected Data downloads are currently limited to 30 GB.

Downloading Full Data

You do not have to log in before downloading the full data from an investigation. If you are not logged in, when you request the full data, you are prompted to provide your email address. You will receive a link at that address you can use to access and download the data. If you are logged in, you have the option of using Globus to download the file.

To download full data

  1. Open the investigation you want to download.
    The Investigation Details page appears.
  2. Click the Download Full Data button.
    The Request Data Files dialog appears. It offers different options depending on whether or not you are logged in to CSSI DCC.

    • If you are not logged in, enter your email address and then click Download. A link to the archive will be emailed to you.
      Request Data Files dialog box with a box for email address and a Download button

    • If you are logged in, you have two options: 1) transfer the file with Globus, which is useful when the file is very large, or 2) download the file to your computer now.
      Request Data Files dialog box with a checkbox for transferring with Globus and a Download button
      If you choose to transfer the file with Globus, click the checkbox and then click Download. The Transfer Files page in Globus opens. For more information about transferring files with Globus, see Uploading Files with Globus or Globus Support.

      If you choose to download the file now, just click Download.
      Your browser prompts you to open or save full_archive.zip.

      It may be useful to rename the .zip file as you save it to include the name of the investigation so that you can identify it more easily. For example, miRNA_full.zip.

      For example, the following dialog box appears in Firefox.
      Open Full Archive in FireFox

      Follow your browser's instructions to open or save the file.

Downloading Selected Metadata

Metadata files are usually small text files, so you can download them directly to your computer. You do not need to log in before downloading metadata files.

To download selected metadata

  1. Filter a study or an assay until you reach a selection of investigation data you are interested in downloading. You may be interested in only the metadata so that you can see how that selection of entities was structured.
  2. On the Investigation Details page, click Download Selected Metadata.
    Your browser prompts you to open or save the .zip file. Follow your browser's instructions to open or save the file.
    FireFox open or save download dialog box

    It may be useful to rename the .zip file as you save it to include the name of the investigation so that you can identify it more easily. For example, miRNA_metadata.zip.

    Archive File Contents

    The archive file is in compressed format. When you download it, it may be a single compressed folder or .zip file. When you open it, you see at minimum three text files at the root of the folder or file. An example follows.

       a_10290.txt

       i_10290.txt

       s_10290.txt

    The text files describe the investigation, study or studies, and assay or assays. In this example, 10290 represents the file identifier but in practice, each file identifier may be named differently, even in the same investigation. Only the a, i, or s prefix is required. If you download full data or selected data, the archive may also contain other files or folders as appropriate for the investigation; for example, images. If you download metadata, the archive only includes the a, i, and s files.
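To check the contents of a downloaded archive programmatically, you could group its files by prefix. The following Python sketch uses only the standard zipfile module. The function names and the images/plate1.png entry are hypothetical, and the example archive is built in memory so that the sketch runs without a real download.

```python
import io
import zipfile

PREFIX_KINDS = {"i_": "investigation", "s_": "study", "a_": "assay"}


def classify_archive(zip_bytes):
    """Group archive entries by ISA-Tab prefix; everything else is 'other'."""
    groups = {"investigation": [], "study": [], "assay": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Metadata files are .txt files at the root of the archive.
            at_root = "/" not in name
            if at_root and name.endswith(".txt") and name[:2] in PREFIX_KINDS:
                groups[PREFIX_KINDS[name[:2]]].append(name)
            else:
                groups["other"].append(name)
    return groups


def build_example_archive():
    """Create a small in-memory archive that mimics a full data download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("a_10290.txt", "i_10290.txt", "s_10290.txt",
                     "images/plate1.png"):
            archive.writestr(name, "")
    return buffer.getvalue()


groups = classify_archive(build_example_archive())
```

For a metadata-only download, the other group would be expected to stay empty.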

Downloading Selected Data


Downloading Large Files with Globus

Globus is a service that enables secure transfers of large files. You must have a Globus account and install Globus Connect Personal to use it to download investigation files from CSSI DCC. If you do not already have an account, you are prompted to create one when you start the download process.

To download files using Globus

  1. If you haven't already, start Globus Connect Personal.
  2. Filter a study or an assay until you reach the investigation data you are interested in downloading.
    The Request Data Files page appears. It looks slightly different depending on whether you are downloading the investigation's full data or selected data.
  3.  If you are downloading the investigation's full data, click the Transfer with Globus checkbox and then click Download.
    Request Data Files, full data
     You are prompted to log into Globus. Skip to step 5 to continue this procedure.
  4. If you are downloading selected data from the investigation, do the following.
    1. Click the Transfer with Globus checkbox.
      Request Data Files window

    2. Click Request Download.
      The Globus Upload window appears, asking you to confirm your Globus ID.
      Globus Upload window

    3. Enter your Globus ID (an email address) and click Confirm.
      The Download Requested window appears.

      Download Requested window. The system is processing your request. You will receive an email when your download is ready.

    4. Click Ok. Periodically check your email inbox for the link to the file on Globus. Click the link in the email.
      
  5. Review the Transfer Files page. This page appears no matter whether you are downloading full or selected data.
    One of the Endpoints you configured when you installed Globus Connect Personal is already populated, though you can change it.
    Globus Transfer Files page
  6.  Select the starting endpoint (on the right) where the file(s) you want to download reside(s). Narrow down to the path if necessary.
  7. Confirm or change the destination endpoint (on the left). 
  8. Click the left arrow button that points to the destination to begin the transfer request. 
    A message appears on the screen when the transfer request is submitted successfully.

    Globus Transfer Files page, transfer request submitted successfully
    You will receive an email when the request is granted and the transfer succeeds. 
