NIH | National Cancer Institute | NCI Wiki  

...

The 'Edit Study' page has now reloaded and the status of the newly added source has changed to 'Loaded' under the Status column in the Data Sources table.
The status of the newly uploaded source now appears as 'Loaded' (highlighted in red) under the Status column.

  1. To see what obstacles may arise in the course of loading additional data, let's try another file. This one, named 'duplicate_annotations_tutorial.CSV', contains the same five fields as each of the previously loaded files, including 'PATIENT_ID'. After repeating the procedure in steps 3 through 8, the Edit Study page displays an error message stating, "Value already loaded: Subject 3 already has a value for Stratagene" above the 'Annotation Groups' heading; in addition, the status of the newly loaded file shows as 'Error' under the 'Status' column of the 'Subject Annotation Data Sources' table.

After attempting to load the next annotation file 'duplicate_annotations_tutorial.CSV', the 'Edit Study' page shows the error message "Value already loaded: Subject 3 already has a value for Stratagene" (highlighted in red) and the status of the file shows as 'Error' (highlighted in blue).

To understand why this error is occurring, let's examine the contents of the new annotation file we just tried to load. A partial screenshot of the file appears below as viewed in a Microsoft Excel 2007 window.
"The newly loaded annotation fileImage Modified

Notice that this file contains not only new subjects (IDs 6000 to 6002), but also some of the same subjects (i.e., IDs 3, 5, and 10) from the previously loaded file "subject_annotation_DC_Lung_Study_111210.csv". In addition, the values in the 'Stratagene' field for these subjects are different in the new file than they were in the original file. This explains the 'Value Already Loaded' error message which occurs when we attempt to load the file – this message is another way of saying that the file we're trying to load contains duplicates of subjects from previously loaded files.
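Before uploading a new annotation file, it can help to check it for subject IDs that are already in the study. The Python sketch below is not part of caIntegrator. It assumes both files are CSVs with a 'PATIENT_ID' column; the function name and the miniature file contents are made up for illustration.

```python
import csv
import io


def find_duplicate_subjects(loaded_csv, new_csv, id_column="PATIENT_ID"):
    """Return subject IDs in new_csv that already appear in loaded_csv."""
    loaded_ids = {row[id_column] for row in csv.DictReader(io.StringIO(loaded_csv))}
    duplicates = []
    for row in csv.DictReader(io.StringIO(new_csv)):
        subject_id = row[id_column]
        if subject_id in loaded_ids and subject_id not in duplicates:
            duplicates.append(subject_id)
    return duplicates


# Hypothetical stand-ins for the previously loaded file and the new file.
LOADED = "PATIENT_ID,Stratagene\n3,1.2\n5,0.8\n10,2.1\n"
NEW = "PATIENT_ID,Stratagene\n3,9.9\n6000,1.0\n5,7.7\n6001,0.4\n"

print(find_duplicate_subjects(LOADED, NEW))  # ['3', '5']
```

Any ID this check reports is a candidate for the 'Value already loaded' error, so those rows would need to be removed or reconciled before loading.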

...

  1. We can't query the study unless it's already been deployed. To check whether this is the case, scroll all the way down to the bottom of the 'Edit Study' page, where you'll see a row of three buttons. If the study has been deployed, as is the case in our example, the left button labeled 'Deploy Study' will be grayed out and you will not be able to click on it. If, however, the study hasn't been deployed, the button will appear normally, and you can click on it to deploy the study.

"If your study hasn't yet been deployedImage Modified
The bottom of the 'Edit Study' page shows the 'Deploy Study' button (highlighted in red). In this example, the study has already been deployed so this button is grayed out. If your study hasn't yet been deployed, the button will appear normally, and you can click on it to deploy the study.

  1. Now that we've loaded our clinical data into the study, let's query it. To get started, click on the link 'Search Demo Study for ICR Folks' under the menu 'DEMO STUDY FOR ICR FOLKS' in the navigation panel to the left.

Click on the link 'Search Demo Study for ICR Folks' (highlighted in red) to perform a query on the annotation data you just uploaded.

...

As an example, let's say we want to query the data for all male subjects located at the 'MI' study site. In this case, our two query criteria are 'Site' and 'Gender', and their respective query values are 'MI' and 'Male'. We can formulate the query by first clicking on the 'Add' button to the right of the drop-down list under the 'Define Query Criteria' heading.

"To begin formulating your query of study dataImage Modified
To begin formulating your query, click on the 'Add' button (highlighted in red).

  1. Next, click on the drop-down list that appears below the 'Add' button. The list contains three items: 'Site', 'Stratagene', and 'Survival in Months'. Click on 'Site'.

Click on 'Site' (highlighted in red) from the Annotations drop-down list to select it as a query criterion.

  1. Once you click on 'Site', another two drop-down lists will appear to the right of the original one. Click on the third (rightmost) list to bring up the different values for Site and click on 'MI' from this list.

Click on 'MI' (highlighted in red) in the drop-down list of values for the Site field.

...

To add Gender as a field, go back to the original drop-down list (the one at the top), click on it again, click on 'Demographic' in the list, and then click on the 'Add' button to the right of the list.

"To add Gender as a fieldImage Modified
Select Demographic (highlighted in red) from the drop-down list, then click on the Add button (also highlighted in red).

  1. Next, a new drop-down list labeled 'Demographic' will appear below the one labeled 'Annotations – Default'. Click on this new list, then click on 'GENDER'.

Click on 'GENDER' in the 'Demographic' drop-down list.

  1. Once you click on 'GENDER', another two drop-down lists will appear to the right of the original one. Click on the third (rightmost) list to bring up the different values for Gender and click on 'Male' from this list.

Click on 'Male' (highlighted in red) in the third (rightmost) drop-down list labeled 'Demographic'.

  1. Now that we've fully defined our query, we're ready to run it. Click on the 'Run Query' button at the bottom of the page to see the results.

Click on the 'Run Query' button (highlighted in red) to see results.
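Conceptually, the query we just defined keeps only the subjects that satisfy both criteria at once. The Python sketch below illustrates that logic on a few made-up rows. It is not how caIntegrator runs queries internally, and the function name, subject IDs, and row contents are hypothetical.

```python
def run_query(rows, criteria):
    """Keep the rows whose fields match every (field, value) pair in criteria."""
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    ]


# Hypothetical subjects; only the field names and the values 'MI' and 'Male'
# come from the tutorial.
SUBJECTS = [
    {"Subject ID": "3", "Site": "MI", "GENDER": "Male"},
    {"Subject ID": "5", "Site": "MI", "GENDER": "Female"},
    {"Subject ID": "10", "Site": "DC", "GENDER": "Male"},
]

matches = run_query(SUBJECTS, {"Site": "MI", "GENDER": "Male"})
print([row["Subject ID"] for row in matches])  # ['3']
```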

...

You can sort these results in numerical order of subject ID by clicking on the 'Subject ID' heading above the right table column.

You can sort query results by clicking on the Subject ID heading (highlighted in red) above the right column.

  1. You can customize the display of query results by clicking on the 'Results Type' tab at the top of the page and selecting additional fields to be displayed via the checklists for each annotation set. In this example, we checked off 'Stratagene' and 'Survival in Months' in the default annotation checklist.

You can select additional fields (highlighted in red) to be displayed in the query results by selecting them from the checklists in the 'Results Type' tab, then clicking on the 'Run Query' button (also highlighted in red).

If you now click on the 'Run Query' button at the bottom right of the page, the results will be displayed again under the 'Query Results' tab, but this time with the additional columns Stratagene and Survival in Months, which correspond to the new fields we selected.

The updated query results include two additional columns (highlighted in red) which correspond to the two additional fields we selected under the 'Results Type' tab.

  1. To save this query in caIntegrator for future reference, click on the 'Save query as..' tab at the top of the page, enter a name and description for the query in the respective fields, and click on the 'Save Query' button at the bottom.

"You can save the query by clicking on the 'Save query as..' tabImage Modified
You can save the query by clicking on the 'Save query as..' tab, entering a query name and description, and clicking on the 'Save Query' button (highlighted in red).

  1. Once the query is saved, the Search page will reload and the Study Data menu in the left navigation panel will expand to show the newly saved query 'Tutorial' under the 'My Queries' heading. You can click on the magnifying glass icon to the left of the Tutorial link to bring up the query results again, or on the pencil icon to edit the query criteria.

The 'Tutorial' query (highlighted in red) is now saved under the 'STUDY DATA' menu in the left navigation panel and can be accessed at any time.

...

  1. To begin, navigate back to the 'Edit study' page for the 'Demo Study for ICR Folks'. If you forgot how to do this, you can refer to step 2 in this tutorial.
  2. On the 'Edit study' page, scroll down to the 'Genomic Data Sources' heading. The table below it shows that one source has already been loaded and mapped. To add another, start by clicking the 'Add New' button to the right of the heading.

Click on the 'Add New' button (highlighted in red) to begin adding a new genomic data source.

...

If your server hostname or any of the other values for your data source differ from the default values, enter them into their respective fields, then click on the 'Save' button at the bottom of the page. (Remember that, if your study is private, you must enter the login credentials into the 'Username' and 'Password' fields.)

"Enter the values for your data source if they differ from the default valuesImage Modified
Enter the values for your data source if they differ from the default values, then click on the 'Save' button (highlighted in red). Don't forget to enter your caArray experiment ID – the ID for our example source is 'jacob-00182'.

  1. Back on the 'Edit Study' page, a new row has appeared in the 'Genomic Data Sources' table which corresponds to the new data source we just added. Our next step is to map the samples in this source to the subjects in our annotation source. To begin, click on the 'Map Samples' button under the 'Action' column at the right of the table.

The newly added row (highlighted in red) in the Genomic Data Sources table corresponds to the new genomic data source we added in step 24. Click on the 'Map Samples' button (highlighted in blue) to map the samples to subjects from the annotation source we added in steps 3 to 8.

  1. The 'Edit Sample Mappings' page displays a list of unmapped samples, followed by another list mapping sample IDs to subject IDs. As you can see, the mapping list is empty, which means that none of the samples in this source have been mapped yet! The list of unmapped samples appears under the heading 'Unmapped Samples' and subheading 'Sample Name'. The numbers in this list represent the sample IDs of the unmapped samples.

The 'Edit Sample Mappings' page shows a list of IDs for unmapped samples (highlighted in red).

Your mapping CSV file must map the subject IDs in your annotations to the sample IDs in the unmapped samples list. A screenshot of the mapping file used in this tutorial, taken from a Microsoft Excel 2007 window, is shown below. The file is a table of two columns with no headings; the first column contains IDs of the subjects from the annotation source and the second column contains IDs from the unmapped samples list. Each subject in the left column corresponds to the sample in the same row of the right column. Note that the file doesn't map every single sample ID from the data source.

This CSV file maps the subject IDs from our annotation source (left column) to the sample IDs in our genomic source (right column).
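If you build your mapping file by hand, a quick script can confirm that it has the expected shape and show which samples it leaves unmapped. The following Python sketch assumes the headerless, two-column layout described above (subject ID, then sample ID). The function names are hypothetical, and apart from subject 5085 and sample 191 the IDs are invented.

```python
import csv
import io


def read_mapping(mapping_csv):
    """Parse a headerless two-column CSV into (subject_id, sample_id) pairs."""
    pairs = []
    for row in csv.reader(io.StringIO(mapping_csv)):
        if not row:  # skip blank lines
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        subject_id, sample_id = (cell.strip() for cell in row)
        pairs.append((subject_id, sample_id))
    return pairs


def still_unmapped(unmapped_samples, pairs):
    """Return the sample IDs that the mapping file does not cover."""
    mapped = {sample_id for _, sample_id in pairs}
    return [sample for sample in unmapped_samples if sample not in mapped]


MAPPING = "5085,191\n5086,192\n"   # stand-in for the mapping file contents
UNMAPPED = ["191", "192", "193"]   # stand-in for the 'Unmapped Samples' list

pairs = read_mapping(MAPPING)
print(still_unmapped(UNMAPPED, pairs))  # ['193']
```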

To add your mapping CSV file to the study, click on the 'Choose File' button next to the 'Subject to Sample Mapping File' label.

Click on the 'Choose File' button (highlighted in red) to choose a mapping file to open.

In the Open dialog that follows, find your mapping file, click on it, and then click on the 'Open' button. (In our example, the mapping file is named 'mapping_file_tutorial.CSV'.)
"In this exampleImage Modified
To open your mapping file, click on the 'mapping_file_tutorial.CSV' file (highlighted in red), then click on the 'Open' button (highlighted in blue).

...

Since this information may be considered important to your study, we need a way of distinguishing between the cases and controls. The way that caIntegrator addresses this need is with a 'control training file' that lists the sample IDs of all the controls. Any sample that is not listed in this file comes from a case. The screenshot below shows a portion of an example training file in CSV format from a Microsoft Excel 2007 window.

A portion of a control training file listing the sample IDs of all the controls from our example data source. You don't need to understand the format or nomenclature of the sample IDs – they were generated by the instrument or technician who ran the samples.
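The rule behind the control training file is simple: a sample listed in the file is a control, and every other sample is a case. Here is a minimal Python sketch of that rule. It is only an illustration, not caIntegrator code; the function name and the sample IDs are hypothetical.

```python
def split_cases_and_controls(sample_ids, control_ids):
    """Split samples into (cases, controls) using the list of control IDs."""
    control_set = set(control_ids)
    controls = [s for s in sample_ids if s in control_set]
    cases = [s for s in sample_ids if s not in control_set]
    return cases, controls


# Hypothetical sample IDs; real IDs come from your genomic data source.
ALL_SAMPLES = ["S-001", "S-002", "S-003", "S-004"]
CONTROLS = ["S-002", "S-004"]  # contents of the control training file

cases, controls = split_cases_and_controls(ALL_SAMPLES, CONTROLS)
print(cases)     # ['S-001', 'S-003']
print(controls)  # ['S-002', 'S-004']
```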

To add your control training CSV file to the study, click on the 'Choose File' button next to the 'Control Samples File' label.


The filename of the mapping file we just uploaded now appears next to the 'Choose File' button for 'Subject to Sample Mapping File' (highlighted in red). Now click on the 'Choose File' button next to 'Control Samples File' (highlighted in blue) to begin uploading your control training file.

In the Open dialog that follows, find your control training file, click on it, and then click on the 'Open' button. (In our example, the control training file is named 'control_training_file_tutorial.CSV'.)

"Here we click on the 'control_training_file_tutorial.CSV' fileImage Modified
Click on the 'control_training_file_tutorial.CSV' file (highlighted in red), then click on the 'Open' button (highlighted in blue).

  1. Back on the 'Edit Sample Mappings' page, the filename of the control training file you just opened is now displayed to the right of the 'Choose File' button from step 26. Now enter a name for the control sample set in the 'Control Sample Set Name' text field (our example uses 'tutorial controls'), then click on the 'Map Samples' button to map your samples.

"Enter a title into the 'Control Sample Set Name' text fieldImage Modified
The filename of the control training file you just uploaded now appears to the right of the 'Choose File' button (highlighted in red). Enter a title into the 'Control Sample Set Name' text field (highlighted in blue), then click on the 'Map Samples' button (highlighted in green) to map your samples.

  1. Back on the 'Edit Study' page, the new mapping and control files we uploaded are now listed under the File Description column, while the Status has changed from 'Not mapped' to 'Ready to be loaded'. We are now done mapping our samples and are ready to query them.

The mapping file we uploaded now appears under the File Description column and is highlighted in red, while the control file we uploaded is highlighted in green. Under the Status column, the status has changed from 'Not mapped' to 'Ready to be loaded' (highlighted in blue).

  1. To see what obstacles may arise in the course of loading mapping data, let's try another file. This one, named 'duplicate_mapping_file_tutorial.CSV', will replace the one we loaded in steps 26 to 28. A partial screenshot of this file, taken from a Microsoft Excel 2007 window, is shown below.

"In this mapping fileImage Modified
In this mapping file, the same sample (ID 191) is mapped twice, once to subject ID 5085 (highlighted in red) and again to subject ID 6000 (highlighted in blue).

...

Surprisingly, when we repeat the procedure for loading mappings with the 'duplicate_mapping_file_tutorial.CSV' file, caIntegrator does not display any error message, and the source's status shows as 'Ready to be loaded' in the 'Genomic Data Sources' table, just as it did with the previous mapping file we loaded successfully. Does this mean that caIntegrator allows multiple mappings of the same sample to different subjects?

"When loading an invalid mapping fileImage Modified
When loaded loading an invalid mapping file, caIntegrator does not display any error messages and shows the status of the invalidly mapped source as 'Ready to be loaded' (highlighted in red).

  1. As it turns out, when caIntegrator parses a mapping file in which the same sample is mapped to multiple subjects and encounters a sample ID that has already been mapped, it will overwrite the old mapping with the new one. We can confirm this by clicking on the 'Map Samples' button for the source we mapped and examining the 'Samples Mapped to Subjects' table on the 'Edit Sample Mappings' page.

"On the 'Edit Sample Mappings' pageImage Modified
"On the 'Edit Sample Mappings' pageImage Modified
On the 'Edit Sample Mappings' page, sample ID 191 is only mapped to a single subject (highlighted in red), even though the mapping file we just loaded mapped that same sample twice.
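Because caIntegrator accepts such a file silently, it is worth checking for repeated sample IDs yourself before uploading. The Python sketch below imitates the overwrite behavior described in this step with a plain dictionary and also reports the conflicting rows. It is an approximation for illustration only: the function names are hypothetical, and it assumes the rows are processed in file order, so the later row wins.

```python
from collections import defaultdict


def apply_mappings(pairs):
    """Map each sample to a subject; a later row overwrites an earlier one."""
    sample_to_subject = {}
    for subject_id, sample_id in pairs:
        sample_to_subject[sample_id] = subject_id
    return sample_to_subject


def conflicting_samples(pairs):
    """Return samples that the file maps to more than one subject."""
    subjects_by_sample = defaultdict(list)
    for subject_id, sample_id in pairs:
        if subject_id not in subjects_by_sample[sample_id]:
            subjects_by_sample[sample_id].append(subject_id)
    return {
        sample: subjects
        for sample, subjects in subjects_by_sample.items()
        if len(subjects) > 1
    }


# The two conflicting rows (subject ID, sample ID) from the example file.
PAIRS = [("5085", "191"), ("6000", "191")]

print(apply_mappings(PAIRS))       # {'191': '6000'}
print(conflicting_samples(PAIRS))  # {'191': ['5085', '6000']}
```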

...

  1. On the 'Edit Study' page, click on the 'My Studies' drop-down list in the blue banner at the top, then click on 'Demo Study for ICR Folks'.

"Here we click on the 'My Studies' drop-down listImage Modified
Click on the 'My Studies' drop-down list (highlighted in red), then click on 'Demo Study for ICR Folks' (highlighted in blue).

  1. On the 'Welcome' page, click on the 'Search Demo Study for ICR Folks' link under the 'DEMO STUDY FOR ICR FOLKS' heading in the navigation panel at the left.

Click on 'Search Demo Study for ICR Folks' (highlighted in red) to begin querying the study.

  1. On the 'Search' page, click on the drop-down list under the 'Define Query Criteria' heading. The list shows the different criteria we can query the study by. Since we want to query genomic data, click on 'Gene Expression', then click on the 'Add' button to the right of the list.

"Click on the 'Define Query Criteria' drop-down listImage Modified
Click on the 'Define Query Criteria' drop-down list (highlighted in red), then click on 'Gene Expression' (highlighted in blue) and click on the 'Add' button (highlighted in green).

  1. When querying gene expression data, you can search either by gene symbol or by fold change. In this example, we'll search by the gene symbol. Click on the 'Gene Name' drop-down list, then click on the 'Gene Name' list entry.

"Click on the 'Gene Name' drop-down listImage Modified
Click on the 'Gene Name' drop-down list (highlighted in red), then click on the 'Gene Name' list entry (highlighted in blue).

  1. In the gene symbol text field that appears to the right, type in 'EGFR' (the symbol for the epidermal growth factor receptor gene), then click on the 'Run Query' button below.

"Type 'EGFR' into the 'Gene Symbol' text fieldImage Modified
Type 'EGFR' into the 'Gene Symbol' text field (highlighted in red), then click on the 'Run Query' button (highlighted in blue).

...

You can sort these results in numerical order of subject ID by clicking on the 'Subject ID' heading above the right table column.

Click on the Subject ID column heading (highlighted in red) to sort the EGFR gene query results.

  1. As it stands, these query results are not very useful, as they only show which subjects have EGFR expression data and don't show the actual data itself. To change this, click on the 'Results Type' tab at the top of the page, then click on the 'Gene Expression' radio button under the 'Select Results Type' heading. This will change the query results to display one or more numerical values which indicate the expression levels of the EGFR gene for each sample.

"Click on the 'Results Type' tabImage Modified
Click on the 'Results Type' tab (highlighted in red), then click on the 'Gene Expression' button (highlighted in blue).

  1. In the query results, we can choose to display every EGFR expression value for a given sample, or to display a single value which represents the median of that sample's values. For simplicity's sake, let's choose the latter option by clicking on the 'Gene' button next to 'Select Reporter Type', then clicking on the 'Run Query' button to display the results.

"Click on the 'Gene' button to display a single value representing each subject's EGFR expression levels in the query resultsImage Modified
Click on the 'Gene' button (highlighted in red) to display a single value representing each subject's EGFR expression levels in the query results, then click on the Run Query button (highlighted in blue) to display the results.

  1. Back on the 'Query Results' page, there are now two additional columns of data: Sample ID and EGFR. The value in the EGFR column represents the median of the gene's expression levels for the corresponding subject and sample. Note that the screenshot below only displays the first five results in the list; you can scroll down the list via the bar at the right to view the rest of the results.

The query results now show two additional columns: Sample ID and EGFR. The latter represents median EGFR expression values. Click on the 'Save query as…' tab (highlighted in red) to save these results for future reference.
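To make the 'Gene' reporter type concrete, the Python sketch below collapses several expression values per sample into one median value, which is what the EGFR column represents. The numbers and the function name are invented for illustration, and the sketch makes no claim about how caIntegrator computes the value internally.

```python
from statistics import median


def gene_level_values(reporter_values):
    """Collapse each sample's expression values into their median."""
    return {
        sample_id: median(values)
        for sample_id, values in reporter_values.items()
    }


# Hypothetical EGFR expression values, keyed by sample ID.
REPORTER_VALUES = {
    "191": [2.0, 4.0, 9.0],
    "192": [1.0, 3.0],
}

print(gene_level_values(REPORTER_VALUES))  # {'191': 4.0, '192': 2.0}
```

With an even number of values, `statistics.median` returns the mean of the two middle values, which is why sample 192 yields 2.0.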

To save this query in caIntegrator for future reference, click on the 'Save query as..' tab at the top of the page, enter a name and description for the query in the respective fields, and click on the 'Save Query' button at the bottom.

"Enter a query name and description in the respective text fieldsImage Modified
Enter a query name and query description in the respective text fields, then click on the 'Save Query' button (highlighted in red) to save the query for future reference.

  1. Once the query is saved, the Search page will reload and the Study Data menu in the left navigation panel will expand to show the newly saved 'Genomic Query' under the 'My Queries' heading. You can click on the magnifying glass icon to the left of the 'Genomic Query' link to bring up the query results again, or on the pencil icon to edit the query criteria.

The newly saved 'Genomic query' (highlighted in red) is shown in the 'Study Data' menu under 'My Queries'.

...