This chapter describes the processes for searching studies within caIntegrator.
Topics in this chapter include:
The search and browse functions in caIntegrator allow you to search for subject annotation data, genomic data or imaging data that were uploaded into the application as part of a study. When gene expression and imaging data are uploaded into a caIntegrator study, mapping files that correlate sample IDs in those files to subject IDs (patient IDs) in the subject annotation data file must also be uploaded. When you launch a search, caIntegrator finds and integrates the subject annotation, genomic and imaging data based on the mapping files and the criteria that you define in the search query.
In a search query, you can specify criteria for just one of the data types, or configure complex search criteria that join two or three data types. The available criteria for the query were defined when the study was deployed.
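The integration described above can be pictured as a join across the mapping file. The following is a minimal sketch, not caIntegrator's implementation: the dictionaries, IDs, column names, and the `integrate` helper are all hypothetical stand-ins for the uploaded files.

```python
# Hypothetical in-memory stand-ins for uploaded study files (illustrative only).
subject_annotations = {
    "P001": {"gender": "F", "diagnosis": "GBM"},
    "P002": {"gender": "M", "diagnosis": "Astrocytoma"},
}
# Mapping file: sample ID -> subject (patient) ID
sample_to_subject = {"S-10": "P001", "S-11": "P002", "S-12": "P002"}
# Genomic values keyed by sample ID (made-up numbers)
expression = {
    "S-10": {"EGFR": 812.0},
    "S-11": {"EGFR": 95.5},
    "S-12": {"EGFR": 240.0},
}


def integrate(annotations, mapping, genomic):
    """Attach each sample's genomic values to its subject's annotations."""
    rows = []
    for sample_id, subject_id in sorted(mapping.items()):
        if subject_id in annotations and sample_id in genomic:
            rows.append({
                "subject": subject_id,
                "sample": sample_id,
                **annotations[subject_id],
                **genomic[sample_id],
            })
    return rows


rows = integrate(subject_annotations, sample_to_subject, expression)
# An annotation criterion applied to the integrated rows
gbm_samples = [r["sample"] for r in rows if r["diagnosis"] == "GBM"]
```

Because every integrated row carries both annotation and genomic fields, a criterion on one data type can filter results of another.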
The basic workflow for a study search follows these steps:
*\[Annotations\]* – Annotation data can be labeled 'default' or given the annotation 'group' name when annotation groups are specified by the manager, for example, chronologic, therapy, diagnosis, patient, or other annotation group types. This selection searches one or more uploaded CSV files for data identifiers or annotations (column headers) specified during study creation.
*\[Genomic\]* – Genomic data can be gene expression or copy number data. This selection searches caArray experiment samples uploaded in the study for gene expression or copy number data by gene name, reporter ID, chromosome number, chromosome coordinates and/or segmentation values representing amplification or deletion.
To initiate a search of all annotations and/or other data in a study, follow these steps:
On the left sidebar, under the first section that displays the study name, click *Search \[Study Name\]*. This opens a simple search query page with five tabs, shown in the following figure. !criteria tab80.png|vspace=4, alt="Search page"!
Continue with:
#Annotation and Image Data Searches
#Gene Expression Data Searches
#Copy Number Searches
As long as you are still in the current query session, you can return to the Criteria, Columns and Sorting tabs to add, modify or remove data and display criteria and re-run the search. If you configure another query without saving the first, the first query will be lost. If you save the query, your current search criteria are saved.
If the study manager defined the study's own annotation groups, then those group names are listed in the criteria drop-down list. If the study manager did not define the study's annotation groups when the study was created, then all annotations are placed, by default, in a group called "Annotations default".
If the study includes imaging data, imaging annotations should be available in the Annotations list.
When working with image data, if only an Imaging Mapping file was uploaded when the study was created and not an Image Series Annotation file, you cannot enter image search criteria. The search results will, however, display a link that allows you to view the associated images in NBIA.
Continue with step in .
If you leave the gene symbols field blank, caIntegrator searches all gene symbols for a match to the other criteria you specify.
caIntegrator provides three methods whereby you can obtain gene names for a gene expression search. See .
Additional fields display for the Expression Level selection.
Range Type –
>=: Greater than or equal to the entered value
<=: Less than or equal to the entered value
Inside Range: Finds all matching values that fall between the two levels you enter
Outside Range: Finds all matching values that do not fall between the two levels you enter
The range of possible values is determined on the array side; the value is affected by the array type and the sample's behavior.
The default value of 100 is a fixed default and does not reflect any values on the array. It simply represents a starting point for the query.
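The four range types can be read as simple predicates on an expression value. The sketch below is illustrative only; the function name `matches_range` and its parameters are hypothetical, and treating the Inside Range bounds as inclusive (and the Outside Range bounds as exclusive) is an assumption, not documented caIntegrator behavior.

```python
def matches_range(value, range_type, first, second=None):
    """Return True if value satisfies the chosen range type.

    first  -- the entered value (or the lower level for two-level ranges)
    second -- the upper level, used only by Inside Range and Outside Range
    """
    if range_type == ">=":
        return value >= first
    if range_type == "<=":
        return value <= first
    if range_type == "Inside Range":
        # Assumption: both bounds are included in the range
        return first <= value <= second
    if range_type == "Outside Range":
        return value < first or value > second
    raise ValueError(f"Unknown range type: {range_type}")


# Using the default starting value of 100 as the entered level
above_default = matches_range(250.0, ">=", 100)
```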
Additional fields display for the Fold Change selection.
The fold change option appears only if genomic control samples have been uploaded to the study. Fold change identifies genes with expression differences compared to control samples, as defined when the study was deployed in caIntegrator. You can enter query values as greater-than-or-equal-to or less-than-or-equal-to arguments.
For example, if you enter 2.0 in this field, after selecting Up in the previous field, the search will locate genes whose expression is at least 2 times the base value (2-fold up-regulation).
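As a worked illustration of the 2.0 example, the sketch below computes fold change as the ratio of a sample's expression to the mean of the control samples. This ratio definition, the helper names, and the handling of a "Down" direction (expression at or below 1/threshold of the base value) are assumptions made for illustration; the values are invented.

```python
from statistics import mean


def fold_change(sample_value, control_values):
    """Ratio of the sample's expression to the mean control expression."""
    return sample_value / mean(control_values)


def is_regulated(sample_value, control_values, direction, threshold):
    """Check a fold-change criterion in the given direction."""
    ratio = fold_change(sample_value, control_values)
    if direction == "Up":
        return ratio >= threshold
    if direction == "Down":
        # Assumption: 2-fold down means at most half the base value
        return ratio <= 1.0 / threshold
    raise ValueError(f"Unknown direction: {direction}")


controls = [100, 120, 80]  # invented control expression values, mean 100
up_hit = is_regulated(250, controls, "Up", 2.0)    # ratio 2.5
up_miss = is_regulated(150, controls, "Up", 2.0)   # ratio 1.5
```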
Continue with step in .
In some diseases, such as cancer, abnormal cells can exhibit changes in chromosomal structure: parts of a chromosome can be amplified or deleted. 'Copy number' experiments that measure variation in genomic structure use molecular markers to detect amplification or deletion of chromosomal segments. Typically, copy number alteration experiments compare a genomic sample from a diseased tissue (for example, a tumor) to a control sample (for example, blood).
The Copy Number query option, as described in , appears only if copy number data have been uploaded to the study. A copy number search identifies patients or samples that have a copy number amplification or deletion in the genome range specified. Searches can be constructed with gene names, chromosome number and/or chromosome coordinates. You can enter query values in greater/lesser-than-or-equal-to arguments.
If you leave the gene symbols field blank, caIntegrator searches all gene symbols for a match to the other criteria you specify.
caIntegrator provides three methods whereby you can obtain gene names for a copy number search. For information about selecting genes, see .
Additional fields display for the Segmentation selection.
Segmentation is the process of defining the chromosomal boundaries (coordinates) of the region deleted or amplified in the sample.
caIntegrator provides three methods whereby you can obtain gene names for a copy number search. For information about selecting genes, see .
The Bioconductor DNAcopy algorithm (see on page 68) identifies the location of the amplification or deletion and then reports it as the base pair at the start and stop of the segment. Each segment is then catalogued with chromosome number, start coordinate, stop coordinate, genes in the segment, and the segment mean value.
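A segment catalogue of this shape lends itself to a simple range query. The sketch below is hypothetical: the `Segment` record and `find_segments` helper are not part of caIntegrator or DNAcopy, and the coordinates and segment means are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    sample: str
    chromosome: str
    start: int       # base pair at the start of the segment
    stop: int        # base pair at the end of the segment
    genes: tuple     # gene symbols located in the segment
    mean: float      # segment mean value


def find_segments(segments, chromosome, region_start, region_stop,
                  min_mean=None, max_mean=None):
    """Return segments overlapping a region, optionally filtered by mean."""
    hits = []
    for seg in segments:
        if seg.chromosome != chromosome:
            continue
        if seg.stop < region_start or seg.start > region_stop:
            continue  # no overlap with the requested region
        if min_mean is not None and seg.mean < min_mean:
            continue
        if max_mean is not None and seg.mean > max_mean:
            continue
        hits.append(seg)
    return hits


# Invented example data
segments = [
    Segment("S-10", "7", 54_000_000, 56_000_000, ("EGFR",), 0.9),
    Segment("S-11", "7", 54_500_000, 55_500_000, ("EGFR",), -0.1),
    Segment("S-12", "9", 21_000_000, 22_500_000, ("CDKN2A",), -1.2),
]
amplified = find_segments(segments, "7", 55_000_000, 55_300_000, min_mean=0.5)
deleted = find_segments(segments, "9", 21_900_000, 22_000_000, max_mean=-0.5)
```

A high `min_mean` selects amplified segments, and a low `max_mean` selects deleted ones.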
Additional fields display for the Calls selection.
CGHCalls calls aberrations for array CGH data using a six-state mixture model.
If you leave the gene symbols field blank, caIntegrator searches all gene symbols for a match to the other criteria you specify.
caIntegrator provides three methods whereby you can obtain gene names for a copy number search. For information about selecting genes, see .
For more information about CGHCalls, see Continue with step in .
You can specify how the search results display, including which columns appear, either before or after you run the search. If you run the search directly from the Criteria tab before setting the results type/sorting features, by default only the Subject Identifiers display on the Search Results tab. You can then come back to the Results Type and Sorting tabs to expand the display options and re-run the search, having set the display parameters.
For more information, see on page 65.
The selection you make on the Results Type tab determines whether caIntegrator displays search results for subject annotation or genomic data. It filters the search based on the criteria you set on the Criteria tab, whether it is annotation, gene expression or image series data type(s). In other words, if you select annotation criteria on the Criteria tab, but select Genomic on the Results Type tab, the data subset that displays on the Search Results tab is genomic data that is filtered by the annotation criteria you defined on the Criteria tab.
For subject annotations, the Patient or Subject Identifier displays by default in the search results.
Results display in a gene expression data matrix. For more information, see on page 66.
Imaging – If imaging annotations have been added to the study, annotation elements also display on the lower right section of this page when you select Annotation. All elements listed are column headers in the image annotation data uploaded to the study. You can make multiple selections on this list.
If you select even one Image Annotation on the Results Type tab, the Image Series IDs display by default in the search results. If you select no Image Annotations on the Results Type tab, however, no image series IDs display in the search results, even if you selected image series criteria on the Criteria tab. Two image-related buttons at the bottom of the Query Results page indicate that the images can still be located in NBIA. You can open the images in NBIA, but they will be at StudyInstance UID level. See on page 79.
Results display as tabular data. For more information, see on page 66.
The column selection is saved as part of the query if you save it. See .
On the Sorting tab, you can set the sort order for data columns in the query results. You can also indicate whether column contents are sorted in ascending or descending order.
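Sorting on several columns with a separate direction for each can be sketched with repeated stable sorts, applied from the lowest-priority column to the highest. This is an illustration of the idea only; the `sort_rows` helper and the column names are hypothetical.

```python
def sort_rows(rows, sort_spec):
    """Sort rows by several columns.

    sort_spec -- list of (column, ascending) pairs in priority order
    """
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant
    # column first leaves ties ordered by the earlier passes.
    for column, ascending in reversed(sort_spec):
        result.sort(key=lambda row: row[column], reverse=not ascending)
    return result


rows = [
    {"subject": "P2", "age": 60},
    {"subject": "P1", "age": 60},
    {"subject": "P3", "age": 45},
]
# Age descending, then subject identifier ascending
ordered = sort_rows(rows, [("age", False), ("subject", True)])
```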
The columns that display on the Sorting tab are those criteria that you selected on the Results Type tab for an Annotation Results type search.
Sorting is not applicable to copy number search results. For those results, no options are available on the Sorting tab.
Sorting parameters are saved as part of the query if you choose to save it using the Save Query feature. See .
For information about the search results, see .
When you create a search query in caIntegrator, you can save the query for later use or edit it.
To save a query, follow these steps:
Once the query is saved, it is listed by name under Study Data > Queries > My Queries in the left sidebar whenever the study to which the query applies is selected. Click on the saved query in this list to either edit or re-run the query. Click on the query name to retrieve query results. If you hover over the Name text for the query, a pop-up displays the query description.
To edit a query, follow these steps:
After running a search, you can export the result set or a subset as a tab-delimited text file. For more information, see on page 80.
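An exported result set of this kind is straightforward to produce or reproduce outside the application. The sketch below uses Python's standard `csv` module to write rows as tab-delimited text; the `to_tab_delimited` helper, the column names, and the rows are hypothetical and do not reflect caIntegrator's actual export layout.

```python
import csv
import io


def to_tab_delimited(rows, columns):
    """Serialize the chosen columns of each row as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # skip fields not listed in columns
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


results = [
    {"subject": "P001", "diagnosis": "GBM", "age": 52},
    {"subject": "P002", "diagnosis": "Astrocytoma", "age": 47},
]
# Export only a subset of the columns
text = to_tab_delimited(results, ["subject", "diagnosis"])
```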