NIH | National Cancer Institute | NCI Wiki  


...


This chapter describes how to use caIntegrator tools to analyze data in subject annotation or genomic studies that have been deployed in caIntegrator.


Data Analysis Overview

Once a study has been deployed, you can analyze the data using caIntegrator analysis tools.

...

This topic applies to all three K-M plot tabs: For Annotation, For Gene Expression, and For Queries and Saved Lists. Each tab is described in its own topic.

The Kaplan-Meier (K-M) method estimates survival over time for comparative groups of patients or samples. In caIntegrator, you can use the K-M method to compare survival statistics among comparative groups, and you configure the survival data in the application. For example, you might compare survival rates for a group of patients with a smoking history against a group of non-smoking patients, or compare survival data for two groups of patients with a specific disease type, grouped by Karnofsky score. You can also compare groups of patients with differing gene expression levels. Finally, you can identify data sets with the query feature, save the queries, and then configure the K-M plot to compare the groups that the queries identify.
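For reference, the estimator behind a K-M plot can be sketched in a few lines of Python. This is the textbook product-limit estimator, not caIntegrator's own code; the function name `kaplan_meier`, the input layout (one follow-up time and one event flag per patient), and the example groups and values are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Textbook product-limit (Kaplan-Meier) estimator.

    times  -- follow-up time per patient, in one consistent unit
    events -- 1 if the event (for example, death) was observed, 0 if censored
    Returns a list of (time, estimated survival probability) at each event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Patients censored at t are still counted as at risk at t.
        at_risk -= leaving[t]
    return curve


# Hypothetical groups defined by one annotation (smoking history).
groups = {
    "Smoker": ([5, 8, 12, 20], [1, 1, 0, 1]),
    "Non-smoker": ([10, 15, 22, 30], [0, 1, 0, 0]),
}
for name, (t, e) in groups.items():
    print(name, kaplan_meier(t, e))
```

Each group selected in the application would yield one such step curve; the K-M plot draws the curves on shared axes.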

...

  1. Select the study whose data you want to analyze in the upper right portion of the caIntegrator page.
  2. Under Analysis Tools on the left sidebar, select K-M Plot.
  3. Select the For Annotation tab at the top of the page, shown in the following figure.
    ”Fields for defining annotation data for a K-M plot”
  4. The groups compared in the K-M plot all come from one annotation: each group contains the patients who have one of the values you select for that annotation. Define the patient groups using the options described in the following table:

    Field

    Description

    Annotation Type

    Select the annotation type that identifies the patient group. Selections are based on the data in the chosen study.

    Annotation

    Select an annotation. The available annotations depend on the annotation type you select. For example, if you choose Subject, you could select Gender, Radiation Type, or any other field whose values divide the patients into groups.

    Tip: Only annotations that are defined with permissible values display in the drop-down list.

    Values

    Using standard selection techniques, select two or more values to serve as the basis for the K-M plot. The list shows the permissible (available) values for the selected annotation, or "No Values".

    Survival Value

    The survival value is the length of time the patient lived. caIntegrator displays the valid survival values entered for this study. Select the survival measure, which is the unit of measurement for the survival value, to use for the plot.

  5. Click the Create Plot button.

caIntegrator generates the plot and displays it below the plot criteria, as shown in the following figure.
”A K-M plot generated for groups based on annotations”

...

  1. Select the study whose data you want to analyze in the upper right portion of the caIntegrator page. You must select a study with gene expression data.
  2. Under Analysis Tools on the left sidebar, select K-M Plot.
  3. Select the For Gene Expression tab, shown in the following figure.
    ”Fields for defining gene expression data for a K-M plot”
  4. For Gene Symbol, enter one or more gene symbols in the text box, or click the database icons to look up genes. If you enter more than one gene in the text box, separate the entries with commas.

    caIntegrator provides three methods for obtaining gene symbols for a gene expression K-M plot. For more information, see Choosing Genes.

    Field

    Description

    Expression Type

    By Fold Change: Use the next two fields, Over-expressed and Under-expressed, to define the expression criteria in terms of fold change. Fold change is the ratio of the measured gene expression value for an experimental sample to the expression value for the control sample.
    By Expression Level: Use this option to run a gene expression K-M plot when there is no control group or reference data set. In the next two fields, enter the expression levels that define Overexpressed (above the expression level) and Underexpressed (below the expression level).

    Survival Value

    This field represents the length of time the patient lived and is required for both expression types. Select the survival measure, which is the unit of measurement for the survival value, to use for the plot.

    Control Sample Sets

    This field is required only for the By Fold Change expression type. The study manager creates one or more control sample sets when the study is deployed. Select the control sample set to use for calculating fold change.

    Note: If the study has more than one platform, selecting a control set also selects its platform, because each control set is composed of samples from only one platform.

  5. Click the Create Plot button.

caIntegrator generates the plot and displays it below the plot criteria. An example displays in the following figure.
”K-M plot generated from gene expression data.”
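The two Expression Type options on the For Gene Expression tab can be sketched as follows. The ratio matches the definition of fold change given above, but this chapter does not say how caIntegrator derives one control value from a control sample set or whether its comparisons are inclusive. This sketch assumes the mean of the control samples, treats "under-expressed by N-fold" as a ratio of at most 1/N, uses arbitrary 2.0 defaults, and uses hypothetical function names and values.

```python
from statistics import mean


def fold_change(sample_value, control_values):
    """Ratio of an experimental sample's value to the control value.

    Assumption: the control value is the mean of the control sample set.
    """
    return sample_value / mean(control_values)


def classify_by_fold_change(sample_value, control_values, over=2.0, under=2.0):
    """Return "over", "under", or None for the By Fold Change option.

    The 2.0 defaults are arbitrary illustrations, not caIntegrator defaults.
    Assumption: "under-expressed by N-fold" means a ratio of at most 1/N.
    """
    ratio = fold_change(sample_value, control_values)
    if ratio >= over:
        return "over"
    if ratio <= 1.0 / under:
        return "under"
    return None


def classify_by_level(sample_value, above, below):
    """Return "over", "under", or None for the By Expression Level option.

    No control set is needed; the sample is compared with fixed levels.
    """
    if sample_value > above:
        return "over"
    if sample_value < below:
        return "under"
    return None


# Hypothetical values: one control set and three samples.
controls = [100.0, 120.0, 80.0]
for value in (250.0, 40.0, 110.0):
    print(value, classify_by_fold_change(value, controls))
```

Samples classified as over-expressed and under-expressed would then form the groups whose survival the K-M plot compares.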

...

  1. Select the study whose data you want to analyze in the upper right portion of the caIntegrator page. The queries you identify for the K-M plot must have been saved previously in caIntegrator.
  2. Under Analysis Tools on the left sidebar, select K-M Plot.
  3. Select the For Queries and Saved Lists tab, shown in the following figure. The criteria for the plot are described in the table below the figure.
    ”Fields for defining K-M plot parameters based on saved queries in caIntegrator”

    Field

    Description

    Queries

    Select the queries whose data you want to analyze in the All Available Queries panel and move them to the Selected Queries panel with the Add >> button. Note: Genomic queries do not appear in the lists and cannot be selected for this type of K-M plot.

    Exclusive Subject in Queries

    Select the check box to exclude any subjects that appear in both (or all) of the queries selected for the plot, which eliminates overlap between the groups.

    Add Additional Group...all other subjects

    Check the box to create an additional group of all other subjects that are not in selected query groups.

    Survival Value

    The survival value is the length of time the patient lived. Select the survival measure, which is the unit of measurement for the survival value, to use for the plot.

  4. Click the Create Plot button.
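The exclusive-subjects option and the Add Additional Group...all other subjects option amount to set operations on the subjects that each saved query returns. The sketch below shows one reading of them. It assumes "exclusive" removes any subject found in more than one selected query, and that "all other subjects" means study subjects that none of the selected queries returned; the function name, group label, query names, and subject IDs are hypothetical.

```python
from collections import Counter


def build_groups(query_results, all_subjects, exclusive=False, add_all_others=False):
    """Build K-M comparison groups from saved-query results.

    query_results -- {query name: set of subject IDs returned by that query}
    all_subjects  -- every subject ID in the study
    """
    groups = {name: set(ids) for name, ids in query_results.items()}
    if exclusive:
        # Assumption: drop subjects that appear in more than one query.
        counts = Counter(s for ids in groups.values() for s in ids)
        shared = {s for s, n in counts.items() if n > 1}
        groups = {name: ids - shared for name, ids in groups.items()}
    if add_all_others:
        # Subjects not returned by any selected query (before exclusion).
        returned = set().union(*query_results.values())
        groups["All other subjects"] = set(all_subjects) - returned
    return groups


print(build_groups({"Smokers": {1, 2, 3}, "Age over 60": {3, 4}},
                   all_subjects=range(1, 7), exclusive=True, add_all_others=True))
```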

caIntegrator generates the plot which then displays below the plot criteria. An example displays in the following figure.
”K-M plot comparing statistics between subjects in two queries”

  • The legend below the plot shows the number of subjects in each group.
  • caIntegrator also calculates a P-value for the selected groups and displays it at the bottom of the page. A lower P-value indicates stronger evidence that survival differs between the groups.
  • For information regarding the P-value calculation, see Creating Kaplan-Meier Plots.
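This chapter does not state which test caIntegrator uses for the K-M P-value. For illustration only, the following sketch implements the two-group log-rank test, a standard way to compare survival curves; treat the choice of test, the function name `log_rank_p_value`, and the example values as assumptions.

```python
import math


def log_rank_p_value(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (illustrative; not confirmed as caIntegrator's method).

    Returns (chi-square statistic, P-value) with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        at_risk = [(ti, e, g) for ti, e, g in data if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for ti, e, _ in at_risk if ti == t and e)
        d_a = sum(1 for ti, e, g in at_risk if ti == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group A.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the groups")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up times and event flags for two groups.
print(log_rank_p_value([5, 8, 12, 20], [1, 1, 0, 1], [10, 15, 22, 30], [0, 1, 0, 0]))
```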

...

Gene expression plots compare signal values from reporters or genes. With this statistical tool you can compare values for multiple genes at a time, and you are not limited to comparing only two data sets. You can also compare expression levels for selected genes against expression levels for a set of control samples designated when the study was defined.
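To make the comparison concrete, the following sketch computes one summary value per gene per sample group, which is what a bar-style gene expression plot displays. It assumes expression values keyed by gene and sample ID; the function name, sample IDs, and values are hypothetical. Mean is used because this chapter names it as the default display; median appears only as an alternative statistic, not as a documented caIntegrator option.

```python
from statistics import mean, median


def summarize_by_group(expression, sample_groups, stat=mean):
    """Summarize each gene's expression values within each sample group.

    expression    -- {gene symbol: {sample ID: signal value}}
    sample_groups -- {group name: list of sample IDs}
    stat          -- summary function applied to each group's values
    """
    summary = {}
    for gene, values in expression.items():
        summary[gene] = {}
        for group, samples in sample_groups.items():
            group_values = [values[s] for s in samples if s in values]
            if group_values:  # skip groups with no measured samples
                summary[gene][group] = stat(group_values)
    return summary


# Made-up values for two genes and two annotation-defined groups.
expression = {
    "EGFR": {"s1": 2.0, "s2": 4.0, "s3": 10.0},
    "TP53": {"s1": 1.0, "s2": 3.0, "s3": 5.0},
}
groups = {"Male": ["s1", "s2"], "Female": ["s3"]}
print(summarize_by_group(expression, groups))
print(summarize_by_group(expression, groups, stat=median))
```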

...

  1. Select the study whose data you want to analyze in the upper right portion of the caIntegrator page. (You must select a study which has genomic data.)
  2. Under Analysis Tools on the left sidebar, select Gene Expression Plot. This opens a page with three tabs.
  3. Select the For Annotation tab, shown in the following figure.
    Gene expression value tab for configuring gene expression annotation value plot

    Field

    Description

    Gene Symbol

    Enter one or more gene symbols in the text box, or click the database icons to look up genes. If you enter more than one gene in the text box, separate the entries with commas.

    caIntegrator provides three methods for obtaining gene symbols for a gene expression plot. For more information, see Choosing Genes.

    Reporter Type

    Select the radio button that describes the reporter type:
    Reporter ID*--Summarizes expression levels for all reporters you specify.
    Gene Name--Summarizes expression levels at the gene level.
    Platform--This field displays only if the study has multiple platforms. Select the appropriate platform for the plot. The platform you select determines the genes used for the plot.

    Sample Groups

    Choose among the following options:
    Annotation Type--Select the annotation type. Selections are based on the data in the chosen study.
    Annotation--Select an annotation. The available annotations depend on the annotation type you select. For example, if you choose Subject, you could select Gender, Radiation Type, or any other field whose values divide the patients into groups.
    Values--Using standard selection techniques, select one or more values to serve as the basis for the plot. The list shows the permissible (available) values for the selected annotation, or "No Values".

    Add Additional Group...

    Define as follows:
    ...all other subjects – Check the box to create an additional group of all other subjects that are not in selected query groups.
    ...control group* – Check the box to display an additional group of control samples for this study. The control set should be composed of only samples which are mapped to subjects. See Uploading Control Samples.

  4. Click the Create Plot button.
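The two Reporter Type choices differ in whether reporter values are combined per gene. This chapter does not say how caIntegrator combines them, so the following sketch assumes a simple mean of the reporters mapped to each gene, for a single sample. The reporter IDs, the reporter-to-gene mapping, and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_gene_level(reporter_values, reporter_to_gene):
    """Combine one sample's reporter-level values into gene-level values.

    reporter_values  -- {reporter ID: signal value}
    reporter_to_gene -- {reporter ID: gene symbol}
    Assumption: a gene's value is the mean of its reporters' values.
    """
    by_gene = defaultdict(list)
    for reporter, value in reporter_values.items():
        gene = reporter_to_gene.get(reporter)
        if gene is not None:  # ignore reporters with no gene mapping
            by_gene[gene].append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


# Hypothetical reporters: two map to EGFR, one to TP53.
values = {"reporter_1": 2.0, "reporter_2": 4.0, "reporter_3": 7.0}
mapping = {"reporter_1": "EGFR", "reporter_2": "EGFR", "reporter_3": "TP53"}
print(collapse_to_gene_level(values, mapping))
```

With Reporter ID selected, the values would instead stay per reporter, so no collapsing step is needed.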

...

  • You can recalculate the data display by changing the Plot Type above the graph. See Understanding a Gene Expression Plot.
  • You can modify the plot parameters and click the Reset button to recalculate the plot.

See also Understanding a Gene Expression Plot.

Gene Expression Value Plot for Genomic Queries

...

  1. Select the study whose data you want to analyze in the upper right portion of the caIntegrator page. (You must select a study which has genomic data.)
  2. Under Analysis Tools on the left sidebar, select Gene Expression Plot.
  3. Select the For Genomic Queries tab, shown in the following figure.
    ”Gene expression value tab for configuring gene expression genomic queries plot”

    Field

    Description

    Genomic Query

    Click the genomic query on which to base the plot.

    Reporter Type

    Select the radio button that describes the reporter type:
    Reporter ID--Summarizes expression levels for all reporters you specify.
    Gene Name--Summarizes expression levels at the gene level.

  4. Click the Create Plot button. caIntegrator generates the plot and displays it below the plot criteria, as shown in the following figure. Legends below the plot indicate the plot input.
    ”A gene expression plot (Mean) based on a genomic query”
  5. You can recalculate the data display by changing the Plot Type above the graph. See Understanding a Gene Expression Plot.
  6. You can modify the plot parameters and click the Reset button to recalculate the plot.

See also Understanding a Gene Expression Plot.

Gene Expression Value Plot for Annotation and Saved List Queries

The data you analyze on this tab must be saved as a subject annotation query, and the query must also identify genomic data. For more information, see Adding/Editing Genomic Data. For the genomic data, you must identify the genes whose expression values are used to calculate the plot.

To generate a gene expression plot using an annotation query, follow these steps:

  1. Select the study whose data you want to analyze in the upper right portion of the caIntegrator page. You must select a subject annotation study that also has genomic data.
  2. Under Analysis Tools on the left sidebar, select Gene Expression Plot.
  3. Select the For Annotation Queries and Saved Lists tab, shown in the following figure.
    ”Gene expression value tab for configuring gene expression annotation queries plot”

    Gene Symbol

    Enter one or more gene symbols in the text box, or click the database icons to look up genes. If you enter more than one gene in the text box, separate the entries with commas.

    caIntegrator provides three methods for obtaining gene symbols for a gene expression plot. For more information, see Choosing Genes.

    Reporter Type

    Select the radio button that describes the reporter type:

  • Reporter ID – Summarizes expression levels for all reporters you specify.
  • Gene Name – Summarizes expression levels at the gene level.

    Platform

    This field displays only if the study has multiple platforms. Select the appropriate platform for the plot. The platform you select determines the genes used for the plot.

    Saved Queries

    Choose among the available saved queries and lists. Build your selections in the right panel by using the Add > and Remove < buttons.

    Note: The [SL] and [Q] prefixes to list names indicate "Subject Lists" or "Saved Queries". A "G" in the prefix indicates the list is global. For more information, see Creating a Gene or Subject List.

    Exclusive Subjects...

    Select this check box to remove the subjects in your selected queries and lists from any queries or lists you use in later analyses, so that those subjects are used exclusively for the current analysis.

    Add Additional Group...

    Define as follows:

  • ...all other subjects – Check the box to create an additional group of all other subjects that are not in selected query groups.
  • ...control group – Check the box to display an additional group of control samples for this study. The control set should be composed of only samples which are mapped to subjects. See Uploading Control Samples.
  4. Click the Create Plot button.

caIntegrator generates the plot as a bar graph and displays it below the plot criteria. By default, the plot shows the mean of the data. Legends below the plot indicate the plot input. An example displays in the following figure.
”Gene expression plot based on annotation queries gene expression values”

  • You can recalculate the data display by changing the Plot Type above the graph. See Understanding a Gene Expression Plot.
  • You can modify the plot parameters and click the Reset button to recalculate the plot.

See also Understanding a Gene Expression Plot.

Understanding a Gene Expression Plot

...

  1. Select the study whose data you want to analyze in the upper right portion of the caIntegrator page. You must select a subject annotation study that also has genomic data.
  2. Click GenePattern Analysis in the left sidebar of caIntegrator. This opens the GenePattern Analysis Status page.
  3. In the GenePattern Analysis Status page, select Comparative Marker Selection (Grid Service) from the drop-down list and click New Analysis Job. This opens the Comparative Marker Selection Analysis page, shown in the following figure.
    ”Comparative Marker Selection analysis parameters”
  4. Select or define the Comparative Marker Selection (CMS) analysis parameters described in the following table. An asterisk (*) marks required fields. The default settings are valid and should produce valid results.

    CMS Parameter

    Description

    Job Name*

    Assign a unique name to the analysis you are configuring.

    Preprocess Server*

    A server that hosts the grid-enabled GenePattern PreProcess Dataset module. Select a server from the list; caIntegrator uses it for this portion of the processing.

    Comparative Server*

    A server that hosts the grid-enabled GenePattern Comparative Marker Selection module. Select a server from the list; caIntegrator uses it for this portion of the processing.

    Annotation Queries and Lists*

    caIntegrator lists all subject annotation queries and gene lists that have appropriate data for the analysis. Use the Add > and Remove < buttons to move two or more queries from the All Available Queries panel to the Selected Queries panel.
    Note: The [SL] and [Q] prefixes to list names indicate "Subject Lists" or "Saved Queries". A "G" in the prefix indicates the list is global. For more information, see Creating a Gene or Subject List (https://wiki.nci.nih.gov/x/FoDnAg#4-ViewingQueryResults-CreatingaGeneorSubjectList).

    Filter Flag

    Variation filter and thresholding flag

    Preprocessing Flag*

    Discretization and normalization flag

    Min Change*

    Minimum fold change for filter

    Min Delta*

    Minimum delta for filter

    Threshold*

    Value for threshold

    Ceiling*

    Value for ceiling

    Max Sigma Binning*

    Maximum sigma for binning

    Probability Threshold*

    Value for uniform probability threshold filter

    Num Exclude*

    Number of experiments to exclude (max & min) before applying variation filter

    Log Base Two

    Whether to take the log base two after thresholding; default setting is "Yes".

    Number of Columns Above Threshold*

    Removes a row unless at least this number of columns contain a value greater than or equal to a user-defined threshold.

    Test Direction*

    The test to perform: up-regulated for class0, up-regulated for class1, or two-sided. By default, Comparative Marker Selection performs the two-sided test.

    Test Statistic*

    Select the statistic to use.

    Min Std*

    The minimum standard deviation. Used only if the selected test statistic includes the min std option.

    Number of Permutations*

    The number of permutations to perform. (Use 0 to calculate asymptotic P-values.) The number of permutations you specify depends on the number of hypotheses being tested and the significance level that you want to achieve (3). The greater the number of permutations, the more accurate the P-value.
    Complete – Perform all possible permutations. By default, complete is set to No and Number of Permutations determines the number of permutations performed. If you have a small number of samples, you might want to perform all possible permutations.
    Balanced – Perform balanced permutations

    Random Seed*

    The seed for the random number generator.

    Smooth P-values

    Whether to smooth P-values by using Laplace's Rule of Succession. By default, Smooth P-values is set to Yes, which means P-values are always less than 1.0 and greater than 0.0.

    Phenotype Test*

    Tests to perform when class membership has more than two classes: one-versus-all or all pairs.
    Note: The P-values obtained from the one-versus-all comparison are not fully corrected for multiple hypothesis testing.

    • Comparative Marker Selection analysis options
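Several of the preprocessing parameters in this table describe simple per-row filters. The sketch below shows one plausible reading of Threshold, Ceiling, Min Change, Min Delta, Num Exclude, Number of Columns Above Threshold, and Log Base Two for a single row of expression values; it is not the GenePattern PreProcess Dataset module's code. It assumes a row must pass both the fold-change and delta tests and that the column count is applied after thresholding; the function and parameter names and the example values are hypothetical, not module defaults.

```python
import math


def preprocess_row(values, threshold, ceiling, min_change, min_delta,
                   num_exclude=0, n_columns=0, column_threshold=None, log2=False):
    """Return the processed row, or None if a filter removes it.

    Assumes threshold > 0 so that the max/min ratio is defined.
    """
    if len(values) <= 2 * num_exclude:
        raise ValueError("num_exclude removes every value in the row")
    # Threshold and Ceiling: clip each value into [threshold, ceiling].
    clipped = [min(max(v, threshold), ceiling) for v in values]
    # Num Exclude: ignore the num_exclude highest and lowest values
    # when applying the variation filter.
    trimmed = sorted(clipped)[num_exclude:len(clipped) - num_exclude]
    low, high = trimmed[0], trimmed[-1]
    # Min Change and Min Delta: the row must vary enough on both measures.
    if high / low < min_change or high - low < min_delta:
        return None
    # Number of Columns Above Threshold: need at least n_columns values
    # greater than or equal to column_threshold.
    if column_threshold is not None:
        if sum(v >= column_threshold for v in clipped) < n_columns:
            return None
    # Log Base Two: optionally log-transform after thresholding.
    return [math.log2(v) for v in clipped] if log2 else clipped


print(preprocess_row([10, 50, 500], threshold=20, ceiling=20000,
                     min_change=3, min_delta=100))
```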
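The Number of Permutations, Random Seed, and Smooth P-values parameters can be illustrated with a small permutation test for a single gene. For brevity, this sketch uses the absolute difference of class means as its statistic rather than any specific Comparative Marker Selection test statistic, and it covers only the two-sided, random-permutation case; the function name and example values are hypothetical. With smoothing on, the P-value is (k + 1) / (N + 2), where k is the number of permutations whose statistic is at least as extreme as the observed one and N is the number of permutations, which is why smoothed P-values always fall strictly between 0.0 and 1.0.

```python
import random
from statistics import mean


def permutation_p_value(class0, class1, n_permutations=1000, seed=0, smooth=True):
    """Two-sided permutation P-value for one gene (illustrative sketch).

    The statistic is |mean(class0) - mean(class1)|, chosen for brevity; it is
    not one of the specific Comparative Marker Selection test statistics.
    """
    rng = random.Random(seed)  # a fixed seed makes repeated runs identical
    observed = abs(mean(class0) - mean(class1))
    pooled = list(class0) + list(class1)
    n0 = len(class0)
    hits = 0  # permutations at least as extreme as the observed split
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n0]) - mean(pooled[n0:])) >= observed:
            hits += 1
    if smooth:
        # Laplace's Rule of Succession: always strictly between 0.0 and 1.0.
        return (hits + 1) / (n_permutations + 2)
    return hits / n_permutations


print(permutation_p_value([1, 2, 3, 4, 5], [10, 11, 12, 13, 14]))
```

Setting `smooth=False` gives the unsmoothed estimate k / N, which can be exactly 0.0 when no permutation is as extreme as the observed split.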

     

...

  1. Select the study whose data you want to analyze in the upper right portion of the caIntegrator page. You must select a study with copy number (either Affymetrix SNP or Agilent Copy Number) data.
  2. Click GenePattern Analysis in the left sidebar of caIntegrator. This opens the GenePattern Analysis Status page.
  3. In the GenePattern Analysis Status page, select GISTIC (Grid Service) from the drop-down list and click New Analysis Job. This opens the GISTIC Analysis page, shown in the following figure.
    ”GISTIC analysis criteria”
  4. Select or define GISTIC analysis parameters, as described in the following table. You must enter a Job Name, but you can accept the other default settings, which are valid and should produce valid results.

    GISTIC Parameters

    Description

    Job Name*

    Assign a unique name to the analysis you are configuring.

    GISTIC Service Type*

    Select whether to use the GISTIC web service or grid service, and provide or select the service address. If you select the web service, authentication information is also required.

    GenePattern User Name/Password

    Enter these to log in to GenePattern for the analysis.

    Annotation Queries and Lists

    All annotation queries display in this list, along with an option to select all non-control samples. Select an annotation query if you want to run GISTIC on a subset of the data, or select all non-control samples if you want to include all samples.

    Select Platform

    This option appears only if more than one copy number platform exists in the study. Select the appropriate platform from the drop-down list.

    Exclude Sample Control Set

    From the drop-down list, select the name of the control set you want to exclude from the analysis, or select None if no control set should be excluded.

    Amplifications Threshold*

    Threshold for copy number amplifications. Regions with a log2 ratio above this value are considered amplified. Default = 0.1.

    Deletions Threshold*

    Threshold for copy number deletions. Regions with a log2 ratio below the negative of this value are considered deletions. Default = 0.1.

    Join Segment Size*

    Smallest number of markers to allow in segments from the segmented data. Segments that contain fewer than this number of markers are joined to the neighboring segment that is closest in copy number. Default = 4.

    QV Thresh[hold]*

    Threshold for q-values. Regions with q-values below this number are considered significant. Default = 0.25.

    Remove X*

    Flag indicating whether to remove data from the X chromosome before analysis. Allowed values = {1,0}. Default = 1 (yes).

    cnv File

    This selection is optional.
    Browse for the file. There are two options for the CNV file.
    Option #1 identifies CNVs by marker name. The permissible file format is as follows:
    A two-column, tab-delimited file with an optional header row. The marker names given in this file must match the marker names given in the markers_file. The CNV identifiers are for user use and can be arbitrary. The column headers are:

...