NIH | National Cancer Institute | NCI Wiki  

...

The Edit Study page, shown in the following figure, displays the Name and Description that you entered for a new study, or for an existing study that you are editing.
Edit Study page

To continue creating a study or to modify a study, on the Edit Study page complete these steps:

  1. Enter or change (if editing) the name and/or description, if you choose.
  2. Check the checkbox to make the study publicly available, if appropriate.
  3. For the study log feature, click View Log or Edit Log. See for details about the log.
  4. Click Save.
    Note: You can save the study at any point in the process of creating it. You can resume the definition and deployment process later.

  5. If you choose to add a logo for the study, click the Browse button corresponding to Logo File. Navigate to the file, then click Upload Now. Once you save the study (or its edit), the logo displays in the center of the page. On the home page for the study, the logo displays in the upper left, above the sidebar.
    Example of a logo added to the caIntegrator browser on the Edit Study page

To continue, you can add subject annotation data sources, genomic data sources or imaging data sources.

...

  1. To enter a new annotation name, or any other information about the annotation definition, click the New button and enter the information described in the following table.

    Annotation Field – Field Description

    • Name – Enter the name for the annotation.
    • Definition – Enter the term(s) that define the annotation.
    • Keywords – Insert keyword(s), separated by commas, that can be used to find the annotation in a search.
    • Data Type – Select string (default), numeric, or date.
    • Apply Max Number Mask – This field is available only for numeric-type annotations, or when a new definition is created. This feature is unavailable when permissible values are present.
      Select the box and enter a maximum number for the mask, such as "80" for age. When you query results above the value of the mask, the system displays the mask and not the actual age.
      Note: If you enter masks of both "max number" and "range", caIntegrator applies both masks at the same time.
      The Data Dictionary page now has a Restrictions column that shows restrictions whenever a mask has been applied.
    • Apply Numeric Range Mask – This field is available only for numeric-type annotations, or when a new definition is created. This feature is unavailable when permissible values are present.
      Select the box and enter a width of range for the mask, such as "5" representing blocks of 5 years. For example, if you enter a width of 5, the query only allows age blocks of 0-5, 6-10, 11-15, etc. When you query results above the value of the mask, the system displays the mask and not the actual age ranges.
      Note: If you enter masks of both "max number" and "range", caIntegrator applies both masks at the same time.
      The Data Dictionary page now has a Restrictions column that shows restrictions whenever a mask has been applied.
    • Permissible/Non-permissible Values –
      Note: The first time you load a file, before you assign annotation definitions, these panels may be blank. If the column header for the data is already "recognizable" by caIntegrator, the system makes a "guess" about the data type and assigns the values to the data type in the newly uploaded file. They will display in the Non-permissible values sections initially. Use the Add and Remove buttons to move the values shown from one list to the other, as appropriate.
      When you select or change annotation definitions by selecting matching definitions (described in ), this may add (or change) the list of non-permissible values in this section.
      If you leave all values for a field in the Non-permissible panel, then when you do a study search, you can enter free text in the query criteria for this field.
      If there are items in the Permissible values list, then the values for this annotation are restricted to only those values. When you perform a study search, you will select from a list of these values when querying this field. If there are no items in the permissible values list, then the field is considered free to contain any value.
      To edit a field's permissible values, you must change the annotation definition. You can do this even after a study has been deployed.
      Note: You cannot edit permissible values in an existing annotation definition. To change permissible values, you must create a new annotation.
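The two masks described for numeric annotations can be illustrated with a short Python sketch. This is not caIntegrator code: the function names are hypothetical, values are assumed to be non-negative integers, and both the order in which the masks are combined and the displayed text of the max number mask are assumptions. The range blocks follow the example given for a width of 5 (0-5, 6-10, 11-15).

```python
from typing import Optional


def range_block(value: int, width: int) -> str:
    """Return the block containing value, e.g. width 5 -> 0-5, 6-10, 11-15."""
    if value <= width:
        # In the documented example the first block starts at 0 (0-5).
        return f"0-{width}"
    lower = ((value - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"


def masked_display(value: int, max_number: Optional[int] = None,
                   range_width: Optional[int] = None) -> str:
    """Apply the max number mask first, then the range mask (assumed order)."""
    if max_number is not None and value > max_number:
        # Assumption: the mask value itself is shown instead of the real value.
        return str(max_number)
    if range_width is not None:
        return range_block(value, range_width)
    return str(value)
```

For example, with a max number mask of 80 and a range width of 5, this sketch shows an age of 85 as the mask value and an age of 42 as the block 41-45.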

...

  1. The matches from caDSR display some of the details of the search results. To view more details of a match, such as permissible values, click View, which opens caDSR to the term. If you click Select, the caDSR definition automatically replaces the annotation definition for the field you are working with.
    Caution: Before you add a caDSR definition, make sure that it says exactly what you want. caDSR definitions can have minor nuances that require specific and limited applications of their use.

  2. Once you have settled on an appropriate field definition for the annotation, click Save. This returns you to the Define Fields for Subject Data page.
    Note: If you have not clicked Select for alternate definitions in this dialog box, then click Save to return to the Define Field... dialog box without making any definition changes.

  3. From the Define Fields for Subject Data page, be sure to designate the data types for each field in the file. Click Save on each page to save your entries, or click New to clear the fields and start again. You will not be able to proceed until every field definition entry on the Fields for Subject Data screen has an entry, one as the unique Identifier and the remainder as annotations.

The Data From File columns on the page display the column header values of the first three rows you designated as "annotations".

Tip: Saving your entries in this way saves the study by name and description, but does not deploy the study. See .

The Edit Study page now displays a "Not Loaded" status for the file whose annotations (column headers) you have defined. An example of a file whose annotations have been defined but not yet loaded is shown in the following figure.
Example file whose annotations have been defined but not yet loaded

Status definitions:

  • Definition Incomplete – An annotation definition or definitions must be modified on the Define Fields for Subject Data page. This status may be displayed because an identifier has not been selected. See .
  • Not Loaded – The annotation definitions must be loaded before a study can be deployed. If an error appears after attempting to load a subject annotation source, click the Edit Annotations button, which takes you to the Define Fields for Subject Data page where the problematic annotations will appear in red. See .
  • Loaded – The annotation definitions are properly loaded.
  1. Click the Load Subject Annotation Source button in the Action section to load the data file you have configured. The Deploy Study button has been unavailable up to this point, but this step activates it.
    Note: You can add as many files as are necessary for a study. For example, patients 1-20 can be in the first file and patients 21-40 in the second, or many patients can be in the first file and annotations in the second. As long as IDs are defined correctly, this works.

  • Click Deploy Study. caIntegrator now loads data from the file to the caIntegrator database, and the file status changes to "Loaded".
    Note: You can change assignments even after the study is deployed, using the Edit feature. For more information, see .

The Manage Studies page opens when the study is deployed. The Deployed status is indicated on the Manage Studies page as well as the Edit Study page. For more information, see .
You can continue to perform other tasks in caIntegrator while deployment is in process.
See also .

Note: You can repeatedly upload additional or updated subject annotations, samples, image data, and array data to the study at later intervals. These later imports do not remove any existing data; they instead insert any new subjects or update annotations for existing subjects.
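The insert-or-update behavior described for repeated uploads can be pictured with a minimal Python sketch. It is only an illustration of the idea, not how caIntegrator stores data: the function name, the in-memory dictionary, and the column names (PatientID, Age, Gender) are all hypothetical.

```python
import csv
import io
from typing import Dict


def merge_annotations(study: Dict[str, Dict[str, str]], csv_text: str,
                      id_column: str) -> Dict[str, Dict[str, str]]:
    """Insert new subjects and update annotations for existing ones."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject_id = row.pop(id_column)
        # setdefault keeps earlier annotations; update adds or overwrites fields.
        study.setdefault(subject_id, {}).update(row)
    return study


# Hypothetical files: the second adds an annotation for P2 and a new subject P3.
first_file = "PatientID,Age\nP1,61\nP2,47\n"
second_file = "PatientID,Gender\nP2,F\nP3,M\n"

study: Dict[str, Dict[str, str]] = {}
merge_annotations(study, first_file, "PatientID")
merge_annotations(study, second_file, "PatientID")
```

After both calls, P1 keeps its original annotation, P2 carries annotations from both files, and P3 has been added. Nothing from the first file is removed, which matches the behavior the note describes.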

Defining Survival Values

Survival value is the length of time a patient lived. If you plan to analyze your caIntegrator data to create a Kaplan-Meier (K-M) Plot, then during the Annotation Definition process described above in , you should do one of two things:

  1. Make sure that you have defined at least three fields set to the "date" Data Type. These will be matched to the following three properties during Survival Value definition.
    • Survival Start Date
    • Death Date
    • Last Followup Date
    Note: Setting survival values is optional if you do not plan to use the K-M plot analysis feature or if you do not have this kind of data (survival values) in the file.

  2. It is also possible to generate K-M plots if an Annotation Field Descriptor such as DAYSTODEATH has been set to Data Type 'numeric'. See .

For some applications, such as REMBRANDT and I-SPY, survival values are pre-defined in the databases when you load the data. In caIntegrator, however, you can review and define survival value ranges in a data set you are uploading to a study. To be able to do so, you need to understand the kind of data that can comprise the survival values.
To set up survival values, follow these steps:

  1. On the Edit Study page, click Edit Survival Values. This opens the Survival Value Definitions dialog box, shown in the following figure.
    Survival Value Definition dialog box
  2. Click New to enter new survival value definitions.
    - OR -
    Click Edit to edit existing survival value definitions.
  3. The dialog box extends, now displaying radio buttons and three drop-down lists that show column headers for date metadata in the spreadsheet you have uploaded. The following figure displays survival value ranges that have already been added to a study.
    Survival Definitions example

Survival values can be defined by Date or by Length of time in study. Select the radio button for the category that defines your survival data.

In the drop-down lists, select the appropriate survival value definitions for each field listed. You might want to refer to the column headers in the data file itself. Dates covered by the definitions are already in the data set. You cannot enter specific dates.

  • Survival Definition Type – Select whether the survival time is defined by dates or length of time subject was in the study.
  • Name – Enter a unique name that adequately describes the survival values you are defining here. Example: Survival from Enrollment Date or Survival from Treatment Start. The name you enter displays later when you are selecting survivals to create the K-M plot.
  • Survival Length Units – Select the appropriate units for this data.
  • Survival Start Date – Select the column header for this data.
  • Death Date – Select the column header for this data.
  • Last Followup Date – Select the column header for this data.
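To show how the three date fields relate to a K-M analysis, here is a minimal Python sketch. It uses the standard convention that a subject with a death date is an observed event and a subject without one is censored at the last followup date. Whether caIntegrator computes survival length exactly this way is an assumption, and the function name is hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def survival_from_dates(start: date, death: Optional[date],
                        last_followup: Optional[date]) -> Tuple[int, bool]:
    """Return (survival length in days, whether death was observed)."""
    if death is not None:
        return (death - start).days, True
    if last_followup is None:
        raise ValueError("need a death date or a last followup date")
    # No death date: the subject is censored at the last followup.
    return (last_followup - start).days, False
```

A definition of the "Length of time in study" type would skip the subtraction and read the length directly from a numeric column.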

See also on page 82.
The Edit Survival Value Definitions page now has radio buttons that offer two ways to define survival values.

Adding/Editing Genomic Data

Note: Genomic data that is parsed and stored in caArray can be analyzed in caIntegrator. Additionally, supplemental files in caArray that have not been parsed can be uploaded and analyzed in caIntegrator.

Once you have loaded subject annotation data and identified patient IDs, you can add one or more sets of array genomic sample data from caArray, which caIntegrator maps by sample IDs to the patient IDs in the subject annotation data. This section covers that process. Alternatively, you can load imaging files from NBIA, also mapped by IDs to the patient data, covered in . You can also edit genomic data information that you have already added to the study. Genomic sample data and imaging data are independent of each other, so neither is required before loading the other.

It is essential that you are well acquainted with the data you are working with: the subject annotation data and the corresponding array data in caArray.

caIntegrator supports a limited number of array platforms. For more information, see .

To add genomic data to your caIntegrator study, follow these steps:

  1. On the Edit Study page where you have selected and added the subject annotation data, click the Add New button under Genomic Data Sources. You can upload genomic data only from caArray.

This opens the Edit Genomic Data Source dialog box. Enter the appropriate information in the fields. These fields are described below.
Edit Genomic Source dialog box

  • caArray Web URL – Enter the URL for the caArray to be used for the genomic data sources. This will enable a user to link to the referenced caArray experiment from the study summary page.
  • caArray Host Name – Enter the hostname for your local installation or for the CBIIT installation of caArray. If you misspell it, you will receive an error message.
  • caArray JNDI Port – Enter the appropriate server port. See your administrator for more information. Example: For the CBIIT installation of caArray, enter 8080.
  • caArray Username and caArray Password – If the data is private, you must enter your caArray account user name and password; you must have permissions in caArray for the experiment. If the data is public, you can leave these fields blank.
  • caArray Experiment ID – Enter the caArray Experiment ID which you know corresponds with the subject annotation data you uploaded. Example: Public experiment "beer-00196" on the CBIIT installation of caArray. If you misspell your entry, you will receive an error message.
  • Vendor – Select either Agilent or Affymetrix.
  • Data Type – Select Expression or Copy Number.
  • Platform – If appropriate, select the Agilent or Affymetrix platform.
    Note: Because you can add more than one set of genomic data to a study, a study can also have multiple platforms, one for each set of genomic data.

  • Central Tendency for Technical Replicates – If more than one hybridization is found for the reporter, the hybridizations will be represented by this method.
  • Indicate if technical replicates have high statistical variability – If more than one hybridization is found, checking this box will display a ** in the genomic search results when a reporter value has high statistical variability.
  • Standard Deviation Type – When the checkbox for indicating if technical replicates have high statistical variability is checked, this parameter becomes available. Select in the drop-down the calculation to be used to determine whether or not to display a ** (see previous bullet point).
    • Relative, which calculates the Relative Standard Deviation as a percentage
    • Normal, which calculates the Standard Deviation as a numeric value
  • Standard Deviation Threshold – When the checkbox for indicating if technical replicates have high statistical variability is checked, this parameter becomes available. This is the threshold at which the Standard Deviation Type is exceeded and the reporter is marked with a **.
  1. Click Save.

caIntegrator goes to caArray, validates the information you have entered here, finds the experiment, and retrieves all the sample IDs in the experiment. Once this finishes, the experiment information displays on the Edit Study page under the Genomic Data Sources section.
Genomic Data Sources section of the Edit Study page

  1. If you want to redefine the caArray experiment information, you can edit it. Click the Edit link corresponding to the Experiment ID. The Edit Genomic Data Source dialog box reopens, allowing you to edit the information.
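The technical-replicate options in the Edit Genomic Data Source dialog box (central tendency, Standard Deviation Type, and Standard Deviation Threshold) can be sketched in Python as follows. This is an illustration under stated assumptions, not caIntegrator's implementation: the function name is hypothetical, mean and median are assumed to be the available central tendency methods, and the sample standard deviation is assumed.

```python
import statistics
from typing import List, Tuple


def summarize_replicates(values: List[float], central_tendency: str = "mean",
                         sd_type: str = "relative",
                         threshold: float = 20.0) -> Tuple[float, bool]:
    """Return (representative value, high-variability flag) for one reporter."""
    if central_tendency == "median":
        center = statistics.median(values)
    else:
        center = statistics.mean(values)
    if len(values) < 2:
        return center, False  # a single hybridization has no spread
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd_type == "relative":
        mean = statistics.mean(values)
        # Relative standard deviation, expressed as a percentage of the mean.
        spread = abs(sd / mean) * 100 if mean != 0 else float("inf")
    else:
        spread = sd
    # A True flag corresponds to the ** marker in the genomic search results.
    return center, spread > threshold
```

With the "relative" type the threshold is read as a percentage, and with the "normal" type it is read in the same units as the reporter values.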

Mapping Genomic Data to Subject Annotation Data

Because the goal of caIntegrator is to integrate data from subject annotation, genomic and imaging data sources, data from uploaded source files must be mapped to each other. Mapping files can map to caArray genomic data of two types: "imported and parsed" and that stored in supplemental files.

Creating a Mapping File

You, as the caIntegrator study manager, must create a Subject to Sample mapping file before following the actual mapping steps. This file provides caIntegrator with the information for mapping patients to caArray samples.

  1. Start with the 6-column mapping file template, described as follows:
    • All platforms – Raw (level 1) data cannot be mapped; only normalized, processed (level 2) data is acceptable.
    • The required six-column file format uses the following columns:
      • Subject ID
      • Sample ID
      • Name of supplemental file (if appropriate, as attached to the experiment in caArray)
      • Probe Header – Name of column header (in the supplemental file) which contains the probe IDs.
      • Value Header – Name of column header (in the supplemental file) which holds the level 2 data.
      • Sample Header – Name of column header (in the supplemental file) which holds the level 2 data.
    Note: Only one of the last 2 columns is used: a single sample per file uses the Value Header column; multiple samples per file use the Sample Header column. Unused columns are blank.

The following figure shows an example multiple sample mapping file in CSV format.
Mapping file in CSV format

  1. When you use the mapping file, make sure you use the patient ID for mapping.
  2. Determine whether your data in caArray is "imported and parsed" or "supplemental". Fill in the 6-column mapping file according to the following standard:
    • Imported and parsed – Complete only the first two columns of the 6-column mapping file as described above. You can ignore the remaining columns.
    • Supplemental – Supplemental data comes in two flavors: "single sample per file" and "multiple samples per file". Only one of the last two columns is used. If the supplemental data format is:
      • Single sample per file – the column named "Sample_Header" can be left empty.
      • Multiple samples per file – the column named "Value_Header" can be left empty.
      Note: Supplemental files from caArray for mapping data must be configured appropriately. For information, see on page 135.

The following steps use data of either type.
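Before uploading, it can help to check a mapping file against the rules above. The Python sketch below is a hypothetical pre-check, not part of caIntegrator. It assumes the file has no header row, that imported-and-parsed rows may leave the last four columns blank or absent, and that supplemental rows need both a file name and a Probe Header. That last requirement is an assumption rather than something stated in the text.

```python
from typing import List

COLUMNS = ["Subject ID", "Sample ID", "Supplemental File",
           "Probe Header", "Value Header", "Sample Header"]


def check_mapping_row(row: List[str], supplemental: bool) -> List[str]:
    """Return a list of problems found in one mapping-file row."""
    if len(row) < 2 or len(row) > len(COLUMNS):
        return [f"expected 2 to {len(COLUMNS)} columns, found {len(row)}"]
    # Pad short rows so that missing trailing columns count as blank.
    cells = [cell.strip() for cell in row] + [""] * (len(COLUMNS) - len(row))
    subject_id, sample_id, file_name, probe, value_hdr, sample_hdr = cells
    problems = []
    if not subject_id or not sample_id:
        problems.append("Subject ID and Sample ID are required")
    if supplemental:
        if not file_name or not probe:  # assumed requirement
            problems.append("supplemental rows need a file name and Probe Header")
        # Exactly one of Value Header / Sample Header should be filled in.
        if bool(value_hdr) == bool(sample_hdr):
            problems.append("fill in exactly one of Value Header or Sample Header")
    return problems
```

A row for a multiple-samples-per-file supplemental source would leave the Value Header blank, for example `["P1", "S1", "data.txt", "ProbeID", "", "S1"]`, where every value is made up for illustration.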

Steps for Mapping Genomic Data

To map the samples from the caArray experiment to the patients in the subject annotation data you uploaded, follow these steps:

  1. On the Edit Study page, click the Map Samples button. This opens the Edit Sample Mappings page, shown in the following figure.
    Edit Sample Mappings page showing some already mapped samples
  2. The first two caArray fields may be populated with the information for the instance of caArray to which you have access. You can, however, enter the following caArray information, if appropriate.
    • caArray Host Name – Enter the hostname for your local installation or for the CBIIT installation of caArray. If you misspell it, you will receive an error message.
    • caArray JNDI Port – Enter the appropriate server port. See your administrator for more information. Example: For the CBIIT installation of caArray, enter 8080.
    • caArray Username – Enter your caArray account user name and password; you must have permissions in caArray for the experiment if it is private. If the data is public, you can leave this field blank.
    • caArray Experiment ID – Enter the caArray Experiment ID which you know corresponds with the subject annotation data you uploaded. Example: Public experiment "beer-00196" on the CBIIT installation of caArray. If you misspell your entry, you will receive an error message.
  3. Enter the Loading Type of the data file you plan to map. (File types are described in ).
  4. In the Subject to Sample Mapping File section, click Browse to navigate to the Sample Mapping CSV file that you created (described in ). This provides caIntegrator with the information for mapping patients to caArray samples.
  5. Click the Map Samples button.
    If the caArray data you have identified is imported and parsed, when you click the Map Samples button, the mapping takes place as the data is uploaded into caIntegrator. If the caArray data is supplemental, the mapping does not occur until the study is deployed.
    Mapped samples will be listed in the Samples Mapped to Subjects section. Unmapped samples show at the top of the caIntegrator page. They were loaded from caArray, but they are not in the mapping file. These are not used for integration.
    Note: If you have already mapped samples, when you first open this page they are listed in the Samples Mapped to Subjects section. If you have not already mapped samples, all of the samples in the caArray experiment you selected are listed as unmapped, because caIntegrator does not know how these sample names correlate to the patient data in the subject annotation file until you upload the subject to sample mapping file.

  6. Scroll down the page to see samples that are mapped to the patients in the subject annotation data, as shown in the following example.

...

    Example of samples mapped to patients' data

Uploading Control Samples

A Control Samples file is used to calculate fold change data, which compares "tumor" sample gene expression in the caArray experiment to the control samples to identify those that exhibit up or down gene regulation. Control samples can be the "normal" samples, but that is not necessarily the case.

To upload the control samples, follow these steps:

  1. On the Edit Sample Mappings page, click the Map Samples link.
  2. Click Browse to navigate to the control samples file, and click the Upload Control Samples File button. The control sets display at the top of the page once they have been uploaded, as shown in the following example.
    Example list of control samples
    The control samples now display toward the bottom of the page.

This information will be used when performing other tasks in caIntegrator, to be described in other sections.

Note: If a Control Set is to be used in Gene Expression For Annotation, or Gene Expression plots for Annotation Query, then the control set should be composed of only samples which are mapped to subjects.
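The fold change comparison against control samples can be illustrated with a short Python sketch. The text does not state the exact formula caIntegrator uses, so this assumes a common definition: the sample's expression value divided by the mean of the control values, on a linear scale. The function names and the default cutoff of 2.0 are hypothetical.

```python
import statistics
from typing import List


def fold_change(sample_value: float, control_values: List[float]) -> float:
    """Ratio of a sample's expression value to the mean of the control samples."""
    control_mean = statistics.mean(control_values)
    if control_mean == 0:
        raise ZeroDivisionError("control mean is zero")
    return sample_value / control_mean


def regulation(fc: float, fold: float = 2.0) -> str:
    """Classify a fold change as up, down, or unchanged for a chosen cutoff."""
    if fc >= fold:
        return "up"
    if fc <= 1.0 / fold:
        return "down"
    return "unchanged"
```

Under these assumptions, a tumor sample value of 8.0 against controls averaging 2.0 gives a fold change of 4.0, which the sketch classifies as up-regulated.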

Configuring Copy Number Data

You can add copy number data for a genomic data source by uploading the mapping file. This allows you to configure parameters to be used when segmentation data is being configured.

The name specified in the third column of the mapping file is specific for each array manufacturer as follows:

  • Affymetrix – The third column of the mapping file must contain filenames that end in .cnchp. The corresponding experiment in caArray must have these files and the extensions must match .cnchp.
  • Agilent – The third column must name a file which contains level 2 copy number data. Level 1 copy number data will not work. This file name is repeated for each line in the mapping file.

To add copy number data relating to the genomic data you are adding, follow these steps:

  1. In the Genomic Data Sources section, for the data you have already added, click the Configure Copy Number Data button.
    Note: This link is available only if you have uploaded copy number data and you are configuring a Copy Number data type (as indicated by the Data Type column on the Edit Study page).

    The Edit Copy Number page, shown in the following figure, opens.
    Edit Copy Number page
  2. Browse for and enter appropriate information to identify and retrieve the copy number mapping file. The fields are described in the following table. An asterisk indicates a required field.

    Field – Description

    • caArray Service Host Name – Enter the hostname for your local installation or for the CBIIT installation of caArray. If you misspell it, you will receive an error message.
    • caArray Experiment ID – Enter the caArray Experiment ID which you know corresponds with the copy number data.
    • Loading Type – Enter the Loading Type of the data file you plan to map.
    • Subject and Sample Mapping File – Browse for the appropriate CN mapping file. The file must be a CSV file in the 3-column format for mapping data files. Supplemental data uses 6-column files.
    • Bioconductor Service Type – This is the type of Bioconductor module that will be used for segmentation. Select one of the two options: DNAcopy or CGHcall.
    • caCGHcall Service URL – Enter the URL for the grid segmentation service used to access the caCGHcall service. For more information, see
    • Call Level – An input parameter to CGHcall. This is the number of discrete values used to represent the copy number level. Select one of two options: 3 (consisting of discrete values of -1, 0, 1) or 4 (consisting of discrete values -1, 0, 1, 2).
    • caDNACopy Service URL* – Control for selecting the URL which hosts the caDNACopy grid service. For more information, see
    • Change Point Significance Level – Significance levels for the test to accept change-points.
    • Early Stopping Criterion – The sequential boundary used to stop and declare a change.
    • Permutation Replicates – The number of permutations used for p-value computation.
    • Random Number Seed – The segmentation procedure uses a permutation reference distribution. This should be used if you plan to reproduce the results.

  3. Click Save Segmentation Data Calculation Configuration for a genomic data source. On the screen upload a copy number mapping file (format: subject id, sample id, file name) and configure the parameters to be sent when computing segmentation data.
    Be Careful: After a study has been deployed and the genomic source has been loaded, you cannot change these copy number parameters without reloading the data from caArray first.
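The vendor-specific rules for the third column of a copy number mapping file can be checked before upload with a sketch like the following. It is a hypothetical helper, not part of caIntegrator, and it assumes the 3-column format (subject id, sample id, file name) with no header row and no blank lines.

```python
import csv
import io
from typing import List


def check_copy_number_mapping(csv_text: str, vendor: str) -> List[str]:
    """Check rows of a 3-column mapping file: subject id, sample id, file name."""
    problems = []
    file_names = set()
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) != 3:
            problems.append(f"line {line_no}: expected 3 columns, found {len(row)}")
            continue
        file_name = row[2].strip()
        file_names.add(file_name)
        if vendor == "Affymetrix" and not file_name.endswith(".cnchp"):
            problems.append(f"line {line_no}: file name must end in .cnchp")
    if vendor == "Agilent" and len(file_names) > 1:
        # The text says the same file name is repeated on every line.
        problems.append("Agilent mapping should repeat one file name on every line")
    return problems
```

An empty result means the file passed these particular checks only. It does not confirm that the named files exist in the caArray experiment or that they contain level 2 data.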

Remapping Copy Number Data in a Deployed Study

...

Click the link to open a page that displays appropriately formatted web page links; an example is shown in the following figure.
An example of external links

Deploying the Study

...