NIH | National Cancer Institute | NCI Wiki  

Creating a Study – Overview

You can create a caIntegrator study by importing subject annotation study data, genomics data and imaging data, using a combination of spreadsheet/files and existing caGrid applications as source data. Each instance of caIntegrator can support multiple studies. As the manager creating a study, it is important that you understand the study well and that the data you wish to aggregate has been submitted to the applications whose data can be integrated in caIntegrator.

...

As you create the study, you define its structure, identifying the data sources and mapping the data between different source data. After the study has been created and deployed, you can perform analyses of the data in the study.

Configuring and Deploying a Study

Note: Only a user with a Study Manager role can create a study.

...

  1. In the Study Management section of the left sidebar, click Create New Study.
  2. In the Create New Study dialog box that opens, provide a name and description for the study you are creating.

     Figure: Create Study page

...

  1. Click Save.

This opens an Edit Study page where you can identify data files for your study. See Creating/Editing a Study.

Creating/Editing a Study

The Edit Study page, shown in the following figure, displays the Name and Description that you entered for a new study, or for an existing study that you are editing.

Figure: Edit Study page

...

To continue, you can add subject annotation data sources, genomic data sources or imaging data sources.

Viewing/Editing a Log

On the Edit Study page, as a study manager you can open a detailed log for the study.

  1. Click View Log on the Edit Study page to simply review an existing log. The log records all steps comprising activity in the study, with the most recent displaying at the top of the log.
  2. To edit a log, click Edit Log on the Edit Study page.
  3. Add an appropriate description/annotations to the individual log entries.
  4. Check the Update box next to the description, then click Save to save the edits. The descriptions will now be available when any user views the log.


Working with Annotations – An Overview

One of the most important factors in creating a study in caIntegrator is properly annotating the data. Because the process can be relatively complex, you might want to review the steps for working with annotations.
Annotation workflow summary:

  1. Add an annotation group. This optional step is for users who have a rigid data dictionary of all annotations relevant to the study. It can also be helpful when a study has many annotations.
  2. Add subject annotation data. This consists of multiple sub-steps:
    1. Add a new subject annotation data source file. This step uploads the file and starts the workflow for assigning uploaded data definitions.
    2. Edit the annotations. This step opens the Define Fields for Subject Data page.
    3. In the Define Fields for Subject Data page, review possible definitions in the annotation group associated with this study.
    4. Assign the visibility of each annotation definition.
    5. Locate and verify the assignment as "identifier" for one annotation.
    6. Review, verify and assign definitions for each annotation. You can do this in one of four ways:
      --Accept existing default definitions as described in the associated annotation group.
      --Create or manage definitions manually.
      --Search for and use definitions existing in other caIntegrator studies.
      --Search for and use definitions from caDSR.
  3. Load the Subject Annotation Source. Up until this point, you can periodically save your work with the annotations, but you must complete this step before you can deploy the study.
  4. Deploy the study.

Adding An Annotation Group

This topic opens from both the Create Annotation Group page and the Edit Annotation Group page. If you plan to create a group, continue with this topic. If you plan to edit an annotation group, see Editing an Annotation Group.
An annotation group is a group of annotation definitions configured in a CSV file. This feature is primarily meant for a Study Manager who has tightly restricted vocabulary definitions that are relevant to a study. In this optional step, you can review the uploaded Group Definition Source file before assigning the appropriate definition for your study.
To add an annotation group, follow these steps:

...

Note: Annotation definitions by default are visible only to the Study Manager's group. They are not visible to all caIntegrator users unless you change the visibility for each.

Editing an Annotation Group

This topic opens from the Edit Annotation Group page. You may want to refer to Adding An Annotation Group if you are adding a group for the first time.
To edit an annotation group, on the Edit Study page for a study with an existing annotation group, click the Edit Group button.

  1. You can change the Name and Description for the group.
  2. A list of annotation definitions applied to the original annotation group displays on the Edit Annotation Group page.
    1. In the drop-down list, you can select a different annotation group for the annotation definition.
    2. You can change the visibility for the annotation definition.
    3. Click Change Assignment to modify the properties of the annotation definition.
  3. Click Update Annotations to confirm your edits for the group.

Adding Subject Annotation Data

The Edit Study page, described in Creating/Editing a Study, opens after you save a new study or click to edit an existing study.

...

From this page you can begin editing the annotations. In the Subject Annotation Data Sources section, click Edit Annotations corresponding to the subject annotations that have been uploaded for the study. This opens the Define Fields for Subject Data page.

Define Fields Page for Editing Annotations

The Define Fields for Subject Data page opens when you click Edit Annotations in the Subject Annotation Data Sources or the Imaging Data Sources section of the Edit Study page. The exception is when you have not yet imported annotations for the imaging data for the study. In that case, when you click the Edit Annotations button in the Imaging Data Sources section, a page opens where you can identify and upload image annotation data.

...

  1. To indicate the unique identifier of choice, on the row showing the column header (PatientID in the figure, but other examples are subject identifier, sample identifier, etc), click Change Assignment in the Field Definition column.

Assigning An Identifier or Annotation

When you click Change Assignment on the Define Fields for Subject Data page, the Assign Annotation Definition for Field Descriptor dialog box opens. In this dialog box you can change the column type and the field definition for the specific data field you selected.

...

  1. To enter a new annotation name, or any other information about the annotation definition, click the New button and enter the information for the fields described below.

    Annotation Field – Field Description

    • Name – Enter the name for the annotation.
    • Definition – Enter the term(s) that define the annotation.
    • Keywords – Insert keyword(s) that can be used to find the annotation in a search, separated by commas.
    • Data Type – Select string (default), numeric, or date.
    • Apply Max Number Mask – This field is available only for numeric-type annotations, or when a new definition is created. This feature is unavailable when permissible values are present.
      Select the box and enter a maximum number for the mask, such as "80" for age. When a query returns values above the mask, the system displays the mask and not the actual age.
      Note: If you enter masks of both "max number" and "range", caIntegrator applies both masks at the same time.
      The Data Dictionary page has a Restrictions column that shows restrictions whenever a mask has been applied.
    • Apply Numeric Range Mask – This field is available only for numeric-type annotations, or when a new definition is created. This feature is unavailable when permissible values are present.
      Select the box and enter a width of range for the mask, such as "5" representing blocks of 5 years. For example, if you enter a width of 5, the query only allows age blocks of 0-5, 6-10, 11-15, etc. When you query results above the value of the mask, the system displays the mask and not the actual age ranges.
      Note: If you enter masks of both "max number" and "range", caIntegrator applies both masks at the same time.
      The Data Dictionary page has a Restrictions column that shows restrictions whenever a mask has been applied.
    • Permissible/Non-permissible Values –
      Note: The first time you load a file, before you assign annotation definitions, these panels may be blank. If the column header for the data is already "recognizable" by caIntegrator, the system makes a "guess" about the data type and assigns the values to the data type in the newly uploaded file. They display in the Non-permissible values section initially. Use the Add and Remove buttons to move the values shown from one list to the other, as appropriate.
      When you select or change annotation definitions by selecting matching definitions, this may add to (or change) the list of non-permissible values in this section.
      If you leave all values for a field in the Non-permissible panel, then when you do a study search, you can enter free text in the query criteria for this field.
      If there are items in the Permissible values list, then the values for this annotation are restricted to only those values. When you perform a study search, you select from a list of these values when querying this field. If there are no items in the Permissible values list, then the field is considered free to contain any value.
      To edit a field's permissible values, you must change the annotation definition. You can do this even after a study has been deployed.
      Note: You cannot edit permissible values in an existing annotation definition. To change permissible values, you must create a new annotation.
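
The two mask fields described above can be illustrated with a minimal Python sketch. It is not caIntegrator code: the function names are hypothetical, the values are assumed to be non-negative whole numbers such as ages, and the block boundaries simply follow the width-5 example given for the range mask (0-5, 6-10, 11-15). How caIntegrator combines the two masks when both are set is not modeled here.

```python
import math


def apply_max_mask(value, max_number):
    """Return the mask instead of the real value when the value exceeds the mask."""
    return max_number if value > max_number else value


def apply_range_mask(value, width):
    """Return the (low, high) block that contains a non-negative whole number.

    Blocks follow the documented example for width 5: 0-5, 6-10, 11-15, ...
    """
    if value <= width:
        return (0, width)
    block = math.ceil(value / width)  # 1-based index of the block
    return ((block - 1) * width + 1, block * width)


print(apply_max_mask(85, 80))   # an age above the mask is shown as the mask
print(apply_range_mask(12, 5))  # an age is shown only as its block
```

With a max number mask of 80, an age of 85 is reported as 80; with a range width of 5, an age of 12 is reported only as the block 11-15.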

Searching for Annotation Definitions

An alternative to creating a new definition is to search for annotation definitions already present in caIntegrator studies or in caDSR.

...

Note: You can repeatedly upload additional or updated subject annotations, samples, image data, and array data to the study at later intervals. These later imports do not remove any existing data; they instead insert any new subjects or update annotations for existing subjects.
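
The insert-or-update behavior of later imports can be pictured with a short Python sketch. This is an illustration only, not caIntegrator's implementation: the function name and the dictionary layout (subject ID mapped to annotation values) are assumptions, and the sketch assumes that an update replaces only the annotations present in the new import.

```python
def merge_import(existing, incoming):
    """Merge a later import into existing subject annotations.

    Both arguments map subject ID -> {annotation name: value}.
    Returns a new dict; nothing already in `existing` is removed.
    """
    merged = {subject: dict(annotations) for subject, annotations in existing.items()}
    for subject, annotations in incoming.items():
        # New subjects are inserted; existing subjects get updated values.
        merged.setdefault(subject, {}).update(annotations)
    return merged


study = {"P1": {"Age": 50, "Sex": "F"}}
later = {"P1": {"Age": 51}, "P2": {"Age": 60}}
print(merge_import(study, later))
```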

Defining Survival Values

Survival value is the length of time a patient lived. If you plan to analyze your caIntegrator data to create a Kaplan-Meier (K-M) Plot, then during the Annotation Definition process described above, you should do one of two things:

...

The Edit Survival Value Definitions page has a radio button that lets you choose between two ways of defining survival values.

Adding/Editing Genomic Data

Note: Genomic data that is parsed and stored in caArray can be analyzed in caIntegrator. Additionally, supplemental files in caArray that have not been parsed can be uploaded and analyzed in caIntegrator.

...

  1. On the Edit Study page where you have selected and added the subject annotation data, click the Add New button under Genomic Data Sources. You can upload genomic data only from caArray.

This opens the Edit Genomic Data Source dialog box. Enter the appropriate information in the fields, which are described below.

Figure: Edit Genomic Source dialog box

  • caArray Web URL – Enter the URL for the caArray to be used for the genomic data sources. This will enable a user to link to the referenced caArray experiment from the study summary page.
  • caArray Host Name – Enter the hostname for your local installation or for the CBIIT installation of caArray. If you misspell it, you will receive an error message.
  • caArray JNDI Port – Enter the appropriate server port. See your administrator for more information. Example: For the CBIIT installation of caArray, enter 8080.
  • caArray Username and caArray Password – If the data is private, you must enter your caArray account user name and password; you must have permissions in caArray for the experiment. If the data is public, you can leave these fields blank.
  • caArray Experiment ID – Enter the caArray Experiment ID which you know corresponds with the subject annotation data you uploaded. Example: Public experiment "beer-00196" on the CBIIT installation of caArray. If you misspell your entry, you will receive an error message.
  • Vendor – Select either Agilent or Affymetrix.
  • Data Type – Select Expression or Copy Number.
  • Platform – If appropriate, select the Agilent or Affymetrix platform.
    Note: Because you can add more than one set of genomic data to a study, a study can also have multiple platforms, one for each set of genomic data.

  • Central Tendency for Technical Replicates – If more than one hybridization is found for the reporter, the hybridizations will be represented by this method.
  • Indicate if technical replicates have high statistical variability – If more than one hybridization is found, checking this box will display a ** in the genomic search results when a reporter value has high statistical variability.
  • Standard Deviation Type – When the checkbox for indicating if technical replicates have high statistical variability is checked, this parameter becomes available. Select in the drop-down the calculation to be used to determine whether or not to display a ** (see previous bullet point).
    --Relative, which calculates the Relative Standard Deviation as a percentage value
    --Normal, which calculates the Standard Deviation as a numeric value
  • Standard Deviation Threshold – When the checkbox for indicating if technical replicates have high statistical variability is checked, this parameter becomes available. This is the threshold at which the Standard Deviation Type is exceeded and the reporter is marked with a **.
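
The replicate settings above can be sketched in Python to show how they relate to one another. This is a rough illustration under stated assumptions, not caIntegrator's calculation: the function name is hypothetical, "mean" and "median" are assumed to be the available central tendency methods, and the sample standard deviation is used because the source does not say which estimator applies.

```python
import statistics


def summarize_replicates(values, central="mean", sd_type="relative", threshold=None):
    """Collapse technical replicates for one reporter into (value, flagged).

    flagged is True when the chosen standard deviation measure exceeds the
    threshold, which corresponds to showing ** next to the reporter value.
    """
    if central == "mean":
        center = statistics.mean(values)
    elif central == "median":
        center = statistics.median(values)
    else:
        raise ValueError(f"unknown central tendency: {central}")

    flagged = False
    if threshold is not None and len(values) > 1:
        sd = statistics.stdev(values)  # sample standard deviation (assumption)
        if sd_type == "relative":
            mean = statistics.mean(values)
            # Relative standard deviation, expressed as a percentage of the mean.
            spread = 100.0 * sd / abs(mean) if mean != 0 else float("inf")
        else:
            spread = sd
        flagged = spread > threshold
    return center, flagged


print(summarize_replicates([8.0, 10.0, 12.0], sd_type="relative", threshold=15))
print(summarize_replicates([8.0, 10.0, 12.0], sd_type="normal", threshold=2.5))
```

For the three replicates shown, the standard deviation is 2.0 and the relative standard deviation is 20 percent, so the reporter is flagged under a relative threshold of 15 but not under a normal threshold of 2.5.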

...

caIntegrator goes to caArray, validates the information you have entered here, finds the experiment and retrieves all the sample IDs in the experiment. Once this finishes, the experiment information displays on the Edit Study page under the Genomic Data Sources section.

Figure: Genomic Data Sources section of the Edit Study page

...

  1. If you want to redefine the caArray experiment information, you can edit it. Click the Edit link corresponding to the Experiment ID. The Edit Genomic Data Source dialog box reopens, allowing you to edit the information.

Mapping Genomic Data to Subject Annotation Data

Because the goal of caIntegrator is to integrate data from subject annotation, genomic and imaging data sources, data from uploaded source files must be mapped to each other. Mapping files can map to caArray genomic data of two types: "imported and parsed" and that stored in supplemental files.

Creating a Mapping File

You, as the caIntegrator study manager, must create a Subject to Sample mapping file before following the actual mapping steps. This file provides caIntegrator with the information for mapping patients to caArray samples.

...

The following figure shows an example multiple sample mapping file in CSV format.

Figure: Mapping file in CSV format, showing multiple samples

  • When you use the mapping file, make sure you use the patient ID for mapping.
  • Determine whether your data in caArray is "imported and parsed" or "supplemental". Fill in the 6-column mapping file according to the following standard:
    • Imported and parsed – Complete only the first two columns of the 6-column mapping file as described above. You can ignore the remaining columns.
    • Supplemental – Supplemental data comes in two forms: "single sample per file" and "multiple samples per file". Only one of the last two columns is used. If the supplemental data format is:
      • Single sample per file – the column named "Sample_Header" can be left empty.
      • Multiple samples per file – the column named "Value_Header" can be left empty.
  • Supplemental files from caArray for mapping data must be configured appropriately.
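
The column rules above can be checked before upload with a small script. The following Python sketch is only an illustration: the function names and the data-kind labels are hypothetical, the positions of "Value_Header" and "Sample_Header" within the last two columns are assumed (the text names them but does not give their order), and the sketch assumes the first two columns are needed for every kind of data.

```python
import csv
import io

# Assumed 0-based positions of the last two columns of the 6-column file.
VALUE_HEADER_COL = 4
SAMPLE_HEADER_COL = 5


def check_mapping_row(row, data_kind):
    """Return a list of problems found in one mapping-file row.

    data_kind is "parsed", "supplemental_single" or "supplemental_multi"
    (labels invented for this sketch).
    """
    if len(row) > 6:
        return [f"expected at most 6 columns, found {len(row)}"]
    cells = [cell.strip() for cell in row] + [""] * (6 - len(row))
    problems = []
    if not cells[0] or not cells[1]:
        problems.append("first two columns must be filled in")
    if data_kind == "supplemental_single" and not cells[VALUE_HEADER_COL]:
        problems.append("Value_Header is empty for single-sample-per-file data")
    if data_kind == "supplemental_multi" and not cells[SAMPLE_HEADER_COL]:
        problems.append("Sample_Header is empty for multiple-samples-per-file data")
    return problems


def check_mapping_text(text, data_kind):
    """Check every row of CSV text; return {line number: problems} for bad rows."""
    report = {}
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        problems = check_mapping_row(row, data_kind)
        if problems:
            report[line_no] = problems
    return report


print(check_mapping_text("P1,S1,,,,\nP2,,,,,\n", "parsed"))
```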

The steps described in Steps for Mapping Genomic Data use data of either type.

Steps for Mapping Genomic Data

To map the samples from the caArray experiment to the patients in the subject annotation data you uploaded, follow these steps:

  • On the Edit Study page, click the Map Samples button. This opens the Edit Sample Mappings page.

    Figure: Edit Sample Mappings page showing some already mapped samples

  • The first two caArray fields may be populated with the information for the instance of caArray to which you have access. You can, however, enter the following caArray information, if appropriate.
    • caArray Host Name – Enter the hostname for your local installation or for the CBIIT installation of caArray. If you misspell it, you will receive an error message.
    • caArray JNDI Port – Enter the appropriate server port. See your administrator for more information. Example: For the CBIIT installation of caArray, enter 8080.
    • caArray Username – Enter your caArray account user name and password; you must have permissions in caArray for the experiment if it is private. If the data is public, you can leave this field blank.
    • caArray Experiment ID – Enter the caArray Experiment ID which you know corresponds with the subject annotation data you uploaded. Example: Public experiment "beer-00196" on the CBIIT installation of caArray. If you misspell your entry, you will receive an error message.
  • Enter the Loading Type of the data file you plan to map. (File types are described in Creating a Mapping File.)
  • In the Subject to Sample Mapping File section, click Browse to navigate to the Sample Mapping CSV file that you created (described in Creating a Mapping File). This provides caIntegrator with the information for mapping patients to caArray samples.
  • Click the Map Samples button.

...

  • If you have already mapped samples, when you first open this page they are listed in the Samples Mapped to Subjects section. If you have not already mapped samples, all of the samples in the caArray experiment you selected are listed as unmapped, because caIntegrator does not know how these sample names correlate to the patient data in the subject annotation file until you upload the subject to sample mapping file.
  • Scroll down the page to see samples that are mapped to the patients in the subject annotation data.

    Figure: Example of samples mapped to patients' data

Uploading Control Samples

A Control Samples file is used to calculate fold change data, which compares "tumor" sample gene expression in the caArray experiment to the control samples to identify those that exhibit up or down gene regulation. Control samples can be the "normal" samples, but that is not necessarily the case.
To upload the control samples, follow these steps:

...

  • This information will be used when performing other tasks in caIntegrator, to be described in other sections.
  • If a Control Set is to be used in Gene Expression For Annotation, or Gene Expression plots for Annotation Query, then the control set should be composed of only samples which are mapped to subjects.
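
To make the role of the control samples concrete, here is a minimal Python sketch of a fold change comparison for a single reporter. It is an assumption-laden illustration rather than caIntegrator's method: the function names are hypothetical, the control samples are summarized by their mean, and the 2-fold cutoff is an arbitrary example value.

```python
from statistics import mean


def fold_change(sample_value, control_values):
    """Ratio of a sample's expression value to the mean of the control samples."""
    return sample_value / mean(control_values)


def regulation(sample_value, control_values, fold=2.0):
    """Classify a reporter as 'up', 'down' or 'unchanged' relative to controls."""
    ratio = fold_change(sample_value, control_values)
    if ratio >= fold:
        return "up"
    if ratio <= 1.0 / fold:
        return "down"
    return "unchanged"


print(fold_change(8.0, [2.0, 2.0]))
print(regulation(1.0, [4.0, 4.0]))
```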

Configuring Copy Number Data

You can add copy number data for a genomic data source by uploading the mapping file. This allows you to configure parameters to be used when segmentation data is being configured.
The name specified in the third column of the mapping file is specific for each array manufacturer as follows:

...

  • Browse for and enter appropriate information to identify and retrieve the copy number mapping file. The fields are described below. An asterisk (*) indicates a required field.

    Fields for retrieving a copy number mapping file

    Field – Description

    • caArray Service Host Name – Enter the hostname for your local installation or for the CBIIT installation of caArray. If you misspell it, you will receive an error message.
    • caArray Experiment ID – Enter the caArray Experiment ID which you know corresponds with the copy number data.
    • Loading Type – Enter the Loading Type of the data file you plan to map.
    • Subject and Sample Mapping File – Browse for the appropriate CN mapping file. The file must be a CSV file in the 3-column format for mapping data files. Supplemental data uses 6-column files.
    • Bioconductor Service Type – This is the type of Bioconductor module that will be used for segmentation. Select between the two options: DNAcopy or CGHcall.
    • caCGHcall Service URL – Enter the URL for the grid segmentation service used to access the caCGHcall service.
    • Call Level – An input parameter to CGHcall. This is the number of discrete values used to represent the copy number level. Select between two options: 3 (consisting of discrete values of -1, 0, 1) or 4 (consisting of discrete values -1, 0, 1, 2).
    • caDNACopy Service URL* – Select the URL that hosts the caDNACopy grid service.
    • Change Point Significance Level – Significance levels for the test to accept change-points.
    • Early Stopping Criterion – The sequential boundary used to stop and declare a change.
    • Permutation Replicates – The number of permutations used for p-value computation.
    • Random Number Seed – The segmentation procedure uses a permutation reference distribution. Set this value if you plan to reproduce the results.

  • Click Save Segmentation Data Calculation Configuration for a genomic data source. On the screen, upload a copy number mapping file (format: subject id, sample id, file name) and configure the parameters to be sent when computing segmentation data.
  • After a study has been deployed and the genomic source has been loaded, you cannot change these copy number parameters without reloading the data from caArray first.
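
The 3-column copy number mapping file (subject id, sample id, file name) is simple enough to inspect with a few lines of Python before uploading. The sketch below is illustrative only: the function name is hypothetical, and it assumes the file has no header row.

```python
import csv
import io
from collections import defaultdict


def read_copy_number_mapping(text):
    """Parse 3-column CSV text into {subject id: [(sample id, file name), ...]}."""
    by_subject = defaultdict(list)
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, found {len(row)}")
        subject_id, sample_id, file_name = (cell.strip() for cell in row)
        by_subject[subject_id].append((sample_id, file_name))
    return dict(by_subject)


example = "P1,S1,a.cel\nP1,S2,b.cel\nP2,S3,c.cel\n"
print(read_copy_number_mapping(example))
```

Grouping by subject makes it easy to spot a subject with no samples or a row with a missing column before the file reaches caIntegrator.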

Remapping Copy Number Data in a Deployed Study

...

Occasionally you may need to remap copy number data in a deployed study. To do so, follow these steps:

...

Working with Imaging Data

Once you have loaded subject annotation data and identified patient IDs, you can either add array genomic sample data from caArray, which caIntegrator maps by sample IDs to the patient IDs in the subject annotation data, or upload image data from NBIA, also mapped by IDs to the subject data. Once you have configured an NBIA image data source for adding images, you can import image annotation data for the images. Genomic sample data and imaging data are independent of each other, so neither is required before loading the other.
It is essential that you are well acquainted with the data you are working with: the subject annotation data and the corresponding imaging data in NBIA.

Adding or Editing Imaging Data Files from NBIA

To add images from NBIA to the study you are creating, follow these steps:

  • On the Edit Study page, under the Imaging Data Sources section click the Add New button.
  • If you have already provided an imaging data source, it is listed in this section of the Edit Study page. To edit the imaging data source, click the Edit button which opens the same dialog box described in the following steps.
  • In the Edit Imaging Data Source dialog box, configure the imaging data source in the fields. Asterisks indicate required fields.

    Figure: Edit Image Data Source dialog box

    • NBIA Server Grid URL* – Enter the URL for the grid connection to NBIA.
    • NBIA Web URL* – Enter the URL of the web interface of the NBIA installation.
    • NBIA Username and NBIA Password – This information is not required, as currently all data in the NBIA grid is public data.
    • Collection Name* – Enter the name/source for the collection you want to retrieve.
    • Current Mapping – If a mapping file has already been uploaded to the study to map imaging data, the file name displays here.
    • Select Mapping File Type* – Click to select the file type:
      • Auto – No file is required. Selecting this takes all subject annotation subject IDs and attempts to map them to the corresponding ID in the collection in NBIA. If the ID does not exist in NBIA, then no mapping is made for that ID.
      • By Subject – Requires a mapping file to be uploaded. The "subject annotation to imaging mapping file" must be in CSV format with two columns that map the caIntegrator subject annotation subject ID to the NBIA subject ID.
      • By Image Series – Requires a file to be uploaded. The subject annotation to imaging mapping file needs to be a two-column mapping (CSV) from the caIntegrator subject annotation subject ID to the NBIA study instance UID.
    • Subject to Imaging Mapping File – Click Browse to navigate to the appropriate subject annotation to imaging mapping file. See the Select Mapping File Type* field description.
  • If mapping files have already been uploaded for the data sources you are editing, the Image Mapping tables of the dialog box show the mapping from NBIA Image Series Identifier to caIntegrator Subject Identifier.
  • Click Save to upload the data from NBIA to caIntegrator.
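
The Auto mapping option described above amounts to matching identical IDs on both sides. A minimal Python sketch of that idea follows; the function name and the plain lists of IDs are assumptions made for illustration, and real NBIA access is not shown.

```python
def auto_map(subject_ids, nbia_ids):
    """Map each caIntegrator subject ID to the identical NBIA ID, if it exists.

    Returns (mapping, unmapped): subject IDs with no matching NBIA ID
    get no mapping and are listed in `unmapped`.
    """
    available = set(nbia_ids)
    mapping = {sid: sid for sid in subject_ids if sid in available}
    unmapped = [sid for sid in subject_ids if sid not in available]
    return mapping, unmapped


mapping, unmapped = auto_map(["P1", "P2", "P3"], ["P1", "P3", "P9"])
print(mapping)
print(unmapped)
```

Listing the unmapped IDs separately shows which subjects would need a By Subject or By Image Series mapping file instead.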

...

  • Once the data is uploaded, you can add image annotations. For more information, see Adding or Editing Image Annotations.

Adding or Editing Image Annotations

After you have configured an image data source with an NBIA Grid service and uploaded the image data, described in Adding or Editing Imaging Data Files from NBIA, you can load image annotations into caIntegrator from a file in CSV format or through an Annotations and Image Markup (AIM) service.

...

Click the link to open a page that displays appropriately formatted web page links.

Figure: An example of external links

Deploying the Study

When you are ready to deploy the study, click the Deploy Study button on the Edit Study page. caIntegrator retrieves the selected data from the data service(s) you defined and makes the study available to a study manager or to anyone else who may want to analyze the study's data. Using the Manage Studies feature, you can then configure and share data queries and data lists with all investigators who access the study.
Note that you can continue to work in caIntegrator while the study is being deployed.

Managing a Study

  • A user without management privileges has no access to this section of caIntegrator.

...

  • Click the Delete link to delete the corresponding study.

Managing Platforms

caIntegrator supports a limited number of array platforms, all of which originate from Agilent or Affymetrix. While they do not represent all of the platforms supported by caArray, caIntegrator must have array definitions loaded for the platforms it supports, and be able to properly load the data from caArray and parse it.
You can create a study without genomic data, but you cannot add genomic data to a caIntegrator study without a corresponding supported array platform. If you add more than one set of genomic data to the study, you can specify more than one platform for the study.
On the Manage Platforms page, you can identify, add or remove supported platforms.
To manage platforms in caIntegrator, follow these steps:

...