This chapter describes the processes for creating and managing studies in caIntegrator. Topics in this chapter include:

Creating a Study – Overview

You can create a caIntegrator study by importing subject annotation data, genomic data, and imaging data, using a combination of spreadsheets, files, and existing caGrid applications as source data. Each instance of caIntegrator can support multiple studies. As the manager creating a study, you should understand the study well and make sure that the data you wish to aggregate has been submitted to the applications whose data can be integrated in caIntegrator.

As you create the study, you define its structure, identifying the data sources and mapping the data between different source data. After the study has been created and deployed, you can perform analyses of the data in the study.

Configuring and Deploying a Study

Only a user with a Study Manager role can create a study.

When you create a study, you must specify the data types (subject annotation, array, image, etc.), identify the data sources (the caGrid applications caArray and NBIA), and map the data (patient to sample, image series, etc.).

To create a new study, follow these steps:

  1. In the Study Management section of the left sidebar, click Create New Study.
  2. In the Create New Study dialog box that opens, provide a name and description for the study you are creating.
    Create Study page
  3. Click Save.

This opens an Edit Study page where you can identify data files for your study. See Creating/Editing a Study.

Creating/Editing a Study

The Edit Study page, shown in the following figure, displays the Name and Description that you entered for a new study, or for an existing study that you are editing.
Edit Study page

To continue creating a study or to modify a study, on the Edit Study page complete these steps:

  1. Enter or change (if editing) the name and/or description, if you choose.
  2. Check the checkbox to make the study publicly available, if appropriate.
  3. For the study log feature, click View Log or Edit Log. See Viewing/Editing a Log for details about the log.
  4. Click Save.

    You can save the study at any point in the process of creating it. You can resume the definition and deployment process later.

  5. If you choose to add a logo for the study, click the Browse button corresponding to Logo File. Navigate to the file, then click Upload Now. Once you save the study (or its edits), the logo displays in the center of the page. On the home page for the study, the logo displays in the upper left, above the sidebar.
    Example of a logo added to the caIntegrator browser on the Edit Study page

To continue, you can add subject annotation data sources, genomic data sources or imaging data sources.

Viewing/Editing a Log

On the Edit Study page, as a study manager you can open a detailed log for the study.

  1. Click View Log on the Edit Study page to simply review an existing log. The log records all steps comprising activity in the study, with the most recent displaying at the top of the log.
  2. To edit a log, click Edit Log on the Edit Study page.
  3. Add an appropriate description/annotations to the individual log entries.
  4. Check the Update box next to the description, then click Save to save the edits. The descriptions will now be available when any user views the log.


Working with Annotations – An Overview

One of the most important factors in creating a study in caIntegrator is properly annotating the data. Because the process can be relatively complex, you might want to review the steps for working with annotations.
Annotation workflow summary:

  1. Add an annotation group. This optional step is for users who have a rigid data dictionary of all annotations relevant to the study. This step can also be helpful in cases where a study has many annotations. For more information, see Adding An Annotation Group.
  2. Add subject annotation data. This consists of multiple sub-steps.
    1. Add a new subject annotation data source file. This step uploads the file and starts the workflow for assigning uploaded data definitions. See Adding Subject Annotation Data.
    2. Edit the annotations. This step opens the Define Fields for Subject Data page. See Define Fields Page for Editing Annotations.
    3. In the Define Fields for Subject Data page, review possible definitions in the annotation group associated with this study.
    4. Assign the visibility of each annotation definition.
    5. Locate and verify the assignment as "identifier" for one annotation. See Assigning An Identifier or Annotation.
    6. Review, verify and assign definitions for each annotation. You can do this in one of four ways:
      --Accept existing default definitions as described in the associated annotation group.
      --Create or manage definitions manually.
      --Search for and use definitions existing in other caIntegrator studies. See Searching for Annotation Definitions.
      --Search for and use definitions from caDSR. See Searching for Annotation Definitions.
  3. Load the Subject Annotation Source. Up until this point, you can periodically save your work with the annotations, but before you can deploy the study, you must complete this step.
  4. Deploy the study. See Deploying the Study.

Adding An Annotation Group

This topic opens from both the Create Annotation Group page and the Edit Annotation Group page. If you plan to create a group, continue with this topic. If you plan to edit an annotation group, see Editing an Annotation Group.
An annotation group is a group of annotation definitions configured in a CSV file. This feature is primarily meant for a Study Manager who has tightly restricted vocabulary definitions that are relevant to a study. In this optional step, you can review the uploaded Group Definition Source file before assigning the appropriate definition for your study.
To add an annotation group, follow these steps:

  1. On the Edit Study page for a study, Annotation Groups section, click the Add New button.
  2. On the Edit Annotation Group page that opens, enter a name for the annotation group.
  3. Enter a description (optional).
  4. Browse for the Group Definition Source CSV file.
    The CSV file must include columns with these column headers in the first row: File Column Name, Field Type, Entity Type, CDE ID, CDE Version, Annotation Def Name, Data Type, Permissible, and Visible. Subsequent rows in the file define each subject annotation column in the subject annotation file.
    1. If a subject annotation is defined by a CDE Public ID, values for the following columns are required: File Column Name, Field Type, Entity Type, CDE ID, and Visible; a value for CDE Version is optional.
      – OR –
    2. If a subject annotation definition is not defined by a CDE Public ID, values for the following columns are required: File Column Name, Field Type, Entity Type, Annotation Def Name, Data Type (String, Date, Numeric), Permissible (Yes or No), and Visible (Yes or No).
  5. Click Save. This uploads the file, whose name now displays on the Edit Study page under Annotation Groups.
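
Before uploading, it can help to check the Group Definition Source file against the column rules in step 4. The following is a minimal sketch in Python, not part of caIntegrator; the function name `check_group_definition` is hypothetical, and it assumes that a row with a non-empty CDE ID follows the CDE rule while any other row follows the manual-definition rule.

```python
import csv
import io

# Column headers that the text says must appear in the first row.
REQUIRED_HEADERS = [
    "File Column Name", "Field Type", "Entity Type", "CDE ID", "CDE Version",
    "Annotation Def Name", "Data Type", "Permissible", "Visible",
]
# Columns that need a value when the annotation is defined by a CDE Public ID.
CDE_REQUIRED = ["File Column Name", "Field Type", "Entity Type", "CDE ID", "Visible"]
# Columns that need a value when the annotation is defined manually.
MANUAL_REQUIRED = [
    "File Column Name", "Field Type", "Entity Type",
    "Annotation Def Name", "Data Type", "Permissible", "Visible",
]


def check_group_definition(csv_text):
    """Return a list of problems found in a Group Definition Source CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [h for h in REQUIRED_HEADERS if h not in (reader.fieldnames or [])]
    if missing:
        return ["missing column headers: " + ", ".join(missing)]
    problems = []
    # The header is line 1 of the file, so data rows start at line 2.
    for row_number, row in enumerate(reader, start=2):
        # ASSUMPTION: a non-empty CDE ID selects the CDE rule for that row.
        has_cde = (row["CDE ID"] or "").strip()
        required = CDE_REQUIRED if has_cde else MANUAL_REQUIRED
        empty = [name for name in required if not (row[name] or "").strip()]
        if empty:
            problems.append("row %d: missing %s" % (row_number, ", ".join(empty)))
    return problems
```

An empty list means every row satisfies one of the two rules. The check covers only the presence of values; it does not verify that a CDE ID exists in caDSR.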

When you open the Define Fields for Subject Data page (see Define Fields Page for Editing Annotations), the annotation definitions in the file you uploaded display on the page, available for assignment in the study. Additionally, you can view the definitions by viewing the annotation group listed in the first column of the matrix.

By default, annotation definitions are visible only to the Study Manager's group. They are not visible to all caIntegrator users unless you change the visibility for each.

Editing an Annotation Group

This topic opens from the Edit Annotation Group page. You may want to refer to Adding An Annotation Group if you are adding a group for the first time.
To edit an annotation group, on the Edit Study page for a study with an existing annotation group, click the Edit Group button.

  1. You can change the Name and Description for the group.
  2. A list of annotation definitions applied to the original annotation group displays on the Edit Annotation Group page.
    1. In the drop-down list, you can select a different annotation group for the annotation definition.
    2. You can change the visibility for the annotation definition.
    3. Click Change Assignment to modify the properties of the annotation definition.
  3. Click Update Annotations to confirm your edits for the group.

Adding Subject Annotation Data

The Edit Study page, described in Creating/Editing a Study, opens after you save a new study or click to edit an existing study.

To add subject annotation metadata on this page, follow these steps:

  1. In the Subject Annotation Data Sources section of the page, click the Add New button. The page expands to reveal new fields for you to identify information about the annotation data sources.
  2. Browse to the subject annotation data file, which is required for a study. The file must be in CSV format.
  3. Click the appropriate box if you want caIntegrator to Create an annotation definition if one is not found.
  4. Click Upload Now to load the annotation source data.

After the data file is uploaded to this study, it will be listed in the Subject Annotation Data Sources section of the Edit Study page.

From this page you can begin editing the annotations. In the Subject Annotation Data Sources section, click Edit Annotations corresponding to the subject annotations that have been uploaded for the study. This opens the Define Fields for Subject Data page.

Define Fields Page for Editing Annotations

The Define Fields for Subject Data page opens when you click Edit Annotations in the Subject Annotation Data Sources section or the Imaging Data Sources section of the Edit Study page. The exception is when you have not yet imported annotations for the imaging data for the study. In that case, when you click the Edit Annotations button in the Imaging Data Sources section, a page opens where you can identify and upload image annotation data.

When the Define Fields page opens after you click the Edit Annotations button, you work with it in the same way for both subject and image annotations.
Define Fields for Subject Data page

The first column of the table on this page displays annotation groups that have been created for this study. For more information, see Adding An Annotation Group.
To add subject or image annotation metadata in this page, follow these steps:

  1. You can specify visibility of specified annotation data in the Visible column.
  2. The Annotation Header from File column on the Define Fields for Subject (or Image) Data page displays column headers taken from the source CSV file. The page also displays data values in the file you have designated. You must map each column name to an existing column name in the caIntegrator database or in caDSR. If it doesn't yet exist, you can create a custom column name, as shown in the following example figure.
    Example of a source CSV file whose data you are mapping in caIntegrator

The MOST important steps in creating a new study in caIntegrator:

Note the following regarding the list of annotations on this page:
– If caIntegrator "recognizes" the same column header in other files already in the system, a term, for example "age" or "survival", which is the current definition, appears in the Annotation Definition column above the blue Change Assignment link.
– When the annotation definition has not been assigned, and the area above the blue Assign Annotation Definition link is blank, no correlating term exists in the database. In this case, you must specify the field type, and then the term will populate the space. See Assigning An Identifier or Annotation for more information.
– A field name that displays in red indicates an error in the annotation. Click the Change Assignment button for more information about the error.

  3. To indicate the unique identifier of choice, on the row showing the column header (PatientID in the figure, but other examples are subject identifier, sample identifier, etc.), click Change Assignment in the Field Definition column.

Assigning An Identifier or Annotation

When you click Change Assignment on the Define Fields... page, the Assign Annotation Definition for Field Descriptor dialog box opens. In this dialog box you can change the column type and the field definition for the specific data field you selected.

When you change an assignment, you must make sure the data types match--numeric, etc.

The Assign Annotation Definition dialog box

  1. For the column (PatientID) that you choose to be the one and only Identifier column, in the Column Type drop-down list, select Identifier.
  2. Click Save to save the identifier. This returns you to the Define Fields for Subject Data page where the Identifier is noted in the Field Definition column.
  3. After you have defined which field is the Identifier, you must ensure that ALL other fields also have a field definition assignment. For those fields without a Field Definition assignment or for those whose Annotation Definition you want to review, click Change Assignment.
  4. In the Assign Annotation Definition for Field Descriptor dialog box, select Annotation in the drop-down list.

As you select the column type, you can work with column headers in one of four ways in this dialog box.

  1. Review the current annotation definition in the Assign Definition page, Current Annotation Definition section. Click Cancel to return to the Define Fields... page.

You can still initiate a search for another annotation definition in the Search for an Annotation Definition section if you choose to change the definition. See Searching for Annotation Definitions. Click Save to retain any changes.
Current Annotation Definition

  2. To enter a new annotation name, or any other information about the annotation definition, click the New button and enter the information described in the following table.

    Annotation Field

    Field Description

    Name

    Enter the name for the annotation.

    Definition

    Enter the term(s) that define the annotation.

    Keywords

    Insert keyword(s) that can be used to find the annotation in a search, separated by commas.

    Data Type

    Select a string (default), numeric, or date.

    Apply Max Number Mask

    This field is available only for numeric-type annotations, or when a new definition is created. This feature is unavailable when permissible values are present.
    Select the box and enter a maximum number for the mask, such as "80" for age. When you query results above the value of the mask, then the system displays the mask and not the actual age.
    Note: If you enter masks of both "max number" and "range", caIntegrator applies both masks at the same time.
    The Data Dictionary page now has a Restrictions column that shows restrictions whenever a mask has been applied.

    Apply Numeric Range Mask

    This field is available only for numeric-type annotations, or when a new definition is created. This feature is unavailable when permissible values are present.
    Select the box and enter a width of range for the mask, such as "5" representing blocks of 5 years. For example, if you enter a width of 5, the query only allows age blocks of 0-5, 6-10, 11-15, etc. When you query results above the value of the mask, then the system displays the mask and not the actual age ranges.
    Note: If you enter masks of both "max number" and "range", caIntegrator applies both masks at the same time.
    The Data Dictionary page now has a Restrictions column that shows restrictions whenever a mask has been applied.

    Permissible/Non-permissible Values

    Note: The first time you load a file, before you assign annotation definitions, these panels may be blank. If the column header for the data is already "recognizable" by caIntegrator, the system makes a "guess" about the data type and assigns the values to the data type in the newly uploaded file. They will display in the Non-permissible values section initially. Use the Add and Remove buttons to move the values shown from one list to the other, as appropriate.
    When you select or change annotation definitions by selecting matching definitions (described in Searching for Annotation Definitions), this may add to (or change) the list of non-permissible values in this section.
    If you leave all values for a field in the Non-permissible panel, then when you do a study search, you can enter free text in the query criteria for this field.
    If there are items in the Permissible values list, then the values for this annotation are restricted to only those values. When you perform a study search, you will select from a list of these values when querying this field. If there are no items in the permissible values list then the field is considered free to contain any value.
    To edit a field's permissible values, you must change the annotation definition. You can do this even after a study has been deployed.
    Note: You cannot edit permissible values in an existing annotation definition. To change permissible values, you must create a new annotation.
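
The two numeric masks in the table above can be illustrated with a short Python sketch. It is an illustration only, not caIntegrator's implementation. It assumes non-negative whole-number values, assumes that a masked maximum is displayed as the mask value followed by "+", and assumes that when both masks are set the max number mask takes precedence for values above the maximum. The function name `apply_masks` is hypothetical.

```python
def apply_masks(value, max_number=None, range_width=None):
    """Return the display string for a whole-number value under optional masks."""
    # Max number mask: values above the maximum show the mask, not the value.
    if max_number is not None and value > max_number:
        return "%d+" % max_number  # ASSUMPTION: display format is hypothetical
    if range_width is not None:
        # Blocks follow the example in the text: 0-5, 6-10, 11-15 for width 5,
        # so the first block also includes 0.
        if value <= range_width:
            return "0-%d" % range_width
        low = ((value - 1) // range_width) * range_width + 1
        return "%d-%d" % (low, low + range_width - 1)
    # No mask applies: show the actual value.
    return str(value)
```

With a max number of 80 and a range width of 5, an age of 77 falls in the 76-80 block, while an age of 85 is shown only as the masked maximum.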

Searching for Annotation Definitions

An alternative to creating a new definition is to search for annotation definitions already present in caIntegrator studies or in caDSR.

  1. Enter search keyword(s) in the Search text box on the Assign Annotation Definition page. Click Search or press Enter to launch the search. After a few moments, the search results display on the page, as shown in the following figure.
    Results for annotation definition search
  2. To view the definition corresponding to any of the "Matching Annotation Definitions", which are those currently found in other caIntegrator studies, click the hypertext link for the term, such as "age". The definition then appears in the Current Annotation Definition segment of the page just above.

In summary, when you click the link, that assigns the definition to the Define Fields for Subject Data page, and it also closes the Annotation Definition page.

You can modify any portion of the definition, as described in Assigning An Identifier or Annotation.

  1. The matches from caDSR display some of the details of the search results. To view more details of a match, such as permissible values, click View, which opens caDSR to the term. If you click Select, the caDSR definition automatically replaces the annotation definition for the field you are working with.
    Caution: Before you add a caDSR definition, make sure that it says exactly what you want. caDSR definitions can have minor nuances that require specific and limited applications of their use.

  2. Once you have settled on an appropriate field definition for the annotation, click Save. This returns you to the Define Fields for Subject Data page.

    If you have not clicked Select for alternate definitions in this dialog box, then click Save to return to the Define Fields... page without making any definition changes.

  3. From the Define Fields for Subject Data page, be sure to designate the data types for each field in the file. Click Save on each page to save your entries or click New to clear the fields and start again. You will not be able to proceed until every field on the Define Fields for Subject Data page has a field definition entry, one as the unique Identifier and the remainder as annotations.
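
The completeness rule in the last step (exactly one Identifier, every other field assigned as an annotation with a definition) can be expressed as a small check. This is a hedged Python sketch of the rule as stated in the text, not caIntegrator code; the function name and the dictionary layout are hypothetical.

```python
def check_field_assignments(assignments):
    """Check column assignments before loading.

    assignments maps each source column header to a tuple of
    (column_type, definition), where column_type is "Identifier",
    "Annotation", or None, and definition is a name or None.
    """
    problems = []
    identifiers = [h for h, (ctype, _) in assignments.items() if ctype == "Identifier"]
    if len(identifiers) != 1:
        problems.append(
            "expected exactly one Identifier column, found %d" % len(identifiers)
        )
    for header, (ctype, definition) in assignments.items():
        if ctype == "Annotation" and not definition:
            problems.append("%s: no annotation definition assigned" % header)
        elif ctype not in ("Identifier", "Annotation"):
            problems.append("%s: no column type selected" % header)
    return problems
```

An empty result corresponds to the state in which the text says you can proceed to loading the Subject Annotation Source.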

The Data From File columns on the page display the column header values of the first three rows you designated as "annotations".

Saving your entries in this way saves the study by name and description, but does not deploy the study. See Deploying the Study.

The Edit Study page now displays a "Not Loaded" status for the file whose annotations (column headers) you have defined. An example of a file whose annotations have been defined but not yet loaded is shown in the following figure.
Example file whose annotations have been defined but not yet loaded

Status definitions:

  1. Click the Load Subject Annotation Source button in the Action section to load the data file you have configured. The Deploy Study button has been unavailable up to this point, but this step activates it.

    You can add as many files as are necessary for a study. For example, patients 1-20 can be in the first file and patients 21-40 in the second, or many patients can be in the first file and their annotations in the second. As long as the IDs are defined correctly, the files can be combined in the study.

The Manage Studies page opens when the study is deployed. The Deployed status is indicated on the Manage Studies page as well as the Edit Study page. For more information, see Managing a Study.
You can continue to perform other tasks in caIntegrator while deployment is in process.

You can repeatedly upload additional or updated subject annotations, samples, image data, and array data to the study at later intervals. These later imports do not remove any existing data; they instead insert any new subjects or update annotations for existing subjects.
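
The insert-or-update behavior described above, including splitting subjects or annotations across several files, can be pictured with the following Python sketch. It models a study as a plain dictionary keyed by subject ID; the function name `import_annotations` and the default identifier column name are assumptions for illustration, not caIntegrator internals.

```python
def import_annotations(study, rows, id_field="PatientID"):
    """Merge annotation rows into study without removing existing data.

    study maps subject ID to a dict of annotation values. Each row is a
    dict that holds the identifier column plus any annotation columns.
    """
    for row in rows:
        subject_id = row[id_field]
        values = {name: value for name, value in row.items() if name != id_field}
        # New subjects are inserted; existing subjects get their
        # annotations added or updated, and nothing is deleted.
        study.setdefault(subject_id, {}).update(values)
    return study
```

Importing a first file with patients 1 and 2 and then a second file with patients 2 and 3 leaves all three patients in the study, with patient 2 carrying the values from the later file wherever the two files overlap.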

Defining Survival Values

Survival value is the length of time a patient lived. If you plan to analyze your caIntegrator data to create a Kaplan-Meier (K-M) Plot, then during the Annotation Definition process described above, you should do one of two things:

  1. Make sure that you have defined at least three fields set to the "date" Data Type. These will be matched to the three survival value properties during Survival Value definition.
  2. Alternatively, make sure that an Annotation Field Descriptor such as DAYSTODEATH has been set to the "numeric" Data Type. This also makes it possible to generate K-M plots.

For some applications, such as REMBRANDT and I-SPY, survival values are pre-defined in the databases when you load the data. In caIntegrator, however, you can review and define survival value ranges in a data set you are uploading to a study. To be able to do so, you need to understand the kind of data that can comprise the survival values.
To set up survival values, follow these steps:

  1. Click New to enter new survival value definitions.
  2. The dialog box extends, now displaying radio buttons and three drop-down lists that show column headers for date metadata in the spreadsheet you have uploaded. The following figure displays survival value ranges that have already been added to a study.
    Survival Definitions example

Survival values can be defined by Date or by Length of time in study. Select the radio button for the category that defines your survival data.

In the drop-down lists, select the appropriate survival value definitions for each field listed. You might want to refer to the column headers in the data file itself. Dates covered by the definitions are already in the data set. You cannot enter specific dates.
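
To show how three date fields can yield a survival value for a K-M plot, here is a minimal Python sketch. The text does not name the three properties at this point, so the parameter names `start_date`, `death_date`, and `last_followup_date` are hypothetical, as is the function itself. The sketch uses the usual K-M convention that a subject with no recorded death is censored at the last follow-up.

```python
from datetime import date


def survival_days(start_date, death_date=None, last_followup_date=None):
    """Return (days, event_observed) for one subject.

    ASSUMPTION: the three date fields are a start date, a death date,
    and a last follow-up date; these names are illustrative only.
    """
    if death_date is not None:
        # The event was observed: survival runs from start to death.
        return (death_date - start_date).days, True
    if last_followup_date is not None:
        # No death recorded: the subject is censored at the last follow-up.
        return (last_followup_date - start_date).days, False
    raise ValueError("need a death date or a last follow-up date")
```

When survival is defined by length of time in the study instead, a numeric field such as DAYSTODEATH would supply the number of days directly and no date arithmetic is needed.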


Adding/Editing Genomic Data

Genomic data that is parsed and stored in caArray can be analyzed in caIntegrator. Additionally, supplemental files in caArray that have not been parsed can be uploaded and analyzed in caIntegrator.

Once you have loaded subject annotation data and identified patient IDs, you can add one or more sets of array genomic sample data from caArray, which caIntegrator maps by sample IDs to the patient IDs in the subject annotation data. That process is covered in this section. You can also load imaging files from NBIA, which are likewise mapped by IDs to the patient data, as covered in Working with Imaging Data. In addition, you can edit genomic data information that you have already added to the study. Genomic sample data and imaging data are independent of each other, so neither is required before loading the other.

It is essential that you are well acquainted with the data you are working with--the subject annotation data, and the corresponding array data in caArray.

caIntegrator supports a limited number of array platforms. For more information, see Managing Platforms.

To add genomic data to your caIntegrator study, follow these steps:

  1. On the Edit Study page where you have selected and added the subject annotation data, click the Add New button under Genomic Data Sources. You can upload genomic data only from caArray.

This opens the Edit Genomic Data Source dialog box. Enter the appropriate information in the fields. These fields are described below.
Edit Genomic Source dialog box

  2. Click Save.

caIntegrator goes to caArray, validates the information you have entered here, finds the experiment and retrieves all the sample IDs in the experiment. Once this finishes, the experiment information displays on the Edit Study page under the Genomic Data Sources section.
Genomic Data Sources section of the Edit Study page

  3. If you want to redefine the caArray experiment information, you can edit it. Click the Edit link corresponding to the Experiment ID. The Edit Genomic Data Source dialog box reopens, allowing you to edit the information.

Mapping Genomic Data to Subject Annotation Data

Because the goal of caIntegrator is to integrate data from subject annotation, genomic and imaging data sources, data from uploaded source files must be mapped to each other. Mapping files can map to caArray genomic data of two types: "imported and parsed" and that stored in supplemental files.

Creating a Mapping File

You, as the caIntegrator study manager, must create a Subject to Sample mapping file before following the actual mapping steps. This file provides caIntegrator with the information for mapping patients to caArray samples.

  1. Start with the 6-column mapping file template, described as follows:

The following figure shows an example multiple sample mapping file in CSV format.
Mapping file in CSV format

The steps described in Steps for Mapping Genomic Data use data of either type.

Steps for Mapping Genomic Data

To map the samples from the caArray experiment to the patients in the subject annotation data you uploaded, follow these steps:

If the caArray data you have identified is imported and parsed, when you click the Map Samples button, the mapping takes place as the data is uploaded into caIntegrator. If the caArray data is supplemental, the mapping does not occur until the study is deployed.
Mapped samples will be listed in the Samples Mapped to Subjects section. Unmapped samples show at the top of the caIntegrator page. They were loaded from caArray, but they are not in the mapping file. These are not used for integration.
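
The distinction between mapped and unmapped samples can be sketched in Python as follows. The column layout of the mapping file is not described in this section, so the sketch assumes, purely for illustration, that the first column holds the subject ID and the second holds the caArray sample name; the function name `split_samples` is also hypothetical.

```python
import csv
import io


def split_samples(mapping_csv_text, caarray_sample_names):
    """Return (mapped, unmapped) for the samples retrieved from caArray.

    mapped is a dict of sample name to subject ID; unmapped is a list of
    sample names that were loaded but do not appear in the mapping file.
    """
    sample_to_subject = {}
    for row in csv.reader(io.StringIO(mapping_csv_text)):
        if len(row) >= 2:
            # ASSUMPTION: column 1 is the subject ID, column 2 the sample name.
            sample_to_subject[row[1].strip()] = row[0].strip()
    mapped = {
        name: sample_to_subject[name]
        for name in caarray_sample_names
        if name in sample_to_subject
    }
    unmapped = [name for name in caarray_sample_names if name not in sample_to_subject]
    return mapped, unmapped
```

Samples in the second list correspond to the unmapped samples shown at the top of the page, which are not used for integration.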

Uploading Control Samples

A Control Samples file is used to calculate fold change data, which compares "tumor" sample gene expression in the caArray experiment to the control samples to identify genes that exhibit up or down regulation. Control samples can be the "normal" samples, but that is not necessarily the case.
To upload the control samples, follow these steps:

The control samples now display toward the bottom of the page.
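
As an illustration of what a fold change comparison against control samples involves, the following Python sketch uses one common definition: the ratio of a tumor sample's expression value to the mean expression across the control samples. The text does not state the formula caIntegrator uses, so the definition, the function names, and the threshold of 2.0 are all assumptions.

```python
from statistics import mean


def fold_changes(tumor_expression, control_samples):
    """Return {gene: fold change} for one tumor sample.

    tumor_expression maps gene to expression value; control_samples is a
    list of such mappings, one per control sample.
    ASSUMPTION: fold change = tumor value / mean of control values.
    """
    result = {}
    for gene, tumor_value in tumor_expression.items():
        control_mean = mean(sample[gene] for sample in control_samples)
        result[gene] = tumor_value / control_mean
    return result


def regulation(fold_change, threshold=2.0):
    """Classify a fold change as up, down, or unchanged (threshold is illustrative)."""
    if fold_change >= threshold:
        return "up"
    if fold_change <= 1.0 / threshold:
        return "down"
    return "unchanged"
```

The choice of control samples matters because every ratio is relative to their mean, which is why the text notes that controls need not be the "normal" samples.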

Configuring Copy Number Data

You can add copy number data for a genomic data source by uploading the mapping file. This allows you to configure parameters to be used when segmentation data is being configured.
The name specified in the third column of the mapping file is specific for each array manufacturer as follows:

To add copy number data relating to the genomic data you are adding, follow these steps:

The Edit Copy Number page opens.
Edit Copy Number page

Remapping Copy Number Data in a Deployed Study

Occasionally you may need to remap copy number data in a deployed study. To do so, follow these steps:

Working with Imaging Data

Once you have loaded subject annotation data and identified patient IDs, you can add either array genomic sample data from caArray, which caIntegrator maps by sample IDs to the patient IDs in the subject annotation data, or image data from NBIA, also mapped by IDs to the subject data. Once you have configured an NBIA image data source for adding images, you can import image annotation data for the images. Genomic sample data and imaging data are independent of each other, so neither is required before loading the other.
It is essential that you are well acquainted with the data you are working with--the subject annotation data, and the corresponding imaging data in NBIA.

Adding or Editing Imaging Data Files from NBIA

To add images from NBIA to the study you are creating, follow these steps:

The imaging data displays on the Edit Study page under the Imaging Data Sources section.
Imaging Data Sources section of the Edit Study page

Adding or Editing Image Annotations

After you have configured an image data source with an NBIA Grid service and uploaded the image data, as described in Adding or Editing Imaging Data Files from NBIA, you can load image annotations into caIntegrator from a file in CSV format or through an Annotations and Image Markup (AIM) service.

Imaging Data Sources section of the Edit Study page. The circled section in this screen shot indicates that annotations have been uploaded for this image collection.

To add image annotations from a file, follow these steps:

To load image annotations through an AIM service, follow these steps:

Using either method, the image annotations are uploaded to caIntegrator. After this occurs, when you click the Edit Annotations button, the system opens the Define Fields for Imaging Data page where you can edit the annotations. For more information, see Define Fields Page for Editing Annotations. You must assign identifiers and annotations to the data in the same way you did with the subject annotation data. For more information, see Assigning An Identifier or Annotation.

Adding External Links

This feature on the Edit Study page, described in Creating/Editing a Study, allows you to configure a CSV file with URLs to be used as external links relevant to a study. This allows you to easily share or configure references.
To add an external link, follow these steps:

Once you have created external links for a study, when the study is open, an External Links section showing the link(s) displays on the left sidebar of the page.
Left sidebar displaying external links

Click the link to open a page that displays appropriately formatted web page links.
An example of external links

Deploying the Study

When you are ready to deploy the study, click the Deploy Study button on the Edit Study page. caIntegrator retrieves the selected data from the data service(s) you defined and makes the study available to a study manager or to anyone else who may want to analyze the study's data. Using the Manage Studies feature, you can then configure and share data queries and data lists with all investigators who access the study.
Note that you can continue to work in caIntegrator while the study is being deployed.

Managing a Study

Once you have started to create a study or have deployed it, you can update an existing study in the following ways:

To update, edit or delete a study, follow these steps:

All of the "in process" or "completed" studies display on this page, with associated metadata. Note that whoever edited or updated the study last is shown in the Last Modified Column, indicated as the Study Manager.

On this page you can edit any details such as adding or deleting files, survival values, and so forth. For information about working with the Edit Study feature, see Creating/Editing a Study.

Managing Platforms

caIntegrator supports a limited number of array platforms, all of which originate from Agilent or Affymetrix. While they do not represent all of the platforms supported by caArray, caIntegrator must have array definitions loaded for the platforms it supports, and be able to properly load the data from caArray and parse it.
You can create a study without genomic data, but you cannot add genomic data to a caIntegrator study without a corresponding supported array platform. If you add more than one set of genomic data to the study, you can specify more than one platform for the study.
On the Manage Platforms page, you can identify, add or remove supported platforms.
To manage platforms in caIntegrator, follow these steps:

The Manage Platforms page that opens lists the platforms caIntegrator currently supports, those that the system can pull from caArray. You can also add a new platform by entering information in the fields in the Create a New Platform section.
Manage Platforms page

Depending on the Platform Type you select, there may be other parameters to provide here as well, such as Platform Channel Type for an Agilent platform.

The platform deployment can be time-consuming. If the platform takes more than 12 hours to deploy, caIntegrator displays a "timed out" message. At that point, you can delete the platform, even if it has not loaded to the system.