Technical Description: Biospecimen repositories are deployed locally as well as at Washington University, Thomas Jefferson University, and Fox Chase Cancer Center.  Each has its information model registered in a metadata repository and exposes standardized APIs.  The local instance of caTissue discovers services with compatible metadata and APIs and performs the query.  The data returned is aggregated based on standardized metadata and presented to the user.  caTissue uses CDE names, descriptions, and standard value sets to display data, help the user build the query, and issue the query.
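
As a rough illustration of the discover-then-aggregate flow described above, the following Python sketch filters a set of services by the CDEs they expose and merges matching records under the shared CDE names.  The service dictionaries, the CDE names (`specimen_id`, `pathological_status`, `available_for_sharing`), and the function names are all hypothetical; real services would be remote and discovered through the metadata repository rather than held in memory.

```python
def compatible(service, required_cdes):
    """A service is compatible when it exposes every CDE the query needs."""
    return required_cdes <= service["cdes"]


def federated_query(services, required_cdes, predicate):
    """Query each compatible service and aggregate results under shared CDE names."""
    results = []
    for service in services:
        if not compatible(service, required_cdes):
            continue  # incompatible metadata: this service is skipped
        for record in service["records"]:
            if predicate(record):
                row = {cde: record[cde] for cde in sorted(required_cdes)}
                row["source"] = service["name"]
                results.append(row)
    return results


# Hypothetical in-memory stand-ins for three repositories.
FULL = {"specimen_id", "pathological_status", "available_for_sharing"}
services = [
    {"name": "local", "cdes": FULL, "records": [
        {"specimen_id": "L-1", "pathological_status": "pre-cancerous", "available_for_sharing": True},
        {"specimen_id": "L-2", "pathological_status": "malignant", "available_for_sharing": True},
    ]},
    {"name": "site_a", "cdes": FULL, "records": [
        {"specimen_id": "A-1", "pathological_status": "pre-cancerous", "available_for_sharing": False},
        {"specimen_id": "A-2", "pathological_status": "pre-cancerous", "available_for_sharing": True},
    ]},
    {"name": "site_b", "cdes": {"specimen_id", "pathological_status"}, "records": [
        {"specimen_id": "B-1", "pathological_status": "pre-cancerous"},
    ]},
]

hits = federated_query(
    services,
    FULL,
    lambda r: r["pathological_status"] == "pre-cancerous" and r["available_for_sharing"],
)
```

In this sketch `site_b` is never queried because it does not expose the sharing CDE, which mirrors the idea that only services with compatible metadata take part in the query.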

Cross Reference:

Identify samples obtained for glioblastoma multiforme (GBM) and the corresponding CT image information.

Domain Description: a cancer researcher has developed a new image detection algorithm for identifying glioblastoma multiforme, which is the most common and most aggressive type of primary brain tumor in humans, involving glial cells and accounting for 52% of all parenchymal brain tumor cases and 20% of all intracranial tumors.  When viewed with MRI, glioblastomas often appear as ring-enhancing lesions.  The appearance is not specific, however, as other lesions such as abscess, metastasis, tumefactive multiple sclerosis, and other entities may have a similar appearance.  The cancer researcher's algorithm should be able to differentiate between cancerous lesions and other lesions, but he needs additional tissues and images to make his testing statistically significant.  The cancer researcher sits down at his laptop and loads Cancer Bench-to-Bedside (caB2B).  He builds a search on all known tissues that have been identified as glioblastoma multiforme via stereotactic biopsy and have corresponding CT images.  He hits the search button, gets a cup of coffee, and returns to a list of 74 tissues with 465 images.  He hits the export button, which downloads all the images with associated pathology results.

Technical Description: a number of organizations have exposed pathology and image services with standardized metadata.  caB2B uses CDE names, descriptions, and value sets to allow the user to construct a query across all of these services.  The user selects the CDEs to filter on, which includes a join across information models (caTissue annotations to imaging annotations).  A semantic relationship between the two models based on biospecimen identifier has previously been established.  A distributed query is formulated and executed.  The resulting data is aggregated based on semantic relationships and presented to the user using CDE names and descriptions.
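
The join across information models can be pictured with a small Python sketch.  It assumes each service has already returned its records and that both sides carry a `biospecimen_id` field standing in for the previously established semantic relationship; the record layouts, field names, and sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical results from a pathology service and an imaging service.
pathology_records = [
    {"biospecimen_id": "BSP-001", "diagnosis": "glioblastoma multiforme", "procedure": "stereotactic biopsy"},
    {"biospecimen_id": "BSP-002", "diagnosis": "meningioma", "procedure": "stereotactic biopsy"},
    {"biospecimen_id": "BSP-003", "diagnosis": "glioblastoma multiforme", "procedure": "stereotactic biopsy"},
]
image_records = [
    {"biospecimen_id": "BSP-001", "modality": "CT", "image_uri": "img-1"},
    {"biospecimen_id": "BSP-001", "modality": "MR", "image_uri": "img-2"},
    {"biospecimen_id": "BSP-003", "modality": "CT", "image_uri": "img-3"},
    {"biospecimen_id": "BSP-003", "modality": "CT", "image_uri": "img-4"},
]


def join_on_biospecimen(tissues, images, tissue_filter, image_filter):
    """Keep tissues that pass the filter and have at least one matching image."""
    images_by_id = defaultdict(list)
    for image in images:
        if image_filter(image):
            images_by_id[image["biospecimen_id"]].append(image)

    joined = {}
    for tissue in tissues:
        matches = images_by_id.get(tissue["biospecimen_id"])
        if tissue_filter(tissue) and matches:
            joined[tissue["biospecimen_id"]] = {"tissue": tissue, "images": matches}
    return joined


result = join_on_biospecimen(
    pathology_records,
    image_records,
    tissue_filter=lambda t: t["diagnosis"] == "glioblastoma multiforme"
    and t["procedure"] == "stereotactic biopsy",
    image_filter=lambda i: i["modality"] == "CT",
)
```

Here the filters play the role of the CDEs the user selects, and the dictionary keyed by biospecimen identifier is one simple way to aggregate tissues with their images.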

Cross Reference:

Determine if each sample used in an expression profiling experiment is available for a SNP analysis experiment.

Is this a repeat of "Identify samples obtained for glioblastoma multiforme (GBM) and the corresponding CT image information"?

Search for a particular gene based on the Entrez Gene ID and its related information, e.g. messenger RNA and protein information from GeneConnect.

Is this a repeat of "Search for all 'pre-cancerous' biospecimens that are available for sharing at Washington University, Thomas Jefferson University, and Fox Chase Cancer Center"?

Automatically discover analytical steps for Illumina bead array analysis using inference based on the semantic metadata of the parameters.

Domain Description: The Illumina BeadChip is a proprietary method of performing multiplex gene expression and genotyping analysis.  The essential element of BeadChip technology is the attachment of oligonucleotides to silica beads.  An informaticist is working with a cancer researcher to study expression profiles related to proto-oncogenes in T-cell leukemias.  It is the first time either has worked with this technology, and the informaticist is in the process of developing an analytical pipeline.  He performs a search for such an analytical pipeline, and a number of steps that can be linked together are presented to him based on the Illumina bead output data type and his end goal of identifying gene annotations.  The pipeline that was inferred using semantic metadata is: BeadArray-specific variance stabilization and gene annotation at the probe level.  The informaticist knows he needs a quality control step at the beginning to determine whether an experiment run was successful or produced bad data.  He searches "quality control" and "expression data" for analytical services, and finds an option specific to Illumina bead arrays.  He also knows that the control and experimental runs will need to be normalized.  The results from his search apply to gene expression matrices, so he will need a translation step.  He enters the bead array format as the input and the gene expression matrix as the output and finds what he needs.  Fortunately, the analytical step fits right in before probe annotation, which can also work on a gene expression matrix provided the bead identifiers are included.  He saves the workflow, types in some notes about it, and shares it with his cancer researcher colleague.

Technical Description: the discovery of analytical steps utilizes inference over semantic annotations of input and output parameters.  The researcher selects the metadata types that will be input to the pipeline and those that will be output from the pipeline.  The inference engine performs discovery steps, chaining inputs to outputs in an expanding set until all options are exhausted or the resulting type matches.  Furthermore, when specific analytical steps are queried for, full-text and concept-based metadata searches are performed in conjunction with output/input matching to provide the best possible results.  Workflows are saved as a set of steps or as a set of constraints upon which workflows are dynamically generated to meet scientific goals.
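
The chaining of inputs to outputs can be sketched as a breadth-first search over analytical steps.  This is only one possible reading of the description: the step names and type labels below are hypothetical, and types are compared by exact string equality where a real inference engine would use semantic equivalence of the annotations.

```python
from collections import deque
from typing import NamedTuple, Optional


class Step(NamedTuple):
    name: str
    input_type: str
    output_type: str


def discover_pipeline(steps, start_type, goal_type) -> Optional[list]:
    """Expand reachable types step by step until the goal type is produced.

    Returns the list of step names, or None once all options are exhausted.
    """
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current_type, path = queue.popleft()
        if current_type == goal_type:
            return path
        for step in steps:
            if step.input_type == current_type and step.output_type not in seen:
                seen.add(step.output_type)
                queue.append((step.output_type, path + [step.name]))
    return None


# Hypothetical catalog of annotated analytical services.
catalog = [
    Step("variance_stabilization", "bead_array", "stabilized_bead_array"),
    Step("probe_annotation", "stabilized_bead_array", "gene_annotations"),
    Step("bead_to_matrix", "bead_array", "expression_matrix"),
    Step("normalization", "expression_matrix", "normalized_matrix"),
]

pipeline = discover_pipeline(catalog, "bead_array", "gene_annotations")
```

Breadth-first expansion returns a shortest chain first; saving a workflow as constraints would amount to storing only the start and goal types and rerunning the search.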

Cross Reference:

Support patient to trial matching through the use of computable eligibility criteria

Domain Description: a metadata specialist works with the principal investigator of a trial to define the eligibility criteria for a study in enough detail so that eligibility can be computed from patient data.  The metadata specialist defines each eligibility question as a common data element (CDE) with a description, a mathematical operator, and a data operand (what the data gets compared to).  For example, the principal investigator tells the metadata specialist that all patients must be at least 21 years old.  The metadata specialist defines a CDE annotated with the concept "age", the operator "greater-than or equal", and the operand "21".  These steps are performed for each of the 32 eligibility criteria.  The principal investigator now works with a clinical informaticist to perform a search using these computable eligibility criteria on patient data at the cancer center to see if anyone is eligible.  Furthermore, prospective patients themselves can type their data into the trial matching system to compute eligibility for all known trials to determine if there are any trials for their cancer.

Technical Description: the metadata specialist defines CDEs with operator and operand annotations.  These are stored in the local metadata repository, which is used by the trial matching software.  When computing eligibility, data for semantically equivalent data elements are computed against the eligibility metadata to determine eligibility.  "Fuzzy" eligibility can be computed when data is missing or does not match.
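
A minimal sketch of computable eligibility criteria follows, using the age example from the domain description.  The `Criterion` class, the operator table, and the second criterion are hypothetical, and treating "fuzzy" eligibility as the fraction of criteria met is an assumption, since the text does not define how it is scored.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Operator annotations mapped to Python comparison functions (names are illustrative).
OPERATORS = {
    "greater-than or equal": operator.ge,
    "less-than or equal": operator.le,
    "equal": operator.eq,
}


@dataclass(frozen=True)
class Criterion:
    concept: str   # concept the CDE is annotated with, e.g. "age"
    op: str        # operator annotation
    operand: Any   # value the patient data is compared to


def evaluate(criteria, patient):
    """Sort criteria into met, unmet, and unknown (missing data), with a simple score."""
    met, unmet, unknown = [], [], []
    for criterion in criteria:
        value = patient.get(criterion.concept)
        if value is None:
            unknown.append(criterion.concept)
        elif OPERATORS[criterion.op](value, criterion.operand):
            met.append(criterion.concept)
        else:
            unmet.append(criterion.concept)
    score = len(met) / len(criteria) if criteria else 0.0
    return {"met": met, "unmet": unmet, "unknown": unknown, "score": score}


criteria = [
    Criterion("age", "greater-than or equal", 21),
    Criterion("prior_chemotherapy", "equal", False),  # hypothetical second criterion
]

complete = evaluate(criteria, {"age": 34, "prior_chemotherapy": False})
partial = evaluate(criteria, {"age": 34})  # one data element is missing
```

A patient with every criterion met is strictly eligible; the `unknown` list shows which data would have to be collected before a partial match could be confirmed.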

Support the addition of data elements to an existing information model and automatically capture and publish the information about the extensions.

Domain Description: A teratoma is an encapsulated tumor with tissue or organ components resembling normal derivatives of all three germ layers.  Regardless of location in the body, a teratoma is classified according to a cancer staging system: 0 or mature (benign); 1 or immature, probably benign; 2 or immature, possibly malignant (cancerous); and 3 or frankly malignant.  Teratomas are also classified by their content: a solid teratoma contains only tissues (perhaps including more complex structures); a cystic teratoma contains only pockets of fluid or semi-fluid such as cerebrospinal fluid, sebum, or fat; a mixed teratoma contains both solid and cystic parts.  A cancer researcher would like to extend the pathology annotations associated with tissues in the center's tissue bank by adding Teratoma Content as an additional nonseminomatous germ cell tumor (NSGCT) annotation.  The researcher communicates this to the director of the tissue repository, who promptly opens the administrative interface to caTissue and adds the additional pathology annotation.  The system is now able to capture this, and the data and data descriptions are shareable with other organizations.

Technical Description: the cancer center is running caTissue with a local metadata repository.  When a new annotation is added to caTissue, the dynamic extensions module is invoked.  The caTissue information model is extended to include the necessary additional classes and attributes, which in turn are propagated as new data elements in the metadata repository.  These data elements represent well-formed metadata that is automatically discoverable and shareable through the public interfaces.  When another organization wishes to extend its caTissue model to include this type of data, it will be able to discover the metadata already created and instantiate a reference to it rather than creating it afresh.
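
The propagate-then-reuse behavior might look like the following Python sketch.  The `MetadataRepository` and `InformationModel` classes, the identifier scheme, and the name-based lookup are hypothetical simplifications and do not represent the caTissue dynamic extensions API; a single shared repository object stands in for the discoverable public interfaces.

```python
class MetadataRepository:
    """Toy repository that stores data elements and supports lookup by name."""

    def __init__(self):
        self._elements = {}

    def register(self, name, definition, permissible_values):
        element_id = f"DE-{len(self._elements) + 1}"
        self._elements[element_id] = {
            "name": name,
            "definition": definition,
            "permissible_values": list(permissible_values),
        }
        return element_id

    def find(self, name):
        wanted = name.lower()
        return [eid for eid, e in self._elements.items() if e["name"].lower() == wanted]

    def __len__(self):
        return len(self._elements)


class InformationModel:
    """Toy information model whose attributes point at repository data elements."""

    def __init__(self, repository):
        self.repository = repository
        self.attributes = {}  # attribute name -> data element id

    def add_annotation(self, name, definition, permissible_values):
        existing = self.repository.find(name)
        # Reuse an already published data element when one exists; otherwise publish it.
        element_id = existing[0] if existing else self.repository.register(
            name, definition, permissible_values
        )
        self.attributes[name] = element_id
        return element_id


shared_repository = MetadataRepository()
center_a = InformationModel(shared_repository)
center_b = InformationModel(shared_repository)

values = ["solid", "cystic", "mixed"]
first_id = center_a.add_annotation("Teratoma Content", "Content classification of a teratoma", values)
second_id = center_b.add_annotation("Teratoma Content", "Content classification of a teratoma", values)
```

The second center ends up holding a reference to the element the first center published, so only one definition of Teratoma Content exists.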

Cross Reference:

When defining new datasets for caIntegrator's data-warehouse for biomedical data collection and analysis, automatically record these new datatypes in a well-defined and federated manner so that data can be shared.

Is this a repeat of "Support the addition of data elements to an existing information model and automatically capture and publish the information about the extensions."?

Discover and orchestrate services to achieve LS research goals; e.g. start with a hypothesis, identify relevant services that provide the necessary analysis and data, create the workflow/pipeline, report findings. Workflow related requirements.

Is this a repeat of "Search for all 'pre-cancerous' biospecimens that are available for sharing at Washington University, Thomas Jefferson University, and Fox Chase Cancer Center"?

B. Forms Stories

Create and reuse forms

Domain Description: Forms provide a convenient paper-like electronic mechanism to capture data in a structured way.  For example, when a patient is placed on a clinical trial, data about the patient's demographics and eligibility for the trial need to be captured.  The trial investigator sits with the forms curator to generate this case report form.  The forms curator searches for existing demographics forms and form modules, and the investigator reviews them.  They identify an appropriate set of questions and include them in the case report form.  They then move on to the eligibility checklist.  The investigator drafted the checklist, and it has been approved by the IRB.  The forms curator begins keying in the questions, some of which are identified as existing questions and reused, while others are created new.  The form is marked complete and is available to the clinical research staff for gathering data and enrolling new patients.

Technical Description: Forms are a collection of data elements annotated and grouped within the metadata repository.  The forms curator can search for existing forms and form modules (portions of a form) by question text, annotations, etc.  These can be reused by reference, or imported and modified.  When new data elements are being curated, the forms curator can search the federated set of all metadata repositories to identify data elements for reuse.  This can happen automatically within the curation tooling or explicitly through the metadata web interface.  The final CRF is saved and annotated within the local metadata repository.
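
The difference between reusing a form module by reference and importing it for modification can be shown with a short Python sketch.  The `Question`, `FormModule`, and `Form` classes, the question texts, and the data element identifiers are hypothetical and only illustrate the two reuse modes.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    data_element_id: str  # link to a data element in a metadata repository


@dataclass
class FormModule:
    name: str
    questions: list = field(default_factory=list)


@dataclass
class Form:
    name: str
    modules: list = field(default_factory=list)

    def reuse_by_reference(self, module):
        """Attach the shared module itself; later edits to it show up in this form."""
        self.modules.append(module)

    def import_and_modify(self, module, new_name):
        """Attach an independent copy that can be changed without touching the original."""
        local_copy = copy.deepcopy(module)
        local_copy.name = new_name
        self.modules.append(local_copy)
        return local_copy


demographics = FormModule("Demographics", [
    Question("Date of birth", "DE-100"),
    Question("Sex", "DE-101"),
])

crf = Form("Eligibility CRF")
crf.reuse_by_reference(demographics)
local = crf.import_and_modify(demographics, "Demographics (trial-specific)")
local.questions.append(Question("Country of residence", "DE-102"))
```

Reuse by reference keeps a single curated module shared across forms, while the imported copy trades that shared maintenance for the freedom to add trial-specific questions.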

Cross Reference:

...

Allowing form annotations to enable form behavior

...