NIH | National Cancer Institute | NCI Wiki  


...

As a starting point, the requirements gathering effort will be informed by past and current related efforts in caBIG®, such as the recent Semantic Infrastructure Requirements Elicitation effort, the caBIO ECCF service specification project on molecular and pathway annotation services from the Integrated Cancer Research (ICR) Workspace, the Annotated Information Model (AIM) from the In-Vivo Imaging (IMAG) Workspace for both radiological and pathologic images, and work on "Dynamic Extensions" from the Tissue Banks and Pathology Tools (TBPT) Workspace. Nevertheless, we anticipate significant further input from community feedback based on this roadmap document, and from the life sciences workgroup within the Semantic Infrastructure 2.0 Inception Effort, which will involve both requirements gathering and prototype tool building.

Key Use Cases & Requirements

...


This section highlights some key use cases that depend on data semantics. These use cases serve as a representative set for capturing the requirements of the life sciences domain. A comprehensive set of clinical trials use cases can be found at https://gforge.nci.nih.gov/plugins/wiki/index.php?Use%20Cases&id=512&type=g

Scenario 1: Discovering a Biomarker

A scientist is trying to identify a new genetic biomarker for HER2/neu-negative Stage I breast cancer patients. Using a caGrid-aware client, the scientist queries for HER2/neu-negative tissue specimens of Stage I breast cancer patients at LCCC that also have corresponding microarray experiments. Analysis of the microarray experiments identifies genes that are significantly over-expressed and under-expressed in a number of cases. The scientist decides that these results are significant, and related literature suggests the hypothesis that gene A may serve as a biomarker in HER2/neu-negative Stage I breast cancer. To validate this hypothesis in a significant number of cases, the scientist needs a larger data set, so he queries for all the HER2/neu-negative specimens of Stage I breast cancer patients with corresponding microarray data, and for appropriate control data, from other cancer centers. After retrieving the microarray experiments, the scientist analyzes the data for over-expression of gene A.
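The over/under-expression step in this scenario can be sketched as follows. The gene names, expression values, and the simple fold-change cutoff are illustrative assumptions only; a real analysis would apply proper statistical tests to microarray data retrieved through caGrid services.

```python
# Minimal sketch of flagging over/under-expressed genes (hypothetical data;
# a real workflow would pull microarray results from caGrid services).
import math
from statistics import mean

def differential_genes(tumor, control, threshold=1.0):
    """Return genes whose mean log2 fold change exceeds the threshold.

    tumor/control: dict mapping gene -> list of expression values (linear scale).
    """
    hits = {}
    for gene in tumor:
        lfc = math.log2(mean(tumor[gene]) / mean(control[gene]))
        if abs(lfc) >= threshold:   # at least 2-fold up or down
            hits[gene] = lfc
    return hits

tumor = {"A": [8.0, 9.0, 10.0], "B": [2.0, 2.2, 1.8]}
control = {"A": [2.0, 2.5, 1.5], "B": [2.1, 1.9, 2.0]}
print(differential_genes(tumor, control))  # flags gene "A"; gene "B" is unchanged
```

A fold-change cutoff alone is a simplification; in practice significance testing across many cases, as the scenario describes, is what justifies the biomarker hypothesis.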
Scenario 2: Finding Biomaterial to Validate a Biomarker

In scenario 1, the scientist has validated a biomarker based on available microarray experiments provided by various cancer centers. Now, the scientist would like to request biomaterial in the form of formalin-fixed/paraffin embedded tissue specimens from patients with the appropriate clinical outcomes. The scientist would like to validate the genetic biomarker in a different series of cases, this time using a different technique such as immunohistochemistry. The scientist queries for the presence of appropriate tissue using a caGrid-aware client and for the appropriate contact information of the person(s) responsible for the tissue repository. The scientist contacts the person(s) to begin the protocol for retrieving biomaterials.

Scenario 3: Extending the Use of a Biomarker

The scientist would like to check whether gene A could also be used as a biomarker for other types of cancer. The flow of events will be similar to Scenario 1, with the exception that the specimen query will not be restricted to Stage I breast cancer patients.

Scenario 4: Exploring the Predictive Power of Gene Expression in Breast Cancer Metastasis

The scientist would like to explore whether gene expression patterns can predict how breast cancer will metastasize. He queries for all specimens of breast cancer patients from other cancer centers whose metastasis sites are liver, bone, or brain. The scientist then retrieves the corresponding microarray experiments for these specimens and analyzes them to look for a correlation between expression profiles and metastasis sites.
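The grouping step in this scenario can be sketched with a toy record set; the specimens, sites, and expression values below are invented for illustration and would in practice come from federated caGrid queries.

```python
# Sketch: group (hypothetical) per-specimen expression values for one gene
# by metastasis site, then compare site means for a correlation signal.
from collections import defaultdict
from statistics import mean

specimens = [
    {"site": "liver", "expr": 8.1}, {"site": "liver", "expr": 7.9},
    {"site": "bone",  "expr": 2.2}, {"site": "bone",  "expr": 2.4},
    {"site": "brain", "expr": 5.0},
]

by_site = defaultdict(list)
for s in specimens:
    by_site[s["site"]].append(s["expr"])

for site, values in sorted(by_site.items()):
    print(site, round(mean(values), 2))
```

A real analysis would of course use full expression profiles and formal statistics rather than a single gene's mean per site.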

Scenario 5: Supporting Oncologists in Formulating Ideas for New Clinical Studies

The oncologist often wants first to answer questions such as: How many patients with disease x have been seen at our institution? How does that compare with other institutions? What is the average survival of patients with disease x? How does it differ if they are treated with drug x or drug y? How many patients presented with disease x and TNM stage y at diagnosis? How many patients with disease x relapse after treatment y? This use case, then, is about enabling oncologists to ask these exploratory questions of their own clinical databases as well as those at other institutions accessible on caGrid.
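The exploratory counts above can be sketched against an in-memory record set; the field names and records are hypothetical, and a real deployment would issue federated queries over caGrid rather than filter a local list.

```python
# Toy sketch of two of the oncologist's exploratory questions over
# hypothetical patient records (fields invented for illustration).
from collections import Counter

patients = [
    {"disease": "x", "stage": "II", "treatment": "drug y", "relapsed": True},
    {"disease": "x", "stage": "I",  "treatment": "drug z", "relapsed": False},
    {"disease": "w", "stage": "I",  "treatment": "drug y", "relapsed": False},
]

# How many patients with disease x, by TNM stage at diagnosis?
by_stage = Counter(p["stage"] for p in patients if p["disease"] == "x")

# How many patients with disease x relapsed after treatment y?
relapses = sum(1 for p in patients
               if p["disease"] == "x" and p["treatment"] == "drug y"
               and p["relapsed"])

print(dict(by_stage), relapses)
```

The semantic-infrastructure requirement hiding in this sketch is that "disease x", "stage", and "treatment" must mean the same thing across every institution's database for the cross-institution comparison to be valid.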

Scenario 6: High-Throughput Screen for Anti-Cancer Drug Leads Based on Robotic Microscopy

A basic research scientist has developed a high-throughput screen for anti-cancer drug leads based on robotic microscopy. The final output of the process is relatively simple: a two-dimensional matrix, with the rows being a few million small molecules and the columns being biological properties of these compounds, e.g., toxicity against several different tumor cell lines, toxicity against several different normal cell lines, ability of the molecule to enter the cell and its intracellular distribution, and impact of the molecule on a number of biological endpoints. However, the process of generating this output presents a number of challenges. The initial output of the robotic microscopy is many thousands of images a day. The raw images must be stored so as to be available for future analytical algorithms and should, for similar reasons, be sharable. The space required to store these images is tens to hundreds of terabytes per year per instrument. Analysis of these images to generate the desired output needs to be automated. Many of the best algorithms are proprietary and are embedded in software that is not caBIG compatible and resides within a community not currently engaged with the caBIG community. Both the raw images and the final results, annotated as to how those results were obtained, need to be made available to appropriate collaborators in both the academic and commercial spheres, so that leads identified in these small molecule libraries can be modified into drug candidates, which can then be tested first preclinically and then in clinical trials.
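The stated storage figure can be sanity-checked with back-of-envelope arithmetic. The per-image size and daily image count below are assumptions, since the scenario specifies only "many thousands of images a day".

```python
# Back-of-envelope check of the "tens to hundreds of terabytes per year
# per instrument" claim, under assumed (not sourced) numbers.
images_per_day = 10_000   # assumption: "many thousands of images a day"
mb_per_image = 20         # assumption: one uncompressed microscopy frame

tb_per_year = images_per_day * mb_per_image * 365 / 1_000_000
print(f"~{tb_per_year:.0f} TB/year/instrument")
```

At these assumed rates the estimate lands around 73 TB/year, consistent with the "tens to hundreds of terabytes" range in the scenario; larger frames or more images per day push it toward the upper end.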


Scenario 7: Translational Research Use Case - Multi-Center Ancillary Study in the Context of a Consortium Clinical Trial (extension from Enterprise Use Cases)
The following are the steps in a translational research scenario.

• Within a consortium of cooperating institutions, an investigator conducts a search across the consortium's clinical data repositories to investigate the feasibility of a potential clinical research idea.

...

• Data are made available according to funding agencies' requirements.

Scenario 8: Overlay of Protein Array Data on Regulatory Pathways, with Links to Patient and Cell Culture Data

A clinical research scientist wants to be able to predict the efficacy of tyrosine kinase inhibitors as cancer chemotherapeutic agents. The fact that many oncogenes are tyrosine kinases would predict that such agents should be effective, but several have been synthesized and tested in clinical trials, and the results have been disappointing in the extreme, with more cases of tumor growth stimulation than inhibition. The clinician hypothesizes that these unexpected effects are the result of regulatory feedback loops. To test this hypothesis, he requires software tools for modeling regulatory pathways. In addition, he needs to determine the state of such pathways in different patients by measuring the state of phosphorylation of the elements (proteins) of these pathways using reverse-phase protein arrays. Because the consequences of treating the wrong patient with the wrong agent are so severe, the response of the tumor to the inhibitors will be tested in vitro, on cell cultures established from tumor biopsies. However, biospecimens and data from those patients who participated in clinical trials of these reagents before their ineffectiveness was appreciated are also available. Outputs measured on these cultures and biospecimens will include growth rate (determined by flow cytometry or by visual counting of cells at different time points), extent of cell death (determined similarly), photomicrographs, reports of microscopic observations by trained investigators, rate of DNA synthesis (measured by radioisotope- or fluorescent-labeled precursor uptake and incorporation), and staining with various immune reagents followed by high-throughput robotic microscopy and automated image analysis. To develop an understanding that will result in giving the correct drugs to the correct patients, data from the protein arrays will be overlaid on the regulatory pathways and linked to patient and cell culture data.
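The overlay step can be illustrated with a toy sketch. The pathway edges, protein names, and phosphorylation values below are invented for illustration and do not come from any caBIG data source or real protein array.

```python
# Toy sketch: overlay (hypothetical) reverse-phase protein array
# phosphorylation readings onto a simple regulatory pathway graph.
pathway = {"EGFR": ["RAS"], "RAS": ["RAF"], "RAF": ["MEK"], "MEK": ["ERK"]}
phospho = {"EGFR": 0.9, "RAS": 0.2, "RAF": 0.8, "MEK": 0.7, "ERK": 0.1}

def annotate(pathway, phospho, active=0.5):
    """Label each edge with whether its upstream protein appears active
    (phosphorylation reading at or above the 'active' cutoff)."""
    return [(src, dst, phospho.get(src, 0.0) >= active)
            for src, dsts in pathway.items() for dst in dsts]

for src, dst, is_active in annotate(pathway, phospho):
    print(f"{src} -> {dst}: upstream {'active' if is_active else 'inactive'}")
```

The point of the sketch is only the data linkage the scenario asks for: each pathway node carries a patient-specific measurement, so the same graph can be re-annotated per patient or per cell culture.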

Scenario 9: Animal Model Use Case

The following are the steps describing the scenario.

  • Bench scientist chooses candidate glioblastoma genes using human GWAS (e.g., TCGA). 

...

  • Scientist also utilizes pathway analysis to postulate how multiple "hits" may be involved in tumorigenesis, to direct design of genetically altered mice. 

...

  • Utilizes targeted gene transfer to deliver mutated genes to inbred mice. 

...

  • Using inbred mice provides a uniform genetic background in which the researcher can also investigate mutated candidate genes in conjunction with other gene knockouts. 

...

  • Scientist finds mutated gene x expressed in gene y knockout mouse results in glioblastoma development that parallels human pathology. 

...

  • Scientist validates that model reacts in similar way to current therapeutic treatments. 

...

  • Scientist uses mouse model to test new therapeutic treatments, including combinations of drugs chosen to inhibit multiple pathways. 

...

  • Clinical scientists utilize mouse model results to design clinical trials to treat glioblastoma, incorporating genomic information on patients.

Scenario 10: Outside Researcher Requesting Access to a Consortium's (Prostate SPOREs) Federated Biorepositories

The consortium's biorepositories comprise 11 instances of caTissue Suite, independently maintained and managed.

...

• The Fellow uses this information to request tissue from 4 institutions to build her TMA.

Scenario 11: High-Throughput Sequencing - Using DNA Sequencing to Exhaustively Identify Tumor-Associated Mutations (use cases courtesy of David Wheeler and Kim Worley of BCM's Human Genome Sequencing Center (HGSC))

This is a basic research use case that easily becomes translational when the output of this use case is used, for example, to identify targets for biomarker studies or drug candidates for clinical trials.
Version A: Sequencing of selected genes via Maxam-Gilbert capillary ("first generation") sequencing (Nature, 2008 Sep 4, Epub ahead of print).

1. Develop a list of 2000 to 3000 genes thought to be likely targets for cancer-causing mutations.
2. As a preliminary (lower cost) test, pick the most promising 600 genes from this list.
3. Develop a gene model for each of these genes.
4. Hand-modify that gene model, e.g., to merge small exons into a single amplicon.
5. Design primers for PCR amplification for each of these genes.
6. Order primers for each exon of each of the genes.
7. Test primers.
8. In parallel with steps 1-7, identify matched pairs of tumor samples/normal tissue from the same individual for the tumors of interest.
9. Have pathologists confirm that the tumor samples are what they claim to be and that they consist of a high percentage of tumor tissue.
10. Make DNA from the tumor samples, confirming for each tumor that the quantity and quality of the DNA are adequate.
11. PCR amplify each of the genes.
12. Sequence each of the exons of each of the genes for each tumor/normal pair of DNA samples.
13. Find all the differences between the tumor sequence and the normal sequence.
14. Confirm that these differences are real using custom arrays, the Sequenom (mass spectrometry) technology, and/or Biotage (a pyrosequencing-based technology directed specifically at looking for SNP-like changes).
15. Identify changes that are seen at a higher frequency than would occur by chance.
16. Relate the genes in which these changes are seen to known signaling pathways.
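The difference-finding step (step 13 above) can be sketched in miniature. The sequences are toy examples, and a real pipeline would align reads and handle insertions and deletions rather than compare strings position by position.

```python
# Toy sketch of step 13: report positions where the tumor sequence
# differs from the matched normal sequence (hypothetical sequences;
# no alignment or indel handling).
def find_differences(normal, tumor):
    """Return (position, normal_base, tumor_base) for each mismatch."""
    return [(i, n, t)
            for i, (n, t) in enumerate(zip(normal, tumor))
            if n != t]

normal = "ACGTACGT"
tumor  = "ACGAACGT"
print(find_differences(normal, tumor))  # [(3, 'T', 'A')]
```

Each reported position would then feed the confirmation step (step 14) before being counted toward recurrence (step 15).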

...

Scenario 14: Analyzing Existing Datasets to Identify Nanoparticle Probes

When many nanoparticles have been screened for their uptake in many different cell lines across many cancer centers, a scientist imports all the datasets that involve nanoparticle binding or uptake to cells. The cell lines are reclassified into target or background cells based on a set of criteria (tissue type, presence or absence of an oncogene mutation, etc.), and an analogous analysis is performed to identify nanoparticles that exhibit differential binding/uptake to different classes of cell lines.

NanoParticle Ontology 

Scenario 15: Evaluating and Enriching the NanoParticle Ontology

The NanoParticle Ontology (NPO) is an ontology being developed at Washington University in St. Louis to serve as a reference source of controlled vocabularies/terminologies in cancer nanotechnology research. Concepts in the NPO have their instances in the data represented in a database or in the literature. In a database, these instances would include field names and/or field entries of the data model. The NPO represents the knowledge needed to support unambiguous annotation and semantic interpretation of data in a database or in the literature. To expedite the development of the NPO, object models must be developed to capture the concepts and inter-concept relationships from the literature. Minimum information standards should provide guidelines for developing these object models, so that the minimum information is also captured for representation in the NPO.