NIH | National Cancer Institute | NCI Wiki  


This page is no longer updated.

To Print the Guide

We recommend you print one wiki page of the guide at a time. To do this, click the printer icon at the top right of the page; then from the browser File menu, choose Print. Printing multiple pages at one time is more complex. For instructions, refer to Exporting Multiple Pages to PDF.

Having Trouble Reading the Text?

Resizing the text for any web page is easy. For information on how to do this in your web browser, refer to this W3C tutorial.


ABOUT THE CANCER GENE INDEX

Overview of the Cancer Gene Index

As of December 2009, there were nearly 2.5 million cancer-related publications in MEDLINE, and this number is rapidly increasing. Scientists cannot manually identify all known cancer genes, and it is even more difficult to uncover the relationships between these genes and various human cancers or pharmacological compounds. In theory, one could exhaustively search PubMed and compile a list of the genes related to a given disease or compound, but this would take many weeks, and such a manual search would very likely still miss some genes and relationships. The National Cancer Institute (NCI) recognized that a publicly available resource that combined these gene-disease and gene-compound data with relevant annotations would greatly facilitate research, and as part of its caBIG® initiative, it created the Cancer Gene Index Project.

The goal of the Cancer Gene Index is to further translational cancer research by providing a high-quality data resource consisting of genes that have been experimentally associated with cancer diseases and/or pharmacological compounds, the evidence of these associations, and relevant annotations on the data. Thus, scientists can use the data resource to quickly discover fact-based associations between genes and diseases or genes and compounds (i.e., all of the genes associated with a disease, all of the genes associated with a compound, or all of the diseases and compounds associated with a gene) and to evaluate the evidence from which these associations were determined. This resource was created through a unique process that coupled automated linguistic text analysis of millions of MEDLINE abstracts with manual validation and annotation of the extracted data by expert human curators. The results of this effort are available as Gene-Disease and Gene-Compound data sets, which include data and annotations for relationships between genes and cancer diseases or genes and cancer-associated compounds, respectively. Details on this process are found in the section Creation of the Cancer Gene Index.

The Cancer Gene Index includes data on 6,955 unique human genes, nearly 12,000 NCI Thesaurus cancer disease terms from a variety of public sources, and 2,180 unique pharmacological compounds from the NCI Thesaurus. The gene-disease and gene-compound associations were extracted from over 92 million analyzed sentences from nearly 20 million abstracts. The resource was last updated in June 2009.

Means of Accessing Cancer Gene Index Data

The Cancer Gene Index is available as computer-readable Gene-Disease and Gene-Compound XML data files. These XML documents are a rich, freely accessible resource, but to use them effectively you must be a bioinformaticist or computer programmer-scientist, or you must collaborate with someone who has this expertise. Ideally, intuitive graphical user interfaces (GUIs) would allow all scientists to quickly and easily access these data and exploit the full power of the Cancer Gene Index. Already, geWorkbench and the Cancer Molecular Analysis Portal both allow end users to view some Cancer Gene Index data. Several caBIO interfaces, on the other hand, expose all of the Cancer Gene Index data, and these can give you an appreciation for the full potential of the data resource. Many of these interfaces, however, require more experience with computer programming than the average bench scientist may have.

caBIO is a resource that integrates biomedical data on genes, proteins, clinical protocols, disease ontologies, pharmacological agents, pathways, and other entities with annotations, controlled vocabularies, and metadata models from multiple curated data sources. Data within caBIO may be accessed programmatically through a variety of Application Programming Interfaces (APIs). The caBIO GUIs include the caBIO Portlet Templated Search, the caBIO Home Page, and the Simple Search of the caBIO Portlet on the caGrid Portal. These caBIO GUIs are similar to PubMed in that queries will retrieve many results that you must sift through, examining each to determine whether or not it is useful. Unlike PubMed, however, caBIO is much more likely to return the information that you want.

Selecting the Best Way for You to Access Cancer Gene Index Data

The following section will help you select the best means to access Cancer Gene Index data based on your experience with bioinformatics and computer programming.

Bioinformaticists and Scientist Programmers
  • If you have limited knowledge of the caBIO object model and caBIG®, you can use the step-by-step guide for the caBIO Portlet Templated Search, which provides fast and easy access to gene-disease and gene-compound associations from the Cancer Gene Index. The Templated Search does not, however, allow you to view evidence of or annotations on the retrieved associations.
  • To work with the complete data resource, you can use the Cancer Gene Index Gene-Disease and Gene-Compound XML documents with the Cancer Gene Index XML guide. The format of these documents is extremely simple, making them very easy to work with. To download the XML documents, you must have a computer with at least 720 MB of free disk space, an internet connection, and a web browser; other system requirements depend upon the way in which you intend to use the data resource.
  • If you are familiar with the caBIO object model and caBIG®, you may wish to refer to the documentation on the caBIO APIs. The caBIO APIs allow you to uncover associations within the Cancer Gene Index data set and to find additional information linking these data with associated pathways, protein annotations, clinical protocols, and other biomedical entities. For information on system requirements, please refer to the links for each API on the caBIO wiki page.
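Before downloading the XML documents, you may want to confirm that the target drive has the 720 MB of free disk space mentioned above. The short Python sketch below uses the standard-library `shutil.disk_usage`. The function name `has_room_for_download` is hypothetical, and reading 720 MB as 720 × 1024 × 1024 bytes (the slightly stricter interpretation) is an assumption.

```python
import shutil

# Assumption: "720 MB" is read as 720 mebibytes, the stricter of the two readings.
REQUIRED_BYTES = 720 * 1024 * 1024


def has_room_for_download(path=".", required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has at least `required` free bytes."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    if has_room_for_download("."):
        print("Enough free space for the Cancer Gene Index XML documents.")
    else:
        print("Less than 720 MB free; clear some space before downloading.")
```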
Scientists with No Programming or Bioinformatics Experience

You should use the step-by-step guide for the caBIO Portlet Templated Search tool. All that is required to access this web-based graphical user interface (GUI) is a computer with an internet connection and a web browser.

Although it is easy to uncover gene-disease and gene-compound associations with this tool, it does not allow you to limit your search results and thus can return gene-disease and gene-compound associations that were not validated by human curators. It also may not return all of the data you would like. For these reasons, you should use this tool in conjunction with the caBIO Object Graph Browser and, potentially, the NCI Thesaurus.
Scientists Familiar with Classes, Objects, and Object Models

You can use the step-by-step guide to the caBIO Home Page, the caBIO Portlet caBIO, or the caBIO caBIO Application. In contrast to the caBIO Portlet Templated Search, these interfaces expose the entirety of the Cancer Gene Index and the rest of caBIO, but they require knowledge of the caBIO object model. All that is required to access these web-based caBIO search tools is a computer with an internet connection and a web browser.

The caBIO Portal has two search functionalities, the Freestyle Lexical Mine and the Search for Biological Entities. One of these is ideal for end users who would like to view all caBIO objects associated with a search term; the other allows you to search for specific attributes and to limit retrieved results to one or more caBIO classes. The caBIO Portlet caBIO can be useful for viewing one object at a time, and it links to the caBIO Object Viewer to provide access to related information.

caBIO Portlet Simple Search

The caBIO Portlet also has a Simple Search tool, which provides an overview of the data in caBIO in a way that does not require knowledge of the caBIO object model and can be useful for a quick look at the kinds of data within caBIO. Because the Simple Search tool does not allow you to differentiate Cancer Gene Index data from other data in caBIO, it is recommended that you instead use the XML, the caBIO Portlet Templated Search, or even the caBIO Home Page. If you would prefer to use the Simple Search, a guide is provided.

Scientists On the Go

If you would like to view Cancer Gene Index data on the go, you can use the caBIO iPhone Application. A limited guide to accessing Cancer Gene Index data is provided here.

Examples of How the Cancer Gene Index Facilitates Translational Research

The Cancer Gene Index can facilitate many different types of cancer research. Two examples are given below.

Translational Medicine Research

In this first example from the Cancer Gene Index Project poster, the data resource is used to validate colon cancer translational medicine research data. Here, scientists have obtained access to de-identified demographic data, histopathology data (lymph node [ICR:pN], tumor size [ICR:pT], and degree of metastasis [ICR:G]), and tumor tissue biospecimens from patients, who are represented by gray figures. The scientists perform gene expression microarray analysis on each colon cancer biospecimen (pink and red colon tissue cells). The genes (purple DNA fragments) with significantly altered expression are then validated by cross-referencing them against the Cancer Gene Index.
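The cross-referencing step in this example can be pictured as a set comparison. In the hypothetical Python sketch below, `expression` maps gene symbols to an illustrative (log2 fold change, p-value) pair from the microarray analysis, and `index_genes` stands in for the genes that the Cancer Gene Index associates with colon cancer (for instance, extracted from the Gene-Disease XML). The gene names, values, and cutoffs are made up for illustration and are not taken from the poster; a real analysis would also correct the p-values for multiple testing.

```python
def altered_genes(expression, min_abs_log2_fc=1.0, max_p=0.05):
    """Return genes whose change passes illustrative fold-change and p-value cutoffs."""
    return {
        gene
        for gene, (log2_fc, p_value) in expression.items()
        if abs(log2_fc) >= min_abs_log2_fc and p_value <= max_p
    }


def cross_reference(candidates, index_genes):
    """Split candidate genes by whether the index already links them to the disease."""
    supported = candidates & index_genes   # already associated with the disease in the index
    unreported = candidates - index_genes  # not associated with the disease in the index
    return supported, unreported


if __name__ == "__main__":
    # Toy data only: not real measurements or real Cancer Gene Index content.
    expression = {
        "GENE_A": (2.1, 0.001),
        "GENE_B": (-1.5, 0.01),
        "GENE_C": (0.2, 0.50),
        "GENE_D": (3.0, 0.20),
    }
    index_genes = {"GENE_A", "GENE_C"}
    supported, unreported = cross_reference(altered_genes(expression), index_genes)
    print(sorted(supported))   # ['GENE_A']
    print(sorted(unreported))  # ['GENE_B']
```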

Illustration showing how the data resource is used to validate colon cancer translational medicine research data.

Biomarker Discovery

The Cancer Gene Index may also be used for lymphoma biomarker discovery. This example from the Cancer Gene Index Project poster illustrates that researchers can use the data resource to quickly identify the genes (purple DNA fragments) that are associated with, and may be biomarkers for, lymphoma. Here, gene-disease concept pair associations are shown as blue "to diseases" arrows. By searching the Cancer Gene Index for therapeutic compounds that are associated with these genes, scientists can easily uncover which of these candidate disease biomarkers are also associated with lymphoma-related compounds. An association between the gene encoding SPN (also known as sialophorin or CD43) and the compound leflunomide is represented by a black "has validated association with" arrow. Cancer Gene Index data can be cross-referenced to other resources, such as the clinical trial protocol database Physician Data Query® (PDQ), to obtain information about trials that relate to these data.
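The two-step search in this example (find the genes linked to a disease, then find which of those genes are also linked to compounds) amounts to a join between the gene-disease and gene-compound data. The Python sketch below assumes both data sets have already been reduced to simple (gene, term) pairs; the function name `candidate_biomarkers` is hypothetical. The SPN/lymphoma and SPN/leflunomide pairs mirror the example above, and the remaining pairs are placeholders.

```python
from collections import defaultdict


def candidate_biomarkers(gene_disease, gene_compound, disease):
    """Map each gene associated with `disease` to the sorted compounds it is also linked to.

    Genes with no compound associations are left out of the result.
    """
    disease_genes = {gene for gene, term in gene_disease if term == disease}
    compounds_by_gene = defaultdict(set)
    for gene, compound in gene_compound:
        if gene in disease_genes:
            compounds_by_gene[gene].add(compound)
    return {gene: sorted(compounds) for gene, compounds in compounds_by_gene.items()}


if __name__ == "__main__":
    gene_disease = [("SPN", "lymphoma"), ("GENE_X", "lymphoma"), ("GENE_Y", "melanoma")]
    gene_compound = [("SPN", "leflunomide"), ("GENE_Y", "COMPOUND_Z")]
    print(candidate_biomarkers(gene_disease, gene_compound, "lymphoma"))
    # {'SPN': ['leflunomide']}
```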

Illustration showing that researchers can use the data resource to quickly identify genes (shown as purple DNA fragments) that are associated with and may be markers for lymphoma.

If you are a bioinformatician or a scientist programmer, you can use the Cancer Gene Index Gene-Disease and Gene-Compound XML documents or one of the caBIO APIs. The XML is the complete resource, and it gives you the flexibility to parse the XML, load the parsed data into a database, and query it; to write code that extracts relevant data directly from the XML; or to use any other strategy to find relevant data within the XML. The caBIO APIs allow you not only to uncover associations between genes and diseases or compounds in the Cancer Gene Index, but also to find additional information about the genes, diseases, or compounds (e.g., associated pathway data or clinical protocols in caBIO). It can, however, be difficult to navigate the caBIO object model and use the APIs for these purposes.
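As a sketch of the parse, load, and query strategy, the Python example below stream-parses an XML file with the standard-library `xml.etree.ElementTree.iterparse`, stores one row per gene-term sentence in an SQLite table, and lists the curator-validated genes for a term. The element names (`GeneEntry`, `HUGOGeneSymbol`, `Sentence`, `MatchedTerm`, `Validated`) and the `yes` flag are placeholders, not the documented Cancer Gene Index schema; check the Cancer Gene Index XML guide and adjust them before use.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# HYPOTHETICAL element names: replace with those in the Cancer Gene Index XML guide.
GENE_TAG = "GeneEntry"
SYMBOL_TAG = "HUGOGeneSymbol"
SENTENCE_TAG = "Sentence"
TERM_TAG = "MatchedTerm"
VALIDATED_TAG = "Validated"


def load_associations(source, conn):
    """Stream-parse `source` (a path or binary file object) into an `association` table.

    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS association (gene TEXT, term TEXT, validated INTEGER)"
    )
    rows = 0
    # 'end' events fire once an element and all of its children have been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != GENE_TAG:
            continue
        gene = (elem.findtext(SYMBOL_TAG) or "").strip()
        for sentence in elem.iter(SENTENCE_TAG):
            term = (sentence.findtext(TERM_TAG) or "").strip()
            flag = (sentence.findtext(VALIDATED_TAG) or "").strip().lower()
            conn.execute(
                "INSERT INTO association VALUES (?, ?, ?)",
                (gene, term, 1 if flag == "yes" else 0),
            )
            rows += 1
        elem.clear()  # drop the finished entry's children to keep memory use low
    conn.commit()
    return rows


def validated_genes_for_term(conn, term):
    """Return the distinct genes with a curator-validated association to `term`."""
    cursor = conn.execute(
        "SELECT DISTINCT gene FROM association WHERE term = ? AND validated = 1 ORDER BY gene",
        (term,),
    )
    return [row[0] for row in cursor]


# Toy document in the placeholder format above (not real Cancer Gene Index data).
SAMPLE_XML = b"""<Root>
  <GeneEntry>
    <HUGOGeneSymbol>GENE_A</HUGOGeneSymbol>
    <Sentence><MatchedTerm>lymphoma</MatchedTerm><Validated>yes</Validated></Sentence>
    <Sentence><MatchedTerm>leukemia</MatchedTerm><Validated>no</Validated></Sentence>
  </GeneEntry>
  <GeneEntry>
    <HUGOGeneSymbol>GENE_B</HUGOGeneSymbol>
    <Sentence><MatchedTerm>lymphoma</MatchedTerm><Validated>yes</Validated></Sentence>
  </GeneEntry>
</Root>"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_associations(io.BytesIO(SAMPLE_XML), conn))  # 3
    print(validated_genes_for_term(conn, "lymphoma"))        # ['GENE_A', 'GENE_B']
```

Streaming with `iterparse` and clearing each finished entry is one way to keep memory use modest on files of several hundred megabytes; loading into a database then lets you rerun queries without reparsing.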