ABOUT THE CANCER GENE INDEX
Overview of the Cancer Gene Index
As of December 2009, MEDLINE contained nearly 2.5 million cancer-related publications, and this number is increasing rapidly. Scientists cannot manually identify all known cancer genes, and it is even more difficult to uncover the relationships between these genes and various human cancers or pharmacological compounds. A publicly available resource that compiles these data with associated annotations would greatly facilitate research. As part of its caBIG™ initiative, the NCI addressed this problem with the Cancer Gene Index.
The goal of the Cancer Gene Index is to further translational cancer research by providing a high-quality data resource consisting of genes that have been experimentally associated with human cancers and/or pharmacological substances, the evidence for these associations, and relevant annotations on the data. Scientists can use the resource to quickly discover fact-based associations between genes and diseases or between genes and compounds (i.e., all of the genes associated with a disease, all of the genes associated with a compound, or all of the diseases and compounds associated with a gene) and to evaluate the evidence from which these associations were determined.
The Cancer Gene Index was generated by a process that couples automated linguistic text analysis of millions of MEDLINE abstracts with manual validation and annotation of the extracted data. The results of this effort are available as Gene-Disease and Gene-Compound data sets, which include data and annotations for relationships between genes and cancer diseases or between genes and cancer-associated compounds, respectively. Details on this process are found in the section Creation of the Cancer Gene Index.
The Cancer Gene Index includes data on 6,955 unique human genes, nearly 12,000 cancer disease terms from a variety of public sources, and 2,180 unique pharmacologic compounds from the NCI Thesaurus. These data were extracted from over 92 million analyzed sentences in nearly 20 million abstracts. The resource was last updated in June 2009.
Means of Accessing Cancer Gene Index Data
The Cancer Gene Index Gene-Disease and Gene-Compound XML documents are a rich, freely accessible resource. Although this XML format is simple for bioinformaticians to work with, it may be difficult for those who have no computer programming experience and cannot collaborate with someone who does. It is anticipated that many easy-to-use graphical user interfaces (GUIs) will be created for all scientists in the future, but several caBIO tools can already be used to query the Cancer Gene Index data. Other tools, such as geWorkbench and the Cancer Molecular Analysis Portal, pull data from caBIO or provide limited views of the data, respectively.
caBIO is a resource that integrates biomedical data on genes, proteins, clinical protocols, disease ontologies, pharmacological agents, pathways, and other entities with annotations, controlled vocabularies, and metadata models from multiple curated data sources. Interfaces for exploring Cancer Gene Index data include the caBIO Portal, the caBIO Portlet on the caGrid Portal, the caBIO iPhone Application, and various caBIO application programming interfaces (APIs). Because these tools are still being improved for scientist end users, some do not yet have intuitive GUIs. The following section will help you decide which of these means of accessing the Cancer Gene Index data is right for you.
Selecting the Best Way for You to Access Cancer Gene Index Data
If you have little familiarity with bioinformatics or object models, and in particular with computer programming, you should use the step-by-step guide for the caBIO Portlet Templated Search, which provides fast and easy access to gene-disease and gene-compound associations from the Cancer Gene Index. The Templated Search does not, however, allow you to view the evidence for, or the annotations on, the retrieved associations. All that is required to access this web-based GUI is a computer with an internet connection and a web browser.
If you are comfortable with concepts such as programming classes, objects, and object models (but not necessarily with programming or object modeling itself), you can use the step-by-step guide to the caBIO Portal searches, the caBIO Portlet caBIO, or the caBIO iPhone Application. In contrast to the caBIO Portlet Templated Search, these interfaces expose the entirety of the Cancer Gene Index. The caBIO Portal is similar to PubMed in that queries retrieve many results that you must sift through, examining each to determine whether it is useful. Unlike PubMed, however, caBIO is much more likely to return the information that you want. The caBIO Portal has two search functionalities, the Freestyle Lexical Mine and the Search for Biological Entities. The Freestyle Lexical Mine is ideal for end users who would like to view all caBIO objects associated with a search term. The Search for Biological Entities allows you to search for specific attributes and to limit retrieved results to one or more caBIO classes. The caBIO Portlet caBIO can be useful for viewing one object at a time, and it links to the caBIO Object Viewer to provide access to related information. All that is required to access these web-based caBIO search tools is a computer with an internet connection and a web browser.
If you are a bioinformatician or a scientist programmer, you can use the Cancer Gene Index Gene-Disease and Gene-Compound XML documents or one of the caBIO APIs. The XML documents are the complete resource, and they give you the flexibility to parse the XML, load the parsed data into a database, and query it; to write code that extracts relevant data directly from the XML; or to use any other strategy for finding relevant data within the XML. To download the XML documents, you must have a computer with at least 720 MB of free disk space, an internet connection, and a web browser; other system requirements depend upon the way in which you intend to use the data resource. The caBIO APIs allow you not only to uncover associations between genes and diseases or compounds in the Cancer Gene Index, but also to find additional information about the genes, diseases, or compounds (e.g., associated pathway data or clinical protocols in caBIO). It can, however, be difficult to navigate the caBIO object model and use the APIs for these purposes. For information on system requirements, please refer to the links for each API on the caBIO wiki page.
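The "parse the XML, load the parsed data into a database, and query" strategy described above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the actual Cancer Gene Index format: the element names (GeneEntry, GeneSymbol, Association, DiseaseTerm, EvidenceID), the sample data, and the helper functions are all hypothetical placeholders, so substitute the element names found in the downloaded Gene-Disease or Gene-Compound XML documents. Only the Python standard library is used, and the file is parsed incrementally because the real documents are large.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical miniature document. Element names and values are placeholders,
# NOT the real Cancer Gene Index schema or data.
SAMPLE_XML = """<GeneEntryCollection>
  <GeneEntry>
    <GeneSymbol>GENE_A</GeneSymbol>
    <Association><DiseaseTerm>disease X</DiseaseTerm><EvidenceID>ev-1</EvidenceID></Association>
    <Association><DiseaseTerm>disease Y</DiseaseTerm><EvidenceID>ev-2</EvidenceID></Association>
  </GeneEntry>
  <GeneEntry>
    <GeneSymbol>GENE_B</GeneSymbol>
    <Association><DiseaseTerm>disease X</DiseaseTerm><EvidenceID>ev-3</EvidenceID></Association>
  </GeneEntry>
</GeneEntryCollection>"""


def load_associations(xml_source, conn):
    """Stream the XML and insert one row per gene-disease association.

    xml_source may be a file name or a binary file object.
    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gene_disease "
        "(gene TEXT, disease TEXT, evidence TEXT)"
    )
    count = 0
    # iterparse yields each element when its closing tag is read, so the
    # whole document never has to be held in memory at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "GeneEntry":  # assumed record element
            continue
        gene = elem.findtext("GeneSymbol")
        for assoc in elem.iter("Association"):
            conn.execute(
                "INSERT INTO gene_disease VALUES (?, ?, ?)",
                (gene, assoc.findtext("DiseaseTerm"), assoc.findtext("EvidenceID")),
            )
            count += 1
        elem.clear()  # release the finished record's children
    conn.commit()
    return count


def genes_for_disease(conn, disease):
    """Return all genes associated with a disease term, sorted by name."""
    rows = conn.execute(
        "SELECT DISTINCT gene FROM gene_disease WHERE disease = ? ORDER BY gene",
        (disease,),
    )
    return [row[0] for row in rows]


def diseases_for_gene(conn, gene):
    """Return all disease terms associated with a gene, sorted by name."""
    rows = conn.execute(
        "SELECT DISTINCT disease FROM gene_disease WHERE gene = ? ORDER BY disease",
        (gene,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_associations(io.BytesIO(SAMPLE_XML.encode("utf-8")), connection)
    print(genes_for_disease(connection, "disease X"))
    print(diseases_for_gene(connection, "GENE_A"))
```

To run the same sketch against a downloaded document, pass its file name to load_associations instead of the in-memory sample and connect to an on-disk SQLite file rather than ":memory:". A Gene-Compound table could be loaded in the same way.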