ABOUT THE CANCER GENE INDEX

h2. Overview of the Cancer Gene Index

There are nearly 2.5 million cancer-related publications in \[MEDLINE|http://www.proquest.com/en-US/catalogs/databases/detail/medline_ft.shtml\] as of December 2009, and this number is rapidly increasing. Scientists cannot manually identify all known cancer genes, and it is even more difficult to uncover the relationships between these genes and various human cancers or pharmacological compounds. In theory, one could exhaustively search PubMed and compile, for example, a list of the genes related to a given disease or compound, but this would take many weeks, and such a manual search would very likely still miss some genes. The National Cancer Institute (NCI) recognized that a publicly available resource that compiled these gene-disease and gene-compound data with relevant annotations would greatly facilitate research, and as part of its \[caBIG® initiative|http://cabig.cancer.gov/\], it created the \[Cancer Gene Index|https://cabig.nci.nih.gov/inventory/data-resources/cancer-gene-index\] Project.

The goal of the Cancer Gene Index is to further translational cancer research by providing a high-quality data resource consisting of genes that have been experimentally associated with human cancers and/or pharmacological substances, the evidence for these associations, and relevant annotations on the data. Scientists can use the data resource to quickly discover fact-based associations between genes and diseases or genes and compounds (i.e., all of the genes associated with a disease, all of the genes associated with a compound, or all of the diseases and compounds associated with a gene) and to evaluate the evidence from which these associations were determined.
This resource was created through a unique process that coupled automated linguistic text analysis of millions of \[MEDLINE|http://www.proquest.com/en-US/catalogs/databases/detail/medline_ft.shtml\] abstracts with manual validation and annotation of the extracted data. Details on this process are found in the section \[#Creation of the Cancer Gene Index\].

The Cancer Gene Index includes data on 6,955 unique human genes, nearly 12,000 cancer disease terms from a variety of public sources, and 2,180 unique pharmacologic compounds from the NCI Thesaurus. Associations between genes and diseases or genes and compounds were extracted from over 92 million analyzed sentences of nearly 20 million abstracts. The resource was last updated in June 2009.

h2. Means of Accessing Cancer Gene Index Data

The \[Cancer Gene Index|https://cabig.nci.nih.gov/inventory/data-resources/cancer-gene-index\] is available as computer-readable Gene-Disease and Gene-Compound data files. To use these files effectively, you must be a bioinformatician or scientist programmer, or collaborate with someone who has this expertise. Ideally, intuitive graphical user interfaces (GUIs) would allow all scientists to quickly and easily access these data and exploit the full power of the Cancer Gene Index data resource. Several preliminary \[caBIO|https://wiki.nci.nih.gov/display/caBIO/caBIO+Wiki+Home+Page\] interfaces already exist, and these can begin to give you an appreciation for the full potential of the data resource.
In addition, \[geWorkbench|https://cabig.nci.nih.gov/tools/geWorkbench\] pulls some Cancer Gene Index data from caBIO as annotations on genomic data, and the \[Cancer Molecular Analysis Portal|https://cma.nci.nih.gov/cma-rembrandt/\] provides limited views of the Cancer Gene Index data.

At this time, the caBIO interfaces are not yet fully featured, as is the case with the caBIO Portlet Templated Search, and in many cases they are difficult for the average scientist to use, as with the \[caBIO Portal|http://cabioapi.nci.nih.gov/cabio43/Home.action\] and the \[Simple Search of the caBIO Portlet on the caGrid Portal|http://cagrid-portal.nci.nih.gov/web/guest/community?p_p_id=cabioportlet_WAR_cabioportlets_INSTANCE_R7dp&p_p_lifecycle=0&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_count=1&_cabioportlet_WAR_cabioportlets_INSTANCE_R7dp_struts_action=%2Fcabioportlet%2Fview&tabs1=Simple%20Search\]. An effort is currently underway to improve the caBIO GUIs for scientist end users. In the future, it is expected that these and other GUIs will be fully functional for all scientists.

With this step-by-step guide, a persistent and cautious scientist can use the Templated Search in conjunction with the caBIO Portal and NCI Thesaurus to find lists of genes associated with diseases or compounds, or of diseases and compounds associated with a gene.

h2. Selecting the Best Way for You to Access Cancer Gene Index Data

The following section will help you select the best means to access Cancer Gene Index data based on your experience with bioinformatics and computer programming. 
\\
<ac:structured-macro ac:name="anchor" ac:schema-version="1" ac:macro-id="29f96168-5345-4a47-a7fc-22d4d319246a"><ac:parameter ac:name="">atlassian-macro-note</ac:parameter></ac:structured-macro>\{info:title=Bioinformaticians and Scientist Programmers\}
* If you have limited knowledge of the caBIO object model and caBIG®, you should use the Cancer Gene Index Gene-Disease and Gene-Compound XML documents and \[the XML documents guide|Link HERE\]. The format of these documents is extremely simple, making them easy to work with. To download the XML documents, you must have a computer with at least 720 MB of free disk space, an internet connection, and a web browser; other system requirements depend on how you intend to use the data resource.
* If you are familiar with the caBIO object model and caBIG®, you may wish to use one of the \[caBIO APIs#APIs\]. The caBIO APIs allow you to uncover associations within the Cancer Gene Index data set and to find additional information linking these data with associated pathways, protein annotations, clinical protocols, and other biomedical entities. Even with knowledge of the caBIO object model, however, it can be difficult to construct complex queries of caBIO. For information on system requirements, please refer to the links for each API on the \[caBIO wiki|https://wiki.nci.nih.gov/display/caBIO/caBIO+Wiki+Home+Page\] page.