All of the Gene-Disease and Gene-Compound Cancer Gene Index data, annotations, and metadata are available in the two XML documents and the caBIO interfaces, excluding the caBIO Portlet Templated Search. These data include NCI Thesaurus disease and compound terms, NCI Thesaurus disease and compound concept identifiers, HUGO Gene Symbols, LocusLink Gene Symbols, UniProt Identifiers, the sentences that contained evidence of the gene-disease or gene-compound associations, and the PubMed identifier of the abstract from which each sentence was extracted. The gene-disease and gene-compound data were then annotated by human curators with Evidence Codes, Role Codes, and Role Details; Cell Line and Negation Indicators; Sentence and Gene Status Flags; and Comments.
For additional information on how these data were collected, validated, and annotated, refer to the Creation of the Cancer Gene Index section.
Evidence Codes qualify the assertions about the association of a gene with a disease or compound term by telling how the assertions were made (for example, through inference or experimental data). The curators may have identified the means by which an assertion was made from the extracted sentence alone or through careful reading of the abstract from which the sentence originated. These codes follow the suggestions of Karp et al. for ontologies used in pathway and genome databases. The Evidence Code associated with a specific gene-disease or gene-compound pair is found in the text contents of the XML EvidenceCode element and is the EvidenceCode attribute of the caBIO Evidence Code class.
Evidence Code | Description |
---|---|
EV-IC | Inferred by curator. An assertion was inferred by a curator from relevant information such as other assertions in a database. |
EV-COMP | Inferred from computation. The evidence for an assertion comes from a computational analysis. The assertion itself might have been made by an author or by a computer, that is, EV-COMP does not specify whether manual interpretation of the computation occurred. |
EV-COMP-HINF | Human inference. A curator or author inferred this assertion after review of one or more possible types of computational evidence such as sequence similarity, recognized motifs or consensus sequence, etc. When the inference was made by a computer in an automated fashion, use EV-AINF. |
EV-COMP-HINF-SIMILAR-TO-CONSENSUS | An author inferred, or reviewed a computer inference of, sequence function based on similarity to a consensus sequence. |
EV-COMP-HINF-POSITIONAL-IDENTIFICATION | An author inferred, or reviewed a computer inference of, promoter position relative to the -10 and -35 boxes. |
EV-COMP-HINF-FN-FROM-SEQ | An author inferred, or reviewed a computer inference of, gene function based on sequence, profile, or structural similarity (as computed from sequence) to one or more other sequences. |
EV-COMP-AINF | Automated inference. A computer inferred this assertion through one of many possible methods such as sequence similarity, recognized motifs or consensus sequence, etc. When a person made the inference from computational evidence, use EV-HINF. |
EV-COMP-AINF-SINGLE-DIRECTON | Automated inference of transcription unit based on single-gene direction. Existence of a single-gene transcription unit for gene G is inferred computationally by the existence of upstream and downstream genes transcribed in the opposite direction of G. |
EV-COMP-AINF-SIMILAR-TO-CONSENSUS | A DNA sequence similar to previously known consensus sequences is computationally identified. |
EV-COMP-AINF-POSITIONAL-IDENTIFICATION | Automated inference of promoter position relative to the -10 and -35 boxes. |
EV-COMP-AINF-FN-FROM-SEQ | Automated inference of function from sequence. A computer inferred a gene function based on sequence, profile, or structural similarity (as computed from sequence) to one or more other sequences. |
EV-AS-TAS | Traceable author statement. The assertion was made in a publication, such as a review, that itself did not describe an experiment supporting the assertion. The statement referenced another publication that supported the assertion, but it is unclear whether that publication described an experiment that supported the assertion. |
EV-AS-NAS | Non-traceable author statement. The assertion was made in a publication such as a review, without a reference to a publication describing an experiment that supports the assertion. |
EV-EXP | Inferred from experiment. The evidence for an assertion comes from a wet-lab experiment of some type. |
EV-EXP-IPI | IPI inferred from physical interaction. The assertion was inferred from a physical interaction such as 2-hybrid interactions, co-purification, co-immunoprecipitation, or ion/protein binding experiments. This code covers physical interactions between the gene product of interest and another molecule (or ion, or complex). For functions such as protein binding or nucleic acid binding, a binding assay is simultaneously IPI and IDA; IDA is preferred because the assay directly detects the binding. |
EV-EXP-IDA | IDA inferred from direct assay. The assertion was inferred from a direct experimental assay such as enzyme assays, in vitro reconstitution (for example, transcription), immunofluorescence, cell fractionation, etc. |
EV-EXP-IDA-UNPURIFIED-PROTEIN | Direct assay of unpurified protein. Presence of a protein activity is indicated by an assay. However, the precise identity of the protein with that activity is not established by this experiment (protein has not been purified). |
EV-EXP-IDA-TRANSCRIPTION-INIT-MAPPING | The transcription start site is identified by primer extension. |
EV-EXP-IDA-TRANSCRIPT-LEN-DETERMINATION | The length of the (transcribed) RNA is experimentally determined. The length of the mRNA is compared with that of the DNA sequence and by this means the number of genes transcribed is established. |
EV-EXP-IDA-RNA-POLYMERASE-FOOTPRINTING | The binding of RNA polymerase to a DNA region (the promoter) is shown by footprinting. |
EV-EXP-IDA-PURIFIED-PROTEIN-MULTSPECIES | Protein purified from mixed culture or other multispecies environment (such as infected plant or animal tissue), and activity measured through in vitro assay. |
EV-EXP-IDA-PURIFIED-PROTEIN | Protein purified to homogeneity from specific species (or from heterologous expression vector), and activity measured through in vitro assay. |
EV-EXP-IDA-BOUNDARIES-DEFINED | Sites or genes bounding the transcription unit are experimentally identified. Several possible cases exist, such as defining the boundaries of a transcription unit with an experimentally identified promoter and terminator, or with a promoter and a downstream gene that is transcribed in the opposite direction, or with a terminator and an upstream gene that is transcribed in the opposite direction. |
EV-EXP-IDA-BINDING-OF-PURIFIED-PROTEINS | IDA inferred from direct assay. The assertion was inferred from a direct experimental assay such as enzyme assays, in vitro reconstitution (for example, transcription), immunofluorescence, cell fractionation. |
EV-EXP-IDA-BINDING-OF-CELLULAR-EXTRACTS | There exists physical evidence of the binding of cellular extracts containing a regulatory protein to its DNA binding site. This can be either by footprinting or mobility shift assays. |
EV-EXP-IEP | IEP inferred from expression pattern. The assertion was inferred from a pattern of expression data such as transcript levels (for example, Northerns, microarray data) or protein levels (for example, Western blots). |
EV-EXP-IEP-GENE-EXPRESSION-ANALYSIS | The expression of the gene is analyzed through a transcriptional fusion (that is, lacZ), and a difference in expression levels is observed when the regulatory protein is present (wild type) vs in its absence. Note that this evidence does not eliminate the possibility of an indirect effect of the regulator on the regulated gene. |
EV-EXP-IGI | IGI inferred from genetic interaction. The assertion was inferred from a genetic interaction such as "traditional" genetic interactions (suppressors, synthetic lethals, etc.), functional complementation, or inference about one gene drawn from the phenotype of a mutation in a different gene. This category includes any combination of alterations in the sequence (mutation) or expression of more than one gene/gene product. This category can therefore cover any of the IMP experiments that are done in a non-wild-type background, although we prefer to use it only when all mutations are documented. |
EV-EXP-IGI-FUNC-COMPLEMENTATION | Protein activity inferred by isolating its gene and performing functional complementation of a well characterized heterologous mutant for the protein. |
EV-EXP-IMP | IMP inferred from mutant phenotype. The assertion was inferred from a mutant phenotype such as any gene mutation/knockout, overexpression/ectopic expression of wild-type or mutant genes, anti-sense experiments, RNA interference experiments, specific protein inhibitors, or complementation. Inferences made from examining mutations or abnormal levels of only the product(s) of the gene of interest are covered by code EV-IMP (compare to code EV-IGI). Use this code for experiments that use antibodies or other specific inhibitors of RNA or protein activity, even though no gene may be mutated (the rationale is that EV-IMP is used where an abnormal situation prevails in a cell or organism). |
EV-EXP-IMP-REACTION-ENHANCED | Gene is isolated and over-expressed, and increased accumulation of reaction product is observed. |
EV-EXP-IMP-POLAR-MUTATION | If a mutation in a gene or promoter prevents expression of the downstream genes due to a polar effect, the mutated gene is clearly part of the transcription unit. |
EV-EXP-IMP-REACTION-BLOCKED | Mutant is characterized, and blocking of reaction is demonstrated. |
EV-EXP-IMP-SITE-MUTATION | A cis-mutation in the DNA sequence of the transcription-factor binding site interferes with the operation of the regulatory function. This is considered strong evidence for the existence and functional role of the DNA binding site. |
not_assigned | Evidence Code was not assigned. |
based on abstract | Determinations of whether the gene-disease or gene-compound association from a sentence was factual based upon an expert's interpretation of the abstract from which the sentence originated. |
(empty) | No Evidence Code was assigned because the sentence did not contain the expected gene-disease or gene-compound association evidence. |
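The code names in the table above appear to nest by prefix (for example, EV-EXP-IMP-SITE-MUTATION sits under EV-EXP-IMP, which sits under EV-EXP). The following is a minimal Python sketch of how a consumer of the data might recover that nesting from the code strings alone. It assumes the hyphen-delimited prefixes reflect the intended hierarchy, which is an inference from the naming pattern rather than something this page states; `KNOWN_CODES`, `evidence_code_ancestors`, and `is_experimental` are hypothetical names introduced for illustration.

```python
# Illustrative subset of the codes listed in the table above.
KNOWN_CODES = {
    "EV-IC",
    "EV-COMP",
    "EV-COMP-HINF",
    "EV-COMP-AINF",
    "EV-EXP",
    "EV-EXP-IDA",
    "EV-EXP-IMP",
    "EV-EXP-IMP-SITE-MUTATION",
}


def evidence_code_ancestors(code, known_codes=KNOWN_CODES):
    """Return the known codes that are hyphen-delimited prefixes of `code`,
    ordered from most general to most specific (hypothetical helper)."""
    parts = code.split("-")
    ancestors = []
    # Try every proper prefix: "EV", "EV-EXP", "EV-EXP-IMP", ...
    for end in range(1, len(parts)):
        candidate = "-".join(parts[:end])
        if candidate in known_codes:
            ancestors.append(candidate)
    return ancestors


def is_experimental(code):
    """True if the code is EV-EXP or falls under it by prefix (assumption)."""
    return code == "EV-EXP" or code.startswith("EV-EXP-")
```

Values such as not_assigned or based on abstract contain no EV- prefix, so both helpers treat them as having no ancestors and as non-experimental.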
The Cancer Gene Index Role Codes and Role Details are derived from NCI Role Codes. Both describe the semantic associations between a gene concept and either a disease or compound concept (that is, concept pairs). Whereas the Evidence Codes describe how the association was inferred or the type of experiment upon which the inference was made, Role Codes and Role Details give information about the actual gene-disease or gene-compound association. Multiple Role Codes and Role Details can be used for the same sentence.
Note
A concept is the actual compound, disease, or gene to which the various names, acronyms, alternate spellings, and abbreviations refer.
Gene-Disease and Gene-Compound Role Codes most often describe that a gene is associated with a disease or compound (for example, GENE_ASSOCIATED_WITH_DISEASE) or how the concepts are associated (for example, Chemical_or_Drug_Is_Metabolized_By_Enzyme), but they also may describe relevant features of the role of a particular gene (for example, GENE_HAS_FUNCTION). For the former, the gene name, Role Code, and disease or compound often can form a sentence, such as "BRCA1 GENE_ASSOCIATED_WITH_DISEASE BREAST CANCER." The Role Code not_assigned indicates that the curator did not or could not assign a specific code. The Role Code associated with a specific gene-disease or gene-compound pair is found in the text contents of the XML PrimaryNCIRoleCode element and is the role attribute of the caBIO GeneDiseaseAssociation and GeneAgentAssociation classes, gov.nih.nci.cabio.domain.GeneDiseaseAssociation and gov.nih.nci.cabio.domain.GeneAgentAssociation.
Note
Although pharmacological substances are referred to as "compounds" in the Cancer Gene Index, the NCI Thesaurus and caBIO use the term "agent."
Gene-Disease and Gene-Compound Role Details most often provide precise descriptions of the association of a gene term and a corresponding disease or compound term. These Details can also describe relevant features of the role of a particular gene (for example, Chemical_or_Drug_Represses_Gene_Product_Expression). While similar to Role Codes, Role Details give more specific semantic descriptions. For example, a Role Detail for a particular gene-disease concept pair association may be GENE_PRODUCT_UPREGULATED_IN_DISEASE, whereas a similar Role Code may be GENE_ASSOCIATED_WITH_DISEASE. The Role Detail not_assigned indicates that the curator did not or could not assign a specific semantic detail. The Role Detail associated with a specific gene-disease or gene-compound pair is found in the text contents of the XML OtherRole element and is, like Role Codes, the role attribute of the caBIO GeneDiseaseAssociation and GeneAgentAssociation classes.
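As a rough illustration of reading Role Codes and Role Details from the XML, the Python sketch below collects the text of every PrimaryNCIRoleCode and OtherRole element in a fragment and drops not_assigned values. Only those two element names come from this page; the enclosing `Record` element, the sample content, and the function name `extract_roles` are assumptions made for the example, and the real XML documents may be organized differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only PrimaryNCIRoleCode and OtherRole are element
# names taken from this page; the <Record> wrapper is assumed.
SAMPLE_XML = """
<Record>
  <PrimaryNCIRoleCode>GENE_ASSOCIATED_WITH_DISEASE</PrimaryNCIRoleCode>
  <OtherRole>GENE_PRODUCT_UPREGULATED_IN_DISEASE</OtherRole>
  <OtherRole>not_assigned</OtherRole>
</Record>
"""


def extract_roles(xml_text):
    """Collect assigned Role Codes and Role Details from an XML fragment."""
    root = ET.fromstring(xml_text.strip())

    def assigned_values(tag):
        # A sentence may carry several codes, so gather every matching element.
        values = [(el.text or "").strip() for el in root.iter(tag)]
        return [v for v in values if v and v != "not_assigned"]

    return {
        "role_codes": assigned_values("PrimaryNCIRoleCode"),
        "role_details": assigned_values("OtherRole"),
    }
```

Collecting lists rather than single values reflects the statement that multiple Role Codes and Role Details can be used for the same sentence.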
The binary indicators yes or no were set for cell line and negation annotations of each gene-disease and gene-compound association. Cell line indicators denote whether the evidence came from a cell line (yes) or other source, such as a human subject, animal model, or primary cells (no). The cell line indicator is the text contents of the XML CelllineIndicator element and the celllineStatus attribute of the caBIO Evidence class. Negation indicators specify whether the evidence actually described a lack of association between the candidate binary concept pair (yes), or whether there was a true relationship between them (no). The curators may have deduced the negation indicator from the extracted sentence alone or through careful reading of the abstract from which the sentence originated. Occasionally, the curators did not set a negation indicator (-). The negation indicator is the text contents of the XML NegationIndicator element and the negationStatus attribute of the caBIO Evidence class.
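Because the negation indicator reads opposite to what one might expect (yes means the sentence describes a lack of association), it can help to normalize the values before use. The Python sketch below maps yes, no, and the unset marker (-) to True, False, and None. The function names are hypothetical, and treating an empty string the same as - is an assumption not stated on this page.

```python
from typing import Optional


def parse_indicator(value: Optional[str]) -> Optional[bool]:
    """Map an indicator string to True ("yes"), False ("no"), or None (unset)."""
    text = (value or "").strip().lower()
    if text == "yes":
        return True
    if text == "no":
        return False
    if text in ("", "-"):
        # "-" marks an indicator the curators did not set; an empty value is
        # treated the same way here (assumption).
        return None
    raise ValueError(f"unexpected indicator value: {value!r}")


def supports_association(negation_indicator: Optional[str]) -> Optional[bool]:
    """Invert the negation indicator: "no" means a true relationship exists."""
    negated = parse_indicator(negation_indicator)
    return None if negated is None else not negated
```

For example, `supports_association("no")` yields True because a negation indicator of no denotes a true relationship, while an unset indicator yields None so callers can decide how to handle it.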
The curators set flags to denote the state of annotations of a gene term or of a particular sentence.
Gene Status Flags describe whether annotations for all of the sentences for a given gene term are complete and whether the gene term has been withdrawn from EntrezGene. All low frequency sentence count genes were finished, whereas some high frequency sentence count genes were not. The status of a specific gene is found in the text contents of the XML GeneStatusFlag element.
Gene Status Flag | Status Flag Description |
---|---|
Finished | All sentences with this gene have been annotated. |
New | Not all sentences with this high frequency sentence count gene have been annotated. |
Withdrawn | Gene term has been withdrawn from EntrezGene and, where possible, the term has been mapped to a valid EntrezGene term. |
Entry withdrawn | Gene term has been withdrawn from EntrezGene and, where possible, the term has been mapped to a valid EntrezGene term. |
Unlike Gene Status Flags, which can cover many sentences associated with a single gene, Sentence Status Flags describe the curator's findings for a specific sentence. Sentences can be true positives, false positives, unclear, or redundant. The status of a specific sentence is found in the text contents of the XML SentenceStatusFlag element and is the sentenceStatus attribute of the caBIO Evidence class.
Sentence Status Flag | Status Flag Description |
---|---|
Finished | Sentence validation and annotation complete. |
No_fact | Invalid sentence or false positive. |
Unclear | Sentence included both a gene and disease or gene and compound term, but the relationship between the gene-disease or gene-compound pair was not obvious from the sentence. |
Redundant | Identical gene-disease or gene-compound associations were captured from multiple sentences originating from the same abstract. |
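A common first step when working with the sentence-level data is to tally the flags and keep only the sentences whose annotation is complete. The Python sketch below does this for an XML fragment. SentenceStatusFlag is the element name given on this page; the `Sentences` and `Sentence` wrapper elements, the sample data, and the function names are hypothetical, and reading Finished as the set of validated (true positive) sentences is an interpretation of the table above rather than a documented rule.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <Sentences>/<Sentence> wrappers are assumed.
SAMPLE_XML = """
<Sentences>
  <Sentence><SentenceStatusFlag>Finished</SentenceStatusFlag></Sentence>
  <Sentence><SentenceStatusFlag>No_fact</SentenceStatusFlag></Sentence>
  <Sentence><SentenceStatusFlag>Finished</SentenceStatusFlag></Sentence>
  <Sentence><SentenceStatusFlag>Redundant</SentenceStatusFlag></Sentence>
</Sentences>
"""


def count_sentence_status_flags(xml_text):
    """Tally how many times each SentenceStatusFlag value occurs."""
    root = ET.fromstring(xml_text.strip())
    return Counter(
        (el.text or "").strip() for el in root.iter("SentenceStatusFlag")
    )


def select_sentences(xml_text, flag="Finished"):
    """Return the Sentence elements whose status flag equals `flag`."""
    root = ET.fromstring(xml_text.strip())
    selected = []
    for sentence in root.iter("Sentence"):
        el = sentence.find("SentenceStatusFlag")
        if el is not None and (el.text or "").strip() == flag:
            selected.append(sentence)
    return selected
```

Filtering on the flag before reading Role Codes or indicators keeps false positives (No_fact) and duplicate associations (Redundant) out of downstream counts.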
Often, the expert curators made free-text comments on records within the Gene-Disease or Gene-Compound databases. Comments included, but were not limited to, notations of genetic anomalies (for example, loss of heterozygosity, polymorphisms, or aberrant methylation), additional disease information, name of the non-human organism from which the experimental data were collected, information on the cell line or other notable reagents used in the execution of the experiment, and other miscellaneous information. Any comments on a sentence are found in the text contents of the XML Comments element and the comment attribute of the caBIO Evidence class.