Page History

Versions Compared

Old Version 1

Changes made by Unknown User (boydl)

Saved on Jan 28, 2010

compared with

New Version 2

Changes made by Unknown User (boydl)

Saved on Jan 28, 2010

To Print the Guide

We recommend that you print one wiki page of the guide at a time. To do this, click the printer icon at the top right of the page; then, from the browser File menu, choose Print. Printing multiple pages at one time is more complex. For instructions, refer to "How do I print multiple pages?"

High Frequency Sentence Count Gene Filtering

Early on, the decision was made to filter HFG gene-disease (GD) sentences first and then return to HFG gene-compound (GC) sentences. Natural language processing (NLP) filtering found that GD sentences described Expression-Gene Relationships (A), Abnormality-Gene Relationships (B), Biomarker-Gene Relationships (C), and/or Therapy-Gene Relationships (D). The GD sentences were therefore classified into "quadrants": Q1 sentences described all four relationship categories, Q2 any three, Q3 any two, and Q4 only one. All Q3 and Q4 sentences were manually curated. Q1 and Q2 sentences were subjected to additional filtering criteria, and three or four sentences from each of the two categories were selected for manual curation. A similar approach was taken for GC sentences, but NLP analysis of this evidence uncovered three relationship categories: Binding (A*), Regulation (B*), and Resistance (C*). All A*B*C* sentences (i.e., sentences describing all three GC categories) were manually curated. The remaining sentences were subjected to additional filtering steps, as before, to select the sentences to be manually curated. In the flowchart, blue denotes GD objects, gray denotes GC objects, and green denotes objects shared by GD and GC. Dotted lines represent steps that occurred later in the GD workflow.

FIGURE HERE.
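
The routing rules described above can be sketched in a few lines of Python. This is only an illustration of the stated logic, not the code used in the project: the function names (gd_quadrant, gd_route, gc_route) and the route labels are hypothetical, and the "additional filtering criteria" are not specified here, so they are represented only as a label.

```python
# Relationship category codes as given in the text.
GD_CATEGORIES = {"A", "B", "C", "D"}   # Expression, Abnormality, Biomarker, Therapy
GC_CATEGORIES = {"A*", "B*", "C*"}     # Binding, Regulation, Resistance

# Hypothetical labels for the two possible next steps.
MANUAL = "manual curation"
FILTER = "additional filtering"


def gd_quadrant(categories):
    """Map the GD categories found in a sentence to a quadrant (Q1 to Q4)."""
    found = set(categories) & GD_CATEGORIES
    if not found:
        raise ValueError("sentence describes no GD relationship category")
    # Q1 = all four categories, Q2 = any three, Q3 = any two, Q4 = only one.
    return {4: "Q1", 3: "Q2", 2: "Q3", 1: "Q4"}[len(found)]


def gd_route(categories):
    """Q3 and Q4 go straight to manual curation; Q1 and Q2 are filtered further."""
    return MANUAL if gd_quadrant(categories) in ("Q3", "Q4") else FILTER


def gc_route(categories):
    """Sentences with all three GC categories are curated; the rest are filtered further."""
    found = set(categories) & GC_CATEGORIES
    if not found:
        raise ValueError("sentence describes no GC relationship category")
    return MANUAL if found == GC_CATEGORIES else FILTER


if __name__ == "__main__":
    print(gd_quadrant(["A", "C"]), gd_route(["A", "C"]))
    print(gd_quadrant(["A", "B", "C", "D"]), gd_route(["A", "B", "C", "D"]))
    print(gc_route(["A*", "B*", "C*"]), gc_route(["B*"]))
```

Note the asymmetry the sketch makes visible: for GD sentences, those with the fewest categories are curated directly, whereas for GC sentences, only those with all three categories are.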
