Document Information

Author: Craig Stancl, Scott Bauer, Cory Endle
Email: Stancl.craig@mayo.edu, bauer.scott@mayo.edu, endle.cory@mayo.edu
Team: LexEVS
Contract: S13-500 MOD4
Client: NCI CBIIT
National Institutes of Health
US Department of Health and Human Services

...

The number of text match algorithms in LexEVS has grown considerably over the decade the application has been in existence. Many matching algorithms overlap in their functionality and dependencies. We have reviewed each of these algorithms, noting their index dependencies and search focus, with an eye toward simplifying and updating the search functionality. NCI should review the list and decide whether any of these algorithms can be removed or updated.

Current Text Matches

  • Lucene Query
  • phrase
  • contains
  • leading and trailing wild card
  • exact match
  • substring
  • spelling error tolerant substring match
  • stemmed Lucene query
  • literal contains
  • starts with
  • non leading wild card literal substring
  • literal
  • Weighted double metaphone Lucene query
  • literal substring
  • Double metaphone Lucene query
  • Regular expression

Text Match Breakdown:

...

Search based on a "*some sub-string here*" pattern. Functions much like the Java String.indexOf method. Single-term searches will match '*term' and 'term*' but not '*term*', because leading wildcards are very inefficient. Special characters are included. This appears to be very similar to literal contains, but makes use of the reverse index.
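
To illustrate why a reverse index helps here, the following is a minimal sketch, assuming a toy in-memory term index rather than the actual LexEVS or Lucene data structures. The class ReverseIndexSketch and its methods are hypothetical names introduced only for this example. The idea is that a leading-wildcard query such as '*term' can be answered as an ordinary prefix lookup against terms stored in reversed form, while '*term*' cannot be reduced to a prefix lookup on either index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Illustrative sketch only: a toy term index, not LexEVS or Lucene code.
 * All class and method names here are hypothetical.
 */
final class ReverseIndexSketch {
    // Terms in natural order, used for trailing-wildcard ('term*') lookups.
    private final TreeSet<String> forward = new TreeSet<>();
    // The same terms stored reversed, used for leading-wildcard ('*term') lookups.
    private final TreeSet<String> reversed = new TreeSet<>();

    void add(String term) {
        forward.add(term);
        reversed.add(reverse(term));
    }

    /** Returns terms matching 'term*' or '*term', but not '*term*'. */
    List<String> search(String term) {
        TreeSet<String> hits = new TreeSet<>(prefixScan(forward, term));
        for (String r : prefixScan(reversed, reverse(term))) {
            hits.add(reverse(r)); // restore the original spelling
        }
        return new ArrayList<>(hits);
    }

    // Every string starting with the prefix sorts inside the half-open range
    // [prefix, prefix + Character.MAX_VALUE), so a sorted-set range scan finds them.
    private static List<String> prefixScan(TreeSet<String> index, String prefix) {
        return new ArrayList<>(
            index.subSet(prefix, true, prefix + Character.MAX_VALUE, false));
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}
```

With the terms "heart", "hearth", "sweetheart", and "unhearty" added, search("heart") in this sketch returns "heart", "hearth", and "sweetheart". The term "unhearty" is not returned because "heart" sits in the middle of it, which corresponds to the '*term*' case described above.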

...