Starts With Algorithm Implementation Details

Equivalent to '* term* *'. In other words, the term carries a trailing wildcard (but no leading wildcard) and can appear at any position.

Algorithm:

The Starts With search has the following characteristics:

  • This search is case-insensitive.
  • It searches on both the untokenizedLCPropertyValue field and the literal_propertyValue field.
  • The literal property part of the query is boosted by 50. This gives a literal match priority.
  • A trailing wildcard is placed on the term (but no leading wildcard), and the term can appear at any position.
  • The search text is lowercased and special characters are removed when the query parser parses it.
  • Parsing is done with the following analyzers: 
    • untokenizedLCPropertyValue - No analyzer is applied to the property value. However, the expression is lowercased (this is an explicit step done outside of Lucene by LexEVS code).
    • literal_propertyValue - Uses our custom literal analyzer, which combines Lucene's WhitespaceTokenizer with Lucene's LowerCaseFilter.
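The two analysis paths described above can be approximated in plain Java. This is a minimal sketch, not the LexEVS code: the class and method names are hypothetical, and splitting on the `\s+` regular expression only approximates Lucene's WhitespaceTokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; it only imitates the analysis steps described above.
class StartsWithAnalysisSketch {

    // untokenizedLCPropertyValue: no analyzer runs, the whole expression is
    // simply lowercased as an explicit step.
    static String untokenizedLowerCase(String searchText) {
        return searchText.toLowerCase(Locale.ROOT);
    }

    // literal_propertyValue: split on whitespace, then lowercase each token.
    // Punctuation stays attached to its token, as a whitespace tokenizer
    // would leave it.
    static List<String> literalTokens(String searchText) {
        List<String> tokens = new ArrayList<>();
        for (String piece : searchText.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Car (with special) charaters!";
        System.out.println(untokenizedLowerCase(text)); // car (with special) charaters!
        System.out.println(literalTokens(text));        // [car, (with, special), charaters!]
    }
}
```

Because the literal path splits only on whitespace, tokens such as `(with` and `charaters!` keep their punctuation.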

Example of use:

The following examples are based on the Automobiles coding scheme.

Example 1:

Search string: automob

Lucene query: +untokenizedLCPropertyValue:automob* literal_propertyValue:automob^50.0

Result: 1 result

  • entity code: A0001
  • entity description: Automobile

Example 2:

Search string: Car (with special) charaters!

Lucene query: +untokenizedLCPropertyValue:car (with special) charaters!* ((+literal_propertyValue:car +literal_propertyValue:(with +literal_propertyValue:special) +literal_propertyValue:charaters!)^50.0)

Result: 1 result

  • entity code: C0001
  • entity description: Car
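
The textual form of the two queries above can be reproduced with a short string-building sketch. It is an illustration under assumptions, not the LexEVS implementation: the class and method names are hypothetical, it builds a plain string rather than Lucene Query objects, and it leaves special characters in place because the queries shown above retain them.

```java
import java.util.Locale;

// Hypothetical helper that mimics the printed query strings from the examples.
class StartsWithQuerySketch {
    static final String UNTOKENIZED_FIELD = "untokenizedLCPropertyValue";
    static final String LITERAL_FIELD = "literal_propertyValue";
    static final String BOOST = "^50.0";

    static String buildQueryString(String searchText) {
        String lowered = searchText.toLowerCase(Locale.ROOT);

        // Required clause: the whole lowercased text with a trailing wildcard.
        String prefixClause = "+" + UNTOKENIZED_FIELD + ":" + lowered + "*";

        // Optional boosted clause: one literal term per whitespace-separated token.
        String[] tokens = lowered.trim().split("\\s+");
        String literalClause;
        if (tokens.length == 1) {
            literalClause = LITERAL_FIELD + ":" + tokens[0] + BOOST;
        } else {
            StringBuilder sb = new StringBuilder("((");
            for (int i = 0; i < tokens.length; i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append('+').append(LITERAL_FIELD).append(':').append(tokens[i]);
            }
            literalClause = sb.append(')').append(BOOST).append(')').toString();
        }
        return prefixClause + " " + literalClause;
    }

    public static void main(String[] args) {
        System.out.println(buildQueryString("automob"));
        System.out.println(buildQueryString("Car (with special) charaters!"));
    }
}
```

The wildcard clause is required (`+`), while the boosted literal clause is optional, so an exact literal match only raises the score of a hit that already satisfies the prefix match.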

Associated JUnit tests:

The JUnit tests can be found here: https://github.com/lexevs/lexevs/blob/master/lbTest/src/test/java/org/LexGrid/LexBIG/Impl/function/query/lucene/searchAlgorithms/TestStartsWith.java