Stemmed Lucene Implementation Details

Search with the Lucene query syntax, using stemmed terms. For example, a search for 'trees' will get a hit on 'tree'. This requires an extra indexed field, which is present only if stemming was enabled in the load.

Algorithm:

The Stemmed Lucene search has the following characteristics:

  • This search is case-insensitive.
  • It searches the stemmed property value (the stem_propertyValue field in the examples below).
  • The search string is parsed with our custom stemming analyzer, which applies the following filters:
    • LowerCaseFilter - converts terms to lowercase
    • StopFilter - removes stop words (the, a, etc.) from the search
    • SnowballFilter - reduces terms to their stems
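
The filter chain above can be sketched in a few lines. This is only an illustration of the order of the steps, not the LexEVS analyzer itself: the names `STOP_WORDS`, `toy_stem`, and `analyze` are hypothetical, the stop-word list is an assumed sample, and `toy_stem` is a crude suffix stripper standing in for the Snowball stemmer, so its output will differ from Snowball's for many words.

```python
import re

# Assumed sample list; the real StopFilter word list may differ.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}


def toy_stem(token):
    """Strip one common English suffix. A stand-in for Snowball, not Snowball."""
    for suffix in ("es", "ed", "e", "s"):
        # Keep at least three characters so very short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Apply the three steps in order: lowercase, drop stop words, stem."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)            # split into terms
    tokens = [t.lower() for t in tokens]                   # LowerCaseFilter step
    tokens = [t for t in tokens if t not in STOP_WORDS]    # StopFilter step
    return [toy_stem(t) for t in tokens]                   # stemming step (toy)
```

With this sketch, `analyze("The Automobiles")` and `analyze("Automobiled")` both reduce to the single term `automobil`, which is why differently inflected search strings can reach the same indexed value.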

Example of use:

The following examples are based on the Automobiles coding scheme.

Example 1:

Search string: Automobiles

Lucene query: stem_propertyValue:automobil

Result: 1 result

  • entity code: A0001
  • entity description: Automobile


Example 2:

Search string: Automobiled

Lucene query: stem_propertyValue:automobil

Result: 1 result

  • entity code: A0001
  • entity description: Automobile
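
Both examples produce the same query because the search string and the indexed value are reduced to the same stem. A minimal sketch of that idea follows, assuming a plain dictionary in place of the Lucene index. `toy_stem`, `build_index`, and `search` are hypothetical names, and the suffix stripping is a rough stand-in for the Snowball stemmer; only the entity code, the description, and the `stem_propertyValue` field name come from the examples above.

```python
def toy_stem(term):
    """Lowercase the term and strip one common suffix (not the Snowball algorithm)."""
    term = term.lower()
    for suffix in ("es", "ed", "e", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def build_index(entities):
    """Map each stemmed word to the codes of the entities whose description contains it."""
    index = {}
    for code, description in entities.items():
        for word in description.split():
            index.setdefault(toy_stem(word), set()).add(code)
    return index


def search(index, search_string):
    """Return the query text and the matching entity codes for a one-word search."""
    stem = toy_stem(search_string)
    return f"stem_propertyValue:{stem}", sorted(index.get(stem, set()))


# Entity taken from the examples above.
index = build_index({"A0001": "Automobile"})
```

Here `search(index, "Automobiles")` and `search(index, "Automobiled")` both return the query `stem_propertyValue:automobil` and the single code `A0001`.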

Associated JUnit tests:

The JUnit tests can be found here: https://github.com/lexevs/lexevs/blob/master/lbTest/src/test/java/org/LexGrid/LexBIG/Impl/function/query/lucene/searchAlgorithms/TestStemming.java