Weighted Double Metaphone Implementation Details

Searches with the Lucene query syntax, using a 'sounds like' (double metaphone) algorithm.  The exact user-entered text is also taken into account, so matches with the correct spelling rank above matches that only sound alike.  This search uses the same indexed property value as the other double metaphone search.

Algorithm:

The Weighted Double Metaphone search has the following characteristics:

  • This search is case-insensitive.
  • It searches on both the double metaphone property value (dm_propertyValue) and the literal property value (propertyValue).
  • Preference is given to the matches with the correct spelling.
  • Parsing is done with the following analyzers:

    • dm_propertyValue - Uses our custom double metaphone analyzer.  This has the following filters:

      • LowerCaseFilter - for setting to lowercase
      • StopFilter - to remove stop words (the, a, etc.) from the search
      • DoubleMetaphoneFilter - for encoding each term as its double metaphone ('sounds like') code
    • propertyValue - Uses our custom standard analyzer that has no stop words.
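
The filter order listed for dm_propertyValue can be illustrated with a minimal sketch in plain Java. This is not the LexEVS analyzer and does not use Lucene. The class name AnalyzerChainSketch, the two-word stop list, and the lookup-table encoder are all assumptions made for illustration. The encoder only knows the two codes shown in the examples on this page and stands in for a real double metaphone implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class AnalyzerChainSketch {
    // Stop words named in the text above; the real analyzer's list is likely longer.
    static final Set<String> STOP_WORDS = Set.of("the", "a");

    // Stand-in encoder (assumption): looks up the two codes shown in the examples
    // on this page instead of running a real double metaphone implementation.
    static final Map<String, String> EXAMPLE_CODES = Map.of("car", "KR", "kar", "KR");

    static String exampleEncoder(String token) {
        return EXAMPLE_CODES.getOrDefault(token, "?");
    }

    // Applies the three steps in the order the filters are listed.
    static List<String> analyze(String text, Function<String, String> encoder) {
        List<String> codes = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            String lower = token.toLowerCase(Locale.ROOT); // LowerCaseFilter step
            if (STOP_WORDS.contains(lower)) continue;      // StopFilter step
            codes.add(encoder.apply(lower));               // DoubleMetaphoneFilter step
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Car", AnalyzerChainSketch::exampleEncoder)); // prints [KR]
        System.out.println(analyze("a KAR", AnalyzerChainSketch::exampleEncoder));   // prints [KR]
    }
}
```

Both inputs reduce to the same code, which is why differently spelled terms can match on dm_propertyValue. The propertyValue field skips the stop word and phonetic steps, so it keeps the literal spelling.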

     

Example of use:

The following examples are based on the Automobiles coding scheme.

Example 1:

Search string: car

Lucene query: +dm_propertyValue:KR propertyValue:car

Result: 2 results

  • Result 1:
    • entity code: C0001
    • entity description: Car
  • Result 2:
    • entity code: C0002
    • entity description: Kar
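
The query above has two clauses. In the Lucene query syntax a leading + marks a clause as required, and under the default OR operator a clause without a prefix is optional and only affects scoring. A minimal sketch of how such a query string could be assembled follows. The class and method names are hypothetical, the phonetic code is passed in rather than computed, and lowercasing the literal text is an assumption based on the search being case-insensitive. Escaping of Lucene special characters is left out.

```java
import java.util.Locale;

final class WeightedQuerySketch {
    // Builds the two-clause query string shown in the examples.
    // phoneticCode is assumed to come from a double metaphone encoder.
    static String buildQuery(String searchText, String phoneticCode) {
        // Required clause: the entity must match on the phonetic code.
        String required = "+dm_propertyValue:" + phoneticCode;
        // Optional clause: an exact-spelling match only raises the score.
        String optional = "propertyValue:" + searchText.toLowerCase(Locale.ROOT);
        return required + " " + optional;
    }

    public static void main(String[] args) {
        // prints +dm_propertyValue:KR propertyValue:car
        System.out.println(buildQuery("car", "KR"));
    }
}
```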


Example 2:

Search string: kar

Lucene query: +dm_propertyValue:KR propertyValue:kar

Result: 2 results

  • Result 1:
    • entity code: C0002
    • entity description: Kar
  • Result 2:
    • entity code: C0001
    • entity description: Car
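
The two examples return the same entities in a different order. The sketch below illustrates why, using plain Java instead of Lucene. It is a simplified model, not the LexEVS implementation: real Lucene scores are computed by its similarity function, while here a match on the phonetic code is worth one point and an exact spelling match adds one more. The class names and the third index entry (X0001) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class WeightedRankingSketch {
    static final class Entity {
        final String code;
        final String description;
        final String phoneticCode; // assumed to be precomputed at index time

        Entity(String code, String description, String phoneticCode) {
            this.code = code;
            this.description = description;
            this.phoneticCode = phoneticCode;
        }
    }

    // Simplified score: 1 for the required phonetic match, +1 for exact spelling.
    static int score(Entity e, String searchLower) {
        int s = 1;
        if (e.description.toLowerCase(Locale.ROOT).equals(searchLower)) s += 1;
        return s;
    }

    static List<Entity> search(List<Entity> index, String text, String phoneticCode) {
        String searchLower = text.toLowerCase(Locale.ROOT);
        List<Entity> hits = new ArrayList<>();
        for (Entity e : index) {
            // Required clause: drop entities whose phonetic code differs.
            if (e.phoneticCode.equals(phoneticCode)) hits.add(e);
        }
        // Optional clause: exact spelling sorts first (highest score first).
        hits.sort(Comparator.comparingInt((Entity e) -> score(e, searchLower)).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<Entity> index = List.of(
            new Entity("C0001", "Car", "KR"),
            new Entity("C0002", "Kar", "KR"),
            new Entity("X0001", "Truck", "TRK")); // hypothetical entry, not from the examples
        for (Entity e : search(index, "kar", "KR")) {
            System.out.println(e.code + " " + e.description);
        }
    }
}
```

With the search string kar, the sketch lists C0002 before C0001, and with car the order is reversed, matching the results in Example 1 and Example 2. The hypothetical X0001 entry is excluded in both cases because it fails the required phonetic clause.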

Associated JUnits:

The JUnit tests can be found here: https://github.com/lexevs/lexevs/blob/master/lbTest/src/test/java/org/LexGrid/LexBIG/Impl/function/query/lucene/searchAlgorithms/TestWeightedDoubleMetaphone.java

 
