NIH | National Cancer Institute | NCI Wiki  


Weighted Double Metaphone Implementation Details

Searches using the Lucene query syntax with a 'sounds like' (phonetic) algorithm. The exact text the user enters is also taken into account, so matches with the correct spelling are ranked above matches found only by the 'sounds like' algorithm. This search uses the same indexed property value as the other double metaphone search.

Algorithm:

The Weighted Double Metaphone search has the following characteristics:

  • This search is case-insensitive.
  • It searches both the double metaphone encoding of the property value (dm_propertyValue) and the property value itself (propertyValue).
  • Preference is given to matches with the correct spelling: the dm_propertyValue clause is required (+), while the propertyValue clause is optional and only raises the score of exact-spelling matches.
  • Parsing is done with the following analyzers:

    • dm_propertyValue - Uses our custom double metaphone analyzer, which applies the following filters:

      • LowerCaseFilter - converts tokens to lowercase
      • StopFilter - removes stop words (the, a, etc.) from the search
      • DoubleMetaphoneFilter - generates the double metaphone code(s) for each token
    • propertyValue - Uses our custom standard analyzer, configured with no stop words.
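The two analyzer chains listed above can be sketched in plain Java to show what each field ends up indexing. This is a minimal sketch, not the LexEVS or Lucene analyzer code: it assumes simple whitespace tokenization, a small hypothetical stop-word list, and a phonetic encoder passed in as a function. The only codes the encoder table knows are car/kar → KR, taken from the example queries on this page; it is not a double metaphone implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/**
 * Plain-Java sketch of the dm_propertyValue and propertyValue analysis chains.
 * Not Lucene code: tokenization is a whitespace split, and the phonetic
 * encoder is supplied by the caller instead of being implemented here.
 */
final class AnalyzerChainSketch {

    // Hypothetical stop-word list; the page only names "the" and "a".
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an");

    // Hypothetical encoder table using the codes from the example queries.
    static final Map<String, String> EXAMPLE_CODES = Map.of("car", "KR", "kar", "KR");

    /** dm_propertyValue: lowercase, drop stop words, then phonetically encode. */
    static List<String> analyzeDm(String text, UnaryOperator<String> encoder) {
        List<String> out = new ArrayList<>();
        for (String token : analyzeStandard(text)) {   // tokenize + lowercase
            if (!STOP_WORDS.contains(token)) {         // StopFilter step
                out.add(encoder.apply(token));         // double metaphone step
            }
        }
        return out;
    }

    /** propertyValue: lowercase only, no stop-word removal. */
    static List<String> analyzeStandard(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Falls back to the word itself when it is not in the example table.
        UnaryOperator<String> encoder = w -> EXAMPLE_CODES.getOrDefault(w, w);
        System.out.println(analyzeDm("The Car", encoder));  // [KR]
        System.out.println(analyzeStandard("The Car"));     // [the, car]
    }
}
```

The stop word "the" disappears from the phonetic field but survives in propertyValue, which is why the standard analyzer's empty stop-word list matters for exact-text matching.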


Example of use:

The following examples are based on the Automobiles coding scheme.

Example 1:

Search string: car

Lucene query: +dm_propertyValue:KR propertyValue:car

Result: 2 results

  • Result 1:
    • entity code: C0001
    • entity description: Car
  • Result 2:
    • entity code: C0002
    • entity description: Kar
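Example 1's Lucene query pairs a required (+) phonetic clause with an optional exact-text clause. The following is a minimal sketch of building that query string for a single-term search. It assumes the search text is trimmed and lowercased, and it uses a hypothetical lookup table holding only the KR codes shown in the examples in place of a real double metaphone encoder.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Sketch of turning a single-term search into the query string shown in
 * the examples: "+dm_propertyValue:<code> propertyValue:<term>".
 */
final class WeightedDmQuerySketch {

    // Hypothetical encoder table using the codes from the example queries.
    static final Map<String, String> EXAMPLE_CODES = Map.of("car", "KR", "kar", "KR");

    static String buildQuery(String searchText, UnaryOperator<String> encoder) {
        String term = searchText.trim().toLowerCase(Locale.ROOT);
        // "+" makes the phonetic clause required in Lucene query syntax; the
        // unprefixed propertyValue clause is optional and only adds score to
        // documents that also match the exact text.
        return "+dm_propertyValue:" + encoder.apply(term) + " propertyValue:" + term;
    }

    public static void main(String[] args) {
        UnaryOperator<String> encoder = EXAMPLE_CODES::get;
        System.out.println(buildQuery("car", encoder));
        System.out.println(buildQuery("Kar", encoder));
    }
}
```

Because only the phonetic clause is required, "Kar" still matches a search for "car"; the exact-text clause changes the ranking, not the result set.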


Example 2:

Search string: kar

Lucene query: +dm_propertyValue:KR propertyValue:kar

Result: 2 results

  • Result 1:
    • entity code: C0002
    • entity description: Kar
  • Result 2:
    • entity code: C0001
    • entity description: Car
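The two examples return the same two entities in opposite order. The following toy model reproduces that ordering. It is an illustration, not Lucene's relevance scoring: it filters on equal phonetic codes (the required clause) and then sorts exact, case-insensitive spellings first using a 0/1 key (standing in for the optional clause). The encoder is again a hypothetical table built from the KR codes in the example queries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Toy model of the ranking in Examples 1 and 2 (not Lucene scoring):
 * a phonetic match is required, and an exact spelling sorts first.
 */
final class WeightedDmRankingSketch {

    record Entity(String code, String description) {}

    // Hypothetical encoder table using the codes from the example queries.
    static final Map<String, String> EXAMPLE_CODES = Map.of("car", "KR", "kar", "KR");

    static String encode(String word) {
        return EXAMPLE_CODES.getOrDefault(word.toLowerCase(Locale.ROOT), "");
    }

    static List<Entity> search(String searchText, List<Entity> entities) {
        String term = searchText.trim().toLowerCase(Locale.ROOT);
        String termCode = encode(term);
        List<Entity> hits = new ArrayList<>();
        for (Entity e : entities) {
            // Required clause: the phonetic codes must be known and equal.
            if (!termCode.isEmpty() && termCode.equals(encode(e.description()))) {
                hits.add(e);
            }
        }
        // Optional clause: exact (case-insensitive) spellings sort first;
        // List.sort is stable, so the remaining hits keep their input order.
        hits.sort(Comparator.comparingInt(
                (Entity e) -> e.description().equalsIgnoreCase(term) ? 0 : 1));
        return hits;
    }

    public static void main(String[] args) {
        List<Entity> automobiles = List.of(
                new Entity("C0001", "Car"),
                new Entity("C0002", "Kar"));
        System.out.println(search("car", automobiles));
        System.out.println(search("kar", automobiles));
    }
}
```

Searching "car" puts C0001 first and searching "kar" puts C0002 first, matching Examples 1 and 2; in Lucene the same effect comes from the optional clause adding to the score rather than from an explicit sort.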

Associated JUnits:

The JUnit tests for this search algorithm can be found here: https://github.com/lexevs/lexevs/blob/master/lbTest/src/test/java/org/LexGrid/LexBIG/Impl/function/query/lucene/searchAlgorithms/TestWeightedDoubleMetaphone.java