NIH | National Cancer Institute | NCI Wiki  


Non Leading Wild Card Literal Substring Implementation Details

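This search is based on "*some sub-string here*" and functions much like the Java String.indexOf method. However, single-term searches match '*term' (the term at the end of a word) and 'term*' (the term at the start of a word), but not '*term*' (the term in the middle of a word). Leading wildcards are very inefficient, so suffix matches are handled through a reverse field instead. Special characters are included in the match.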
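The prefix/suffix behavior can be approximated in memory with the sketch below. It is not the LexEVS code and does not use Lucene. It assumes that the literal field holds whitespace-separated, lower-cased tokens, as produced by the analyzer described under Algorithm. It also assumes that the reverse field holds each of those tokens reversed. The class and method names are hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical in-memory sketch of the non-leading wildcard semantics.
 * Assumption: the reverse field stores each whitespace token reversed.
 */
class NonLeadingWildcardSketch {

    // Mimics WhitespaceTokenizer + LowerCaseFilter: split on whitespace, lower-case each token.
    static String[] tokenize(String value) {
        return value.trim().toLowerCase(Locale.ROOT).split("\\s+");
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // True if the term is a prefix ('term*') or a suffix ('*term') of some token.
    static boolean matches(String propertyValue, String term) {
        String t = term.toLowerCase(Locale.ROOT);
        String reversedTerm = reverse(t);
        for (String token : tokenize(propertyValue)) {
            // Literal field with trailing wildcard: term*
            if (token.startsWith(t)) {
                return true;
            }
            // Reverse field with trailing wildcard: equivalent to *term on the original token
            if (reverse(token).startsWith(reversedTerm)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String description = "A concept for testing Graph Building on Concepts with no relations";
        System.out.println(matches(description, "grap")); // true: prefix of "graph"
        System.out.println(matches(description, "raph")); // true: suffix of "graph"
        System.out.println(matches(description, "rap"));  // false: only in the middle of "graph"
    }
}
```

Scanning the reversed tokens with a trailing wildcard gives the same result as a leading wildcard on the original tokens. A prefix scan is cheap in an index, which is why this design avoids the inefficient leading wildcard.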

Algorithm:

The Non Leading Wild Card Literal Substring search has the following characteristics:

  • This search is case-insensitive.
  • It searches on the literal property value and the literal reverse property value.
  • A trailing wildcard is applied to both the literal property value and the literal reverse property value.
  • The literal property part of the query (the term without wildcards) is boosted by 50. This gives literal matches priority in the ranking.
  • Parsing is done with the following analyzer:

    • literal_propertyValue - Uses our custom literal analyzer.  This literal analyzer uses Lucene's WhitespaceTokenizer with Lucene's LowerCaseFilter.
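The query strings shown in the examples below can be reproduced from a single search term. The string builder here is a minimal sketch of that construction, not the LexEVS implementation, which may build Lucene query objects directly. The class, constant, and method names are hypothetical. It also does not escape Lucene query-syntax characters.

```java
import java.util.Locale;

/**
 * Hypothetical helper that renders the Lucene query string for one search term.
 * Field names and the boost value are taken from the examples on this page.
 */
class NonLeadingWildcardQuerySketch {

    static final String LITERAL_FIELD = "literal_propertyValue";
    static final String REVERSE_FIELD = "literal_reverse_propertyValue";
    static final float LITERAL_BOOST = 50.0f;

    static String buildQuery(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        String reversed = new StringBuilder(t).reverse().toString();
        // Required clause: trailing wildcard on the literal field OR on the reverse field.
        String required = "+(" + LITERAL_FIELD + ":" + t + "* "
                + REVERSE_FIELD + ":" + reversed + "*)";
        // Optional clause: the bare term, boosted so literal matches rank higher.
        String boosted = LITERAL_FIELD + ":" + t + "^" + LITERAL_BOOST;
        return required + " " + boosted;
    }

    public static void main(String[] args) {
        // +(literal_propertyValue:grap* literal_reverse_propertyValue:parg*) literal_propertyValue:grap^50.0
        System.out.println(buildQuery("grap"));
        // +(literal_propertyValue:rap* literal_reverse_propertyValue:par*) literal_propertyValue:rap^50.0
        System.out.println(buildQuery("rap"));
    }
}
```

The boosted clause has no "+" prefix, so it is optional. It affects only the score of documents that already satisfy the required wildcard clause.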

     

Example of use:

The following examples are based on the Automobiles coding scheme.

Example 1:

Search string: grap

Lucene query: +(literal_propertyValue:grap* literal_reverse_propertyValue:parg*) literal_propertyValue:grap^50.0

Result: 1 result

  • entity code: NoRelationsConcept
  • entity description: A concept for testing Graph Building on Concepts with no relations

Example 2:

Search string: rap

Lucene query: +(literal_propertyValue:rap* literal_reverse_propertyValue:par*) literal_propertyValue:rap^50.0

Result: 0 results

Although "Graph" contains "rap", the substring occurs only in the middle of the word, which this search does not match.

Associated JUnits:

The JUnit tests can be found here: https://github.com/lexevs/lexevs/blob/master/lbTest/src/test/java/org/LexGrid/LexBIG/Impl/function/query/lucene/searchAlgorithms/TestSubStringNonLeadingWildcardLiteralSubString.java
