Substring Algorithm Implementation Details
Search based on a \"*some sub-string here*\". Functions much like the Java String.indexOf method. This requires two indexed fields to manage this without significant overhead. One field is the tokenized property value which causes no extra indexing, the other is reversed which requires an extra indexed field.
Algorithm:
The contains search has the following characteristics:
- This search is case in-sensitive.
- Only searches on the property value and literal property value.
- The literal property part of the query is boosted by 50. This gives a literal match priority.
- Performs a wildcardQuery
- Lowercase and special characters removed during query parser parse.
- Parsing is done with Lucene's StandardAnalyzer.
Example of use:
The following examples are based on the Automobiles coding scheme.
Example 1:
Search string: graph
Lucene query: +propertyValue:*graph* literal_propertyValue:graph^50.0
Complete query:
- +*:* +(entityType:concept)
- +*:* +isAnonymous:F
- +*:* +(+propertyValue:*graph* literal_propertyValue:graph^50.0) +(propertyType:presentation)
Result: 1 result
- entity code: NoRelationsConcept
- entity description: A concept for testing Graph Building on Concepts with no relations
Example 2:
Search string: graph building on
Lucene query: +spanNear([mask(spanWildcardQuery(reverse_propertyValue:hparg*)) as propertyValue, mask(propertyValue:building) as propertyValue, mask(spanWildcardQuery(propertyValue:on*)) as propertyValue], 0, true) ((+literal_propertyValue:graph +literal_propertyValue:building +literal_propertyValue:on)^50.0)
Complete query:
- +*:* +(entityType:concept)
- +*:* +isAnonymous:F
- +*:* +(+spanNear([mask(spanWildcardQuery(reverse_propertyValue:hparg*)) as propertyValue, mask(propertyValue:building) as propertyValue
- mask(spanWildcardQuery(propertyValue:on*)) as propertyValue], 0, true) ((+literal_propertyValue:graph +literal_propertyValue:building +literal_propertyValue:on)^50.0)) +(propertyType:presentation)
Result: 1 result
- entity code: NoRelationsConcept
- entity description: A concept for testing Graph Building on Concepts with no relations
Example 3:
Search string: ncept for testing graph
Lucene query: +spanNear([mask(spanWildcardQuery(reverse_propertyValue:tpecn*)) as propertyValue, mask(propertyValue:for) as propertyValue, mask(propertyValue:testing) as propertyValue, mask(spanWildcardQuery(propertyValue:graph*)) as propertyValue], 0, true) ((+literal_propertyValue:ncept +literal_propertyValue:for +literal_propertyValue:testing +literal_propertyValue:graph)^50.0)
Complete query:
- +*:* +(entityType:concept) +*:* +isAnonymous:F
- +*:* +(+spanNear([mask(spanWildcardQuery(reverse_propertyValue:tpecn*)) as propertyValue
- mask(propertyValue:for) as propertyValue
- mask(propertyValue:testing) as propertyValue, mask(spanWildcardQuery(propertyValue:graph*)) as propertyValue], 0, true) ((+literal_propertyValue:ncept +literal_propertyValue:for +literal_propertyValue:testing +literal_propertyValue:graph)^50.0)) +(propertyType:presentation)
Result: 1 result
- entity code: NoRelationsConcept
- entity description: A concept for testing Graph Building on Concepts with no relations
Associated JUnits:
Junits for contains tests can be found here: https://github.com/lexevs/lexevs/blob/master/lbTest/src/test/java/org/LexGrid/LexBIG/Impl/function/query/lucene/searchAlgorithms/TestSubString.java