...
After the Lucene search completes, the system stores only the ids of the documents that match the search criteria. When information from a document is actually needed, it is retrieved from the index using the stored id. This is helpful in iterator-type scenarios, where documents can be retrieved one at a time.
...
Background - Lucene Documents
Lucene stores information in documents, and each document has fields that hold the information. Each document has a unique id. For example, a set of people may be indexed in Lucene as:
<source>
Document: id 1
First Name: John
Last Name: Doe
Sex: Male
Age: 45
Document: id 2
First Name: Jane
Last Name: Doe
Sex: Female
Age: 40
... etc.
</source>
LexEVS stores information about entities in this way. Property names and values, as well as qualifiers, language, and various other information about the entity are held in Lucene indexes.
...
Background - Querying Lucene
Lucene provides a query mechanism to search through the indexed documents. Given a search query, Lucene will provide the document id and the score of the match. (Lucene assigns every match a score, depending on the strength of the match given the query.)
...
Lazy retrieval can be leveraged to increase performance in LexEVS. Consider this simplified LexEVS entity index:
<source>
Document: id 1
Code: C12345
Name: Heart
Document: id 2
Code: C67890
Name: Foot
Document: id 3
Code: C98765
Name: Heart Attack
</source>
If a user constructs a query (Name = Heart*), the query will return the matching Document ids (1 and 3). Previously, LexEVS would immediately retrieve the Code and Name fields from the matches and use them to construct the results ultimately returned to the user. This does not scale well, especially for general queries in large ontologies. In a large ontology, a query of (Name = Heart*) may match tens of thousands of documents, and retrieving the information from all of these documents is a significant performance concern.
Instead of retrieving the information up front, LexEVS will simply store the document id for later use. When this information is actually needed by the user (for example, the information needs to be displayed), it is retrieved on demand.
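The following is a minimal sketch of the lazy retrieval idea, not the LexEVS implementation. It assumes a hypothetical in-memory map in place of a real Lucene index, and every name in it (`LazyRetrievalSketch`, `searchByNamePrefix`, `load`, `lazyResults`) is illustrative only. The search step returns ids, and a document's fields are loaded only when the iterator is advanced.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LazyRetrievalSketch {

    // Hypothetical stand-in for a Lucene index: document id -> stored fields.
    static final Map<Integer, Map<String, String>> INDEX = new LinkedHashMap<>();

    // Counts how many documents were actually loaded, to make the laziness visible.
    static int loads = 0;

    static {
        INDEX.put(1, Map.of("Code", "C12345", "Name", "Heart"));
        INDEX.put(2, Map.of("Code", "C67890", "Name", "Foot"));
        INDEX.put(3, Map.of("Code", "C98765", "Name", "Heart Attack"));
    }

    // Search step: return only the ids of documents whose Name starts with the prefix.
    // (A real index would answer this without reading the stored fields.)
    static List<Integer> searchByNamePrefix(String prefix) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, Map<String, String>> entry : INDEX.entrySet()) {
            if (entry.getValue().get("Name").startsWith(prefix)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    // Retrieval step: load the fields of one document, only when asked.
    static Map<String, String> load(int docId) {
        loads++;
        return INDEX.get(docId);
    }

    // Iterator that holds only document ids and loads each document on next().
    static Iterator<Map<String, String>> lazyResults(List<Integer> ids) {
        final Iterator<Integer> idIterator = ids.iterator();
        return new Iterator<Map<String, String>>() {
            @Override
            public boolean hasNext() {
                return idIterator.hasNext();
            }

            @Override
            public Map<String, String> next() {
                return load(idIterator.next());
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> ids = searchByNamePrefix("Heart");
        System.out.println("Matching ids: " + ids + ", documents loaded: " + loads);

        Iterator<Map<String, String>> results = lazyResults(ids);
        String firstName = results.next().get("Name");
        System.out.println("First result: " + firstName + ", documents loaded: " + loads);
    }
}
```

With this shape, a caller that displays only the first page of results pays the retrieval cost only for the documents on that page.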
Searching
The org.LexGrid.LexBIG.Extensions.Extendable.Search
...
Interface
This interface enables the user to plug in custom search algorithms. Users can construct any type of query from the given search text. The query can include wildcards, group search terms, and so on.
...
This algorithm does not automatically assume that the user has spelled the terms incorrectly. Searches are based on the actual text that the user has input, along with its Metaphone value. Again, if the user inputs "Breast", the query will still match "Breast" and "Prostrate", but "Breast" will have a higher match score, because the actual user text is also considered. This adds greater precision to this fuzzy-type query.
Algorithm:
<source>
1: get user text input
2: total score = 0
3: metaphone score = 0
4: actual score = 0
5: metaphone value = lucene.computeMetaphoneValue(user text input)
6: metaphone score = lucene.scoreMetaphoneValue(metaphone value)
7: actual score = lucene.score(user text input)
8: total score = metaphone score + actual score
9: halt
</source>
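The scoring steps above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the LexEVS or Lucene code: `phoneticKey` is a crude hypothetical stand-in and is not a real Metaphone implementation, the scores are simple 0/1 values rather than Lucene relevance scores, and the misspelling "Brest" is an example chosen for this sketch. The point it shows is that a candidate matching both phonetically and literally outranks one that matches only phonetically.

```java
import java.util.Locale;

public class CombinedScoreSketch {

    // Crude phonetic key, NOT real Metaphone: keeps the first letter,
    // drops later vowels, and collapses repeated letters.
    static String phoneticKey(String text) {
        String letters = text.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < letters.length(); i++) {
            char c = letters.charAt(i);
            boolean vowel = "AEIOU".indexOf(c) >= 0;
            boolean repeat = key.length() > 0 && key.charAt(key.length() - 1) == c;
            if ((i == 0 || !vowel) && !repeat) {
                key.append(c);
            }
        }
        return key.toString();
    }

    // 1.0 when the phonetic keys agree, otherwise 0.0.
    static double phoneticScore(String query, String candidate) {
        return phoneticKey(query).equals(phoneticKey(candidate)) ? 1.0 : 0.0;
    }

    // 1.0 when the actual text agrees (ignoring case), otherwise 0.0.
    static double literalScore(String query, String candidate) {
        return query.equalsIgnoreCase(candidate) ? 1.0 : 0.0;
    }

    // Total score = phonetic score + actual-text score.
    static double totalScore(String query, String candidate) {
        return phoneticScore(query, candidate) + literalScore(query, candidate);
    }

    public static void main(String[] args) {
        String query = "Breast";
        for (String candidate : new String[] {"Breast", "Brest", "Foot"}) {
            System.out.println(candidate + " -> " + totalScore(query, candidate));
        }
    }
}
```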
Case-insensitive
...
Substring
The SubStringSearch algorithm is intended to find substrings within a large string. For example:
'with a heart attack'
...will match:
'The patient with a heart attack was seen today.'
Also, a leading and trailing wildcard will be added, so
'th a heart atta'
...will also match:
'The patient with a heart attack was seen today.'
Algorithm:
<source>
1: get user text input
2: user text input = '*' + user text input + '*'
3: score = lucene.score(user text input)
4: halt
</source>
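As a rough illustration of the wildcard wrapping above, the sketch below adds a leading and trailing `*` to the user text and matches it against a string. It assumes a regular-expression translation in place of Lucene's wildcard query handling and returns a boolean instead of a score. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class SubstringSearchSketch {

    // Step 2 of the algorithm: wrap the user text in leading and trailing wildcards.
    static String addWildcards(String userText) {
        return "*" + userText + "*";
    }

    // Translate a '*' wildcard pattern into a regular expression. The text
    // between wildcards is quoted so that it is matched literally.
    static Pattern toRegex(String wildcardPattern) {
        String[] parts = wildcardPattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True when the wildcard-wrapped user text matches the whole indexed text.
    static boolean matches(String userText, String indexedText) {
        return toRegex(addWildcards(userText)).matcher(indexedText).matches();
    }

    public static void main(String[] args) {
        String indexed = "The patient with a heart attack was seen today.";
        System.out.println(matches("with a heart attack", indexed));
        System.out.println(matches("th a heart atta", indexed));
        System.out.println(matches("heart failure", indexed));
    }
}
```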
Sorting
The org.LexGrid.LexBIG.Extensions.Extendable.Sort
...
Interface
This interface allows users to plug in customized Sort algorithms to sort query results:
Class: | org.LexGrid.LexBIG.Extensions.Extendable.Sort |
Method: | public <T> Comparator<T> getComparatorForSearchClass(Class<T> searchClass) throws LBParameterException |
Description: | Given a Class that this Sort is valid for, return the correct Comparator to compare the results and sort. |
Method: | public boolean isSortValidForClass(Class<?> clazz) |
Description: | Return whether or not this Sort is valid for Sorting on a given Class. |
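To show how the two methods fit together, here is a small self-contained sketch. It does not use the LexEVS classes: `Sort` and `LBParameterException` below are local stand-ins that mirror only the two method signatures listed above, and `CodeResult`, `NameSort`, and `sortedNames` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortSketch {

    // Local stand-in for the LexEVS exception, so the sketch compiles on its own.
    static class LBParameterException extends Exception {
        LBParameterException(String message) {
            super(message);
        }
    }

    // Local stand-in mirroring the two methods described in the table above.
    interface Sort {
        <T> Comparator<T> getComparatorForSearchClass(Class<T> searchClass) throws LBParameterException;

        boolean isSortValidForClass(Class<?> clazz);
    }

    // Hypothetical result type with a code and a name.
    static class CodeResult {
        final String code;
        final String name;

        CodeResult(String code, String name) {
            this.code = code;
            this.name = name;
        }
    }

    // A Sort that orders CodeResult objects alphabetically by name.
    static class NameSort implements Sort {
        @Override
        public boolean isSortValidForClass(Class<?> clazz) {
            return CodeResult.class.isAssignableFrom(clazz);
        }

        @Override
        @SuppressWarnings("unchecked")
        public <T> Comparator<T> getComparatorForSearchClass(Class<T> searchClass) throws LBParameterException {
            if (!isSortValidForClass(searchClass)) {
                throw new LBParameterException("Sort is not valid for " + searchClass.getName());
            }
            Comparator<CodeResult> byName = Comparator.comparing(r -> r.name);
            // Safe here because searchClass was checked to be a CodeResult type.
            return (Comparator<T>) (Comparator<?>) byName;
        }
    }

    // Sorts a copy of the results with NameSort and returns the names in order.
    static List<String> sortedNames(List<CodeResult> results) {
        List<CodeResult> copy = new ArrayList<>(results);
        try {
            copy.sort(new NameSort().getComparatorForSearchClass(CodeResult.class));
        } catch (LBParameterException e) {
            throw new IllegalStateException(e);
        }
        List<String> names = new ArrayList<>();
        for (CodeResult result : copy) {
            names.add(result.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<CodeResult> results = List.of(
                new CodeResult("C98765", "Heart Attack"),
                new CodeResult("C01234", "Heart"));
        System.out.println(sortedNames(results));
    }
}
```

Checking `isSortValidForClass` before handing out a comparator lets a caller discover up front that a given Sort cannot be applied to a result type, instead of failing partway through sorting.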
...
Given two database tables, retrieve the Code, Name, and Qualifier for each Code.
Table Codes
Code | Name |
---|---|
C01234 | Heart |
C98765 | Heart Attack |
Table Qualifiers
Code | Qualifier |
---|---|
C01234 | isAnOrgan |
C98765 | isADisease |
...
SELECT * FROM Codes
</source>
Results in:
Code | Name |
---|---|
C01234 | Heart |
C98765 | Heart Attack |
...
Given two database tables, retrieve the Code, Name, and Qualifier for each Code.
Table Codes
Code | Name |
---|---|
C01234 | Heart |
C98765 | Heart Attack |
Table Qualifiers
Code | Qualifier |
---|---|
C01234 | isAnOrgan |
C98765 | isADisease |
...
SELECT Codes.Code, Name, Qualifier FROM Codes JOIN Qualifiers ON Codes.Code = Qualifiers.Code
</source>
Results in:
Code | Name | Qualifier |
---|---|---|
C01234 | Heart | isAnOrgan |
C98765 | Heart Attack | isADisease |
...