NIH | National Cancer Institute | NCI Wiki  

Error rendering macro 'rw-search'

null

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 9 Next »

 

Contents of This Page

Document Information

Author: Bauer, Scott
Email: Bauer.Scott@mayo.edu
Team: LexEVS
Contract: ST12-1106
Client: NCI CBIIT
National Institutes of Heath
US Department of Health and Human Services

Revision History

Version

Date

Description of Changes

Author

1.0

2013/03/05

Initial Version

Bauer, Scott

 

Overview

Search mechanisms using Lucene can become overburdened as increasing numbers of use cases make for a larger and more complex Lucene index.  When a use case as comprehensive as searching an entire multi-source terminology service for a term or fragment is proposed, then a more restricted manner of approaching the problem is in order.  Furthermore the  use of the power of the LexEVS API becomes overkill, with the necessity of the union of large datasets prior to the effective query,  Subsequently, a more restricted approach is necessary.

A Simplified Search API

The model for Entity in LexEVS/LexGrid is robust and complete enough that it allows extensive metadata and properties to be compounded within  the entity's scope when loaded from an extensively defined source into the LexGrid data model.  However, the lexical, or human readable aspects of this model element are relatively restricted and among those aspects that are human readable few need to be searched in order to return results that fit into the majority of use cases.  Reducing the matching algorithms available to a text match in these circumstances can also allow relatively good performance over a larger data set in Lucene.   The proposition here is to use a "contains" match algorithm only.

A Higher Performance Lucene

Lucene indexes can be created to perform searches in a wide variety of text match algorithms, natural language processing paradigms, customized normalization methods, regular expression implementations and more.  The resulting indexes can can be large and complex enough to slow result returns on a given Lucene query.  Breaking out a restricted Lucene index using the simplest of these could provide needed speed in result returns. 

Multi-Terminology Searches

No matter what the implementation searching sources that have millions of records is challenging.  Searching several sources of size can be even more so.  Combining a restricted search API with a well tuned Lucene index, should make the results getting performance much more efficient.

 

  • No labels