
Document Information

Author: Craig Stancl, Scott Bauer, Cory Endle
Team: LexEVS
Contract: S13-500 MOD4
National Institutes of Health
US Department of Health and Human Services

LexEVS Current HitCollector Implementation

The current hit collector implementation is tied to coding-scheme-specific boundary doc ids, which are going away with the BlockQuery implementation. The common set of indexes is currently expected to replace the simple search extension. If acceptable performance over the new common indexes can be demonstrated, the HitCollector objects supporting simple search will go away as well.


Proposed Collector Implementation using Lucene 5.0

The Collector interface replaces HitCollector in Lucene 5.0. Much of the current HitCollector implementation is tied to the boundary doc implementation and to the need to filter a single index for a given coding scheme, so custom HitCollectors or Collectors may no longer be necessary if a standard TopScoreDocCollector is used. We show a provisional implementation of the Simple Search Extension here, which may be required if the global searches across all Common Indexes cannot be made to run quickly enough. It will require a new Filter/Collector or Facet combination to pick out the portion of the indexes that pertains to a given coding scheme.
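The idea of combining a coding scheme filter with a top-N score collector can be illustrated without any Lucene dependency. The sketch below is only a model of the behavior, not the LexEVS or Lucene implementation: the Hit class, the codingScheme field, and the topHits method are hypothetical names introduced for illustration. It assumes each candidate hit already carries a score and the name of the coding scheme it belongs to, and it keeps the n best-scoring hits for one scheme using a bounded priority queue, which is roughly what a standard top-score collector does once a filter has narrowed the candidates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Lucene-free model of "filter by coding scheme, then collect the top N by score".
// Every name in this class is hypothetical and exists only for illustration.
final class TopHitsSketch {

    /** One scored match: document id, owning coding scheme, relevance score. */
    static final class Hit {
        final int docId;
        final String codingScheme;
        final float score;

        Hit(int docId, String codingScheme, float score) {
            this.docId = docId;
            this.codingScheme = codingScheme;
            this.score = score;
        }
    }

    /**
     * Returns at most n hits belonging to codingScheme, best score first.
     * Ties on score are broken by the lower docId.
     */
    static List<Hit> topHits(Iterable<Hit> candidates, String codingScheme, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }

        // Ranking order of the final result: higher score first, then lower docId.
        Comparator<Hit> bestFirst = (a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
        };

        // The heap keeps the worst retained hit at its head so it can be evicted cheaply.
        PriorityQueue<Hit> heap = new PriorityQueue<>(bestFirst.reversed());

        for (Hit hit : candidates) {
            // Filter step: ignore hits from other coding schemes.
            if (!codingScheme.equals(hit.codingScheme)) {
                continue;
            }
            if (heap.size() < n) {
                heap.add(hit);
            } else if (bestFirst.compare(hit, heap.peek()) < 0) {
                // The new hit outranks the worst retained hit, so replace it.
                heap.poll();
                heap.add(hit);
            }
        }

        List<Hit> result = new ArrayList<>(heap);
        result.sort(bestFirst);
        return result;
    }
}
```

Calling TopHitsSketch.topHits(hits, "SchemeA", 2) on a mixed list returns only the two best-scoring SchemeA hits, however many hits from other schemes score higher. In this model the filter runs per hit inside the loop. Whether the real implementation filters in the query, in a Filter, or in a custom Collector is the open design question described above.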

HitCollector Classes
//These classes appear to be on the main code path and will have to be adapted if we can't replace them
//with TopScoreDocCollector
//These BitSet using classes appear to dead end without being used by the API
//They seem most likely to go away.
//This is not expected to be needed in the multi coding scheme implementation.