NIH | National Cancer Institute | NCI Wiki  

Document Information

Author: Craig Stancl, Scott Bauer, Cory Endle
Email: Stancl.craig@mayo.edu, bauer.scott@mayo.edu, endle.cory@mayo.edu
Team: LexEVS
Contract: S13-500 MOD4
Client: NCI CBIIT
National Institutes of Health
US Department of Health and Human Services


Relational Indexing Implementation in Current LexEVS

All documents for an entity carry a document ID, unique to that entity, that identifies them as "the same" entity. Boundary docs provide additional evidence: they carry a "CodeBoundary" identifier marking the start or the end of the set of documents that represent the entity (concept). This effectively flattens the entity-to-property relationship, because all entity information is added to every property document. Each such document includes up to 34 indexed fields and about 9 stored fields.
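The flattened layout can be illustrated with a small Java model that does not use Lucene. Each document is a plain field map. `flatten` emits a start boundary, one document per property (each repeating the entity code), and an end boundary. `regroup` recovers the entities by walking from each start boundary to its end boundary. The field names (`entityCode`, `codeBoundary`, `propertyValue`), the `START`/`END` marker values, and the class itself are hypothetical stand-ins, not the actual LexEVS schema or code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the current flattened, boundary-delimited index layout (not LexEVS or Lucene code). */
final class BoundaryDocModel {
    // Hypothetical field names and marker values; the real LexEVS schema differs.
    static final String ENTITY_CODE = "entityCode";
    static final String CODE_BOUNDARY = "codeBoundary";
    static final String PROPERTY_VALUE = "propertyValue";

    /** Builds the documents for one entity: start boundary, one doc per property, end boundary. */
    static List<Map<String, String>> flatten(String entityCode, List<String> propertyValues) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(boundary(entityCode, "START"));
        for (String value : propertyValues) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put(ENTITY_CODE, entityCode); // entity information repeated on every property doc
            doc.put(PROPERTY_VALUE, value);
            docs.add(doc);
        }
        docs.add(boundary(entityCode, "END"));
        return docs;
    }

    private static Map<String, String> boundary(String entityCode, String marker) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(ENTITY_CODE, entityCode);
        doc.put(CODE_BOUNDARY, marker);
        return doc;
    }

    /** Walks documents in index order and regroups property values found between START and END boundaries. */
    static Map<String, List<String>> regroup(List<Map<String, String>> docs) {
        Map<String, List<String>> entities = new LinkedHashMap<>();
        List<String> current = null;
        for (Map<String, String> doc : docs) {
            String marker = doc.get(CODE_BOUNDARY);
            if ("START".equals(marker)) {
                current = new ArrayList<>();
            } else if ("END".equals(marker)) {
                entities.put(doc.get(ENTITY_CODE), current);
                current = null;
            } else if (current != null) {
                current.add(doc.get(PROPERTY_VALUE));
            }
        }
        return entities;
    }
}
```

In this model, reading an entity back depends on document order between its boundaries, and the entity fields are copied once per property document.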

 

Proposed Relational Indexing Implementation under Lucene 5.0.0

Lucene has provided limited support for relational expression (block joins) since Lucene 3.4. A single-level parent/child relationship can be maintained as a one-to-many relationship between one parent document and several child documents. This support gives LexEVS an opportunity to improve index search times and to reduce index size, since entity information could be stored once in the parent instead of being repeated on every property document.
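In Lucene's block join, a parent and its children are added together with one `IndexWriter.addDocuments` call, children first and parent last, so each block occupies consecutive document numbers and parents can be recognized through a bit set. The sketch below models only that bookkeeping in plain Java, assuming one entity document as the parent and its property documents as the children. `BlockIndex` and its methods are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical model of a parent/child block layout (not Lucene code). */
final class BlockIndex {
    private final List<String> docs = new ArrayList<>(); // doc number -> label
    private final BitSet parentBits = new BitSet();       // set bit = parent (entity) document

    /** Appends the children first and the parent last; returns the parent's doc number. */
    int addBlock(List<String> children, String parent) {
        docs.addAll(children);
        docs.add(parent);
        int parentDoc = docs.size() - 1;
        parentBits.set(parentDoc);
        return parentDoc;
    }

    /** A child's parent is the first parent bit at or after the child's position. */
    int parentOf(int childDoc) {
        return parentBits.nextSetBit(childDoc);
    }

    /** A parent's children are the docs after the previous parent and before this one. */
    List<Integer> childrenOf(int parentDoc) {
        int firstChild = parentBits.previousSetBit(parentDoc - 1) + 1; // previousSetBit(-1) returns -1
        List<Integer> children = new ArrayList<>();
        for (int doc = firstChild; doc < parentDoc; doc++) {
            children.add(doc);
        }
        return children;
    }

    boolean isParent(int doc) {
        return parentBits.get(doc);
    }

    String label(int doc) {
        return docs.get(doc);
    }
}
```

Here the entity information lives once on the parent, and the block order alone connects each property document to its entity, without any boundary documents.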

Relational Querying in Current LexEVS Implementation

LexEVS first applies Lucene filters to restrict the search to the correct coding scheme in the index and then applies a series of queries.
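A minimal sketch of this two-stage pattern, assuming flattened documents represented as field maps: the filter step keeps only documents from the requested coding scheme URI and version, and the query step keeps documents that match every query in the series. Requiring all queries to match, the field names, and the `SchemeScopedSearch` class are assumptions for illustration; the actual LexEVS filters and the way its queries are combined may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical model of "filter to a coding scheme, then apply a series of queries" (not LexEVS code). */
final class SchemeScopedSearch {
    // Hypothetical field names used only in this illustration.
    static final String SCHEME_URI = "codingSchemeUri";
    static final String SCHEME_VERSION = "codingSchemeVersion";
    static final String PROPERTY_VALUE = "propertyValue";

    /** Convenience factory for one flattened document in this model. */
    static Map<String, String> doc(String uri, String version, String propertyValue) {
        Map<String, String> doc = new HashMap<>();
        doc.put(SCHEME_URI, uri);
        doc.put(SCHEME_VERSION, version);
        doc.put(PROPERTY_VALUE, propertyValue);
        return doc;
    }

    static List<Map<String, String>> search(List<Map<String, String>> docs, String uri, String version,
                                            List<Predicate<Map<String, String>>> queries) {
        return docs.stream()
                // Filter step: restrict to the requested coding scheme and version.
                .filter(d -> uri.equals(d.get(SCHEME_URI)) && version.equals(d.get(SCHEME_VERSION)))
                // Query step: assumed here to require every query in the series to match.
                .filter(d -> queries.stream().allMatch(q -> q.test(d)))
                .collect(Collectors.toList());
    }
}
```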

Relational Querying Using Lucene 5.0.0 BlockJoinQuery

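Regardless of whether we query from the child or from the parent, we can navigate from one to the other: ToParentBlockJoinQuery joins child matches up to their parents, and ToChildBlockJoinQuery joins parent matches down to their children.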
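Both directions can be modeled over the same block layout. The plain-Java sketch below, with hypothetical class and method names, assumes a bit set marking parent documents. Joining child hits up maps each child to the next parent bit (the behavior `ToParentBlockJoinQuery` provides); joining parent hits down selects every document between the previous parent and the matching one (the behavior `ToChildBlockJoinQuery` provides). It illustrates the idea and is not the Lucene implementation.

```java
import java.util.BitSet;

/** Hypothetical, Lucene-free model of joining in both directions over a block layout. */
final class BlockJoinModel {
    /** Child-to-parent: each matching child contributes the parent that closes its block. */
    static BitSet toParents(BitSet childHits, BitSet parentBits) {
        BitSet parents = new BitSet();
        for (int child = childHits.nextSetBit(0); child >= 0; child = childHits.nextSetBit(child + 1)) {
            if (parentBits.get(child)) {
                continue; // parent docs are not valid child hits; ignore them
            }
            int parent = parentBits.nextSetBit(child);
            if (parent >= 0) {
                parents.set(parent);
            }
        }
        return parents;
    }

    /** Parent-to-child: each matching parent contributes every doc between the previous parent and itself. */
    static BitSet toChildren(BitSet parentHits, BitSet parentBits) {
        BitSet children = new BitSet();
        for (int parent = parentHits.nextSetBit(0); parent >= 0; parent = parentHits.nextSetBit(parent + 1)) {
            int firstChild = parentBits.previousSetBit(parent - 1) + 1;
            children.set(firstChild, parent); // toIndex is exclusive, so the parent itself is excluded
        }
        return children;
    }
}
```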

A block join query wraps a query against the child (property) documents and returns the parent document for every matching child. Filters on the parent, such as whether the entity is active, work against the entity metadata stored in the parent document. A collector (the successor to the former HitCollector) verifies the results and provides sorting support.
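The parent-filter and collector steps can be sketched the same way. In this hypothetical plain-Java model, scored child (property) hits are mapped to their parent (entity) document through the parent bit set, and parents that fail a parent filter are dropped; an `activeParents` bit set stands in for the entity's active status. Each remaining parent keeps its best child score, similar in spirit to Lucene's `ScoreMode.Max`, and the parents are returned sorted by that score. `BestScoreParentCollector` is an illustrative name, not the `ToParentBlockJoinCollector` API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a parent-level collector over block-joined child hits (not Lucene code). */
final class BestScoreParentCollector {
    /**
     * @param childScores   score of each matching child document, keyed by doc number
     * @param parentBits    set bit = parent (entity) document
     * @param activeParents parent filter; only parents set here are kept
     * @return parent doc numbers ordered by best child score, highest first (ties by doc number)
     */
    static List<Integer> collect(Map<Integer, Float> childScores, BitSet parentBits, BitSet activeParents) {
        Map<Integer, Float> best = new HashMap<>();
        for (Map.Entry<Integer, Float> hit : childScores.entrySet()) {
            int parent = parentBits.nextSetBit(hit.getKey());
            if (parent < 0 || !activeParents.get(parent)) {
                continue; // orphan child or filtered-out (e.g. inactive) entity
            }
            best.merge(parent, hit.getValue(), Float::max); // keep the best score per entity
        }
        List<Integer> parents = new ArrayList<>(best.keySet());
        parents.sort((a, b) -> {
            int byScore = Float.compare(best.get(b), best.get(a)); // descending score
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return parents;
    }
}
```

Applying the parent filter after the join keeps the entity-level checks on the parent document, so they no longer need to be repeated on every property document.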

 

The Relational Property Querying Mechanism Problem in Lucene

LexEVS Lucene Relational Query Implementation

Boundary Docs Index Related Classes

 

Boundary Docs Related Classes
//This appears to be where much of the filtering on boundary docs takes place. All of these classes will need to be rethought and reimplemented.
org.lexevs.dao.index.lucene.v2010.entity.SingleTemplateDisposableLuceneCommonEntityDao
org.lexevs.dao.index.lucene.AbstractBaseLuceneIndexTemplateDao
org.lexevs.dao.index.access.entity.CommonEntityDao

//Deeply integrated with boundary doc position and scoring.  Will need to be rethought and reimplemented against the new Collector class
//in Lucene.  This will also be moved from the Indexer into the DAO project.
edu.mayo.informatics.indexer.lucene.hitcollector.AbstractBestScoreOfEntityHitCollector<T>
//This companion class does not appear to be called from any code path and may be removable
edu.mayo.informatics.indexer.lucene.hitcollector.BitSetFilteringBestScoreOfEntityHitCollector

//These classes will all need to be revised or replaced to implement
//the BlockJoinQuery indexing support
edu.mayo.informatics.indexer.api.generators.DocumentFromStringsGenerator
org.lexevs.dao.index.indexer.LuceneLoaderCodeIndexer
org.lexevs.dao.index.indexer.LuceneLoaderCode

//Most indexing code paths flow through this class, accessed through a BaseLoader class.
//Alternative paths include two special-case loaders and the revision/authoring API.
org.lexevs.dao.index.indexer.EntityBatchingIndexCreator

//Some query methods have remained the same and should not require updating when moving to 5.0.
//This query building class and method should remain relatively untouched
org.LexGrid.LexBIG.Impl.dataAccess.RestrictionImplementations.getQuery(Restriction, String, String) 

Boundary Docs Query Related Classes

Query Related Classes
//This listing shows the method call hierarchy (innermost call first) when CodedNodeSetImpl calls through
//AbstractMultiSingleLuceneIndexCodedNodeSet; it is only one of several code paths from CodedNodeSetImpl
org.lexevs.dao.index.lucenesupport.BaseLuceneIndexTemplate.search(Query, Filter, HitCollector)
org.lexevs.dao.index.lucene.v2010.entity.SingleTemplateDisposableLuceneCommonEntityDao.query(Query)
org.lexevs.dao.index.service.entity.LuceneEntityIndexService.queryCommonIndex(List<AbsoluteCodingSchemeVersionReference>, Query)
org.LexGrid.LexBIG.Impl.helpers.lazyloading.AbstractLazyCodeHolderFactory.buildCodeHolder(List<AbsoluteCodingSchemeVersionReference>, Query)
org.LexGrid.LexBIG.Impl.codednodeset.UnionSingleLuceneIndexCodedNodeSet.buildCodeHolder()
org.LexGrid.LexBIG.Impl.codednodeset.AbstractMultiSingleLuceneIndexCodedNodeSet.toBruteForceMode(String, String)
org.LexGrid.LexBIG.Impl.CodedNodeSetImpl.getCodeHolder()

//Other code paths reach into the Simple Search Extension. Portions of this path will go away.
org.lexevs.dao.index.lucenesupport.BaseLuceneIndexTemplate.search(Query, Filter, HitCollector)
org.lexevs.dao.index.lucene.v2013.search.LuceneSearchDao.query(Query)
org.lexevs.dao.index.service.search.LuceneSearchIndexService.query(Set<AbsoluteCodingSchemeVersionReference>, Set<AbsoluteCodingSchemeVersionReference>, Query)
org.LexGrid.LexBIG.Impl.Extensions.GenericExtensions.search.SearchExtensionImpl.search(String, Set<CodingSchemeReference>, Set<CodingSchemeReference>, MatchAlgorithm, boolean, boolean)