NIH | National Cancer Institute | NCI Wiki  


Document Information

Author: Craig Stancl, Scott Bauer, Cory Endle
Email: Stancl.craig@mayo.edu, bauer.scott@mayo.edu, endle.cory@mayo.edu
Team: LexEVS
Contract: S13-500 MOD4
Client: NCI CBIIT
National Institutes of Health
US Department of Health and Human Services


Relational Indexing Implementation in Current LexEVS

Every document carries a document ID that identifies it as belonging to "the same" entity. The boundary documents add further evidence: each carries a "CodeBoundary" identifier indicating that it is the start or the end of the set of documents representing that entity. This effectively flattens the entity-to-properties relationship by adding all entity information to each property document, which includes up to 34 indexed fields and about 9 stored fields.
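
The flattened layout described above can be modeled in plain Java. This is only an illustrative sketch, not LexEVS or Lucene code: the class `FlattenedEntityModel`, the field names (`entityCode`, `codingSchemeName`, `codeBoundary`) and the `START`/`END` marker values are all assumptions chosen for the example. It shows how every property document repeats the entity fields, with a boundary document at each end of the run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the flattened boundary-document layout (illustrative only). */
final class FlattenedEntityModel {

    /** Builds the run for one entity: start boundary, one doc per property, end boundary. */
    static List<Map<String, String>> buildDocs(Map<String, String> entityFields,
                                               List<Map<String, String>> properties) {
        String entityCode = entityFields.get("entityCode");
        List<Map<String, String>> docs = new ArrayList<>();
        // Hypothetical boundary marker field and values.
        docs.add(fields("entityCode", entityCode, "codeBoundary", "START"));
        for (Map<String, String> property : properties) {
            // Entity information is copied into every property document.
            Map<String, String> doc = new LinkedHashMap<>(entityFields);
            doc.putAll(property);
            docs.add(doc);
        }
        docs.add(fields("entityCode", entityCode, "codeBoundary", "END"));
        return docs;
    }

    /** Counts how many documents in the run carry the given field. */
    static long copiesOf(List<Map<String, String>> docs, String field) {
        return docs.stream().filter(d -> d.containsKey(field)).count();
    }

    /** Small helper: builds a field map from alternating key/value arguments. */
    static Map<String, String> fields(String... keyValues) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> entity = fields("entityCode", "C12345", "codingSchemeName", "ExampleScheme");
        List<Map<String, String>> props = new ArrayList<>();
        props.add(fields("propertyName", "Preferred_Name", "propertyValue", "Example term"));
        props.add(fields("propertyName", "Synonym", "propertyValue", "Sample term"));

        List<Map<String, String>> docs = buildDocs(entity, props);
        System.out.println("documents in run: " + docs.size());
        System.out.println("copies of codingSchemeName: " + copiesOf(docs, "codingSchemeName"));
    }
}
```

In this model the number of copies of each entity field grows with the number of properties, which is the duplication the flattened design implies.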

 

Proposed Relational Indexing Implementation under Lucene 5.0.0

Lucene has provided limited support for relational expression since Lucene 3.4. A single-level parent/child relationship can be maintained as a one-to-many relationship between one parent document and several child documents. This support gives LexEVS an opportunity to improve index search times and reduce index size.
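
As a rough illustration of the parent/child layout, the sketch below models one block in plain Java. It is not the Lucene API and not the proposed LexEVS implementation: `BlockIndexModel`, the `isParentDoc` marker field and the other field names are assumptions made for the example. The model places the child (property) documents first and the parent (entity) document last, which is the ordering Lucene's block join support expects when a block is indexed together.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a parent/child document block (illustrative only). */
final class BlockIndexModel {

    /** Builds one block: a child document per property, then the parent entity document last. */
    static List<Map<String, String>> buildBlock(Map<String, String> entityFields,
                                                List<Map<String, String>> properties) {
        List<Map<String, String>> block = new ArrayList<>();
        for (Map<String, String> property : properties) {
            block.add(new LinkedHashMap<>(property)); // child holds property fields only
        }
        Map<String, String> parent = new LinkedHashMap<>(entityFields);
        parent.put("isParentDoc", "true"); // hypothetical marker field
        block.add(parent);                 // entity fields are stored once per block
        return block;
    }

    /** Marks which positions in the index hold parent documents. */
    static BitSet parentBits(List<Map<String, String>> index) {
        BitSet bits = new BitSet(index.size());
        for (int i = 0; i < index.size(); i++) {
            if ("true".equals(index.get(i).get("isParentDoc"))) {
                bits.set(i);
            }
        }
        return bits;
    }

    /** Small helper: builds a field map from alternating key/value arguments. */
    static Map<String, String> fields(String... keyValues) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        List<Map<String, String>> props = new ArrayList<>();
        props.add(fields("propertyName", "Preferred_Name", "propertyValue", "Example term"));
        props.add(fields("propertyName", "Synonym", "propertyValue", "Sample term"));

        List<Map<String, String>> block =
                buildBlock(fields("entityCode", "C12345", "codingSchemeName", "ExampleScheme"), props);
        System.out.println("documents in block: " + block.size());
        System.out.println("parent positions: " + parentBits(block));
    }
}
```

Because the entity fields live only on the parent, each property document stays small, which is where the expected reduction in index size would come from.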

Relational Querying in Current LexEVS Implementation

LexEVS first applies Lucene filters to narrow the index to the correct coding scheme, then applies a series of queries.

Relational Querying Using Lucene 5.0.0 BlockJoinQuery

Regardless of whether we query from the child or from the parent, we can reach one from the other.

A wrapper query is constructed around the queries against a parent's child documents, returning all matching documents contained in that parent. Filters on the parent, such as whether it is active or not, work against the entity metadata stored there. A Collector (formerly HitCollector) verifies the results and provides sorting support.
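
The join described above can be sketched with `java.util.BitSet`. This is a simplified model under stated assumptions, not Lucene's implementation: `BlockJoinModel` and its methods are hypothetical names, documents are plain integer positions, and each block is assumed to store its children immediately before its parent. Lucene's join module exposes comparable behavior through classes such as `ToParentBlockJoinQuery` and `ToChildBlockJoinQuery`. The sketch maps matching children to their parents, applies a parent-level filter (for example, active entities), keeps the best child score per parent, and sorts by that score.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a block join over doc positions (illustrative only). */
final class BlockJoinModel {

    /** Child to parent: the parent is the next parent position at or after the child. */
    static int parentOf(BitSet parents, int childDoc) {
        return parents.nextSetBit(childDoc); // -1 if no parent follows
    }

    /** Parent to children: every position between the previous parent and this one. */
    static List<Integer> childrenOf(BitSet parents, int parentDoc) {
        int start = parents.previousSetBit(parentDoc - 1) + 1;
        List<Integer> children = new ArrayList<>();
        for (int doc = start; doc < parentDoc; doc++) {
            children.add(doc);
        }
        return children;
    }

    /**
     * Joins matching children to parents, keeps only parents allowed by the filter,
     * and returns the parents ordered by their best child score, highest first.
     */
    static List<Integer> joinToParents(BitSet parents, BitSet parentFilter,
                                       int[] childDocs, float[] childScores) {
        Map<Integer, Float> bestScore = new HashMap<>();
        for (int i = 0; i < childDocs.length; i++) {
            int parent = parentOf(parents, childDocs[i]);
            if (parent < 0 || !parentFilter.get(parent)) {
                continue; // no enclosing parent, or parent rejected by the filter
            }
            bestScore.merge(parent, childScores[i], Float::max);
        }
        List<Integer> result = new ArrayList<>(bestScore.keySet());
        result.sort((a, b) -> Float.compare(bestScore.get(b), bestScore.get(a)));
        return result;
    }

    public static void main(String[] args) {
        // Two blocks: children 0-2 with parent 3, children 4-5 with parent 6.
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);

        BitSet activeOnly = new BitSet();
        activeOnly.set(3); // assume the entity at position 6 is inactive

        int[] hits = {1, 2, 4};
        float[] scores = {0.4f, 0.9f, 0.7f};

        System.out.println("all parents: " + joinToParents(parents, parents, hits, scores));
        System.out.println("active parents: " + joinToParents(parents, activeOnly, hits, scores));
        System.out.println("children of 6: " + childrenOf(parents, 6));
    }
}
```

Keeping the best child score per parent loosely mirrors the role the existing best-score-of-entity hit collectors play today; how the real Collector should score and sort remains a design decision for the migration.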

 

The Relational Property Querying Mechanism Problem in Lucene

Linked page: LexEVS Lucene Relational Query Implementation

Boundary Docs Index Related Classes

 

// Much of the filtering on boundary docs appears to take place here. All of these classes will need to be rethought and reimplemented.
org.lexevs.dao.index.lucene.v2010.entity.SingleTemplateDisposableLuceneCommonEntityDao
org.lexevs.dao.index.lucene.AbstractBaseLuceneIndexTemplateDao
org.lexevs.dao.index.access.entity.CommonEntityDao

// Deeply integrated with boundary doc position and scoring. Will need to be rethought and reimplemented against the new Collector class
// in Lucene. This will also be moved from the Indexer into the DAO project.
edu.mayo.informatics.indexer.lucene.hitcollector.AbstractBestScoreOfEntityHitCollector<T>
// This companion class does not appear to have a code path that calls it and may be a candidate for removal.
edu.mayo.informatics.indexer.lucene.hitcollector.BitSetFilteringBestScoreOfEntityHitCollector

//These classes will all need to be revised or replaced to implement
//the BlockJoinQuery indexing support
edu.mayo.informatics.indexer.api.generators.DocumentFromStringsGenerator
org.lexevs.dao.index.indexer.LuceneLoaderCodeIndexer
org.lexevs.dao.index.indexer.LuceneLoaderCode

// Most indexing code paths flow through this class, which is accessed through a BaseLoader class.
// Alternative paths include two special-case loaders and the revision/authoring API.
org.lexevs.dao.index.indexer.EntityBatchingIndexCreator

//Some query methods have remained the same and should not require updating when moving to 5.0.
//This query building class and method should remain relatively untouched
org.LexGrid.LexBIG.Impl.dataAccess.RestrictionImplementations.getQuery(Restriction, String, String) 

Boundary Docs Query Related Classes

// This listing shows the method call hierarchy when invoked by CodedNodeSetImpl through
// AbstractMultiSingleLuceneIndexCodedNodeSet. It represents only one of several code paths to CodedNodeSetImpl.
org.lexevs.dao.index.lucenesupport.BaseLuceneIndexTemplate.search(Query, Filter, HitCollector)
org.lexevs.dao.index.lucene.v2010.entity.SingleTemplateDisposableLuceneCommonEntityDao.query(Query)
org.lexevs.dao.index.service.entity.LuceneEntityIndexService.queryCommonIndex(List<AbsoluteCodingSchemeVersionReference>, Query)
org.LexGrid.LexBIG.Impl.helpers.lazyloading.AbstractLazyCodeHolderFactory.buildCodeHolder(List<AbsoluteCodingSchemeVersionReference>, Query)
org.LexGrid.LexBIG.Impl.codednodeset.UnionSingleLuceneIndexCodedNodeSet.buildCodeHolder()
org.LexGrid.LexBIG.Impl.codednodeset.AbstractMultiSingleLuceneIndexCodedNodeSet.toBruteForceMode(String, String)
org.LexGrid.LexBIG.Impl.CodedNodeSetImpl.getCodeHolder()

// Other code paths reach into the Simple Search Extension. Portions of this will go away.
org.lexevs.dao.index.lucenesupport.BaseLuceneIndexTemplate.search(Query, Filter, HitCollector)
org.lexevs.dao.index.lucene.v2013.search.LuceneSearchDao.query(Query)
org.lexevs.dao.index.service.search.LuceneSearchIndexService.query(Set<AbsoluteCodingSchemeVersionReference>, Set<AbsoluteCodingSchemeVersionReference>, Query)
org.LexGrid.LexBIG.Impl.Extensions.GenericExtensions.search.SearchExtensionImpl.search(String, Set<CodingSchemeReference>, Set<CodingSchemeReference>, MatchAlgorithm, boolean, boolean)