LexEVS Analyzer Updates

Lucene Analyzers in LexEVS

Implementing the analyzers under Lucene 5.0 is perhaps the most critical activity in the entire project: getting them right ensures that the LexEVS API continues to return values the same way it did previously. The analyzers are largely implemented per field, meaning that a wrapper holds several analyzers, each of which tokenizes a particular field in its own way. Some of these analyzers may be replaceable with corresponding implementations from the Lucene libraries, but others are highly customized and rely on specially curated character-escaping sets and special treatment of certain characters. Any change in how these analyzers behave should be fully vetted with stakeholders. More generally, the entire set of text-matching strategies should be audited for actual use and considered for removal for the sake of maintainability. This may reduce the dependency on custom analysis code that serves a narrow purpose and requires more attention than off-the-shelf components.
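To illustrate the per-field idea, the following is a minimal, self-contained Java sketch, not LexEVS code. It models each analyzer as a plain function from field text to tokens and dispatches on the field name, falling back to a default analyzer when a field has no entry. This is the same idea behind Lucene's own PerFieldAnalyzerWrapper (org.apache.lucene.analysis.miscellaneous), but the class name, the field names ("propertyValue", "code") and both toy analyzers here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch, not LexEVS code: per-field analyzer dispatch in the spirit of
// Lucene's PerFieldAnalyzerWrapper. "Analyzers" are modeled as plain functions from
// field text to tokens; the field names used below are hypothetical.
public class PerFieldAnalyzerSketch {

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> fieldAnalyzers;

    PerFieldAnalyzerSketch(Function<String, List<String>> defaultAnalyzer,
                           Map<String, Function<String, List<String>>> fieldAnalyzers) {
        this.defaultAnalyzer = defaultAnalyzer;
        this.fieldAnalyzers = new HashMap<>(fieldAnalyzers);
    }

    // Uses the analyzer registered for this field, or the default if none is registered.
    List<String> analyze(String field, String text) {
        return fieldAnalyzers.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Toy analyzer: lowercases and splits on runs of characters that are not letters or digits.
    static List<String> lowercaseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Toy analyzer: keeps the whole value as a single, unmodified token (exact-match style).
    static List<String> keywordToken(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> perField = new HashMap<>();
        perField.put("code", PerFieldAnalyzerSketch::keywordToken);

        PerFieldAnalyzerSketch analyzer =
                new PerFieldAnalyzerSketch(PerFieldAnalyzerSketch::lowercaseTokens, perField);

        // "propertyValue" has no entry, so the default analyzer applies.
        System.out.println(analyzer.analyze("propertyValue", "Heart Attack (Disorder)")); // [heart, attack, disorder]
        System.out.println(analyzer.analyze("code", "C-1234")); // [C-1234]
    }
}
```

Keeping the field-to-analyzer table explicit makes it easy to see which fields still depend on a custom analyzer and which could move to a stock Lucene analyzer during the audit of text-matching strategies.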

Current Analyzer Classes

// Encodes for DoubleMetaphone. Will be replaced with the Lucene implementation of a Double Metaphone analyzer,
// if possible.
edu.mayo.informatics.indexer.lucene.analyzers.EncoderAnalyzer
