Lucene Analyzers in LexEVS
Analyzer implementation under Lucene 5.0 is perhaps the most critical activity in the entire project: getting the analyzers right ensures that the LexEVS API continues to return values as it did previously. The analyzers are largely implemented per field, meaning a wrapper delegates to several analyzers, each of which tokenizes an individual field in its own way. Some of these analyzers could be replaced with corresponding implementations from the Lucene libraries, but others are highly customized, relying on specially curated character-escaping sets and special treatment of certain characters. Any change to the way these work should be fully vetted with stakeholders. More generally, the entire set of text-matching strategies should be audited for actual use and potential exclusion for the sake of maintainability; this may reduce the dependency on custom analyzers that serve narrow purposes and require more attention than off-the-shelf components.
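The per-field arrangement described above can be sketched in plain Java. This is an illustrative model only, not actual LexEVS code: the field names, tokenization rules, and escaping set below are assumptions made up for the example. In Lucene itself, this dispatch role is played by `PerFieldAnalyzerWrapper`, which delegates to a different `Analyzer` depending on the field being indexed or queried.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of per-field analyzer dispatch. Field names and
// tokenization rules are illustrative; the real LexEVS escaping sets are
// more involved and any change to them must be vetted with stakeholders.
public class PerFieldAnalyzerSketch {

    /** Minimal stand-in for a Lucene Analyzer: turns text into tokens. */
    interface Analyzer {
        List<String> tokenize(String text);
    }

    /** Lowercases and splits on whitespace (a generic, off-the-shelf style). */
    static final Analyzer LOWERCASE_WHITESPACE =
            text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));

    /** Keyword-style analyzer: the whole field value is a single token. */
    static final Analyzer KEYWORD = List::of;

    /**
     * Customized analyzer standing in for the curated character handling:
     * certain punctuation is replaced with whitespace before tokenizing.
     * The character set "(),:;" here is purely an example.
     */
    static final Analyzer SPECIAL_CHARS =
            text -> Arrays.asList(
                    text.replaceAll("[(),:;]", " ")
                        .toLowerCase(Locale.ROOT)
                        .trim()
                        .split("\\s+"));

    /** The wrapper: choose an analyzer by field name, with a default. */
    static final Map<String, Analyzer> FIELD_ANALYZERS = Map.of(
            "propertyValue", SPECIAL_CHARS,  // illustrative field names
            "code", KEYWORD);

    static List<String> analyze(String field, String text) {
        return FIELD_ANALYZERS.getOrDefault(field, LOWERCASE_WHITESPACE)
                              .tokenize(text);
    }

    public static void main(String[] args) {
        // Same input text, different tokenization depending on the field.
        System.out.println(analyze("propertyValue", "Heart (organ), left"));
        System.out.println(analyze("code", "C12345"));
    }
}
```

The point of the sketch is that replacing any one entry in the per-field map with a stock Lucene analyzer changes the tokens produced for that field, and therefore the match results the API returns, which is why each substitution needs to be verified individually.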