



Discussion items

University of Chicago Mapping

UC wants to map its local vocabulary to NCIt. Lyubov will meet with them in mid-December.

Current state of mapping

Matching tools

  • OWL Compare - OWL 1 only
  • LexEVS Compare - strings vs LexEVS query
  • Other direct LexEVS queries
    • Metathesaurus co-location
    • Filter based on branch or branches
    • Falling back on ancestors or descendants
    • Filter on semantic type
    • Matching on a specific annotation (Ex: DN or PT)
    • Lucene has some NLP (Levenshtein distance, Double Metaphone)
      • In combination with filters
  • Kim's matching tool
    • Maps using triplestore-loaded vocabularies
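
As a rough illustration of combining Levenshtein-distance matching with a semantic type filter, here is a minimal Python sketch. It does not use LexEVS or Lucene. The concept codes, the `CONCEPTS` list, and the `fuzzy_match` function are hypothetical and exist only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical target concepts: (code, preferred term, semantic type).
CONCEPTS = [
    ("X001", "Melanoma", "Neoplastic Process"),
    ("X002", "Melatonin", "Pharmacologic Substance"),
    ("X003", "Myeloma", "Neoplastic Process"),
]


def fuzzy_match(term, concepts, max_distance=2, semantic_type=None):
    """Return (distance, code, name) candidates, closest first."""
    hits = []
    for code, name, sem_type in concepts:
        if semantic_type is not None and sem_type != semantic_type:
            continue  # limiting step: skip concepts outside the filter
        distance = levenshtein(term.lower(), name.lower())
        if distance <= max_distance:
            hits.append((distance, code, name))
    return sorted(hits)
```

Calling `fuzzy_match("melanom", CONCEPTS)` keeps only candidates within two edits of the input. Passing `semantic_type` narrows the candidate set before any distance is computed.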

What we can have by end of year
  • Need access to UC data set
    • Need to extract their terms from their data
  • Can tweak matching algorithm to their set
  • Can show results, but not necessarily a tool in action
  • LexEVS Java client app that takes a list of terms and does various matches
    • General matching
    • Refinements
      • Expanding
        • Stemming
        • NLP
      • Limiting
        • Filters
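
The general match, expand, limit flow listed above could look roughly like the following Python sketch. It is not the planned LexEVS Java client. The `crude_stem` suffix stripper is a deliberately simple stand-in for a real stemmer, and the vocabulary entries, codes, and branch names are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Very rough plural stripping; a stand-in for a real stemmer."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def stem_key(text):
    return " ".join(crude_stem(t) for t in normalize(text))


def match_terms(terms, vocabulary, branch=None):
    """Map each input term to (code, method), or None when nothing matches."""
    # Limiting: keep only concepts in the requested branch, if one is given.
    candidates = [c for c in vocabulary if branch is None or c["branch"] == branch]
    exact = {" ".join(normalize(c["name"])): c["code"] for c in candidates}
    stemmed = {stem_key(c["name"]): c["code"] for c in candidates}

    results = {}
    for term in terms:
        key = " ".join(normalize(term))
        if key in exact:                      # general matching
            results[term] = (exact[key], "exact")
        elif stem_key(term) in stemmed:       # expanding: stemming
            results[term] = (stemmed[stem_key(term)], "stemmed")
        else:
            results[term] = None
    return results


# Hypothetical vocabulary entries for illustration only.
VOCAB = [
    {"code": "T1", "name": "Lung Carcinoma", "branch": "Disease"},
    {"code": "T2", "name": "Biopsy", "branch": "Procedure"},
]
```

Recording the method ("exact" or "stemmed") next to each result lets a reviewer see how each match was made.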

What we can have in 6 months

UI for review

We will get many more mappings. 

User A has a map of their terms that involve ICD9 terms to match to ICD10.

User B comes in with a map of their terms that also involve ICD9 terms to match to ICD10; the terms already mapped by User A would be presented to User B.

Application will need to evolve over time - as we get more and more mappings.

Direct Lucene (Elasticsearch) searches

Text search service

We want to be able to store mappings, and use that knowledge to inform new mappings.

  • LexEVS stored mapping coding schemes
    • Add new mappings using authoring API
    • Would need to store mappings where one or both schemes are not in LexEVS
  • Editing tool stores mappings
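
To make the User A / User B scenario concrete, here is a minimal in-memory sketch in Python of a mapping store that offers earlier mappings as suggestions. It is not the LexEVS authoring API. The `MappingStore` class, its method names, and the placeholder codes are assumptions made for illustration.

```python
from collections import defaultdict


class MappingStore:
    """In-memory store of mappings, keyed by source code and scheme pair."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def add(self, source_scheme, source_code, target_scheme, target_code, author):
        """Record a mapping; identical entries are stored only once."""
        key = (source_scheme, source_code, target_scheme)
        entry = (target_code, author)
        if entry not in self._by_source[key]:
            self._by_source[key].append(entry)

    def suggest(self, source_scheme, source_code, target_scheme):
        """Return earlier (target_code, author) mappings for this source code."""
        key = (source_scheme, source_code, target_scheme)
        return list(self._by_source.get(key, []))


store = MappingStore()
# Placeholder codes, not real ICD9 or ICD10 codes.
store.add("ICD9", "ICD9-CODE-1", "ICD10", "ICD10-CODE-1", author="user_a")

# User B later asks about the same source code and sees User A's mapping.
suggestions = store.suggest("ICD9", "ICD9-CODE-1", "ICD10")
```

Because the key holds only scheme names and codes, a store like this could hold mappings where one or both schemes are not loaded in LexEVS.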

Kim's tool demonstration

Text file mapped against any vocabulary in the triplestore, through EVSRestAPI

Source code is on GitHub, in the old Mapping tool project.

Action items