Author: Craig Stancl, Scott Bauer, Cory Endle
This document records the details of the technical face-to-face meeting between the NCI and the LexEVS Team.
2016 November/December Face-to-Face Meeting Notes
Wednesday, November 30th, 2016
Time | Location | Topics | Participants |
---|---|---|---|
9:00 AM - 10:00 AM | 4W030 | User Group Discussion | CTRP, caDSR, GDC |
Attendees: Larry, Jason, Kim, Craig, Scott, Cory, John, Liz, Sima, Rui, Natalia, Tracy, Sana, Tin, Gilberto
Discussion Points:
- caDSR Team represented by Sima, Natalia, Vikram
- caDSR Applications that use EVS
- Sentinel - Alerts for concepts, job that does concept clean up (compares concepts)
- Curation tooling - links to concepts, and search results.
- CDE Browser - Concepts used from search results.
- Semantic integration workbench - concepts used from search results
- CDEs
- Utilize the NCIT and NCI Metathesaurus.
- Look into NCIT - will use the concept to describe the CDE.
- Organizing concepts to build CDE terminology.
- CDEs are used for forms (permissible values on forms)
- Tooling hasn't changed or been replaced.
- Currently use JARs placed in /lib
- Using Remote API today.
- Recently removed EJB
- Need to consider architecture in the future.
- MDR is planning to architect a solution moving forward.
- Currently searches are restricted to preferred terms.
- Building data element - definitional information, preferred name
- Existing CDE - pull back preferred name.
- Current Tooling Issues
- Need to have a data load completed to PROD.
- Confirmed data load and ready once things move to production.
- New LexEVS Jars will be included in next release.
- Remote API Architecture
- Issues
- Replacement of JARS
- Serialization of objects.
- Proposed Architecture?
- REST-ful API
Decision Points:
- caDSR to provide list of what is currently used in the Java API
- EVS team to provide feedback as to how to do things better.
- EVS team to ensure that if REST-ful API is created, functionality to be prioritized.
Time | Location | Topics | Participants |
---|---|---|---|
10:00 AM - 11:00 AM | 4W030 | RESTful API Discussion | |
Attendees: Larry, Jason, Kim, Craig, Scott, Cory, John, Liz, Sima, Rui, Natalia, Tracy, Sana, Tin, Gilberto, Jacob
Discussion Points:
- Browser use cases reviewed.
- Additional cases
- History - currently used by the browser.
- Security - for MedDRA or other licensed vocabularies.
- May be able to use CTS2 APIs, but may need to have separate REST-ful services for customized/specialized content.
- Current browser wouldn't use REST services.
- Coverage of REST possibilities is unknown, so additional investigation is required.
- Previous F2F considerations.
- Custom Lucene may need to be provided for clinical trials.
- Group Value Sets - may be useful.
- Restrict to Properties - need to better understand this use case.
- History - suggested by the CTRP but has some requirements in scope of the Browser.
- Graph and Association
- No need from caDSR
- Additional requirements
- Bulk Download
- ability to download full or part of a complete terminology.
- Align REST calls to support Moonshot API services.
- Make others aware services are available.
- Tracy suggested to look at Data.gov
- Browser team not going to use REST at this time
- caDSR not going to use REST at this time.
Decision Points:
- Identify Moonshot Clinical Trial API services to be supported by REST services.
- Identify ways to promote REST services.
- Identify possibilities of participating in Data.gov
Time | Location | Topics | Participants |
---|---|---|---|
11:00 AM - 12:00 PM | 4W030 | Triple store/RDF Discussion | |
Attendees:
Discussion Points:
- There is a pilot ongoing to review triple stores
- Three triple stores were selected (Stardog, AllegroGraph (http://allegrograph.com/), Virtuoso) and have been evaluated for the past 6 months.
- Evaluations cover restricting operations, loading data, performance testing, and examining security (secured and anonymous access via proxies).
- The queries tested so far have not been representative of what is needed for production.
- More focus is needed on real-use queries.
- Operations - the hosting model is not supported by CBIIT, so there is no support (i.e., patching support is not provided).
- all can be queried with standard SPARQL
- will still want REST services available to the end users.
- Transition to SPARQL gives raw access to the data (unlike the API).
- However, you need to understand the data, and this could differ from endpoint to endpoint.
- What do triple stores provide that differs from (or is better than) LexEVS?
- Representation of Hierarchy
- Level of expressivity
- 3 use cases for triplestore evaluation:
- Expressivity (reasoning support)
- Linked open data
- use vocabulary as a "glue" between different data repositories. Ability to "join" distributed repositories.
- Closer integration of vocabulary and meta data (part of the MDR).
- Report writer now uses the triple store
- Loading of value sets takes much less time
- Access of Value Sets is better, but not hugely different.
- Need to determine where Triple Store is better and where LexEVS is better.
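As a rough illustration of the hierarchy and raw-access points above, the sketch below stores a few subclass statements as triples and walks them transitively. The concept names, the `ex:` prefix, and the `descendants` helper are hypothetical and are not taken from NCIT or any of the evaluated stores. The SPARQL string shows how the same question is commonly phrased with a SPARQL 1.1 property path; it is included for reference only and is not executed here.

```python
from collections import defaultdict

SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical triples (subject, predicate, object); not real NCIT content.
TRIPLES = [
    ("ex:Carcinoma", SUBCLASS_OF, "ex:Neoplasm"),
    ("ex:Adenocarcinoma", SUBCLASS_OF, "ex:Carcinoma"),
    ("ex:Sarcoma", SUBCLASS_OF, "ex:Neoplasm"),
    ("ex:Neoplasm", "rdfs:label", "Neoplasm"),
]

# The same question as a SPARQL 1.1 property path (reference only, not run).
DESCENDANTS_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>
SELECT ?child WHERE { ?child rdfs:subClassOf+ ex:Neoplasm }
"""


def descendants(triples, root):
    """Return every concept reachable from root by following subClassOf downward."""
    children = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SUBCLASS_OF:
            children[obj].add(subject)
    found, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


print(sorted(descendants(TRIPLES, "ex:Neoplasm")))
```

The point of the sketch is that the caller has to know which predicate carries the hierarchy, which is the "need to understand the data" concern noted above.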
Decision Points:
- Investigate where Triple Store usage may augment LexEVS.
Time | Location | Topics | Participants |
---|---|---|---|
1:00 PM - 2:00 PM | 5E030 | EVS Project Group Discussion (During regular call-in time) | EVS project meeting |
Attendees: Larry, Kim, Craig, Scott, Cory, John, Liz, Gilberto, Lori, Tin, Teri, Sharon, Sana, Nick, Nels, Margaret, George, Brenda, Abigail, Erin, Joanne
Discussion Points:
- Value Sets
- Would like LexEVS to support production of value sets with more rich structure. (to more efficiently assemble this deliverable)
- 100K downloads from FTP site. Fewer users use the browser to download the value sets.
- Mapping
- On a mapping page, you can download an Excel or CSV file.
- e.g., ChEBI has a mapping.
- For GDC - ICD9 or ICD10 coding - to be able to use NCIT coding, there needs to be a way to translate between ICD9 and NCIT codes. There isn't a good way to do that today.
- e.g., ICD9 - Breast cancer - corresponds to ABC in NCIT
- Determine how such a map could be published (browser and LexEVS)
- Other Services
- IUPAC - there are 2 flavors to be considered - but can be managed.
- HUGO - the slash and hyphens have been problematic, but have been mostly resolved in the NCI Browser.
- Review of 4 identified searching issues (differences between results in LexEVS and Protege).
- NGram tokenizers may provide solution if we implement an Expert System.
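To make the n-gram idea concrete, here is a minimal sketch of character n-gram tokenization in plain Python. It is not Lucene's implementation; the punctuation-stripping step and the `ngram_overlap` score are assumptions added for illustration, and the sample symbol is a made-up placeholder rather than a real HUGO entry.

```python
import re


def char_ngrams(text, n=3):
    """Lowercase text, drop non-alphanumeric characters, return overlapping n-grams."""
    normalized = re.sub(r"[^a-z0-9]", "", text.lower())
    if len(normalized) < n:
        return [normalized] if normalized else []
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def ngram_overlap(query, term, n=3):
    """Fraction of the query's n-grams that also occur in the term."""
    query_grams = set(char_ngrams(query, n))
    if not query_grams:
        return 0.0
    return len(query_grams & set(char_ngrams(term, n))) / len(query_grams)


# Hypothetical symbol written with and without slash/hyphen punctuation.
print(char_ngrams("AB-12/C"))
print(ngram_overlap("ab12c", "AB-12/C"))
```

Because matching happens on overlapping fragments rather than whole tokens, a query typed without the slash or hyphen can still score against the punctuated form.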
Decision Points:
- Revisit the mapping discussion during a future project meeting.
- Consider and prioritize the expert system solution.
- Review the usage of NGrams in Lucene.
Time | Location | Topics | Participants |
---|---|---|---|
2:00 PM - 3:00 PM | 5E030 | LexEVS Mapping Discussion: Determine requirements and propose solution for mapping. | |
Attendees: Larry, Kim, Craig, Scott, Cory, John, Liz, Gilberto, Tracy
Discussion Points:
- Use case provided by external LexEVS user.
- one to many (one terminology to many) is currently not a priority.
- Work on loading maps from UMLS may have been started at one time, but it is not clear why it wasn't completed. Brian may have more information.
- Ability to capture synonymous and non-synonymous mappings - Query API, Loader. Would want a use case to specifically describe this mapping.
- Consider loading MRMAP and review how it is loaded.
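One way to picture the mapping requirement discussed above is a map entry that carries a relation label alongside the source and target codes. The sketch below is an assumption about shape only: the `MapEntry` fields, the relation labels, and all codes are placeholders, and none of it reflects the LexEVS model or the MRMAP format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    source_scheme: str
    source_code: str
    target_scheme: str
    target_code: str
    relation: str  # "synonymous" or "non-synonymous" (labels are assumptions)


# Placeholder codes only; these are not real ICD9 or NCIT identifiers.
ENTRIES = [
    MapEntry("ICD9", "SRC-1", "NCIT", "ABC", "synonymous"),
    MapEntry("ICD9", "SRC-1", "NCIT", "DEF", "non-synonymous"),
    MapEntry("ICD9", "SRC-2", "NCIT", "GHI", "synonymous"),
]


def build_index(entries):
    """Group map entries by (source scheme, source code) for fast lookup."""
    index = defaultdict(list)
    for entry in entries:
        index[(entry.source_scheme, entry.source_code)].append(entry)
    return index


def translate(index, scheme, code, relation=None):
    """Return target codes for a source code, optionally filtered by relation."""
    return [
        e.target_code
        for e in index.get((scheme, code), [])
        if relation is None or e.relation == relation
    ]


index = build_index(ENTRIES)
print(translate(index, "ICD9", "SRC-1"))
print(translate(index, "ICD9", "SRC-1", relation="synonymous"))
```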
Decision Points:
- Investigate the MRMAP load and determine why that work wasn't completed.
- Investigate the ability to capture non-synonymous mappings.
Time | Location | Topics | Participants |
---|---|---|---|
3:00 PM - 4:30 PM | 5E030 | Lucene Discussion: Propose additional features of Lucene to be used within LexEVS. | |
Attendees: Larry, Kim, Craig, Scott, Cory, John, Liz, Gilberto, Tracy
Discussion Points:
- Facets - ability to perform categorical search
- Coding Scheme Types
- Value Set Categories
- Auto complete
- Interest for the browser and caDSR
- concerns about the results being overwhelming
- might be more useful if combined with facets - i.e. search cancers with facets of neoplasms
- Elastic Search
- There are still many custom analyzers
- If possible to make search more portable - across LexEVS and TripleStore - may want to look at this.
- Prefer to have a single interface - instead of having the user decide if they use LexEVS or TripleStore.
- SOLR vs Elastic Search
- SOLR documents more flat
- Elastic Search document more complex.
- Search down Neoplasms and then stop at a certain level. There is no mechanism to capture that. Similar cases exist for drug searches.
- http://www.immport.org/
- John demoed - this uses facets and auto complete (based on 3 chars or more and typing speed).
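The facet and auto-complete behavior discussed above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not Lucene, SOLR, or Elasticsearch code: the sample hits and facet values are made up, and the typing-speed trigger from the demo is not modeled, only the three-character minimum.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical search hits as (term, facet value); not real terminology content.
HITS = [
    ("neoplasm of breast", "Neoplasm"),
    ("neoplasm of lung", "Neoplasm"),
    ("neomycin", "Drug"),
    ("nephritis", "Disorder"),
]


def facet_counts(hits):
    """Count hits per facet value so a UI can offer them as filters."""
    return Counter(facet for _, facet in hits)


def autocomplete(sorted_terms, prefix, min_chars=3, limit=10):
    """Return up to `limit` terms starting with prefix; nothing below min_chars."""
    prefix = prefix.lower()
    if len(prefix) < min_chars:
        return []
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches


TERMS = sorted(term for term, _ in HITS)
print(facet_counts(HITS))
print(autocomplete(TERMS, "ne"))
print(autocomplete(TERMS, "neo"))
```

The `limit` and `min_chars` parameters are one possible answer to the concern about overwhelming results; restricting suggestions to a selected facet would be another.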
Decision Points:
- Investigate ability to use Facets and where it could be used.
- Investigate ability to design a usable auto complete.
Time | Location | Topics | Participants | Resources |
---|---|---|---|---|
4:30 PM - 5:00 PM | 5E030 | Overflow/Additional Topics |
Attendees:
Discussion Points:
Decision Points:
Thursday, December 1st, 2016
Time | Location | Topics | Participants |
---|---|---|---|
9:00 AM - 11:00 AM | 3W030 | Value Set management and workflow | Rob, Tracy |
Attendees: Jason, Gilberto, Rob, Tracy, Scott, Cory, Craig, Tin, Larry, Kim, Sana, Liz
Discussion Points:
- Properties in Thesaurus that support the browsers
- Subsets in Thesaurus (Protege)
- Publish_Value_Set
- Term_Browser_Value_Set_Description
- Value_Set_Location - where browser fetches the report from (ftp location and path within evs, and filename, BNF)
- TVS_Location - Terminology Value Set OWL file - hierarchy and components
- Properties are scrubbed before loading into LexEVS - these are private/internal properties.
- Baseline - A diff is done on the value sets from month to month. Triggers update load procedures monthly.
- Load, Remove and Resolve scripts are generated for changes
- OWL file provides information about where the value set lives in the hierarchy.
- NCI Thesaurus and TVS (provides structure to value sets) - used by the browser
- Process was created to provide structure/hierarchy to value sets
- Value Set loads - 700 coding schemes - loaded in 24 hours
- If value set resolution is performed against a new version of the code system, does LexEVS handle the versions?
- Process currently isn't limited to NCIT.
- A script was created to generate a text file that shows the hierarchy of concepts in a value set.
- value set downloads are ~10k a month and used across agencies by diverse set of consumers.
- curation and delivery processes are driven by the users and consumers.
- EVS editors work directly with CDISC and other groups.
- CDISC and FDA have different formats and standards
- Resolved value sets as coding schemes - done that way for performance.
- Process can be error prone for the EVS Editors. The editors need to do specific things to drive Browser.
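The monthly baseline diff described above could look roughly like the sketch below, which compares two snapshots and emits a list of planned actions. How changes map to Load, Remove, and Resolve steps is an assumption made here for illustration; the notes only say that such scripts are generated. Value set names and concept codes are placeholders.

```python
def plan_value_set_updates(previous, current):
    """Compare two snapshots (name -> set of concept codes) and plan actions."""
    plan = []
    # Value sets that disappeared since the last baseline.
    for name in sorted(previous.keys() - current.keys()):
        plan.append(("remove", name))
    # New value sets are loaded and then resolved (assumed ordering).
    for name in sorted(current.keys() - previous.keys()):
        plan.append(("load", name))
        plan.append(("resolve", name))
    # Value sets whose membership changed are re-resolved.
    for name in sorted(previous.keys() & current.keys()):
        if previous[name] != current[name]:
            plan.append(("resolve", name))
    return plan


# Hypothetical snapshots; names and codes are placeholders.
LAST_MONTH = {"VS_A": {"C1", "C2"}, "VS_B": {"C3"}, "VS_C": {"C4"}}
THIS_MONTH = {"VS_A": {"C1", "C2", "C5"}, "VS_C": {"C4"}, "VS_D": {"C6"}}

for action, name in plan_value_set_updates(LAST_MONTH, THIS_MONTH):
    print(action, name)
```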
Decision Points:
Time | Location | Topics | Participants |
---|---|---|---|
11:00 AM - 12:00 PM | 3W030 | Value Set and Mapping Data with Hierarchical structure Discussion | |
Attendees:
Discussion Points:
Decision Points:
Time | Location | Topics | Participants |
---|---|---|---|
1:00 PM - 3:30 PM | 3W030 | NCI Systems Discussions | Sara, Shireesha, Phil; Jacob, Yeon (Systems Team) |
Attendees:
Discussion Points:
Decision Points:
Time | Location | Topics | Participants |
---|---|---|---|
3:30 PM - 4:00 PM | 3W030 | FHIR and terminology services (CTS2) | Harold |
Attendees:
Discussion Points:
Decision Points:
Time | Location | Topics | Participants |
---|---|---|---|
4:00 PM - 5:00 PM | 3W030 | OWL Restrictions in LexGrid Model | |
Attendees:
Discussion Points:
Decision Points:
Time | Location | Topics | Participants |
---|---|---|---|
5:00 PM - 5:30 PM | 3W030 | Overflow/Additional Topics |
Attendees:
Discussion Points:
Decision Points:
Friday, December 2nd, 2016
Topic:
Attendees:
Discussion Points:
Decision Points:
Time | Location | Topics | Participants |
---|---|---|---|
9:00 AM - 10:00 AM | 1W030 | LexEVS Admin: Discuss current and future requirements. | |
Attendees:
Discussion Points:
Decision Points:
Time | Location | Topics | Participants |
---|---|---|---|
10:00 AM - 12:00 PM | 1W030 | Prioritization and Debrief | |
Attendees:
Discussion Points:
Decision Points:
Time | Location | Topics | Participants |
---|---|---|---|
1:00 PM - 2:00 PM | 1W030 | Prioritization and Debrief (Continued if needed) |
Attendees:
Discussion Points:
Decision Points: