Saved on Dec 10, 2015

...

Document Information

Author: Craig Stancl, Scott Bauer, Cory Endle
Email: Stancl.craig@mayo.edu, bauer.scott@mayo.edu, endle.cory@mayo.edu
Team: LexEVS
Contract: S13-500 MOD4
Client: NCI CBIIT
National Institutes of Health
US Department of Health and Human Services

...

Wednesday, December 9, 2015

...

8:00 AM - 9:00 AM

2E914

Discussion: Review Lucene

Attendees: Kim, Cory, Scott, Craig

Kim reviewed the work done with the Loader tutorial code that Scott provided.
Scott discussed how property values are tied to entities in Lucene.
It was noted that the 6.4 Lucene development isn't complete so some of the functionality is not working today.
Scott reviewed how set theory is currently implemented. (2 parent block join)
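
The block join idea mentioned above can be illustrated with a minimal sketch. It is plain Python rather than the Lucene API, and it is not the LexEVS implementation: the entity codes, property names, and helper functions are hypothetical. It assumes the usual block layout, in which the child (property) documents are indexed first and the parent (entity) document closes the block.

```python
# Conceptual model of a block join; plain Python, not the Lucene API.
# Entity codes and property values below are made up for illustration.

def build_block(entity_code, properties):
    """Return one block: child property documents first, parent entity last."""
    children = [{"type": "property", "name": n, "value": v} for n, v in properties]
    parent = {"type": "entity", "code": entity_code}
    return children + [parent]


def to_parent_join(index, child_matches):
    """Map each matching child position to the parent that closes its block."""
    parents = []
    for pos in child_matches:
        # Scan forward until the parent document that ends this block.
        while index[pos]["type"] != "entity":
            pos += 1
        code = index[pos]["code"]
        if code not in parents:
            parents.append(code)
    return parents


index = build_block("C1", [("synonym", "heart attack"),
                           ("definition", "necrosis of heart muscle")])
index += build_block("C2", [("synonym", "stroke")])

# Search the child documents, then join the hits back to their entities.
matches = [i for i, doc in enumerate(index)
           if doc["type"] == "property" and "heart" in doc["value"]]
print(to_parent_join(index, matches))
```

Both matching properties belong to the same block, so the join returns a single entity.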

9:00 AM - 9:30 AM

2E914

Recap and Planning

Attendees: Kim, Cory, Scott, Craig, Jason, Larry, Tin

The 2015.12 Technical Face-To-Face Prioritization List was reviewed and updated to capture additional items.
It was noted that Partonomy should be considered for CTRP requirements.
Larry discussed structured presentation of Value Set and Mapping data. Currently these are presented as flat lists (concepts with terms) or as terms with source and target. To be useful, a viewable hierarchy that represents the internal structures would be preferred. There may be existing JIRA items, but we should look at this again. Scott suggested the use of a codedNodeGraph call to create this hierarchy. Usage needs to be considered: requirements need to be established from the users first, and then the technology to support them can be evaluated.

9:30 AM - Noon

2E914

Discussion: Coding Scheme Search and Indexing

Determine requirements/use cases for horizontal coding scheme searches
Overview of indexing of qualifiers
Overview of search results in 6.4

Discussion: LEXEVS Loader Improvements

Discuss known issues for OWL2, MedDRA and HL7 Loaders
Other loader improvements
Determine next steps/roadmap

Attendees: Kim, Cory, Scott, Craig, Jason, Larry, Tin, Rob, Tracy

Discussion: Coding Scheme Search and Indexing

Traversing Coding Schemes
- OBI and GO would be the starting place to determine ability needed to traverse from one coding scheme to the next. We have this captured and will consider.
Indexing
- Index Qualifiers
  - Scott described how we currently index qualifiers. Qualifiers are stored in a file as a list, grouped together, and parsed into the index. The list is added to the parent document as part of the block join implementation.
- LexEVS 6.4 Implementation
  - Scott discussed the status of 6.4 and noted that we've noticed some result differences in going from a single index to multiple indexes. The scoring is based on the frequency of a term, and we can boost the score. There is a JUnit test that covers the boosting of terms, but we aren't sure whether this is a credible issue. Approximations are going to make this difficult. Larry suggested that the ranking be considered only for the individual source and not across all sources.
  - Gilberto described a search result page where results could be split by vocabulary sources. To do this, the list of sources could be presented to the user and the user could select the source from a pop-up. The browser would need to be updated.
  - Even with multiple indexes, exact matches will always be at the top of the results. Similar weighting should also be preserved.
  - Larry requested that we share with the group how results were returned in 6.3 and how they are returned in 6.4 once it is fully implemented.
  - Scott noted that we could always write our own analyzer, but then we'd need to maintain and support it.
  - Stop word list is still valid in new implementation.
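
The per-source ranking that Larry suggested, combined with keeping exact matches at the top, could look roughly like the sketch below. It is only an illustration under stated assumptions: the hit structure, source names, and scores are hypothetical, and this is not how LexEVS 6.4 computes or merges scores.

```python
from collections import defaultdict


def rank_by_source(hits, query):
    """Group hits by source; within each source, exact matches come first,
    then the remaining hits in descending score order."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["source"]].append(hit)
    q = query.strip().lower()
    return {
        source: sorted(items, key=lambda h: (h["term"].lower() != q, -h["score"]))
        for source, items in grouped.items()
    }


# Hypothetical hits; scores from different indexes are never compared directly.
hits = [
    {"source": "A", "term": "heart disease", "score": 2.5},
    {"source": "A", "term": "heart", "score": 1.0},
    {"source": "B", "term": "heart valve", "score": 3.0},
]
ranked = rank_by_source(hits, "heart")
for source, items in ranked.items():
    print(source, [h["term"] for h in items])
```

Because ranking happens inside each source, scores produced by separate indexes only have to be comparable within one source.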

Discussion: LEXEVS Loader Improvements

OWL2 Loader
- Jira (NCI Tracker): LEXEVS-586
  - Used to support inferencing. GO and OBI may have this already.
  - Examples include - hasUncle, hasFather, etc.
  - Gilberto to provide examples and we can discuss during future meeting.
  - http://www.w3.org/TR/owl2-new-features/#F8:_Property_Chain_Inclusion
- Jira (NCI Tracker): LEXEVS-1160
  - This is related to OWL2 and should be considered part of OWL2 changes.
MedDRA Loader
- Jira (NCI Tracker): LEXEVS-339
  - This can be closed since we have a MedDRA loader.
- Jira (NCI Tracker): LEXEVS-1169
  - This does not impact the load.
- Need to create JIRA for "Semantic Type" missing
- Need to provide comparison of MedDRA loads to understand what is missing. May decide to not pursue based on what is identified.
HL7 Loader
- Jira (NCI Tracker): LEXEVS-584
  - This has nothing of value and can be closed.
- Jira (NCI Tracker): LEXEVS-1037
  - Scott noted that coding schemes should be loaded as separate coding schemes. Historically, we've loaded them as a single coding scheme. There are varying views on how they should be loaded. There are approximately 200 coding schemes and most have few concepts in them.
  - Namespaces need to be fixed - Scott asserted that there is no supported coding scheme (additional metadata)
- Gilberto described that on the HL7 web page, you can view the coding scheme and the branch/structure. It is desired to provide that structure.
  - We should look at what HL7 provides and determine what needs to be done.
Process Automation
- Tracy described the sequential loading of content (GO, ChEBI, etc).
  - This manual loading happens monthly.
  - There are 5 manual steps.
  - Propose a way to provide the loader a version and as a result it would build a manifest object.
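
A minimal sketch of the proposal above, in which the loader is handed a version and a manifest is built from it. The `LoadManifest` fields, the helper name, and the version strings are hypothetical; they do not describe the actual LexEVS manifest format.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LoadManifest:
    """Hypothetical manifest: just enough to identify what is being loaded."""
    coding_scheme: str
    version: str


def build_manifests(versions):
    """Build one manifest per source, keeping the order of the monthly load."""
    return [LoadManifest(source, version) for source, version in versions.items()]


# Hypothetical monthly batch; the version strings are placeholders.
batch = build_manifests({"GO": "2015-12", "ChEBI": "2015-12"})
for manifest in batch:
    print(asdict(manifest))
```

Driving the sequential load from one mapping of source to version would replace part of the manual steps with a single input.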
Additional JIRA items
- Jira (NCI Tracker) query: project = LEXEVS AND status in (Open, "In Progress", Reopened, "On Hold") AND (description ~loader OR summary ~loader)
  - This list needs to be reviewed by the group to determine if there are additional loader considerations. If there are JIRA items that are no longer needed, they should be closed.
- Scott brought up the issue around a failed load and table locking.
  - This issue is related to 5.5.
  - LexEVS should be aware so it can fix the problem.
  - One option would be to update to 5.6 or MariaDB.
  - JIRA item needs to be included.
- Jira (NCI Tracker): LEXEVS-234
  - This should be considered in project backlog.
- Jira (NCI Tracker): LEXEVS-347
  - Tracy noted this would be difficult as the indicator varies based on the source.
  - Scott suggested an evaluation would need to be completed to understand the methods used to indicate deprecated concepts.
  - It was suggested to focus on specific coding schemes - instead of a general approach. Priority should be for OWL2.
  - This should be considered in project backlog.
- Jira (NCI Tracker): LEXEVS-459
  - Tracy - currently there is a script to load all the value sets (load value set definition).
    - Propose that the VS could be added to a directory and point the script at the directory to load the value sets.
    - This will be further discussed during the Value Set Editor discussion.
- Jira (NCI Tracker): LEXEVS-535
  - This issue can be removed.
  - CLAML loader is no longer used. All JIRA items around CLAML can be closed.
- Jira (NCI Tracker): LEXEVS-464
  - This is part of the older OWL loader.
  - This may have been resolved in OWL2.
  - The older OWL loader is no longer the focus.
  - No decisions made.
It was noted that we should do a complete review of the JIRA backlog (ON HOLD) issues. Most issues associated with OBO may be closed after review.
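
The proposal discussed under LEXEVS-459, pointing the load script at a directory of value sets, might be sketched as follows. This assumes the value set definitions are XML files; the `load_one` callback stands in for whatever actually performs a single load and is hypothetical.

```python
import tempfile
from pathlib import Path


def load_value_sets(directory, load_one):
    """Call load_one for every *.xml file in the directory, in name order,
    and return the names of the files that were handed to the loader."""
    loaded = []
    for path in sorted(Path(directory).glob("*.xml")):
        load_one(path)  # hypothetical single-definition loader
        loaded.append(path.name)
    return loaded


# Demonstration with a throwaway directory and a loader that only records calls.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.xml", "a.xml", "notes.txt"):
        (Path(tmp) / name).write_text("<placeholder/>")
    seen = []
    result = load_value_sets(tmp, seen.append)
    print(result)
```

Only the XML files are picked up, so unrelated files in the directory are ignored.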

1:00 PM - 1:30 PM

2E914

EVS Tools (LexEVS & EVS Focus) Meeting

Open topics

Attendees: EVS Group

Reviewed 2015.12 Technical Face-To-Face Prioritization List with the group.

1:30 PM - 4:00 PM

2E914

Tutorial/Discussion: Loader Implementation and Requirements

Overview of LexEVS Loader development process
Implement a simple LexEVS loader (hands-on)
Discussion of entry points into various loaders for purposes of updating
Discuss automating data loads and customization

Attendees: Tracy, Rob, Tin, Cory, Craig

Scott presented the loader tutorial.
Tracy wondered if an OWL2 to LexGrid Model spreadsheet (mapping) existed. This does not exist, but would be helpful. It is suggested that this mapping be documented.

...

Thursday, December 10, 2015

9:00 AM - 9:30 AM

6E030

Recap and Planning

Attendees: Tin, Cory, Craig, Scott, Yeon, Jason, Kim, Larry

Reviewed 2015.12 Technical Face-To-Face Prioritization List with the group

9:30 AM - Noon

6E030

Discussion: Triple Store/Graph Database

NCI to provide requirements/use cases
Discuss plans based on Triple Store Capability Research and Analysis wiki pages
Demo of ArangoDB (https://www.arangodb.com/)
Determine next steps/roadmap

Discussion: Cloud Considerations

NCI to provide requirements/use cases for supporting a cloud environment

Discussion: Build and Deployment Process

DevOps and continuous integration
Changes for build, deployment and distribution
- Changes to support 6.4
- Considerations for Triple Store/Graph DB
- Considerations for Cloud Infrastructure
Mayo's continuous integration server
Demo of Docker
Determine next steps/roadmap

Attendees: Tin, Cory, Craig, Scott, Yeon, Jason, Kim, Larry, Cuong, Jacob, Sara, Larry, Gilberto

Discussion: Triple Store/Graph Database

Mayo has looked at the report from the SI group from CBIIT.
Larry indicated that SPARQL querying is the main focus for NCIT, along with the ability to federate queries across SPARQL endpoints. He would like to have consistent results across LexEVS and SPARQL.
Jason and Kim have been working on a project
Gilberto - there are no use cases prepared. However, there are things that a terminology server cannot provide, and more integrated services are desired.
- For example, when researching cancer and looking for gene data, how is this information glued together? If both are in RDF, they can be queried together using SPARQL.
- Another example is data elements: is there other data that might be appropriate for my research? Users can start to explore ontologies for this data discovery.
- Federation of data from other SPARQL endpoints is the primary interest.
Larry suggested that hierarchy and traversals might be better implemented in SPARQL instead of LexEVS.
Gilberto -
- Federated queries - yes, that is primary focus.
- SPARQL doesn't need to support reasoning - however, some minimal reasoning may be considered.
- Performance isn't priority, but it can't be a bottleneck.
- LexEVS/CTS2 doesn't need to tie to the triple store (not everything would be exposed through the triple store).
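
To make the federation interest concrete, the sketch below builds a SPARQL 1.1 federated query that joins locally stored concept labels with gene data held at a remote endpoint through the `SERVICE` keyword. The endpoint URL and the predicate IRI are hypothetical placeholders, not real NCI or LexEVS resources, and no input escaping is attempted.

```python
from string import Template

# SPARQL 1.1 federated query; SERVICE delegates the inner pattern to a remote endpoint.
QUERY = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?concept ?label ?gene WHERE {
  ?concept rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "$text"))
  SERVICE <$endpoint> {
    ?gene <$predicate> ?concept .
  }
}
""")


def federated_query(endpoint, predicate, text):
    """Fill in the hypothetical endpoint, predicate, and search text."""
    return QUERY.substitute(endpoint=endpoint, predicate=predicate, text=text.lower())


query = federated_query(
    "https://example.org/genes/sparql",          # hypothetical remote endpoint
    "https://example.org/vocab/associatedWith",  # hypothetical predicate
    "Cancer",
)
print(query)
```

The resulting string could be sent to any SPARQL 1.1 endpoint that supports federation.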
Kevin provided an overview of "what does a terminology database need to do?" and reviewed key-value store, document store (MongoDB, CouchDB), relational database, and graph database usage to satisfy specific functionality required by a terminology.
- KVS - Key-Value Store; DS - Document Store; RDBMS - Relational Database; GDB - Graph Database

Datastore Feature	Datastore Type that Performs Well
Store a resource with an ID	KVS, DS, RDBMS, GDB
Find a resource by ID	KVS, DS, RDBMS, GDB
Find a resource by a set of properties	DS, RDBMS, GDB
Find all edges of a resource	GDB, RDBMS
Traverse a graph	GDB
Compute subgraphs	GDB
Perform set operations on subgraphs	GDB
Calculate paths	GDB

- Requirements and needs should be examined closely when choosing the solution.
- Kevin looked at Neo4J, OrientDB, and others by performing benchmarks to determine how well these tools were improving.
- Overall, Kevin found ArangoDB to be the best all-around solution. It is a mix of document and graph solutions.
  - Modeling is open for documents, graphs, and key value pairs
  - Allows for Joins
  - Provides graph functionality.
- Gilberto - does ArangoDB provide a SPARQL endpoint plugin? Kevin indicated that ArangoDB may not support SPARQL.
  - http://stackoverflow.com/questions/34015945/sparql-interface-for-arangodb
- Demo of ArangoDB
  - CTS2 JSON for parts of SNOMED loaded into ArangoDB.
  - Benchmarks attempted
    - Neighborhood (Qualifier value) - LexEVS and CTS2 do this
      - returns in less than a second
    - Descendants (Qualifier value) - more difficult, as maxDepth is -1 (all)
      - returns in just over a second
      - typically done by building a table to traverse
    - Leaves (Event) (Return all the leaves)
      - Expensive to do in a DB
      - SNOMED Event branch - return all the leaves.
      - 7300 returned in less than 2 seconds.
    - Sub-Graphs (value set resolution related)
      - SNOMED root node - the entire Event branch with everything below it, the entire observation branch, and the entire organism branch.
      - Return how many in each branch and then provide intersection of these branches and see what is returned.
      - returns in 3 seconds.
        all - 354,000
        event - 8500
        obs - 855
        organism - 34000
        intersection - 1
      - Slightly slower results on OrientDB.
    - Graph neighbors - count only
      - How many nodes are in the graph - is difficult in LexEVS
      - extremely fast result.
    - JOINS from nodes to edges
      - Joining the edges to the entity.
      - returns relation, to and from
    - Shortest Path to Root
      - Returns vertices and edges
  - Gilberto - how much difference was there between the reviewed tools?
    - Kevin - OrientDB and ArangoDB are similar. Neo4J is the most mature of all, but didn't have same performance and was more of a pure graph database.
  - Tracy - to satisfy the need for SPARQL endpoint is Neo4J best?
    - Kevin - suggests that ArangoDB is not the way to go for SPARQL requirements.
  - Kevin's use cases for ArangoDB are based on performance and the ability to quickly meet the requirements of users.
  - Larry - how could this be used in combination with LexEVS and other tools.
    - Kevin - the use of multiple stores/services is becoming more common to accomplish specific tasks.
- NG (Kim and Jason) have been working on SPARQL endpoint for LexEVS
  - Doesn't have to go through database layer so it is faster.
  - Kim demoed some working code as part of the browser.
  - Trees and hierarchy are faster.
  - Continuing to review and understand how SPARQL can apply to EVS tools.
- Larry - how difficult will it be to deploy triple-store and graph DB in the NCI environment?
  - Sara - if it is part of build and deploy (aside from security concerns), then the tools support team can use it (for example Struts, Spring, etc.).
  - This impacts the DBAs more than systems. It depends if the project teams need DBA support.
  - CBIIT managed hosting (supported by infrastructure teams) is currently how EVS is supported.
- Scott indicated graph queries in LexEVS could be supported by triple stores. However, metadata cannot be supported in a triple store. ArangoDB, for example, could provide metadata on the edges.
  - Design would need to be considered to provide a hybrid solution. The LexEVS API would need to be transparent to the users. The API would need to wrap content from triple store and LexEVS DB.
- The Mayo and NCI teams need to clarify the strengths and weaknesses of each and determine how best to address them.
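
The benchmark operations listed above (descendants, leaves, sub-graph intersection, shortest path to root) can be sketched on a toy in-memory graph. This is plain Python on a made-up hierarchy, not ArangoDB or LexEVS code; it only illustrates what each operation computes.

```python
from collections import deque

# Toy hierarchy (parent -> children); node names are made up.
children = {
    "root": ["event", "organism"],
    "event": ["e1", "e2"],
    "organism": ["o1", "e2"],
    "e1": [], "e2": [], "o1": [],
}


def descendants(node):
    """All nodes below `node`, at any depth (maxDepth of -1 in the notes)."""
    seen, queue = set(), deque(children[node])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(children[current])
    return seen


def leaves(node):
    """Descendants of `node` that have no children of their own."""
    return {n for n in descendants(node) if not children[n]}


def path_to_root(node, root="root"):
    """Breadth-first search over parent edges; returns a shortest path."""
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    queue, visited = deque([[node]]), {node}
    while queue:
        path = queue.popleft()
        if path[-1] == root:
            return path
        for parent in parents.get(path[-1], []):
            if parent not in visited:
                visited.add(parent)
                queue.append(path + [parent])
    return None


# Sub-graph set operation: what do the two branches have in common?
shared = descendants("event") & descendants("organism")
print(sorted(leaves("root")), sorted(shared), path_to_root("e2"))
```

A graph database runs these traversals natively, whereas a relational store typically needs a precomputed traversal table, as noted for the descendants benchmark.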

Discussion: Cloud Considerations and Discussion: Build and Deployment Process

Attendees: Tin, Cory, Craig, Scott, Yeon, Jason, Kim, Larry, Cuong, Jacob, Sara, AJ, Larry, Gilberto

Scott described considerations for cloud usage
- Auto deploy
- Auto Scale resources
- Uptime
- Sharable instances
Kevin noted that technology is starting to provide the ability to deliver what cloud promises. Cloud is more about changing your development lifecycle than hardware. Cloud is not server virtualization.
Kevin demoed the use of Docker with LexEVS
- Sara asked if it is possible to take a docker image and use it on different tiers - by passing in a variable to let the application know what to configure at that tier.
  - Kevin - this is the idea - and those variables can be stored in version control. This simplifies the process.
- Micro-architectures and micro-services are what is important today. LexEVS fits this model well.
- Attempting to document a LexEVS install is complex.
- Docker Example has:
  - LexEVS
  - LexEVS-cts2
  - LexEVS-remote
  - mysql
  - uriresolver
- This Docker container configures a complete LexEVS environment.
- Kevin described the use of a Nexus server. Similar to Maven artifacts, Docker images can be hosted on a private or public Nexus repository. Nexus has expanded to include Docker support (internal Docker repositories).
- Sara - The Tomcat, MySQL, and OS images come from public repositories.
- Application versions can be specified, or the latest can simply be pulled from the Docker server.
- Sara - CBIIT is not ready to support Docker, and support won't be available by March 2016.
- The goal is to provide docker images for on premise (NCI) installation or install to external cloud services as required by NCI.
- Sara - considerations for storing configuration files - we need to consider how passwords and other information
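
Sara's question about reusing one image on different tiers by passing in a variable can be sketched as follows. The variable name `LEXEVS_TIER`, the tier names, and the settings are hypothetical; the point is only that the image stays the same and the tier is chosen at start-up. Secrets such as passwords are deliberately left out of this mapping.

```python
import os

# Hypothetical per-tier settings that could live in version control.
TIER_SETTINGS = {
    "dev": {"db_host": "db.dev.example.org", "log_level": "DEBUG"},
    "qa": {"db_host": "db.qa.example.org", "log_level": "INFO"},
    "prod": {"db_host": "db.prod.example.org", "log_level": "WARN"},
}


def settings_for_tier(env=os.environ):
    """Pick the settings for the tier named by the LEXEVS_TIER variable."""
    tier = env.get("LEXEVS_TIER", "dev")  # hypothetical variable name
    if tier not in TIER_SETTINGS:
        raise ValueError(f"unknown tier: {tier}")
    return {"tier": tier, **TIER_SETTINGS[tier]}


print(settings_for_tier({"LEXEVS_TIER": "qa"}))
```

With Docker, such a variable would typically be supplied when the container is started, for example through the `-e` option of `docker run`.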

1:00 PM - 3:00 PM

1W030

Discussion: Value Set Editor (Authoring)

NCI to provide requirements/use cases

Discussion: The Future of lbGUI

Discuss future requirements
Review issues - JIRA and others
Develop a roadmap to address technical debt
Determine next steps/roadmap

Discussion: Build and Deployment Process and Discussion: Value Set Editor (Authoring)

Attendees: Cory, Craig, Scott, Jason, Kim, Larry, Gilberto, Rob

Dev Ops Discussions carried over from earlier this morning.
- Continuous Integration
  - Cory discussed the Continuous Integration Server usage and how it is used by the Mayo development team.
  - There was discussion about how to include CI server functionality to provide value to both the browser team and LexEVS team.
  - Currently Jenkins is unofficially supported at NCI, but they are supporting what is needed by the project teams.
  - Mayo does have Travis and Jenkins, but suggests the use of Jenkins.
  - Jason - there is interest and some value, but it is limited given how few developers there are (one). It was suggested that it would be best to discuss with the Dev Ops group before setting up something else.
Value Set Editor
- Requirements
  - Ability to maintain collections and sub-collections of value sets
    - https://nciterms.nci.nih.gov/ncitbrowser/ajax?action=create_src_vs_tree&nav_type=valuesets#

  - Ability to efficiently load the resolved value sets
- Tracy - Value Set authoring doesn't occur often. The value set resolution happens every time there is a new version of NCIT.
- FDA and CDISC value sets are based on NCIT concepts.
  - Report writer templates are used for individual value sets.
- Further review of value set workflow (including resolution) is needed to determine requirements and proposed changes.

Discussion: The Future of lbGUI

Attendees: Cory, Craig, Scott, Jason, Kim, Larry, Gilberto, Rob

Current usage of lbGUI:
- Mayo team uses the GUI to verify loads. (Development)
- Tracy uses it when the admin scripts aren't working. (Admin)
- Rob uses it to clean up. (Admin)
- Gilberto uses to determine if data is loaded correctly. (Data)
Scott noted that the representation of data in lbGUI isn't always correct. It is better to look at DB to determine how things are loaded.
Technology needs to be updated
- Usage is becoming unstable
- There are several known bugs.
The focus should be on expanding the admin scripts and moving away from lbGUI.
Further review of admin workflow is needed to determine requirements and proposed admin script changes and additions.
lbGUI will continue to be minimally supported as needed by the development team.
Consider ability to provide viewing additional metadata in the GUI.

3:00 PM - 5:00 PM

1W030

Debrief

Prioritize
Determine next steps/roadmap

Debrief

Attendees: Cory, Craig, Scott, Jason, Kim, Larry, Gilberto

Reviewed and prioritized 2015.12 Technical Face-To-Face Prioritization List with the group
