Choose the solution by looking closely at your own requirements and needs.
- Kevin looked at Neo4J, OrientDB, and others by performing benchmarks to determine how well these tools were improving.
- Overall, Kevin found ArangoDB to be the best all-around solution. It is a mix of document and graph database.
- Modeling is open for documents, graphs, and key value pairs
- Allows for Joins
- Provides graph functionality.
- Gilberto - does ArangoDB provide a SPARQL endpoint plugin? Kevin indicated that ArangoDB may not support SPARQL.
- Demo of ArangoDB
- CTS2 JSON for parts of SNOMED loaded into ArangoDB.
- Benchmarks attempted
- Neighborhood (Qualifier value) - LexEVS and CTS2 do this
- returns in less than a second
- Descendants (Qualifier value) - more difficult, as maxDepth is -1 (all)
- returns in just over a second
- typically done by building a table to traverse
- Leaves (Event) (Return all the leaves)
- Expensive to do in a DB
- SNOMED Event branch - return all the leaves.
- 7300 returned in less than 2 seconds.
- Sub-Graphs (value set resolution related)
- SNOMED root node - all of the Event branch with everything below it, all of the Observation branch, and all of the Organism branch.
- Return how many in each branch and then provide intersection of these branches and see what is returned.
- returns in 3 seconds.
- all - 354,000
- event - 8500
- obs - 855
- organism - 34000
- intersection - 1
- Slightly slower results on OrientDB.
- Graph neighbors - count only
- Counting how many nodes are in the graph is difficult in LexEVS
- extremely fast result.
- JOINS from nodes to edges
- Joining the edges to the entity.
- returns relation, to and from
- Shortest Path to Root
- Returns vertices and edges
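The descendants, leaves, branch-intersection, and shortest-path benchmarks above can be illustrated with a minimal in-memory sketch. This is not the AQL Kevin ran against ArangoDB. The toy hierarchy, the concept names, and the function names are all assumptions made up for illustration, and plain breadth-first search stands in for the database's traversal engine.

```python
from collections import deque

# Hypothetical toy hierarchy: child -> list of parents (is-a edges).
# Concept names are illustrative only and are not taken from SNOMED.
PARENTS = {
    "root": [],
    "event": ["root"],
    "organism": ["root"],
    "exposure": ["event"],
    "travel": ["event"],
    "bacterial_exposure": ["exposure", "organism"],
}


def build_children(parents):
    """Invert the child -> parents map into a parent -> children map."""
    children = {node: [] for node in parents}
    for child, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(child)
    return children


def descendants(children, start):
    """Return every node below `start` at any depth (no depth limit)."""
    seen = set()
    queue = deque(children[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(children[node])
    return seen


def leaves(children, start):
    """Return the descendants of `start` that have no children."""
    return {n for n in descendants(children, start) if not children[n]}


def shortest_path_to_root(parents, start, root="root"):
    """Breadth-first search upward; returns the vertex list or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == root:
            return path
        for parent in parents[path[-1]]:
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


children = build_children(PARENTS)
event_branch = descendants(children, "event")
organism_branch = descendants(children, "organism")
shared = event_branch & organism_branch  # intersection of two branches
```

With a graph database the same questions become single traversal queries, which is why the "build a table to traverse" step mentioned above is not needed there.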
- Gilberto - how much difference was there between the reviewed tools?
- Kevin - OrientDB and ArangoDB are similar. Neo4J is the most mature of all, but didn't have same performance and was more of a pure graph database.
- Tracy - to satisfy the need for a SPARQL endpoint, is Neo4J best?
- Kevin - suggests that ArangoDB is not the way to go for SPARQL requirements.
- Kevin's use cases for ArangoDB are based on performance and the ability to quickly meet user requirements.
- Larry - how could this be used in combination with LexEVS and other tools?
- Kevin - the use of multiple stores/services is becoming more common to accomplish specific tasks.
- NG (Kim and Jason) have been working on SPARQL endpoint for LexEVS
- Doesn't have to go through database layer so it is faster.
- Kim demoed some working code as part of the browser.
- Tree and hierarchy operations are faster.
- Continuing to review and understand how SPARQL can apply to EVS tools.
- Larry - how difficult will it be to deploy triple-store and graph DB in the NCI environment?
- Sara - if it is part of build and deploy (aside from security concerns), then the tools support team can use it (for example, Struts, Spring, etc.).
- This impacts the DBAs more than systems. It depends if the project teams need DBA support.
- CBIIT managed hosting (supported by infrastructure teams) is currently how EVS is supported.
- Scott indicated that graph queries in LexEVS could be supported by triple stores. However, metadata cannot be supported in a triple store. ArangoDB, for example, could provide metadata on the edges.
- Design would need to be considered to provide a hybrid solution. The LexEVS API would need to be transparent to the users. The API would need to wrap content from triple store and LexEVS DB.
- The Mayo and NCI teams need to clarify the strengths and weaknesses of each and determine how best to address them.
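The hybrid design discussed above, where one API wraps content from a graph or triple store and from the LexEVS database, might look roughly like the following sketch. Every class and method name here is hypothetical. Nothing below is part of the actual LexEVS API, and both stores are in-memory stand-ins rather than real clients.

```python
class GraphStore:
    """Hypothetical stand-in for a triple store or graph database."""

    def __init__(self, edges):
        # Each edge is a (source, relation, target) tuple.
        self._edges = edges

    def neighbors(self, code):
        """Return (relation, target) pairs for edges leaving `code`."""
        return [(rel, tgt) for src, rel, tgt in self._edges if src == code]


class MetadataStore:
    """Hypothetical stand-in for entity metadata held in the LexEVS DB."""

    def __init__(self, records):
        self._records = records

    def entity(self, code):
        """Return the metadata record for `code`, or an empty dict."""
        return self._records.get(code, {})


class TerminologyFacade:
    """Single entry point, so callers never see which store answered."""

    def __init__(self, graph, metadata):
        self._graph = graph
        self._metadata = metadata

    def neighborhood(self, code):
        """Combine graph structure with metadata in one response."""
        return {
            "code": code,
            "entity": self._metadata.entity(code),
            "neighbors": [
                {
                    "relation": rel,
                    "target": tgt,
                    "entity": self._metadata.entity(tgt),
                }
                for rel, tgt in self._graph.neighbors(code)
            ],
        }


# Illustrative data only; codes and names are made up.
facade = TerminologyFacade(
    GraphStore([("C2", "is_a", "C1")]),
    MetadataStore({"C1": {"name": "Parent"}, "C2": {"name": "Child"}}),
)
result = facade.neighborhood("C2")
```

The point of the facade is the transparency requirement from the discussion: the caller asks one question and gets structure plus metadata back, regardless of how the content is split across stores.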
Discussion: Cloud Considerations and Discussion: Build and Deployment Process
- Scott described considerations for cloud usage
- Auto deploy
- Auto Scale resources
- Uptime
- Sharable instances
- Kevin noted that technology is starting to provide the ability to deliver what cloud promises. Cloud is more about changing your development lifecycle than about hardware. Cloud is not server virtualization.
- Kevin demoed the use of Docker with LexEVS
- Sara - is it possible to take a Docker image and use it on different tiers by passing in a variable to let the application know what to configure at that tier?
- Kevin - this is the idea - and those variables can be stored in version control. This simplifies the process.
- Micro-architectures and micro-services are what is important today. LexEVS fits this model well.
- Attempting to document a LexEVS install is complex.
- Docker Example has:
- LexEVS
- LexEVS-cts2
- LexEVS-remote
- mysql
- uriresolver
- This Docker container configures a complete LexEVS environment.
- Kevin described the use of a Nexus server. As with Maven artifacts, Docker images can be hosted on a private or public Nexus repository. Nexus has expanded to include Docker support (internal Docker repositories).
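Sara's question about reusing one Docker image across tiers by passing in a variable can be sketched as follows. This is a minimal illustration, not the configuration used in Kevin's demo. The variable names `LEXEVS_TIER` and `LEXEVS_DB_HOST`, the tier names, and the setting values are all assumptions.

```python
import os

# Hypothetical per-tier settings; in practice these could live in
# version control alongside the image definition.
TIER_SETTINGS = {
    "dev": {"db_host": "mysql-dev", "log_level": "DEBUG"},
    "qa": {"db_host": "mysql-qa", "log_level": "INFO"},
    "prod": {"db_host": "mysql-prod", "log_level": "WARNING"},
}


def load_settings(environ=os.environ):
    """Pick settings for the tier named in the environment.

    LEXEVS_TIER and LEXEVS_DB_HOST are assumed variable names,
    not ones defined by LexEVS itself.
    """
    tier = environ.get("LEXEVS_TIER", "dev")
    if tier not in TIER_SETTINGS:
        raise ValueError(f"unknown tier: {tier!r}")
    settings = dict(TIER_SETTINGS[tier])
    # Allow a single value to be overridden without defining a new tier.
    if "LEXEVS_DB_HOST" in environ:
        settings["db_host"] = environ["LEXEVS_DB_HOST"]
    return tier, settings


tier, settings = load_settings({"LEXEVS_TIER": "qa"})
```

The same image could then be started on each tier with a different value, for example by passing `-e LEXEVS_TIER=qa` to `docker run`, so only the variable changes between deployments.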