...
Attendees: Tin, Cory, Craig, Scott, Yeon, Jason, Kim, Larry, Cuong, Jacob, Sara
Discussion: Triple Store/Graph Database
- Mayo has looked at the report from the SI group from CBIIT.
- Larry indicated that SPARQL querying is the main focus for NCIT, along with the ability to federate queries across SPARQL endpoints. He would like consistent results across LexEVS and SPARQL.
- Jason and Kim have been working on a project (a SPARQL endpoint for LexEVS, demoed later in the meeting).
- Gilberto - no use cases have been prepared yet. However, there are things that a terminology server cannot provide, and he would like to have more integrated services.
- For example, a researcher studying cancer may also need gene data: how do I glue this information together? If both are in RDF, they can be queried together with SPARQL.
- Another example is data elements: is there other existing data that might be appropriate for my research? Users can start to explore ontologies for this kind of data discovery.
- Federation of data from other SPARQL endpoints is the primary interest.
- Larry suggested that hierarchy and traversal queries might be better implemented in SPARQL than in LexEVS.
- Gilberto -
- Federated queries - yes, that is the primary focus.
- SPARQL doesn't need to support reasoning; however, some minimal reasoning may be considered.
- Performance isn't a priority, but it can't be a bottleneck. (A graph DB isn't under consideration.)
- LexEVS/CTS2 doesn't need to be tied to the triple store (not everything would be exposed through the triple store).
- Kevin provided an overview of "what does a terminology database need to do?" and reviewed how key-value stores, document stores (MongoDB, CouchDB), relational databases, and graph databases can satisfy the specific functionality a terminology requires.
- KVS - Key-Value Store; DS - Document Store; RDBMS - Relational Database; GDB - Graph Database
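The cancer/gene example raised earlier (glue terminology and gene data together once both are in RDF) can be illustrated with a small in-memory sketch. It assumes both sources use the same IRI for a concept; every IRI, predicate, label, and the endpoint URL below are hypothetical placeholders, and `match` and `federated_join` are illustrative helpers, not part of any SPARQL library. The `FEDERATED_QUERY` string shows how the same join would be expressed with the SPARQL 1.1 `SERVICE` keyword; it is not executed here.

```python
"""Toy illustration of a federated join across two RDF sources.

Assumption: both sources share IRIs for the same concepts. All IRIs,
predicates, and data below are hypothetical placeholders, not real NCIT
or gene-database content.
"""

# Each "endpoint" is modeled as a list of (subject, predicate, object) triples.
TERMINOLOGY = [
    ("ex:C1", "rdfs:label", "Example neoplasm"),
    ("ex:C2", "rdfs:label", "Another neoplasm"),
]
GENE_DATA = [
    ("ex:G1", "ex:associatedWith", "ex:C1"),
    ("ex:G1", "rdfs:label", "GENE-A"),
    ("ex:G2", "ex:associatedWith", "ex:C1"),
    ("ex:G2", "rdfs:label", "GENE-B"),
]

# For comparison only (not executed here): a SPARQL 1.1 federated query that
# fetches the gene side from a remote endpoint via SERVICE. The URL is hypothetical.
FEDERATED_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>
SELECT ?diseaseLabel ?geneLabel WHERE {
  ?disease rdfs:label ?diseaseLabel .
  SERVICE <http://example.org/gene-sparql> {
    ?gene ex:associatedWith ?disease ;
          rdfs:label ?geneLabel .
  }
}
"""


def match(triples, predicate):
    """Return (subject, object) pairs for one predicate, like a single triple pattern."""
    return [(s, o) for s, p, o in triples if p == predicate]


def federated_join(terms, genes):
    """Join disease labels from one source with gene labels from the other on the shared disease IRI."""
    disease_label = dict(match(terms, "rdfs:label"))
    gene_label = dict(match(genes, "rdfs:label"))
    results = []
    for gene, disease in match(genes, "ex:associatedWith"):
        if disease in disease_label:
            results.append((disease_label[disease], gene_label.get(gene)))
    return sorted(results)


if __name__ == "__main__":
    for row in federated_join(TERMINOLOGY, GENE_DATA):
        print(row)
```

The join only works because both sources agree on the identifier `ex:C1`; that shared-IRI assumption is what makes federation across SPARQL endpoints practical.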
...
Look carefully at your requirements and needs when choosing a solution.
- Kevin benchmarked Neo4J, OrientDB, and others to determine how much these tools have improved.
- Overall, Kevin found ArangoDB to be the best all-around solution. It is a hybrid document and graph database.
- Data can be modeled as documents, graphs, and key-value pairs
- Allows joins
- Provides graph functionality.
- Gilberto - does ArangoDB provide a SPARQL endpoint plugin? Kevin indicated that ArangoDB may not support SPARQL.
- Demo of ArangoDB
- CTS2 JSON for parts of SNOMED was loaded into ArangoDB.
- Benchmarks attempted
- Neighborhood (Qualifier value) - LexEVS and CTS2 do this
- returns in less than a second
- Descendants (Qualifier value) - more difficult, as maxDepth is -1 (all levels)
- returns in just over a second
- typically done by building a table to traverse
- Leaves (Event) (Return all the leaves)
- Expensive to do in a DB
- SNOMED Event branch - return all the leaves.
- 7300 returned in less than 2 seconds.
- Sub-Graphs (value set resolution related)
- From the SNOMED root node: the entire Event branch with everything below it, the entire Observation branch, and the entire Organism branch.
- Return the count for each branch, then compute the intersection of these branches and see what is returned.
- returns in 3 seconds.
- all - 354,000
- event - 8500
- obs - 855
- organism - 34000
- intersection - 1
- Slightly slower results on OrientDB.
- Graph neighbors - count only
- Counting how many nodes are in the graph is difficult in LexEVS
- Extremely fast result.
- Joins from nodes to edges
- Joining the edges to the entity.
- returns relation, to and from
- Shortest Path to Root
- Returns vertices and edges
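The benchmark queries demoed above (neighborhood, descendants with unlimited depth, leaves, branch intersection, node count, and shortest path to root) are ordinary graph traversals. A minimal sketch of each over a tiny in-memory is-a hierarchy is shown below, assuming the hierarchy is a DAG stored as (child, parent) edges. The concept names are hypothetical and not real SNOMED content, and the function names are illustrative only; ArangoDB, OrientDB, or LexEVS would run equivalent queries inside the store rather than in application code.

```python
"""Toy versions of the benchmark queries on an in-memory is-a hierarchy.

Assumptions: the hierarchy is a DAG stored as (child, parent) edges, and all
concept names are hypothetical, not real SNOMED content. ArangoDB, OrientDB,
or LexEVS would run the equivalent queries inside the store.
"""
from collections import defaultdict, deque

# Hypothetical is-a edges: (child, parent). "Finding X" has two parents.
EDGES = [
    ("Event", "Root"), ("Observation", "Root"), ("Organism", "Root"),
    ("Fall", "Event"), ("Accident", "Event"), ("Fall from ladder", "Fall"),
    ("Finding X", "Event"), ("Finding X", "Observation"),
    ("Bacterium", "Organism"),
]

_children, _parents = defaultdict(set), defaultdict(set)
for child, parent in EDGES:
    _children[parent].add(child)
    _parents[child].add(parent)
CHILDREN, PARENTS = dict(_children), dict(_parents)


def neighborhood(node):
    """Direct parents and children (one hop in either direction)."""
    return PARENTS.get(node, set()) | CHILDREN.get(node, set())


def descendants(node, max_depth=-1):
    """Breadth-first walk down the hierarchy; max_depth=-1 means no depth limit."""
    seen, queue = set(), deque([(node, 0)])
    while queue:
        current, depth = queue.popleft()
        if max_depth != -1 and depth >= max_depth:
            continue
        for child in CHILDREN.get(current, set()):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return seen


def leaves(node):
    """Descendants of `node` that have no children themselves."""
    return {d for d in descendants(node) if d not in CHILDREN}


def branch_intersection(*roots):
    """Concepts that appear under every given branch (the sub-graph test)."""
    return set.intersection(*(descendants(r) for r in roots))


def node_count():
    """Number of distinct vertices in the graph (the count-only query)."""
    return len({n for edge in EDGES for n in edge})


def shortest_path_to_root(node, root="Root"):
    """Breadth-first search up parent edges; returns (vertices, edges) or None."""
    previous = {node: None}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if current == root:
            vertices = []
            while current is not None:
                vertices.append(current)
                current = previous[current]
            vertices.reverse()  # now ordered from `node` up to `root`
            return vertices, list(zip(vertices, vertices[1:]))
        for parent in sorted(PARENTS.get(current, set())):
            if parent not in previous:
                previous[parent] = current
                queue.append(parent)
    return None


if __name__ == "__main__":
    print(len(descendants("Root")), sorted(leaves("Event")))
    print(branch_intersection("Event", "Observation"))
    print(shortest_path_to_root("Fall from ladder"))
```

Leaves and descendants both require walking the whole subtree, which is why they cost more than a one-hop neighborhood query; a relational store typically answers them from a precomputed traversal table instead.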
- Gilberto - how much difference was there between the reviewed tools?
- Kevin - OrientDB and ArangoDB are similar. Neo4J is the most mature of all, but didn't perform as well and is more of a pure graph database.
- Tracy - to satisfy the need for a SPARQL endpoint, is Neo4J best?
- Kevin - ArangoDB is not the way to go for SPARQL requirements.
- Kevin's use cases for ArangoDB are based on performance and the ability to quickly meet users' requirements.
- Larry - how could this be used in combination with LexEVS and other tools?
- Kevin - the use of multiple stores/services is becoming more common to accomplish specific tasks.
- NG (Kim and Jason) have been working on a SPARQL endpoint for LexEVS
- It doesn't have to go through the database layer, so it is faster.
- Kim demoed some working code as part of the browser.
- Tree and hierarchy queries are faster.
- Continuing to review and understand how SPARQL can apply to EVS tools.
- Larry - how difficult will it be to deploy triple-store and graph DB in the NCI environment?
- Sara - if it is part of the build and deploy (aside from security concerns), then the tools support team can support it (as with Struts, Spring, etc.).
- This impacts the DBAs more than the systems team. It depends on whether the project teams need DBA support.
- CBIIT managed hosting (supported by infrastructure teams) is currently how EVS is supported.
- Scott indicated that graph queries in LexEVS could be supported by triple stores. However, metadata cannot be supported in a triple store; ArangoDB, for example, could provide metadata on the edges.
- A hybrid solution would need careful design. The LexEVS API would need to be transparent to users, wrapping content from both the triple store and the LexEVS DB.
- The Mayo and NCI teams need to clarify the strengths and weaknesses of each approach and determine how best to address them.
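One way to read the hybrid-solution point (the LexEVS API stays transparent to users while wrapping both the triple store and the LexEVS DB) is as a facade over two back ends. The sketch below assumes hierarchy traversal is answered by a graph or triple store and descriptive metadata by the LexEVS database. `HybridTerminologyService`, `GraphBackend`, `MetadataBackend`, `Concept`, and the in-memory stand-ins are all hypothetical names, not part of the LexEVS API.

```python
"""Sketch of a facade that hides two back ends behind one API.

Assumptions: the class and method names are hypothetical and are not part
of the LexEVS API. Hierarchy queries go to a graph/triple store back end,
and descriptive metadata comes from the existing LexEVS database.
"""
from dataclasses import dataclass
from typing import Dict, List, Protocol


class GraphBackend(Protocol):
    def children(self, code: str) -> List[str]: ...


class MetadataBackend(Protocol):
    def describe(self, code: str) -> Dict[str, str]: ...


@dataclass
class Concept:
    code: str
    metadata: Dict[str, str]


class HybridTerminologyService:
    """Single entry point; callers never see which store answered which part."""

    def __init__(self, graph: GraphBackend, metadata: MetadataBackend) -> None:
        self._graph = graph
        self._metadata = metadata

    def children(self, code: str) -> List[Concept]:
        # Traversal is delegated to the graph store; each result is then
        # enriched with metadata from the LexEVS database back end.
        return [Concept(c, self._metadata.describe(c)) for c in self._graph.children(code)]


# In-memory stand-ins for the two stores, for illustration only.
class DictGraph:
    def __init__(self, edges: Dict[str, List[str]]) -> None:
        self._edges = edges

    def children(self, code: str) -> List[str]:
        return self._edges.get(code, [])


class DictMetadata:
    def __init__(self, rows: Dict[str, Dict[str, str]]) -> None:
        self._rows = rows

    def describe(self, code: str) -> Dict[str, str]:
        return self._rows.get(code, {})


if __name__ == "__main__":
    service = HybridTerminologyService(
        DictGraph({"C1": ["C2", "C3"]}),
        DictMetadata({"C2": {"name": "Child two"}, "C3": {"name": "Child three"}}),
    )
    for concept in service.children("C1"):
        print(concept.code, concept.metadata)
```

Because callers only depend on the facade, either back end could later be swapped (for example, a triple store for hierarchy and a store that supports edge metadata) without changing client code.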
Discussion: Cloud Considerations
...