Page History

Versions Compared

Old Version 47

Changes made by Unknown User (stanclcr2)

Saved on Dec 10, 2015

compared with

New Version 48

Changes made by Unknown User (stanclcr2)

Saved on Dec 10, 2015

...

Attendees: Tin, Cory, Craig, Scott, Yeon, Jason, Kim, Larry, Cuong, Jacob, Sara

Discussion: Triple Store/Graph Database

Mayo has looked at the report from the SI group from CBIIT.
Larry indicated that SPARQL querying is the main focus for the NCIT, along with the ability to federate queries across SPARQL endpoints. He would like to have consistent results across LexEVS and SPARQL.
Jason and Kim have been working on a project
Gilberto - there are no use cases prepared. However, there are things that a terminology server cannot provide. Would like to have more integrated services.
- For example, when researching cancer and looking for gene data: how do I glue this information together? If both are in RDF, then both can be queried together with SPARQL.
- Another example is data elements: are there other data that might be appropriate for my research? Users can start to explore ontologies for this data discovery.
- Federation of data from other SPARQL endpoints is the primary interest.
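
The federation use case above can be illustrated with the SPARQL 1.1 `SERVICE` keyword, which sends part of a query to a remote endpoint. This is only a minimal sketch and was not shown in the meeting: the endpoint URL, the `ex:` prefix, and the `ex:associatedWithDisease` predicate are hypothetical placeholders, not real NCIt or gene-data properties.

```python
from textwrap import dedent


def build_federated_query(remote_endpoint: str, disease_label: str) -> str:
    """Build a SPARQL 1.1 query that joins local terminology data with
    gene data held at a remote SPARQL endpoint via a SERVICE clause.

    Assumption: the ex: predicate is a placeholder for illustration only.
    """
    # Escape backslashes and double quotes so the label is a valid string literal.
    safe_label = disease_label.replace("\\", "\\\\").replace('"', '\\"')
    return dedent(f"""\
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX ex: <http://example.org/schema#>

        SELECT ?disease ?gene ?geneLabel
        WHERE {{
          ?disease rdfs:label "{safe_label}" .
          SERVICE <{remote_endpoint}> {{
            ?gene ex:associatedWithDisease ?disease ;
                  rdfs:label ?geneLabel .
          }}
        }}
        """)


# Hypothetical remote endpoint; print the query text for inspection.
query = build_federated_query("http://example.org/genes/sparql", "Cancer")
print(query)
```

The resulting text could be submitted to any SPARQL 1.1 endpoint that supports federated query; whether a given triple store allows outbound `SERVICE` calls is a deployment decision.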
Larry suggested that hierarchy and traversals might be better implemented in SPARQL instead of LexEVS.
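
As a rough illustration of hierarchy traversal in SPARQL, property paths (SPARQL 1.1) can walk a hierarchy to any depth in one pattern. The sketch below assumes the hierarchy is expressed with `rdfs:subClassOf`, which may not match how a given terminology is actually loaded; the function name and the example URI are hypothetical.

```python
def descendants_query(concept_uri: str, include_self: bool = False) -> str:
    """Return a SPARQL query selecting every transitive subclass of a concept.

    Uses SPARQL 1.1 property paths: `+` means one or more hops,
    `*` means zero or more (which also returns the concept itself).
    Assumption: the hierarchy is modeled with rdfs:subClassOf.
    """
    path_operator = "*" if include_self else "+"
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?descendant\n"
        "WHERE {\n"
        f"  ?descendant rdfs:subClassOf{path_operator} <{concept_uri}> .\n"
        "}\n"
    )


# Hypothetical concept URI, used only to show the generated query text.
print(descendants_query("http://example.org/concept/C0001"))
```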
Gilberto -
- Federated queries - yes, that is primary focus.
- SPARQL doesn't need to support reasoning - however, some minimal reasoning may be considered.
- Performance isn't a priority, but it can't be a bottleneck. (A graph DB isn't under consideration.)
- LexEVS/CTS2 doesn't need to tie to the triple store (not everything would be exposed through the triple store).
Kevin provided an overview of "what does a terminology database need to do?" and reviewed key-value store, document store (MongoDB, CouchDB), relational DB and graph DB usage to satisfy specific functionality required by a terminology.
- KVS - Key-Value Store; DS - Document Store; RDBMS - Relational Database; GDB - Graph Database

...

- Need to look closely at your requirements and needs when choosing a solution.
- Kevin looked at Neo4J, OrientDB, and others by performing benchmarks to determine how well these tools were improving.
- Overall, Kevin found ArangoDB to be the best all-around solution. It is a mix of document and graph database.
  - Modeling is open for documents, graphs, and key value pairs
  - Allows for Joins
  - Provides graph functionality.
- Gilberto - does ArangoDB provide a SPARQL endpoint plugin? Kevin indicated that ArangoDB may not support SPARQL.
  - http://stackoverflow.com/questions/34015945/sparql-interface-for-arangodb
- Demo of ArangoDB
  - CTS2 JSON for parts of SNOMED loaded into ArangoDB.
  - Benchmarks attempted
    - Neighborhood (Qualifier value) - LexEVS and CTS2 do this
      - returns in less than a second
    - Descendants (Qualifier value) - more difficult, as maxDepth is -1 (all)
      - returns in just over a second
      - typically done by building a table to traverse
    - Leaves (Event) (Return all the leaves)
      - Expensive to do in a DB
      - SNOMED Event branch - return all the leaves.
      - 7300 returned in less than 2 seconds.
    - Sub-Graphs (value set resolution related)
      - SNOMED root node - all of the Event branch with everything below, all of the observation branch and all of the organism branch.
      - Return how many in each branch and then provide intersection of these branches and see what is returned.
      - returns in 3 seconds.
        all - 354,000
        event - 8500
        obs - 855
        organism - 34000
        intersection - 1
      - Slightly slower results on OrientDB.
    - Graph neighbors - count only
      - How many nodes are in the graph - is difficult in LexEVS
      - extremely fast result.
    - JOINS from nodes to edges
      - Joining the edges to the entity.
      - returns relation, to and from
    - Shortest Path to Root
      - Returns vertices and edges
  - Gilberto - how much difference was there between the reviewed tools?
    - Kevin - OrientDB and ArangoDB are similar. Neo4J is the most mature of all, but didn't have same performance and was more of a pure graph database.
  - Tracy - is Neo4J the best choice to satisfy the need for a SPARQL endpoint?
    - Kevin - suggests that ArangoDB is not the way to go for SPARQL requirements.
  - Kevin's use cases for ArangoDB are based on performance and the ability to quickly meet the requirements of users.
  - Larry - how could this be used in combination with LexEVS and other tools?
    - Kevin - the use of multiple stores/services is becoming more common to accomplish specific tasks.
- NG (Kim and Jason) have been working on a SPARQL endpoint for LexEVS
  - Doesn't have to go through database layer so it is faster.
  - Kim demoed some working code as part of the browser.
  - Trees and hierarchy are faster.
  - Continuing to review and understand how SPARQL can apply to EVS tools.
- Larry - how difficult will it be to deploy triple-store and graph DB in the NCI environment?
  - Sara - if it is part of build and deploy (aside from security concerns), then the tools support team can use it (for example Struts, Spring, etc.).
  - This impacts the DBAs more than systems. It depends if the project teams need DBA support.
  - CBIIT managed hosting (supported by infrastructure teams) is currently how EVS is supported.
- Scott indicated graph queries in LexEVS could be supported by triple stores. However, metadata cannot be supported in a triple store. ArangoDB, for example, could provide metadata on the edges.
  - Design would need to be considered to provide a hybrid solution. The LexEVS API would need to be transparent to the users. The API would need to wrap content from triple store and LexEVS DB.
- The Mayo and NCI teams need to clarify the strengths and weaknesses of each and determine how to best address them.
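
To make the benchmarked operations from the demo concrete (descendants at unlimited depth, leaves of a branch, branch intersection, and shortest path to root), here is a minimal in-memory sketch in plain Python. It is not ArangoDB or LexEVS code and says nothing about their performance; the toy hierarchy and all concept names are invented for illustration.

```python
from collections import deque

# Toy child -> parents map; the concept names are invented for illustration.
PARENTS = {
    "root": [],
    "event": ["root"],
    "organism": ["root"],
    "accident": ["event"],
    "fall": ["accident"],
    "bacterium": ["organism"],
    "infection_event": ["event", "organism"],
}


def build_children(parents):
    """Invert the child -> parents map into parent -> children."""
    children = {node: [] for node in parents}
    for child, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(child)
    return children


def descendants(children, start):
    """All nodes below `start` at any depth (the maxDepth = -1 case)."""
    seen = set()
    queue = deque(children[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(children[node])
    return seen


def leaves(children, start):
    """Descendants of `start` that have no children of their own."""
    return {n for n in descendants(children, start) if not children[n]}


def shortest_path_to_root(parents, start):
    """Breadth-first search upward; returns the vertices from start to a root."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if not parents[node]:
            return path
        for parent in parents[node]:
            if parent not in visited:
                visited.add(parent)
                queue.append(path + [parent])
    return None


CHILDREN = build_children(PARENTS)
event_branch = descendants(CHILDREN, "event")
organism_branch = descendants(CHILDREN, "organism")
shared = event_branch & organism_branch  # sub-graph intersection
```

A graph or multi-model database performs the same kinds of traversals natively; the sketch only shows what each benchmark is asking for.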

Discussion: Cloud Considerations


...
