NIH | National Cancer Institute | NCI Wiki  

...

    • Look closely at your requirements and needs when choosing a solution.

    • Kevin evaluated Neo4J, OrientDB, and others by running benchmarks to determine how well these tools were improving.
    • Overall, Kevin found ArangoDB to be the best all-around solution. It is a mix of document and graph database.
      • Modeling is open for documents, graphs, and key-value pairs
      • Allows for joins
      • Provides graph functionality.
    • Gilberto - does ArangoDB provide a SPARQL endpoint plugin? Kevin indicated that ArangoDB may not support SPARQL.
    • Demo of ArangoDB
      • CTS2 JSON for parts of SNOMED loaded into ArangoDB.
      • Benchmarks attempted
        • Neighborhood (Qualifier value) - LexEVS and CTS2 do this
          • returns in less than a second
        • Descendants (Qualifier value) - more difficult, as maxDepth is -1 (all)
          • returns in just over a second
          • typically done by building a table to traverse
        • Leaves (Event) (Return all the leaves)
          • Expensive to do in a DB
          • SNOMED Event branch - return all the leaves.
          • 7300 returned in less than 2 seconds.
        • Sub-Graphs (value set resolution related)
          • SNOMED root node - all of the Event branch with everything below it, all of the observation branch, and all of the organism branch.
          • Return the count for each branch, then compute the intersection of these branches and see what is returned.
          • returns in 3 seconds.
            • all - 354,000 
            • event - 8500
            • obs - 855
            • organism - 34000
            • intersection - 1
          • Slightly slower results on OrientDB.
        • Graph neighbors - count only
          • Counting how many nodes are in the graph is difficult in LexEVS
          • extremely fast result. 
        • JOINS from nodes to edges
          • Joining the edges to the entity.
          • returns relation, to and from
        • Shortest Path to Root
          • Returns vertices and edges
      • Gilberto - how much difference was there between the reviewed tools?
        • Kevin - OrientDB and ArangoDB are similar. Neo4J is the most mature of all, but didn't have the same performance and was more of a pure graph database.
      • Tracy - to satisfy the need for a SPARQL endpoint, is Neo4J best?
        • Kevin - suggests that ArangoDB is not the way to go for SPARQL requirements.
      • Kevin's use cases for ArangoDB are based on performance and the ability to quickly meet users' requirements.
      • Larry - how could this be used in combination with LexEVS and other tools?
        • Kevin - the use of multiple stores/services is becoming more common to accomplish specific tasks.
    • NG (Kim and Jason) have been working on a SPARQL endpoint for LexEVS
      • Doesn't have to go through the database layer, so it is faster.
      • Kim demoed some working code as part of the browser.
      • Trees and hierarchy are faster.
      • Continuing to review and understand how SPARQL can apply to EVS tools.
    • Larry - how difficult will it be to deploy a triple store and graph DB in the NCI environment?
      • Sara - if it is part of build and deploy (aside from security concerns), then the tools support team can use it (for example Struts, Spring, etc.).
      • This impacts the DBAs more than systems.  It depends if the project teams need DBA support.
      • CBIIT managed hosting (supported by infrastructure teams) is currently how EVS is supported.
    • Scott indicated graph queries in LexEVS could be supported by triple stores. However, metadata cannot be supported in a triple store. ArangoDB, for example, could provide metadata on the edges.
      • Design would need to be considered to provide a hybrid solution. The LexEVS API would need to be transparent to the users. The API would need to wrap content from the triple store and the LexEVS DB.
    • The Mayo and NCI teams need to clarify the strengths and weaknesses of each and determine how best to address them.
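
The benchmark queries discussed above (descendants, leaves, branch intersection, shortest path to root) can be illustrated with a small in-memory sketch. This is not the ArangoDB or LexEVS implementation and not the code Kevin demoed; the toy hierarchy, node names, and function names below are all hypothetical and only show what each query computes.

```python
from collections import deque

# Hypothetical toy hierarchy (NOT real SNOMED content): child -> parents.
PARENTS = {
    "root": [],
    "event": ["root"],
    "organism": ["root"],
    "accident": ["event"],
    "fall": ["accident"],
    "bacterium": ["organism"],
    "infection_event": ["event", "organism"],
}


def children_index(parents):
    """Invert the child -> parents map into a parent -> children map."""
    children = {node: [] for node in parents}
    for child, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(child)
    return children


def descendants(children, start):
    """All nodes below `start`, with no depth limit (the maxDepth -1 case)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def leaves(children, start):
    """Descendants of `start` that have no children of their own."""
    return {n for n in descendants(children, start) if not children[n]}


def shortest_path_to_root(parents, start, root="root"):
    """Breadth-first search upward; returns the vertex list or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == root:
            return path
        for parent in parents[path[-1]]:
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


CHILDREN = children_index(PARENTS)
event_branch = descendants(CHILDREN, "event")
organism_branch = descendants(CHILDREN, "organism")
# Sub-graph benchmark: size of each branch plus their intersection.
shared = event_branch & organism_branch
```

In a graph database the same questions would typically be expressed as traversal queries run by the server; the sketch only makes the definitions of each benchmark concrete.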

Discussion: Cloud Considerations and Discussion: Build and Deployment Process

  • Scott described considerations for cloud usage
    • Auto deploy
    • Auto Scale resources
    • Uptime
    • Sharable instances
  • Kevin noted that technology is starting to provide the ability to deliver what cloud promises. Cloud is more about changing your development lifecycle than hardware. Cloud is not server virtualization.
  • Kevin demoed the use of Docker with LexEVS
    • Sara - is it possible to take a Docker image and use it on different tiers by passing in a variable to let the application know what to configure at that tier?
      • Kevin - this is the idea - and those variables can be stored in version control.  This simplifies the process.
    • Micro-architectures and micro-services are what is important today. LexEVS fits this model well.
    • Attempting to document a LexEVS install is complex.  
    • Docker Example has:
      • LexEVS
      • LexEVS-cts2
      • LexEVS-remote
      • mysql
      • uriresolver
    • This Docker container configures a complete LexEVS environment.
  • Kevin described the use of a Nexus server. Similar to Maven artifacts, Docker images can be hosted on a private or public Nexus repository. Nexus has expanded to include Docker support - internal Docker repositories.
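
Sara's question about reusing one image across tiers can be sketched as follows. This is a minimal illustration, not the configuration used in Kevin's demo: the variable name `LEXEVS_TIER`, the tier names, and the settings values are all hypothetical. The same image reads one variable at startup and picks the matching settings, which could themselves be kept in version control.

```python
import os

# Hypothetical per-tier settings; names and values are illustrative only.
TIER_SETTINGS = {
    "dev": {"db_host": "mysql-dev", "log_level": "DEBUG"},
    "qa": {"db_host": "mysql-qa", "log_level": "INFO"},
    "prod": {"db_host": "mysql-prod", "log_level": "WARNING"},
}


def load_settings(environ=None):
    """Select settings for the tier named in LEXEVS_TIER (assumed name)."""
    environ = os.environ if environ is None else environ
    tier = environ.get("LEXEVS_TIER", "dev")  # default to dev when unset
    if tier not in TIER_SETTINGS:
        raise ValueError(f"unknown tier: {tier!r}")
    return {"tier": tier, **TIER_SETTINGS[tier]}


if __name__ == "__main__":
    print(load_settings())
```

With Docker, such a variable could be supplied when the container starts, for example through the `-e` option of `docker run`, so the image itself stays identical on every tier.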

 

...