Author: Bauer, Scott
Email: bauer.scott@mayo.edu
Team: LexEVS
Contract: ST12-1106
Client: NCI CBIIT
National Institutes of Health
US Department of Health and Human Services
Revision History
Version | Date | Description of Changes | Author |
---|---|---|---|
1.0 | 2013/03/05 | Initial Version | Bauer, Scott |
Overview
LexEVS has long relied on a relational database as the data store for semantic assertions made about the entity-level constructs in terminologies and ontologies. Graph database technology has now matured enough to allow the relationships between entities defined by these assertions to be stored in a way that better reflects their nodes and edges. Benchmarking tests and practicality reviews have led the LexEVS team to conclude that a graph database back end for LexEVS associations will substantially improve traversal performance and may simplify the implementation of the association API.
Database Hierarchy Performance Evaluation
New technologies such as the MVRB-tree algorithm implemented in the OrientDB graph database have proved far more efficient and scalable than a traditional relational database management system.
LexEVS Association Logical Model
The LexGrid Model defines a relationship as a source node and a target node, with the edge defined separately in the AssociationPredicate model element. These are the building blocks for larger coded node graphs, which are currently represented in a relational schema. The performance limitations of the relational schema are described above. The source and target structure of LexGrid will be mapped to the structure of the higher-performing graph database, OrientDB.
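The source, target, and separately defined predicate described above can be pictured with a small in-memory sketch. This is only an illustration of the node-and-edge shape, not LexEVS or OrientDB code: the class AssociationGraphSketch, its method names, and the entity codes used with it are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for a graph store; not LexEVS or OrientDB code. */
class AssociationGraphSketch {

    /** An edge keeps the predicate name separate from the nodes it connects. */
    private static final class Edge {
        final String predicate;
        final String targetCode;

        Edge(String predicate, String targetCode) {
            this.predicate = predicate;
            this.targetCode = targetCode;
        }
    }

    // Outgoing edges indexed by source entity code.
    private final Map<String, List<Edge>> outgoing = new HashMap<>();

    void addAssociation(String sourceCode, String predicate, String targetCode) {
        outgoing.computeIfAbsent(sourceCode, k -> new ArrayList<>())
                .add(new Edge(predicate, targetCode));
    }

    /** Direct targets of one source for one predicate, in insertion order. */
    List<String> targetsOf(String sourceCode, String predicate) {
        List<String> targets = new ArrayList<>();
        for (Edge edge : outgoing.getOrDefault(sourceCode, new ArrayList<>())) {
            if (edge.predicate.equals(predicate)) {
                targets.add(edge.targetCode);
            }
        }
        return targets;
    }

    /** Every node reachable by following the predicate; the visited set guards against cycles. */
    Set<String> reachableFrom(String sourceCode, String predicate) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(sourceCode);
        while (!pending.isEmpty()) {
            for (String target : targetsOf(pending.pop(), predicate)) {
                if (visited.add(target)) {
                    pending.push(target);
                }
            }
        }
        return visited;
    }
}
```

In this sketch a traversal follows edges directly from node to node, which is the access pattern a graph store is designed for; a relational schema would typically need repeated joins or queries to do the same walk.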
While the graph-based database appears capable of handling the functions shown in the diagram above, some calls to LexEVS will continue to access the model elements that define metadata about the association.
LexGrid in the LexEVS schema (from MySQL Workbench)
Mapping LexGrid data model elements to OrientDB
LexEVS Hierarchy Performance Architecture
While the new implementation of the node graph will largely run against the OrientDB service, some portions of the legacy LexEVS API will be needed to access various metadata and property elements.
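One way to picture this split is a facade that sends traversal to the graph store and metadata lookups to the legacy store. The following is a minimal sketch under that assumption; TraversalStore, MetadataStore, and HybridHierarchyServiceSketch are hypothetical names and do not appear in LexEVS or OrientDB.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical view of the graph store: hierarchy traversal only. */
interface TraversalStore {
    List<String> childrenOf(String entityCode);
}

/** Hypothetical view of the legacy relational store: metadata and properties. */
interface MetadataStore {
    String descriptionOf(String entityCode);
}

/** Hypothetical facade that combines the two stores behind a single call. */
class HybridHierarchyServiceSketch {
    private final TraversalStore graph;
    private final MetadataStore relational;

    HybridHierarchyServiceSketch(TraversalStore graph, MetadataStore relational) {
        this.graph = graph;
        this.relational = relational;
    }

    /** Resolves children in the graph store, then looks up each description in the legacy store. */
    Map<String, String> describedChildren(String entityCode) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String child : graph.childrenOf(entityCode)) {
            result.put(child, relational.descriptionOf(child));
        }
        return result;
    }
}
```

Keeping the two stores behind separate interfaces means callers do not need to know which back end answers which part of a request.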
Code Considerations
A CodedNodeFactory will determine whether an implementation uses the graph database in conjunction with the relational database or the relational database alone. A newly implemented DAO and OrientDBCodedNodeGraph provide the underpinnings of what will be a higher-performance version of LexEVS's traversal of relationship hierarchies in stored terminologies.
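The factory decision described above might look roughly like the sketch below. The actual signatures of CodedNodeFactory and OrientDBCodedNodeGraph are not given in this document, so every type here carries a Sketch suffix and is hypothetical, as is the property key lexevs.graph.enabled.

```java
import java.util.Properties;

/** Hypothetical common type for both node graph implementations. */
interface NodeGraphSketch {
    String backend();
}

/** Stand-in for an implementation that uses the relational database alone. */
class RelationalNodeGraphSketch implements NodeGraphSketch {
    @Override
    public String backend() {
        return "relational";
    }
}

/** Stand-in for an implementation that pairs the graph database with the relational one. */
class GraphBackedNodeGraphSketch implements NodeGraphSketch {
    @Override
    public String backend() {
        return "graph+relational";
    }
}

/** Hypothetical factory that chooses an implementation from configuration. */
class CodedNodeFactorySketch {
    // Assumed property name, for illustration only.
    static final String GRAPH_ENABLED_KEY = "lexevs.graph.enabled";

    static NodeGraphSketch create(Properties config) {
        // Falls back to the relational implementation when the key is absent.
        boolean graphEnabled =
                Boolean.parseBoolean(config.getProperty(GRAPH_ENABLED_KEY, "false"));
        return graphEnabled ? new GraphBackedNodeGraphSketch() : new RelationalNodeGraphSketch();
    }
}
```

Defaulting to the relational implementation keeps an installation without a graph store working unchanged, which is one plausible reading of the purely relational option mentioned above.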