NIH | National Cancer Institute | NCI Wiki  

Use Case Templated Example


Project Decision Log


# | Question/Description | Decision | Entry Date | Decision Date
1 | What does the prototype look like and what is its schedule? | The prototype will be developed using basic Gen3 technology (including the existing database stack and the Windmill portal) by January 31, 2019. The system will incorporate data from a cohort of 3 canines as provided by COP. This will allow us to test and understand the basic functionality of Gen3. | 10/19/18 | 10/22/18
2 | What type of underlying data model should we use: Graph, RDBMS, or the Postgres-Graph model used by UChicago? | Will explore using an experimental model, testing for speed and ease of use, after deployment of the prototype. Decision is still in flux. In discussions on 12/13/18, it was decided to narrow the options to Graph and Postgres-Graph because the RDBMS model is known to be too inflexible and slow. A new decision will be entered in the log to decide between the two. | 10/18/18 | 12/13/18
3 | What should we use to Extract, Transform and Load the data? | Several choices have been suggested, including KNIME and Pentaho. Creating our own system has not been excluded. Will explore this after deployment of the prototype. Initial decision is to use Pentaho and link it to Neo4j using a plug-in. Phil has become familiar with the mechanism for transforming the data in Pentaho and has taught this to Kevin. We continue to examine NiFi to see if it is relevant on a higher level and have contracted with Asymmetrik to demonstrate how to use it. | 10/12/18 | 6/1/19
4 | For the underlying database, should we use a Graph database (like Neo4j or Neptune) or stick with the Postgres-Graph database that came with the system? | We decided to move to a graph database and to make the consequent changes to the software as required. | 12/13/18 | 3/26/19
5 | (Predicated on Decision 4) If we decide to go with a Graph database, should we use Neo4j or Neptune? TigerGraph is excluded from this decision based on its initial cost ($40K for a 1-year license when only a trial version was desired). | Neo4j was decided upon. We will begin with the community edition and upgrade to the enterprise edition. This was decided in conjunction with considerations for the Clinical Trials Data Commons, which is being co-developed with ICDC. | 12/13/18 | 5/15/19
6 | Should we use the full-fledged version of Gen3 as provided by UChicago, or a dockerized version that is simpler in nature and may be easier to deploy? | We will go with the dockerized version at this time. See the attached document: EKS or ECS Decision for ICDC.docx | 12/13/18 | 12/28/18
7 | Cypher vs. Gremlin: which language should communicate with the underlying database? Gremlin is more universal; Cypher is much faster. | After looking into this further, we found that Gremlin support for Neo4j was last updated in 2017 and does not appear to be supported any longer, including in the current version of Neo4j. Cypher is to be supported more universally in the next version of Apache, so the decision was made to go with Cypher. | 5/15/19 | 5/30/19