Assessment Report
This section will provide the Assessment Report generated during the prototyping work of the inception phase.
Tools Recommended for Semantic Infrastructure Work
The tools and libraries listed here have been identified as potentially helpful for certain aspects of the Semantic Infrastructure 2.0 Roadmap project and the development of its architecture. The supported CBIIT application structures, such as the Java platform, Tomcat, JBoss, Ant, and Maven, are identified at <<Insert Reference to CBIIT supported standards here >>. The tools listed below have not been formally described as supported by CBIIT at this time. However, they suggest the type of components and architecture expected to best satisfy project requirements.
Triple Store Access
Access tools provide methods for interacting with metadata represented in the Resource Description Framework (RDF) and the Web Ontology Language (OWL). These tools may also provide support for inference engines.
Jena (sourceforge.net) - The Jena Semantic Web Framework is a tool for building semantic web applications. It provides a programmatic environment for RDF, RDF Schema (RDFS) and OWL as well as a SPARQL engine and rule-based inference. This is a general purpose tool that supports combinations of models in memory as well as transactional persistence of triples. Jena works well with ontologies with large numbers of classes and individuals which require read and write functionality.
OwlAPI (sourceforge.net) - The OWL API is a Java API and reference implementation for creating, manipulating and serializing OWL ontologies. This interface provides a fast, in-memory representation for manipulating OWL ontologies and provides persistence to OWL in XML format.
Sesame (openrdf.org) - Sesame is an open source RDF framework with support for RDF Schema inferencing and querying.
ARQ (openjena.org) - ARQ supports SPARQL, the W3C standard for RDF queries. It is packaged with Jena and can be used against Jena models.
Note
These systems can be used in combinations. For example, OWL files created with OwlAPI can be read by Jena. However, Jena TDB is specific to Jena, and you must use Jena to access it. See below for persistence methods.
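To make the phrase "RDF Schema inferencing" more concrete, the following is a minimal sketch in plain Java of one RDFS entailment: computing the transitive closure of rdfs:subClassOf statements. It deliberately does not use the Jena, Sesame, or OWL API interfaces; the class name, method names, and the ex: identifiers are all hypothetical and chosen only for illustration. The frameworks listed above implement this, and much more, internally.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of Jena, Sesame, or the OWL API.
class SubClassInferenceSketch {
    // Asserted rdfs:subClassOf statements: subclass -> direct superclasses.
    private final Map<String, Set<String>> directSuperClasses = new HashMap<>();

    public void addSubClassOf(String subClass, String superClass) {
        directSuperClasses.computeIfAbsent(subClass, k -> new HashSet<>()).add(superClass);
    }

    // Returns every superclass reachable from cls through asserted statements.
    public Set<String> inferredSuperClasses(String cls) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(
                directSuperClasses.getOrDefault(cls, Collections.<String>emptySet()));
        while (!pending.isEmpty()) {
            String next = pending.pop();
            // seen.add returns false for already visited classes, so cycles terminate.
            if (seen.add(next)) {
                pending.addAll(
                        directSuperClasses.getOrDefault(next, Collections.<String>emptySet()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        SubClassInferenceSketch sketch = new SubClassInferenceSketch();
        sketch.addSubClassOf("ex:Gene", "ex:BiologicalEntity");
        sketch.addSubClassOf("ex:BiologicalEntity", "ex:Thing");
        // Only the first statement mentions ex:Gene, yet ex:Thing is also inferred.
        System.out.println(sketch.inferredSuperClasses("ex:Gene"));
    }
}
```

Only two statements are asserted here, but a query for the superclasses of ex:Gene also yields ex:Thing. An inferencing store answers queries over such derived statements as well as the asserted ones.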
Semantic Knowledge Store
Stores provide the ability to persist RDF and OWL representations. These tools provide support for larger datasets, and for datasets that can be used, to some extent, within service environments. Although the tools listed below are open source, there are also non-open source tools which provide similar functionality, including AllegroGraph and Oracle 11g. These systems may also use Jena or Sesame for access.
Jena TDB (openjena.org) - TDB integrates with the Jena model factory to provide high-speed persistence of RDF triples. Writes to this system are not fully transactional, so care must be taken to manage transactions externally where required.
Jena SDB : http://openjena.org/SDB/ - Provides a persistence model using a variety of RDBMS back ends. Using JDBC connections, SDB persists the RDF triple information into traditional tables. This solution is fully transactional, but its insert and query performance is limited.
Sesame SAILs : http://www.openrdf.org/ - Sesame provides a way to integrate various storage representations for access through a Storage And Inference Layer (SAIL). Sesame SAILs have been created for in-memory stores, relational databases, and a variety of other formats.
OWLIM : http://www.ontotext.com/owlim/ - OWLIM is implemented as a Sesame SAIL and comes in both an open source and a commercial implementation. The commercial implementation is described by its vendor as the most scalable semantic repository in the world. OWLIM also offers a built-in, high-speed reasoning system.
D2RQ : http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/ - Not technically a general purpose store, D2RQ provides a semantic reference layer over an existing RDBMS environment, allowing for SPARQL and other interactions within that environment.
Note : These systems can also interact. Specifically, Jena can access Sesame through a model factory component, and Sesame can access Jena through a SAIL. This indicates an abstraction rather than compatibility: the database tables created by Jena SDB do not match the RDBMS tables generated by Sesame. However, code written against Jena as an interface can, if written correctly, be independent of the persistence method.
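The point about persistence independence can be sketched in plain Java. The interface and class names below (TripleStore, InMemoryTripleStore, LabelService) are hypothetical and are not Jena or Sesame types; Jena and Sesame each define their own, much richer, interfaces. The sketch only illustrates the pattern: application code depends on an interface, so the back end behind it can change without the application code changing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction for illustration; not a Jena or Sesame interface.
interface TripleStore {
    void add(String subject, String predicate, String object);

    List<String> objectsOf(String subject, String predicate);
}

// One possible back end: triples held in memory.
class InMemoryTripleStore implements TripleStore {
    private final List<String[]> triples = new ArrayList<>();

    @Override
    public void add(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
    }

    @Override
    public List<String> objectsOf(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (String[] triple : triples) {
            if (triple[0].equals(subject) && triple[1].equals(predicate)) {
                result.add(triple[2]);
            }
        }
        return result;
    }
}

// Application code sees only the interface, never the concrete store.
class LabelService {
    private final TripleStore store;

    LabelService(TripleStore store) {
        this.store = store;
    }

    // Returns the first rdfs:label of the resource, or null if it has none.
    public String firstLabel(String resource) {
        List<String> labels = store.objectsOf(resource, "rdfs:label");
        return labels.isEmpty() ? null : labels.get(0);
    }

    public static void main(String[] args) {
        TripleStore store = new InMemoryTripleStore();
        store.add("ex:Gene", "rdfs:label", "Gene");
        System.out.println(new LabelService(store).firstLabel("ex:Gene"));
    }
}
```

A database-backed implementation of the same interface could replace InMemoryTripleStore without any change to LabelService. This is the sense in which correctly written client code is independent of the persistence method, even though the underlying tables of two stores are not interchangeable.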
Data Conversion and Artifact Access
These tools provide programmatic access to artifact sources to aid in transformations and processing.
poi - http://poi.apache.org/ - Apache POI is a general purpose tool for accessing Microsoft documents in DOC, XLS, and other traditional Microsoft proprietary formats. POI also supports the OpenXML standard, including Microsoft DOCX and related formats, through the openxml4j API. Information on this standard can be found at http://www.openxml.biz/
OBO-Edit : http://oboedit.org/?page=index - The OBO-Edit editor provides an API for accessing OBO-based ontologies.
Eclipse EMF : http://www.eclipse.org/modeling/emf/?project=emf - Used for accessing models in XMI for conversion to other representations. The Ecore component is the most relevant, but other aspects of Eclipse EMF may also be useful.
In addition, the standard tools for accessing RDF and OWL representations are listed above.
Integration Support
Spring : http://www.springsource.org/ - Spring provides a number of components designed either to ease the adoption of new technologies or to provide greater control over certain integrations. The number of Spring components is large; significant ones include Spring Framework, Spring Web Flow, Spring Web Services, and Spring Security. These components are all based on certain core patterns that make components more flexible.
Inference, Rules and Expert Systems
These tools provide ways of representing and executing decision support, orchestration, analysis and many other aspects of application functionality. What they share is a way to represent certain behaviors in a language more concise than traditional programming languages. Some of them support standards such as OWL DL or RuleML, and may also support extensions or additional functionality that make them suitable for this work.
RDF and OWL Tools
Pellet : http://clarkparsia.com/pellet - Pellet is an OWL 2 (partial) reasoner providing the core classification functionality. Pellet is broadly used and is integrated into various platforms, including Protege 4 and TopBraid Composer. It is written in Java and so can be integrated directly into other Java applications without external configuration or implementations.
FaCT++ : http://owl.man.ac.uk/factplusplus/ - FaCT++ is an implementation of an OWL 2 (partial) reasoner written in C++. FaCT++ requires the implementation of a component through which applications access it. Because it is written in C++, it has the potential to be faster than Java implementations.
HermiT : http://hermit-reasoner.com/ - HermiT is an OWL 2 reasoner implementing a high performance algorithm in Java. It depends on the OwlAPI.
TopBraid SPIN : http://www.topquadrant.com/topbraid/spin/api/ - TopBraid SPIN is an implementation of the SPARQL Inferencing Notation. It is an open source implementation and can be integrated in a number of ways. It has many uses, including as an RDF constraint language, a rules language, a SPARQL function language (used as a way to extend SPARQL), and a method of storing reusable queries. Envisioned and implemented by TopQuadrant, it expresses functional behaviors that are impossible, or would be inappropriate, to declare in DL. Use of TopBraid SPIN requires the use of the Jena API.
General Purpose
Jess : http://www.jessrules.com/ - Jess stands for Java Expert System Shell. Jess is an implementation of the RETE algorithm and supports a number of rule definition languages. It is the reference implementation of the JSR 94 standard for Java rule engines. It supports the CLIPS and RuleML languages, as well as its own XML representation of CLIPS. It provides many ways to integrate with Java applications in both directions (calling Java functions from the rules, as well as calling rule functionality from Java). It is available without cost for academic use, and under various commercial licenses. Development does not incur a cost, as there is a trial download that times out after a number of days and can be re-downloaded.
Drools Expert : http://jboss.org/drools/ - Drools is a component of the JBoss community. It is described as a business logic integration platform. It has a number of components which may be optionally integrated to provide different kinds of support, including a managed rule repository. It is an implementation of the RETE algorithm. Drools supports a proprietary language as well as an XML representation of its own language. Transformations of RuleML to Drools may be available.
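As a rough illustration of what a rule engine does, the sketch below implements naive forward chaining in plain Java: rules fire when all of their premises are present in working memory, and evaluation repeats until no new fact is derived. This is not the RETE algorithm used by Jess and Drools, which avoids re-evaluating every rule on every pass, and it does not use either product's API or rule language. The class name, method names, and fact names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; naive forward chaining, not RETE, and not the Jess or Drools API.
class ForwardChainingSketch {
    // A rule concludes one fact when all of its premises are in working memory.
    static final class Rule {
        final Set<String> premises;
        final String conclusion;

        Rule(Set<String> premises, String conclusion) {
            this.premises = premises;
            this.conclusion = conclusion;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    public void addRule(String conclusion, String... premises) {
        rules.add(new Rule(new LinkedHashSet<>(Arrays.asList(premises)), conclusion));
    }

    // Re-evaluates every rule until a pass derives no new fact (a fixpoint).
    public Set<String> run(Set<String> initialFacts) {
        Set<String> facts = new LinkedHashSet<>(initialFacts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule rule : rules) {
                // facts.add returns true only when the conclusion is new.
                if (facts.containsAll(rule.premises) && facts.add(rule.conclusion)) {
                    changed = true;
                }
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        ForwardChainingSketch engine = new ForwardChainingSketch();
        engine.addRule("block-release", "needs-review");
        engine.addRule("needs-review", "is-public", "unreviewed");
        Set<String> start = new LinkedHashSet<>(Arrays.asList("is-public", "unreviewed"));
        System.out.println(engine.run(start));
    }
}
```

The rules are declared independently of the order in which they fire: the second rule derives needs-review, which in turn lets the first rule derive block-release on the next pass. Keeping the rules as data, apart from the code that evaluates them, is what allows products such as those above to manage rules in a repository and to express them in a dedicated rule language.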
Flow Management for services, processes and web applications
Apache ODE : http://ode.apache.org/ - Apache ODE is an Apache project which uses the WS-BPEL standard for the organization of workflow. It is supported by Apache ServiceMix and can be used to manage web service choreography where that is appropriate.
Drools Flow : http://www.jboss.org/drools/drools-flow.html - Drools Flow is an integration of the Drools rule engine designed to manage business or process flow. Drools Flow definitions can be rendered in BPMN notation, and an Eclipse plugin is also provided for visual design of workflows. Drools Flow works with Drools Guvnor to provide a repository of workflows, and provides audit and control over workflow processes. It has built-in support for monitoring flow activities. Drools Flow is an implementation of BPM.
Bonita Open Solution : http://www.bonitasoft.com/ - The Bonita Open Solution is an implementation of BPM that provides an environment for designing, managing and executing flow control. Flows in Bonita are designed graphically and can be executed directly through deployment as applications, or uploaded to a functional engine for execution. Bonita provides both an API for integration and a web-based tool that provides execution points for state transitions and management.
Spring Web Flow : http://www.springsource.org/go-webflow2 - Integrating with Spring MVC, Spring Web Flow allows for the definition of flows which control activity within a given session. This tool separates the page flow from the business logic, allowing for many alternative flows using the same pages. This simplifies applications which perform activities in different modes (create vs. edit) or through different means (creation "wizards").
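The separation of page flow from business logic can be sketched in plain Java as a table of transitions that is kept apart from the pages themselves. This is only an illustration of the idea: the Spring tool has its own flow definition format and API, which are not shown here, and the class name, page names, and event names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a page flow held as data; not the Spring API.
class PageFlowSketch {
    // Key is "page|event", value is the next page.
    private final Map<String, String> transitions = new HashMap<>();

    // Registers a transition and returns this flow so calls can be chained.
    public PageFlowSketch on(String page, String event, String nextPage) {
        transitions.put(page + "|" + event, nextPage);
        return this;
    }

    // Returns the next page, or the current page when the event is not defined.
    public String next(String page, String event) {
        return transitions.getOrDefault(page + "|" + event, page);
    }

    public static void main(String[] args) {
        // Two flows reuse the same detailsForm page but route it differently.
        PageFlowSketch createFlow = new PageFlowSketch()
                .on("detailsForm", "submit", "confirmCreate");
        PageFlowSketch editFlow = new PageFlowSketch()
                .on("search", "select", "detailsForm")
                .on("detailsForm", "submit", "saved");
        System.out.println(createFlow.next("detailsForm", "submit"));
        System.out.println(editFlow.next("detailsForm", "submit"));
    }
}
```

Because the routing lives in the flow definition rather than in the page, the same detailsForm page serves both the create and the edit mode without containing any mode-specific navigation logic.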
Design
OWL and RDF provide the ability to represent information both as metadata and as functional components of a system. As a result, individuals may produce RDF or OWL ontologies which will be integrated into the fabric of the system. In addition to the standard design tools supported by CBIIT, the use of good ontology editors will help promote consistency of representation and functionality. In some sense these editors are IDEs, in that development occurs in them; however, they can also be considered PSM design tools, because the output becomes a documentable model representation.
Protege 4 : http://protege.stanford.edu/ - Developed and supported at Stanford University, Protege has been used for ontology development for many years. Protege 4 is an attempt to reach beyond the frame-based roots of Protege and provide a newly envisioned representation of OWL ontologies. Protege 4 uses the OwlAPI for accessing OWL ontologies and so shares its limitations. Specifically, it does not support persistence to general purpose triple stores, and it must be able to load the ontology entirely in memory. However, the use of the OwlAPI in design and editing, where it is appropriate, gives Protege 4 a performance advantage in loading ontologies and provides unique functionality for ontology integration. Protege 4 is an Eclipse RCP Application.
TopBraid Composer : http://www.topquadrant.com/products/TB_Composer.html - Available in both a community edition and a paid license version. TopBraid Composer Community Edition provides an Eclipse plugin based approach to ontology editing. This allows for the integration of other tools via the OSGi standard, and provides a basis for using other Eclipse-based tools. TopBraid Composer uses Jena to access ontologies and shares its limitations. In addition, the community edition is limited to the editing of OWL or RDF files and does not support access to database stores. It does support SPARQL as part of its functionality, and uses the Eclipse approach of projects.
Note : Since there is no open source or community tool for accessing stores other than text files, many developers use Protege 4 or TopBraid Composer, and then create scripted or programmed solutions to upload models into those stores. However, there are other significant additions to functionality in the paid versions of TopBraid Composer that are not addressed here.
Component Repository
There are some current repository systems that may help in the management of elements such as rules and flow controls.
Drools Guvnor : http://jboss.org/drools/drools-guvnor.html - Drools Guvnor is a tool which provides access to a common rule repository, flow repository and other aspects of the Drools system, with browsing and access control. In addition, it integrates with graphical editors for rules and flows.