NIH | National Cancer Institute | NCI Wiki  

The requirements for semantic infrastructure are defined as they relate to the architecture, use cases, and stakeholders. This section presents those functional requirements with tracing up to the use cases and down to the service capabilities specified later in this document. Note that this section is not an exhaustive list of requirements and is expected to evolve as additional requirements are analyzed and defined.

...

This section provides a description of the following requirement categories:

  • Artifact Management
  • Service Discovery and Governance
  • Forms Definition and Modeling
  • Decision Support and Reasoning
  • Conformance Testing
  • caGRID 2.0 Platform and Terminology Integration

The requirements listed above address one or more use cases in each domain. In addition to the domain-specific use cases, the requirements also address CBIIT internal development and architecture requirements. Specifically, CBIIT has standardized on Service-Oriented Architecture as the foundational principle for application architecture and interoperability. CBIIT has also adopted a formal approach, the Enterprise Conformance and Compliance Framework (ECCF), for defining service specifications. The specifications address both the requirements for supporting semantic interoperability and the need to publish formal specifications that can be adopted by external organizations and vendors.

The following sections provide a detailed description of the requirement categories. Where possible, the requirements are tied to specific use cases described in the previous section.

Artifact Management

Artifacts include models in different formats, both static and dynamic. Artifact management also includes the ability to manage content and clinical forms. A service specification is made up of service metadata, artifacts, and the metadata supporting these artifacts. Artifact management primarily deals with managing the artifact lifecycle and authoring artifact metadata.

...

Support for Governance. Artifact versioning, together with state representation, makes it possible to locate and act on the artifact elements that require governance. This aspect of artifact management provides a change history as well as links to external change control systems.
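The governance support described above can be illustrated with a minimal sketch. The lifecycle states, class names, and the `needing_governance` helper are hypothetical and are not taken from the ECCF registry; the sketch only shows how a version number, a state, and a change history with an optional link to an external change control system could fit together.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LifecycleState(Enum):
    # Hypothetical states; the source does not enumerate the actual ones.
    DRAFT = "draft"
    UNDER_REVIEW = "under_review"
    APPROVED = "approved"
    RETIRED = "retired"


@dataclass(frozen=True)
class ChangeRecord:
    version: int
    state: LifecycleState
    comment: str
    change_request: Optional[str] = None  # reference into an external change control system


@dataclass
class Artifact:
    name: str
    version: int = 1
    state: LifecycleState = LifecycleState.DRAFT
    history: List[ChangeRecord] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every artifact starts its change history with a creation entry.
        self.history.append(ChangeRecord(self.version, self.state, "created"))

    def revise(self, comment: str, change_request: Optional[str] = None) -> None:
        # A revision creates a new version that starts again in DRAFT.
        self.version += 1
        self.state = LifecycleState.DRAFT
        self.history.append(
            ChangeRecord(self.version, self.state, comment, change_request)
        )

    def transition(self, new_state: LifecycleState, comment: str) -> None:
        # A state change is recorded against the current version.
        self.state = new_state
        self.history.append(ChangeRecord(self.version, new_state, comment))


def needing_governance(artifacts: List[Artifact]) -> List[Artifact]:
    # Locate the artifacts that are currently waiting for a governance decision.
    return [a for a in artifacts if a.state is LifecycleState.UNDER_REVIEW]
```

In this sketch the change history is append-only, so the state of any earlier version can be reconstructed from the records.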

Types of Artifacts

Static Models

Static models include a variety of models with different representations. Static models include but are not limited to:

  • Syntactical and semantic models 
    • XML, OWL, RDF
  • Information Models
    • UML, HL7 MIF, 11179
  • Meta Models
    • HL7 RIM
    • BRIDG
    • LS-DAM
  • Transforms
    • OMG Ontology Definition Metamodel Transforms
  • Model Constraints
    • OCL, Schematron
  • Data Types
    • ISO 21090/HL7 R2
    • HL7 R1
    • Primitives

Behavioral Models

In the context of this paper, behavioral models capture the behavior of services. The behavior of a service provides an unambiguous definition of its constraints, capabilities, dependencies, and interactions. The metadata and grammar required to realize service behavior are called behavioral semantics. Behavioral semantics provide a mechanism for better service discovery and for enforcing constraints at design time and runtime.

...

  • HL7 SAIF behavioral model (which provides a formal model and grammar for service contracts)
  • Orchestrations and Workflows
  • Business Rules

Content

Content includes all unstructured text and other forms of content that make up a service specification. Examples include storyboards and scope. Content is an integral part of a service specification and is leveraged across the enterprise for documentation and communications. Content includes:

  • Service specification content, primarily unstructured text
  • Images and other representations of static content

Forms

Forms include CDISC Operational Data Model (ODM), HL7 Clinical Document Architecture (CDA) documents, and HL7 Version 3 RIM derived forms. This includes all aspects of the document, including the style, definitions, and semantics. CDISC and NCI CBIIT require a Distributed, Collaborative Form Template Development Environment and a Distributed Knowledge Repository to capture and manage their metadata.

  • Form Templates
  • Reusable Form Sections
  • Form Definitions

Specification Content

The National Cancer Institute has created many specification documents, which include extended datatype flavors for the ISO 21090 datatypes as well as the ECCF specifications for the behavioral framework, information framework, and governance framework. The specifications are an integral part of the semantic infrastructure, allowing the user to fully understand and appropriately apply the many artifacts stored in the ECCF registry.

Artifact Management Functions

Artifact lifecycle management and metadata requirements include the ability to:

...

Clinical Trials: Clinical trials use forms to capture clinical information, and the semantics captured by these forms are critical for interoperability and reporting. The semantic infrastructure must provide a mechanism to manage the lifecycle of these forms.

Service Discovery and Governance

Service discovery and governance allow service developers to specify rich metadata about services. This enables better discovery and governance of services. Service discovery and governance help to accomplish the following:

...

Enable Better Discovery: Complex search offers a natural and user-friendly way to find services by progressively refining search results using a variety of criteria including attributes, artifacts, classification, usage scenarios, and dependencies. This includes runtime contract discovery, a powerful query mechanism that allows either the service orchestrator or a program to find the services that best fit the requirements of a given process. This increases both runtime and design time flexibility by enabling selection of services based on computable metadata.
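The progressive refinement described above can be sketched as a chain of filters over service metadata. The record fields, predicate helpers, and registry entries below are hypothetical and serve only as illustration; they do not describe the actual registry schema or query interface.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class ServiceRecord:
    # Hypothetical subset of the computable metadata held for a service.
    name: str
    classification: str
    artifacts: FrozenSet[str] = frozenset()
    dependencies: FrozenSet[str] = frozenset()


Predicate = Callable[[ServiceRecord], bool]


def has_classification(value: str) -> Predicate:
    return lambda s: s.classification == value


def uses_artifact(name: str) -> Predicate:
    return lambda s: name in s.artifacts


def depends_on(name: str) -> Predicate:
    return lambda s: name in s.dependencies


def refine(services: Iterable[ServiceRecord], *predicates: Predicate) -> List[ServiceRecord]:
    # Apply each criterion in turn, narrowing the previous result set.
    matches = list(services)
    for predicate in predicates:
        matches = [s for s in matches if predicate(s)]
    return matches


# Illustrative registry content only.
REGISTRY = [
    ServiceRecord("ImageQuery", "imaging", frozenset({"ISO 21090"}), frozenset({"Terminology"})),
    ServiceRecord("SpecimenQuery", "biospecimen", frozenset({"ISO 21090"})),
    ServiceRecord("ImageAnnotate", "imaging", frozenset({"BRIDG"})),
]

# A user, or a program at runtime, narrows the candidates step by step.
imaging = refine(REGISTRY, has_classification("imaging"))
imaging_iso = refine(imaging, uses_artifact("ISO 21090"))
```

Because each criterion is an ordinary function over metadata, the same mechanism could serve both an interactive search screen and a program selecting a service at runtime.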

Service Discovery Functions

  • Identify service endpoint for analysis
  • Identify service directory endpoint for analysis
  • Extract service interface
  • Annotate service interface providing undiscovered features or behaviors
  • Manage lifecycle, governance and versioning of the service interfaces

...

caGRID 2.0 Platform: The caGRID 2.0 Platform provides a runtime registry for service discovery. This service registry relies on a small subset of information for discovery. The semantic infrastructure provides a mechanism to leverage rich service and artifact metadata to extend this capability.

Clinical Data Forms Definition and Modeling

Clinical Data Forms are the primary channel for capturing information in the healthcare and clinical domain. Forms also play a key role in information exchange and are critical to supporting interoperability in healthcare.

...

A document in this context is specifically a clinical document which represents information about a clinical activity. The document contains the specific information gained during that clinical activity and supports the broader definitions of a document. Documents can be transformed into human readable forms, and be transferred or transmitted for use across different systems.

Clinical Data Forms Functions

  • Define model objects for reuse
  • Define form
  • Bind value set to data element
  • Provide default form delivery
  • Provide form data transformation
  • Manage lifecycle, governance and versioning of forms and document schemas
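
Two of the functions listed above, defining a form and binding a value set to a data element, can be sketched as follows. All class names, the example value set, and the error messages are hypothetical; the sketch does not follow CDISC ODM or HL7 CDA structures.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ValueSet:
    name: str
    codes: FrozenSet[str]


@dataclass
class DataElement:
    name: str
    value_set: Optional[ValueSet] = None

    def bind(self, value_set: ValueSet) -> None:
        # Bind a value set so that only its codes are accepted.
        self.value_set = value_set

    def accepts(self, value: str) -> bool:
        # An unbound element accepts any value.
        return self.value_set is None or value in self.value_set.codes


@dataclass
class FormDefinition:
    name: str
    version: int
    elements: List[DataElement] = field(default_factory=list)

    def validate(self, submission: Dict[str, str]) -> List[str]:
        # Return one message per missing element or value outside its value set.
        errors = []
        for element in self.elements:
            if element.name not in submission:
                errors.append(f"{element.name}: missing")
            elif not element.accepts(submission[element.name]):
                errors.append(f"{element.name}: value not in bound value set")
        return errors


# Illustrative form with one bound and one unbound element.
grade = DataElement("tumor_grade")
grade.bind(ValueSet("Grade", frozenset({"G1", "G2", "G3", "G4"})))
form = FormDefinition("PathologyReport", 1, [grade, DataElement("comment")])
```

Keeping the value set as a separate object lets several data elements, and several forms, reuse the same binding.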

...

  • Electronic Health Records
  • ONC and other external EHR adopters
  • Clinical Trials

Decision Support and Reasoning

One of the primary reasons for having structured data is to provide the ability to automate decision support and reasoning across information models, data types, and the terminology associated with the attributes of each data type. For the ECCF registry to provide maximal value to end users, it is necessary to support common decision support functions across the enterprise and to extend that through services to the end users. In effect, the semantic infrastructure must provide the tools to support decision support solutions.

...

Integration with Service Registries. Since the artifact metadata provides definitions of data, the service registry provides the data access needed to process information. If a given artifact is a service, the decision support system determines the necessary definitions to integrate a service into decision support for the gathering of data.

Decision Support Functions

  • Query artifact metadata to locate useful artifacts for decision support.
  • Query service metadata to locate services matching artifacts and metadata definitions
  • Create a decision support definition
  • Create a decision support session
  • Provide scheduling and access information to choreographer
  • Select rules and rule system environment
  • Execute reasoning systems against gathered data, providing classification and additional data

...

  • Electronic Health Records
  • Clinical Trials

Conformance Testing

Service specifications developed by NCI and the community have to be testable to ensure that an implementation conforms to the specification. Conformance testing leverages the artifact and service registries along with predefined reasoning systems to validate that an implementation adequately addresses the requirements stated in the service specification. An example of a service requirement is the ability to specify a response time in the specification (design time) and validate that an implementation of the service meets this response time. Additional test points include, but are not limited to, binding to specific terminologies and domain models.
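
The response-time example can be sketched as a small conformance check: the requirement is stated at design time, and a test harness measures one invocation of the implementation against it. The class names, the `check_response_time` function, and the injectable clock are hypothetical; a real harness would likely sample many invocations rather than one.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ResponseTimeRequirement:
    # Stated in the service specification at design time.
    operation: str
    max_seconds: float


@dataclass(frozen=True)
class ConformanceResult:
    operation: str
    elapsed_seconds: float
    conformant: bool


def check_response_time(
    requirement: ResponseTimeRequirement,
    invoke: Callable[[], object],
    clock: Callable[[], float] = time.perf_counter,
) -> ConformanceResult:
    # Time a single invocation and compare it with the specified limit.
    start = clock()
    invoke()
    elapsed = clock() - start
    return ConformanceResult(
        requirement.operation, elapsed, elapsed <= requirement.max_seconds
    )
```

The clock is passed in as a parameter so that the check itself can be tested deterministically; in normal use the default `time.perf_counter` is sufficient.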

...

Analysis of accessibility and interoperability. Used to determine if a given service matches its proposed service specification. Also determines if an artifact or specification is complete as it relates to data binding and value set binding.

Conformance Testing Functions are as follows:

  • Analyze artifact for ECCF conformance and traceability
  • Produce non-conformance statement
  • Interact with governance systems 

...

Other National Initiatives: Other national organizations like NIST are adopting a similar approach to conformance testing.

caGRID 2.0 Platform and Terminology Integration

The Semantic Infrastructure has to support seamless integration with the caGRID 2.0 platform. The following are some high-level platform and terminology requirements that are either supported or addressed by the Semantic Infrastructure.

Service Generation

Service generation is the ability to generate services from user defined service metadata. The semantic infrastructure provides this metadata and the platform leverages this metadata for service generation. The constraints and policies specified in the semantic infrastructure are inherited by the platform and are enforced as runtime policies.

Additional platform specific and runtime information is provided by the developer at the time of service generation.

Service Discovery and Utilization

This group of requirements focuses on enabling developers of composite services and applications to discover, compose, and invoke services. This includes the discovery of published services based on service metadata and the generation of client APIs in multiple languages to provide cross-platform access to existing services.

...

Link to use case satisfied from caGRID 2.0 Roadmap: All of the data management and access services in the use case are utilized by application developers to build the user interfaces that the clinicians use during the course of patient care.

Service Orchestration and Choreography

Service orchestration and choreography allow both application developers and non-developers to discover service "building blocks" that can be composed dynamically to provide business capabilities. Special cases include the orchestration of multiple services for a distributed query or for a transactional workflow. Service orchestration and choreography will leverage static and behavioral semantics from the Semantic Infrastructure 2.0.

...

Link to use case satisfied from caGRID 2.0 Roadmap: Federated query over the TCGA data and other data sets is performed using a service orchestration.

Policy and Rules Management

Policy and Rules Management allows non-developer secondary users to create policies and rules and apply them to services. The scope of policies includes, but is not limited to, definition and configuration of business processing policy and related rules, compliance policies, quality of service policies, and security policies. Key functional requirements for managing policies include capabilities to author, store, approve, and validate policies, and to execute them at runtime.

...

Link to use case satisfied from caGRID 2.0 Roadmap: Each institution has different data sharing needs, access control needs, and business rules for processing that are defined and customized. For example, policy at the pathologist's institution may state that the patient is scheduled for a visit when the review is complete.

Event Processing and Notifications

Event Processing and Notifications enables monitoring of services in the ecosystem and provides for asynchronous updates by services, effectively allowing a loose coordination of services that both provide and respond to conditions (possibly defined in business rules).
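
The loose coordination described above follows a publish/subscribe pattern, sketched below. The `EventBus` class and the topic name are hypothetical, and delivery here is synchronous and in-process for brevity; an actual platform would presumably deliver notifications asynchronously through messaging infrastructure.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventBus:
    """Minimal in-process publish/subscribe hub (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A service registers interest in a condition without knowing who raises it.
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> int:
        # Notify every subscriber of the topic; return how many were notified.
        handlers = list(self._handlers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


# Illustrative use: a clinician inbox is notified when images are ready for review.
bus = EventBus()
inbox: List[Any] = []
bus.subscribe("images.ready", inbox.append)
notified = bus.publish("images.ready", {"patient": "P-001"})
```

Publishers and subscribers share only the topic name, which is what allows services to be added or removed without changing one another.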

...

Link to use case satisfied from caGRID 2.0 Roadmap: As patient care proceeds, the system notifies the designated clinicians that data (for example, images) are ready for review. Similarly, when notifications are received, event processing logic allows the appropriate parties to assign clinicians for care. In order to facilitate better treatment (a learning healthcare system), as new de-identified glioblastoma data is made available, notifications are sent that could indicate a recommended change in the treatment plan.

Data Representation and Information Models

This set of requirements includes providing an application developer with the ability to define application-specific attributes (for example, defined using ISO 21090 healthcare datatypes) and an information model that defines the relationships between these attributes and other attributes in the broader ecosystem. In particular, the last requirement suggests linked datasets, where application developers can connect data in disparate repositories as if the repositories are part of a larger federated data ecosystem. Additional requirements include the ability to publish and discover information models. Support is needed for forms data and common clinical document standards, such as HL7 CDA. To support the use of binary data throughout the system, the binary data must be typed and semantically annotated.

...

Link to use case satisfied from caGRID 2.0 Roadmap: The pathology, radiology and other data have various data formats which must be described, and the information model for the patient record must link between these various datatypes. The complete information model includes semantic links between datasets to build a comprehensive electronic medical record. Annotations on data are defined and included in the information model.

Data Management

Data management includes linking of disparate data sets and updates of data across the ecosystem. Data updates may include updates to multiple data sources, necessitating transactions.

...

Link to use case satisfied from caGRID 2.0 Roadmap: The patient has an electronic medical record that spans multiple institutions. The clinical workup data (for example, genomics and proteomics data) is linked to the clinical care record; similarly, pathology and radiology findings must be attached to the patient's electronic medical record.

Data Exploration and Query

The wealth of data must be accessible, resulting in the need for exploration of available datasets. This includes the ability to view seamlessly across independent data sets, allowing a secondary user to integrate data from multiple sources. In addition, the query capability must support sophisticated queries such as temporal queries and spatial queries.

...

Link to use case satisfied from caGRID 2.0 Roadmap: The oncologist must be able to quickly find glioblastoma data sets, indicating the fields that he is interested in comparing from his clinical data in order to find similar disease conditions and associated treatment plans. Temporal queries allow clinicians to identify changes in patient condition and treatment over time.

Provenance

Provenance encompasses the origin and traceability of data throughout an ecosystem. This is a clear requirement directly from the use case in order to ensure that all steps of patient care and research are clearly linked via the patient record.

...

Link to use case satisfied from caGRID 2.0 Roadmap: The origin of data is tied to the data creator, allowing the oncologist performing the match against TCGA data and other datasets to include and exclude data sets based on their origin.
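
Including and excluding data sets by origin can be sketched as a filter over provenance metadata. The `DatasetRecord` fields, the `select_by_origin` helper, and the origin labels are hypothetical; real provenance records would carry far more than a single origin string.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    origin: str  # the data creator recorded as provenance


def select_by_origin(
    datasets: Iterable[DatasetRecord],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[DatasetRecord]:
    # Keep data sets whose origin is included (if an include list is given)
    # and is not explicitly excluded.
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude or ())
    return [
        d
        for d in datasets
        if (include_set is None or d.origin in include_set)
        and d.origin not in exclude_set
    ]


# Illustrative catalog only.
CATALOG = [
    DatasetRecord("gbm-expression", "TCGA"),
    DatasetRecord("gbm-imaging", "Hospital A"),
    DatasetRecord("gbm-survey", "Unknown"),
]
```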

Data Semantics

In a diverse information environment, semantics must be used to clearly indicate the meaning of data. This requirement is expected to be addressed by the Semantic Infrastructure, although there will be a touchpoint between caGrid 2.0 and the Semantic Infrastructure to annotate data with semantics. Integration with the Semantic Infrastructure will enable reasoning, semantic query, data mediation (for example, ad hoc data transformation), and other powerful capabilities.

...

Link to use case satisfied from caGRID 2.0 Roadmap: The oncologist accesses the TCGA database to search for de-identified glioblastoma tumor data that is similar to the patient data exported from the hospital medical record. During this search, the semantics of the data fields are leveraged to indicate matches between TCGA data fields and the hospital medical record data fields.

External Data Repositories

There are numerous data repositories on the web today. These data repositories contain essential information that must be accessible to services in the ecosystem. As a result, caGrid 2.0 must provide capabilities to integrate these external repositories into the grid with the assumption that the remote service cannot be changed.

...