NIH | National Cancer Institute | NCI Wiki  


This section includes the high-level semantic requirements derived from the use cases. The semantic requirements provide a framework for a detailed description of services in the architecture section.


  • Semantic Infrastructure Users and Roles
  • Functional Requirements
    • Artifact Management
      • Static Models
      • Behavioral Models
      • Forms
      • Specification Content
    • Service Discovery & Governance
      • Discovery
      • Lifecycle Management
      • Governance
    • Case Report Form Modeling
      • Form template authoring
    • Conformance Testing
    • caGRID 2.0 Platform & Terminology Integration

Requirements Analysis

This section presents the derived requirements that result from the requirements analysis of the use cases presented in the previous section. The analysis includes tracing of requirements up to the use cases and stakeholders and down to the service capabilities specified later in this document. Note that the requirements section is not complete and is expected to evolve as additional requirements are added.

Semantic Infrastructure Consumers and Roles

The semantic infrastructure is expected to address the needs of a broad group of stakeholders. The semantic infrastructure as defined in this section provides foundational specifications and capabilities that address the requirements of the following key users:

  • Clinicians
  • Model Developers
  • Service Developers
  • Service Architects
  • Service Analysts
  • CBIIT Enterprise Architecture Governance
  • Vendors
  • Platforms, including caGrid 2.0
  • BioInformatics Specialists

Functional Requirements

This section provides a description of the following requirement categories:

  • Artifact Management
  • Service Discovery and Governance
  • Forms Definition & Modeling
  • Conformance Testing
  • caGRID 2.0 Platform & Terminology Integration

The requirements listed above address one or more use cases in each domain. In addition to the domain-specific use cases, the requirements also address CBIIT's internal development and architecture requirements. Specifically, CBIIT has standardized on Services Oriented Architecture as the foundational principle for applications architecture and interoperability. CBIIT has also adopted a formal approach (the Enterprise Conformance and Compliance Framework, ECCF) for defining service specifications. The specifications address both the requirements for supporting semantic interoperability and the need to publish formal specifications that can be adopted by external organizations and vendors.

The following sections provide a detailed description of the requirements categories. Where possible, the requirements are tied to specific use cases described in the previous section.

Artifact Management

Artifacts include models in different formats, both static and dynamic. Artifact management also includes the ability to manage content and forms. A service specification is made up of service metadata, artifacts, and the metadata supporting these artifacts. Artifact management primarily deals with managing the artifact lifecycle and authoring artifact metadata.

Static Models: 

Static models include a variety of models with different representations. Static models include (but are not limited to):

  • XML Schemas
  • UML/HL7 Models, including domain models like BRIDG and LS-DAM.
  • OWL
  • Meta Models
  • Transforms
  • Model Constraints, like OCL
  • Data Types

Behavioral Models: 

In the context of this paper, behavioral (dynamic) models capture the behavior of services. The behavior of a service provides an unambiguous definition of the service constraints, capabilities, dependencies and interactions. The metadata and grammar required to realize service behavior are called behavioral semantics. Behavioral semantics provide a mechanism for better service discovery and for enforcing the constraints at design time and runtime.

Dynamic models include (but are not limited to):

  • HL7 SAIF behavioral model (which provides a formal model and grammar for service contracts)
  • Orchestrations & Workflows
  • Business Rules.


Content includes all unstructured text and other forms of content that make up a service specification; examples include storyboards and scope. Content is an integral part of a service specification and is leveraged across the enterprise for documentation and similar purposes.

  • Service specification content, primarily unstructured text
  • Images and other representations of static content 


Forms include both ODM and CDA documents. This includes all aspects of the document, including the style, definitions and semantics.

  • Form Templates
  • Form Definitions

 Artifact lifecycle management and metadata requirements include the ability to:

  • Manage lifecycle/governance/versioning of the models, content and forms.
  • Establish relationships and dependencies between models, content and forms
  • Provenance, jurisdiction, authority and intellectual property
  • Representation and views of the information, realized through the appropriate transforms
  • Access control and other security constraints
  • Annotations for better discovery and searching of artifacts
  • Usage Scenarios and Context for the information
  • Terminology and Value Set binding

The artifacts are bound to the services via the service metadata. The service metadata, combined with the artifacts and their supporting metadata, provides a comprehensive service specification.
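The composition described above (service metadata, artifacts, and the metadata supporting each artifact) can be pictured with a minimal sketch. The class names, field names and sample values below are hypothetical and chosen only for illustration; they are not taken from ECCF or from any caGRID 2.0 interface.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """One managed artifact plus its supporting metadata (hypothetical shape)."""
    name: str
    kind: str      # e.g. "static-model", "behavioral-model", "content", "form"
    version: str
    metadata: dict = field(default_factory=dict)  # provenance, annotations, bindings


@dataclass
class ServiceSpecification:
    """Service metadata bound to the artifacts that make up the specification."""
    service_name: str
    service_metadata: dict
    artifacts: list = field(default_factory=list)

    def artifacts_of_kind(self, kind):
        # Return only the artifacts of the requested kind.
        return [a for a in self.artifacts if a.kind == kind]


# Illustrative data only; none of these names come from a real specification.
spec = ServiceSpecification(
    service_name="ExampleService",
    service_metadata={"domain": "clinical", "lifecycle_stage": "draft"},
    artifacts=[
        Artifact("example.xsd", "static-model", "1.0", {"provenance": "hypothetical"}),
        Artifact("example-form", "form", "0.3"),
    ],
)
static_names = [a.name for a in spec.artifacts_of_kind("static-model")]
```

Keeping artifact metadata on the artifact, rather than on the service, is one way to let the same artifact be versioned and reused across several service specifications.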

The artifact management requirements listed above are derived from the following use cases in the previous section:

caEHR: The caEHR project has adopted ECCF for specifications and CDA documents for interoperability. The caEHR project requirements include the need for an infrastructure for managing all the artifacts generated during the specification process, including HL7 models and documents. The caEHR project also intends to publish these artifacts to the community and vendors. The infrastructure needs to support better discovery, making all the relevant information available in the right context.

ONC and Other external EHR adopters: ONC has adopted CCD and CCR for meaningful use. All national EHR implementations are expected to support forms, and the semantics of these forms play a critical role in interoperability. The semantic infrastructure must provide a mechanism to create, store and manage these forms.

Clinical Trials: Clinical trials use forms to capture clinical information, and the semantics captured by these forms are critical for interoperability and reporting. The semantic infrastructure must provide a mechanism to manage the lifecycle of these forms.

Service Discovery and Governance

Service discovery and governance allow service developers to specify rich metadata about services. This enables better discovery and governance of services. Service discovery and governance help:

Promote Service Reuse: The use of well-defined service metadata promotes better discovery and reuse of services at design time and run time. Service metadata includes information about service interactions and dependencies. It also includes a classification scheme for organizing services based on business objectives, domain, usage, etc. It also links services to all the supporting artifacts in the specification and provides a placeholder for conformance statements. This enables better reuse across the enterprise and eliminates redundancy.

Establish Service policies: Service policies help establish constraints on the service specifications and mandate an approach. Policies can be specified around governance, access control and other design/runtime constraints.

Governance: This includes predefined templates, workflows, and governance policies for governing the service lifecycle. It also includes an approval and review process for service specifications and the ability to promote services through the stages of the service lifecycle.

Better Discovery: Complex search that offers a natural and user-friendly way to find services by progressively refining search results using a variety of criteria, including attributes, artifacts, classification, usage scenarios, and dependencies. This includes runtime contract discovery, a powerful query mechanism that allows either the service orchestrator or a program to find the services that best fit the requirements of a given process. This increases both runtime and design-time flexibility by enabling selection of services based on computable metadata.
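Progressive refinement of search results can be sketched as repeated filtering over service metadata. This is only an illustration under stated assumptions: the registry entries, the metadata keys (`classification`, `artifacts`, `depends_on`) and the `refine` helper are hypothetical and do not describe an existing registry API.

```python
def refine(services, **criteria):
    """Keep the services whose metadata satisfies every given criterion.

    A list-valued metadata entry matches when it contains the wanted value;
    any other entry matches on equality.
    """
    def matches(service):
        for key, wanted in criteria.items():
            value = service.get(key)
            if isinstance(value, (list, set, tuple)):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return [s for s in services if matches(s)]


# Hypothetical registry content, for illustration only.
registry = [
    {"name": "ImageQuery", "classification": "data",
     "artifacts": ["UML", "XSD"], "depends_on": []},
    {"name": "TreatmentAdvisor", "classification": "analytical",
     "artifacts": ["OWL"], "depends_on": ["ImageQuery"]},
    {"name": "SpecimenQuery", "classification": "data",
     "artifacts": ["XSD"], "depends_on": []},
]

step1 = refine(registry, classification="data")  # first narrow by classification
step2 = refine(step1, artifacts="UML")           # then by a supporting artifact
names = [s["name"] for s in step2]
```

Because each step takes a list of services and returns a narrower list, the same function serves both an interactive user refining results and a program selecting a service at runtime.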

The requirements listed above are derived from the following use cases in the previous section: 

caEHR: The caEHR project is developing service specifications and lacks the infrastructure to govern these services. Vendors and external implementations are expected to leverage the caEHR service specifications, and there is currently no infrastructure that allows easy discovery and consumption of this information.

CBIIT Projects: CBIIT has adopted SOA. Service lifecycle management and governance are industry best practices for all organizations adopting SOA. Better service discovery and reuse improves productivity, avoids redundancy and makes it easier for the CBIIT enterprise architecture governance team to manage NCI's enterprise services portfolio.

Life Sciences: Service discovery based on rich metadata and the semantics of the underlying data plays a critical role in developing research pipelines. Research pipelines are developed by connecting data and analytical services together to achieve a research objective.

Other National Initiatives: All EHR vendors and national initiatives rely on a services paradigm for integration and interoperability. A standardized services metamodel makes it easier for participating organizations to discover and reuse services.

caGRID 2.0 Platform: The caGRID 2.0 Platform provides a runtime registry for service discovery. This service registry relies on a small subset of information for discovery. The semantic infrastructure provides a mechanism to leverage rich service and artifact metadata to extend this capability.

Forms Definition & Modeling

Case Report Forms are the primary channel for capturing information in the healthcare and clinical domain. Forms also play a key role in information exchange and are critical to supporting interoperability in healthcare.

A form differs from a document; a document is used to capture information. A form defines skip patterns, validation rules, and any other aspect required to render or capture information for a document.
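As a rough illustration of what a form definition adds on top of the captured document, the sketch below encodes one skip pattern and one range validation rule. The field ids and the `show_if`, `min` and `max` keys are hypothetical and are not drawn from ODM, CDA or any other standard named in this document.

```python
# Hypothetical form definition; all keys and field ids are illustrative.
form_definition = {
    "fields": [
        {"id": "smoker", "type": "bool"},
        {
            "id": "packs_per_day",
            "type": "number",
            "show_if": ("smoker", True),  # skip pattern: only shown to smokers
            "min": 0,                     # validation rule: lower bound
            "max": 10,                    # validation rule: upper bound
        },
    ]
}


def visible_fields(form, answers):
    """Apply skip patterns and return the ids of the fields to render."""
    visible = []
    for spec in form["fields"]:
        condition = spec.get("show_if")
        if condition is None or answers.get(condition[0]) == condition[1]:
            visible.append(spec["id"])
    return visible


def validate(form, answers):
    """Check range rules for every field that is both visible and answered."""
    errors = []
    shown = set(visible_fields(form, answers))
    for spec in form["fields"]:
        field_id = spec["id"]
        if field_id not in shown or field_id not in answers:
            continue
        value = answers[field_id]
        if "min" in spec and value < spec["min"]:
            errors.append(f"{field_id}: below minimum {spec['min']}")
        if "max" in spec and value > spec["max"]:
            errors.append(f"{field_id}: above maximum {spec['max']}")
    return errors
```

For example, `visible_fields(form_definition, {"smoker": False})` skips the follow-up question, while `validate(form_definition, {"smoker": True, "packs_per_day": 20})` reports a range violation.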


  • Tools and Services for defining form templates
  • Ability to leverage models and reusable segments for defining these forms
  • User-friendly modeling tools that support rich semantics but hide the complexity of the underlying semantics

The requirements listed above are derived from the following use cases in the previous section:

  • caEHR
  • ONC and Other external EHR adopters
  • Clinical Trials

Conformance Testing

Service specifications developed by NCI and the community have to be testable to ensure that an implementation conforms to the specification.


  • CBIIT's adoption of ECCF: ECCF requires all specification developers to make conformance statements. The conformance testing framework leverages these conformance statements to generate validation tests.
  • Other National Initiatives: Other national organizations like NIST are adopting a similar approach to conformance testing.
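The idea of generating validation tests from conformance statements can be sketched as follows. The statement format (`id`, `field`, `rule`, `values`) and the two rule names are assumptions made for this example; ECCF does not prescribe this representation.

```python
# Hypothetical machine-readable conformance statements.
conformance_statements = [
    {"id": "CS-1", "field": "patient_id", "rule": "required"},
    {"id": "CS-2", "field": "status", "rule": "one_of", "values": ["draft", "final"]},
]


def make_check(statement):
    """Turn one conformance statement into a callable test over a message."""
    field_name, rule = statement["field"], statement["rule"]
    if rule == "required":
        return lambda message: message.get(field_name) not in (None, "")
    if rule == "one_of":
        allowed = set(statement["values"])
        return lambda message: message.get(field_name) in allowed
    raise ValueError(f"unsupported rule: {rule}")


def run_conformance(statements, message):
    """Run every generated check and report pass/fail per statement id."""
    return {s["id"]: make_check(s)(message) for s in statements}


# Illustrative message produced by an implementation under test.
report = run_conformance(
    conformance_statements,
    {"patient_id": "P-001", "status": "pending"},
)
```

Reporting results per statement id keeps each test outcome traceable back to the conformance statement that produced it.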

caGRID 2.0 Platform and Terminology Integration

The Semantic Infrastructure has to support seamless integration with the caGRID 2.0 platform. The following are some high-level platform and terminology requirements that are either supported or addressed by the semantic infrastructure:


Service Generation


Service generation is the ability to generate services from user-defined service metadata. The semantic infrastructure provides this metadata, and the platform leverages it for service generation. The constraints and policies specified in the semantic infrastructure are inherited by the platform and are enforced at runtime.

Additional platform-specific and runtime information is provided by the developer at the time of service generation.


Service Discovery & Utilization


This group of requirements focuses on enabling developers of composite services and applications to discover, compose, and invoke services. This includes the discovery of published services based on service metadata and the generation of client APIs in multiple languages to provide cross-platform access to existing services.


The platform will use the semantic infrastructure service metadata to address all the service discovery requirements. The semantic infrastructure relies on metadata about services and artifacts.

Link to use case satisfied from caGRID 2.0 roadmap: As institutions share de-identified glioblastoma data sets, they are available to others via data discovery. The treatment recommendation service used by the oncologist is able to discover these new data sets and their corresponding information models, and include that data for subsequent use in recommendation of treatment.

Link to use case satisfied from caGRID 2.0 roadmap: all of the data management and access services in the use case are utilized by application developers to build the user interfaces that the clinicians use during the course of patient care.


Service Orchestration and Choreography


Service orchestration and choreography allows both application developers and non-developers to discover service "building blocks" that can be composed dynamically to provide business capabilities. Special cases include the orchestration of multiple services for a distributed query, or for a transactional workflow. Service orchestration and choreography will leverage static and behavioral semantics from the Semantic Infrastructure v2.

The semantic infrastructure provides the behavioral semantics required for dynamic composability of services or generation of distributed queries. This includes runtime contract discovery and negotiation to determine the composability of services based on service capabilities and constraints.
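A very reduced view of a composability check is a comparison between what one service provides and what the next one requires. The sketch below assumes capabilities and constraints are published as plain string labels, which is a simplification; the `provides` and `requires` keys and all labels are hypothetical.

```python
def can_compose(producer, consumer):
    """Return (ok, missing): ok is True when the producer offers every
    capability the consumer requires; missing lists the unmet requirements."""
    missing = sorted(set(consumer["requires"]) - set(producer["provides"]))
    return (not missing, missing)


# Hypothetical service descriptions, for illustration only.
data_service = {"name": "DataSetQuery",
                "provides": ["de-identified-data", "iso21090-datatypes"]}
advisor = {"name": "TreatmentAdvisor",
           "requires": ["de-identified-data"]}
auditor = {"name": "AuditedAnalysis",
           "requires": ["de-identified-data", "audit-logging"]}

ok_advisor = can_compose(data_service, advisor)
ok_auditor = can_compose(data_service, auditor)
```

Returning the unmet requirements, not just a boolean, gives an orchestrator something to negotiate over when a composition is rejected.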

Another use case is dynamic retrieval and enforcement of the policies that are in effect for a service interaction in the areas of logging, validations, data transformation, or routing. This information can be used either during the design of the orchestration or during the execution of the defined flow.

Link to use case satisfied from caGRID 2.0 roadmap: Federated query over the TCGA data and other data sets is performed using a service orchestration.


Policy and Rules Management


Policy and Rules Management allows non-developer secondary users to create policies and rules and apply them to services. The scope of policies includes, but is not limited to, definition and configuration of business processing policy and related rules, compliance policies, quality of service policies, and security policies. Some key functional requirements to manage policies include capabilities to author and store policies, and to support approval, validation, and run-time execution of policies.

The semantic infrastructure will provide a mechanism to specify policies, including business processing policies and related rules, compliance policies, and quality of service policies. Tools and services for creating security-specific policies will be provided by the caGRID 2.0 platform and will be used by the semantic infrastructure. All other policies specified in the semantic infrastructure will be enforced by the platform at runtime.

Link to use case satisfied from caGRID 2.0 roadmap: Each institution has different data sharing needs, access control needs, and business rules for processing that are defined and customized. For example, policy at the pathologist's institution may state that the patient is scheduled for a visit when the review is complete.


Event Processing and Notifications


Event Processing and Notifications enables monitoring of services in the ecosystem and provides for asynchronous updates by services, effectively allowing a loose coordination of services that both provide and respond to conditions (possibly defined in business rules).

The semantic infrastructure will provide a placeholder to specify events and triggering conditions for data and services. The platform monitors these events at runtime and acts on them.
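The split of responsibilities described here (triggers declared in the semantic infrastructure, events monitored by the platform) can be sketched as a declarative trigger list plus a small dispatcher. The event names, payload fields and recipients are hypothetical and do not correspond to an existing platform interface.

```python
# Hypothetical trigger declarations: event type, triggering condition, recipient.
triggers = [
    {"event": "image.ready",
     "condition": lambda payload: payload.get("modality") == "MRI",
     "notify": "radiologist"},
    {"event": "dataset.published",
     "condition": lambda payload: payload.get("disease") == "glioblastoma",
     "notify": "oncologist"},
]


def dispatch(event_type, payload, declared_triggers):
    """Return the recipients to notify for one observed event."""
    return [
        t["notify"]
        for t in declared_triggers
        if t["event"] == event_type and t["condition"](payload)
    ]


# The platform would call dispatch() for each event it observes at runtime.
recipients = dispatch("image.ready", {"modality": "MRI"}, triggers)
```

Keeping the conditions as data attached to each trigger means new notifications can be declared without changing the dispatcher.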

Link to use case satisfied from caGRID 2.0 roadmap: As patient care proceeds, the system notifies the designated clinicians that data (for example, images) are ready for review. Similarly, when notifications are received, event processing logic allows the appropriate parties to assign clinicians for care. In order to facilitate better treatment (a learning healthcare system), as new de-identified glioblastoma data is made available, notifications are sent that could indicate a recommended change in the treatment plan.


Data Representation and Information Models


This set of requirements includes providing an application developer with the ability to define application-specific attributes (for example, defined using ISO 21090 healthcare datatypes) and an information model that defines the relationships between these attributes and other attributes in the broader ecosystem. In particular, the last requirement suggests linked datasets, where application developers can connect data in disparate repositories as if the repositories are part of a larger federated data ecosystem. Additional requirements include the ability to publish and discover information models. Support is needed for forms data and common clinical document standards, such as HL7 CDA. To support the use of binary data throughout the system, the binary data must be typed and semantically annotated.

All information models, their representation, and their binding to data types and terminologies will be managed by the semantic infrastructure. The ability to publish and discover information models will be supported by the semantic infrastructure, and the platform will leverage these capabilities.

Link to use case satisfied from caGRID 2.0 roadmap: The pathology, radiology and other data have various data formats which must be described, and the information model for the patient record must link between these various datatypes. The complete information model includes semantic links between datasets to build a comprehensive electronic medical record. Annotations on data are defined and included in the information model.


Data Management


Data management includes linking of disparate data sets and updates of data across the ecosystem. Data updates may include updates to multiple data sources, necessitating the need for transactions.

Linkages between the disparate data sets will be managed by the semantic infrastructure. Data updates that trigger transactions are captured by the platform and are propagated upstream to the semantic infrastructure. An example would be the platform monitoring events to identify changes to data.

Link to use case satisfied from caGRID 2.0 roadmap: the patient has an electronic medical record that spans multiple institutions. The clinical workup data (for example, genomics and proteomics data) is linked to the clinical care record; similarly pathology and radiology findings must be attached to the patient's electronic medical record.


Data Exploration and Query


The wealth of data must be accessible, resulting in the need for exploration of available datasets. This includes the ability to view seamlessly across independent data sets, allowing a secondary user to integrate data from multiple sources. In addition, the query capability must support sophisticated queries such as temporal queries and spatial queries.

The semantic infrastructure will provide metadata for discovery of these datasets. Complex temporal and spatial queries will be informed by the metadata but will be formulated and executed by the platform.

Link to use case satisfied from caGRID 2.0 roadmap: The oncologist must be able to quickly find glioblastoma data sets, indicating the fields that he is interested in comparing from his clinical data in order to find similar disease conditions and associated treatment plans. Temporal queries allow clinicians to identify changes in patient condition and treatment over time.




Provenance encompasses the origin and traceability of data throughout an ecosystem. This is a clear requirement directly from the use case in order to ensure that all steps of patient care and research are clearly linked via the patient record.

The semantic infrastructure will provide data provenance support.

Link to use case satisfied from caGRID 2.0 roadmap: The origin of data is tied to the data creator, allowing the oncologist performing the match against TCGA data and other datasets to include and exclude data sets based on their origin.


Data Semantics


In a diverse information environment, semantics must be used to clearly indicate the meaning of data. This requirement is expected to be addressed by the Semantics Infrastructure, although there will be a touchpoint between the caGrid 2.0 and the semantics infrastructure to annotate data with semantics. Integration with the semantics infrastructure will enable reasoning, semantic query, data mediation (for example, ad hoc data transformation) and other powerful capabilities.

Data semantics are captured in the semantic infrastructure.

Link to use case satisfied from caGRID 2.0 roadmap: The oncologist accesses the TCGA database to search for de-identified glioblastoma tumor data that is similar to the patient data exported from the hospital medical record. During this search, the semantics of the data fields are leveraged to indicate matches between TCGA data fields and the hospital medical record data fields.


External Data Repositories


There are numerous data repositories on the web today. These data repositories contain essential information that must be accessible to services in the ecosystem. As a result, caGrid 2.0 must provide capabilities to integrate these external repositories into the Grid with the assumption that the remote service cannot be changed.

Link to use case satisfied from caGRID 2.0 roadmap: The oncologist searches both TCGA glioblastoma data as well as de-identified data that has been added by care providers around the country. The additional data sets are external data repositories.


Terminology Integration