DRAFT

Project Scope

The Knowledge Repository (KR) project is charged with supporting the activities defined in Initiative 1 of the caBIG Semantic Infrastructure effort. Therefore, the document on the Vocabulary Knowledge Center Wiki that describes Initiative 1 is the starting point for defining the high-level scope of this project. The NCI caBIG Semantic Infrastructure and Operations (SIO) group's representatives (Denise Warzel & Dave Hau), referred to here as NCI representatives, will provide direction and ensure that we interpret Initiative 1 in a way that aligns with the vision and mission of SIO, caBIG, and NCI.

The scope statements below represent our current understanding of the scope of this project. However, as we gain further understanding of the needs of the NCI stakeholder community, these scope statements will evolve. Therefore, these statements serve to guide our near- to mid-term activities, but do not constrain the ultimate scope of the project.

The scope for Release 1, which will be delivered in March 2011, will be specified in the Release 1 Scope document. That document is derived from the prioritization of the lower-level requirements specified in the KR Software Requirements Specification document. In addition, there are multiple touch-points among initiatives. The Inter-Initiative Touch-Points section below describes those touch-points and how they affect the scope of this project.

Scope Statements

S1: ISO 11179 ed. 3 Registry Support

Description:

Provide the design and reference implementation of an ISO 11179 ed. 3-compliant metadata repository.

Source:

Initiative 1 explicitly calls for "a production realization of the PIM of the distributed ISO 11179 Ed3 repository..."

Justification:

According to the standard, the purposes of ISO/IEC 11179 are:

  • Standard description of data
  • Common understanding of data across organizational elements and between organizations
  • Re-use and standardization of data over time, space, and applications
  • Harmonization and standardization of data within an organization and across organizations
  • Management of the components of data
  • Re-use of the components of data

To the extent that ISO 11179 achieves these purposes, it provides support for data integration. This aligns with caBIG goals of supporting translational research, which benefits from data integration.

S2: Model Repository Support

Description:

Provide the design and reference implementation of a Model Repository, initially to include representations of UML models, but also to allow recording/registering of other types of end-user models that represent the information model for a given application.

Source:

Initiative 1 calls for "...model repositories and tooling including storage and sharing of DAMs and of 'lower-level' models derived from DAMs." It describes the need to use UML-based information models such as BRIDG and LS-DAM to provide a "shared semantic view." It describes the need to enable "model repositories to be decomposed into data description..." We interpret this to mean that there should be some alignment of UML information models to ISO 11179 metadata elements, similar to what is currently in use within caBIG. The purposes of this alignment include (1) encouraging reuse of data elements and (2) enabling description of semantic relationships among UML models. However, alternatives that do not include ISO 11179 but do support creation of a shared semantic view should also be considered.

Justification:

BRIDG and LS-DAM represent the consensus, shared semantic views of the clinical research and care and life sciences domains. By providing the capability to describe the semantic relationships among these models and other UML-based information models, the repository would enable view-based data integration, which can support query re-writing or construction of data warehouses.

Additional Notes

One of the goals of this effort is to reduce barriers to sharing metadata. Therefore, the repository should not be restricted to recording only UML or 11179 metadata elements. Instead, it should allow users to record metadata in any form (e.g. spreadsheets or XSDs). The ISO 11179 ed. 3 model allows for this kind of unconstrained recording of metadata, so we may ultimately be able to use that feature of 11179 to achieve the goal of lowering barriers.

S3: ISO 21090 Support

Description:

Extend the 11179 Value Domain to support representation of ISO 21090 Healthcare datatypes and accommodate UML models that use these datatypes.

Source:

Initiative 1 does not explicitly call for ISO 21090 support. However, it might be inferred from the call for support of DAMs such as BRIDG, which use these datatypes. The Statement of Work for the Knowledge Repository project specifically calls for "Extension of ISO 11179 Value Domain Datatype to include representation of ISO 21090 Healthcare data types" and "support for representation of complex ISO 21090 Healthcare data types" by the services that this project produces.

Justification:

According to the ISO 21090 standard, its purpose is to:

  • provide a set of data type definitions for representing and exchanging basic concepts that are commonly encountered in healthcare environments in support of information exchange in the healthcare environment,
  • specify a collection of healthcare related data types suitable for use in a number of health related information environments,
  • declare the semantics of these data types using the terminology, notations and data types defined in ISO 11404 rev 2005,
  • provide UML definitions of the same data types using the terminology, notation and types defined in Unified Modeling Language (UML) version 2.0,
  • define an XML (Extensible Markup Language) based representation of the data types suitable for use when exchanging information between information processing entities.

To the extent that use of these datatypes will support interoperability, this item aligns with the caBIG goal to "Connect scientists and practitioners through a shareable and interoperable infrastructure."

Additional Notes

The system should provide support for selecting and describing specific 21090 types, and allow localization of 21090 data types (constraining/expanding).
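As a rough illustration of the localization note above, the sketch below models a value domain that names an ISO 21090 datatype and can be narrowed by adding constraints. It covers only the constraining case. The class ValueDomain, the function localize, and the "unit" constraint key are hypothetical names chosen for this example; they are not taken from ISO 11179, ISO 21090, or the KR design. PQ (physical quantity) is used only as an example of a 21090 type name.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class ValueDomain:
    """Hypothetical, simplified value domain that points at a datatype by name."""
    name: str
    datatype: str  # e.g. an ISO 21090 type name such as "PQ"
    constraints: Dict[str, str] = field(default_factory=dict)


def localize(base: ValueDomain, name: str, extra: Dict[str, str]) -> ValueDomain:
    """Return a new, more constrained value domain; the base is left unchanged."""
    merged = {**base.constraints, **extra}
    return replace(base, name=name, constraints=merged)


# Assumed example: a general dose value domain narrowed to milligrams locally.
dose = ValueDomain("Dose", "PQ")
dose_mg = localize(dose, "Dose in milligrams", {"unit": "mg"})
```

Returning a new object rather than editing the base keeps the shared definition intact while each site records its own constraints.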

S4: Semantic Transformation Support

Description:

The system should allow metadata that has been stored in the repository to be queried and manipulated from the perspective of multiple views. Therefore, we will define isomorphic transformations between all supported views, so that content can be moved from one view to another and back without loss. ISO 11179 and UML are the currently identified views.
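To make "isomorphic" concrete, here is a minimal sketch of a lossless round trip between the two views. It assumes a simplified alignment in which a UML class corresponds to an 11179 object class, a UML attribute to a property, and the attribute's type to the value domain's datatype. The class and function names, and the Person/birthDate example, are hypothetical and are not part of the KR design or of either standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UmlAttribute:
    """UML view: one attribute of one class (hypothetical, simplified)."""
    class_name: str
    attribute_name: str
    type_name: str


@dataclass(frozen=True)
class DataElement:
    """ISO 11179 view reduced to three parts (hypothetical, simplified)."""
    object_class: str
    property_name: str
    value_domain_datatype: str


def uml_to_iso11179(attr: UmlAttribute) -> DataElement:
    """Map a UML attribute onto the corresponding 11179-style data element."""
    return DataElement(attr.class_name, attr.attribute_name, attr.type_name)


def iso11179_to_uml(element: DataElement) -> UmlAttribute:
    """Inverse mapping: rebuild the UML attribute from the data element."""
    return UmlAttribute(
        element.object_class,
        element.property_name,
        element.value_domain_datatype,
    )


# Illustrative values only.
attr = UmlAttribute("Person", "birthDate", "TS")
element = uml_to_iso11179(attr)

# The transformation is isomorphic if both round trips give back the input.
assert iso11179_to_uml(element) == attr
assert uml_to_iso11179(iso11179_to_uml(element)) == element
```

A real mapping would carry far more information (definitions, identifiers, associations), and the round-trip checks are the property that would need to keep holding as those are added.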

Source:

Initiative 1 calls for "...transformation capabilities to allow model repositories to be decomposed into data descriptions that can be reused through the existing infrastructure to support deployment of these semantics in practical end user solutions via software engineering techniques such as forms development and forms generation."
Here, the main requirement seems to be enabling use of the existing infrastructure, which is oriented toward an ISO 11179 view of metadata.
The RFP expands the notion of transformation to include higher-level services such as comparison. This is captured here in a separate scope item.

Justification:

There are existing tools (e.g. form builders) that rely on an ISO 11179 view of metadata. This project should support them.

S5: Distributed, Federated Repositories

Description:

The architecture of the 11179 and UML repositories should enable physical distribution and logical federation. The architecture should support distributed linking of content and federated workflows (e.g. versioning, curation, query) over that content.
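The federated query workflow can be pictured with a small sketch: each peer repository holds its own content, and a federated search sends the same query to every peer and merges the answers, tagging each hit with the repository it came from. This is an in-memory illustration under assumed names (PeerRepository, federated_search, the site and item identifiers); a real deployment would call remote services and is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PeerRepository:
    """One physically separate repository (hypothetical, in-memory stand-in)."""
    name: str
    definitions: Dict[str, str] = field(default_factory=dict)  # item id -> definition

    def search(self, term: str) -> List[Tuple[str, str]]:
        """Return (repository name, item id) pairs whose definition mentions term."""
        needle = term.lower()
        return [
            (self.name, item_id)
            for item_id, text in self.definitions.items()
            if needle in text.lower()
        ]


def federated_search(peers: List[PeerRepository], term: str) -> List[Tuple[str, str]]:
    """Send the query to every peer and merge the answers into one list."""
    hits: List[Tuple[str, str]] = []
    for peer in peers:
        hits.extend(peer.search(term))
    return hits


# Illustrative content only.
site_a = PeerRepository("site-a", {"DE-1": "Date of birth of a person"})
site_b = PeerRepository("site-b", {"DE-7": "Person birth date", "DE-8": "Specimen type"})
hits = federated_search([site_a, site_b], "birth")
```

Identifying each result by a (repository, item) pair is also one simple way to express a link from content in one peer to content held by another.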

Source:

The subtitle of Initiative 1 is "Distributed, federated metadata repositories and model repositories and operations." It indicates that the "...architecture for the distributed metadata repository (MDR) will be decentralized in nature, allowing multiple peer repositories to be present at the same time, for sharing of data elements."

Justification:

According to the SI Conops Initiatives overview, "The need for all semantic metadata to be formally recorded in a single central repository would limit or preclude application of the semantic infrastructure to very large, diverse communities such as national health care. Distributed, federated metadata resources will clearly be required."

S6: Semantic Service Oriented Architecture

Description:

The architecture of the metadata and model repositories will be service-oriented, according to the principles defined here. Functionality will be exposed through semantically annotated service interfaces.

Source:

The SI Conops calls for the use of the Services Aware Interoperability Framework (SAIF), which prescribes a SOA-based approach.

Justification:

NCI is using SOA as its strategy to "ensure working interoperability between differing systems that need to access or exchange specific classes of information and/or coordinate cross-application behaviors."

S7: Semantic Integration Tooling

Description:

This project will produce tooling that utilizes the distributed, federated metadata and model repository services to support an updated and revised semantic integration workflow (i.e. the ECCF) that enables decentralized and localized additions of models and data elements.

Source:

Initiative 1 calls for "Development of a set of user applications or services as needed for creation, management, search and retrieval of metadata." In addition, the SI Conops mission statement calls for the following:

  • Employ the Enterprise Compliance and Conformance Framework to represent frameworks and models in an implementation independent manner;
  • Build and adapt tools and interfaces for generation, curation, storage and use of semantic information, and for convenient lookup, retrieval and transformation of this information by both end-users and applications;

Justification:

The semantic infrastructure requirements have both run-time and design-time aspects. The primary design-time requirements involve application of the ECCF to build systems that achieve Working Interoperability.

S8: Semantic Infrastructure Interoperability

Description:

The NCI's approach to metadata registries should support some level of interoperability with registries of other agencies and external groups.

Source:

<source>

Justification:

<justification>

Inter-Initiative Touch-Points

<working>