Introduction

The NCI-supported caBIG program is facing a major expansion of its mission and of the community of stakeholders that it must engage. The success of the program to date has been widely recognized by numerous thought leaders, technology innovators, and program managers in the fields of biomedical computing and health information technology. These leaders have been influenced by both the technological sophistication and the open community processes of the caBIG program, and are seeking to adapt caBIG to meet the much broader agenda of health care information management, sharing, and integration across institutions, communities of practice, and patients.

To engage this larger community and contribute to solutions for its needs, the BIG Health Consortium was formed. It is a non-governmental organization that is affiliated with caBIG, but which has a complementary agenda focused on enabling the integration of research, care delivery, and patient-driven decision making in practical ways. As a consequence, the institutional stakeholders associated with the BIG Health Consortium extend well beyond the cancer centers, universities, and cooperative groups that make up the core NCI-funded constituencies that have participated in caBIG since its original incarnation. The extended community thus goes beyond what is considered caBIG proper today, and includes groups that are not yet connected to caGrid.

Background/Scope

The primary goal for this Architecture, Development, and Deployment of a Knowledge Repository and Service is to address the needs of this extended community for a scalable, decentralized infrastructure for managing and disseminating operational metadata and information models, and their associated semantic constructs. The vision for the project is to re-imagine the caBIG technology environment as a more open and more readily extensible framework, one that can grow with less dependency on the centralized processes and systems that are manifest in the first generation of caBIG technology. In particular, the role of the central metadata registry, the caDSR, must be redefined as a federation of metadata registries that can be instantiated and plugged into the caBIG grid or an extended community "cloud" by any qualified entity.

The caDSR has a suite of tools and APIs that support workflows for metadata development, browsing, and retrieval. In addition, the caDSR has been adapted to support the UML model-driven development paradigm adopted by caBIG. UML-defined information models such as those from the BRIDG project, caArray, caTissue, and others are each registered in the caDSR through conversion of the model elements into ISO 11179 metadata constructs. This functionality, and the workflows that it supports, has evolved over an 8-year period and is now quite mature. It satisfies the current requirements for semantic representation in the current caBIG developer and user community, but it is ill-suited to serve the new requirements for decentralization and indefinite scalability in the broader health care community. The goal of this program is therefore to harvest and recycle the best elements of the first generation of caBIG metadata infrastructure, and then to incorporate those elements into a redesigned and modernized technology stack that is engineered from the start to support a federated deployment topology with far less centralized administration.

Users and Characteristics

An Actor models a type of role played by an entity that interacts with the subject (e.g., by exchanging signals and data), but which is external to the subject. Actors may represent roles played by human users, external hardware, or other subjects. Note that an actor does not necessarily represent a specific physical entity but merely a particular facet (i.e., "role") of some entity that is relevant to the specification of its associated use cases. Thus, a single physical instance may play the role of several different actors and, conversely, a given actor may be played by multiple different instances.

While UML 2 does not permit associations between Actors, this constraint is often violated in practice since the generalization/specialization relationship between actors is useful in modeling overlapping behaviours between actors. The actors below are represented as having a hierarchical relationship for ease of understanding; however, these relationships can easily be removed.

Cancer Researcher: plans and performs activities related to discovery of new knowledge, drugs, and treatments in the field of oncology
- Clinical Researcher: works directly with patients and/or patient data while performing cancer research
- Basic Science Researcher: works with scientifically generated data while performing cancer research
- Protocol Designer: defines the methods used to perform cancer research
Information Technologist: designs, develops, and manages the software and hardware necessary to perform cancer research
- Business Analyst: analyzes the business processes and describes the goals and activities of the user community
- Information Modeler: designs, defines, and describes the data that will be captured during cancer research activities
- Software Engineer: implements the software systems that are used to manage and perform cancer research
- Systems Architect: designs the software and hardware systems that are used to manage and perform cancer research
- System Administrator: manages the software and hardware systems that are used to perform cancer research
Metadata Specialist: has a deep understanding of semantics and syntax, and assists Cancer Researchers and Information Technologists in modeling and managing metadata
- Forms Author: constructs data collection forms based on a library of common data elements
- Metadata Curator: works hands-on with Cancer Researchers and Information Technologists to model and manage their metadata
- Metadata Systems Specialist: manages centralized metadata systems and assists with the design of metadata systems
- Terminologist: a metadata expert who manages and maintains the semantic concepts that underlie information models
- Compatibility Reviewer: while the nature of "caBIG Compatibility" may change as need dictates, the role of the Compatibility Reviewer will likely continue to be a Metadata Specialist that reviews a variety of documents/artifacts to determine the level of interoperability that a system meets
Patient: any person who receives medical attention, care, or treatment
- Subject: a Patient who is participating in Cancer Research

Resources

The following is a list of documents that provide background material, requirements, and related topics:

Con Ops Supplemental Page for Requirements Gathering: Supplemental VCDE Requirements Elicitation Initiative 2009 - 2010
Semantic Requirements Forum: https://cabig-kc.nci.nih.gov/Vocab/forums/viewforum.php?f=34
Con Ops Stakeholders: Semantic Infrastructure Concept of Operations Stakeholders
Con Ops Requirements: Requirements Questionnaires
Con Ops Use Cases: Use Cases for Semantic Requirements

Terms & Definitions

Term	Definition
MUST	This word means that the definition is an absolute requirement of the specification.
MUST NOT	This phrase means that the definition is an absolute prohibition of the specification.
WILL	This word means that the definition is an absolute future requirement of the specification.
WILL NOT	This phrase means that the definition is an absolute future prohibition of the specification.
SHOULD	This word means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
SHOULD NOT	This phrase means that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.
MAY	This word means that a requirement is truly optional. The developer may choose to include the item based on the needs of their design.

Use Case Level Criteria

When designing use cases, it is important to maintain a consistent approach to determining the level at which each use case is placed when it is authored. The granularity of a use case directly relates to its implementability, so maintaining a leveling scheme will ensure that use cases are implemented, tested, etc. at a consistent level. In alignment with this, we will maintain the following use case leveling criteria:

Global statement
Example: manage patient's health
Use Case Level 0: most general, high level business process
Example: treat this patient for cancer
Use Case Level 1: next level of business flow, such as inter-department level
Example: treat this patient for cancer by using diagnostics, education, treatments, etc.
Use Case Level 2: specific enough to drive major model dimensions (static/information model, behavioral model, governance model, etc.) and can include some exception conditions
Example: treat this patient for cancer by ordering these lab tests, evaluating the results, customizing a treatment or treatment plan to specifically address concerns
Use Case Level 3: includes details of each exchange of information, as well as assumptions about system boundaries
Example: treat this patient for cancer by ordering these lab tests in the caEHR system
i. find the patient, if new add patient, if not check for update
ii. create request(s) for testing for patient, then evaluate the results,
ii-a. read the faxed copy of the result or
ii-b. receive notification in Provider email that a result is ready for viewing, etc...
Use Case Level 4: project-specific solution for exchanges detailed in use case level-3 (could be MSG, SOA, or both)
Example: Standard Process Flow
1. Provider Refers Patient for Cancer Treatment
2. Provider and Patient have an encounter
3. Provider evaluates Patient's condition, treatment plan, current statistics, requests diagnostics
4. Provider updates treatment plan
5. Patient is treated, outcomes documented
6. Flow returns to any of the above steps or patient is released from cancer treatment

Assumptions and Dependencies

Usability

The user interface shall be designed for ease-of-use by the designated end-users, shall use terms common to the user's normal business environment, and shall require little to no additional training on the system. Drop down menus, Google-like searches, and tooltips should be used wherever appropriate.

Accessibility

The user interface should be accessible via a web browser.

General Assumptions

Following is a list of basic assumptions:

Federated Discovery Services: Knowledge Repository Services will be distributable and discoverable in a federated manner
Security Considerations: services and underlying data will be secured using the caBIG security architecture
Subscriptions/Notifications: services requiring subscription and notification functionality will be enabled by the caBIG pub/sub architecture
ISO 11179 Ed 3: the Knowledge Repository will comply with ISO 11179 Ed 3 specifications. See here for our analysis.
Ability to have non-administered items: the Knowledge Repository will provide for non-administered items in the ISO 11179 notion
Duplication/identification: in compliance with the Federated Discovery Services assumption, reference of data, data duplication, and unique identification of data will be handled

Dependencies

TBD

Functional Requirements

Model Services

Some basic description of model services.

ID	Requirement	Source	Release
MOS-1	The Model Service MUST support the ability to record new model elements.	ConOps Init 1	---
MOS-2	The Model Service MUST support the ability to record new assertions.	ConOps Init 1	---
MOS-3	The Model Service MUST support the ability to record new rules.	ConOps Init 1	---
MOS-4	The Model Service SHOULD support the ability to record queries.	ConOps Init 1	---
MOS-10	The Model Service MUST allow for the discovery of models based on querying model parts.	ConOps Init 1	---
MOS-11	The Model Service MAY support the federated discovery of models.	ConOps Init 1	---
MOS-12	The Model Service SHOULD allow the discovery of related model parts.	ConOps Init 1	---
MOS-20	The Model Service MUST support the comparison of model parts.	ConOps Init 1	---
MOS-21	The Model Service MUST support the harmonization of models through the comparison of model parts.	ConOps Init 1	---
MOS-30	The Model Service SHOULD support the merging of different parts of different models.	ConOps Init 1	---
MOS-40	The Model Service MAY support the graphical visualization of models.	ConOps Init 1	---
MOS-50	The Model Service MUST support the versioning of models.	ConOps Init 1	---
MOS-60	The Model Service MUST allow the retiring of models.	ConOps Init 1	---
MOS-70	The Model Service MUST allow the updating of parts of a model.	ConOps Init 1	---
MOS-80	The Model Service MUST allow the deletion/unregistering of models.	ConOps Init 1	---
MOS-90	The Model Service SHOULD support the transformation of models.	ConOps Init 1	---
MOS-100	The Model Service SHOULD support the validation of models.	ConOps Init 1	---
MOS-110	The Model Service MUST support the reuse of model parts.	ConOps Init 1	---
MOS-111	The Model Service MUST support the reuse of model parts in their entirety as direct copies.	ConOps Init 1	---
MOS-120	The Model Service MUST support the extension of model parts.	ConOps Init 1	---
MOS-121	The Model Service MUST allow the extension of classes.	ConOps Init 1	---
MOS-122	The Model Service MUST allow the extension of model part relationships.	ConOps Init 1	---
MOS-123	The Model Service SHOULD allow the extension of attribute value lists.	ConOps Init 1	---
MOS-130	The Model Service SHOULD support the constraining of model parts.	ConOps Init 1	---
MOS-131	The Model Service SHOULD allow for classes to be subsetted.	ConOps Init 1	---
MOS-132	The Model Service SHOULD allow for attribute value lists to be subsetted.	ConOps Init 1	---
MOS-140	The Model Service MUST support the discovery of services related by model similarities.	ConOps Init 1	---
MOS-150	The Model Service SHOULD support the discovery of related documentation.	ConOps Init 1	---
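As an illustration of how a few of the requirements above fit together, the following is a minimal sketch of an in-memory registry that records models (MOS-1), versions them (MOS-50), retires them (MOS-60), discovers them by model part (MOS-10), and compares model parts (MOS-20). The class and method names are hypothetical and are not part of any caBIG API. The model names are taken from this document, but the parts assigned to them are invented for the example.

```python
class ModelVersion:
    """One recorded version of a model and the names of its parts."""

    def __init__(self, name, version, parts):
        self.name = name
        self.version = version
        self.parts = frozenset(parts)
        self.retired = False


class ModelRegistry:
    """Hypothetical in-memory stand-in for a Model Service."""

    def __init__(self):
        self._history = {}  # model name -> list of ModelVersion, oldest first

    def record(self, name, parts):
        """Record a model; recording an existing name adds a new version."""
        history = self._history.setdefault(name, [])
        model = ModelVersion(name, len(history) + 1, parts)
        history.append(model)
        return model

    def retire(self, name):
        """Mark the latest version as retired without deleting it."""
        self._history[name][-1].retired = True

    def discover(self, part):
        """Names of models whose latest, non-retired version contains the part."""
        latest = (history[-1] for history in self._history.values())
        return sorted(m.name for m in latest if part in m.parts and not m.retired)

    def compare(self, first, second):
        """Parts shared by the latest versions of two models."""
        return self._history[first][-1].parts & self._history[second][-1].parts


registry = ModelRegistry()
registry.record("caTissue", {"Specimen", "Participant"})
registry.record("caArray", {"Specimen", "Hybridization"})
registry.record("caArray", {"Specimen", "Hybridization", "Array"})  # becomes version 2
shared = registry.compare("caTissue", "caArray")
matches = registry.discover("Specimen")
```

Retiring keeps the version history in place, which is one way to keep MOS-60 (retiring) distinct from MOS-80 (deletion/unregistering).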

Metadata Services

Some basic description of metadata services.

ID	Requirement	Source	Release
MDS-1	The Metadata Service MUST support the notion of data elements.	ConOps Init 1	---
MDS-2	The Metadata Service MUST allow the recording of data elements.	ConOps Init 1	---
MDS-3	The Metadata Service MUST allow new data element assertions to be made.	ConOps Init 1	---
MDS-4	The Metadata Service MUST allow for data elements to be updated.	ConOps Init 1	---
MDS-5	The Metadata Service MUST allow for data elements to be versioned.	ConOps Init 1	---
MDS-6	The Metadata Service SHOULD allow for data element business rules to be validated.	ConOps Init 1	---
MDS-7	The Metadata Service MUST allow for data elements to be retired.	ConOps Init 1	---
MDS-8	The Metadata Service MUST allow for data elements to be deleted/rolled back.	ConOps Init 1	---
MDS-9	The Metadata Service MUST support the discovery of reusable data element content.	ConOps Init 1	---
MDS-10	The Metadata Service SHOULD support the notion of data element usage information.	ConOps Init 1	---
MDS-11	The Metadata Service MUST support metadata to be queried by data element.	ConOps Init 1	---
MDS-12	The Metadata Service SHOULD support metadata to be queried by data element in a federated manner.	ConOps Init 1	---
MDS-13	The Metadata Service MUST support the comparison of data elements.	ConOps Init 1	---
MDS-14	The Metadata Service SHOULD support the creation of a new data element from an existing data element.	ConOps Init 1	---
MDS-15	The Metadata Service MUST support the discovery of related models by data element.	ConOps Init 1	---
MDS-16	The Metadata Service SHOULD support the discovery of related metadata items by data element.	ConOps Init 1	---
MDS-17	The Metadata Service SHOULD support the discovery of related services by data element.	ConOps Init 1	---
MDS-18	The Metadata Service SHOULD support the discovery of related data element rules.	ConOps Init 1	---
MDS-19	The Metadata Service SHOULD support the discovery of forms related by data element.	ConOps Init 1	---
MDS-101	The Metadata Service MUST support the notion of value domains.	ConOps Init 1	---
MDS-102	The Metadata Service MUST allow the recording of value domains.	ConOps Init 1	---
MDS-103	The Metadata Service MUST allow new value domain assertions to be made.	ConOps Init 1	---
MDS-104	The Metadata Service MUST allow for value domains to be updated.	ConOps Init 1	---
MDS-105	The Metadata Service MUST allow for value domains to be versioned.	ConOps Init 1	---
MDS-106	The Metadata Service SHOULD allow for value domain business rules to be validated.	ConOps Init 1	---
MDS-107	The Metadata Service MUST allow for value domains to be retired.	ConOps Init 1	---
MDS-108	The Metadata Service MUST allow for value domains to be deleted/rolled back.	ConOps Init 1	---
MDS-109	The Metadata Service MUST support the discovery of reusable value domain content.	ConOps Init 1	---
MDS-110	The Metadata Service SHOULD support the notion of value domain usage information.	ConOps Init 1	---
MDS-111	The Metadata Service MUST support metadata to be queried by value domain.	ConOps Init 1	---
MDS-112	The Metadata Service SHOULD support metadata to be queried by value domain in a federated manner.	ConOps Init 1	---
MDS-113	The Metadata Service MUST support the comparison of value domains.	ConOps Init 1	---
MDS-114	The Metadata Service SHOULD support the creation of a new value domain from an existing value domain.	ConOps Init 1	---
MDS-115	The Metadata Service MUST support the discovery of related models by value domain.	ConOps Init 1	---
MDS-116	The Metadata Service SHOULD support the discovery of related metadata items by value domain.	ConOps Init 1	---
MDS-117	The Metadata Service SHOULD support the discovery of related services by value domain.	ConOps Init 1	---
MDS-118	The Metadata Service SHOULD support the discovery of related value domain rules.	ConOps Init 1	---
MDS-119	The Metadata Service SHOULD support the discovery of forms related by value domain.	ConOps Init 1	---
MDS-120	The Metadata Service SHOULD support the ability to create a value domain through subsetting/constraining.	---	---
MDS-121	The Metadata Service SHOULD allow value domains to be extended through the creation of new from existing.	---	---
MDS-130	The Metadata Service SHOULD support semantic transformations that would be explicitly based on value domain mapping, e.g. to the same value meaning.	ConOps Init 1	---
MDS-131	The Metadata Service MAY support syntactic transformations that would transform source value domain representation to target value domain representation.	ConOps Init 1	---
MDS-201	The Metadata Service MUST support the notion of data element concepts.	ConOps Init 1	---
MDS-202	The Metadata Service MUST allow the recording of data element concepts.	ConOps Init 1	---
MDS-203	The Metadata Service MUST allow new data element concept assertions to be made.	ConOps Init 1	---
MDS-204	The Metadata Service MUST allow for data element concepts to be updated.	ConOps Init 1	---
MDS-205	The Metadata Service MUST allow for data element concepts to be versioned.	ConOps Init 1	---
MDS-206	The Metadata Service SHOULD allow for data element concept business rules to be validated.	ConOps Init 1	---
MDS-207	The Metadata Service MUST allow for data element concepts to be retired.	ConOps Init 1	---
MDS-208	The Metadata Service MUST allow for data element concepts to be deleted/rolled back.	ConOps Init 1	---
MDS-209	The Metadata Service MUST support the discovery of reusable data element concept content.	ConOps Init 1	---
MDS-210	The Metadata Service SHOULD support the notion of data element concept usage information.	ConOps Init 1	---
MDS-211	The Metadata Service MUST support metadata to be queried by data element concept.	ConOps Init 1	---
MDS-212	The Metadata Service SHOULD support metadata to be queried by data element concept in a federated manner.	ConOps Init 1	---
MDS-213	The Metadata Service MUST support the comparison of data element concepts.	ConOps Init 1	---
MDS-214	The Metadata Service SHOULD support the creation of a new data element concept from an existing data element concept.	ConOps Init 1	---
MDS-215	The Metadata Service MUST support the discovery of related models by data element concept.	ConOps Init 1	---
MDS-216	The Metadata Service SHOULD support the discovery of related metadata items by data element concept.	ConOps Init 1	---
MDS-217	The Metadata Service SHOULD support the discovery of related services by data element concept.	ConOps Init 1	---
MDS-218	The Metadata Service SHOULD support the discovery of related data element concept rules.	ConOps Init 1	---
MDS-219	The Metadata Service SHOULD support the discovery of forms related by data element concept.	ConOps Init 1	---
MDS-220	The Metadata Service MUST support the discovery of related data elements by data element concept.	ConOps Init 1	---
MDS-221	The Metadata Service SHOULD support the discovery of related value sets by data element concept.	ConOps Init 1	---
MDS-230	The Metadata Service SHOULD support the creation of a data element from an existing data element concept and value domain.	ConOps Init 1	---
MDS-231	The Metadata Service SHOULD support the creation of a new data element concept from an existing data element concept.	ConOps Init 1	---
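The relationship between the three metadata item types above can be sketched in code. The example below is a simplified, hypothetical model, not the ISO 11179 metamodel itself: it composes a data element from a data element concept and a value domain (MDS-230), creates a constrained value domain by subsetting (MDS-120), and performs a semantic transformation by matching values that share the same value meaning (MDS-130). All names and the sample value meanings are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    name: str
    # Pairs of (permissible value, value meaning); meanings here are made up
    permissible_values: tuple

    def meanings(self):
        return dict(self.permissible_values)

    def subset(self, name, keep):
        """Create a constrained value domain from this one (MDS-120)."""
        kept = tuple((v, m) for v, m in self.permissible_values if v in keep)
        return ValueDomain(name, kept)


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str
    property_name: str


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    value_domain: ValueDomain


def translate(value, source, target):
    """Map a value between value domains via a shared value meaning (MDS-130)."""
    meaning = source.meanings()[value]
    for candidate, candidate_meaning in target.permissible_values:
        if candidate_meaning == meaning:
            return candidate
    raise LookupError(f"no value in {target.name!r} means {meaning!r}")


sex_codes = ValueDomain("Sex Code", (("M", "MALE"), ("F", "FEMALE"), ("U", "UNKNOWN")))
sex_numbers = ValueDomain("Sex Numeric", (("1", "MALE"), ("2", "FEMALE")))

# MDS-230: a data element built from an existing concept and value domain
patient_sex = DataElement(DataElementConcept("Patient", "Sex"), sex_codes)

# MDS-120: a narrower value domain derived from an existing one
known_only = sex_codes.subset("Sex Code (known)", {"M", "F"})

translated = translate("F", sex_codes, sex_numbers)
```

In this sketch a value with no counterpart in the target domain raises an error rather than being silently dropped, which keeps the transformation explicit about what it cannot map.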

Registry-Registry Service

ID	Requirement	Source	Release
RRS-1	The Registry-Registry service MUST support the import of content from one registry to another.	ConOps Init 1	---
RRS-10	The Registry-Registry service MUST support the export of content in a convenient data format.	ConOps Init 1	---
RRS-20	The Registry-Registry service SHOULD support the update of edited content from one registry to another.	ConOps Init 1	---
RRS-30	The Registry-Registry service MAY support the search of one registry from another.	ConOps Init 1	---
RRS-40	The Registry-Registry service MUST support content to be submitted from one registry to another.	ConOps Init 1	---
RRS-50	The Registry-Registry service MUST support the registration of content from one registry to another.	ConOps Init 1	---
RRS-60	The Registry-Registry service MUST support the updating of the registration of content from one registry to another.	ConOps Init 1	---
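One possible reading of the export, import, and update requirements (RRS-1, RRS-10, RRS-20) is sketched below. JSON is used as the "convenient data format" purely as an assumption, since the requirement does not name a format. Registries are represented as plain dictionaries, and the function names, item identifiers, and the rule that a higher version number wins are all hypothetical.

```python
import json


def export_content(registry, ids):
    """Serialize the selected items to a JSON document (format is an assumption)."""
    items = [{"id": item_id, **registry[item_id]} for item_id in sorted(ids)]
    return json.dumps({"items": items}, sort_keys=True)


def import_content(registry, payload):
    """Import items into a registry; newer versions replace older ones.

    Returns the identifiers that were added or updated.
    """
    changed = []
    for item in json.loads(payload)["items"]:
        item = dict(item)
        item_id = item.pop("id")
        current = registry.get(item_id)
        if current is None or item["version"] > current["version"]:
            registry[item_id] = item
            changed.append(item_id)
    return changed


source = {
    "DE-1": {"name": "Patient Sex", "version": 2},
    "DE-2": {"name": "Specimen Type", "version": 1},
}
target = {"DE-1": {"name": "Patient Sex", "version": 1}}

payload = export_content(source, source.keys())
first_pass = import_content(target, payload)
second_pass = import_content(target, payload)  # nothing left to update
```

Importing the same payload twice changes nothing the second time, which is a useful property when registries exchange content repeatedly.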

General Service

ID	Requirement	Source	Release
GEN-1	Services SHOULD support the notion of annotating any data with well defined concepts.	ConOps Init 1	---
GEN-10	Services SHOULD allow users to subscribe for notifications of changes to data.	ConOps Init 1	---
GEN-20	Services MUST support the notion of data reuse.	ConOps Init 1	---
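The subscription requirement (GEN-10) can be illustrated with a small in-process sketch. The general assumptions state that subscription and notification will be enabled by the caBIG pub/sub architecture; the class below is only a hypothetical stand-in for that architecture, with invented names, to show the shape of the interaction.

```python
from collections import defaultdict


class ChangeNotifier:
    """Hypothetical in-process stand-in for a publish/subscribe service."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # item id -> callbacks

    def subscribe(self, item_id, callback):
        """Register a callback to be invoked when the item changes."""
        self._subscribers[item_id].append(callback)

    def publish(self, item_id, change):
        """Notify every subscriber of the item; return how many were notified."""
        callbacks = self._subscribers.get(item_id, [])
        for callback in callbacks:
            callback(item_id, change)
        return len(callbacks)


received = []
notifier = ChangeNotifier()
notifier.subscribe("DE-1", lambda item_id, change: received.append((item_id, change)))
notified = notifier.publish("DE-1", "retired")
unheard = notifier.publish("DE-2", "updated")  # no subscribers for this item
```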

Metadata Registry Tools

ID	Requirement	Source	Release
MRT-1	A Metadata Registry Tool MUST support a clinician friendly browser.	ConOps Init 1	---
MRT-10	A Metadata Registry Tool MUST support an information specialist browser.	ConOps Init 1	---
MRT-20	A Metadata Registry Tool SHOULD support a customizable browser.	ConOps Init 1	---
MRT-30	A Metadata Registry Tool MAY support a generalized portal that integrates various metadata registry tools.	ConOps Init 1	---
MRT-40	A Metadata Registry Tool SHOULD allow workflow management to support ECCF artifact creation.	ConOps Init 1	---
MRT-50	A Metadata Registry Tool MUST support an interface for browsing/editing models and metadata with modeling tools.	ConOps Init 1	---

Non-functional Requirements

Performance

Performance refers to the qualitative or quantitative measure of how well a system reacts in a user workflow. This can be measured in time from a user or system perspective, as well as the amount of resources (CPU, memory, etc.) that software must consume to complete a task.

ID	Requirement
PE-1	Where not otherwise specified, web pages should be completely returned within seconds of request.

Auditing, Logging, and Provenance

Auditing, logging, and provenance is the process of recording events in an automated and/or manual way within a certain scope in order to provide an audit trail that can be used to understand the activity of the system and/or to diagnose problems.

ID	Requirement
ALP-1	The application will address Title 21 Code of Federal Regulations (21 CFR Part 11) Electronic Records where appropriate and reasonable.
ALP-10	The system must audit each and every user action that results in database access (read or write). Examples include: add/edit study or participant data, user login, query, etc. The audit information must contain the following: the user who performed the action; the IP address of the computer from which the action is performed; the timestamp of the action; the object and data element (i.e., table name and column name); and the previous value and current value of the data element.
ALP-20	Auditing information must be accessible in a timely manner to system administrators.
ALP-30	Auditing features must at least be available through standard database logging/auditing.
ALP-40	Logging must be implemented in all the architectural layers: presentation, business logic, and data access.
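The fields that ALP-10 requires in each audit entry map naturally onto a record type. The following is a minimal sketch under stated assumptions: the field names, the helper function, and the use of an in-memory list as the audit log are all hypothetical, and a real deployment would write to durable storage instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """One audit entry carrying the fields listed in ALP-10."""
    user: str
    ip_address: str
    timestamp: datetime
    table_name: str
    column_name: str
    previous_value: Optional[str]
    current_value: Optional[str]


def audit_update(log, user, ip_address, table_name, column_name,
                 previous_value, current_value, now=None):
    """Append an audit record to the log and return it.

    The timestamp defaults to the current UTC time; passing `now`
    makes the function deterministic for testing.
    """
    record = AuditRecord(
        user=user,
        ip_address=ip_address,
        timestamp=now or datetime.now(timezone.utc),
        table_name=table_name,
        column_name=column_name,
        previous_value=previous_value,
        current_value=current_value,
    )
    log.append(record)
    return record


audit_log = []
entry = audit_update(
    audit_log, "jdoe", "192.0.2.10", "data_element", "definition",
    "old text", "new text",
    now=datetime(2010, 1, 15, 12, 0, tzinfo=timezone.utc),
)
```

Making the record immutable (`frozen=True`) reflects the idea that an audit trail is appended to, not edited.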

Fault Handling

Fault handling is a mechanism designed to handle the occurrence of exceptions, special conditions that change the normal flow of software or user execution.

ID	Requirement
FH-1	Any runtime exceptions or errors must be reported to the user in a graphical window containing the probable cause of the problem and how to rectify it.
FH-10	The exceptions and errors shall be divided into two groups: (1) User errors; (2) OS and System/Application errors.
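The two error groups in FH-10 and the reporting content in FH-1 could be modeled as a small exception hierarchy. This is a sketch only: the class names, the message layout, and the fallback remedy text are assumptions, and the function returns the text for an error window rather than drawing one.

```python
class ReportedError(Exception):
    """Base class carrying a probable cause and a suggested remedy (FH-1)."""
    group = "Unclassified error"

    def __init__(self, cause, remedy):
        super().__init__(cause)
        self.cause = cause
        self.remedy = remedy


class UserError(ReportedError):
    """Errors the user can correct, such as invalid input (FH-10, group 1)."""
    group = "User error"


class SystemOrApplicationError(ReportedError):
    """OS and system/application errors (FH-10, group 2)."""
    group = "System/Application error"


def to_report(exc):
    """Build the text that an error window would display for an exception."""
    if isinstance(exc, ReportedError):
        return f"{exc.group}: {exc.cause} Suggested action: {exc.remedy}"
    # Unexpected exceptions fall into the system group with a generic remedy
    return (f"{SystemOrApplicationError.group}: {exc} "
            "Suggested action: Contact the system administrator.")


try:
    raise UserError("The version number is not an integer.", "Enter a whole number.")
except ReportedError as error:
    message = to_report(error)
```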

Quality of Service

Quality of service is the ability to provide sufficient uptime and availability of software to guarantee a certain level of performance for access and data flow.

ID	Requirement
QS-1	The system must be adequately validated during the system development lifecycle.
QS-10	The system must provide the functionality to generate and manage accurate data records during the development processes.
QS-20	The system development lifecycle must include validity checks for all data fields.

Usability

In design, Usability is the study of the ease with which people can employ a particular tool or other human-made object in order to achieve a particular goal.

ID	Requirement
US-1	An intuitive, user-friendly graphical user interface must be developed.
PE-10	Web page requests must resolve in a timely manner. Where not otherwise specified, this is on the order of seconds.
PE-20	The application will address section 508 of the Rehabilitation Act of 1973 where appropriate and reasonable.

Security

Security is the protecting of information and information systems from unauthorized access, use, disclosure, disruption, modification or destruction.

ID	Requirement
SE-1	The system must limit access to authorized individuals.
SE-10	Electronic Signatures should meet necessary requirements as described in 21 CFR part 11.
SE-20	The application will address section 508 of the Rehabilitation Act of 1973 where appropriate and reasonable.
SE-30	System developers must adhere to the caBIG Data Sharing & Intellectual Capital Policy and Procedures.

Portability

Portability is the property of a software codebase that allows existing code to be reused, rather than rewritten, when the software is moved from one environment to another. The prerequisite for portability is a generalized abstraction between the application logic and the system interfaces.

ID	Requirement
PO-1	Operating system native libraries should not be used.
PO-10	Paths on the local file system must not be hard-coded (for example, C:\myDir).
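A common way to satisfy PO-10 is to resolve file locations from configuration at run time. The sketch below assumes a hypothetical environment variable, `KR_DATA_DIR`, and a hypothetical default directory name; neither comes from this specification. It relies only on the standard library, in keeping with PO-1.

```python
import os
from pathlib import Path


def data_dir(env=None):
    """Resolve the export directory without hard-coding an absolute path.

    Uses the KR_DATA_DIR environment variable (a hypothetical name) when set,
    and otherwise falls back to a directory under the user's home directory.
    """
    env = os.environ if env is None else env
    configured = env.get("KR_DATA_DIR")
    base = Path(configured) if configured else Path.home() / ".knowledge-repository"
    return base / "exports"


configured_dir = data_dir({"KR_DATA_DIR": "shared/kr"})
default_dir = data_dir({})
```

Building paths with `pathlib` rather than string concatenation also avoids embedding a platform-specific separator in the code.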

Knowledge Repository Requirements Specification