Introduction
The NCI-supported caBIG program is facing a major expansion of its mission and of the community of stakeholders that it must engage. The success of the program to date has been widely recognized by numerous thought leaders, technology innovators, and program managers in the fields of biomedical computing and health information technology. These leaders have been influenced by both the technological sophistication and the open community processes of the caBIG program, and are seeking to adapt caBIG to meet the much broader agenda of health care information management, sharing, and integration across institutions, communities of practice, and patients.
To engage this larger community and contribute to solutions for its needs, the BIG Health Consortium was formed. It is a non-governmental organization that is affiliated with caBIG but has a complementary agenda, focused on enabling the integration of research, care delivery, and patient-driven decision making in practical ways. As a consequence, the institutional stakeholders associated with the BIG Health Consortium extend well beyond the cancer centers, universities, and cooperative groups that make up the core NCI-funded constituencies that have participated in caBIG since its original incarnation. The extended community thus goes beyond what is considered caBIG-proper today, and includes groups that are not yet connected to caGrid.
Background
The primary goal of the Architecture, Development, and Deployment of a Knowledge Repository and Service program is to address the needs of this extended community for a scalable, decentralized infrastructure for managing and disseminating operational metadata and information models, and their associated semantic constructs. The vision for the project is to re-imagine the caBIG technology environment as a more open and more readily extensible framework, one that can grow with less dependency on the centralized processes and systems that characterize the first generation of caBIG technology. In particular, the role of the central metadata registry, the caDSR, must be redefined as a federation of metadata registries that can be instantiated and plugged into the caBIG grid or an extended community "cloud" by any qualified entity.
The caDSR has a suite of tools and APIs that support workflows for metadata development, browsing, and retrieval. In addition, the caDSR has been adapted to support the UML model-driven development paradigm adopted by caBIG. UML-defined information models such as those from the BRIDG project, caArray, caTissue, and others are each registered in the caDSR through conversion of the model elements into ISO 11179 metadata constructs. This functionality, and the workflows that it supports, has evolved over an 8-year period and is now quite mature. It satisfies the requirements for semantic representation in the current caBIG developer and user community, but it is ill-suited to serve the new requirements for decentralization and indefinite scalability in the broader health care community. The goal of this program is therefore to harvest and recycle the best elements of the first generation of caBIG metadata infrastructure, and then to incorporate those elements into a redesigned and modernized technology stack that is engineered from the start to support a federated deployment topology with far less centralized administration.
Scope
This program calls for the design and implementation of a metadata registry and model repository that can be instantiated in multiple physical locations in a federated manner, and which supports the caBIG and BIG Health community needs for operational semantic metadata management and integration. The program must provide a number of capabilities and services:
- Design and implementation of the ISO 11179 Edition 3 standard in a metadata repository (MDR) architecture. The current caDSR implements Edition 2 of the standard, and the NCI has published a gap analysis between caDSR and ISO 11179 Edition 3 that will inform this effort.
- Implementation of an information Model Repository that supports caBIG/BIG Health information model registration and alignment of models with ISO 11179 metadata elements.
- Extension of the metadata standard and associated workflow tools to cleanly accommodate ISO 21090 data types. Some work has been done to represent ISO 21090 in caDSR, but a number of gaps remain.
- Service-oriented programming interfaces that can be registered and exposed in caGrid, and which are compliant with emerging W3C standards.
- Tools to support an updated and revised semantic integration workflow that enables decentralized and localized additions of models and data elements.
- Transformation services that can traverse and extract the various semantic metadata constructs, including information models, data elements, and associated Description Logic (DL)-based terminologies. The preferred or native format for the representations should be available through these services rather than forcing users to interpret one format through the lens of another. Moreover, higher level services that cater to use cases for run-time semantics of consumer systems will need to be defined and implemented.
- Migration path from the current caDSR and associated tooling to the new MDR and Model Repository architecture.
Users and Characteristics
An Actor models a type of role played by an entity that interacts with the subject (e.g., by exchanging signals and data), but which is external to the subject. Actors may represent roles played by human users, external hardware, or other subjects. Note that an actor does not necessarily represent a specific physical entity but merely a particular facet (i.e., "role") of some entity that is relevant to the specification of its associated use cases. Thus, a single physical instance may play the role of several different actors and, conversely, a given actor may be played by multiple different instances.
While UML 2 does not permit associations between Actors, this constraint is often violated in practice since the generalization/specialization relationship between actors is useful in modeling overlapping behaviours between actors. The actors below are represented as having a hierarchical relationship for ease of understanding; however, these relationships can easily be removed.
- Cancer Researcher: plans and performs activities related to discovery of new knowledge, drugs, and treatments in the field of oncology
- Clinical Researcher: works directly with patients and/or patient data while performing cancer research
- Basic Science Researcher: works with scientifically generated data while performing cancer research
- Protocol Designer: defines the methods used to perform cancer research
- Information Technologist: designs, develops, and manages the software and hardware necessary to perform cancer research
- Business Analyst: analyzes the business processes and describes the goals and activities of the user community
- Information Modeler: designs, defines, and describes the data that will be captured during cancer research activities
- Software Engineer: implements the software systems that are used to manage and perform cancer research
- Systems Architect: designs the software and hardware systems that are used to manage and perform cancer research
- System Administrator: manages the software and hardware systems that are used to perform cancer research
- Metadata Specialist: has a deep understanding of semantics and syntax, and assists Cancer Researchers and Information Technologists in modeling and managing metadata
- Forms Author: constructs data collection forms based on a library of common data elements
- Metadata Curator: works hands-on with Cancer Researchers and Information Technologists to model and manage their metadata
- Metadata Systems Specialist: manages centralized metadata systems and assists with the design of metadata systems
- Terminologist: a metadata expert who manages and maintains the semantic concepts that underlie information models
- Compatibility Reviewer: while the nature of "caBIG Compatibility" may change as need dictates, the role of the Compatibility Reviewer will likely continue to be a Metadata Specialist that reviews a variety of documents/artifacts to determine the level of interoperability that a system meets
- Patient: any person who receives medical attention, care, or treatment
- Subject: a Patient who is participating in Cancer Research
Related Documentation
The following is a list of documents that provide background material, requirements, and related topics:
- Con Ops Supplemental Page for Requirements Gathering: https://wiki.nci.nih.gov/display/VCDE/Supplemental+Page+for+Requirements+Gathering
- Semantic Requirements Forum: https://cabig-kc.nci.nih.gov/Vocab/forums/viewforum.php?f=34
- Con Ops Stakeholders: https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/SI_Conop_Stakeholders
- Con Ops Requirements: https://wiki.nci.nih.gov/display/VCDE/Requirements+Questionnaires
- Con Ops Use Cases: https://wiki.nci.nih.gov/display/VCDE/Use+Cases+for+Semantic+Requirements
Terms & Definitions
Term | Definition |
---|---|
MUST | This word means that the definition is an absolute requirement of the specification. |
MUST NOT | This phrase means that the definition is an absolute prohibition of the specification. |
WILL | This word means that the definition is an absolute future requirement of the specification. |
WILL NOT | This phrase means that the definition is an absolute future prohibition of the specification. |
SHOULD | This word means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course. |
SHOULD NOT | This phrase means that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label. |
MAY | This word means that a requirement is truly optional. The developer may choose to include the item based on the needs of their design. |
Assumptions and Dependencies
Usability
The user interface shall be designed for ease-of-use by the designated end-users, shall use terms common to the user's normal business environment, and shall require little to no additional training on the system. Drop down menus, Google-like searches, and tooltips should be used wherever appropriate.
Accessibility
The user interface should be accessible via a web browser.
General Assumptions
Following is a list of basic assumptions:
- Federated Discovery Services: Knowledge Repository Services will be distributable and discoverable in a federated manner
- Security Considerations: services and underlying data will be secured using the caBIG security architecture
- Subscriptions/Notifications: services requiring subscription and notification functionality will be enabled by the caBIG pub/sub architecture
- ISO 11179 Ed 3: the Knowledge Repository will comply with ISO 11179 Ed 3 specifications
- Ability to have non-administered items: the Knowledge Repository will provide for non-administered items in the ISO 11179 notion
- Duplication/identification: in compliance with the Federated Discovery Services assumption, reference of data, data duplication, and unique identification of data will be handled
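The Duplication/identification assumption can be illustrated with a minimal sketch. It assumes, purely for illustration, that every registered item carries a three-part identifier (issuing registry, item identifier, version) and that a copy replicated to another node keeps the identifier it was issued. The names `ItemId` and `merge_results` and the sample registries are hypothetical and are not part of any caBIG interface.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ItemId:
    """Hypothetical three-part identifier for a registered item."""
    authority: str  # which registry issued the item
    item: str       # identifier of the item within that registry
    version: str    # version of the item


def merge_results(*result_sets: Iterable[Tuple[ItemId, str]]) -> Dict[ItemId, str]:
    """Combine results from several registries, keeping one entry per identifier."""
    merged: Dict[ItemId, str] = {}
    for results in result_sets:
        for item_id, label in results:
            # A replicated copy shares the identifier of the original,
            # so only the first occurrence is kept.
            merged.setdefault(item_id, label)
    return merged


node_a = [(ItemId("registry-a", "1001", "1.0"), "Patient Birth Date")]
node_b = [
    (ItemId("registry-a", "1001", "1.0"), "Patient Birth Date"),  # replicated copy
    (ItemId("registry-b", "1001", "1.0"), "Specimen Type"),       # same number, other registry
]
merged = merge_results(node_a, node_b)
```

Because the issuing registry is part of the identifier, two registries can assign the same local number without colliding, while a replicated copy collapses into a single entry.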
Dependencies
TBD
Functional Requirements
Model Services
Some basic description of model services.
ID | Requirement |
---|---|
MOS-1 | Record |
MOS-2 | New assertions |
MOS-3 | Rules/results |
MOS-4 | Queries/assertions |
MOS-10 | Discover |
MOS-11 | Federated query |
MOS-12 | Find related metadata |
MOS-20 | Compare |
MOS-21 | Harmonize |
MOS-30 | Merge |
MOS-40 | Graphical/visualize |
MOS-50 | Version |
MOS-60 | Retire |
MOS-70 | Update |
MOS-80 | Delete/unregister |
MOS-90 | Transform |
MOS-100 | Validate |
MOS-110 | Reuse |
MOS-111 | Copy |
MOS-120 | Extend |
MOS-121 | Extend classes |
MOS-122 | Add relationships |
MOS-123 | Extend attribute value list |
MOS-130 | Constrain |
MOS-131 | Subset classes |
MOS-132 | Subset/constrain attribute value lists |
MOS-140 | Find related services |
MOS-150 | Find related documentation |
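As one possible reading of MOS-123 (extend attribute value list) and MOS-132 (subset/constrain attribute value lists), the sketch below treats an attribute value list as an ordered list of strings. The function names and the sample values are assumptions made for illustration; the requirements do not prescribe a representation.

```python
from typing import Iterable, List


def extend_values(base: Iterable[str], additions: Iterable[str]) -> List[str]:
    """Return the base list followed by any additions not already present."""
    # dict.fromkeys keeps first-seen order and drops repeated values.
    return list(dict.fromkeys([*base, *additions]))


def constrain_values(base: Iterable[str], subset: Iterable[str]) -> List[str]:
    """Return only the requested values, rejecting any that the base list lacks."""
    base_list = list(base)
    wanted = set(subset)
    unknown = wanted - set(base_list)
    if unknown:
        # A constraint may narrow the list but must never introduce new values.
        raise ValueError(f"not in the base value list: {sorted(unknown)}")
    return [value for value in base_list if value in wanted]


specimen_types = ["Blood", "Tissue", "Urine"]
extended = extend_values(specimen_types, ["Saliva", "Blood"])
constrained = constrain_values(specimen_types, ["Tissue", "Blood"])
```

Both functions return new lists and leave the base list untouched, which keeps the original model unchanged when a local extension or constraint is derived from it.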
Metadata Services
Some basic description of metadata services.
ID | Requirement |
---|---|
MDS-1 | Data Element |
MDS-2 | Record (no rules) |
MDS-3 | New assertions |
MDS-4 | Update |
MDS-5 | Version |
MDS-6 | Validate business rules |
MDS-7 | Retire |
MDS-8 | Delete/rollback |
MDS-9 | Discover reusable content |
MDS-10 | Usage information |
MDS-11 | Search/query |
MDS-12 | Federated search/query |
MDS-13 | Compare |
MDS-14 | Create new from existing |
MDS-15 | Discover related models |
MDS-16 | Discover related metadata items |
MDS-17 | Discover related services |
MDS-18 | Discover related rules |
MDS-19 | Discover related forms |
MDS-101 | Value Domain |
MDS-102 | Record (no rules) |
MDS-103 | New assertions |
MDS-104 | Update |
MDS-105 | Version |
MDS-106 | Validate business rules |
MDS-107 | Retire |
MDS-108 | Delete/rollback |
MDS-109 | Discover reusable content |
MDS-110 | Usage information |
MDS-111 | Search/query |
MDS-112 | Federated search/query |
MDS-113 | Compare |
MDS-114 | Create new from existing |
MDS-115 | Discover related models |
MDS-116 | Discover related metadata items |
MDS-117 | Discover related services |
MDS-118 | Discover related rules |
MDS-119 | Discover related forms |
MDS-120 | Create subset/constrain |
MDS-120 | Extend (create new from existing) |
MDS-120 | Semantic transformations (explicitly based on mapping, e.g. to the same Value Meaning) |
MDS-120 | Syntactic transformations (source representation to target representation) |
MDS-201 | Data Element Concept |
MDS-202 | Record (no rules) |
MDS-203 | New assertions |
MDS-204 | Update |
MDS-205 | Version |
MDS-206 | Validate business rules |
MDS-207 | Retire |
MDS-208 | Delete/rollback |
MDS-209 | Discover reusable content |
MDS-210 | Usage information |
MDS-211 | Search/query |
MDS-212 | Federated search/query |
MDS-213 | Compare |
MDS-214 | Create new from existing |
MDS-215 | Discover related models |
MDS-216 | Discover related metadata items |
MDS-217 | Discover related services |
MDS-218 | Discover related rules |
MDS-219 | Discover related forms |
MDS-220 | Discover related data elements |
MDS-222 | Discover related value sets |
MDS-223 | Create data element from existing data element concept and value domain |
MDS-224 | Create new data element concept from existing data element concept |
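The relationship behind MDS-223 (a data element built from an existing data element concept and value domain) can be sketched with a few immutable records, together with the Version and Retire operations listed for each item type. The class names, fields, and sample values below are illustrative assumptions, not the ISO 11179 Edition 3 metamodel itself.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class DataElementConcept:
    name: str


@dataclass(frozen=True)
class ValueDomain:
    name: str
    permitted_values: Tuple[str, ...]


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    domain: ValueDomain
    version: int = 1
    retired: bool = False


def create_data_element(concept: DataElementConcept, domain: ValueDomain) -> DataElement:
    """Pair an existing concept with an existing value domain (cf. MDS-223)."""
    return DataElement(concept=concept, domain=domain)


def new_version(element: DataElement) -> DataElement:
    """Return a copy with the version incremented; the original is unchanged."""
    return replace(element, version=element.version + 1)


def retire(element: DataElement) -> DataElement:
    """Return a copy flagged as retired rather than deleting the record."""
    return replace(element, retired=True)


concept = DataElementConcept("Patient Gender")
domain = ValueDomain("Gender Code", ("Female", "Male", "Unknown"))
element = create_data_element(concept, domain)
retired_v2 = retire(new_version(element))
```

Using frozen records means every Update, Version, or Retire step yields a new object, so earlier versions stay available for comparison and rollback.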
Non-functional Requirements
Performance
Performance refers to the qualitative or quantitative measure of how well a system reacts in a user workflow. This can be measured in time from a user or system perspective, as well as the amount of resources (CPU, memory, etc.) that software must consume to complete a task.
ID | Requirement |
---|---|
PE-1 | Where not otherwise specified, web pages should be completely returned within seconds of request. |
Auditing, Logging, and Provenance
Auditing, logging, and provenance is the process of recording events in an automated and/or manual way within a certain scope in order to provide an audit trail that can be used to understand the activity of the system and/or to diagnose problems.
ID | Requirement |
---|---|
ALP-1 | The application will address Title 21 Code of Federal Regulations (21 CFR Part 11) Electronic Records where appropriate and reasonable. |
ALP-10 | The system must audit each and every user action that results in database access (read or write). Examples include: add/edit study or participant data, user login, query, etc. |
ALP-20 | Auditing information must be accessible in a timely manner to system administrators. |
ALP-30 | Auditing features must at least be available through standard database logging/auditing. |
ALP-40 | Logging must be implemented in all the architectural layers: presentation, business logic, and data access. |
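One way to approach ALP-10 in the data access layer is a decorator that writes an audit record for every call, whether it succeeds or fails. This is a minimal sketch using Python's standard `logging` module; the logger name `audit`, the `audited` decorator, and the `read_participant` function are hypothetical, and a real deployment would still need the database-level auditing named in ALP-30.

```python
import functools
import logging

# Dedicated logger so audit records can be routed separately from debug logs.
audit_log = logging.getLogger("audit")


def audited(action):
    """Decorate a data access function so that every call is audited."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            try:
                result = func(user, *args, **kwargs)
            except Exception:
                # Failed attempts are part of the audit trail too.
                audit_log.info("user=%s action=%s outcome=error", user, action)
                raise
            audit_log.info("user=%s action=%s outcome=ok", user, action)
            return result
        return wrapper
    return decorator


@audited("read_participant")
def read_participant(user, participant_id):
    """Stand-in for a database read; returns a fixed record for illustration."""
    return {"id": participant_id}
```

The same decorator could wrap business logic and presentation handlers with different action names, which is one way to cover the layers listed in ALP-40.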
Fault Handling
Fault handling is a mechanism designed to handle the occurrence of exceptions, special conditions that change the normal flow of software or user execution.
ID | Requirement |
---|---|
FH-1 | Any runtime exceptions or errors must be reported to the user in a graphical window containing the probable cause of the problem and how to rectify it. |
FH-10 | The exceptions and errors shall be divided into two groups: |
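FH-1 asks that each reported error carry a probable cause and a way to rectify it. A minimal sketch of that idea follows: an exception type that holds both pieces of information, and a function that produces the text a graphical window could display. The names `UserFacingError` and `describe`, and the fallback wording, are assumptions for illustration; the sketch does not attempt the grouping referred to in FH-10.

```python
class UserFacingError(Exception):
    """An error that carries the extra information FH-1 asks for."""

    def __init__(self, message, probable_cause, remedy):
        super().__init__(message)
        self.probable_cause = probable_cause
        self.remedy = remedy


def describe(error):
    """Build the text that an error window would show for any exception."""
    if isinstance(error, UserFacingError):
        cause, remedy = error.probable_cause, error.remedy
    else:
        # Unexpected errors still get a complete, if generic, report.
        cause, remedy = "unknown", "contact the system administrator"
    return f"{error}\nProbable cause: {cause}\nHow to fix: {remedy}"


try:
    raise UserFacingError(
        "Model could not be registered",
        "the model file is not well-formed",
        "re-export the model and try again",
    )
except UserFacingError as exc:
    report = describe(exc)
```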
Quality of Service
Quality of service is the ability to provide sufficient uptime and availability of the software to guarantee a certain level of performance for access and data flow.
ID | Requirement |
---|---|
QS-1 | The system must be adequately validated during the system development lifecycle. |
QS-10 | The system must provide the functionality to generate and manage accurate data records during the development processes. |
QS-20 | The system development lifecycle must include validity checks for all data fields. |
Usability
In design, Usability is the study of the ease with which people can employ a particular tool or other human-made object in order to achieve a particular goal.
ID | Requirement |
---|---|
US-1 | An intuitive, user-friendly graphical user interface must be developed. |
PE-10 | Web page requests must resolve in a timely manner. Where not otherwise specified, this is on the order of seconds. |
PE-20 | The application will address section 508 of the Rehabilitation Act of 1973 where appropriate and reasonable. |
Security
Security is the protection of information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction.
ID | Requirement |
---|---|
SE-1 | The system must limit access to authorized individuals. |
SE-10 | Electronic Signatures should meet necessary requirements as described in 21 CFR Part 11. |
SE-20 | The application will address section 508 of the Rehabilitation Act of 1973 where appropriate and reasonable. |
SE-30 | System developers must adhere to the caBIG™ Data Sharing & Intellectual Capital Policy and Procedures. |
Portability
Portability is the ability to reuse an existing codebase, rather than writing new code, when moving software from one environment to another. The prerequisite for portability is a generalized abstraction between the application logic and the system interfaces.
ID | Requirement |
---|---|
PO-1 | Operating system native libraries should not be used. |
PO-10 | Local file system paths must not be hard-coded (for example, C:\myDir). |
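PO-10 can be met by resolving every file location from configuration at run time. The sketch below is one way to do that with Python's `pathlib`; the environment variable `KR_DATA_DIR`, the default directory name, and the `models` subdirectory are hypothetical choices made for illustration.

```python
import os
from pathlib import Path


def data_dir(env=os.environ):
    """Resolve the data directory from configuration instead of a literal path."""
    configured = env.get("KR_DATA_DIR")  # hypothetical setting name
    if configured:
        return Path(configured)
    # Fall back to a directory under the current user's home on any platform.
    return Path.home() / ".knowledge-repository"


def model_file(name, env=os.environ):
    """Build the path to a stored model without assuming a path separator."""
    return data_dir(env) / "models" / name


example = model_file("example.xmi", {"KR_DATA_DIR": "data"})
```

Joining path segments with `/` on `Path` objects lets the same code produce correct paths on Windows and on Unix-like systems.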