NIH | National Cancer Institute | NCI Wiki  

caDSR Overview

The Cancer Data Standards Registry (caDSR) consists of both a database and a toolset used to create, edit, and deploy data elements for metadata consumers.

The caDSR is a metadata registry based upon

Metadata Needs and Support

The caDSR (Cancer Data Standards Registry and Repository) is a metadata repository based on the ISO/IEC 11179 Metadata Registry standard.

The idea of standardizing and registering metadata addresses a significant problem in biomedical data management: the wide variety of ways that similar data are collected and described. Metadata is defined as "data about data" or more simply, the description of a piece of information.

The fundamental unit of data in the ISO/IEC 11179 standard is called a data element. According to the ISO metadata standard, any item represented by a data element has two distinct parts: an explicit definition that is independent of any particular implementation, and an explicit description of implementation-specific details regarding how the item is represented in computer storage. Capturing these two aspects makes it possible to compare data elements that describe the same thing across different applications, and to understand what data transformation may be necessary in order to make the data comparable.
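
The two-part structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and field names (`DataElement`, `Representation`, `definition`, `unit`) are hypothetical and are not taken from the ISO/IEC 11179 metamodel or from the caDSR schema, and matching definitions by exact string equality is a simplification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Representation:
    """Implementation-specific details (hypothetical fields)."""
    datatype: str
    unit: str = ""
    permitted_values: Tuple[str, ...] = ()


@dataclass(frozen=True)
class DataElement:
    """A data element: an implementation-independent definition
    plus a description of how the value is represented."""
    name: str
    definition: str
    representation: Representation


def same_meaning(a: DataElement, b: DataElement) -> bool:
    # Assumption: two elements describe the same thing when their
    # definitions match exactly. Real comparison would be richer.
    return a.definition == b.definition


def needs_transformation(a: DataElement, b: DataElement) -> bool:
    # Same meaning, different representation: the data must be
    # converted before values from the two systems are comparable.
    return same_meaning(a, b) and a.representation != b.representation


weight_kg = DataElement(
    name="Patient Weight (kg)",
    definition="Body weight of the participant",
    representation=Representation(datatype="number", unit="kg"),
)
weight_lb = DataElement(
    name="Patient Weight (lb)",
    definition="Body weight of the participant",
    representation=Representation(datatype="number", unit="lb"),
)

print(needs_transformation(weight_kg, weight_lb))
```

Keeping the definition separate from the representation is what lets the comparison say "these describe the same thing, but a unit conversion is needed" rather than treating the two elements as unrelated.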

caCORE-like systems follow an object-oriented paradigm where classes of data are described using UML models. A UML model, serialized into XMI, can then be used to transform the UML model objects into caDSR registered items. Once registered, the items in caDSR can be re-used in other systems' models. If different systems are using the same registered terms (metadata) for the data in their models, those systems can more easily communicate and share information.

The caDSR itself is a database that contains Administered Items. As defined in the ISO/IEC 11179 standard, an Administered Item is an item for which administrative information must be recorded. The item may be a Data Element or one of the associated components that comprise a Data Element. caDSR administered items are supported by the use of externally defined terminologies and controlled vocabularies, such as the NCI Thesaurus.

To support the database, the caDSR also has a suite of tools for creating, sharing, and deploying data elements (also called common data elements or CDEs). This suite includes a public CDE Browser that enables you to search for data elements, create forms, and download CDEs, and a UML Model Browser that makes it easier to find CDEs that are registered as part of UML modeling projects. All of the caDSR tools and interfaces connect to the same central database. Links to further information regarding the caDSR toolset appear elsewhere on this page.

By complying with the ISO/IEC 11179 standard, caDSR provides, among other things, a semantic bridge between the data elements contained in registered data objects and standard vocabularies and ontologies. caDSR was originally designed to support the development and deployment of data elements as metadata descriptors for NCI-sponsored research, but now supports an ever-widening group of users and metadata consumers in caBIG™.

caDSR has also been defined as the "cancer Data Standards Repository"; however, the caDSR is a registry rather than a repository. Simply defined, a registry contains references to things, whereas a repository holds things. For a more detailed distinction, see http://searchsoa.techtarget.com/tip/0,289483,sid26_gci1103660,00.html.

caDSR Database and Implementation

The caDSR is based on an Oracle database. All of the various tools and interfaces connect to the same central database.

The software applications that access caDSR content are based on open source standards and are freely available for use by other government agencies and for download and use by interested parties.

caDSR follows the ISO/IEC 11179 Information technology: Metadata registries (MDR) standard to harmonize, register, and integrate user-defined UML information models with existing and new caDSR content and to represent the CDEs in the database. This standard is somewhat complex, but it offers a richly expressive model for metadata that supports the variations needed for biomedical applications. If you are interested in working with the caDSR, please take some time to review the background material on the way we have implemented the ISO/IEC 11179 standard.

In addition to implementing the ISO/IEC 11179 model, we have added a few additional types of content to the caDSR. The two most important additional items are Forms and Protocols.

A Form is simply a collection of CDEs, and a Protocol is a collection of Forms. For clinical trials applications, the Forms correspond to Case Report Forms (CRFs), and Protocols correspond to a clinical trial protocol.
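
The containment relationship described above (a Protocol holds Forms, a Form holds CDEs) can be sketched as follows. This is a minimal illustration, not the caDSR data model: the class names, the `cde_ids` field, and the identifiers `CDE-A` through `CDE-C` are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Form:
    """A form is a collection of CDEs, referenced here by placeholder IDs."""
    name: str
    cde_ids: List[str] = field(default_factory=list)


@dataclass
class Protocol:
    """A protocol is a collection of forms."""
    name: str
    forms: List[Form] = field(default_factory=list)

    def all_cde_ids(self) -> List[str]:
        # Collect every CDE used anywhere in the protocol,
        # keeping first-seen order and dropping repeats.
        seen: List[str] = []
        for form in self.forms:
            for cde_id in form.cde_ids:
                if cde_id not in seen:
                    seen.append(cde_id)
        return seen


enrollment = Form("Enrollment Form", ["CDE-A", "CDE-B"])
treatment = Form("Treatment Form", ["CDE-B", "CDE-C"])
protocol = Protocol("Example Protocol", [enrollment, treatment])

print(protocol.all_cde_ids())
```

Because forms only reference CDEs, the same CDE (here `CDE-B`) can appear on several forms without being redefined.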

Template forms are generic forms that can be used as the basis for creating the actual forms used in a Protocol. Templates are stored both as the collection of CDEs that make up the form and as an MS Word or PDF file that shows the CDEs laid out.

caDSR Tools

The caDSR and semantic tools include the following.

Tool name | Wiki home page | Production tool
Administration Tool |  | http://cadsradmin.nci.nih.gov/ (Login required)
Sentinel Tool |  | http://cadsrsentinel.nci.nih.gov/cadsrsentinel/do/logon (Login required)
CDE Browser |  | ...
Form Builder |  | ...
Freestyle Search |  | http://freestyle.nci.nih.gov/
UML Model Browser |  | ...
CDE Curation Tool |  | http://cdecurate.nci.nih.gov/cdecurate/ (Login required)

The caDSR supports a broad community of users both inside and outside NCI who need to ensure the longevity and consistency of biomedical research data by registering metadata standards in caDSR.

What is a metadata standard? A metadata standard is a high-level document that establishes a common way of structuring and understanding data, and includes principles and implementation issues for using the standard. Many metadata standards exist for specific disciplines.

Content owners and end users include NCI and its partners in clinical trials, academic institutions (including NCI Designated Cancer Centers, SPOREs and NCTN/ETCTN), other NIH institutes (including NICHD, NHLBI, NCATS and NIDCR), other federal agencies (in particular the FDA), pharmaceutical companies, standards development organizations (e.g. CDISC) and a range of international biomedical organizations. 

Requirements from researchers and/or their supporting informatics groups drive the creation of metadata in the caDSR. Metadata content development usually starts with a request for assistance by a researcher planning clinical or research data collection. Metadata curators work with the user and EVS to identify appropriate vocabulary while identifying a mix of new and existing CDE content to support the scientific requirement. Curators always attempt to reuse existing metadata (where that content supports the scientific requirement) as a way to help scientists ensure the compatibility of their data with other data collected across the enterprise, and sometimes researchers request that their content be harmonized with specific existing projects.

The word cloud that follows illustrates the broad variety of collections of data elements that are stored in the caDSR for various communities and types of studies.

[Image: caDSR word cloud]

About caDSR

CBIIT’s management of metadata began as part of an effort to support CTEP’s reporting for breast cancer trials, and from a need to develop and disseminate standards that would ensure consistency and accuracy in reporting across the NCI Clinical Trial Network (NCTN/ETCTN) and Lead Protocol Organizations (LPOs). This led to the establishment of a centralized resource and associated web-based tools for creating, clearly documenting, and sharing human- and machine-readable data descriptions. The need to maintain and share data about data, or metadata, became the basis for the NCI’s repository of CDEs, metadata, and data standards, now known as the caDSR. A CDE Steering Committee was formed to define what kind of metadata was needed for the repository. Driven by the community’s need to create, share, and manage CDEs over time, a set of metadata attributes was established, including human-friendly names, text definitions, valid values, unique identifiers, and workflow status. Consultation with appropriate experts identified ISO 11179, an international standard for metadata registries, as meeting the needs identified by the CDE Steering Committee. As time went on, more groups wanted to record their data elements and share them via the caDSR, so additional features were added, including extensions of ISO 11179 to enable storage of metadata describing Case Report Forms (CRFs) that use CDE metadata as the basis for questions on the CRFs.

As more groups recorded their data elements in caDSR, creating high-quality names and definitions for data elements became recognized as a best practice for clarifying the meaning of the data, but also as a challenge for data-element curators. Consequently, ISO 11179-5 Naming Principles were used to establish naming conventions that could be applied across groups. The same naming conventions are used in the National Information Exchange Model (NIEM). Since NCI had the EVS terminology services available, these were seen as a reasonable means to aid this task by giving curators access to well-formed, NCI-preferred names for the concepts that form the names of CDEs. At present, curators find concepts in EVS through synonym or concept ID searches, and the EVS preferred-term name and definition streamline their task. The challenge of ensuring that duplicate CDEs were not created led to leveraging parts of the ISO 11179 metamodel, along with a preference for NCIt concepts, so that the system could semi-automatically recognize and promote reuse of existing content. NCIt is a specialized cancer terminology that includes additional knowledge from the literature about these concepts and is modeled as an OWL ontology where concepts have various types of relationships and mappings defined to other concepts. It also includes mappings to the UMLS where they exist. Therefore, a link to NCIt concepts from caDSR content can help test similarity between CDEs. Links to NCIt concepts also give researchers a way to explore in greater depth the meaning of data that conform to a given CDE.
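
One way to picture how shared concept links can help test similarity between CDEs is a simple set-overlap score. The sketch below uses Jaccard overlap purely as an illustration; the source does not say which measure, if any, the caDSR tools use. The function name and the concept codes (`CONCEPT-1` and so on) are hypothetical placeholders, not real NCIt identifiers.

```python
from typing import Set


def concept_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between the concept codes linked to two CDEs.

    Returns 1.0 when both CDEs link to exactly the same concepts,
    and 0.0 when they share none (or when neither has any links).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Placeholder concept codes attached to three hypothetical CDEs.
cde_1 = {"CONCEPT-1", "CONCEPT-2"}
cde_2 = {"CONCEPT-1", "CONCEPT-2"}
cde_3 = {"CONCEPT-2", "CONCEPT-3"}

print(concept_overlap(cde_1, cde_2))  # identical concept sets
print(concept_overlap(cde_1, cde_3))  # one shared concept out of three
```

A high score flags a candidate for reuse rather than proving a duplicate; a curator would still need to review the definitions and valid values.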

Although this activity began as a means to support CTEP’s trials networks, the caDSR now supports a much wider audience. This includes clinical trials run by the NCI intramural program, the Center for Cancer Research (CCR), and the Division of Cancer Prevention (DCP); Specialized Programs of Research Excellence (SPOREs), Cancer Centers, and other academic medical centers; other NIH institutes and centers; standards groups such as CDISC; and a variety of international partners.

A note on the term “Common Data Element” (CDE). While originally intended to mean a data element that was reused across groups, the term has come to mean any description of a variable and its valid values. In this document, we will use the standard NCI version of this definition, which is to say a variable description (including valid values) described in the caDSR using its implementation of the ISO 11179 variable, regardless of whether the element in question has been used more than once.

This Wiki Space

This is the home page for the caDSR wiki space. You may edit pages if you are working on them with the authors. You are welcome to leave comments. This wiki space includes the following pages:

[Page tree of the caDSR wiki space]

Documentation

For a complete list of current caDSR Tool user documentation, application guides, release notes, and FAQs, see the caDSR Documentation wiki page.

List/Forum | Email address or URL | Description
caDSR Users List | CADSR_USERS@LIST.NIH.GOV | Archive for content users such as Curators
caDSR Developers List | CADSR_SOFTWARE_DEVELOPERS@LIST.NIH.GOV | Archive for developers using caDSR Metadata, such as UML Model owners (subscription required)
caDSR Tools Download List | CADSR_TOOLS_DOWNLOAD@LIST.NIH.GOV | For adopters
All NIH List Servers | No email address | Index of all NIH mail lists
No list/forum | NCIAppSupport@mail.nih.gov | NCI Application Support

How to Cite caDSR

To cite the NCI Semantic Infrastructure, use the following reference.

Komatsoulis, G.A., Warzel, D.B., Hartel, F.W., Shanbhag, K., Chilukuri, R., Fragoso, G., de Coronado, S., Reeves, D.M., Hadfield, J.B., Ludet, C., and Covitz, P.A. (2007) "caCORE version 3: Implementation of a model driven, service-oriented architecture for semantic interoperability." Journal of Biomedical Informatics. 2008 February; 41(1): 106-123. Published online 2007 April 2. doi: 10.1016/j.jbi.2007.03.009.

To cite the caDSR OneData general software, use the following reference.  

NCI caDSR OneData. <https://cadsr.cancer.gov/onedata/Home.jsp> National Cancer Institute, Center for Biomedical Informatics and Information Technology, 01 Oct. 2010. Web. 17 Jan. 2013.

To cite a specific collection of forms in caDSR, or to cite caDSR CDEs in a particular collection or group, use a reference like the following. 

"[Protocol or Classification Scheme Name]." [Context name], [Node name], NCI caDSR OneData. <https://cadsr.cancer.gov/onedata/Home.jsp> National Cancer Institute, Center for Biomedical Informatics and Information Technology, 01 Oct. 2010. Web. 17 Jan. 2013. 

Example: "CALGB: 10603 Treatment Form." CTEP, Protocol Forms, NCI caDSR OneData. <https://cadsr.cancer.gov/onedata/Home.jsp> National Cancer Institute, Center for Biomedical Informatics and Information Technology, 01 Oct. 2010. Web. 17 Jan. 2013.

Example: "NAACCR 11.1." PS & CC (NCI Population Sciences & Cancer Control), Classifications, Division of Population Cancer Control and Population Sciences, NCI caDSR OneData. <https://cadsr.cancer.gov/onedata/Home.jsp> National Cancer Institute, Center for Biomedical Informatics and Information Technology, 01 Oct. 2010. Web. 17 Jan. 2013.
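
The citation pattern above for a collection of forms or CDEs can be filled in mechanically. The following sketch is only a convenience illustration: the function name and parameter names are hypothetical, and treating the two dates as a publication date and an access date (with the defaults taken from the template) is an assumption.

```python
ONEDATA_URL = "https://cadsr.cancer.gov/onedata/Home.jsp"
PUBLISHER = (
    "National Cancer Institute, "
    "Center for Biomedical Informatics and Information Technology"
)


def format_collection_citation(
    collection: str,
    context: str,
    node: str,
    published: str = "01 Oct. 2010",
    accessed: str = "17 Jan. 2013",
) -> str:
    """Fill the caDSR collection citation template.

    collection: protocol or classification scheme name
    context:    caDSR context name
    node:       node name within the context
    """
    return (
        f'"{collection}." {context}, {node}, NCI caDSR OneData. '
        f"<{ONEDATA_URL}> {PUBLISHER}, {published}. Web. {accessed}."
    )


citation = format_collection_citation(
    "CALGB: 10603 Treatment Form", "CTEP", "Protocol Forms"
)
print(citation)
```

When citing, replace the default `accessed` value with the date on which you actually retrieved the content.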

Tool name | Wiki home page | Production tool
Semantic Integration Workbench (SIW) |  | http://cadsrsiw.nci.nih.gov/
UML Loader |  | ...

Several tools perform various tasks in creating, managing, and deploying CDEs. There are also tools that support reviewing externally generated forms to see whether they are CDE-compliant, that is, composed of approved CDEs found in the caDSR. The public CDE Browser lets you search for data elements, create forms, and download CDEs. The UML Model Browser is specifically designed for browsing registered UML information models. A CDE Tool Functionality Matrix is available to help users understand the differences among the tools. Online help is available, but you will find that using the tools is easier if you have first read through the description of the caDSR implementation of the ISO/IEC 11179 Standard.
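
The notion of CDE compliance described above (a form is compliant when it is composed of approved CDEs found in the caDSR) reduces to a membership check. The sketch below is an assumption-laden illustration, not how the caDSR review tools work: the function names and the identifiers `CDE-A` through `CDE-X` are hypothetical, and the approved set is a hard-coded stand-in for a lookup against the registry.

```python
from typing import Iterable, List


def non_compliant_items(
    form_cde_ids: Iterable[str], approved_cde_ids: Iterable[str]
) -> List[str]:
    """Return the form's CDE identifiers that are not in the approved set."""
    approved = set(approved_cde_ids)
    return [cde_id for cde_id in form_cde_ids if cde_id not in approved]


def is_cde_compliant(
    form_cde_ids: Iterable[str], approved_cde_ids: Iterable[str]
) -> bool:
    """A form is compliant when every item on it is an approved CDE."""
    return not non_compliant_items(form_cde_ids, approved_cde_ids)


# Placeholder identifiers standing in for approved registry content.
approved = ["CDE-A", "CDE-B", "CDE-C"]
external_form = ["CDE-A", "CDE-X", "CDE-C"]

print(is_cde_compliant(external_form, approved))
print(non_compliant_items(external_form, approved))
```

Returning the list of offending items, rather than only a yes/no answer, tells the form author which questions would need to be mapped to existing CDEs or submitted for curation.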

About the caDSR Database and Tools Wiki

This is the wiki home page for the caDSR Database and Tools. The child pages have detailed information about specific tools. You may leave a comment on this page. Unless you are working on the page at the request of Larry Hebel, please do not edit it.

You will also find working documents on this wiki and the related caDSR wikis.
