Project Description

The CDE Metadata Curation, Maintenance and Harmonization effort supports curation of the Common Data Elements (CDEs), with a focus on recording well-formed metadata describing data elements that can be used by the broad health community. CDEs are recorded in the cancer Data Standards Registry (caDSR). After extensive vetting by workgroups made up of subject matter experts and user community members, a select group of CDEs for common subject areas has been elevated to Data Standards. Demography, Radiation Therapy, and Adverse Events are examples of subject areas where Data Standards have been developed. Data Standards are the preferred way of recording and reporting specific information across the NCI user community. There are ongoing activities to identify CDEs that are similar to the Data Standards and to reduce redundancy by harmonizing the CDE collection.

caDSR Baseline Description

The caDSR contains not only NCI standard CDEs and standard Case Report Forms (CRFs) but also CDE metadata for UML models, data dictionaries, data collection tools (such as ISA-Tab-Nano), cancer center clinical trials, government organizations outside of NCI (such as the National Institute of Dental and Craniofacial Research), and standards development organizations (such as CDISC). The current caDSR content fits the definition of Big Data, an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications. Big Data can be described by Volume, Velocity of change, Variety of content, and Veracity of the data. The following characteristics describe the current collection of CDEs and related metadata found in the caDSR:

• Nearly 50K Released/Draft CDEs
• Approximately 120 to 130 Curators registered as metadata creators and/or form builders
• 33 owning contexts, or owners responsible for specific content for their community
• More than 6400 Forms or Collections of CDEs
• CDEs from 145 UML Models
• Templates for Standards and common groupings
• Hundreds of classifications or sets of content linked in meaningful ways
• Large community of users who browse and download CDEs, collections, and code lists
• Ongoing Harmonization and Quality Reviews of content
• Ongoing Modification, Reuse, or Creation of content

caDSR Content Maintenance and Harmonization Activities

The six activities below have been identified as areas where future maintenance and harmonization tasks will be focused. The user community is asked to participate in these tasks so that enhancements and changes will be of value to everyone using the caDSR metadata. If you would like to contribute to any of these, please contact Brenda Duggan and indicate which activity you would like to join.

  1. Review the current state of caDSR content using Best Practices, Business Rules, and Training guidance in order to discover usage patterns and gather data.
  2. Identify a set of CDE changes/enhancements to increase usefulness in applications.
  3. Reduce the set of redundant caDSR content through harmonization and retirement activities.
  4. Create a governance plan for enhanced support of the community creating and maintaining content in caDSR.
  5. Provide clear public instruction on how to use NCI standards to create CRFs in applications.
  6. Participate in an exercise to create a small set of CDEs for the purpose of identifying the range of quality that is assigned to content in caDSR. 


