Metadata Needs and Support
The caDSR supports a broad community of users both inside and outside of NCI that have requirements to ensure the longevity and aggregability of biomedical research data. Participants have included NCI and its partners in academic institutions (including NCI Designated Cancer Centers, SPOREs and Cooperative Groups), other NIH institutes (including NICHD, NHLBI and NIDCR), other federal agencies (in particular the FDA), pharmaceutical companies, standards development organizations (e.g. CDISC) and a range of international biomedical organizations. For more information see the caDSR Collaborations and Use.
Requirements from researchers and their supporting informatics groups drive the creation of metadata in the caDSR, not vice versa. Metadata content development usually starts when a researcher planning a clinical or research data collection requests assistance. Metadata curators work with the user and EVS to identify an appropriate vocabulary while identifying a mix of new and existing CDE content to support the scientific requirement. Curators always attempt to reuse existing metadata (where that content supports the scientific requirement) as a way to help scientists ensure the compatibility of their data with other data collected across the enterprise.
Because so many groups need metadata content, and to ensure that the recording of metadata does not become a bottleneck in the research process, all caDSR roles are open to individuals in the community upon completion of appropriate training. Training is role-based and includes courses on infrastructure, methodologies, and tool usage. Most of the training is delivered through self-paced modules, while the tool-use modules are delivered through web sessions. More information can be found on the caCORE training wiki.
CBIIT’s management of metadata began as part of an effort to support CTEP’s reporting for breast cancer trials, and from a need to develop and disseminate standards that would ensure consistency and accuracy in reporting across the Cooperative Groups. This led to the establishment of a centralized resource and associated web-based tools for clearly documenting and sharing human- and machine-readable data descriptions. The need to maintain and share data about data, or metadata, became the basis for the NCI’s repository of CDEs, metadata and data standards, now known as the caDSR. A CDE Steering Committee was formed to define what kind of metadata the repository needed. Driven by the community’s need to create, share, and manage CDEs, a set of metadata attributes was established, which included name, definition, valid values, and workflow status. Consultation with appropriate experts identified ISO 11179, an international standard for data-element registries, as meeting the needs identified by the CDE Steering Committee. As time went on, more groups wanted to record their data elements and share them via the caDSR, so additional features were added, including extensions of ISO 11179 to enable the storage of templates for CRFs that use CDEs.
As more groups recorded their data elements in caDSR, curators came to recognize that creating high-quality names and definitions for data elements was difficult, and also that doing so was a best practice for clarifying the meaning of the data. Consequently, ISO 11179-5 Naming Principles were used to establish naming conventions that could be applied across groups. The same naming conventions are used in the National Information Exchange Model (NIEM). Since NCI already had the EVS terminology services available, they were seen as a reasonable way to aid this task by giving curators access to well-formed and NCI-preferred names for concepts. At present, curators find concepts in EVS based on synonym searches, and the preferred-term name and definition streamline their task. The challenge of ensuring that duplicate CDEs were not created led to leveraging parts of the ISO 11179 metamodel along with a preference for the use of NCIt concepts. NCIt is a specialized cancer terminology that includes additional knowledge from the literature about these concepts and is modeled as an OWL ontology with relationships to other concepts. Therefore, a link to NCIt concepts from caDSR concepts can help test similarity between CDEs. It can also give researchers a way to explore the meaning of a given CDE in greater depth.
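As a rough illustration of how concept links could help test similarity between CDEs, the sketch below compares two data elements by the overlap of their linked concept codes (a Jaccard score). This is only a minimal sketch under stated assumptions: the `DataElement` class, the `concept_similarity` function, and the `X-...` concept codes are hypothetical placeholders, not the caDSR data model, caDSR tooling, or real NCIt codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Hypothetical, simplified stand-in for a CDE and its concept links."""
    name: str
    concept_codes: frozenset  # placeholder codes, not real NCIt identifiers


def concept_similarity(a: DataElement, b: DataElement) -> float:
    """Jaccard overlap of the linked concept codes, from 0.0 to 1.0."""
    union = a.concept_codes | b.concept_codes
    if not union:
        return 0.0  # neither element has concept links, so nothing to compare
    return len(a.concept_codes & b.concept_codes) / len(union)


# Illustrative records only; names and codes are invented for this sketch.
existing = DataElement("Patient Birth Date",
                       frozenset({"X-PATIENT", "X-BIRTH", "X-DATE"}))
proposed = DataElement("Date of Birth of Patient",
                       frozenset({"X-PATIENT", "X-BIRTH", "X-DATE"}))
partial = DataElement("Patient Death Date",
                      frozenset({"X-PATIENT", "X-DEATH", "X-DATE"}))
unrelated = DataElement("Tumor Size Measurement",
                        frozenset({"X-TUMOR", "X-SIZE"}))

if __name__ == "__main__":
    for candidate in (proposed, partial, unrelated):
        score = concept_similarity(existing, candidate)
        print(f"{candidate.name}: {score:.2f}")
```

In this sketch, two elements with different names but identical concept links score as fully similar, which is the kind of signal a curator could use to spot a likely duplicate before creating a new CDE. Any real threshold or matching rule would be a curation decision and is not specified here.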
Although this activity began as a means to support the Cooperative Groups, the caDSR now supports a much wider audience. This includes clinical trials run by the NCI intramural program, the Center for Cancer Research (CCR) and Division of Cancer Prevention (DCP); Specialized Programs of Research Excellence (SPOREs), Cancer Centers and other academic medical centers; other NIH institutes and centers; standards groups such as CDISC; and a variety of international partners.
A note on the term “Common Data Element”. While originally intended to mean a data element that was reused, the term has come to mean any description of a variable and its valid values. In this document, we will use the standard NCI version of this definition, which is to say a variable description (including valid values) described in the caDSR using its implementation of the ISO 11179 variable, regardless of whether the element in question has been used more than once.
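To make this definition concrete, a variable description with its valid values might be sketched as a small record type. This is an illustrative sketch only: the `CommonDataElement` class, its field names, the `"DRAFT"` status label, and the rule that an empty value list means "not enumerated" are assumptions made for the example, not the caDSR schema or the ISO 11179 metamodel.

```python
from dataclasses import dataclass, field


@dataclass
class CommonDataElement:
    """Hypothetical record carrying the attributes named in the text."""
    name: str
    definition: str
    valid_values: list = field(default_factory=list)
    workflow_status: str = "DRAFT"  # placeholder label, not a caDSR status

    def accepts(self, value: str) -> bool:
        """Check a collected value against the enumerated valid values.

        Assumption for this sketch: an empty list means the element is
        not enumerated, so any value is accepted.
        """
        if not self.valid_values:
            return True
        return value in self.valid_values


# Invented example element; not taken from the caDSR.
laterality = CommonDataElement(
    name="Tumor Laterality Type",
    definition="The side of the body on which the tumor is located.",
    valid_values=["Left", "Right", "Bilateral"],
)

if __name__ == "__main__":
    print(laterality.accepts("Left"))     # value is in the enumerated list
    print(laterality.accepts("Midline"))  # value is not in the list
```

Note that nothing in the record tracks how often the element has been reused, which matches the definition above: an element counts as a CDE whether or not it has been used more than once.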
About the caDSR Wiki
This is the wiki home page for caDSR. You may edit pages if you are working on them with the authors. You are welcome to leave comments. This wiki includes the following.
Example: "NAACCR 11.1." PS & CC (NCI Population Sciences & Cancer Control), Classifications, Division of Population Cancer Control and Population Sciences, NCI caDSR CDE Browser. <https://cdebrowser.nci.nih.gov> National Cancer Institute, Center for Biomedical Informatics and Information Technology, 01 Oct. 2010. Web. 17 Jan. 2013.