This document provides information about the National Cancer Institute Common Data Elements (CDEs) developed with the Cancer Therapy Evaluation Program (CTEP). For questions concerning CTEP data in the caDSR, please contact the NCI CTEP CDE Compliance Review Team.
Introduction to CDEs
Common Data Elements (CDEs) are standardized terms for the collection and exchange of data. CDEs are metadata; they describe the type of data being collected, not the data itself. A basic example of metadata is the question presented on a form, "Patient Name," whereas an example of data would be "Jane Smith."
Overview of the CDE Project
The National Cancer Institute (NCI) developed the CDE initiative to address the need for consistent cancer research terminology. To date, the Cancer Therapy Evaluation Program (CTEP) has focused its CDE efforts on metadata used in data collection and reporting for phase 3 clinical trials, by standardizing terminology for questions and values on case report forms (CRFs). The goals of this project are the following:
- to identify discrete, defined items for data collection
- to promote consistent data collection in the field
- to eliminate unneeded or redundant data collection
- to promote consistent reporting and analysis
- to reduce the possibility of error related to data translation and transmission
- to facilitate data sharing
Developing CDEs
To build its collection of CDEs, CTEP has established a collaborative process to engage members of expert committees to identify and define disease-specific terminology. Members of these committees include representatives from NCI and the Clinical Trials Cooperative Groups who are involved in study design, implementation, data collection, and analysis.
The CDE disease committees consider terms that are being used in their field of study to determine whether there is a general need for each, as well as what other terms may be needed. Each committee then develops consensus to standardize the language for the question and any associated values. Where possible, committees base the language and values on established standards, such as those of the Commission on Cancer, the American Joint Committee on Cancer, and the World Health Organization, and on NCI resources.
As a result of the committee meetings, CDEs are identified, defined, refined, and classified. Template CRFs, which provide a graphic representation of the "core" CDEs for each disease, are also created during these meetings. The designation of a CDE as "core" for a disease indicates the committee's determination that it is likely to be used for most phase 3 clinical trials. Other CDEs, for which a less frequent need is anticipated, are marked as "non-core" for the disease.
Collections of disease-specific CDEs have been developed and released for public use for bladder, breast, colorectal, gynecologic, lung, prostate, and upper gastrointestinal cancers, as well as for melanoma and leukemia. In addition, expert committees were established in collaboration with the Specialized Programs of Research Excellence (SPOREs) to develop CDEs related to pathology and specimen banking; these collections of CDEs have also been released and are available for use by the oncology community.
Disease committees were convened in 2002 to develop CDEs for brain and head and neck cancers, as well as for lymphoma, myeloma, and sarcoma, with release in Fall 2003. An effort is also in progress to develop CDEs specific to pediatric clinical trials, and there is a plan to expand the effort to create CDEs for phase 1 and 2 clinical trials.
Using CDEs on CRFs
Once CDEs for a disease are released by CTEP, these CDEs must be implemented on CRFs for all phase 3 studies of that disease submitted to CTEP. A review process has been established to compare the CRF questions and values for a submitted protocol with existing CDEs. The result of this comparison is a series of reports that indicates whether CRF questions and values match the standard language of existing CDEs.
If the language of a question or its corresponding values does not match an existing CDE, but a CDE with a related definition exists, it is recommended that the CRF be revised to replace the CRF question and values with the existing CDE. If there is a match for neither language nor meaning, a new CDE is developed for temporary use. This CDE may be used on the CRFs for the submitted protocol but will not be released for general use until it has been reviewed and approved by the appropriate CDE disease committee.
NCI CDEs are stored in the caDSR, a robust metadata registry developed and maintained by the NCI Center for Biomedical Informatics and Information Technology (CBIIT). The caDSR stores important attributes that are useful both to those constructing CRFs and to those developing information systems. The CDE Browser is the primary user interface to search, browse, and export CDEs from the caDSR and offers information regarding the development and use of CDEs for the oncology community.
Application of ISO 11179 to CDEs
Understanding ISO 11179
The framework of the caDSR is based on ISO 11179: Information Technology - Metadata Registries. Just as the goal of CDEs is to facilitate the sharing of data through common language, the goal of ISO 11179 is to facilitate the sharing of metadata through a common data model. As such, this standard specifies the data (that is, attributes and associated administered components) that need to be stored for each CDE and how the data should be stored. CBIIT has provided documentation on the caDSR wiki about how it has implemented this standard.
An ISO 11179 database is organized into Contexts. A Context may represent a business unit or some other content division. All administered components within the database are associated with a Context, either the one in which they originated or one in which they are used. In the caDSR, Contexts represent NCI programs and divisions. All CDEs that were created by CTEP are associated with the "CTEP" Context. The caDSR also allows for Contexts to indicate their endorsement of a CDE created by another program or division. Such a designation indicates to users that the CDE is approved for use in this other Context as well.
ISO 11179 Terminology
An administered component is an item about which administrative data is collected. Four types of administered components are integral to an ISO 11179 database. Additional types of administered components also exist within the ISO 11179 data model and the caDSR.
The most familiar of these four is the Data Element. A Data Element is the basic unit of data that is being collected in an ISO 11179 database, a metadata descriptor. It represents a semantic concept and indicates the specific type of data to be collected. Data Elements are named and defined in a standardized manner according to Context-specific naming conventions. Within the "CTEP" Context, a Data Element can be thought of as a question on a CRF.
A Data Element Concept is similar in nature to a Data Element. It represents a semantic concept but is not tied to a specific data type. A Data Element Concept may, therefore, be associated with several Data Elements representing the same semantic concept. For example, the Data Elements "Patient Residence Country Code" and "Patient Residence Country Name" both represent the same semantic concept of "Patient Residence Country."
A Value Domain describes in detail the type of data to be collected, independent of the semantic concept. Attributes of a Value Domain include data type, maximum and minimum field lengths, high and low values, unit of measure, and number of decimal places. A Value Domain may also include an enumerated list of specific Valid Values. Within the "CTEP" Context, a Value Domain describes the type of data that is being collected by a question on a CRF. If there is an enumerated list of Valid Values, it is those Valid Values that may appear on the CRF as potential answers.
A Conceptual Domain is a collection or description of related Value Meanings. A Value Meaning is the essence of the data that is being collected, rather than the actual data itself. For example, a response to the question, "Patient Name" might be "Jane Smith". "Jane Smith" is actual data, whereas the essence of the data is "the name of a person." Another example is the question, "Country of Residence," which includes as responses the two-letter code for each country in the world. The codes would be Valid Values in a Value Domain, but the Value Meanings would be the list of countries in the world.
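The distinction between Valid Values and Value Meanings can be sketched in code. This is a minimal illustration, not the caDSR implementation: the class name `ValueDomainSketch`, its fields, and the two sample countries are assumptions made for the example, and only two of the Value Domain attributes listed above (maximum field length and the enumerated list) are modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ValueDomainSketch:
    """Hypothetical, simplified stand-in for an ISO 11179 Value Domain."""
    max_length: Optional[int] = None
    # Maps each Valid Value (what appears on the CRF) to its Value Meaning.
    valid_values: Optional[Dict[str, str]] = None

    def accepts(self, response: str) -> bool:
        """Return True if the response satisfies the modeled attributes."""
        if self.max_length is not None and len(response) > self.max_length:
            return False
        if self.valid_values is not None and response not in self.valid_values:
            return False
        return True


# Two-letter country codes are the Valid Values; country names are the
# Value Meanings. Only two sample entries are shown.
country_code = ValueDomainSketch(
    max_length=2,
    valid_values={"US": "United States", "CA": "Canada"},
)
```

Here `country_code.accepts("US")` holds because "US" is an enumerated Valid Value, while its Value Meaning is looked up separately as `country_code.valid_values["US"]`.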
Basic Relationships of Administered Components
ISO 11179 specifies that each Data Element is associated with one and only one Data Element Concept and with one and only one Value Domain. In this way, the combination of a Data Element Concept and a Value Domain defines a Data Element.
Each Data Element Concept and each Value Domain are associated with one and only one Conceptual Domain. For a given element, the Data Element Concept and Value Domain do not have to be associated with the same Conceptual Domain, although they might be.
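These one-and-only-one relationships can be sketched as a small data model. The class and field names below are illustrative assumptions rather than the caDSR schema, and the domain names ("Countries", "Country Code", "Country Name") are hypothetical; only the two Data Element names come from the example given earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualDomain:
    name: str


@dataclass(frozen=True)
class DataElementConcept:
    name: str
    conceptual_domain: ConceptualDomain  # one and only one


@dataclass(frozen=True)
class ValueDomain:
    name: str
    conceptual_domain: ConceptualDomain  # one and only one


@dataclass(frozen=True)
class DataElement:
    long_name: str
    data_element_concept: DataElementConcept  # one and only one
    value_domain: ValueDomain                 # one and only one


# Two Data Elements that share a Data Element Concept but differ in
# Value Domain (hypothetical domain names).
countries = ConceptualDomain("Countries")
residence = DataElementConcept("Patient Residence Country", countries)
by_code = DataElement("Patient Residence Country Code", residence,
                      ValueDomain("Country Code", countries))
by_name = DataElement("Patient Residence Country Name", residence,
                      ValueDomain("Country Name", countries))
```

Because each reference is a single required field, a `DataElement` cannot be constructed without exactly one Data Element Concept and one Value Domain, which mirrors the rule stated above.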
Naming Data Elements
ISO 11179 requires that Data Elements be named in a consistent manner, allowing for easier searching and retrieval of data. CTEP has developed naming conventions for Data Elements associated with the "CTEP" Context of the caDSR. Please refer to CTEP's Naming Conventions for a full explanation of these rules and guidelines.
Most names are composed of one or some combination of the following types of terms, defined by ISO 11179 as the basic components of Data Elements and other administered components.
Component of Data Element Names | Definition | Example |
---|---|---|
Object Class term | thing about which data is being collected; within the "CTEP" Context, typically represents an object or activity | Treatment |
Property term | a characteristic or possession of the Object Class | Report Period |
Representation term | specifies the form of the data that is being collected | Date |
Qualifier | a modifier that describes any other term, similar to an adjective; within the "CTEP" Context, qualifiers should be used sparingly because of limited name lengths | End |
Data Element Long Name
A Data Element Long Name is composed of one Object Class term, one Property term, and one Representation term. A maximum of three Qualifiers (optional), one modifying each of the other terms, may be added to the name if needed to further clarify the name or to make it distinct from other Data Element Long Names. Data Element Long Names must be unique and distinct from one another.
The Object Class term shall occupy the first position in the name and the Property term shall occupy the second position. A Qualifier shall directly precede the term it modifies. The Representation term shall occupy the last position in the name.
Words or terms in the name are to be separated by spaces. No punctuation or abbreviations are to be used. Each word should have the initial letter in uppercase, with all others in lower-case, unless the word is commonly written otherwise.
The total length of the name, including spaces, is restricted to a maximum of 120 characters.
Example: Treatment Report Period End Date
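The mechanical parts of these rules can be checked automatically. The sketch below is an assumption about how such a check might look, not a CTEP tool: the function name `check_long_name` is hypothetical, "separated by spaces" is interpreted as single spaces with none leading or trailing, and the rule that names begin with a letter is taken from the character set conventions later in this document. Capitalization, abbreviations, and term order are not checked because they require judgment. Note that the check is literal, so a name containing a Representation term such as "Date/Time" would be flagged as containing punctuation.

```python
import re
from typing import List

MAX_LONG_NAME_LENGTH = 120


def check_long_name(name: str) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if len(name) > MAX_LONG_NAME_LENGTH:
        problems.append("longer than 120 characters")
    if not re.match(r"[A-Za-z]", name):
        problems.append("does not begin with a letter")
    if re.search(r"[^A-Za-z0-9 ]", name):
        problems.append("contains punctuation or symbols")
    # Assumption: exactly one space between words, none at either end.
    if name != " ".join(name.split()):
        problems.append("words are not separated by single spaces")
    return problems
```

For the example above, `check_long_name("Treatment Report Period End Date")` returns an empty list.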
Data Element Short Name
A Data Element Short Name is an abbreviated form of the Data Element Long Name; it is, therefore, composed of one Object Class term, one Property term, and one Representation term, and up to a maximum of three Qualifiers, one modifying each of the other terms. The Data Element Long Name is to be determined first, and then abbreviated as described below. Data Element Short Names (also called Preferred Names) must be unique and distinct from one another.
The Object Class term shall occupy the first position in the name and the Property term shall occupy the second position. A Qualifier shall directly precede the term it modifies. The Representation term shall occupy the last position in the name.
Words or terms in the name are to be separated by underscores. No punctuation is to be used. The name should be written in uppercase.
The total length of the name, including underscores, is restricted to a maximum of 20 characters. All words are to be abbreviated if a standard abbreviation has been determined by CTEP. If, after abbreviations are applied, the name length still exceeds 20 characters, unabbreviated terms will be truncated to 3 letters as needed, in the following order: Qualifiers, Property term, Representation term, Object Class term.
Example: TX_REPPD_END_DT
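The abbreviation and truncation procedure can be sketched as follows. This is an illustration under stated assumptions, not CTEP's curation tool: the function name `build_short_name` and the role labels are hypothetical, the abbreviation table is inferred only from the single example above (the actual CTEP abbreviation list is not reproduced here), and each word of a multi-word unabbreviated term is assumed to be truncated separately.

```python
from typing import Dict, List, Tuple

MAX_SHORT_NAME_LENGTH = 20

# Order in which unabbreviated terms are truncated, per the rule above.
TRUNCATION_ORDER = ("qualifier", "property", "representation", "object_class")

# Hypothetical abbreviation table, inferred only from the example above.
ABBREVIATIONS: Dict[str, str] = {
    "Treatment": "TX",
    "Report Period": "REPPD",
    "Date": "DT",
}


def build_short_name(terms: List[Tuple[str, str]],
                     abbreviations: Dict[str, str] = ABBREVIATIONS) -> str:
    """Build a short name from (role, term) pairs given in Long Name order."""
    # Each part is [role, text, was_abbreviated].
    parts = []
    for role, term in terms:
        if term in abbreviations:
            parts.append([role, abbreviations[term], True])
        else:
            parts.append([role, term, False])

    def joined() -> str:
        words = " ".join(text for _, text, _ in parts).split()
        return "_".join(words).upper()

    # Truncate only as needed, one role at a time, stopping once the name fits.
    for role in TRUNCATION_ORDER:
        for part in parts:
            if len(joined()) <= MAX_SHORT_NAME_LENGTH:
                break
            if part[0] == role and not part[2]:
                # Assumption: each word of a multi-word term is cut to 3 letters.
                part[1] = " ".join(word[:3] for word in part[1].split())

    name = joined()
    if len(name) > MAX_SHORT_NAME_LENGTH:
        raise ValueError(f"cannot shorten to 20 characters: {name}")
    return name


example = build_short_name([
    ("object_class", "Treatment"),
    ("property", "Report Period"),
    ("qualifier", "End"),
    ("representation", "Date"),
])
```

With the hypothetical table above, `example` is `"TX_REPPD_END_DT"`, matching the short name shown in the text; no truncation is needed because the abbreviated name is already within 20 characters.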
Best practice recommendations approve the use of system-generated short names (preferred names) in curation for CTEP CDEs. The following recommendations were approved at the Content Meeting on 12/14/09:
- Curators should retain system-generated short names created by the curation tool
- If user-entered short names are required, follow the guidelines imposed by the consuming application and register an alternate name in caDSR with the appropriate Alt Name Type
- For items that are moved into OC systems, alternate names will need to be created
- Standards-based short names (e.g., DICOM, HL7) should follow best practice
Representation Terms
Below is the current list of Representation terms used by CTEP in naming Data Elements.
Representation term | Definition |
---|---|
Date | calendar date |
Time | time of day |
Date/Time | combined date and time |
Interval | length of time between specified events |
Duration | length of occurrence of an event (number/time period) |
Frequency | how often an event occurs (e.g., daily, weekly) |
Age-Months | age in months |
Age-Years | age in years |
Number | assigned identifier (e.g., patient number, specimen number, telephone number, cycle number, treatment arm number) |
Count | quantity, number of items |
Dose | amount of therapy administered or prescribed to or taken by a patient |
Measurement | dimensions or capacity of an object, or resulting calculation (e.g., diameter, area, volume) |
Value | numeric laboratory measurement |
LLN | lower limit normal |
ULN | upper limit normal |
UOM | units of measure |
Rate | relationship between two numbers (e.g., blood pressure rate) |
Average | mathematical mean |
Grade | numerical scale to describe extent of something, assigned according to standard criteria |
Stage | disease staging, assigned according to standard criteria |
Score | number assigned from standardized test or procedure |
Amount | numeric value of otherwise unspecified type |
Ind | response to a yes/no question; includes yes, no, unknown, not available, not assessed, etc. |
Ind-2 | response to a yes/no question; includes yes, no |
Ind-3 | response to a yes/no question; includes yes, no, unknown |
Name | designation for a person or object |
Code | values that substitute for others |
e-mail address | |
Procedure | enumerated list of treatment procedures |
Site | anatomic site |
Reason | explanatory action |
Source | source of information provided |
Category | classification |
Scale | spectrum of values |
Status | response to a binary question (e.g., positive/negative, left/right) |
Type | list of values of otherwise unspecified type |
Specify | free-text description where needed value was not available in associated/related question (i.e., "Other, specify") |
Text | free-text description of procedure or event |
Character Set and Symbols
The caDSR and CTEP's naming conventions make use of the standard ASCII character set in all administered component names, Valid Values, and Value Meanings. This character set includes all letters in the Latin alphabet (A through Z), in both lower- and upper-case, and the numbers 0 through 9. The following additional characters are included:
<space> ` ~ ! @ # $ % ^ & * ( ) - _ = + \ | [ ] { } ; : ' " , < . > / ?
All Long Names and Preferred Names may only begin with a letter. Valid Values and Value Meanings may begin with symbols unless restricted by the software being used.
Conventions have been established by CTEP for other symbols and formatting that may be needed for Document Text entries or Valid Values.
- Superscript will be indicated by the symbol ^ (e.g., 10^3)
- Subscript will be indicated by the symbol \ (e.g., A\2)
- Symbol for degrees will be written out as "degrees"
- Symbol for plus or minus will be indicated by +/-
- Symbol for check mark will be written out as "check"
- Symbol for less than or equal to will be indicated by <=
- Symbol for greater than or equal to will be indicated by >=
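The conventions above lend themselves to a simple text conversion. The following sketch assumes the source text uses the corresponding Unicode characters (superscript and subscript digits, the degree sign, and so on); the function names `to_ctep_ascii` and `is_allowed` are hypothetical. The degree sign is replaced literally with the word "degrees", so any surrounding spacing is left to the curator.

```python
import re

SUPERSCRIPTS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
SUBSCRIPTS = "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
DIGITS = "0123456789"

# Unicode symbol -> replacement, following the conventions listed above.
SYMBOLS = {
    "\u00b0": "degrees",  # degree sign
    "\u00b1": "+/-",      # plus or minus
    "\u2713": "check",    # check mark
    "\u2264": "<=",       # less than or equal to
    "\u2265": ">=",       # greater than or equal to
}


def to_ctep_ascii(text: str) -> str:
    """Rewrite common non-ASCII symbols using the conventions above."""
    # A run of superscript digits becomes ^ followed by plain digits.
    text = re.sub(
        f"[{SUPERSCRIPTS}]+",
        lambda m: "^" + m.group().translate(str.maketrans(SUPERSCRIPTS, DIGITS)),
        text,
    )
    # A run of subscript digits becomes \ followed by plain digits.
    text = re.sub(
        f"[{SUBSCRIPTS}]+",
        lambda m: "\\" + m.group().translate(str.maketrans(SUBSCRIPTS, DIGITS)),
        text,
    )
    for symbol, replacement in SYMBOLS.items():
        text = text.replace(symbol, replacement)
    return text


def is_allowed(text: str) -> bool:
    """True if every character is a printable ASCII character (space to ~)."""
    return all(" " <= ch <= "~" for ch in text)
```

The letters, digits, space, and symbols listed above together make up the printable ASCII range, which is why `is_allowed` can test a single range instead of enumerating each character.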
Using the CDE Browser
Overview of the CDE Browser
The CDE Browser (http://cdebrowser.nci.nih.gov) is the primary user interface for the caDSR. It is a public web site that has a real-time connection to the caDSR, so users of the CDE Browser see updates and edits immediately as they occur.
Using the CDE Browser, users can search, browse, and export Data Elements from the caDSR. The CDE Browser also provides users access to view CDE collections and template CRFs developed by the CTEP CDE disease committees and others.
Definition of Terms
Fields that appear in the CDE Browser are defined below.
Field in CDE Browser | Definition |
---|---|
Data Element | the basic unit of data that is being collected in an ISO 11179 database, a metadata descriptor. Within the "CTEP" Context, a Data Element can be thought of as a question on a CRF; it may also be referred to as a CDE. |
CDE ID | a unique seven-digit identifier assigned to each Data Element, may also be referred to as the Data Element's Public ID; each Data Element has one and only one CDE ID. |
Preferred Name | the field that stores the short, 20-30 character name ("computer" name) of a Data Element; other administered components (i.e., Value Domains, Data Element Concepts, and Conceptual Domains) also have Preferred Names. |
Long Name | the field that stores the primary name of a Data Element; other administered components (i.e., Value Domains, Data Element Concepts, and Conceptual Domains) also have Long Names. |
Document Text | the field in which additional Data Element names or documentation may be stored. Document Text associated with a Document Type of "Long Name" ("Long Name" in old CDE Browser) or "Historic Short CDE Name" ("Short Name" in old CDE Browser) are additional Data Element names, often containing the text or question most likely to be used on a CRF. Instructions will be associated with a Document Type of "Comment". |
Context | the business unit or other content division that is responsible for creating and managing associated content; in the caDSR, Contexts currently represent NCI programs and divisions. |
Workflow Status | the administrative status of a Data Element or other administered component. Within the "CTEP" Context, this refers to the Data Element's progress in the CDE disease committee review process. Please refer to the caDSR Business Rules for definition and usage information for each workflow status. |
Version | the version number of an administered component; the version number is incremented when significant changes are made to an administered component. Please refer to the caDSR Business Rules for an explanation of rules governing the creation of new versions of administered components. |
Origin | the source of the administered component or standard on which it is based. |
Historical CDE ID | a number that was previously assigned to a Data Element as an identifier; a Data Element may have many Historical CDE IDs. |
Public ID | the unique seven-digit identifier assigned to each administered component, for a Data Element may also be referred to as the CDE ID; each administered component has one and only one Public ID. |
Designation | the indication by a Context of their endorsement of a Data Element created by another program or division, may also be referred to as "Used By"; indicates to users that the Data Element is approved for use in this other Context as well. Please refer to the caDSR Business Rules for more information about the use of designations. |
Data Element Concept | the representation of a semantic concept without ties to a specific data type, similar in nature to a Data Element. |
Value Domain | the collection of attributes that describe in detail the type of data to be collected. Within the "CTEP" Context, a Value Domain describes the type of data that is being collected by a question on a CRF. If there is an enumerated list of Valid Values, it is these Valid Values that appear on the CRF as potential answers. |
Valid Values | the enumerated responses, defined by Value Meanings, associated with a Data Element through its associated Value Domain, may also be referred to as "Permissible Values". Within the "CTEP" Context, these are the values that appear on the CRF as potential answers to a question. |
Conceptual Domain | a collection or description of related Value Meanings |
Classification | the relational categorization of Data Elements or other administered components for purposes of organization and ease of searching. Within the "CTEP" Context, classifications indicate the collections of Data Elements approved through the CDE disease committee review process and group Data Elements according to probable form use. |
Classification Scheme | a defined system for categorizing Data Elements or other administered components, may also be referred to as "CS"; a Classification Scheme is composed of related Classification Scheme Items that serve as categories defining the scope of the scheme. Within the "CTEP" Context, there are three main Classification Schemes: "Disease", "Trial Type Usage", and "Category". |
Classification Scheme Item | a category within a Classification Scheme to which Data Elements or other administered components may be assigned, may also be referred to as "CSI". |
Core | the designation by a disease committee of a Data Element, indicating the committee's determination that it is likely to be used in most phase 3 clinical trials; used on one or more template CRFs. |
Non-core | the designation by a disease committee of a Data Element for which a less frequent need is anticipated; does not appear on any template CRFs for the disease. |
Locating CDEs
The CDE Browser provides two main mechanisms for locating CDEs.
The first of these is to enter search criteria in the text boxes on the right frame of the screen. You may search by keyword or CDE ID. In addition or instead, you may search by associated Value Domain or Data Element Concept, Workflow Status(es), and Classification assignments. You may also specify whether you wish to retrieve all versions of each matching CDE or only the most recent version; in most cases, the latest version will be the approved or "Released" version of the Data Element. Keyword searches can be limited to one or more types of Data Element names, if preferred. Searches by CDE ID will search both CDE IDs and Historical CDE IDs.
The second mechanism for locating CDEs is to use the navigation tree in the left frame. Each time you click on a node in the tree, all CDEs associated with that Context, Classification Scheme, Classification Scheme Item, or Protocol Form Template (template CRF) are displayed in the right frame under the search criteria. Navigation Links are displayed at the bottom of the frame so you may view a different page of CDEs.
You may also use a combination of these mechanisms, clicking on a node in the navigation tree and then entering search criteria. The criteria will only be matched against the CDEs associated with the selected Context, Classification Scheme, Classification Scheme Item, or Protocol Form Template. If you do not click on any nodes or if you click on "caDSR Contexts," your search criteria will be matched against all CDEs in the caDSR.
You can find CTEP CDEs by clicking on "NCI Cancer Therapy Evaluation Program (CTEP)" in the tree. You may further refine your search by clicking on the CTEP folder icon. From here, you may click on "Protocol Form Templates" or "Classifications".
"Protocol Form Templates" allow you to search, by Phase or Disease, the CDEs contained on each template CRF developed by the CDE disease committees. These template CRFs are intended to provide examples of use of the most common CDEs for a particular disease. Within each disease, forms are classified further by type of form and study phase. Once you have selected a template CRF, you can view the template CRF in Microsoft Word or download the associated CDEs using XML or Excel; these links are under the "Search Data Elements" and "Clear" buttons. You may also browse the details of the associated CDEs online.
"Classifications" allow you to search the CDEs by category. "Type of Category" classifies CDEs by their typical use in a clinical trial (e.g., Patient Demographics, Labs, Adverse Events). "Type of Disease" categorizes CDEs by disease, according to the decisions of the CDE disease committees. "Trial Type Usages" provides classification by disease phase or disease description, according to use on template CRFs.
"Type of Disease" allows you to search the CDEs by CDE disease committee designation as "core" or "non-core".
Once a list of CDEs has been selected, either through searching or use of the navigation tree, you may view the details of each individual element online or the entire list may be downloaded using XML or Excel. To download selected CDEs, click on the appropriate link for the preferred format (XML or Excel); these links are under the "Search Data Elements" and "Clear" buttons. The Excel download contains the most pertinent details for each Data Element, including the names of the associated Data Element Concept and Value Domain, and all of the associated Valid Values. The download in XML provides significant detail, including many of the attributes specified in ISO 11179 for each Data Element. The DTD used by the CDE Browser to download Data Elements shows the fields from which data is included and how the data is ordered.
CDE Details
Once you have located the desired CDE by searching or using the navigation tree, you may click on its underlined Preferred Name to view its details online. This will open a pop-up window with five tabs. Those that will be of most use to the general CTEP user include Data Element (details about the CDE), Valid Values (attributes of the associated Value Domain and Valid Values), and Classifications (categorization of the CDE).
Building Case Report Forms (CRFs)
Using CDEs on CRFs
Once CDEs for a disease are released by CTEP, they must be implemented on CRFs for all phase 3 studies of that disease submitted to CTEP. Collections of disease-specific CDEs have been developed and released for public use for bladder, breast, colorectal, gynecologic, lung, prostate, and upper gastrointestinal cancers, as well as for melanoma and leukemia. The exact language of the CDE, both Data Element name and Valid Values, must be used.
CDEs that have been approved for use on CRFs being submitted to CTEP include those that have "CTEP" as their Context or designating Context and also have "Released" or "Released-non-compliant" as their Workflow Status. CDEs in the CTEP Context with other Workflow Statuses are either in the process of being reviewed by a disease committee or have been retired or removed from use. Please consult the Workflow Status definitions for more information.
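The approval rule in the preceding paragraph amounts to a two-part filter. The sketch below assumes each CDE has been loaded into a dictionary with the hypothetical keys `context`, `designating_contexts`, and `workflow_status`; these key names and the sample records are illustrative and do not reflect the CDE Browser's export format.

```python
APPROVED_STATUSES = frozenset({"Released", "Released-non-compliant"})


def is_approved_for_ctep_crf(cde: dict) -> bool:
    """True if CTEP is the Context or a designating Context and the
    Workflow Status is one of the approved statuses."""
    contexts = {cde["context"], *cde.get("designating_contexts", ())}
    return "CTEP" in contexts and cde["workflow_status"] in APPROVED_STATUSES


# Hypothetical records for illustration only.
candidates = [
    {"long_name": "Example A", "context": "CTEP",
     "workflow_status": "Released"},
    {"long_name": "Example B", "context": "CTEP",
     "workflow_status": "Draft New"},
    {"long_name": "Example C", "context": "OTHER",
     "designating_contexts": ["CTEP"],
     "workflow_status": "Released-non-compliant"},
    {"long_name": "Example D", "context": "OTHER",
     "workflow_status": "Released"},
]
approved = [c["long_name"] for c in candidates if is_approved_for_ctep_crf(c)]
```

In this sample, only "Example A" and "Example C" pass: B has an unapproved Workflow Status, and D belongs to a Context that is neither CTEP nor designated by CTEP.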
Entries in the following fields may be used as questions on CRFs: Long Name, or entries in Document Text of Type "Long Name" or "Historic Short CDE Name". When choosing a CDE to use, look carefully at its names and definition to determine whether it is appropriate for your needs. For a given CDE, the entire set or a subset of the Valid Values may be used as answers to a question.
If you cannot locate a "Released" or "Released-non-compliant" CDE that is appropriate for the question you would like to ask on your CRF, expand your search to include CDEs with other Workflow Statuses. Many of these CDEs are currently being reviewed by the CDE disease committees, but special approval may be given for their use on your CRFs if suitable and there are no "Released" terms that might be recommended. Specifically, you might look for CDEs with the following Workflow Statuses: Committee Approved, Committee Submitted Used, Approved for Trial Use, Draft Mod, Draft New, Committee Submitted, Retired Withdrawn.
If you are unable to locate any CDEs that are appropriate for the question you would like to ask, please word the question in a manner similar to other CDEs and submit it on your CRF. The reviewers will conduct an extensive search to determine if there is an existing CDE to recommend or if there is need for a "Draft New" CDE to be created.
For questions concerning CTEP data in the caDSR, please contact the NCI CTEP CDE Compliance Review Team.
CDE Compliance Review
When your CRFs have been developed, submit them to CTEP's Protocol and Information Office (PIO), who will forward them to the CDE reviewers for compliance review. It is preferred that the CRFs be submitted as e-mail attachments without security that prevents copying text from the files.
The results of the initial review consist of three Excel spreadsheets that report the findings of the CDE Compliance Review and a statement of whether the CRFs are considered CDE-compliant.
The Question Comparison Report indicates for each CRF question whether it was considered an exact match to an existing CDE, whether it should be replaced with a recommended term, or whether it has been created as a new element to meet the specific needs of the protocol. Where indicated, a response will be required of you in the Group Comments column, such as whether you agree with and will use the recommendations or would like to suggest another Data Element for use.
The Valid Value Comparison Report indicates whether, for each CRF question, each CRF valid value was considered an exact match to an existing value, whether it should be replaced with a recommended term, or whether it has been created as a new value to meet the specific needs of the protocol. Again, where indicated, a response will be required as to whether you agree with and will use recommendations or would like to suggest another Valid Value or Data Element for use.
The Proposed New Data Elements Report includes those CRF questions for which there was no possible match in the dictionary. It is requested that you develop the definition for each of these so new Data Elements may be created. Use of these new Data Elements is allowed on a one-time basis for the particular protocol for which they were created; they will also be forwarded to the appropriate CDE disease committee as part of the CDE change management process. If approved by the committee, these Data Elements will be published and will be available for use in future studies.
Unless the CRFs are CDE-compliant, a response is required of you. Please indicate your responses, where required, on the three spreadsheets and submit these and your revised forms for re-review. The same review process will be conducted on any non-compliant questions, as well as any questions or values new to or modified on the CRFs.
When the CRFs are CDE-compliant, final spreadsheets indicating the CDE ID numbers used will be sent to you through PIO and to the Cancer Trials Support Unit (CTSU), if appropriate.
It is required that you resubmit your CRFs when any changes are made to them. To speed the review process, please include a memo outlining what changes were made.
CDE Development
The development of CDEs has been a collaboration of CTEP and the Clinical Trials Cooperative Group Program, Specialized Programs of Research Excellence (SPOREs), the Cancer Biomarkers Research Group, the Early Detection Research Network, NCI Center for Bioinformatics, Oracle Corporation, and The EMMES Corporation.
Standards Leveraged by the CDE Project
The CDE project extensively leverages all existing work supporting the collection of common data, such as the CRFs, surveys, and data reporting formats developed by a variety of groups.
- Expanded Participation Project (EPP)
- Clinical Data Update System (CDUS) through the Cancer Therapy Evaluation Program (CTEP)
- Clinical Trials Cooperative Group Program
- Cancer Family Registry
- Cancer Genetics Network
- Early Detection Research Network (EDRN)
- Lung Cancer Biomarkers and Chemoprevention Consortium (LCBCC)
- Specialized Programs of Research Excellence (SPOREs)
The following cancer-specific standards have also been considered in developing CDEs:
- American Joint Committee on Cancer (AJCC)
- American College of Surgeons' Commission on Cancer (COC)
- NCI's Surveillance, Epidemiology, and End Results (SEER) program
- North American Association of Central Cancer Registries (NAACCR)
The following national and international standards and standards organizations were consulted in developing CDEs:
- World Health Organization (WHO)
- International Classification of Diseases (ICD)
- Standard Industrial Classification (SIC)
- National Drug Code (NDC)
- International Medical Terminology (IMT)
- Medical Dictionary for Regulatory Activities (MedDRA)
- Unified Medical Language System (UMLS)
- Digital Imaging and Communications in Medicine (DICOM)
Refer to the Glossary at the end of this document for a more complete list of all organizations contributing to the standards used in CDE development.
Disease Committee Participation
The disease and special topics committees include representatives from one or more of the groups listed below. Oracle Corporation and The EMMES Corporation provide technical support to the CDE project.
- Cancer Therapy Evaluation Program (CTEP)
- Lung Cancer Biomarkers and Chemoprevention Consortium (LCBCC)
- American College of Surgeons Oncology Group (ACOSOG)
- Cancer and Leukemia Group B (CALGB)
- Children's Oncology Group (COG)
- Eastern Cooperative Oncology Group (ECOG)
- European Organization for Research and Treatment of Cancer (EORTC)
- Gynecologic Oncology Group (GOG)
- National Cancer Institute (NCI)
- National Cancer Institute of Canada (NCIC)
- National Surgical Adjuvant Breast and Bowel Project (NSABP)
- New Approaches to Brain Tumor Therapy (NABTT)
- North Central Cancer Treatment Group (NCCTG)
- Radiation Therapy Oncology Group (RTOG)
- Southwest Oncology Group (SWOG)
Glossary
Acronyms and Abbreviations
Acronym | Definition |
---|---|
AJCC | American Joint Committee on Cancer http://www.cancerstaging.org/ |
CaPCURE | Association for the Cure of Cancer of the Prostate http://www.capcure.org |
CDC | U.S. Centers for Disease Control and Prevention http://www.cdc.gov |
CDEs | common data elements http://cdebrowser.nci.nih.gov |
CDUS | Clinical Data Update System http://ctep.cancer.gov/reporting/cdus.html |
CFR | U.S. Code of Federal Regulations http://www.access.gpo.gov/nara/cfr/cfr-table-search.html |
COC | Commission on Cancer (American College of Surgeons) http://www.facs.org/cancer/index.html |
CRFs | case report forms |
CTC (2.0) | Common Toxicity Criteria http://ctep.cancer.gov/reporting/ctc.html |
CTCAE (3.0) | Common Terminology Criteria for Adverse Events http://ctep.cancer.gov/reporting/ctc.html |
CTEP | Cancer Therapy Evaluation Program http://ctep.cancer.gov/ |
CTSU | Cancer Trials Support Unit http://www.ctsu.org/ |
DICOM | Digital Imaging and Communications in Medicine (Radiological Society of North America) http://www.rsna.org/practice/dicom/ |
EDRN | Early Detection Research Network http://www3.cancer.gov/prevention/cbrg/edrn/ |
ELCAP | Early Lung Cancer Action Program http://icscreen.med.cornell.edu/ |
EPP | Expanded Participation Project http://spitfire.emmes.com/study/epp/ |
FDA | U.S. Food and Drug Administration http://www.fda.gov/ |
FIGO | International Federation of Gynecology and Obstetrics http://www.figo.org/ |
HIPAA | Health Insurance Portability and Accountability Act http://www.jhita.org/admsimp.htm |
HISB | Healthcare Informatics Standards Board (American National Standards Institute) http://www.ansi.org/standards_activities/standards_boards_panels/hisb/overview.aspx?menuid=3 |
HL7 | Health Level Seven http://www.hl7.org/ |
IBCSG | International Breast Cancer Study Group http://www.ibcsg.org/ |
ICD | International Classification of Diseases http://www.who.int/whosis/icd10/ |
ICH | International Conference on Harmonisation [sic] of Technical Requirements for Registration of Pharmaceuticals for Human Use http://www.ich.org/ |
IEC | International Electrotechnical Commission http://www.iec.ch/ |
IMT | International Medical Terminology |
ISCO 88 | International Standard Classification of Occupations http://www.ilo.org/public/english/bureau/stat/class/isco.htm |
ISO | International Organization for Standardization http://www.iso.org/ |
ISO 3166 | International Organization for Standardization, Codes for the Representation of Names of Countries and Their Subdivisions http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/index.html |
ISO 8601 | International Organization for Standardization, Data Elements and Interchange Formats - Information Interchange - Representation of Dates and Times http://www.iso.ch/iso/en/prods-services/popstds/datesandtime.html |
ISO 11179 | International Organization for Standardization, Information Technology - Specification and Standardization of Data Elements |
LCBCC | Lung Cancer Biomarkers and Chemoprevention Consortium |
LOINC | Logical Observation Identifiers, Names and Codes http://www.loinc.org |
MedDRA | Medical Dictionary for Regulatory Activities http://www.fda.gov/MedWatch/report/meddra.htm |
NAACCR | North American Association of Central Cancer Registries http://www.naaccr.org |
NABTT | New Approaches to Brain Tumor Therapy http://www.nabtt.org |
NAICS | North American Industry Classification System http://www.census.gov/epcd/www/naics.html |
NCCTG | North Central Cancer Treatment Group http://ncctg.mayo.edu/ |
NCI | U.S. National Cancer Institute http://www.cancer.gov |
NCICB | U.S. National Cancer Institute Center for Bioinformatics http://ncicb.nci.nih.gov/ |
NDC | National Drug Code http://www.fda.gov/cder/ndc |
NHANES | National Health and Nutrition Examination Survey http://www.cdc.gov/nchs/nhanes.htm |
NIH | U.S. National Institutes of Health http://www.nih.gov/ |
NLM | U.S. National Library of Medicine http://www.nlm.nih.gov |
RECIST | Response Evaluation Criteria in Solid Tumors http://www.nci.nih.gov/bip/RECIST.htm |
SEER | Surveillance, Epidemiology, and End Results Program http://seer.cancer.gov/ |
SIC | Standard Industrial Classification http://www.osha.gov/oshstats/sicser.html |
SPOREs | Specialized Programs of Research Excellence http://spores.nci.nih.gov |
UMLS | Unified Medical Language System http://www.nlm.nih.gov/research/umls/ |
USHIK | United States Health Information Knowledgebase http://www.ushik.org/ |
WHO | World Health Organization http://www.who.int |
Clinical Trials Cooperative Groups
- ACOSOG: American College of Surgeons Oncology Group http://www.acosog.org/
- ACRIN: American College of Radiology Imaging Network http://www.acrin.org/
- CALGB: The Cancer and Leukemia Group B http://www.calgb.org/
- COG: Children's Oncology Group http://www.childrensoncologygroup.org/
- ECOG: Eastern Cooperative Oncology Group http://ecog.dfci.harvard.edu/
- EORTC: European Organisation [sic] for Research and Treatment of Cancer http://www.eortc.be/
- GOG: Gynecologic Oncology Group http://www.gog.org/
- NCCTG: North Central Cancer Treatment Group http://ncctg.mayo.edu/
- NCIC: National Cancer Institute of Canada http://www.ncic.cancer.ca/
- NCIC CTG: National Cancer Institute of Canada Clinical Trials Group http://www.ncic.cancer.ca/ncic/internet/standard/0,3621,84658243_85817309__langId-en,00.html
- NSABP: National Surgical Adjuvant Breast and Bowel Project http://www.nsabp.pitt.edu/
- NWTS: National Wilms Tumor Study http://www.nwtsg.org/
- QARC: Quality Assurance Review Center http://www.qarc.org/
- RTOG: Radiation Therapy Oncology Group http://www.rtog.org/
- SWOG: Southwest Oncology Group http://www.swog.org/
CTEP CDE Category Definitions
Category | Definition |
---|---|
Adverse Events | CDEs that characterize the untoward effects of the therapeutic intervention using the Common Toxicity Criteria (CTC) and Common Terminology Criteria for Adverse Events (CTCAE), to consistently grade the severity of the event and to provide information as to whether the treatment may have been a cause. |
Cytogenetics | CDEs that characterize the results of cellular analysis to identify genetic abnormalities present in cancer by examining cellular components concerned with the structure and function of chromosomes responsible for development and differentiation of cells. |
Disease Description | CDEs that characterize the disease, such as diagnosis, location, and extent. |
Follow-up | CDEs that characterize the sequential assessment of the disease or vital status of the individual, such as progression, long-term toxicity, and date of death. |
Immunophenotype | CDEs that characterize the results of analysis to divide leukemias and lymphomas into clonal subgroups on the basis of differences in their cell surfaces and cytoplasmic antigens, detecting these differences using monoclonal antibodies, flow cytometry, etc. |
Labs | CDEs that characterize the results of laboratory tests such as LDH, WBC, creatinine, and glucose. |
Patient Characteristics | CDEs that characterize the health and emotional state of the individual enrolled on the study, including treatments for a prior cancer. |
Patient Demographics | CDEs that characterize the individual enrolled on the study, such as name, address, date of birth, weight, and height. |
Protocol/Administrative | CDEs that characterize the regulatory, reporting, and data management aspects of clinical trials, such as IRB date and protocol number. |
Response | CDEs that characterize the outcome of the study, such as overall tumor response, partial response, and first date observed. |
Treatment | CDEs that characterize the properties of the intervention regimen, such as agent, dose, procedure, and modalities. |
Tumor Markers | CDEs that characterize disease-specific biological markers for the presence or level of involvement of disease, such as PSA and CA125. |
CDE Terms
CDE Term | Definition |
---|---|
Administered component | an item about which administrative data is collected. A Data Element is the most familiar type of administered component, although many administered component types exist within an ISO 11179 database and within the caDSR. |
caDSR | Cancer Data Standards Registry and Repository, a robust metadata registry, developed and maintained by the NCI Center for Bioinformatics and Information Technology, that stores NCI CDEs and related attributes. |
Case report form (CRF) | a data collection form. |
Case report form (CRF) module | a collection and sequence of elements grouped to provide context for the information requested by the CRF questions. |
CDE Browser | the primary user interface (http://cdebrowser.nci.nih.gov) to search, browse, and export Data Elements from the caDSR. |
CDE ID | a unique seven-digit identifier assigned to each Data Element; may also be referred to as the Data Element's Public ID. Each Data Element has one and only one CDE ID. |
Classification | the relational categorization of Data Elements or other administered components for purposes of organization and ease of searching. Within the "CTEP" Context, classifications indicate the collections of Data Elements approved through the CDE disease committee review process and group Data Elements according to probable form use. |
Classification Scheme | a defined system for categorizing Data Elements or other administered components; may also be referred to as "CS". A Classification Scheme is composed of related Classification Scheme Items that serve as categories defining the scope of the scheme. Within the "CTEP" Context, there are three main Classification Schemes: "Disease", "Trial Type Usage", and "Category". |
Classification Scheme Item | a category within a Classification Scheme to which Data Elements or other administered components may be assigned; may also be referred to as "CSI". |
Common data element (CDE) | a standardized term for the collection and exchange of data. CDEs are metadata; they describe the type of data being collected, not the data itself. A basic example of metadata is the question presented on a form, "Patient Name," whereas an example of data would be "Jane Smith". |
Conceptual Domain | a collection of or description of related Value Meanings; may be either enumerated or descriptive only. |
Context | an organizational division within an ISO 11179 database. A Context may represent a business unit or some other content division that is responsible for creating and managing associated content. All administered components within the database are associated with a Context, either the one in which they originated or one in which they are used. All CDEs that were created by CTEP are associated with the "CTEP" Context. |
Core CDE | a Data Element that is included on one or more template CRFs. The designation of a CDE as "core" for a disease indicates the committee's determination that it is likely to be used in most phase 3 clinical trials. |
Data Element | the basic unit of data that is being collected in an ISO 11179 database, a metadata descriptor. It represents a semantic concept and indicates the specific type of data to be collected. Data Elements are named and defined in a standardized manner according to Context-specific naming conventions. |
Data Element Concept | the representation of a semantic concept without ties to a specific data type, similar in nature to a Data Element. |
Decimal Place | the number of places behind the decimal point in the response to a Data Element; specified for a Value Domain. |
Definition | the detailed meaning of a Data Element or other administered component. |
Designation | the indication by a Context of its endorsement of a Data Element created by another program or division; may also be referred to as "Used By". A designation indicates to users that the Data Element is approved for use in the designating Context as well. Please refer to the caDSR Business Rules for more information about the use of designations. |
Document Text | the field in which additional Data Element names or documentation may be stored; names in this field are not bound by naming conventions. |
Historical CDE ID | a number that was previously assigned to a Data Element as an identifier; a Data Element may have many Historical CDE IDs. |
ISO 11179 | Information Technology - Metadata Registries (http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html) developed by the International Organization for Standardization and the International Electrotechnical Commission. |
Long Name | the field that stores the primary name of a Data Element or other administered component. |
Maximum length | the maximum number of storage units (of the corresponding data type) that may be used in representing the response to a Data Element; specified for a Value Domain. |
Metadata | data (attributes) that describe the type of data being collected. |
Minimum length | the minimum number of storage units (of the corresponding data type) that must be used in representing the response to a Data Element; specified for a Value Domain. |
Non-core CDE | a Data Element for which a CDE disease committee anticipates a less frequent need. Non-core CDEs are not included on any template CRFs for a disease. |
Object Class term | an administered component, frequently used in naming Data Elements; a thing about which data is being collected. |
Origin | the source of the administered component or standard on which it is based. |
Phase 1 Clinical Trials | these first studies in people evaluate how a new drug should be given (by mouth, injected into the blood, or injected into the muscle), how often, and what dose is safe. A phase 1 trial usually enrolls only a small number of patients. |
Phase 2 Clinical Trials | a phase 2 trial continues to test the safety of the drug and begins to evaluate how well the new drug works. Phase 2 studies usually focus on a particular type of cancer. |
Phase 3 Clinical Trials | these studies test a new drug, a new combination of drugs, or a new surgical procedure in comparison to the current standard for treatment. A participant will usually be assigned to the standard treatment group or the new treatment group at random (called randomization). Phase 3 trials often enroll large numbers of people and may be conducted at many doctors' offices, clinics, and cancer centers nationwide. |
Preferred Name | the field that stores the short, 20- or 30-character name ("computer" name) of a Data Element or other administered component. |
Property term | an administered component, frequently used in naming Data Elements; a characteristic or possession of the object class. |
Public ID | the unique seven-digit identifier assigned to each administered component; for a Data Element, it may also be referred to as the CDE ID. Each administered component has one and only one Public ID. |
Qualifier | an attribute, frequently used in naming Data Elements and other administered components; a modifier that describes any other term, similar to an adjective. |
Representation term | an attribute, frequently used in naming Data Elements and other administered components; specifies the form of the data that is being collected. |
Template CRFs | CRFs developed as guidelines by the CDE disease committees to provide a graphic representation of the "core" CDEs for each disease. The designation of a CDE as "core" for a disease indicates the committee's determination that it is likely to be used in most phase 3 clinical trials. |
Valid Values | the enumerated responses, defined by Value Meanings, that are associated with a Data Element through its Value Domain; may also be referred to as "Permissible Values". Within the "CTEP" Context, these are the values that will appear on the CRF as potential answers to a question. |
Value Domain | the collection of attributes that describe in detail the type of data to be collected. Attributes of a Value Domain include data type, maximum and minimum field lengths, high and low values, unit of measure, and number of decimal places. A Value Domain may also include an enumerated list of specific Valid Values. |
Value Meaning | the essence of the data that is being collected, rather than the actual data itself. For example, a response to the question "Patient Name" might be "Jane Smith." "Jane Smith" is actual data, whereas the essence of the data is "the name of a person." Another example is the question "Country of Residence," which includes as responses the two-letter code for each country in the world. The codes would be Valid Values in a Value Domain, but the Value Meanings would be the list of countries in the world. |
Version | the version number of an administered component; the version number is incremented when significant changes are made to an administered component. Please refer to the caDSR Business Rules for an explanation of rules governing the creation of new versions of administered components. |
Workflow Status | indication of the administrative status of a Data Element or other administered component. Within the "CTEP" Context, this refers to the Data Element's progress in the CDE disease committee review process. Please refer to the caDSR Business Rules for definition and usage information for each workflow status. |
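To make the relationships among several of these terms concrete, the following Python sketch models a Data Element, its Value Domain, and the Valid Value to Value Meaning pairing, using the glossary's own "Country of Residence" example. It is a simplified illustration and not the caDSR schema or API: the class and field names, the `accepts` helper, the `"CHARACTER"` data type label, and the placeholder ID `0000001` are all assumptions introduced here. The seven-digit check simply follows the glossary's description of a CDE ID.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ValueDomain:
    """Attributes describing the type of data to be collected (simplified)."""
    data_type: str
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    # Maps each Valid Value to its Value Meaning; empty if not enumerated.
    valid_values: Dict[str, str] = field(default_factory=dict)

    def accepts(self, response: str) -> bool:
        """Return True if a response fits the length limits and Valid Values."""
        if self.min_length is not None and len(response) < self.min_length:
            return False
        if self.max_length is not None and len(response) > self.max_length:
            return False
        if self.valid_values:
            return response in self.valid_values
        return True


@dataclass
class DataElement:
    """Metadata describing one question; it holds no patient data itself."""
    public_id: str  # the CDE ID, kept as text so leading zeros survive
    long_name: str
    definition: str
    value_domain: ValueDomain

    def __post_init__(self):
        # The glossary describes the CDE ID as a seven-digit identifier.
        if not re.fullmatch(r"\d{7}", self.public_id):
            raise ValueError("CDE ID must be exactly seven digits")


country = DataElement(
    public_id="0000001",  # placeholder, not a real CDE ID
    long_name="Country of Residence",
    definition="The country in which the person resides.",  # illustrative wording
    value_domain=ValueDomain(
        data_type="CHARACTER",
        min_length=2,
        max_length=2,
        # Valid Values are two-letter codes; Value Meanings are the countries.
        valid_values={"US": "United States", "CA": "Canada"},
    ),
)

print(country.value_domain.accepts("US"))   # True: an enumerated Valid Value
print(country.value_domain.accepts("USA"))  # False: exceeds the maximum length
```

Keeping the Valid Value (the code recorded on the form) separate from the Value Meaning (the country it stands for) mirrors the distinction drawn in the "Value Meaning" entry above.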
Additional Resources
- caDSR Wiki home page
- caDSR Users List - Notices of upgrades, new releases, new features, service interruptions
- Support - Help with any technical problem
- CTEP's CDE team - Questions about CTEP data and content