...

Identify samples obtained for glioblastoma multiforme (GBM) and the corresponding CT image information.

...

Determine if each sample used in an expression profiling experiment is available for a SNP analysis experiment.

...

Support patient to trial matching through the use of computable eligibility criteria

...

When defining new datasets for caIntegrator's data-warehouse for biomedical data collection and analysis, automatically record these new datatypes in a well-defined and federated manner so that data can be shared.

...

...

Support of form annotations to enable form behavior

Forms provide a convenient paper-like electronic mechanism to capture data in a structured way.  For example, when a patient is placed on a clinical trial, data about the patient's demographics and eligibility for the trial need to be captured.  However, forms can also exhibit specific behavior that may or may not be reusable.  These include skip patterns (if the answer to question 10 is "Yes" then skip to question 15), derived values ("what is your age" and "is your age less than, greater than, or equal to 65"), and composite answers ("check all" or "more than one of the above").  Furthermore, specific requirements about how a form is rendered can exist.  Examples include the question description, help text, valid values, maximum and minimum answer length, and the format of a data mask (such as SSN).  Forms must therefore support annotation with this behavior so that tools can render and act upon them appropriately.  Furthermore, if appropriate, web- and paper-based collection instruments can be automatically generated from this metadata.

Extend allowable answers with additional permitted values

In many cases, data elements can be reused but the allowable values need to be extended or restricted.  For example, one researcher may want to capture diseases of the nervous system while another may want to capture diseases of the circulatory system.  Both can be captured in the same data element (disease) using the same controlled terminology (ICD-9).  However, the lists of allowable values are quite different.  Furthermore, yet another researcher may want to focus only on certain circulatory diseases, such as those of the heart.  The metadata repository must allow for reuse of data elements while restricting or extending the permitted values.

C. Metadata Specialist Stories

Navigation and creation of metadata through modeling and web tools

The information, including names, semantic meaning, and linkages within and across information models provides a deluge of useful information for clinicians, informaticists, metadata specialists, and software engineers.  However, access to this deep and complex information in an intuitive manner can be challenging.  It is important that access be provided through modeling tools such that metadata can be discovered, reused, and created directly through the tooling that metadata specialists and software engineers are familiar with.  Furthermore, the information models themselves should be browsable through the web in a way that hides the complexity while revealing interesting relationships.

Managing semantic relationships in order to link and share data

In many cases, different systems call the same data element by different names even though they are semantically equivalent in a given context.  For example, a hospital system may have a Patient Last Name and a clinical trials system may have a Subject Surname.  Both of these data elements share a semantic equivalence, but it may be very difficult to combine them automatically.  The metadata registry should provide a way to describe semantic relationships such as this in order to enable the linking and sharing of data.
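One way to picture such a semantic relationship is a registry that maps each system-specific data element to a shared concept.  The following is a minimal sketch only: the system names, the concept identifiers, and the functions `equivalents` and `to_concepts` are hypothetical and are not part of any existing repository API.

```python
# Hypothetical registry: (system, data element name) -> shared concept identifier.
# The concept identifiers are invented for illustration.
CONCEPT_OF = {
    ("hospital", "Patient Last Name"): "person.family_name",
    ("clinical_trials", "Subject Surname"): "person.family_name",
    ("hospital", "Patient Medical Record Number"): "person.mrn",
}


def equivalents(system, element):
    """Return the other registered data elements that share this element's concept."""
    concept = CONCEPT_OF[(system, element)]
    return sorted(
        key
        for key, other in CONCEPT_OF.items()
        if other == concept and key != (system, element)
    )


def to_concepts(system, record):
    """Re-key a record by concept so records from different systems line up."""
    return {
        CONCEPT_OF[(system, name)]: value
        for name, value in record.items()
        if (system, name) in CONCEPT_OF
    }


hospital_row = to_concepts("hospital", {"Patient Last Name": "Doe"})
trial_row = to_concepts("clinical_trials", {"Subject Surname": "Doe"})
```

Once both records are keyed by concept, they can be compared or joined without either system renaming its own fields.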

Supporting interoperability standards (e.g. Healthcare Datatypes)

Leveraging interoperability standards, such as standard data formats and datatypes, is critical to data exchange within and across enterprises.  For example, ISO 21090, otherwise known as HL7 Healthcare Datatypes, provides a basic representation of common chunks of data exchanged in the healthcare community, such as Address, Document, and Coded List.  The metadata repository should be flexible enough to encode a variety of standards while restrictive enough to provide a common foundation for data exchange.  Furthermore, it is critical that organizations and individuals be able to restrict, or localize, these standards for custom use.

2nd paragraph: describe the fact that the KR can accommodate any UML-based model (including ISO 21090)

Capturing data in a standard way using data element reuse

Domain Description: a forms curator is sitting down to create the case report forms (CRFs) for a new trial titled "Study of Ad.p53 DC Vaccine and 1-MT in Metastatic Invasive Breast Cancer."  Her goal is to make the forms intuitive and as precise as possible while reducing human error during data collection.  When building the demographics form, she decides to make the age data element derived from the date of birth data element.  Entering data that can simply be calculated from other data can only introduce errors, especially since date of birth is also captured in the hospital system and so can easily be validated.  When building the medical history CRF, she realizes that fifteen of the questions relate only to women who have previously been pregnant.  She promptly enters a skip pattern based on the gender question, as well as the pregnancy question.  That should save significant time.  Now that all the questions are entered, she goes back and edits them to set minimum lengths for required text questions, maximum lengths for numeric questions, pick-lists for questions with a fixed set of possible answers, and a data mask for the social security number question.  Now the clinical data management system can use all of this information to render the forms as PDF.

Technical Description: Forms provide a convenient paper-like electronic mechanism to capture data in a structured way.  For example, when a patient is placed on a clinical trial, data about the patient's demographics and eligibility for the trial need to be captured.  However, forms can also exhibit specific behavior that may or may not be reusable.  These include skip patterns (if the answer to question 10 is "Yes" then skip to question 15), derived values ("what is your age" and "is your age less than, greater than, or equal to 65"), and composite answers ("check all" or "more than one of the above").  Furthermore, specific requirements about how a form is rendered can exist.  Examples include the question description, help text, valid values, maximum and minimum answer length, and the format of a data mask (such as SSN).  Forms must therefore support annotation with this behavior so that tools can render and act upon them appropriately.  Furthermore, if appropriate, web- and paper-based collection instruments can be automatically generated from this metadata.
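The form behaviors described here (a skip pattern, a derived value, and a data mask) can be sketched as plain annotations that a rendering tool interprets.  This is an illustration under stated assumptions, not the repository's actual annotation format: the dictionary layout, the question identifiers, and the function names are all hypothetical.

```python
import re
from datetime import date

# Hypothetical annotations keyed by question id; the field names are illustrative.
FORM = {
    "q10": {"skip_to": {"Yes": "q15"}},     # skip pattern
    "ssn": {"mask": r"\d{3}-\d{2}-\d{4}"},  # data mask
}
ORDER = ["q10", "q11", "q12", "q13", "q14", "q15"]


def next_question(current, answer):
    """Return the id of the next question, honoring any skip pattern."""
    target = FORM.get(current, {}).get("skip_to", {}).get(answer)
    if target is not None:
        return target
    position = ORDER.index(current)
    return ORDER[position + 1] if position + 1 < len(ORDER) else None


def derive_age(date_of_birth, on_date):
    """Derived value: age in whole years, computed rather than entered."""
    had_birthday = (on_date.month, on_date.day) >= (date_of_birth.month, date_of_birth.day)
    return on_date.year - date_of_birth.year - (0 if had_birthday else 1)


def matches_mask(question_id, value):
    """Check an answer against the question's data mask."""
    return re.fullmatch(FORM[question_id]["mask"], value) is not None
```

Keeping the behavior in data rather than in rendering code is what would let the same annotations drive both a web form and a generated paper instrument.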

Extend allowable answers with additional permitted values

Domain Description: In many cases, data elements can be reused but the allowable values need to be extended or restricted.  For example, one researcher may want to capture diseases of the nervous system while another may want to capture diseases of the circulatory system.  Both can be captured in the same data element (disease) using the same controlled terminology (ICD-9).  However, the lists of allowable values are quite different.  Furthermore, yet another researcher may want to focus only on certain circulatory diseases, such as those of the heart.  A metadata specialist can sit with a domain specialist to identify the appropriate ontologies and constrain or expand them as needed.

Technical Description: the metadata repository allows a data element to have a value domain that references an external terminology.  Furthermore, those terminologies can be constrained or expanded as needed in the local repository.
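As a rough sketch of constraining and expanding a value domain, the example below models the permitted values as an immutable set that can be narrowed or widened locally.  The `ValueDomain` class is hypothetical, and the codes are placeholders standing in for terminology entries; they are deliberately not real ICD-9 codes.

```python
from dataclasses import dataclass

# Placeholder codes: code -> (body system, organ). Not real ICD-9 codes.
DISEASE_CODES = {
    "N01": ("nervous", "brain"),
    "N02": ("nervous", "spinal cord"),
    "C01": ("circulatory", "heart"),
    "C02": ("circulatory", "heart"),
    "C03": ("circulatory", "veins"),
}


@dataclass(frozen=True)
class ValueDomain:
    """Permitted values for a data element, drawn from a named terminology."""
    terminology: str
    permitted: frozenset

    def restrict(self, keep):
        """Return a narrower domain containing only values accepted by keep()."""
        return ValueDomain(self.terminology, frozenset(v for v in self.permitted if keep(v)))

    def extend(self, extra):
        """Return a wider domain with locally added values."""
        return ValueDomain(self.terminology, self.permitted | frozenset(extra))

    def allows(self, value):
        return value in self.permitted


disease = ValueDomain("placeholder terminology", frozenset(DISEASE_CODES))
circulatory = disease.restrict(lambda code: DISEASE_CODES[code][0] == "circulatory")
heart = circulatory.restrict(lambda code: DISEASE_CODES[code][1] == "heart")
local = heart.extend({"LOCAL-1"})  # a locally defined value, also hypothetical
```

Because each operation returns a new domain, the shared data element (disease) stays unchanged while each researcher works with a separate local view of it.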

C. Metadata Specialist Stories

Creation of metadata and management of information models through modeling and web tools

Domain Description: the imaging center at a cancer center has just purchased a magnetic resonance spectroscopy (MRS) machine to add to its numerous magnetic resonance imaging (MRI) machines.  MRS is used to measure the levels of different metabolites in body tissues.  The MR signal produces a spectrum of resonances that correspond to different molecular arrangements of the isotope being "excited".  Magnetic resonance spectroscopic imaging (MRSI) combines both spectroscopic and imaging methods to produce spatially localized spectra from within the sample or patient.  A metadata specialist has been assigned to enhance the center's imaging repository to handle this new type of data.  He opens his modeling tool and begins to add classes related to metabolic signatures.  As the metadata specialist types the class name "Metabolite" into the modeling tool, a number of existing classes and concepts are suggested to him automatically.  One of these piques his interest, and he clicks on the link for more information.  His web browser pops up showing him the data element from a system focused on drug discovery and pharmacokinetics.  This is the perfect term to reuse, and this type of linkage should provide a convenient way to match potential drugs with MRS results.  He imports the class into his modeling tool, bringing with it a number of associated classes and attributes that may be of use.

Technical Description: all data elements and referenced concepts in the metadata repository are indexed and easily accessible by type-ahead and other integrated tooling solutions.  The model browser is a convenient interface for exploring the metadata in a UML or data element centric way.  Furthermore, the repository supports the import and export of modeling standards, such as XMI, which facilitates direct reuse.
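The type-ahead lookup mentioned here could, for instance, be backed by a sorted index searched by prefix.  The sketch below is one simple way to do that and is not a description of the repository's indexing; the `TypeAheadIndex` class is hypothetical, and every registered name other than "Metabolite" is invented for illustration.

```python
import bisect


class TypeAheadIndex:
    """Case-insensitive prefix lookup over registered class and concept names."""

    def __init__(self, names):
        # Sort by lowercased name so that all matches for a prefix are contiguous.
        self._keys = sorted((name.lower(), name) for name in names)

    def suggest(self, prefix, limit=5):
        """Return up to `limit` names starting with `prefix`, in alphabetical order."""
        wanted = prefix.lower()
        start = bisect.bisect_left(self._keys, (wanted,))
        matches = []
        for key, name in self._keys[start:]:
            if not key.startswith(wanted) or len(matches) >= limit:
                break
            matches.append(name)
        return matches


index = TypeAheadIndex(["Metabolite", "Metabolic Signature", "Metastasis", "Patient"])
suggestions = index.suggest("metab")
```

A binary search over a sorted list keeps each keystroke cheap; a production index would more likely use a search engine or a trie, which this sketch does not attempt.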

Managing semantic relationships in order to link and share data

Core to interoperability is capturing data in a standard way using the same or similar data elements.  Data elements individually can be reused, for example allowing for patient data to be joined across systems using the Patient Medical Record Number.  Forms in their entirety can be reused, such as eligibility forms for multi-site clinical trials.  Data formats for encoding biomedical data can be shared, such as MAGE-ML for gene expression data.  This allows for data to be captured in a standard way, shared across platforms and systems, for users to search based on the data that is encoded using type-ahead Google-like functionality, and for users to build new systems based on the standards that are already in use.

Support interoperable system design by finding touch points between information models

In many cases, different systems call the same data element by different names even though they are semantically equivalent in a given context.  For example, a hospital system may have a Patient Last Name and a clinical trials system may have a Subject Surname.  Both of these data elements share a semantic equivalence, but it may be very difficult to combine them automatically.  The metadata registry should provide a way to describe semantic relationships such as this in order to enable the linking and sharing of data.

Capturing data in a standard way using data element reuse

Core to interoperability is capturing data in a standard way using the same or similar data elements.  Data elements individually can be reused, for example allowing for patient data to be joined across systems using the Patient Medical Record Number.  Forms in their entirety can be reused, such as eligibility forms for multi-site clinical trials.  Data formats for encoding biomedical data can be shared, such as MAGE-ML for gene expression data.  This allows for data to be captured in a standard way, shared across platforms and systems, for users to search based on the data that is encoded using type-ahead Google-like functionality, and for users to build new systems based on the standards that are already in use.

Finding touch points with other systems when building a population science application

Domain Description: The mission of population science is to reduce the risk, incidence, and deaths from cancer as well as enhance the quality of life for cancer survivors.  Genetic, epidemiologic, behavioral, applied, and surveillance cancer research are typical activities of population science researchers; this work brings together clinical, basic, and population scientists to further individual and population health.  Patients are often followed for months or years after diagnosis and/or treatment.  A cancer population sciences researcher is studying chemotherapy use in young and elderly patients with advanced lung cancer.  For this type of cancer, physicians and patients often have to choose between platinum-based and non-platinum-based chemotherapy.  Platinum-based treatment is generally considered to be more aggressive and effective, but it is also more toxic.  It is unclear whether physicians are avoiding platinum-based treatments in the elderly because of concerns about frailty and toxicity.  The cancer researcher consults with a metadata specialist to design the information model, which will include patient, clinical, pathology, tissue, and imaging data.  The metadata specialist selects a number of information models that are currently being used by other researchers, and overlays them to determine the data elements that are important for linking and capturing such diverse data.  These are exported from the metadata repositories and imported into his modeling tool to be enhanced with the new fields for the population science research.

Technical Description: each information model has well-defined metadata available in distributed metadata repositories.  The nature of the metadata is such that simple queries can determine overlapping data elements.  This can be visualized side-by-side in a tabular format, or graphically in a UML class model.  The metadata repository can output data using UML standards, such as XMI, which can easily be aggregated and imported into a modeling tool.

One of the key challenges in designing new systems to be interoperable with existing systems is to identify and integrate the touch points between the existing systems.  For example, in designing a system for capturing new biomedical data based on patient samples, it is important to know the key pieces of information used to link with other systems, such as biospecimen identifier, patient medical record number, bio-image identifier, etc.  The metadata repository should support the ability to discover the touch points amongst all the systems with registered metadata such that new systems can be designed in an interoperable fashion.
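The "simple queries" for overlapping data elements can be illustrated by treating each information model as a set of data elements and keeping those that appear in more than one model.  This is a minimal sketch under that assumption: the model names and the `touch_points` function are hypothetical, and the element names are taken from the examples in the text, apart from "diagnosis date", which is invented.

```python
def touch_points(models):
    """Map each data element used by two or more models to the models that use it."""
    usage = {}
    for model, elements in models.items():
        for element in elements:
            usage.setdefault(element, set()).add(model)
    return {
        element: sorted(users)
        for element, users in usage.items()
        if len(users) > 1
    }


# Hypothetical registered models and their data elements.
MODELS = {
    "clinical": {"patient medical record number", "diagnosis date"},
    "tissue": {"biospecimen identifier", "patient medical record number"},
    "imaging": {
        "bio-image identifier",
        "biospecimen identifier",
        "patient medical record number",
    },
}

shared = touch_points(MODELS)
```

The resulting mapping is the tabular side-by-side view in miniature: each shared element is listed with the models that would need to agree on it.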

Support data transformations in order to allow different tools to work together

...