NIH | National Cancer Institute | NCI Wiki  

Overview

The Metadata Maintenance task will clean up existing content in the caDSR database and develop automated programs that keep that content synchronized with vocabulary changes in the NCI Thesaurus (NCIt) terminology system.

Scope

The scope of the Metadata Maintenance work comprises three primary tasks:

  1. Retire redundant Representation Terms, Object Classes and Properties
  2. Synchronize NCI CBIIT and caDSR concepts: names and definitions
  3. Develop a monthly maintenance program to keep caDSR concepts synchronized with the monthly NCIt maintenance
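
As a rough illustration of tasks 2 and 3, the sketch below compares concept names and definitions held in caDSR against those in NCIt and reports the differences. It is a minimal sketch under stated assumptions, not the project's actual program: the `Concept` record, the `find_out_of_sync` function, and the sample codes (`X1`, `X2`, `X3`) are all hypothetical, and the real work would read from the caDSR tables and an NCIt export rather than in-memory dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    """Hypothetical, simplified view of a concept record."""
    code: str
    name: str
    definition: str


Mismatch = Tuple[str, str, Optional[str], Optional[str]]


def find_out_of_sync(cadsr: Dict[str, Concept],
                     ncit: Dict[str, Concept]) -> List[Mismatch]:
    """Return (code, field, cadsr_value, ncit_value) for every difference.

    Concepts present in caDSR but absent from NCIt are reported with the
    field name "missing_in_ncit".
    """
    mismatches: List[Mismatch] = []
    for code in sorted(cadsr):
        local = cadsr[code]
        remote = ncit.get(code)
        if remote is None:
            mismatches.append((code, "missing_in_ncit", local.name, None))
            continue
        for field in ("name", "definition"):
            local_value = getattr(local, field)
            remote_value = getattr(remote, field)
            if local_value != remote_value:
                mismatches.append((code, field, local_value, remote_value))
    return mismatches


if __name__ == "__main__":
    # Illustrative data only; codes and text are made up.
    cadsr = {
        "X1": Concept("X1", "Adverse Event", "Old definition"),
        "X2": Concept("X2", "Assessment", "Same definition"),
        "X3": Concept("X3", "Retired Term", "No longer in NCIt"),
    }
    ncit = {
        "X1": Concept("X1", "Adverse Event", "New definition"),
        "X2": Concept("X2", "Assessment", "Same definition"),
    }
    for row in find_out_of_sync(cadsr, ncit):
        print(row)
```

A report like this could be run after each monthly NCIt release and handed to the content group before any update is applied.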

Related Metadata Cleanup GForge Trackers

| GF ID | Summary | Priority | Comments |
|---|---|---|---|
| 7706, 10738 | Add filter for special characters to Concepts "before row" trigger | 3 | The special characters are non-printing control characters that cause data rendering problems in the caDSR tools and formatting problems in XML documents. |
| 6338 | Write script to retire unused OCs, Properties and Rep Terms | 5 | 'Retire Withdrawn' all unused OC, Property and Rep Term components that are not in the caBIG Context. For this tracker, ignore unused Concepts. |
| 2694, 2708 | Write script to set EVS_Source to NCI_CONCEPT_CODE | 3 | Not possible through the user interface |
| 7741 | Retire VMs with strange characters in the Short Meaning, Description and preferredDefinition | 3 | Some characters may be extended ASCII (8 bit vs. current 7 bit); moving to an 8-bit character set may resolve this. |
| 819 | When an Object Class or Property is changed, the preferred name for the Data Element Concept is not changed | 3 | |
| 1051 | Make sure that Effective Date is being set for Concepts created by the Excel Loader | 3 | |
| 2565 | Write scripts to correct the origin and database columns for concepts | 3 | |
| 4387 | There are some concept derivation rules with no concepts associated | 3 | |
| 4598 | Trigger on value meaning is not copying preferred definition to description | 3 | |
| 5440 | Some data has leading/trailing spaces | 3 | |
| 6361 | Write script to change all OCs and Properties with "Side Effect" in the concept derivation rule to use Adverse Event instead | 4 | |
| 6985 | Retire C25367 "Assessment" and merge components into C25217 | 4 | |
| 9983 | QUEST_CONTENTS_EXT table lacks trigger logic to set BEGIN_DATE | 3 | |
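
Several of the trackers above (7706/10738, 7741 and 5440) concern text hygiene: non-printing control characters, characters outside 7-bit ASCII, and leading or trailing spaces. The sketch below illustrates that logic in Python. It is an assumption-laden illustration only: the actual fix would live in a database trigger or cleanup script, the function names `clean_text` and `has_non_ascii` are hypothetical, and the exact set of characters to filter (here, C0 control characters other than tab, line feed and carriage return, plus DEL) is a guess, since the trackers do not list them.

```python
import re
from typing import Optional

# Assumed definition of "special characters": C0 control characters except
# tab (\x09), line feed (\x0a) and carriage return (\x0d), plus DEL (\x7f).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_text(value: Optional[str]) -> Optional[str]:
    """Remove control characters, then trim leading/trailing whitespace."""
    if value is None:
        return None
    return _CONTROL_CHARS.sub("", value).strip()


def has_non_ascii(value: str) -> bool:
    """Flag values containing characters outside the 7-bit ASCII range."""
    return any(ord(ch) > 127 for ch in value)


if __name__ == "__main__":
    raw = "  Adverse\x00 Event\x1f \n"
    print(repr(clean_text(raw)))
    print(has_non_ascii("caf\u00e9"))
```

Flagging non-ASCII values rather than deleting them keeps the decision about an 8-bit character set, raised in tracker 7741, separate from the cleanup itself.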

Objectives

  1. Reduce the cost of manual identification of redundant metadata
  2. Reduce the cost of retrospective metadata harmonization
  3. Reduce the time and effort required to identify and reuse semantically identical or similar content

Approach

The following steps should be taken to maintain the metadata:

  • CBLOAD will be used as the development environment.
  • All cleanup scripts will be owned by cadsr_maint. This account has already been created on CBLOAD.
  • CBLOAD is loaded with the SBR and SBREXT schemas from CBPROD. Attached are instructions on preparing the CBLOAD environment so that cadsr_maint can write and test scripts.
  • Scripts written in CBLOAD and owned by cadsr_maint are tested. This step can be repeated as many times as needed.
  • Once the scripts are tested, reports are generated for approval by the content group.
  • Once the reports are approved, the scripts in cadsr_maint are moved to cadsr_maint staging.
  • The changes are made on stage, where the content group can test them.
  • Once the changes are approved, make a backup of CBPROD and run the scripts on CBPROD.
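
The promotion path described above can be read as an ordered sequence of gates, where a script may not reach CBPROD until every earlier step, including the backup, is complete. The following is a minimal sketch of that ordering, assuming a simple linear model; the `Stage` enum and the `advance` and `can_run_on_prod` helpers are hypothetical names introduced for illustration and are not part of any caDSR tooling.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stages, in the order given by the approach above."""
    DEVELOPED = 1        # written on CBLOAD, owned by cadsr_maint
    TESTED = 2           # tested on CBLOAD (may be repeated)
    REPORT_APPROVED = 3  # reports approved by the content group
    ON_STAGE = 4         # scripts moved to and run on stage
    STAGE_APPROVED = 5   # content group has tested and approved on stage
    PROD_BACKED_UP = 6   # backup of CBPROD taken
    RUN_ON_PROD = 7      # scripts run on CBPROD


def advance(current: Stage) -> Stage:
    """Move to the next stage; skipping a gate is not possible."""
    if current is Stage.RUN_ON_PROD:
        raise ValueError("script has already been run on CBPROD")
    return Stage(current.value + 1)


def can_run_on_prod(current: Stage) -> bool:
    """A script may run on CBPROD only once the backup has been taken."""
    return current is Stage.PROD_BACKED_UP


if __name__ == "__main__":
    stage = Stage.DEVELOPED
    while stage is not Stage.RUN_ON_PROD:
        print(stage.name, "-> production allowed:", can_run_on_prod(stage))
        stage = advance(stage)
```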