NIH | National Cancer Institute | NCI Wiki  

Sign-Off

Sign off | Date | Role | CBIIT or Stakeholder Organization

Scope

The Concept Maintenance effort includes the following main parts:

  • Part 1 -- Retire an identified list of Concepts in caDSR
  • Part 2 -- Retire all the caDSR NCI Metathesaurus concepts and provide pointers to equivalent NCI Thesaurus concepts.
  • Part 3 -- Synchronize caDSR concepts with EVS concepts

GForge Tracker: GF11350

Impact Analysis/Risks

This task will be implemented by executing several activity steps. Between execution of the various activity steps and after the completion of the implementation, the following conditions may be observed:

  • Existing content could be associated with retired concepts
  • Duplicate Administered Elements (Object Classes, Properties, Data Element Concepts, etc.) could be found.
  • Concepts with no definition could be loaded in caDSR.

Part 1. Retire caDSR Concepts from Identified list

The purpose of this effort is to retire a set of caDSR concepts identified on a predetermined list, and to migrate the content of the retiring concepts to a predetermined list of replacement concepts.

Activity 1A: Retire caDSR concepts from list provided by content users

[GF2672]

Retire the identified caDSR Concepts

A list of concepts to be retired was provided by the content team, with equivalent replacement concepts for each retiring concept. Retire all the concepts on that list. In the provided spreadsheet, the concept codes in column A need to be retired in caDSR by changing the Concept Workflow Status (update to the ASL_NAME field) to 'RETIRED ARCHIVED'. Also add a comment (append/update to the CHANGE_NOTE field) to the retiring concept code record, indicating the date, the change to retired status, and the replacement concept code from column D.
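
The retirement update described above can be sketched as follows. This is a minimal illustration, not the actual caDSR script: the record is modelled as a plain dictionary, the concept codes are made up, and the wording of the change note is an assumption. Only the ASL_NAME and CHANGE_NOTE field names and the 'RETIRED ARCHIVED' status come from the requirement.

```python
from datetime import date

RETIRED_STATUS = "RETIRED ARCHIVED"


def retire_concept(record, replacement_code, today=None):
    """Return a copy of a concept record marked as retired.

    `record` is a hypothetical dict holding the ASL_NAME and CHANGE_NOTE
    fields; the note wording below is illustrative only.
    """
    today = today or date.today()
    note = (
        f"{today.isoformat()}: status changed to {RETIRED_STATUS}; "
        f"replaced by concept {replacement_code}."
    )
    updated = dict(record)
    updated["ASL_NAME"] = RETIRED_STATUS
    # Append to any existing change note instead of overwriting it.
    existing = (record.get("CHANGE_NOTE") or "").strip()
    updated["CHANGE_NOTE"] = f"{existing} {note}".strip()
    return updated


# Hypothetical spreadsheet row: column A = C00001, column D = C00002.
example = retire_concept(
    {"CONCEPT_CODE": "C00001", "ASL_NAME": "RELEASED", "CHANGE_NOTE": None},
    replacement_code="C00002",
    today=date(2008, 10, 28),
)
print(example["ASL_NAME"])
print(example["CHANGE_NOTE"])
```

Appending to the existing note, rather than replacing it, keeps any earlier curation history on the record.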

*PLEASE COMPLETE STEP 1 ASAP. Record the target date here: This task was completed on 10/28/2008 (na)(4/28/2009 dw)*

Validate that all of the replacement concepts from the above list already exist in caDSR

[GF TBD] Defer STEP 2 until STEP 1 is complete (4/28/2009 dw)

This task has not yet been completed (na)

1. All retiring concepts in the list provided by the content team have at least one replacement concept.

Validate that all replacement concepts exist in caDSR (initial validation is based on Concept codes).

i) NO: If the replacement concept does not exist in caDSR, create it based on the EVS attributes.

ii) YES: If the replacement concept code exists in caDSR, compare the existing Concept Long Name and Preferred Definition with EVS Preferred Name and Preferred Definition (NCI Definition). If a discrepancy is found, update the attributes of the caDSR concept administered item (name or definition) to match EVS. Save the old names/definitions as alternate names and definitions.

2. Create a report of the new and existing concepts (CON_UPD_LOG) and send to the content team for FYI review.
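
The create-or-update validation above could look roughly like the sketch below. The in-memory dictionaries, the key names (`long_name`, `definition`, `alt_names`, `alt_definitions`) and the concept codes are assumptions made for illustration; the real caDSR and EVS lookups are not shown.

```python
def sync_replacement(cadsr, evs_concept):
    """Create or update the caDSR record for one replacement concept.

    `cadsr` maps concept code -> record dict (hypothetical layout);
    `evs_concept` carries the EVS code, preferred name and definition.
    Returns one row for the CON_UPD_LOG report.
    """
    code = evs_concept["code"]
    record = cadsr.get(code)
    if record is None:
        # Case i) NO: create the concept from the EVS attributes.
        cadsr[code] = {
            "long_name": evs_concept["name"],
            "definition": evs_concept["definition"],
            "alt_names": [],
            "alt_definitions": [],
        }
        return {"code": code, "action": "created"}

    # Case ii) YES: align name and definition with EVS, keeping the old values.
    changed = []
    if record["long_name"] != evs_concept["name"]:
        record["alt_names"].append(record["long_name"])
        record["long_name"] = evs_concept["name"]
        changed.append("name")
    if record["definition"] != evs_concept["definition"]:
        record["alt_definitions"].append(record["definition"])
        record["definition"] = evs_concept["definition"]
        changed.append("definition")
    action = ("updated " + "+".join(changed)) if changed else "unchanged"
    return {"code": code, "action": action}


cadsr = {
    "C100": {"long_name": "Old Name", "definition": "Same.",
             "alt_names": [], "alt_definitions": []},
}
evs = [
    {"code": "C100", "name": "New Name", "definition": "Same."},
    {"code": "C200", "name": "Other", "definition": "Def."},
]
con_upd_log = [sync_replacement(cadsr, concept) for concept in evs]
print(con_upd_log)
```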

Activity 1B: Migrate to the replacement concepts, all the objects and related content that belonged to the retiring concepts.

Retiring concepts could be associated with various types of administered component records. Retiring concepts fall into two categories, depending on whether they are associated with a single replacement concept or with multiple replacement concepts.

  1. For retiring Concepts associated with a single replacement concept, find the Administered elements (only OC, PROP, REP, CD, VM, VD, CS and CSI) related to the retiring concept. Proceed to step 3 and beyond.
  2. For retiring Concepts associated with multiple replacement concepts, send a report of these concepts and related content usage information (CON_REPL_USAGE), to the Content team for review and feedback on additional processing requirements. For each set of replacement concepts, the content team needs to determine which one to use as replacement concept to build the concept derivation rule. Once that is determined, proceed to step 3 and beyond.
  3. Find the concept derivation rule for the retiring concept. We need to determine if the concept derivation rule for the retiring concept has related content that is not retired and not in the test and training context.
    1. If the concept derivation rule does not have related content: Delete the concept derivation rule for the retiring concept. Create a report of these concept derivation rules (CON_CONDR_DELETE_LOG) and send to the content team for FYI review.
    2. If the concept derivation rule does have related content: Proceed to step 4 and beyond.
  4. Only concepts from step 3b are included in this step. Find out if there is an existing concept derivation rule for the replacement concept.
    1. If a concept derivation rule does not exist: Create a new concept derivation rule for the replacement concept and proceed to step 5 and beyond.
    2. If a concept derivation rule does exist: Find out if the CONDR is already used by an Administered Element of the same type as that of the AC related to the retiring concept. The list of AC types includes OC/PROP/REP/CD/VM/VD/CS/CSI.
      1. If the CONDR for the replacement concept is not already used by an AC of the same type, use this CONDR for the processing in step 5 and beyond.
      2. If the CONDR for the replacement concept is already used by an AC of the same type, send a report of these concepts and related content usage information (CON_CONDR_DUPS), to the Content team for review and feedback on additional processing requirements. For each content duplicate set, the content team needs to determine the ACs belonging (via concept derivation rules) to the retiring concepts that need to be updated to the concept derivation rule of the replacement concept. Once that is determined, proceed to step 5 and beyond.
  5. Update the CONDR relationship for the AC content belonging to the retiring concept, by changing the CONDR relationship to that of the CONDR of the replacement concept (from steps 4a and 4b above). For OCs/Props/Reps, update Name/definition. Save the old concept names/definitions as alternate names and definitions. Update the change note for the OCs/Props/Reps by adding the timestamp and nature of this change.
  6. Delete the concept derivation rule for the retiring concept
  7. Create a report of these concepts (CON_MERGE_LOG) for FYI Review by the content team.
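
The branching in steps 3 to 7 above can be summarized as a small decision function. This is only a sketch: the three boolean inputs stand in for database checks that are not shown, and the action strings are illustrative labels rather than real procedure names.

```python
def plan_migration(retiring_has_live_content,
                   replacement_condr_exists,
                   replacement_condr_used_by_same_type):
    """Return the ordered actions for one retiring concept (steps 3 to 7)."""
    if not retiring_has_live_content:
        # Step 3a: no content outside retired or test/training usage.
        return ["delete retiring CONDR", "log to CON_CONDR_DELETE_LOG"]

    if not replacement_condr_exists:
        # Step 4a: build a rule for the replacement concept.
        actions = ["create replacement CONDR"]
    elif not replacement_condr_used_by_same_type:
        # Step 4b, first case: the existing rule can be reused.
        actions = ["reuse replacement CONDR"]
    else:
        # Step 4b, second case: possible duplicate, needs a content team decision.
        actions = ["report to CON_CONDR_DUPS", "wait for content team decision"]

    # Steps 5 to 7 apply to every concept that reaches this point.
    actions += [
        "repoint ACs to replacement CONDR",
        "delete retiring CONDR",
        "log to CON_MERGE_LOG",
    ]
    return actions


print(plan_migration(True, True, True))
```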

Workflow Diagrams

Refer to the workflow diagrams for each of the activities identified in the preceding text. The titles for each of the diagrams are a link to a PDF attachment of the activity descriptions.

Activity 1A
workflow diagram for activity 1A as described.

Activity 1B
workflow diagram for activity 1B as described.

Part 2. Retire NCI Metathesaurus concepts in caDSR

The purpose of this effort is to retire all caDSR Metathesaurus (Meta) concepts and provide, when possible, a pointer to an NCIt equivalent. Usage information for the Meta concepts is migrated to the NCIt equivalents.

Activity 2A -- Retire the Metathesaurus (MetaT) Concepts

Generate Metathesaurus Concept Reports

Generate a report of all NCI Metathesaurus concepts in caDSR and check whether equivalent NCI Thesaurus concepts exist in EVS.
i) Make the comparison based on concept codes.
ii) Only use non-retired EVS concepts (include the preferred name from EVS).
iii) If a Thesaurus concept is found in EVS, check whether it already exists in caDSR and what its Workflow Status is. Make the comparison based on concept codes.

Refer to the workflow diagram that follows.

Activity 2A
workflow diagram for activity 2A as described.

Refer to the report results in the attached file Metathesaurus_Concept_rpt_08Apr2009.xls.

The first tab includes a summary count of all the Metathesaurus Concepts by processing category:

Processing Category | caDSR MetaT Concept Count
1. caDSR Meta Concept not in EVS -- No Results found | 671
2. caDSR Meta Concept found in EVS with no EVS NCIt concept | 1,166
3. caDSR Meta Concept found in EVS with an EVS NCIt concept, but no caDSR NCIt concept | 216
4. caDSR Meta Concept found in EVS with an EVS NCIt concept, also found in caDSR with released status | 922
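
The four processing categories above amount to a short classification rule, sketched below. The function name, its arguments and the concept codes are hypothetical. The source does not say how to treat an NCIt concept that exists in caDSR with a non-released status, so the sketch returns `None` for that case instead of guessing.

```python
def processing_category(found_in_evs, evs_ncit_code, cadsr_ncit_status):
    """Assign a caDSR Meta concept to processing category 1 to 4.

    found_in_evs      -- True if the Meta concept code was found in EVS
    evs_ncit_code     -- equivalent NCIt code from EVS, or None
    cadsr_ncit_status -- dict mapping NCIt codes in caDSR to workflow status
    """
    if not found_in_evs:
        return 1
    if evs_ncit_code is None:
        return 2
    status = cadsr_ncit_status.get(evs_ncit_code)
    if status is None:
        return 3
    if status == "RELEASED":
        return 4
    return None  # case not covered by the category definitions


cadsr_ncit = {"C12345": "RELEASED"}  # made-up code
print(processing_category(True, "C12345", cadsr_ncit))
print(processing_category(True, "C99999", cadsr_ncit))
```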

The second tab includes a summary count of all the Metathesaurus Concepts by processing categories and workflow status.

The third tab is the detailed report.

Requirements for retiring the Metathesaurus Concepts belonging to each of the above categories

*** Initial Requirements defined during the Concept review meeting on 2/6/2009 ***

Processing Category 1 -- caDSR Meta Concept not in EVS -- No Results found

Denise and Tommie performed EVS searches (using Concept names) for a few of the MetaT concepts in this category and found that some do exist in EVS with concept codes that are different from the ones in caDSR. The concept code mismatch for these MetaT concepts required further investigation to determine whether a data change had been performed that resulted in the mismatch.

Brian Carlsen, from the EVS team, performed a mapping reconciliation of these 671 CUIs, using the history files for NCI Metathesaurus version (200808) and the corresponding UMLS version it contains (2007AB). Brian processed 665 records from the Meta Concept file (6 of the original Meta concept codes were not CUIs) and was able to obtain solid reconciliation matches for 546 (matched to single Live Meta CUI in EVS), with the other 119 requiring additional validation. A report of the 119 concepts, with usage information, was provided to the content team for review and manual curation on March 16, 2009.

Refer to details in the attached file CaDSR Cui Map_UMLS200808_v3.xls.

Of the 546 good matches that were found, 28 were already in caDSR, so these 28 MetaT concepts are already being processed in categories 2, 3 and 4 above (4 in cat2, 1 in cat3 and 23 in cat4). The remaining 518 MetaTs are not yet in caDSR. For these 518 Metas, we queried EVS to find the NCIt equivalents and were able to group them in categories 2, 3 and 4 as above (301 in cat2, 42 in cat3 and 178 in cat4).

See details in attached file MetaT_category1_results.xls

The table that follows summarizes the results for category 1.

Number in category | Processed | Good Matches / Need Review | Good Matches in / not in caDSR | Categories of Good Matches not in caDSR
671 in Category 1 | 665 Processed for reconciliation, 6 invalid CUIs | From 665 Processed: 546 Good Matches, 119 Need Review | From the 546 Good Matches: 518 not in caDSR, 28 already in caDSR (4 in cat2, 1 in cat3 and 23 in cat4) | From the 518 not in caDSR: 301 in cat2, 42 in cat3 and 178 in cat4

What to do with the 6 invalid CUIs? (workflow status and usage info provided in attached file MetaT_category1_results.xls)

Processing Category 2 -- caDSR Meta Concept found in EVS with no EVS NCIt concept

The following algorithm was devised for the category 2 caDSR MetaT concepts (1166 in original report, 298 from category 1):

a. Query caDSR to find out whether there is an NCIt concept with a matching name.

  • If there is, generate a report (with usage info of both MetaT and NCIt concepts) and have it reviewed by the content users. What to do when multiple matches are found?
  • If there isn't, go to step b.

b. No caDSR NCIt, no usage or usage in test and training only (no associated objects or associated objects found that are only in the test and training contexts): Retire the Meta concepts that are not already retired by updating the Workflow status to "RETIRED PHASED OUT", the change note to "Meta Concept with no NCIt equivalent" and current date in the Modified date field.

c. No caDSR NCIt, usage outside of test and training (associated objects found outside of Test and Training): generate a report (with usage info of MetaT concept) and have it reviewed by the content team. Could we break these down further to facilitate the review?
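
Steps a to c above can be read as a three-way routing rule, sketched here. The context names in `TEST_TRAINING_CONTEXTS`, the return labels and the dictionary keys are assumptions for illustration. The status and change note strings are the ones given in step b.

```python
from datetime import date

TEST_TRAINING_CONTEXTS = {"TEST", "TRAINING"}  # hypothetical context names


def category2_action(cadsr_ncit_name_matches, usage_contexts):
    """Route one category 2 Meta concept through steps a, b and c."""
    if cadsr_ncit_name_matches:
        return "REVIEW_NAME_MATCH"  # step a: report for content review
    live_usage = [c for c in usage_contexts
                  if c.upper() not in TEST_TRAINING_CONTEXTS]
    if live_usage:
        return "REVIEW_USAGE"  # step c: usage outside test and training
    return "RETIRE"  # step b: no usage, or test/training usage only


def retirement_update(today):
    """Field values applied in step b (dictionary keys are illustrative)."""
    return {
        "workflow_status": "RETIRED PHASED OUT",
        "change_note": "Meta Concept with no NCIt equivalent",
        "date_modified": today,
    }


print(category2_action([], ["Test", "Training"]))
print(category2_action([], ["Test", "SomeOtherContext"]))
print(retirement_update(date(2009, 7, 27)))
```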

Results from the original processing

Number in category | Matches | Multiple Matches | No Match and usage
1166 in Category 2 | 111 NCIt string matches, 1055 no match | From the 111 with NCIt string matches: 107 with 1 match, 3 with 2 matches, 1 with 5 matches | From the 1055 with no match: 48 have no usage, 1007 with usage

(see details in attached file MetaT_category2_results_original.xls )

Results from the category 1 processing that ended up in category 2

Number in category | Matches | No Match and usage
301 in Category 2 after category 1 processing | 3 NCIt string matches, 298 no match | From the 298 with no match: 12 have no usage, 286 with usage

(see details in attached file MetaT_category2_results_reconciled.xls)

Processing Category 3 -- caDSR Meta Concept found in EVS with an EVS NCIt concept, but no caDSR NCIt concept

For the 258 Meta concepts in this category (216 from original processing, 42 from category 1):

a. Load new concept records in caDSR for the NCIt concept equivalents from EVS. The concept list (in the attached file) needs to be validated before loading.

b. Then retire the Meta concept by updating the Workflow status to "RETIRED PHASED OUT", with a change note giving the public id, version of the new NCIt concept and the Modified date.

(Refer to the details in the attached file MetaT_category3_results.xls)

Processing Category 4 -- caDSR Meta Concept found in EVS with an EVS NCIt concept, also found in caDSR with released status

For the 1094 Meta concepts in this category (922 from original processing, 178 from category 1), retire the Meta concept by updating the Workflow status to "RETIRED PHASED OUT", with a change note giving the public id, version of the new NCIt concept and the Modified date.

(Refer to the details in the attached file MetaT_category4_results.xls)

Summary: So far, out of the 2975 caDSR MetaT concepts:

  • 1094 MetaTs have NCIt matches in caDSR
  • 258 MetaTs have NCIt matches that need to be loaded into caDSR
  • 60 MetaTs have no NCIt matches and no usage

So 1,412 MetaTs are ready for phased-out retirement. The remaining 1,563 MetaTs require additional review and/or processing.
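
The summary arithmetic can be re-derived from the three counts listed above. The snippet is only a tally of figures already stated in this section; the dictionary labels are informal.

```python
total_metas = 2975
ready = {
    "NCIt match already in caDSR": 1094,
    "NCIt match to be loaded into caDSR": 258,
    "no NCIt match and no usage": 60,
}

ready_total = sum(ready.values())
needs_review = total_metas - ready_total

print(ready_total)   # 1412
print(needs_review)  # 1563
```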

Latest task status:

  1. Fully processed Metas (CON_CAT3_LOG, CON_CAT4_LOG, CON_CAT2_NO_USAGE_LOG ) were retired on all tiers on 7/27.
  2. Provided the following feedback reports to the content team on 7/13, waiting on review results to proceed with retirement:
    1. CON_CAT1_REC_LOG (118 Metas with 67 being reviewed by curators): Category 1 Meta concepts that went through the history file reconciliation and generated matching results that needed to be manually reviewed by the content team.
    2. CON_CAT2_NAME_MATCH_LOG (115 Metas): Category 2 Meta concepts with matching (by preferred name) caDSR NCIt concepts. Need to determine the suitability of each match.
  3. CON_CAT2_USAGE_LOG (1293 Metas): Category 2 Meta concepts with usage information. Created a report with string match results between these caDSR Meta and EVS NCIt concept names. Result summary is as follows:
    • 190 caDSR Metas with no EVS NCIt concept name or synonym match
    • 116 caDSR Metas with single EVS NCIt name or single EVS NCIt Synonym matches
    • 210 caDSR Metas with EVS NCIt Synonym match count between 2 and 20: 85 with name/definition matches in the synonyms, 125 without.
    • 244 caDSR Metas with EVS NCIt Synonym match count between 21 and 99: 34 with name/definition matches in the synonyms, 210 without.
    • 460 caDSR Metas with EVS NCIt Synonym match count greater than 99: 60 with name/definition matches in the synonyms, 400 without.
    • 73 caDSR Metas with EVS NCIt Synonym match count greater than 2000: 9 with name/definition matches in the synonyms, 64 without.

Version 1 of this report was provided to the content team on 9/16, version 2 on 9/29, and the final version on 10/6 (see attached CON_CAT2_USAGE_LOG_Match_rpt_v1.xls, CON_CAT2_USAGE_LOG_Match_rpt_v2.xls). The final report can be found in NCI SVN: CON_CAT2_USAGE_LOG_Match_rpt_FINAL.zip.

Activity 2B -- Migrate the content of the retiring Metathesaurus concepts

Once all the Meta Concepts have been cleared for retiring, with or without an NCIt equivalent, the following processing steps need to be implemented for retiring Meta concepts with related content.

Retiring Meta concepts fall into two categories, depending on whether they are associated with a single NCIt replacement concept or with multiple NCIt replacement concepts.

  1. For retiring Meta Concepts associated with a single NCIt replacement concept, find the Administered elements (only OC, PROP, REP, CD, VM, VD, CS and CSI) related to the retiring Meta concept. Find out if there's an existing concept derivation rule for the replacement NCIt concept.
    1. If a concept derivation rule does not exist: Create a new concept derivation rule for the replacement NCIt concept and proceed to step 1c.
    2. If a concept derivation rule does exist: Find out if the CONDR is already used by an Administered Element of the same type as that of the AC related to the retiring Meta concept. Should the list of AC types include OC/PROP/REP or OC/PROP/REP/CD/VM/VD/CS/CSI?
      1. If the CONDR for the replacement concept is not already used by an AC of the same type, use this CONDR for the processing in step 1c.
      2. If the CONDR for the replacement concept is already used by an AC of the same type, send a report of these concepts and related content usage information (CON_META_CONDR_DUPS), to the Content team for review and feedback on additional processing requirements.
    3. Update the CONDR relationship for the AC content belonging to the retiring Meta concept, by changing the CONDR relationship to that of the CONDR of the NCIt replacement concept (from steps 1a and 1bi above). For OCs/Props/Reps, update Name/definition. Save the old concept names/definitions as alternate names and definitions. Create a report of these concepts (CON_META_MERGE_LOG) for FYI Review by the content team.
  2. For retiring Concepts associated with multiple replacement concepts, send a report of these concepts and related content usage information (CON_REPL_USAGE), to the Content team for review and feedback on additional processing requirements.
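
The "already used by an AC of the same type" test in step 1b can be sketched as a simple partition. The AC records here are hypothetical dictionaries with `id` and `type` keys; how the ACs are actually fetched for each CONDR is not shown.

```python
def split_by_condr_conflict(retiring_acs, replacement_condr_acs):
    """Partition the ACs of a retiring Meta concept.

    An AC conflicts when the replacement CONDR is already used by an AC of
    the same type. Conflicts go to the CON_META_CONDR_DUPS report; the rest
    can be repointed to the replacement CONDR.
    """
    used_types = {ac["type"] for ac in replacement_condr_acs}
    mergeable = [ac for ac in retiring_acs if ac["type"] not in used_types]
    conflicts = [ac for ac in retiring_acs if ac["type"] in used_types]
    return mergeable, conflicts


retiring = [{"id": 1, "type": "OC"}, {"id": 2, "type": "PROP"}]
already_on_replacement = [{"id": 9, "type": "OC"}]
mergeable, conflicts = split_by_condr_conflict(retiring, already_on_replacement)
print(mergeable)
print(conflicts)
```

Comparing by type rather than by name reflects the wording of the step; whether the type list should be limited to OC/PROP/REP remains the open question noted in step 1b.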

Refer to the workflow diagram.

Activity 2B
workflow diagram for activity 2B as described.

Part 3. Synchronize caDSR Concepts with EVS

The purpose of this effort is to continually synchronize the attributes for the concepts loaded in caDSR with their equivalent records in EVS.

We assume that EVS is correct and caDSR has possible errors, i.e. EVS Vocabularies are the masters and caDSR is a subordinate copy.

Validation is done by the Sentinel daily Auto Run Report Generation.

Results are included in a section of the Audit Report.

Validate caDSR Concept attributes

Activity 3A -- Generate Discrepancy reports for Concept Attributes

Concept Null Attribute Report

For each non-retired caDSR concept, find out if all the essential attributes ("preferred_name", "definition", "origin", "evs_source" and "def_source") are not null:

  1. Yes - Good
  2. No - Create a Concept Null Attribute Report

Concept Workflow Status Discrepancy Report

For each non-retired caDSR concept, find out if the concept status is "ACTIVE" in EVS:

  1. Yes - Good (end)
  2. No - Generate Concept Workflow Status Discrepancy Report
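
The two checks in Activity 3A could be expressed along these lines. The concept is modelled as a dictionary keyed by the attribute names quoted above. Treating an empty string as null is an assumption, and the concept code and values are made up.

```python
ESSENTIAL_ATTRIBUTES = (
    "preferred_name", "definition", "origin", "evs_source", "def_source",
)


def null_attributes(concept):
    """Return the essential attributes that are missing on a concept.

    Empty strings are treated as null here, which is an assumption.
    """
    return [name for name in ESSENTIAL_ATTRIBUTES
            if concept.get(name) in (None, "")]


def status_discrepancy(evs_status):
    """True when a non-retired caDSR concept is not ACTIVE in EVS."""
    return evs_status != "ACTIVE"


concept = {
    "preferred_name": "C00001",
    "definition": None,
    "origin": "NCI Thesaurus",
    "evs_source": "",
    "def_source": "NCI",
}
print(null_attributes(concept))
print(status_discrepancy("ACTIVE"))
```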

Activity 3B -- Generate Discrepancy reports for Concept Names/Definition

Concept Name Discrepancy Report

For each non-retired caDSR concept, find out if the concept name is the same (case insensitive comparison) in EVS:

  1. Yes - Good
  2. No - Does a synonym comparison match it with EVS?
    1. Yes - Good
    2. No - Generate Concept Name Discrepancy Report

Concept Definition Discrepancy Report

For each non-retired caDSR concept, find out if the concept definition (preferred_definition) is the same (case insensitive comparison) in EVS:

  1. Yes - Good
  2. No - Does an alternate definition comparison match it with EVS?
    1. Yes - Good
    2. No - Generate Concept Definition Discrepancy Report
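
Both 3B reports follow the same two-stage comparison: first against the EVS preferred value, then against the EVS alternates (synonyms for names, alternate definitions for definitions). A minimal sketch, assuming the values are plain strings and that trimming surrounding whitespace before comparing is acceptable:

```python
def normalize(text):
    """Case-insensitive form; whitespace trimming is an added assumption."""
    return text.strip().casefold()


def matches_evs(cadsr_value, evs_preferred, evs_alternates=()):
    """True if the caDSR value equals the EVS preferred value or an alternate."""
    target = normalize(cadsr_value)
    if target == normalize(evs_preferred):
        return True
    return any(target == normalize(alt) for alt in evs_alternates)


# Made-up values: a discrepancy is reported only when neither stage matches.
print(matches_evs("Heart", "HEART"))
print(matches_evs("Cardiac Organ", "Heart", ["Cardiac organ"]))
print(matches_evs("Lung", "Heart", ["Cardiac organ"]))
```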

Reconciliation Questions

i) What should we do if we find that a concept has been retired in EVS?
ii) What should we do if we find that a concept has been changed in EVS (long name/ definitions may not match)?
iii) Should there be a monthly batch run for this or should there be a monthly report showing the content managers the changes?

Workflow Diagrams

Refer to the workflow diagrams.

Activity 3A
workflow diagram for activity 3A as described.

Activity 3B
workflow diagram for activity 3B as described.

Validate caDSR Concept content

Activity 3C -- Generate Discrepancy reports for Concept Content

Retired Concept Content Report

For each non-retired Administered Component or AC (only OC, PROP, REP, CD, VM, VD, CS and CSI) related to a Concept derivation rule, find out if the CONDR is made up of RELEASED concept(s).

  1. Yes - Good
  2. No - Create Retired Concept Content Report
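
The Retired Concept Content check can be sketched as below. The data layout (ACs with a `condr_id`, a mapping from CONDR to its concept codes, and a mapping from concept code to workflow status) is an assumption for illustration, as are the identifiers and codes.

```python
def retired_concept_content(acs, condr_concepts, concept_status):
    """Return report rows for ACs whose CONDR includes a non-RELEASED concept.

    acs            -- non-retired ACs, each with "id", "type" and "condr_id"
    condr_concepts -- dict: CONDR id -> list of concept codes in the rule
    concept_status -- dict: concept code -> workflow status
    """
    rows = []
    for ac in acs:
        codes = condr_concepts[ac["condr_id"]]
        not_released = [code for code in codes
                        if concept_status.get(code) != "RELEASED"]
        if not_released:
            rows.append({"ac_id": ac["id"], "type": ac["type"],
                         "non_released_concepts": not_released})
    return rows


acs = [
    {"id": "AC1", "type": "OC", "condr_id": "R1"},
    {"id": "AC2", "type": "VD", "condr_id": "R2"},
]
condr_concepts = {"R1": ["C1", "C2"], "R2": ["C1"]}
concept_status = {"C1": "RELEASED", "C2": "RETIRED ARCHIVED"}
report = retired_concept_content(acs, condr_concepts, concept_status)
print(report)
```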

Concept Content Long Name Discrepancy Report

For each non-retired Administered Component or AC (only OC, PROP, REP) related to a Concept derivation rule, find out if the AC long name is the same as the names of the composite referenced concepts that it is mapped to.

  1. Yes - Good
  2. No - Generate Concept Content Long Name Discrepancy Report

There is a need to determine which Administered Component Type should be included in this report, based on the fact that the name attribute of the AC type can or cannot be manually curated.

Concept Content Definition Discrepancy Report

For each non-retired Administered Component or AC (only OC, PROP, REP) related to a Concept derivation rule, find out if the AC definition is the same as the definitions of the composite referenced concepts that it is mapped to.

  1. Yes - Good
  2. No - Generate Concept Content Definition Discrepancy Report

There is a need to determine which Administered Component Type should be included in this report, based on the fact that the definition attribute of the AC type can or cannot be manually curated.

Workflow Diagrams

Refer to the workflow diagram.

Activity 3C
workflow diagram for activity 3C as described.
