1. Object Model Considerations
1.1 Map Set
- Connects two versioned coding schemes
- Is versioned itself.
- Should be named in a descriptive fashion (e.g. MDR12 to SNOMEDCT_2010_07_31 Mappings).
- Significant amount of metadata exists at this level
- Characterize map sets (complexity, completeness, content domain, “officialness”, ..)
- Consider metadata to track whether mappings are curated or not.
- Consider metadata to track whether mappings are generated (and not reviewed) or not.
- Map set metadata should indicate whether map rank is important, useful, or even present.
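The metadata items above could be captured in a small record like the following. This is only a sketch: `MapSetMetadata` and all of its field names are hypothetical and are not part of the LexEVS model, the split of the example name into scheme and version is an assumption, and the map set version "1.0" is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MapSetMetadata:
    """Hypothetical descriptor for one versioned map set (not a LexEVS class)."""
    name: str                     # descriptive name of the map set
    version: str                  # version of the map set itself
    from_scheme: str              # "from" coding scheme
    from_version: str
    to_scheme: str                # "to" coding scheme
    to_version: str
    content_domain: Optional[str] = None
    official: bool = False        # "officialness"
    curated: bool = False         # mappings were curated
    generated_unreviewed: bool = False  # generated and not reviewed
    has_map_rank: bool = False    # map rank is present
    map_rank_meaningful: bool = False   # map rank is important or useful

    def __post_init__(self):
        # A rank cannot be important if the map set carries no rank at all.
        if self.map_rank_meaningful and not self.has_map_rank:
            raise ValueError("map rank cannot be meaningful if it is not present")


example = MapSetMetadata(
    name="MDR12 to SNOMEDCT_2010_07_31 Mappings",
    version="1.0",  # placeholder value
    from_scheme="MDR", from_version="12",
    to_scheme="SNOMEDCT", to_version="2010_07_31",
    curated=True,
)
```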
1.2 Group of Mappings (a mapping subset, or search results, or a page of listings)
This concept matters most for complex rule-based map sets, where a collection of mapping entries may actually represent only a single mapping.
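One way to picture a group is sketched below, assuming that the entries of a group share a "from" code and carry an explicit order. `MapEntry`, `priority`, `rule`, and the codes are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    from_code: str
    to_code: str
    priority: int  # hypothetical position of the entry within its group
    rule: str      # hypothetical rule text; "" means no condition


def group_entries(entries):
    """Collect entries that share a "from" code into one ordered group."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.from_code].append(entry)
    return {code: sorted(es, key=lambda e: e.priority)
            for code, es in groups.items()}


entries = [
    MapEntry("F1", "T2", 2, ""),
    MapEntry("F1", "T1", 1, "age < 18"),
    MapEntry("F2", "T3", 1, ""),
]
groups = group_entries(entries)
```

Here the two "F1" entries form one group that would be presented, and eventually resolved, as a single mapping.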
1.3 Mapping
- "from" and "to" codes (or expressions of codes)
- Represented as an association with qualifiers for other semantically important information
- If target is an expression, represented as an “association to data”
- Have “default preferred name” in the event that a "from"/"to" code cannot be resolved in a loaded coding scheme
- "map rank" may be a standard part of the model. If so, values should be normalized, so that 1 always means "best" and increasing numbers represent decreasing quality (exactly how much lower and why is map set dependent). A "map rank" threshold can be chosen by an application, so that it pays attention only to the "highest quality" mappings as defined by that application. Because rank values are specific to the map set and the threshold is specific to the application, the algorithms and decisions used depend on the use case.
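A minimal sketch of rank normalization and an application-chosen threshold follows. The function names are hypothetical, and two choices are assumptions rather than decisions made on this page: raw values are converted to dense ranks, and unranked mappings are dropped by the filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    from_code: str
    to_code: str
    map_rank: Optional[int] = None  # 1 = best; None = no rank available


def normalize_ranks(raw_values, higher_is_better):
    """Convert raw scores to dense ranks so that 1 always means "best"."""
    ordered = sorted(set(raw_values), reverse=higher_is_better)
    rank_of = {value: i + 1 for i, value in enumerate(ordered)}
    return [rank_of[v] for v in raw_values]


def within_rank(mappings, max_rank):
    """Keep mappings whose normalized rank is at or below the threshold."""
    return [m for m in mappings
            if m.map_rank is not None and m.map_rank <= max_rank]


ranks = normalize_ranks([0.9, 0.5, 0.9], higher_is_better=True)
mappings = [Mapping("F1", "T1", ranks[0]),
            Mapping("F1", "T2", ranks[1]),
            Mapping("F2", "T3", ranks[2])]
best_only = within_rank(mappings, max_rank=1)
```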
1.4 Mapping attributes
Any information about a mapping other than what directly fits the “association” object model will be rendered as attributes. Ideally these attributes will have standard names across the different map set loaders. In other words, we can define the semantics of mapping association qualifiers so that a particular qualifier always represents the same aspect of mapping semantics.
2. Searching Scenarios
2.1 By name
- Find mappings where the "from" code has a name matching a search string
- Find mappings where the "to" code has a name matching a search string
- Find mappings where the "from" code has a name matching a search string AND the target concept has a name matching a different string
2.2 By code
- Same scenarios as by name, but matching on codes instead of names
2.3 Restrictions
- Restrict by vocabulary ("from" vocabulary, "to" vocabulary, or map set vocabulary)
- Restrict by “current” – only retrieve mappings whose endpoints can be resolved in current versions of terminologies
- Restrict by cardinality – e.g. only retrieve 1-1 mappings (that meet other criteria) - (1-1, n-1, 1-n, n-n)
- Restrict by association type – e.g. only retrieve “synonymous” mappings
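The cardinality restriction needs a working definition. The sketch below uses one possible definition, which is an assumption: a mapping is labeled by how many distinct "to" codes its "from" code has and how many distinct "from" codes its "to" code has within the map set. The function name and codes are hypothetical.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Label each (from_code, to_code) pair as 1-1, 1-n, n-1, or n-n."""
    targets = defaultdict(set)  # from_code -> distinct to_codes
    sources = defaultdict(set)  # to_code -> distinct from_codes
    for f, t in pairs:
        targets[f].add(t)
        sources[t].add(f)

    labels = {}
    for f, t in pairs:
        one_target = len(targets[f]) == 1
        one_source = len(sources[t]) == 1
        if one_target and one_source:
            labels[(f, t)] = "1-1"
        elif one_source:
            labels[(f, t)] = "1-n"  # the "from" code fans out
        elif one_target:
            labels[(f, t)] = "n-1"  # several "from" codes share the target
        else:
            labels[(f, t)] = "n-n"
    return labels


pairs = [("A", "X"), ("B", "Y"), ("B", "Z"), ("C", "W"), ("D", "W")]
labels = classify_cardinality(pairs)
one_to_one = [p for p in pairs if labels[p] == "1-1"]
```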
2.4 Default Preferred Names
- In some cases older versions of map sets will be loaded that contain "to" or "from" codes that cannot be resolved among the versions of loaded vocabularies
- In this case, a "default preferred name" will be available so that browsers can still effectively visualize the data.
- This default name could be indexed so that searches retrieve these results, even though the codes no longer exist
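The fallback described above can be sketched as follows. The dictionary keys, function names, codes, and names are all hypothetical; `loaded_names` stands in for whatever lookup the loaded coding scheme provides.

```python
def display_name(code, loaded_names, default_name):
    """Use the loaded coding scheme's name if the code resolves, else the default."""
    return loaded_names.get(code, default_name)


def find_by_from_name(mappings, loaded_names, text):
    """Find mappings whose "from" name (resolved or default) contains the text."""
    needle = text.lower()
    hits = []
    for m in mappings:
        name = display_name(m["from_code"], loaded_names, m["from_default_name"])
        if needle in name.lower():
            hits.append(m)
    return hits


loaded_names = {"C1": "Fever"}  # "C9" is not in the loaded version
mappings = [
    {"from_code": "C1", "to_code": "T1", "from_default_name": "Pyrexia"},
    {"from_code": "C9", "to_code": "T2", "from_default_name": "Retired fever code"},
]
hits = find_by_from_name(mappings, loaded_names, "fever")
```

Because the default name is searched when the code cannot be resolved, the mapping for "C9" is still retrieved.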
3. Browsing/Discovery Scenarios
3.1 Grouping, Categorization
- How authoritative is it?
- Is it actively maintained?
- Does it connect “current version” terminologies loaded into LexEVS?
- How important or heavily used is it?
- What “kind” of mapping set is it (i.e. what is it for)?
- Group by “from” terminology or “to” terminology
3.2 Selections
- Pre-select certain map sets for searching based on well-defined criteria
3.3 Misc
- When searching across multiple map sets, consider options like grouping all mappings with the same “from” terminology together to make the cognitive task easier.
- Support ability to identify map sets that are “to” or “from” a particular terminology
- E.g. get all map sets where the “from” coding scheme is NCI2010_07
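A sketch of these lookups over a plain list of map set descriptors is shown below. Only the scheme name NCI2010_07 comes from the example above; the record layout, function names, and other scheme names are hypothetical placeholders.

```python
from collections import defaultdict

map_sets = [
    {"name": "Map set A", "from": "NCI2010_07", "to": "SCHEME_X"},
    {"name": "Map set B", "from": "SCHEME_Y", "to": "NCI2010_07"},
    {"name": "Map set C", "from": "NCI2010_07", "to": "SCHEME_Z"},
]


def map_sets_from(map_sets, scheme):
    """Map sets whose "from" coding scheme is the given one."""
    return [ms for ms in map_sets if ms["from"] == scheme]


def map_sets_to(map_sets, scheme):
    """Map sets whose "to" coding scheme is the given one."""
    return [ms for ms in map_sets if ms["to"] == scheme]


def group_by_from(map_sets):
    """Group map set names by their "from" terminology for display."""
    groups = defaultdict(list)
    for ms in map_sets:
        groups[ms["from"]].append(ms["name"])
    return dict(groups)


from_nci = map_sets_from(map_sets, "NCI2010_07")
grouped = group_by_from(map_sets)
```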
4. Presentation Scenarios
4.1 Views
- Map Set view (including all known metadata)
- Map Group view (where needed) – e.g. one “subset” in a complex rule-based mapping.
- Mappings view (sortable table, paging capabilities).
- Standard column headers come from LexEVS view.
- Need to consider “extended” columns that may not apply to all map sets (e.g. MAP_RANK, or other MRMAP fields like MAPPRIORITY, etc)
- Individual mapping view – link "from" and "to" codes to the concept pages for those things – may not need this
- “Mappings” tab on concept view – render all mappings “to” or “from” that concept
- need to determine which map sets to search in
- retrieve this info only when user actually clicks on this tab
4.2 Other Considerations
- Make good decisions about when clicking should open a new window vs. reload in the same page.
- Support a “view all mappings” function for simple browsing of small sets (without requiring search).
- Handling expression-based mappings (may need to parse – can do based on grammar or “style”).
- Consider when a new page should open and when content should be reloaded in the current tab/browser window.
5. Loader Considerations
- Need to specify map set metadata (switches or prefs file)
- Need to indicate end point coding schemes for “from” and “to” codes.
- If we know they cannot be resolved, placeholder names for endpoints should be provided and loaded.
- WARNING: be careful when using Excel to export to CSV, because it can convert code values such as “078.12” to “78.12”.
- Consider loading a characterization of the “type” or “category” of a map set for organization in a list and pre-selection (see presentation)
- Consider how many map sets to load into a single coding scheme, and what criteria would be used to make that decision.
- Handle cases where we know the "from"/"to" codes cannot be resolved due to different versions and load default preferred names.
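The leading-zero problem in the warning above comes from treating codes as numbers. The sketch below shows that Python's csv module returns every field as a string, so a code survives as long as nothing converts it to a number. The column names and sample row are made up for illustration.

```python
import csv
import io

# Stand-in for a CSV file whose codes were exported as text.
sample = io.StringIO('from_code,to_code\n"078.12","A1"\n')

rows = list(csv.DictReader(sample))
code = rows[0]["from_code"]    # csv fields are strings, so the zero is kept

# What a numeric conversion does to the same value:
as_number = str(float(code))
```

A loader could compare the two forms and flag rows where they differ as a cheap check that codes were never passed through a numeric column.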
6. Obtaining or Generating Data
6.1 Obtaining
- MRMAP data from UMLS (e.g. SNOMEDCT-ICD9CM)
- CTCAE 3->4 mappings
- Go/BiomedGT Mappings
- Identify major mapping efforts, start gathering data – formats can drive further loader considerations
6.2 Generating Mappings
- Data sets created as part of NCI-META export (e.g. PDQ-NCIt)
- Map sets generated as a special project (e.g. NCIt Neoplasms to SNOMEDCT)
- On-demand generation of mappings from an integrated terminology resource (like UMLS or NCI-META).
- Criteria-based candidate selection
- Criteria-based ranking and filtering
- See report (posted to gforge)
- API performance considerations.
7. Maintenance and Legacy Scenarios
It may be important to know which map sets have mappings whose "from" or "to" codes will not resolve.
Managing versions:
- “from” coding scheme,
- “to” coding scheme
- Map set coding scheme itself
Learn what we need to from various mapping maintenance environments about authoring, data models, visualization, etc.
- IHTSDO stand-alone tool (and eventual workbench)
- CogZ (Protégé tool)
- OBO-Edit
- Ad-hoc data manipulation
6 Comments
Safran, Tracy (NIH/NCI) [C]
Please add commentary to this page in this comments section.
Unknown User (solbriha)
The CTS2 model currently includes both an abstract mapping, which has a description and other metadata and one or more mapping versions. Abstract mappings would describe something like "MedDRA to SNOMED-CT" and would identify who did the mapping, what it was for, etc.
A mapping version would be a versioned instance of an abstract mapping and would have a from and to code system version, both of which have to be versions of the from and to code systems in the abstract mapping. It would have additional metadata that would be specific to the version.
We would definitely want to come up with a set of standard metadata about both of these resources along with what information is required and what is optional. Note, however, that a goodly amount of the metadata described in 1.1 is not mapping specific (review cycles, curated, "officialness", etc.) and we may want to see whether there isn't already an ontology out there that could be used for this purpose - at least for assigning names and tags. We need to look at BioPortal, OMV and other ontology markup resources to decide.
Question: Are mappings always going to be 1..1 source to target? I would argue against n..m, but 1..n may be a viable option. If so, might we want to make the target 1..n in both the abstract mapping and the concrete list.
1.2 (Group of Mappings) - would there ever be more than one source concept in a group? If so, can you provide an example? I know that RRF provides for the notion of post-coordinated sources, but is that a level of complexity we would want to address here? Can we define the purpose and role of "map group" a little more completely?
1.3. Does "from" need to be an expression? Also, when it comes to "expression", are we talking post-coordinated expressions or something else? Agree on the "default preferred name" - we need to copy a name across as a "hyperlink anchor" and then link to a coding scheme if possible. Interesting question: when should the "preferred name" be updated?
3.2 - Need more detail, not sure what this is calling for.
Safran, Tracy (NIH/NCI) [C]
Is this abstract mapping implemented in LexBIG 6.0? If so, would it consist exclusively of metadata? I am not quite sure how this would fit into what we've already been shown.
Safran, Tracy (NIH/NCI) [C]
For 1.2, one Group mapping I could think of is to group mappings from Snomed to multiple other vocabularies as a single mapping. We would still want to maintain the individual mappings for users that only want to see, say Snomed to ICD10 and not any others. Is this what you mean?
Carlsen, Brian (NIH/NCI) [C]
I think the key metadata about a map set (whether versioned or abstract) are these things: the "to" and "from" coding schemes the map set connects, the use-case or purpose of the map set, the complexity of the ends of the mappings (individual codes or expressions), the cardinality (1:1, 1:n, n:1), and whether the map set is exhaustive on either end (e.g. whether every code in the "to" coding scheme has a mapping in this map set). That last one is computable, but it's better if it doesn't have to be computed. Some of this information is associated with the abstract or unversioned map set and some with the instance or versioned map set. In the conceptual model suggested above, "mappings" themselves only belong to versioned map sets.
The other kinds of metadata suggested above are really for usability, organization, and change management. I believe it is worth endeavoring to capture as much information about map sets as possible to assist users in what will inevitably be a very complex decision (about which ones to use). Having a formal model of all the metadata we capture is ideal.
All of this metadata should be handled by the loader - either derived from the underlying file format or passed in via command line switches or preferences files.
Now, I want to answer Harold's questions from above:
>Question: Are mappings always going to be 1..1 source to target? I would argue against n..m, but 1..n may be a viable option. If so, might we want to make the target 1..n in both the abstract mapping and the concrete list.
Answer: Definitely not always 1:1. There are certainly map sets with mappings where a "from" code maps to multiple "to" codes and cases where multiple "from" codes map to the same "to" code. And there is no realistic expectation that n..m cases won't also exist.
> Q: 1.2 (Group of Mappings) - would there ever be more than one source concept in a group? If so, can you provide an example? I know that RRF provides for the notion of post-coordinated sources, but is that a level of complexity we would want to address here? Can we define the purpose and role of "map group" a little more completely?
Answer: The group of mappings is used in rule-based map sets (like the SNOMEDCT to ICD10 reimbursement map being developed by IHTSDO). In this case, there are multiple mappings with the same "from" code, each of which has a rule that leads to a different "to" code. In these cases, the entries are ordered, so that the group of mappings actually represents a single compound mapping with an ordered set of rules (to be resolved based on some kind of external data). See http://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd9cm_reimburse.html as an example.
> Q: 1.3. Does "from" need to be an expression? Also, when it comes to "expression", are we talking post-coordinated expressions or something else? Agree on the "default preferred name" - we need to copy a name across as a "hyperlink anchor" and then link to a coding scheme if possible. Interesting question: when should the "preferred name" be updated?
Answer: I do not believe we have any cases of map sets yet where "from" is an expression. I do not believe the LexEVS model can support this anyway. The purpose of the "default preferred name" is to guarantee that there is always a label that we can show, regardless of whether a "from" or "to" code or expression can be fully or accurately resolved.
> Q: 3.2 - Need more detail, not sure what this is calling for.
Answer: This is actually about how the term browser should behave. When presenting the user with the list of available map sets to search in, some should be pre-selected and others not. It's a browser usability issue.
Unknown User (wynner)
Wow, it's been a while since this has been added to here on the wiki! Tracy and I have just been shown how to load MRMAP files properly into 6.0. All looks perfectly doable. There were some questions about 1) versioning and 2) QA.
Versioning-
To me, when resolving a mapping, you are pulling in a set of codes from both source and target terminologies. Whether or not these codes exist is curator oversight (on both sides: the map, and source/target terminology). On the side of the terminology, we believe an active and inactive status will allow proper detection for historical reference. But, there is no guarantee (that I'm aware of) that miscellaneous MRMAP CUI values will no longer correspond to those source/target terminologies that are also in Meta. A standalone source might not always be loaded from RRF, and consequently could be different from varied counts, etc...
QA-
For loaded mappings, we have no established QA practice. Might Alameda be adding this to our data verification scripts that are run against the database? This would be the 'lowest hanging fruit'. Maybe there are other ideas I haven't yet thought of.