3-Using the Semantic Integration Workbench
What is the SIW?
The Semantic Integration Workbench (SIW) is a tool to help UML model owners work through the semantic annotation process, and to remove (whenever possible) the need for users to understand the more complicated details of semantically integrating their model with other models registered in the caDSR.
The SIW essentially helps UML model owners bring the metadata for their models into line with the models used by other organizations. To do this, the SIW:
- Automates searching for and (where possible) matching of UML elements to items already resident in a controlled vocabulary (EVS).
- Allows users to create and/or edit element description tags in the SIW interface rather than having to update the model separately and re-export.
- Streamlines semantic annotation by offering direct queries to the NCI Thesaurus, inserting the concept information from the search into the user's file in the SIW, and creating the appropriate tag names automatically. This eliminates syntax errors in the final XMI file.
- Allows for curation of the XMI file, which maps existing EVS concepts, or new concepts not yet in the NCI Thesaurus, to classes and attributes. (Curation is performed by CBIIT personnel.)
- Allows for mapping of attributes to existing caDSR data elements, either automatically through the Roundtrip step or manually through the SIW viewer.
- Allows review of the final XMI file before it is loaded, providing for individual review and acceptance of each entry.
- Ensures that files submitted for loading into the caDSR have been checked for missing information and validated using Silver Level Compatibility rules.
The SIW performs the association of the model's elements using an XMI file exported/saved from the UML modeling program used to create the model. The mapping of the elements in an XMI file to EVS concepts is called "annotation."
To be sure we're clear on these terms, an Unannotated XMI file is one that has been exported from the UML modeling software but has not yet had all of its elements mapped to either EVS concepts or caDSR CDEs. An Annotated XMI file is one where all of the elements in the model have been mapped to EVS concepts or to caDSR CDEs. The elements in the file have been annotated with this additional information.
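The distinction between the two file states can be sketched in a few lines of Python. This is only an illustration of the definition above, not how the SIW itself works: the ModelElement class, its field names, and the identifiers in the example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelElement:
    """Hypothetical stand-in for a class or attribute read from an XMI file."""
    name: str
    evs_concepts: List[str] = field(default_factory=list)  # mapped EVS concept identifiers
    cde_id: Optional[str] = None  # identifier of a mapped caDSR CDE, if any


def is_mapped(element: ModelElement) -> bool:
    """An element counts as mapped if it has an EVS concept or a caDSR CDE."""
    return bool(element.evs_concepts) or element.cde_id is not None


def unmapped_elements(elements: List[ModelElement]) -> List[str]:
    """Names of the elements that still need annotation."""
    return [e.name for e in elements if not is_mapped(e)]


def is_annotated(elements: List[ModelElement]) -> bool:
    """A file is 'annotated' only when no element is left unmapped."""
    return not unmapped_elements(elements)


# Made-up example model: two mapped elements and one unmapped attribute.
model = [
    ModelElement("Patient", evs_concepts=["concept-A"]),
    ModelElement("Patient.id", cde_id="cde-1"),
    ModelElement("Patient.birthDate"),
]
print(unmapped_elements(model))  # ['Patient.birthDate']
print(is_annotated(model))       # False
```

A single unmapped element is enough to keep the file in the unannotated state.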
Why should I use the SIW?
While you can manually annotate the UML model to achieve semantic integration (by adding all of the necessary tagged values yourself), using the SIW is the preferred method. The SIW streamlines and partially automates the process of semantic integration, as well as the required detailed review and approval of each element before a model can be registered into caDSR. Manual annotation has the potential to introduce human errors that can cripple metadata registration and the UML Loader.
Once the annotated version of the model has been approved by the model owner and the EVS curation team, the XMI file is sent to the caDSR team to be transformed into caDSR metadata via the UML Loader.
What is the UML Loader?
The UML Loader or UML Model Loader is a Java application that transforms UML domain models into caDSR metadata, reusing existing caDSR administered items or creating new ones as needed. This process is also referred to as registering the UML model in caDSR.
The UML Loader transforms the UML model information from an annotated XMI file into caDSR metadata while also mapping the registered metadata back to the UML model that uses it.
This metadata registration into the caDSR is performed by a team at CBIIT, working with model owners to be sure that the information from their model is registered properly.
The EVS concept annotations contained in the file are the basis for determining whether each of the UML elements can be represented using existing caDSR components or if new ones must be created. Specifically, the UML Loader transforms UML model attributes and classes, including inheritance and association links, into caDSR metadata. So for every element in the UML model, a corresponding metadata component (also referred to as an administered item) in caDSR is either mapped or created (and then mapped). The "mapping" of caDSR components involves adding information about the project and the model to the component information in caDSR.
Overall Workflow for Semantic Integration
The semantic integration process consists not only of annotating an XMI file and mapping the model elements to other pre-existing elements, but also of adding the appropriate items into the UML model before exporting the XMI file, and then of reloading the mapped items from the annotated XMI file back into the UML model for use in the next version of the system.
Through the SIW and the UML Model Loader, semantic integration is divided into the following phases:
Phase One - The model owner creates the necessary tagged values for the model elements, and if necessary, creates any local value domains to be used by the model (local value domains are optional). When the model is complete, the model owner creates an XMI file of the UML model (either by exporting from Enterprise Architect, which creates an .xmi file, or by saving from ArgoUML, which creates a .uml file).
Phase Two - Using the capabilities of the SIW, the elements in the XMI file are semantically annotated using NCI Thesaurus concepts in an iterative process between the model owner and the EVS curation team. Once the XMI file is fully annotated, the model owner reviews the file and, if accepted, submits the file, along with a submission form, to the CBIIT caDSR team to have the model loaded into caDSR.
Phase Three - The caDSR team takes the file and the submission request form information and loads the UML model to the caDSR Sandbox using the UML Loader. If the model loads successfully, the model owner reviews and approves the model for loading to caDSR Production. Once loaded to Production, the model owner reviews it again and requests any necessary changes to the metadata.
After curation, the registered model is sent through compatibility review and, once approved, is released to the public.
Phase Four - After the model has been loaded to caDSR and released, the model owner can use the Roundtrip mode of the SIW (described below) to bind the now registered caDSR metadata to the model elements in the XMI file. The model owner can then import all of the XMI file annotations back into the UML model (they are applied to the model as tagged values for each element). This readies the model for reuse or for the next version of the system.
Once you have a fully annotated XMI file, you can use the caCORE SDK Code Generator to produce the final public APIs for your system.
Does the SIW also add GME Tags?
Yes, it does, although your model does not have to contain GME tagged values in order to be considered semantically annotated. The SIW can add GME annotations to your model, but those SIW capabilities are treated separately from the other features of the SIW.
Without going too deeply into the topic of GMEs, there are a few things you should understand about the following two terms: XSD and GME tagged values.
What is XSD?
XSD stands for XML Schema Definition. XSD is an XML language used to express a set of rules to which XML documents must conform in order to be considered "valid." So when someone refers to an XSD, they mean an XML schema document that describes or defines a specific schema.
XSDs are important for caGrid. A caBIG-compatible UML model indexed in caGrid is known as a "Grid Data Service." Grid services exchange data using XML. The format of the XML exchanged is described in an XSD document. The XSD document is stored in a caGrid component called GME (Global Model Exchange). caGrid allows data services to be created from any model registered in caDSR. Registering GME metadata (along with the other metadata used by your model) allows for the automatic retrieval of XML schema definitions.
What are GME tagged values?
The GME information resides in your model as tagged values. GME tagged values are intended to capture the mapping between the UML model elements and the XML schemas that define the XML that will be used to exchange data represented by your model.
So if you are loading your model to caGrid, having the proper GME annotations in your model is important. Here we are using the term "GME annotation" to refer to both the GME tagged values and the process of including them in your model.
GME namespaces are represented as URIs. If you use the SIW to create default GME tags, the URI provided is automatically derived by the SIW. The important thing about this URI is that it gives caGrid enough information to look up the relevant portion of the identified XSD for any component in a model. The GME namespace URI is saved in caDSR as an "Alternate Name" for the model element. The alternate name added into caDSR for the element will have a type that specifies that it is a GME namespace tag.
When you view CDE information in the CDE Browser, you can see these alternate names and their types in the Alternate Name information section of the Data Element Details tab.
As stated above, inclusion of the GME annotations is not required for semantic integration of your model. The SIW will run just fine with or without the GME tags, and the UML Loader will still register models that do not have GME tags. Having the GME annotations in caDSR matters because it lets caGrid access this information directly from caDSR rather than first having to find the appropriate schema.
GME tagged values can be added either through the SIW or using caAdapter Global Model Exchange Version 4.2. The difference between using SIW or caAdapter GME v4.2 for generating the GME tagged values is as follows: use caAdapter GME v4.2 if you want to specify the URL of the XSD yourself; use SIW if you want default URLs generated for you.
To see the names of the GME tags and to learn a little more about the details of these values, refer to the XMI Tag Reference wiki page. Additional information regarding the use and loading of GME tagged values is also available on the GME Design wiki page.
The SIW/UML Model Loader Guide also contains additional information and sample scenarios for using the SIW to generate GME namespace tags for your model.
The initial Welcome screen of the SIW lists the different options provided by the program to help you semantically integrate your model. Each of these options is described briefly here. However, as with all of the caCORE tools, you should refer to the guide for more details.
Keep in mind that the input for each of these options is an XMI file that represents the UML model of your system. Furthermore, the state of the XMI file that is input and output from each option will vary, depending on where you are in the semantic integration process.
Review Unannotated XMI File
This is the first step of the SIW and should be performed with all newly exported XMI files before running any other SIW options. It allows you (the model owner) to view the XMI representation of your UML model. It provides an easy way to check the model for missing object definition tags or other problems with the file that will require changes before continuing. The errors you will see most often are those indicating that elements in your model do not have descriptions associated with them. The element descriptions that you provide are a key part of making sure that your model elements are mapped to the appropriate EVS concepts.
If you provided description tags in the model, you will be able to see your descriptions in the SIW. However, if you need to add or edit these descriptions, the SIW provides a text box that allows you to enter element descriptions through the SIW rather than having to return to the model to add the appropriate tags.
If you add or change these descriptions through SIW, you can save the XMI file from SIW and import it back into your modeling tool.
Perform XMI Roundtrip
This automated step annotates the elements in your XMI file that are based on a previously registered model. This is useful if you have based parts (or all) of your model on another system's model, or if you have a previously registered version of your model that you have updated for a new version of the system. In these instances, the Roundtrip option can save you a considerable amount of time by automatically mapping your XMI file's attributes to the caDSR common data elements (CDEs) used by the registered model you identify for the Roundtrip task. The automated matching works only where your model uses exactly the same attribute names as the model you identify as the comparison/target for this step.
This step also lets you choose to have the SIW automatically insert GME namespace tags for the reused elements in your model that point to the existing XSD schema for those elements. If the GME alternate names do not exist in caDSR, or you choose not to insert the GME namespace tags, the XMI Roundtrip mode will not insert any of these tags into the input file.
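The exact-name rule for Roundtrip attribute matching can be illustrated with a short Python sketch. It is not the SIW's implementation: the function name, the "Class.attribute" naming, the CDE identifiers, and the assumption that the comparison is case-sensitive are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def roundtrip_match(
    model_attributes: List[str],
    registered_cdes: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Map attributes to CDEs of a registered model by exact name.

    registered_cdes maps an attribute name in the registered model to a
    (made-up) CDE identifier. Names must match character for character;
    treating case as significant is an assumption of this sketch.
    """
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for attribute in model_attributes:
        if attribute in registered_cdes:
            matched[attribute] = registered_cdes[attribute]
        else:
            unmatched.append(attribute)
    return matched, unmatched


# Made-up data: the third attribute differs only in capitalization.
registered = {"Patient.id": "cde-1", "Patient.birthDate": "cde-2"}
attributes = ["Patient.id", "Patient.birthDate", "Patient.birthdate"]

matched, unmatched = roundtrip_match(attributes, registered)
print(matched)    # {'Patient.id': 'cde-1', 'Patient.birthDate': 'cde-2'}
print(unmatched)  # ['Patient.birthdate']
```

Under these assumptions, any attribute you renamed since the registered version falls into the unmatched list and has to be annotated through the other SIW steps.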
Run Semantic Connector
For users who are trying to adapt a system to caBIG compatibility for the first time, this is the step you are looking for. The Semantic Connector step is the automated feature of the SIW that searches EVS for concepts to match up to elements in your model. This is where the modeling best practices discussed in the process overview come in. The EVS search uses the camel case naming convention to split your model element names into their component words. When matches are found, the SIW attaches one or more EVS concepts to each element. This process produces an annotated XMI file that is automatically saved to the same location as the original file, with "FirstPass" prefixed to the file name to differentiate it. You can then use the SIW to review the output file, see the EVS concepts that were mapped to your elements, make changes to these mappings, and submit the model to the EVS curation team.
Keep in mind that this step is not perfect. It is automated and there is no way to truly automate finding terms to represent what YOU mean in your model. The real benefit to this step is that it limits the terms that you need to go through to find a match for the elements in your model.
For example, suppose a model that contains an attribute called "id" is run through the Semantic Connector. The "FirstPass" XMI file that is output from the process has three different EVS concepts mapped to that attribute: identifier, Indonesia, ideology. The Semantic Connector has no way of knowing which of these is correct, so it selects all of the concepts it considers possible matches, allowing you to review these mappings and remove the ones that do not apply.
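To see why camel case naming matters for this step, here is a minimal Python sketch of splitting an element name into words and looking up candidate concepts for each word. It is an illustration only: the splitting rule, the function names, and the toy vocabulary are assumptions, and a real search would query EVS rather than a dictionary. The candidates listed for "id" are the ones from the example above.

```python
import re
from typing import Dict, List

# Toy stand-in for an EVS search. Hypothetical data, apart from the
# three candidates for "id", which come from the example in the text.
TOY_VOCABULARY: Dict[str, List[str]] = {
    "id": ["identifier", "Indonesia", "ideology"],
    "patient": ["Patient"],
}


def split_camel_case(name: str) -> List[str]:
    """Split a camel case name into lowercase words.

    The pattern matches, in order of preference: a run of capitals that is
    not followed by a lowercase letter (an acronym), a word with an optional
    leading capital, or a run of digits.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def candidate_concepts(name: str) -> Dict[str, List[str]]:
    """Return every candidate concept for each word in the element name."""
    return {word: TOY_VOCABULARY.get(word, []) for word in split_camel_case(name)}


print(split_camel_case("patientIdentifier"))  # ['patient', 'identifier']
print(split_camel_case("DNASequence"))        # ['dna', 'sequence']
print(candidate_concepts("patientId"))
# {'patient': ['Patient'], 'id': ['identifier', 'Indonesia', 'ideology']}
```

A name that does not follow the convention, such as "patientid", would come back as a single word with no candidates in this sketch, which is the kind of result the naming best practices are meant to avoid.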
Curate XMI File
This step is performed by the EVS concept curation team. During this step, the EVS team looks at the mappings added by the Semantic Connector and then, working with you, the model owner, helps annotate your model with EVS concepts. In the event that there are no EVS concepts that are appropriate to elements in your model, the curator can create new EVS concepts to bind to those model elements.
This is the step where the descriptions you provide for the elements in your model become very important. The curators cannot possibly be as familiar with your system and your model as you are; they need these descriptions in order to fully understand what the purpose and use of the elements are. This allows them to more accurately identify the proper EVS concepts to bind to your model elements.
Once the curation team has finished their pass at curating your model, they return it to you to review in the next step of the SIW (Review Annotated XMI File). These two steps are typically the most iterative, meaning that very often, the file is returned to the curation team after model owner review, and then back to the model owner after re-curation and verification.
Review Annotated XMI File
This step and the Curate XMI File step (above) are typically performed several times until all of the model elements are mapped properly. This step allows you (a model owner or model reviewer) to review element annotations, search EVS for concepts if necessary, and where needed, change the mapping between a class or attribute and its EVS concept(s). Skilled users may also choose to map annotated elements to existing caDSR CDEs and/or value domains.
This mode of the SIW also lets you run validation checks to ensure that the concept information in the XMI file corresponds with EVS concept information, or that any differences between the element descriptions in the XMI file and the mapped EVS definitions are appropriate and valid.
After curation and review, the SIW requires you to at least look at each element in the file, look at the mappings, and check a box to verify that the mapping is appropriate and approved by you. This is because while the curation team can assist and guide you through the process and do some of the mapping based on your element descriptions, you, the model owner, are ultimately responsible for the semantic integration of your model.
Once all of the element mappings have been verified, the file is considered to be semantically complete. The final outcome of this step is a fully annotated XMI file that can be used to register the model into caDSR. This fully annotated file can also be used as the input to the next version of the system/model.
Generate Default GME Tags
The Generate Default GME Tags step of the SIW creates GME namespace tagged values in the XMI file that contain URLs that identify the XSD schema for your model.
If you use SIW to create default GME tags, the SIW wizard will ask you for the Project Name, the Project Version, the Context where your model will reside, and the top level package where GME tags are to be added.
GME information can be added at the model, package, class, attribute and association level. Currently, the Generate Default GME Tags option adds tags at all of these levels. At the association level, however, only those roles that have names will be tagged. In other words, only the Source and Destination ends of an association that have been given role names will be tagged with GME namespace tags.
If you decide you don't want to use the defaults, then you can specify the actual URIs using caAdapter GME v4.2.
For additional details and information regarding GME annotations, see the GME Design wiki page.
Once a GME tag exists in the XMI file, SIW will not override it. If you wish to recreate GME namespace tags for your model, you should first use the GME Cleanup functionality. This removes all GME tags from the XMI file. This is useful for instances when an XMI file is fully annotated with GME tags, but you want to replace those tags, or let SIW regenerate default values.
Keep in mind that both the Roundtrip step and the Generate Default GME Tags step only create GME tags for those components that do not already have these tags. By using GME Cleanup prior to running either of these steps, you can guarantee that all of the components that need the new tags will have them added.
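The interaction between tag generation and GME Cleanup can be sketched as follows. This is a simplified Python illustration of the behavior described above, not the SIW's code: the tag name, the function names, and the URI format are placeholders invented for the example (see the XMI Tag Reference wiki page for the real tag names).

```python
from typing import Callable, Dict, List

# Placeholder tag name, hypothetical; not the name the SIW actually writes.
GME_TAG = "gme_namespace"

# Each element name maps to its tagged values (tag name -> value).
Elements = Dict[str, Dict[str, str]]


def generate_default_tags(elements: Elements, default_uri: Callable[[str], str]) -> List[str]:
    """Add a default tag only where none exists; return the names that were tagged."""
    tagged = []
    for name, tags in elements.items():
        if GME_TAG not in tags:  # an existing tag is never overridden
            tags[GME_TAG] = default_uri(name)
            tagged.append(name)
    return tagged


def gme_cleanup(elements: Elements) -> None:
    """Remove the GME tag from every element."""
    for tags in elements.values():
        tags.pop(GME_TAG, None)


def placeholder_uri(name: str) -> str:
    """Made-up URI format, for illustration only."""
    return "gme://example/" + name


model: Elements = {
    "Patient": {GME_TAG: "gme://example/custom"},  # already tagged
    "Patient.id": {},                              # not yet tagged
}

print(generate_default_tags(model, placeholder_uri))  # ['Patient.id']
print(model["Patient"][GME_TAG])                      # gme://example/custom

gme_cleanup(model)
print(generate_default_tags(model, placeholder_uri))  # ['Patient', 'Patient.id']
```

Running the cleanup first is what lets the second pass replace the custom value with a default one.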
All of the SIW options described above can be run on a full XMI file, or you may choose to run each step on only selected elements of the file. When you identify the file to parse, you may select the option to Choose Classes and Packages. Selecting this option adds a screen to the processing wizard that lists the Packages within the XMI file. Expanding a Package shows the classes contained within that package and lets you identify only those you want to process for that step.
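The effect of Choose Classes and Packages can be pictured as a simple filter over the packages in the file. The Python sketch below is only an illustration of that idea: the function name, the data layout, and the package and class names are hypothetical.

```python
from typing import Dict, Iterable, List


def select_classes(
    packages: Dict[str, List[str]],
    selected_packages: Iterable[str] = (),
    selected_classes: Iterable[str] = (),
) -> Dict[str, List[str]]:
    """Keep whole packages that were selected, plus individually selected classes.

    selected_classes holds qualified names of the form "package.Class".
    Packages with nothing selected are dropped from the result.
    """
    chosen_packages = set(selected_packages)
    chosen_classes = set(selected_classes)
    result: Dict[str, List[str]] = {}
    for package, classes in packages.items():
        if package in chosen_packages:
            kept = list(classes)
        else:
            kept = [c for c in classes if f"{package}.{c}" in chosen_classes]
        if kept:
            result[package] = kept
    return result


# Made-up package layout.
xmi_packages = {
    "domain": ["Patient", "Specimen"],
    "lookup": ["Gender", "Race"],
}

print(select_classes(xmi_packages, selected_packages=["domain"],
                     selected_classes=["lookup.Gender"]))
# {'domain': ['Patient', 'Specimen'], 'lookup': ['Gender']}
```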