NIH | National Cancer Institute | NCI Wiki  



LexEVS is an enterprise-wide terminology server. A fresh installation has no terminologies loaded. This documentation covers how to load most of the content types that LexEVS supports. LexEVS was built to accept a wide variety of input formats and meld them into a common form. This requires a variety of loaders, each used for a specific incoming format. The inputs are typically called terminologies or coding schemes.

LexEVS provides both an administrative GUI and loader commands for loading terminologies. While the LexEVS administrative GUI is very functional, a system administrator may prefer the command line interface because the command scripts can be adjusted to increase memory and tune other Java virtual machine settings, ensuring that loads of larger terminologies have adequate resources. For example, a user may select a loading script, open it in an editor, increase the Java heap size and PermGen memory to suit the machine's resources, and save the script before running it with the appropriate options on the command line. Still, the GUI can be convenient for loading smaller terminologies and, in many cases, works fine for loading moderately large terminologies like the NCI Thesaurus. Loading terminologies requires some knowledge of the source of the terminology.
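The script edit described above can also be automated. The following is a minimal sketch, assuming the loader script launches Java with an -Xmx<size> option on its command line. The set_heap helper is hypothetical and not part of LexEVS. PermGen options such as -XX:MaxPermSize only apply to Java 7 and earlier JVMs, so they are left out here.

```shell
#!/bin/sh
# Sketch: raise the JVM heap limit inside a loader script.
# Assumption: the script starts Java with an -Xmx<size> option.

# set_heap SCRIPT SIZE: rewrite every -Xmx<value> in SCRIPT to -Xmx<SIZE>.
set_heap() {
    script=$1
    size=$2
    tmp="$script.tmp.$$"
    # Edit into a temporary file first so a failed sed cannot truncate the
    # script, then copy the content back so file permissions are preserved.
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]\{0,1\}/-Xmx$size/g" "$script" > "$tmp" \
        && cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, set_heap ./LoadOWL.sh 8g would request an 8 GB heap. Back up the script first and pick a size that fits the machine's physical memory.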

Generic loading

Most terminology loads can be accomplished by pointing either the LexEVS loader commands or the LexEVS administrative GUI at the terminology source file and running the loader. Generic loading instructions are available for both the LexEVS administrative GUI and the LexEVS loader commands. For many sources you can use a variation of the following LexEVS command:

Linux

./LoadOWL.sh -in "file:///ontologies/owl/amino-acid.owl"

Windows

LoadOWL.bat -in "file:///ontologies/owl/amino-acid.owl"

This LexEVS loader command loads input in OWL format. To load most other terminologies, substitute the loader command that matches the source format and point it at a local source file. In the LexEVS administrative GUI, loading is accomplished using the "Load Terminology" menu. The administrative options must be enabled first in the Command menu.
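The -in argument in the commands above is a file:// URI rather than a plain path. As a small sketch, the hypothetical helper below (not part of LexEVS) turns an absolute local path into that form. It does not percent-encode spaces or other special characters, so it assumes simple paths.

```shell
#!/bin/sh
# to_file_uri PATH: print a file:// URI for an absolute local PATH.
# Relative paths are rejected because they cannot be mapped safely.
to_file_uri() {
    case $1 in
        /*) printf 'file://%s\n' "$1" ;;
        *)  echo "to_file_uri: path must be absolute: $1" >&2
            return 1 ;;
    esac
}
```

With this helper, the Linux command above could be written as ./LoadOWL.sh -in "$(to_file_uri /ontologies/owl/amino-acid.owl)".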

Best Practices

As you work with terminologies in LexEVS, a few practices make loading easier.

Setting a terminology as active:

Activation lets you take a terminology offline without unloading it. After you load a terminology, you must make it active before any queries will work against it. The LexEVS administrative GUI has a button to set the activation state. All the LexEVS loader commands also have a flag that sets a terminology to active upon successful load.

Setting a terminology as the production terminology:

LexEVS can load multiple copies of a terminology, even if you never use that ability. You may, for example, want different versions of the same terminology for testing. When this happens, LexEVS needs to know which copy to treat as the default for a query that only specifies a terminology name. It is best to always tell LexEVS which terminology is the default, even if you only have one. Some queries, such as queries to terminology metadata, do not work unless a terminology is set as the production copy. You do this by tagging a terminology. The LexEVS administrative GUI has a button to change the tag of any loaded terminology. The LexEVS loader commands have a flag that can be used to set the tag.

The tag is a simple string. You can assign any tag you want, but the string recognized by LexEVS is "PRODUCTION" (all caps, no quotes). You should get in the habit of marking loaded terminologies as PRODUCTION. For non-production copies you might use "TEST" or leave the tag blank.
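The activation and tag flags mentioned above can be combined in a single load. The sketch below only prints the command so it can be reviewed before running. It assumes that -a activates the terminology on successful load and that -t assigns the tag. These flag names are not given on this page, so confirm them against the loader's own help output. The load_command helper is hypothetical.

```shell
#!/bin/sh
# load_command SCRIPT SOURCE_URI [TAG]: print a loader invocation that
# activates and tags the terminology. Nothing is executed (dry run).
# Assumptions: -a = activate on successful load, -t = tag to assign.
load_command() {
    script=$1
    uri=$2
    tag=${3:-PRODUCTION}   # default to the tag LexEVS recognizes
    printf '%s -in "%s" -a -t %s\n' "$script" "$uri" "$tag"
}

load_command ./LoadOWL.sh "file:///ontologies/owl/amino-acid.owl"
```

Passing TEST as the third argument would tag a non-production copy instead.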

Restarting distributed services after loading:

After loading terminologies in a LexEVS Distributed environment, the new content is not visible until you restart the web container. This is a limitation of the LexEVS Distributed service. Make a habit of restarting the application server after loading terminologies. A single restart after a batch of loads is enough; you do not have to restart after each one.

Large Terminologies

Loading larger terminologies can be very time-consuming and resource-intensive. The following database optimization helps. In the primary LexEVS configuration file, <LEXEVS_HOME>/resources/config/lbconfig.props, the DB_PRIMARY_KEY_STRATEGY setting controls how the database primary key is generated. The default setting is the following:

# DB_PRIMARY_KEY_STRATEGY indicates which strategy will be used
# for the primary key of the database tables.
# WARNING - This cannot be changed after the initial
# schema installation.
#
# Allowable values include:
#
#	"GUID"
#		- Primary Keys are implemented as random GUIDs.
#	"SEQUENTIAL_INTEGER"
#		- Primary Keys will be sequentially incremented
#		- as Integer values.
DB_PRIMARY_KEY_STRATEGY=GUID

Because this default is very taxing on index processing at the end of the load, we recommend changing it to SEQUENTIAL_INTEGER unless you specifically need globally unique identifiers. Make the change before the initial schema installation, because the strategy cannot be changed afterward.
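The recommended change can be made in an editor or scripted. The following is a minimal sketch, assuming lbconfig.props holds the setting on a line that begins with DB_PRIMARY_KEY_STRATEGY=, as in the excerpt above. The set_key_strategy helper is hypothetical and leaves the comment lines untouched.

```shell
#!/bin/sh
# set_key_strategy FILE VALUE: set DB_PRIMARY_KEY_STRATEGY in FILE.
# Only the two values listed in the configuration comments are accepted.
set_key_strategy() {
    file=$1
    value=$2
    case $value in
        GUID|SEQUENTIAL_INTEGER) ;;
        *) echo "unsupported strategy: $value" >&2
           return 1 ;;
    esac
    tmp="$file.tmp.$$"
    # The ^ anchor skips comment lines that merely mention the key.
    sed "s/^DB_PRIMARY_KEY_STRATEGY=.*/DB_PRIMARY_KEY_STRATEGY=$value/" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

A call such as set_key_strategy "$LEXEVS_HOME/resources/config/lbconfig.props" SEQUENTIAL_INTEGER would apply the recommendation, where LEXEVS_HOME is an assumed environment variable pointing at the installation directory.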

Special Case Loading

Some terminologies are special cases and need special handling. This category includes the NCI Thesaurus in OWL format and any files loaded from UMLS RRF formatted sources. The NCI Metathesaurus is the largest terminology we load, and as such it also requires special handling. OWL terminologies do not normally require special handling, but LexEVS offers some advanced loading options that users may take advantage of. Each of these has its own documentation:
