
...

As you work with terminologies in LexEVS, a few practices make loading considerably easier.

...

Indexing terminologies:

...

Loading any terminology can be very time consuming and resource intensive, and the following database optimization recommendation helps. It matters more the larger the terminology is. In the LexEVS configuration file, <LEXEVS_HOME>/resources/config/lbconfig.props, set the strategy used to generate the database primary keys. The default setting is the following:

Code Block
# DB_PRIMARY_KEY_STRATEGY indicates which strategy will be used
# for the primary key of the database tables.
# WARNING - This cannot be changed after the initial
# schema installation.
#
# Allowable values include:
#
#	"GUID"
#		- Primary Keys are implemented as random GUIDs.
#	"SEQUENTIAL_INTEGER"
#		- Primary Keys will be sequentially incremented
#		- as Integer values.
DB_PRIMARY_KEY_STRATEGY=GUID

Because this default is very taxing on the index processing at the end of the load, we recommend changing it to SEQUENTIAL_INTEGER for any terminology unless you have an overriding need for globally unique identifiers. Note that this setting is final once any terminology is loaded; you cannot change it afterward. Even launching a LexEVS administrative command or opening the LexEVS administrative GUI makes it permanent. The only way to start over and change the setting is to change the lbconfig.props file, drop the database created for LexEVS, and recreate the database. If you are going to change this setting, do so before you do anything else with LexEVS.
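If you script your installs, the edit described above can be applied before LexEVS is ever started. The following is only a minimal sketch, not part of LexEVS: the function name set_primary_key_strategy is hypothetical, and it assumes lbconfig.props contains exactly one uncommented DB_PRIMARY_KEY_STRATEGY line. It works on the file's text, so you read and write the file yourself.

```python
import re

KEY = "DB_PRIMARY_KEY_STRATEGY"
ALLOWED = {"GUID", "SEQUENTIAL_INTEGER"}  # the two values listed in lbconfig.props


def set_primary_key_strategy(props_text, strategy="SEQUENTIAL_INTEGER"):
    """Return props_text with the DB_PRIMARY_KEY_STRATEGY assignment set to strategy.

    Hypothetical helper for illustration; it is not shipped with LexEVS.
    """
    if strategy not in ALLOWED:
        raise ValueError(f"unsupported strategy: {strategy}")
    # Match only the assignment line; the '#' comment lines that mention
    # the key do not start with the key itself, so they are left alone.
    pattern = re.compile(rf"^{KEY}[ \t]*=[^\r\n]*", flags=re.MULTILINE)
    new_text, count = pattern.subn(f"{KEY}={strategy}", props_text)
    if count != 1:
        raise ValueError(f"expected exactly one {KEY} line, found {count}")
    return new_text


if __name__ == "__main__":
    sample = (
        "# DB_PRIMARY_KEY_STRATEGY indicates which strategy will be used\n"
        "DB_PRIMARY_KEY_STRATEGY=GUID\n"
    )
    print(set_primary_key_strategy(sample))
```

Run something like this only against a fresh installation. As noted above, once the schema exists the value is permanent, so editing the file afterward does not change the database.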

Setting a terminology as active:

...

When you first load any terminology, it is not active by default. You must activate it after it is loaded if you want any queries to work against it. The LexEVS Administrative GUI has a button to activate or deactivate any given terminology. All the LexEVS loader commands also have a flag that sets a terminology to be active upon successful load. The reason for having these states is that you can take terminologies offline without having to unload them.

Setting a terminology as the production terminology:

...

It is best to always tell LexEVS which terminology is the default, even if you only have one copy of it loaded. Some queries, such as queries for terminology metadata, do not work unless a terminology is set as the production copy. You do this by tagging the terminology. The LexEVS Administrative GUI has a button to change the tag of any loaded terminology, and the LexEVS loader commands have a flag that sets the tag. The tag is a simple string. You can assign any tag you want, but the string recognized by LexEVS is "PRODUCTION" (all caps, no quotes). Get in the habit of marking loaded terminologies as PRODUCTION. A copy that is not the default might be tagged "TEST" or left untagged. The tagging function exists so that multiple versions of the same terminology can be loaded with no ill effect, because one of them is designated as the default.
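To see why the tag matters when several versions are loaded, here is a toy model of the idea. It is not LexEVS code and does not reflect the LexEVS API: the LoadedVersion class, the resolve function, and the version strings "A" and "B" are all hypothetical, and the choice to raise an error when no tagged copy exists is an assumption made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadedVersion:
    """Toy stand-in for one loaded terminology version (hypothetical)."""
    name: str
    version: str
    active: bool = False  # terminologies are inactive right after loading
    tag: str = ""         # for example "PRODUCTION"; blank means untagged


def resolve(loaded, name, tag="PRODUCTION"):
    """Pick the single active version of `name` that carries `tag`.

    Assumption for this sketch: anything other than exactly one
    match is treated as an error.
    """
    matches = [v for v in loaded if v.name == name and v.active and v.tag == tag]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} active versions of {name!r} tagged {tag!r}")
    return matches[0]


if __name__ == "__main__":
    loaded = [
        LoadedVersion("NCI Thesaurus", "A", active=True, tag="PRODUCTION"),
        LoadedVersion("NCI Thesaurus", "B", active=True, tag="TEST"),
    ]
    # A request that names no version falls back to the PRODUCTION copy.
    print(resolve(loaded, "NCI Thesaurus").version)
```

In this model, a request that does not name a version is unambiguous only because exactly one copy carries the PRODUCTION tag. That is the situation the tagging habit described above is meant to guarantee.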


Restarting distributed services after loading:

...

After loading terminologies in a LexEVS Distributed environment, you will not see any results until you restart the web container. This is a limitation of the LexEVS Distributed service. Get used to restarting the application server after loading. When you load several terminologies, you do not have to restart after each one; a single restart after the last load is enough.

Large Terminologies

Loading larger terminologies can be very time consuming and resource intensive, and the following database optimization recommendation helps. In the primary LexEVS configuration file, <LEXEVS_HOME>/resources/config/lbconfig.props, set the strategy used to generate the database primary keys. The default setting is the following:

Code Block
# DB_PRIMARY_KEY_STRATEGY indicates which strategy will be used
# for the primary key of the database tables.
# WARNING - This cannot be changed after the initial
# schema installation.
#
# Allowable values include:
#
#	"GUID"
#		- Primary Keys are implemented as random GUIDs.
#	"SEQUENTIAL_INTEGER"
#		- Primary Keys will be sequentially incremented
#		- as Integer values.
DB_PRIMARY_KEY_STRATEGY=GUID

Because this default is very taxing on the index processing at the end of the load, we recommend changing it to SEQUENTIAL_INTEGER unless you have an overriding need for globally unique identifiers.

Special Case Loading

Some terminologies are special cases and need special handling. Included in this category are the NCI Thesaurus in the OWL source format and any files in the UMLS RRF (Rich Release Format) source format. The NCI MetaThesaurus is the largest terminology to be loaded, and as such it also requires special handling. Other OWL terminologies do not normally require special handling, but LexEVS offers some advanced loading options that users may take advantage of. Each of these has its own documentation:

...