h4. Loading from UMLS RRF (Rich Release Format) Files

The Unified Medical Language System (UMLS) regularly releases a set of terminologies in a large set of files referred to as the UMLS Metathesaurus. LexEVS can load the entire set or individual files from this file set using the LexEVS UMLS Batch loader.

{info}We only briefly mention the UMLS Metathesaurus tools on this page. Since all documentation for them is maintained on the website linked below, we won't repeat it here. LexEVS also has a special-purpose RRF loader that loads an NCI version of an entire Metathesaurus. It is covered in the [documentation|display/EVS/Installing+NCI+Vocabularies] for the NCI loaders, so we won't repeat its use here.{info}

|| Step || Action ||
| 1 | Download and install the UMLS Metathesaurus: \\
Loading to LexEVS from these source files requires that they reside locally so that LexEVS can access them quickly. This is not just a download: the UMLS Metathesaurus also has install steps that are documented by NLM on the [UMLS website|http://www.nlm.nih.gov/research/umls/]. \\ |
| 2 | Subset the desired terminology (recommended): \\
Once the UMLS Metathesaurus is installed, users can either load the entire set of files to LexEVS by pointing to the containing file directory or use the UMLS MetamorphoSys tool to subset a terminology. Subsetting creates a smaller set containing only your desired concepts. For example, you can subset out one of the source terminologies that make up the UMLS. This improves performance during loads and is typical when you want only one of the original source coding schemes. Again, the documentation for using MetamorphoSys to subset a terminology is on the [UMLS website|http://www.nlm.nih.gov/research/umls/]. \\ |
| 3 | Set command line options in the loading script: \\
If you are loading a subset, you may or may not need to change the memory options. If you load the entire UMLS Metathesaurus, we usually recommend setting the command line options, which control the memory allocated for loads of larger terminologies such as SNOMED. Scripting options can be added to the scripts contained in <LEXEVS_HOME>/admin. On a Linux environment with a 64-bit architecture, use the LoadUmlsBatch.sh file. A server-class computer with, say, 16 gigabytes of memory and 8 four-core processors offers fairly substantial resources for loading content. Open the .sh file with a text editor and set the values for \-Xmx and \-XX:MaxPermSize to "-Xmx6000M \-XX:MaxPermSize=256M", or more if you have adequate resources available. \\
\\
If you have not set the DB_PRIMARY_KEY value to SEQUENTIAL_INTEGER as described under best practices, it could take 33 hours to load a terminology as large as SNOMED, which could otherwise complete in 4 hours.
{note:title=Memory Handling}The batch loader is not memory dependent, but at the end of the load the resource is indexed and indexing does require at least 3 GB of memory.  Increasing memory can provide faster indexing time.{note} |
| 4 | Find the SAB (RSAB) in the MRSAB.RRF file: \\
Both the LexEVS Administrative GUI and the LexEVS administrative commands require the user to enter a SAB (source abbreviation) when loading RRF files. You must either know this source abbreviation or find it in the MRSAB.RRF file, which is in the folder of the UMLS installation or of the subset you made for the terminology you wish to load. We recommend opening this file in a text editor and searching for the terminology name, for instance SNOMED; you should find a line containing that name in a row of text separated by the “pipe” character, “ \| ”. \\
\\  !Screen Shot 2011-11-16 at 4.57.34 PM.png|border=1!\\
The current format of UMLS has the RSAB in column four, which in the case of SNOMED is "SNOMEDCT". Note that this is a licensed terminology and all use must be in accordance with the licensing agreements. |
| 5 | Load the terminology from the command line, referencing the SAB: \\
{code}
./LoadUmlsBatch.sh -in "file:///data/phont/ontologies/2011AA" -s "SNOMEDCT"
{code}
Note: The file path points to the directory that directly contains the .RRF files. |
| 6 | Monitor output (optional): \\
The output from the UMLS batch loader indicates the steps of the batch load and can be monitored in the log at <LexEVS Install Root>/logs/LexBIG_load_log.text. \\
If you are on Linux, this can be done using: \\
{code}
watch -n 0.1 -d tail LexBIG_load_log.text
{code}

...

Sample output of an early load step is as

...

Restarting an RRF Load

Note
titleBear in Mind!

Killed processes cannot be restarted. The load can recover from an application error, but not from an outside activity that stops the process

...

Step

...

Action

...

1

...

Open the lbGUI

...

2

...

3

...

Highlight and double click the terminology row in the table

...

4

...

On the resulting window note for the following command line execution:
The URI and the version.
Image Removed

...

5

Using the LexEVS utilities restart the load of the RRF source:

Code Block
{LEXEVS_HOME}/admin

For Windows installation use the following command:

...

 follows: \\  !Screen Shot 2011-11-11 at 12.28.46 PM.png|border=1!\\
Sample output of a final load step: \\  !Screen Shot 2011-11-11 at 12.24.44 PM.png|border=1!\\
\\ |

h4. Restarting an RRF Load

{note:title=Bear in Mind\!}
Killed processes cannot be restarted. The load can recover from an application error, but not from an outside activity that stops the process.
{note}

|| Step || Action ||
| 1 | Open the lbGUI |
| 2 | Find the terminology with the broken or stopped load in the table. \\
It should have a status of pending. \\
\\  !Screen Shot 2011-11-17 at 11.53.10 AM.png|border=1!\\ |
| 3 | Highlight and double click the terminology row in the table |
| 4 | In the resulting window, note the URI and the version; both are needed for the command-line execution in the next step. \\
!Screen Shot 2011-11-17 at 11.28.55 AM.png|border=1!\\ |
| 5 | Using the LexEVS utilities, restart the load of the RRF source. The scripts are located in:
{code}
{LEXEVS_HOME}/admin
{code}
For a Windows installation use the following command:
{code}
ResumeUmlsBatch.bat -in "file:///home/LargeStorage/ontologies/rrf/RXNORM/2011AA/" -s RXNORM -uri "urn:oid:2.16.840.1.113883.6.88" -version "10AB_110307F"
{code}
For a Linux installation use the following command:
{code}
./ResumeUmlsBatch.sh -in "file:///home/LargeStorage/ontologies/rrf/RXNORM/2011AA/" -s RXNORM -uri "urn:oid:2.16.840.1.113883.6.88" -version "10AB_110307F"
{code} |
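
Step 4 of the loading procedure asks you to find the SAB by opening MRSAB.RRF in a text editor. On Linux the same lookup can be sketched on the command line. The snippet below is only an illustration, not part of the LexEVS tooling: the function name {{find_rsab}} is hypothetical, and it relies on the statement above that the RSAB is in column four of the pipe-separated file. It prints the distinct RSAB values of every row that mentions the search term, ignoring case.

```shell
#!/bin/sh
# find_rsab FILE TERM
# Hypothetical helper (not shipped with LexEVS): print the distinct values of
# column 4 (the RSAB) for every row of a pipe-separated MRSAB.RRF file that
# contains TERM anywhere in the row, compared case-insensitively.
find_rsab() {
    rrf_file=$1
    term=$2
    awk -F'|' -v term="$term" '
        # index() returns a non-zero position when the upper-cased row
        # contains the upper-cased search term
        index(toupper($0), toupper(term)) { print $4 }
    ' "$rrf_file" | sort -u
}

# Example call (the path is an assumption based on the load command in step 5):
#   find_rsab /data/phont/ontologies/2011AA/MRSAB.RRF SNOMED
```

If the search prints more than one abbreviation, check the matching rows in MRSAB.RRF itself to pick the one for the terminology you intend to load.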