NIH | National Cancer Institute | NCI Wiki  

...

Step

Action

1

Using a web or FTP client, go to the URL: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/
screenshot of FTP directory

2

Select the version of the NCI Thesaurus OWL file you wish to download, and save it to a directory on your machine.

3

Extract the OWL file from the zip download and save it in a directory on your machine. This directory is referred to as NCI_THESAURUS_DIRECTORY in the script examples.

4

(Optional) Create manifest and preferences files.

Manifests update or fill in empty coding scheme metadata. They can adjust names and alternate names, versions, and many other things that the source would normally say about the coding scheme. This is very useful when the source itself does not supply this information but the user needs to record it in the terminology service representation. Some versions of the NCI Thesaurus may not load without a manifest. What follows is a sample manifest file used to update alternate names, language designations, versions, and other metadata. Details of the manifest elements are found in the Administration Guide section on the manifest file.



Preference files control how data is loaded. While they don't add anything to the source, they can change the representation of a terminology by choosing what is loaded as a property, entity, or association from the source. The following preferences file sets root nodes for the terminology and processes a set of complex properties not handled by the OWL processor, among other things. The full preferences definitions are described in the Administration Guide section on the preferences file.



Since the Thesaurus may be released as either a "by code" or a "by name" formatted source, preference files can also be used to move the entity code to its intended place in LexEVS. Here are some differences in an OWL formatted Thesaurus source:

A "By Code" formatted Thesaurus OWL class:



This class has its entity code formatted as an rdf:ID, which loads as the unique identifier without using a preferences file.

In the "by name" version we have:



With the code found as a "<code>" tagged attribute of the class:



In this case we can ensure that the value "C1324" will be loaded into LexEVS as an entity code by supplying a preferences file with the following values:
Loading this type of source without the preferences file may cause data truncation errors on the longer names and cause data loss on load.
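
Before choosing whether to supply a preferences file, it can help to check which of the two formats a downloaded file uses. The following is a heuristic sketch only, assuming a POSIX-style shell with a `grep` that supports `-o`. The function name `guess_thesaurus_format` is hypothetical and not part of LexEVS, and the sketch assumes that identifiers appear as `rdf:ID="..."` on the `owl:Class` start tag and that concept codes look like C1324 (the letter C followed by digits).

```shell
#!/bin/sh
# Hypothetical helper (not part of LexEVS): guess whether an OWL file is
# "by code" or "by name" from the rdf:ID of its first owl:Class.
# Assumption: IDs appear as rdf:ID="..." on the owl:Class start tag and
# concept codes are the letter C followed by digits, such as C1324.
guess_thesaurus_format() {
    first_id=$(grep -o '<owl:Class rdf:ID="[^"]*"' "$1" | head -n 1 |
        sed 's/.*rdf:ID="\([^"]*\)"/\1/')
    if [ -z "$first_id" ]; then
        echo "unknown"      # no owl:Class with an rdf:ID was found
    elif printf '%s\n' "$first_id" | grep -Eq '^C[0-9]+$'; then
        echo "by code"      # the ID itself looks like a concept code
    else
        echo "by name"      # the ID looks like a name, so the code lives elsewhere
    fi
}

# Example call (path is a placeholder):
# guess_thesaurus_format "/path/to/Thesaurus_10.10d.owl"
```

A "by name" result suggests loading with a preferences file; an "unknown" result means the assumptions of the sketch do not hold for that file and it should be inspected by hand.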


Finally, post-processing is also available for this and other sources. It is covered in the post processor section of the Administration Guide and is not revisited here.

5


Using the LexEVS command line utilities, load the NCI Thesaurus with no options:

Change to the admin directory under the LexEVS home directory.

Code Block
cd {LEXEVS_HOME}/admin

For Windows installation use the following command:

Code Block
LoadOWL.bat -in "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_10.10d.owl"

For Linux installation use the following command:

Code Block
LoadOWL.sh -in "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_10.10d.owl"

This should work best with a "by code" type Thesaurus source.

6

Using the LexEVS command line utilities, load the NCI Thesaurus with options:

Change to the admin directory under the LexEVS home directory.

Code Block
cd {LEXEVS_HOME}/admin

For Windows installation use the following command:

Code Block
LoadOWL.bat -in "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_10.10d.owl" -mf "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_MF.xml" -lp "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_prefs.xml"

For Linux installation use the following command:

Code Block
LoadOWL.sh -in "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_10.10d.owl" -mf "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_MF.xml" -lp "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_prefs.xml"
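
The invocations in steps 5 and 6 differ only in the optional manifest and preferences arguments, so a small wrapper can assemble either form. The following is a minimal sketch, assuming a POSIX shell. `build_load_owl_cmd` is a hypothetical helper name, not part of LexEVS, and `/data/nci` is an assumed example location. The helper only prints the command so it can be reviewed before it is run; the `-in`, `-mf`, and `-lp` flags are the ones shown in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of LexEVS): prints a LoadOWL.sh command line.
# Usage: build_load_owl_cmd DIR OWL_FILE [MANIFEST_FILE] [PREFS_FILE]
# DIR must be an absolute path, for example /data/nci (an assumed location).
build_load_owl_cmd() {
    dir=$1
    owl=$2
    manifest=${3:-}
    prefs=${4:-}

    # "file://" plus an absolute path gives the file:///... form used above.
    cmd="LoadOWL.sh -in \"file://${dir}/${owl}\""
    if [ -n "$manifest" ]; then
        cmd="$cmd -mf \"file://${dir}/${manifest}\""
    fi
    if [ -n "$prefs" ]; then
        cmd="$cmd -lp \"file://${dir}/${prefs}\""
    fi
    printf '%s\n' "$cmd"
}

# Print both variants; nothing is executed.
build_load_owl_cmd /data/nci Thesaurus_10.10d.owl
build_load_owl_cmd /data/nci Thesaurus_10.10d.owl Thesaurus_MF.xml Thesaurus_prefs.xml
```

Printing rather than executing keeps the sketch safe to try on a machine without LexEVS installed; the printed line can then be pasted into a shell opened in the admin directory.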


...

Note

The NCI Metathesaurus contains many individual vocabularies, some of which are large vocabularies in and of themselves. It requires many hours to load and index: a load can take 36 hours on a multiprocessor machine with 6 GB or more of memory. The total time to load the NCI Metathesaurus will vary depending on machine, memory, and disk speed. Because this loader uses a batch loading strategy, it is less dependent on memory, but some users will see load times of 3 or 4 days with average multiprocessor processing power.

Step

Action

1

Using a web or FTP client, go to the URL: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/
screenshot of FTP directory

2

Select the version of the NCI Metathesaurus RRF files you wish to download, and save the file to a directory on your machine.

3

Extract the RRF files from the zip download and save them in a directory on your machine. This directory is referred to as NCI_METATHESAURUS_DIRECTORY. The RELEASE_INFO.RRF file must be present for the load utility to work.

4

Using the LexEVS utilities, load the NCI Metathesaurus:

Code Block
cd {LEXEVS_HOME}/admin

For Windows installation use the following command:

Code Block
LoadMetaBatch.bat -in "file:///{NCI_METATHESAURUS_DIRECTORY}/"

For Linux installation use the following command:

Code Block
LoadMetaBatch.sh -in "file:///{NCI_METATHESAURUS_DIRECTORY}/"
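
Because a Metathesaurus load can run for many hours, it may be worth confirming that the release info file is in place before starting. The following is a minimal sketch, assuming a POSIX shell. `check_rrf_dir` is a hypothetical helper, not part of LexEVS, and it matches the file name case-insensitively because the exact case of the name in a given release is an assumption here.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of LexEVS): confirm that the release
# info file is present in the extracted RRF directory before a long load.
# Assumption: the file name is matched case-insensitively, since its exact
# case may differ between releases.
check_rrf_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    # Look only at the top-level entries of the directory.
    if ls "$dir" | grep -iq '^release_info\.rrf$'; then
        echo "ok: release info file found in $dir"
        return 0
    fi
    echo "release info file not found in $dir"
    return 1
}

# Example call (path is a placeholder):
# check_rrf_dir "/path/to/metathesaurus" && echo "ready to run LoadMetaBatch.sh"
```

The function returns a non-zero status when the check fails, so it can gate the load command with `&&` in a wrapper script.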

...

Since this loader is resource hungry, we provide the option to restart should your resource settings prove inadequate. Loads that have crashed or been interrupted by server problems can be resumed using the ResumeBatchLoad script set.

Step

Action

1

Open the lbGUI

2

Find the terminology with the broken or stopped load in the table.
It should have a status of pending.

3

Highlight and double click the terminology row in the table.

4

In the resulting window, note the URI and the version; the command line in the next step requires both.

5

Using the LexEVS utilities, restart the load of the NCI Metathesaurus:

Code Block
cd {LEXEVS_HOME}/admin

For Windows installation use the following command:

Code Block
ResumeMetaBatch.bat -in "file:///{NCI_METATHESAURUS_DIRECTORY}/" -s "NCI Metathesaurus" -uri "urn:oid:2.16.840.1.113883.3.26.1.2" -version "200601"

For Linux installation use the following command:

Code Block
ResumeMetaBatch.sh -in "file:///{NCI_METATHESAURUS_DIRECTORY}/" -s "NCI Metathesaurus" -uri "urn:oid:2.16.840.1.113883.3.26.1.2" -version "200601"
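
The URI and version in the resume command must match the values noted from the lbGUI, so it can be useful to pass them as arguments instead of editing the command by hand. The following is a minimal sketch, assuming a POSIX shell. `build_resume_cmd` is a hypothetical helper, not part of LexEVS, and `/data/ncim` is an assumed example location. It prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper (not part of LexEVS): prints a ResumeMetaBatch.sh command.
# Usage: build_resume_cmd DIR NAME URI VERSION
# DIR must be an absolute path; URI and VERSION are the values noted in lbGUI.
build_resume_cmd() {
    if [ "$#" -ne 4 ]; then
        echo "usage: build_resume_cmd DIR NAME URI VERSION" >&2
        return 2
    fi
    printf 'ResumeMetaBatch.sh -in "file://%s/" -s "%s" -uri "%s" -version "%s"\n' \
        "$1" "$2" "$3" "$4"
}

# Values below are the ones used in the example commands above.
build_resume_cmd /data/ncim "NCI Metathesaurus" \
    "urn:oid:2.16.840.1.113883.3.26.1.2" "200601"
```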

...