
NCI Thesaurus

This section describes the steps to download and install a full version of the NCI Thesaurus for the LexEVS Service. 

The NCI Thesaurus differs from other OWL-formatted resources, so administrators usually take advantage of the custom loading resources available in LexEVS. Users can download the Thesaurus.owl file from NCI and load it into LexEVS as is, but many will want to customize the load with XML-formatted manifest and preferences files.

Note

The NCI Thesaurus has grown large enough that it can no longer be loaded on many typical desktop machines. We recommend a 64-bit operating system running on a multiprocessor computer with a minimum of 4 GB of memory. Server-class Linux machines are the typical target for these loads. The time to load NCI Thesaurus will vary depending on machine, memory, and disk speed. Expect a couple of hours on a higher-end machine.
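The sizing guidance in this note can be checked before starting a load. The following is a minimal sketch for Linux hosts only; the function names and the MIN_MEM_KB and MIN_CPUS thresholds are illustrative assumptions and are not part of LexEVS.

```shell
#!/bin/sh
# Pre-flight check for the sizing guidance above (Linux only).
# MIN_MEM_KB and MIN_CPUS are illustrative thresholds, not LexEVS settings.
MIN_MEM_KB=$((4 * 1024 * 1024))   # 4 GB expressed in kB
MIN_CPUS=2                        # "multiprocessor" read as at least two CPUs

# meets_minimum VALUE MINIMUM: succeeds when VALUE >= MINIMUM.
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# is_64bit ARCH: succeeds for common 64-bit values reported by `uname -m`.
is_64bit() {
    case "$1" in
        x86_64|amd64|aarch64|arm64|ppc64|ppc64le|s390x) return 0 ;;
        *) return 1 ;;
    esac
}

# preflight: print a warning for each recommendation the host misses.
# MemTotal is slightly below the installed RAM, so treat a near miss as a pass.
preflight() {
    arch=$(uname -m)
    cpus=$(getconf _NPROCESSORS_ONLN)
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    is_64bit "$arch" || echo "WARNING: $arch does not look like a 64-bit platform"
    meets_minimum "$cpus" "$MIN_CPUS" || echo "WARNING: only $cpus processor(s) online"
    meets_minimum "$mem_kb" "$MIN_MEM_KB" || echo "WARNING: only $mem_kb kB of memory"
}
```

Source the file and call preflight before running a loader. No output means the host meets all three recommendations.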

Step

Action

1

Using a web or FTP client, go to the URL: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/

2

Select the version of the NCI Thesaurus OWL file you wish to download. Save the file to a directory on your machine.

3

Extract the OWL file from the zip download and save in a directory on your machine. This directory will be referred to as NCI_THESAURUS_DIRECTORY in script examples.

4

Create manifest and preferences files (optional).

Manifests control coding scheme metadata and can make adjustments to names and alternate names, versions and many other things that the source normally wants to say about the coding scheme.  This is very useful when the source itself does not supply this information, but the user needs to record it in the terminology service representation.  Some versions of the NCI Thesaurus may not load without them.  What follows is a sample manifest file used to update alternate names, language designations and versions.  Details of manifest elements are found in the Administration Guide to the manifest file.



Preference files control how data is loaded. While they do not add anything to the source, they can change the representation of a terminology by choosing what is loaded from the source as a property, entity, or association. This preferences file sets root nodes for the terminology and processes a set of complex properties not handled by the OWL processor, among other things. The full preferences definitions are described in the Administration Guide to the preferences file.



Since the Thesaurus may be released as either a "by code" or a "by name" formatted source, preference files can be used to move the entity code to its intended place in LexEVS. Here are some differences in an OWL-formatted Thesaurus source:

A "By Code" formatted Thesaurus:



This version has its entity code formatted as an rdf:ID, so the unique identifier loads without a preferences file.

In the "by name" version (this is a different version, so the label and about do not quite match up) we have:



With the code found as a "<code>" tagged attribute of the class:



In this case we can ensure that the value "C1324" will be loaded into LexEVS as an entity code by supplying a preferences file with the following values:

Loading this type of source without the preferences file may cause data truncation errors on the longer names, resulting in data loss on load.
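To decide whether a preferences file is needed, it helps to know which flavor of the source was downloaded. The sketch below is a heuristic and not a LexEVS tool. It assumes that concept codes look like C1324 (the letter C followed by digits), that "by code" files carry them in rdf:ID, and that "by name" files carry them in <code> elements, as described above. The function name thesaurus_format is hypothetical.

```shell
#!/bin/sh
# thesaurus_format FILE
# Prints "by-code" when rdf:ID values look like concept codes (C followed by
# digits, as in C1324), "by-name" when <code> elements are present instead,
# and "unknown" otherwise. Heuristic only: it greps the text and does not
# parse the OWL.
thesaurus_format() {
    if grep -Eq 'rdf:ID="C[0-9]+"' "$1"; then
        echo "by-code"
    elif grep -q '<code>' "$1"; then
        echo "by-name"
    else
        echo "unknown"
    fi
}
```

A result of "by-name" suggests loading with a preferences file. A result of "unknown" means the file should be inspected by hand.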


Finally, post-processing is also available for this and other sources. It is covered in the post processor section of the Administration Guide and is not revisited here.

5


Using the LexEVS command line utilities, load the NCI Thesaurus with no options:

Change to the LexEVS admin directory.

cd {LEXEVS_HOME}/admin

For Windows installation use the following command:

LoadOWL.bat -in "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_10.10d.owl"

For Linux installation use the following command:

LoadOWL.sh -in "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_10.10d.owl"

This should work best with a "by code" type Thesaurus source.

6

Using the LexEVS command line utilities, load the NCI Thesaurus with options:

Change to the LexEVS admin directory.

cd {LEXEVS_HOME}/admin

For Windows installation use the following command:

LoadOWL.bat -in "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_10.10d.owl" -mf "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_MF.xml" -lp "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_prefs.xml"

For Linux installation use the following command:

LoadOWL.sh -in "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_10.10d.owl" -mf "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_MF.xml" -lp "file:///{NCI_THESAURUS_DIRECTORY}/Thesaurus_prefs.xml"
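The two Linux variants in steps 5 and 6 can be combined in a small wrapper that builds the file URIs and adds -mf and -lp only when those files are supplied. This is a sketch under several assumptions: paths are absolute Unix paths, LEXEVS_HOME is set in the environment, and LoadOWL.sh is executable. The function names and the LEXEVS_LOADER override are hypothetical. Only the -in, -mf, and -lp options come from the steps above.

```shell
#!/bin/sh
# to_file_uri PATH: prefix an absolute Unix path with file:// so that
# /data/nci/Thesaurus.owl becomes file:///data/nci/Thesaurus.owl.
to_file_uri() {
    case "$1" in
        /*) printf 'file://%s\n' "$1" ;;
        *)  echo "to_file_uri: not an absolute path: $1" >&2; return 1 ;;
    esac
}

# load_thesaurus OWL [MANIFEST] [PREFERENCES]
# Adds -mf and -lp only when the matching file is given.
# LEXEVS_LOADER is a hypothetical override, useful for dry runs.
load_thesaurus() {
    manifest=${2:-}
    prefs=${3:-}
    owl_uri=$(to_file_uri "$1") || return 1
    set -- -in "$owl_uri"
    if [ -n "$manifest" ]; then
        mf_uri=$(to_file_uri "$manifest") || return 1
        set -- "$@" -mf "$mf_uri"
    fi
    if [ -n "$prefs" ]; then
        lp_uri=$(to_file_uri "$prefs") || return 1
        set -- "$@" -lp "$lp_uri"
    fi
    # Run from the admin directory, as in the steps above.
    (cd "$LEXEVS_HOME/admin" && "${LEXEVS_LOADER:-./LoadOWL.sh}" "$@")
}
```

Calling load_thesaurus with only the OWL path corresponds to step 5, and passing all three paths corresponds to step 6. Setting LEXEVS_LOADER to a command such as echo prints the arguments instead of starting a load.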


Example output from load of NCI Thesaurus 05.12f

…
[LexBIG] Processing TOP Node... Retired_Kind
[LexBIG] Clearing target of NCI_Thesaurus...
[LexBIG] Writing NCI_Thesaurus to target...
[LexBIG] Finished loading DB - loading transitive expansion table
[LexBIG] ComputeTransitive - Processing Anatomic_Structure_Has_Location
[LexBIG] ComputeTransitive - Processing Anatomic_Structure_is_Physical_Part_of
[LexBIG] ComputeTransitive - Processing Biological_Process_Has_Initiator_Process
[LexBIG] ComputeTransitive - Processing Biological_Process_Has_Result_Biological_Process
[LexBIG] ComputeTransitive - Processing Biological_Process_Is_Part_of_Process
[LexBIG] ComputeTransitive - Processing Conceptual_Part_Of
[LexBIG] ComputeTransitive - Processing Disease_Excludes_Finding
[LexBIG] ComputeTransitive - Processing Disease_Has_Associated_Disease
[LexBIG] ComputeTransitive - Processing Disease_Has_Finding
[LexBIG] ComputeTransitive - Processing Disease_May_Have_Associated_Disease
[LexBIG] ComputeTransitive - Processing Disease_May_Have_Finding
[LexBIG] ComputeTransitive - Processing Gene_Product_Has_Biochemical_Function
[LexBIG] ComputeTransitive - Processing Gene_Product_Has_Chemical_Classification
[LexBIG] ComputeTransitive - Processing Gene_Product_is_Physical_Part_of
[LexBIG] ComputeTransitive - Processing hasSubtype
[LexBIG] Finished building transitive expansion - building index
[LexBIG] Getting a results from sql (a page if using mysql)
[LexBIG] Indexed 0 concepts.
[LexBIG] Indexed 5000 concepts.
[LexBIG] Indexed 10000 concepts.
[LexBIG] Indexed 15000 concepts.
[LexBIG] Indexed 20000 concepts.
[LexBIG] Indexed 25000 concepts.
[LexBIG] Indexed 30000 concepts.
[LexBIG] Indexed 35000 concepts.
[LexBIG] Indexed 40000 concepts.
[LexBIG] Indexed 45000 concepts.
[LexBIG] Indexed 46000 concepts.
[LexBIG] Getting a results from sql (a page if using mysql)
[LexBIG] Closing Indexes Mon, 27 Feb 2006 01:44:22
[LexBIG] Finished indexing
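
Because a load can run for hours, it can be useful to check progress from a second terminal. The sketch below assumes the loader output was redirected to a log file and that the lines have the format shown in the example output above. The function names are hypothetical, and newer LexEVS versions may log differently.

```shell
#!/bin/sh
# last_indexed_count LOGFILE
# Prints the number from the most recent "[LexBIG] Indexed N concepts." line,
# or nothing if indexing has not started yet.
last_indexed_count() {
    sed -n 's/^\[LexBIG\] Indexed \([0-9][0-9]*\) concepts.*/\1/p' "$1" | tail -n 1
}

# indexing_finished LOGFILE: succeeds once "Finished indexing" has been logged.
indexing_finished() {
    grep -q '^\[LexBIG\] Finished indexing' "$1"
}
```

For the example output above, last_indexed_count would print 46000 and indexing_finished would succeed.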

NCI Metathesaurus

Loading the Metathesaurus

This section describes the steps to download and install a full version of the NCI Metathesaurus for the LexEVS Service.

Note

NCI Metathesaurus contains many individual vocabularies, some of which are large in their own right. It requires many hours to load and index: a load can take 36 hours on a multiprocessor machine with 6 GB or more of memory. The total time to load NCI Metathesaurus will vary depending on machine, memory, and disk speed. Because this loader uses a batch loading strategy it is less dependent on memory, but some users will see 3- or 4-day load times with average multiprocessor processing power.

Step

Action

1

Using a web or FTP client, go to the URL: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/

2

Select the version of NCI Metathesaurus RRF you wish to download. Save the file to a directory on your machine.

3

Extract the RRF files from the zip download and save them in a directory on your machine. This directory will be referred to as NCI_METATHESAURUS_DIRECTORY. RELEASE_INFO.RRF must be present for the load utility to work.

4

Using the LexEVS utilities, load the NCI Metathesaurus:

cd {LEXEVS_HOME}/admin

For Windows installation use the following command:

LoadMetaBatch.bat -in "file:///{NCI_METATHESAURUS_DIRECTORY}/"

For Linux installation use the following command:

LoadMetaBatch.sh -in "file:///{NCI_METATHESAURUS_DIRECTORY}/"
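
Since the load utility needs the release info file that step 3 requires, a quick check before starting a multi-day load may save time. This is a minimal sketch. The function name has_release_info is hypothetical, and the case-insensitive match is an assumption made because the file name's case may differ between distributions.

```shell
#!/bin/sh
# has_release_info DIR: succeeds when DIR contains the release info file.
# The comparison is case-insensitive (RELEASE_INFO.RRF, release_info.RRF, ...);
# this leniency is an assumption, not documented loader behavior.
has_release_info() {
    [ -d "$1" ] || return 1
    for f in "$1"/*; do
        name=$(basename "$f" | tr '[:upper:]' '[:lower:]')
        if [ "$name" = "release_info.rrf" ]; then
            return 0
        fi
    done
    return 1
}
```

The check can guard the load command, for example by running has_release_info on the extraction directory and stopping with a message when it fails.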

Resuming Loads

Since this loader is resource hungry, we provide the option to restart should you find your resource settings to be inadequate. Loads that have crashed or been interrupted by server problems can be resumed using the ResumeMetaBatch scripts.

Step

Action

1

Using the LexEVS utilities, resume the NCI Metathesaurus load:

cd {LEXEVS_HOME}/admin

For Windows installation use the following command:

ResumeMetaBatch.bat -in "file:///{NCI_METATHESAURUS_DIRECTORY}/" -s "NCI Metathesaurus" -uri "urn:oid:2.16.840.1.113883.3.26.1.2" -version "200601"

For Linux installation use the following command:

ResumeMetaBatch.sh -in "file:///{NCI_METATHESAURUS_DIRECTORY}/" -s "NCI Metathesaurus" -uri "urn:oid:2.16.840.1.113883.3.26.1.2" -version "200601"

NCI History

This section describes the steps to download and install a history file for NCI Thesaurus.

Step

Action

1

Using a web or FTP client, go to the URL: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/

2

Select the version of NCI History you wish to download. Save the file to a directory on your machine. Also download the VersionFile to the same directory as the history file.

3

Extract the History files from the zip download and save them in a directory on your machine. This directory will be referred to as NCI_HISTORY_DIRECTORY.

4

Using the LexEVS utilities, load the NCI History:

cd {LEXEVS_HOME}/admin

For Windows installation use the following command:

LoadNCIHistory.bat -nf -in "file:///{NCI_HISTORY_DIRECTORY}" -vf "file:///{NCI_HISTORY_DIRECTORY}/VersionFile"

For Linux installation use the following command:

LoadNCIHistory.sh -nf -in "file:///{NCI_HISTORY_DIRECTORY}" -vf "file:///{NCI_HISTORY_DIRECTORY}/VersionFile"

Note

If a 'releaseId' occurs twice in the file, the last occurrence will be stored. If LexEVS already knows about a releaseId (from a previous history load), the information is updated to match what is provided in the file.

The VersionFile has to be provided to the load API on every load, and you will need to maintain it as each new release is made. We have created a version of this file, valid as of this writing, from the information found in the archive folder on the FTP server. You can find it in the 'resources' directory of the LexEVS install.
