
cTAKES 1.3.2 Developer Install Instructions

These instructions for installing icTAKES 1.3.2 are for developers. With them you can set up a development environment containing the cTAKES code, then change or extend the code, compile it, and deploy it. If you simply want to be a user of the software, refer to the cTAKES 1.3.2 User Install Instructions.

Once you have completed this installation you will have all the source code and be able to compile and deploy it as needed. These install instructions do not explain what the components do; that is covered in the cTAKES 1.3 User Guide. There is no training or documentation on the code itself apart from code comments, so you must familiarize yourself with the components and then study the code on your own to be able to extend it.

To modify the source code for a cTAKES component, you must first download the code. You can then edit it in an IDE such as Eclipse or in another editor of your choice, and compile it in Eclipse or with Ant from the command line.
Follow the sections below that match your developer preferences.

Prerequisites

In order to complete these instructions you will need the following:

  • Sun's distribution of the Java JDK version 1.6+
  • Eclipse plug-ins (if you want to compile via Eclipse)
  • Ant 1.7.1+ (if you want to compile via the command line)

Step

Example

1. Install the JDK (not the runtime environment) of Java 1.6+. This software can be downloaded from java.com.

You need two things here: the proper version, and the JDK rather than just the Java Runtime Environment (JRE).

To check whether you have the JDK, look in the lib directory of the Java install for the file tools.jar. If the lib directory exists and contains that file, you have the JDK.

To check whether you have the proper version, enter the following command on any command line (Windows or Linux):
java -version

C:\>java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)
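
The two checks described above can be scripted. The following is a minimal sketch, not part of the cTAKES distribution: the function names is_jdk_home and is_java_16_or_later are hypothetical, and the version check assumes a version string in the form printed by java -version (for example 1.6.0_20).

```shell
#!/bin/sh
# Succeed if the given directory looks like a JDK (it has lib/tools.jar)
# rather than a plain Java Runtime Environment.
# $1 = root directory of the Java install
is_jdk_home() {
    [ -f "$1/lib/tools.jar" ]
}

# Succeed if a version string such as "1.6.0_20" is 1.6 or later.
# Only the first two dot-separated fields are compared.
is_java_16_or_later() {
    major=$(printf '%s' "$1" | cut -d. -f1)
    minor=$(printf '%s' "$1" | cut -d. -f2)
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; }
}

# Example use (the path is whatever your Java install root is):
#   is_jdk_home "$JAVA_HOME" && echo "JDK found"
#   is_java_16_or_later "1.6.0_20" && echo "version is new enough"
```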

2. Some commands and programs may find the Java install you want without help, but it is best to set the JAVA_HOME environment variable. Set the value of JAVA_HOME to the absolute path of the root of the Java install that you want UIMA to use. On Windows, right-click on My Computer > Properties > Advanced tab > Environment Variables button > New button for System variables. Keep clicking OK until you are out of the dialog series. On Linux use the command
export JAVA_HOME=<path>
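
On Linux it can help to confirm that the path is a real Java install before exporting it. This is a small sketch under that assumption; set_java_home is a hypothetical helper name, and the example path in the comment is illustrative only.

```shell
#!/bin/sh
# Export JAVA_HOME only if the directory contains an executable bin/java;
# otherwise print a message and fail.
# $1 = absolute path to the root of the Java install
set_java_home() {
    if [ -x "$1/bin/java" ]; then
        JAVA_HOME=$1
        export JAVA_HOME
    else
        echo "no executable bin/java under $1" >&2
        return 1
    fi
}

# Example (hypothetical path):
#   set_java_home /usr/lib/jvm/jdk1.6.0_20
```

An export made this way lasts only for the current shell session; adding the same line to your shell startup file is one way to make it persistent.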


3. (for developers using Eclipse) Install Eclipse and the UIMA plug-ins. That procedure is not documented here; follow the install instructions provided with UIMA for Eclipse on apache.org, then come back here.

Note

There are UIMA plug-ins that need to be installed. Do not skip the installation of these plug-ins after you install Eclipse.

The instructions above will guide you to the UIMA plug-ins hosted on apache.org. To check whether you have the plug-ins, go to Help > About Eclipse > Installation Details > Plug-ins. The dialog that opens lists the installed plug-ins.


4. (for developers using command line compile) Navigate to the Ant download site on apache.org.


5. (for developers using command line compile) Download Ant 1.7.1 or later and unzip the compressed file you downloaded. We will call the resulting directory <ANT_HOME>. Follow the instructions for installing Ant on apache.org, which include setting the ANT_HOME environment variable and adding Ant to PATH.

Tip

You need to install Ant only if you will compile the source code from a command line instead of using Eclipse.
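
On Linux, the two environment changes for Ant might look like the sketch below. The Ant location shown is a hypothetical example (substitute the directory you unzipped), and path_append is a helper written for this illustration so that running the lines twice does not add the same entry to PATH twice.

```shell
#!/bin/sh
# Append a directory to PATH only if it is not already present.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on PATH, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

# Hypothetical location: substitute the directory where you unzipped Ant.
ANT_HOME="$HOME/apache-ant-1.7.1"
export ANT_HOME

path_append "$ANT_HOME/bin"
export PATH
```

After this, ant -version on the command line should report the installed Ant release if the directory is correct.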


The documents upon which you can run cTAKES will take many forms. An example of doing this is covered in the Testing section.

Install icTAKES

Installation now requires simply a download and unzip. icTAKES is an initiative at the Mayo NLP program to make Mayo cTAKES easy to use for end users.

Because icTAKES is an open source tool, you can get the version currently in development through SVN, but this is not recommended unless you know what you are doing. To get the latest stable release, follow the directions here. The download already includes the source code.

Step

Example

1. Navigate to the source downloads for a released version on SourceForge.net.

icTAKES is about 180MB

2. Download the latest version.

Select the file to download based on your operating system. Windows: download the icTAKES.zip file; Linux: download the icTAKES.tar.gz file. Save the file to a temporary location on your machine.


3. Unzip (extract the contents of) the compressed file you downloaded into the directory that you want to be the icTAKES (cTAKES 1.3.2) install location. For example, Windows: c:\cTAKES-1.3.2; Linux: /usr/bin/cTAKES-1.3.2. We will call this folder <icTAKES_HOME>. You will need to refer to this directory later.
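
From a Linux command line, the extraction could be done as in this sketch. extract_ictakes is a hypothetical helper, not something shipped with icTAKES, and it makes no assumption about whether the archive contains a top-level folder, so check the result and adjust <icTAKES_HOME> accordingly.

```shell
#!/bin/sh
# Extract an icTAKES archive into the chosen install directory.
# $1 = path to the downloaded icTAKES.tar.gz or icTAKES.zip
# $2 = target directory (the intended <icTAKES_HOME>)
extract_ictakes() {
    archive=$1
    target=$2
    mkdir -p "$target" || return 1
    case "$archive" in
        *.tar.gz) tar -xzf "$archive" -C "$target" ;;
        *.zip)    unzip -q "$archive" -d "$target" ;;
        *)        echo "unsupported archive: $archive" >&2; return 1 ;;
    esac
}

# Example (hypothetical download location):
#   extract_ictakes /tmp/icTAKES.tar.gz /usr/bin/cTAKES-1.3.2
```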


4. Set UIMA_HOME. UIMA requires a special environment variable for its commands to run.

Use UIMA_HOME for the name of the variable and the absolute path to the <icTAKES_HOME> directory from the previous step as the value. On Windows, right-click on My Computer > Properties > Advanced tab > Environment Variables button > New button for System variables. Keep clicking OK until you are out of the dialog series. On Linux use the command export UIMA_HOME=<icTAKES_HOME>. For example: export UIMA_HOME=/usr/bin/cTAKES-1.3.2


5. Edit PATH. This will be used for any command line access to binaries. On Windows, right-click on My Computer > Properties > Advanced tab > Environment Variables button. Edit the Path environment variable, adding ;<icTAKES_HOME>\bin to the end. Keep clicking OK until you are out of the dialog series. On Linux use the command export PATH=$PATH:<icTAKES_HOME>/bin. Be sure to substitute the actual <icTAKES_HOME> directory.
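
Taken together, the Linux settings for UIMA_HOME and PATH can be written as the short sketch below. It uses the example install path from the unzip step; ICTAKES_HOME is a convenience variable introduced only for this illustration and is not read by UIMA or cTAKES.

```shell
#!/bin/sh
# Example install location from the unzip step; substitute your own <icTAKES_HOME>.
ICTAKES_HOME=/usr/bin/cTAKES-1.3.2

# UIMA commands look for this variable.
export UIMA_HOME="$ICTAKES_HOME"

# On Linux an existing variable is referenced as $PATH.
export PATH="$PATH:$ICTAKES_HOME/bin"
```

These exports apply to the current shell session only; placing them in a shell startup file is one way to keep them across logins.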


Eclipse

These instructions require the UIMA plug-ins. This was part of the prerequisites at the start of these instructions.

Step

Example

1. (optional) It is recommended that you start a new workspace to keep your cTAKES projects separate from other work.

No example

2. In Eclipse use File > New > Java Project ...

Uncheck Use default location and navigate to <icTAKES_HOME> for the location. Click Next >, then click Finish.


3. Remove unnecessary JAR files.

Go to Project > Properties > Java Build Path > Libraries. Select all the entries except the JRE System Library and click the Remove button.


4. Add icTAKES folders as class resources.

Select Add Class Folder. Check the two class folders cTAKESdesc and resources. Click OK.


5. Add JAR files from <icTAKES_HOME>/lib.

Select Add Library button > User Library list item > Next > button > User Libraries button > New button. Name the new user library ctakes1.3.2lib and click OK. Click the Add JARs... button and navigate to the lib directory inside <icTAKES_HOME>. Select all the JAR files and click Open. Click OK. Click Finish.


6. Close the User Libraries dialog.


7. If you have Eclipse set to build automatically it will do so, and you may continue to run and debug from Eclipse.

You can also set up Eclipse to run Ant builds using the Ant files that are shipped. Build icTAKES in Eclipse using 'ant' or in '<icTAKES_HOME>/lib', type 'ant'.


SVN

If you know the icTAKES code well and need the latest code currently under development, use an SVN connection to retrieve it. The pre-release versions are available from the SVN code repository on SourceForge.net.

If you checked out source files from the SVN repository, you will need to generate the type system. To generate the type system from Eclipse:

  1. Select this file in the Package Explorer: <icTAKES_HOME>/cTAKESdesc/typesysytem/cTAKESTypes.xml
  2. Right click on the file > Open With > Component Descriptor Editor
  3. Click the tab Type System
  4. Click the JCasGen button (in the center)
  5. Click Project > Build unless you have Build automatically already selected in the Projects menu

The UIMA command to generate the type system through the command line (jcasgen) is not shipped with icTAKES at this time. Please use Eclipse for this portion until a future release when this is added to icTAKES.

Command line

To compile icTAKES, change to the <icTAKES_HOME> directory and simply run:
ant
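
A small wrapper can guard against running Ant from the wrong directory. This sketch assumes that the build file in <icTAKES_HOME> uses Ant's default name, build.xml, which the source does not state explicitly; build_ictakes is a hypothetical helper name.

```shell
#!/bin/sh
# Run the shipped Ant build from the icTAKES install directory.
# $1 = <icTAKES_HOME>
# Assumption: the build file there is named build.xml (Ant's default).
build_ictakes() {
    if [ ! -f "$1/build.xml" ]; then
        echo "no build.xml found in $1" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$1" && ant )
}

# Example:
#   build_ictakes "$UIMA_HOME"
```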

Process documents using icTAKES

You can now launch or debug the cTAKES components that you have built. You could run commands from a command prompt, as described in the user install instructions, but you can now launch them from within Eclipse instead. Launching the CAS Visual Debugger (CVD) and the Collection Processing Engine (CPE) from Eclipse takes a single step.

Step

Example

1. In Eclipse, launch the tools using their main program.

Find in the Eclipse project: src > edu.mayo.bmi.ctakes.main > cTAKESCPEGUI.java or cTAKESCVDGUI.java. Then use the Run menu to run or debug as desired. Using the tools does not change from what is documented in the user install instructions.


Next Steps

The cTAKES 1.3 User Guide will help you to understand in great detail each of the cTAKES components that have been installed. In some cases you can learn how to improve the components. However, before you go on to process text in production you will need to consider dictionaries and models.

Dictionaries

Bundled UMLS Dictionaries

cTAKES 1.3 includes the complete UMLS (SNOMED-CT and RxNorm) dictionaries:

  • An rxnorm_index database (a Lucene index) containing drug names from RxNorm
  • A UMLS database (using two hsqldb tables) containing anatomical sites, procedures, signs/symptoms, and disorders/diseases from SNOMED-CT (umls_ms_2011ab)

To use them, you must have a UMLS username and password, and an Internet connection.

Note

If you do not have a UMLS username and password, you may request one at UMLS Terminology Services.

In order to use the complete UMLS dictionaries shipped with cTAKES you will need to do two things:
(1) Update the DictionaryLookupAnnotatorUMLS.xml Analysis Engine file with your UMLS username and password: replace the UMLSUser and UMLSPW <nameValuePair> strings in the descriptor files listed below with your UMLS username and password.

  • Dictionary Lookup: <cTAKES_HOME>/cTAKESdesc/lookupdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
  • (optional) Drug NER: <cTAKES_HOME>/cTAKESdesc/drugnerdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml

Make the changes in the <nameValuePair> entries for UMLSUser and UMLSPW. (Do not change the <configurationParameters> by the same name.)
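
One way to locate the entries to edit is to search the descriptor for the two parameter names. The sketch below only prints matching lines with a little context and does not modify the file; show_umls_settings is a hypothetical helper, and the -A context option is an extension available in GNU and BSD grep rather than part of POSIX.

```shell
#!/bin/sh
# Print the lines (with line numbers) that mention UMLSUser or UMLSPW in a
# descriptor, plus three lines of following context, so the <nameValuePair>
# entries can be told apart from the <configurationParameters> entries.
# $1 = path to a DictionaryLookupAnnotatorUMLS.xml file
show_umls_settings() {
    grep -n -A 3 -E 'UMLSUser|UMLSPW' "$1"
}

# Example:
#   show_umls_settings "$UIMA_HOME/cTAKESdesc/lookupdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml"
```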

(2) Include the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within your aggregate Analysis Engine, or switch to the ones provided by cTAKES. cTAKES provides duplicates of the shipped Analysis Engine descriptors, with UMLS in the name and DictionaryLookupAnnotatorUMLS.xml placed within them, for these components:

  • Dictionary Lookup
  • Clinical Documents pipeline
  • Drug NER
  • Side Effect

So you simply need to switch to using those descriptors. For example, if you were using AggregateCdaProcessor.xml in the Clinical Documents pipeline, you would switch to using AggregateCdaUMLSProcessor.xml instead, and you will then hook into the complete dictionaries.

You can, of course, modify your own aggregate Analysis Engine files and place the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within them.
Since this is an in-memory database implementation, please be patient during the initial load as it could take approximately 20-30 seconds for the database to initialize.

If you would like to go back to using the small sample dictionaries that do not require a UMLS username, use the DictionaryLookupAnnotator.xml (UMLS is not in the file name) Analysis Engine descriptor in your aggregate. Removing your password from the DictionaryLookupAnnotatorUMLS.xml files will not work.

LVG

We have successfully tested the 2008 release of the full LVG data. In order to use this release of the full LVG data you should:

  1. Download either the full version or the lite version from NIH Lexical Tools
  2. Extract the TGZ file that you downloaded with a tool like 7-zip (available online) to a temporary directory. On some operating systems, like Windows, this may need to be done in two steps: 1) decompress the gzip layer and 2) unpack the resulting tar archive.
  3. Replace the directory <icTAKES_HOME>/resources/lvgresources/lvg/data/HSqlDb with data/HSqlDb from your extracted download. Replacing the entire directory is appropriate.
  4. In the future, you can upgrade to later versions of LVG by editing the <icTAKES_HOME>/resources/lvgresources/lvg/data/config/lvg.properties file, replacing "lvg2008" with the name of the new release.
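
On Linux, steps 3 and 4 can be sketched as two small functions. Both names are hypothetical helpers, the first argument of replace_lvg_data is assumed to be the directory that directly contains data/HSqlDb after extraction, and the release name passed to set_lvg_release is whatever the new LVG release is called (lvg2011 in the comment is only an illustrative value).

```shell
#!/bin/sh
# Step 3: replace the bundled LVG HSqlDb data with the extracted download.
# $1 = directory from the extracted download that contains data/HSqlDb
# $2 = <icTAKES_HOME>
# Note: the existing HSqlDb directory is deleted before the copy.
replace_lvg_data() {
    src="$1/data/HSqlDb"
    dest="$2/resources/lvgresources/lvg/data/HSqlDb"
    [ -d "$src" ] || { echo "missing $src" >&2; return 1; }
    rm -rf "$dest" && cp -R "$src" "$dest"
}

# Step 4: rewrite lvg.properties so every "lvg2008" becomes the new release name.
# $1 = <icTAKES_HOME>
# $2 = new release name (for example lvg2011, an illustrative value)
set_lvg_release() {
    props="$1/resources/lvgresources/lvg/data/config/lvg.properties"
    sed "s/lvg2008/$2/g" "$props" > "$props.tmp" && mv "$props.tmp" "$props"
}
```

Keeping a copy of the original HSqlDb directory and lvg.properties before running either function makes it easy to go back to the shipped data.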

Building Your Own Dictionaries

To install customized dictionaries for RxNorm, SNOMED-CT, or other vocabularies that are available through the UMLS, see the following posts on the cTAKES forums:

Models

Some models included in cTAKES may not represent your data distribution well. If you want to build or train your own models, please read the cTAKES 1.3 User Guide, particularly:

  • Training a sentence detector model
  • Building a Part of Speech (POS) tagger model (Building a model, Obtaining training data)
  • Building a Part of Speech (POS) tag dictionary (Building a tag dictionary)
  • Building a chunker model (Building a model, Prepare GENIA training data)
  • Training a dependency parser (Dependency Parser (optional))