NIH | National Cancer Institute | NCI Wiki

These are instructions for installation of cTAKES 2.5 for developers. NOTE: For the latest version of cTAKES, see Apache cTAKES (incubating).

With these instructions you can set up your development environment with cTAKES code, change or extend the code, compile the code, and deploy. If you simply want to be a user of the software, refer to the cTAKES 2.5 User Install Instructions.

These install instructions do not explain what the cTAKES components do; that is covered in the cTAKES 2.5 Component Use Guide. Apart from code comments, there is no training or documentation for the code itself. To extend the code, first become familiar with the components and then study the code on your own.

To modify the source code for a cTAKES component, use an IDE such as Eclipse or another editor of your choice. You then compile either in the IDE or with Ant. Follow the sections below that match your preferences.

Once you have compiled the code, you can process documents with the cTAKES components. The documents you run cTAKES on can take many forms; an example is covered in the Process documents using cTAKES section.


Preparing Java



1. All forms of development require the Java SDK 1.6+. You can get it from

Make sure you get the proper version, and install the SDK, not just the Java Runtime Environment (JRE).

To check whether you have the SDK, look in the lib directory of the Java install for the file tools.jar. If the lib directory exists and contains tools.jar, you have the SDK.
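On Linux, this check can be scripted. The following is a minimal sketch; jdk_has_tools_jar is a hypothetical helper name, not part of cTAKES or the JDK. The check only fits JDK 8 and earlier, because tools.jar was removed from the JDK in Java 9.

```bash
#!/bin/bash
# Hypothetical helper: succeeds if the given Java install directory looks like
# a JDK (pre-Java 9 layout), i.e. it contains lib/tools.jar.
jdk_has_tools_jar() {
  local java_dir="$1"
  if [ -f "$java_dir/lib/tools.jar" ]; then
    echo "JDK found: $java_dir/lib/tools.jar exists"
    return 0
  fi
  echo "No lib/tools.jar under $java_dir; this looks like a JRE, not a JDK" >&2
  return 1
}
```

For example, jdk_has_tools_jar "$JAVA_HOME" checks the installation JAVA_HOME points at.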

To check which version you have, enter this command on any command line (Windows and Linux):

java -version

C:\>java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)
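The version check can also be scripted on Linux. Note that java -version writes to standard error, not standard output. The sketch below (java_major_version and java_is_at_least_6 are hypothetical helper names) handles both the old "1.6.0_20" numbering and the newer "11.0.2" style.

```bash
#!/bin/bash
# Hypothetical helper: prints the Java major version found in `java -version`
# output, e.g. 6 for "1.6.0_20" (old 1.x scheme) or 11 for "11.0.2".
java_major_version() {
  local ver first rest
  # Pull the quoted version string out of the first line that has one.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  [ -n "$ver" ] || return 1
  first=${ver%%.*}
  if [ "$first" = "1" ]; then
    # Old scheme: 1.<major>.<minor>_<update>
    rest=${ver#1.}
    echo "${rest%%.*}"
  else
    # Java 9+ scheme: <major>.<minor>.<patch>, possibly with a suffix like "-ea"
    echo "${first%%[!0-9]*}"
  fi
}

# Succeeds if the `java` found on PATH is 1.6 (major 6) or later.
java_is_at_least_6() {
  local major
  major=$(java_major_version "$(java -version 2>&1)") || return 1
  [ "$major" -ge 6 ]
}
```

Running java_is_at_least_6 && echo "Java version OK" gives a quick pass/fail.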

2. Some commands and programs can find the Java installation you want to use on their own, but it is best to set the JAVA_HOME environment variable. Set JAVA_HOME to the absolute path of the root directory of the Java installation (the SDK) that you want UIMA to use.
Windows: Right-click on Computer > Properties > Advanced tab > Environment Variables button > New button for System variables. Keep clicking OK until you are out of the dialog series.
Linux:

export JAVA_HOME=<path>

screenshot illustrating step
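On Linux, an export typed at the prompt lasts only for that shell session; adding the same line to a startup file such as ~/.bashrc makes it persistent. The sketch below (check_java_home is a hypothetical helper name) confirms that JAVA_HOME points at an SDK by looking for bin/java and bin/javac.

```bash
#!/bin/bash
# Hypothetical helper: checks that JAVA_HOME is set and points at a Java
# installation with executable bin/java and bin/javac (i.e. an SDK, not a JRE).
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "No executable $JAVA_HOME/bin/java" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/javac" ]; then
    echo "No $JAVA_HOME/bin/javac: JAVA_HOME points at a JRE, not an SDK" >&2
    return 1
  fi
  echo "JAVA_HOME=$JAVA_HOME looks like an SDK"
}
```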

Preparing Eclipse

If you are going to use Eclipse for development then follow these instructions.



1. Download and install Eclipse if you don't already have it.

(optional) It is recommended that you start a new workspace to keep your cTAKES projects separate from other work.

No example

2. Find UIMA Eclipse plug-ins.

Help > Install New Software... > Add... button
Set a repository name and this site location

Click OK.

screenshot illustrating step

3. Install UIMA Eclipse plug-ins.

Select the UIMA Eclipse tooling and runtime support.
Click Next > and finish the install.

Additional help can be found on the Apache site UIMA for Eclipse on


These install instructions depend upon the installation of these plug-ins.

(optional) Verify the installation of the UIMA Plug-ins. Go to Help > About Eclipse > Installation Details > Plug-ins. You will see a dialog such as that on the right (next cell) with plug-in names starting with "UIMA Eclipse:".

Preparing Command Line Tools

If you will compile from the command line only, you need these tools.



1. Navigate to the Ant download site on and install Ant 1.7.1+

screenshot illustrating step

2. Download Ant 1.7.1+ and unzip the file to a local directory. We will call this directory <ANT_HOME>. Follow the instructions for installing Ant on
This includes setting the ANT_HOME environment variable and adding <ANT_HOME>/bin to PATH.

screenshot illustrating step

screenshot illustrating step
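After installing Ant, ant -version prints the installed version. The sketch below is one way to check it against the 1.7.1 minimum on Linux; ant_version and version_ge are hypothetical helper names, and the parsing assumes output of the form "Apache Ant(TM) version 1.8.2 compiled on ...", which is an assumption about the exact wording.

```bash
#!/bin/bash
# Hypothetical helper: extracts the version number from `ant -version` output,
# e.g. "Apache Ant(TM) version 1.8.2 compiled on ..." -> 1.8.2
ant_version() {
  printf '%s\n' "$1" | sed -n 's/.*[Vv]ersion \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeeds if dotted version $1 >= dotted version $2, comparing each
# component numerically (so 1.10.0 counts as newer than 1.7.1).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing components count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}
```

For example: version_ge "$(ant_version "$(ant -version 2>&1)")" 1.7.1 && echo "Ant is new enough".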

Compile the latest stable release in Eclipse

A tested and stable release is delivered as a ZIP file. The file includes the cTAKES source code as well as UIMA.



1. Navigate to the source downloads for a released version on

NOTE: For the latest version of cTAKES, see Apache cTAKES (incubating) and follow the install instructions there.

Even without the full LVG data, cTAKES is about 790 MB when compressed.

2. Download the latest version.

Select the directory for the latest version and download the ZIP file.

screenshot illustrating step

3. Unzip the file you downloaded into the directory that you want to be the cTAKES install location. It will expand to about 1.4 GB.

We will call this folder <cTAKES_HOME>. You will need to refer to this directory later.

screenshot illustrating step

4. Set UIMA_HOME. UIMA requires a special environment variable for its commands to run.

Use the absolute path to the <cTAKES_HOME> directory in the previous step as the value.

Windows: Right-click on Computer > Properties > Advanced > Environment Variables button > New button for System variables. Keep clicking OK until you are out of the dialog series.
Linux: Use the command export UIMA_HOME=<cTAKES_HOME>, for example:

export UIMA_HOME=/usr/bin/cTAKES-2.5

screenshot illustrating step
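A quick way to catch a mistyped UIMA_HOME on Linux is to check for the directories these instructions use later (bin, lib, cTAKESdesc and resources). This is a sketch; check_uima_home is a hypothetical helper name, and the directory list is taken from the steps on this page rather than from a cTAKES specification.

```bash
#!/bin/bash
# Hypothetical helper: checks that UIMA_HOME is set and points at an unpacked
# cTAKES 2.5 directory containing bin, lib, cTAKESdesc and resources.
check_uima_home() {
  local d missing=0
  if [ -z "${UIMA_HOME:-}" ]; then
    echo "UIMA_HOME is not set" >&2
    return 1
  fi
  for d in bin lib cTAKESdesc resources; do
    if [ ! -d "$UIMA_HOME/$d" ]; then
      echo "Missing directory: $UIMA_HOME/$d" >&2
      missing=1
    fi
  done
  return "$missing"
}
```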

5. Edit PATH. This will be used for any command line access to binaries.
Windows: Right-click on Computer > Properties > Advanced > Environment Variables button. Edit the Path environment variable, adding ;<cTAKES_HOME>\bin to the end. Keep clicking OK until you are out of the dialog series.
Linux: Use the command export PATH=$PATH:<cTAKES_HOME>/bin, for example:

export PATH=$PATH:/usr/bin/cTAKES-2.5/bin

screenshot illustrating step
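If the PATH change goes into a shell startup file such as ~/.bashrc, appending unconditionally adds a duplicate entry every time the file is re-read. A minimal sketch of an idempotent append (path_append is a hypothetical helper name):

```bash
#!/bin/bash
# Hypothetical helper: appends a directory to PATH only if it is not already
# listed, so re-running a setup script does not keep growing PATH.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                      # already on PATH: nothing to do
    *) PATH="${PATH:+$PATH:}$1" ;;    # append, avoiding a leading ':' if PATH was empty
  esac
  export PATH
}
```

For example, path_append "$UIMA_HOME/bin" can be run any number of times with the same result.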

6. In Eclipse use File > New > Java Project ...

Enter a project name, clear the Use default location check box, and browse to <cTAKES_HOME> for the location. Click Next >, then click Finish.

screenshot illustrating step

7. Remove unnecessary JAR files.

Go to Project > Properties > Java Build Path > Libraries. Select all the entries except the JRE System Library and click the Remove button.

screenshot illustrating step

8. Add cTAKES folders as class resources.

Select Add Class Folder. Check the two class folders cTAKESdesc and resources. Click OK.

screenshot illustrating step

9. Add JAR files from <cTAKES_HOME>/lib.

Click the Add Library button > User Library list item > Next > button > User Libraries button > New button. Name the new user library ctakes2.5lib and click OK. Click the Add JARs... button and navigate to <cTAKES_HOME>\lib. Select all the JAR files and click Open. Click OK, then click Finish.

screenshot illustrating step

10. Close the User Libraries dialog.

Click OK. Your Package Explorer should look something like this (next cell).

screenshot illustrating step

screenshot illustrating step

11. If Eclipse is set to build automatically, it will build the project, and you can then run and debug from Eclipse.

You can also set up Eclipse to run Ant builds using the shipped Ant files, as seen on the right (next cell).

screenshot illustrating step

Compile with commands only

The UIMA command-line tool for generating the type system (JCasGen) is not currently shipped with cTAKES. The common type system has already been generated for you. If you do need to regenerate it, use the method described for Eclipse or install the full UIMA SDK.



1. Follow the first steps of the "Compile the latest stable release in Eclipse" section, which do not require Eclipse, up to the step that creates a new project in Eclipse.

No example

2. Obtain the relevant build.xml file from SVN, placing it into <cTAKES_HOME>

It can be found at
Be aware that there are multiple build.xml files in SVN; be sure to use the one listed above.

No example

3. To compile cTAKES, change to the <cTAKES_HOME> directory and simply run:


No example
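As a general Ant fact, running ant with no target in the directory that holds build.xml runs the project's default target, so it can help to know which target that is. The sketch below reads the default attribute of the <project> element; ant_default_target is a hypothetical helper, and it assumes the <project ...> start tag sits on a single line.

```bash
#!/bin/bash
# Hypothetical helper: prints the default target declared on the <project>
# element of an Ant build file (the target `ant` runs when none is given).
# Assumes the whole <project ...> start tag is on one line.
ant_default_target() {
  sed -n 's/.*<project[^>]*[[:space:]]default="\([^"]*\)".*/\1/p' "$1" | head -n 1
}
```

From <cTAKES_HOME>, ant_default_target build.xml prints that target; ant -projecthelp (short form ant -p) lists the targets the build file describes.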

Compile a development release from SVN in Eclipse

If you are familiar with the cTAKES code and need the latest code under development (potentially unstable), use an SVN client to retrieve the code.



1. Install subversion or a suitable plug-in for your IDE.

No example

2. Check out the code to a local directory, such as:


We will call this folder <cTAKES_HOME>. You will need to refer to this directory later.

Pre-release versions are available from:

svn checkout

If you are checking out via Subclipse in Eclipse, make sure to check out each project separately.

A cTAKES\PAD term spotter\desc\type_system
A cTAKES\PAD term spotter\desc\type_system\PADSiteAndTerm.xml
A cTAKES\PAD term spotter\desc\analysis_engine
A cTAKES\PAD term spotter\desc\analysis_engine\Radiology_TermSpotterAnnotatorTAEStyleMap.xml
A cTAKES\PAD term spotter\desc\analysis_engine\Radiology_TermSpotterAnnotatorTAE.xml
A cTAKES\PAD term spotter\desc\analysis_engine\DxStatusAnnotator.xml
A cTAKES\PAD term spotter\desc\analysis_engine\NegationDxAnnotator.xml
A cTAKES\PAD term spotter\desc\analysis_engine\PAD_Hit.xml
A cTAKES\PAD term spotter\desc\analysis_engine\SubSectionBoundaryAnnotator.xml
A cTAKES\PAD term spotter\desc\analysis_engine\Radiology_DictionaryLookupCSVAnnotator.xml
A cTAKES\PAD term spotter\desc\collection_processing_engine
A cTAKES\PAD term spotter\desc\collection_processing_engine\Radiology_sample.xml
A cTAKES\PAD term spotter\desc\collection_reader
A cTAKES\PAD term spotter\desc\collection_reader\RadiologyRecordsCollectionReader.xml
A cTAKES\PAD term spotter\.settings
A cTAKES\PAD term spotter\.settings\org.eclipse.jdt.ui.prefs
Checked out revision 667.

3. In Eclipse use File > Import... > General > Existing Projects into Workspace

Use your local directory for the root directory. Leave all of the projects selected and click Finish.

You may need to import each project individually.

screenshot illustrating step

4. Install UIMA 2.4. Make note of UIMA_HOME.

No example

5. Add UIMA JARs to the build path configuration in Eclipse. Add all JARs from:

  • jVinci.jar
  • uima-adapter-soap.jar
  • uima-adapter-vinci.jar
  • uima-core.jar
  • uima-cpe.jar
  • uima-documentation-annotation.jar
  • uima-examples.jar
  • uima-bootstrap.jar
  • uima-tools.jar
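The directory to add the JARs from is not shown above. Assuming it is the lib directory of your UIMA 2.4 install ($UIMA_HOME/lib, an assumption), the sketch below reports which of the listed JARs are missing before you add them in Eclipse; check_uima_jars is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: reports which of the UIMA JARs listed on this page are
# missing from a directory (assumed here to be your UIMA install's lib directory).
check_uima_jars() {
  local lib_dir="$1" jar missing=0
  for jar in jVinci.jar uima-adapter-soap.jar uima-adapter-vinci.jar \
             uima-core.jar uima-cpe.jar uima-documentation-annotation.jar \
             uima-examples.jar uima-bootstrap.jar uima-tools.jar; do
    if [ ! -f "$lib_dir/$jar" ]; then
      echo "Missing: $lib_dir/$jar" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_uima_jars "$UIMA_HOME/lib".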

6. Generate the common type system.

A bug in the ordering and generation currently requires that you go to your project properties > Java Build Path > Order and Export and move the auto_generatesrc directory to the top of the list.

Right-click on /cTAKESdesc/typesystem/common_type_system.xml and select Open With > Component Descriptor Editor.
Click the Type System tab then the JCasGen button (in the center).

screenshot illustrating step

Process documents using cTAKES

You can now launch or debug the cTAKES components that you have built. Instead of running commands from a command prompt, as described in the user install instructions, you can now launch them from within Eclipse. Launching the CAS Visual Debugger (CVD) and the Collection Processing Engine (CPE) from Eclipse takes only this step.



1. In Eclipse, launch the tools using their main program.

Find in the Eclipse project: src -> edu.mayo.bmi.ctakes.main > or Then use the Run menu to run or debug as desired. Using the tools does not change from what is documented in the user install instructions.

screenshot illustrating step

Next Steps

The cTAKES 2.5 Component Use Guide explains each of the installed cTAKES components in detail. In some cases it also shows how to improve the components. However, before you process text in production, you will need to consider dictionaries and models.


Bundled UMLS Dictionaries

cTAKES includes the complete UMLS (SNOMED-CT and RxNorm) dictionaries.

  • An rxnorm_index database (a Lucene index) containing drug names from RxNorm
  • A UMLS database (using two hsqldb tables) containing anatomical sites, procedures, signs/symptoms, and disorders/diseases from SNOMED-CT (umls_ms_2011ab)

To use them, you must have a UMLS username and password, and an Internet connection.


If you do not have a UMLS username and password, you may request one at UMLS Terminology Services

In order to use the UMLS dictionaries shipped with cTAKES you will need to do two things:

(1) In these descriptor files, replace the UMLSUser and UMLSPW <nameValuePair> strings with your UMLS username and password.

  • Dictionary Lookup: <cTAKES_HOME>/cTAKESdesc/lookupdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
  • (optional) Drug NER: <cTAKES_HOME>/cTAKESdesc/drugnerdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml

The following shows where in the files you would make the changes. (Do not change the <configurationParameters> by the same name.)
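The values can also be changed from a script. This is a minimal sketch, assuming the descriptor follows the usual UIMA layout (a <nameValuePair> holding <name> and <value><string>) with one XML element per line; set_name_value_pair is a hypothetical helper name. It only edits values inside <nameValuePair> elements, so the configuration parameter declarations of the same name are left alone.

```bash
#!/bin/bash
# Hypothetical helper: sets the <string> value of the named <nameValuePair> in
# a UIMA descriptor. Assumes one XML element per line. Keeps FILE.bak
# (which still holds the old value).
# Usage: set_name_value_pair FILE NAME VALUE
set_name_value_pair() {
  local file="$1"
  cp "$file" "$file.bak" || return 1
  NVP_NAME="$2" NVP_VALUE="$3" awk '
    BEGIN {
      name = ENVIRON["NVP_NAME"]; val = ENVIRON["NVP_VALUE"]
      # Escape XML special characters in the new value ("\\&" is a literal &).
      gsub(/&/, "\\&amp;", val); gsub(/</, "\\&lt;", val); gsub(/>/, "\\&gt;", val)
    }
    /<nameValuePair>/ { in_pair = 1; is_target = 0 }
    in_pair && index($0, "<name>" name "</name>") { is_target = 1 }
    in_pair && is_target && index($0, "<string>") {
      # Replace the text between <string> and </string> on this line.
      s = index($0, "<string>"); e = index($0, "</string>")
      if (e > s) $0 = substr($0, 1, s + 7) val substr($0, e)
    }
    /<\/nameValuePair>/ { in_pair = 0; is_target = 0 }
    { print }
  ' "$file.bak" > "$file"
}
```

For example: set_name_value_pair <cTAKES_HOME>/cTAKESdesc/lookupdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml UMLSUser yourname, then the same for UMLSPW.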


(2) Include the DictionaryLookupAnnotatorUMLS.xml Analysis Engine in your aggregate Analysis Engine, or switch to the aggregates provided by cTAKES. cTAKES provides duplicates of the shipped Analysis Engine descriptors, with UMLS in the name and DictionaryLookupAnnotatorUMLS.xml included, for these components:

  • Dictionary Lookup
  • Clinical Documents pipeline
  • Drug NER
  • Side Effect

So you simply need to switch to those descriptors. For example, if you were using AggregateCdaProcessor.xml in the Clinical Documents pipeline, switch to AggregateCdaUMLSProcessor.xml instead to use the complete dictionaries.

You can, of course, modify your own aggregate Analysis Engine files to include the DictionaryLookupAnnotatorUMLS.xml Analysis Engine.

Because the UMLS database is an in-memory implementation, please be patient during the initial load: the database can take approximately 20-30 seconds to initialize.

If you want to go back to the small sample dictionaries that do not require a UMLS username, use the DictionaryLookupAnnotator.xml Analysis Engine descriptor (without UMLS in the file name) in your aggregate. Removing your password from the DictionaryLookupAnnotatorUMLS.xml files does not switch you back to the small sample dictionaries.
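To see which dictionaries an aggregate descriptor is wired to, you can search it for the lookup descriptor names. This is a rough sketch (lookup_dictionary_kind is a hypothetical helper name): it is a plain text search, so it does not follow an aggregate that imports another aggregate.

```bash
#!/bin/bash
# Hypothetical helper: reports whether an aggregate descriptor refers to the
# UMLS lookup descriptor ("umls"), the sample one ("sample"), or neither ("none").
# The UMLS name is tested first because it also contains the sample name.
lookup_dictionary_kind() {
  if grep -q 'DictionaryLookupAnnotatorUMLS' "$1"; then
    echo "umls"
  elif grep -q 'DictionaryLookupAnnotator' "$1"; then
    echo "sample"
  else
    echo "none"
  fi
}
```

For example, running it on AggregateCdaUMLSProcessor.xml should print umls if that aggregate imports the UMLS lookup directly.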


We have successfully tested the 2008 release of the full LVG data. In order to use this release of the full LVG data you should:

  1. Download either the full version or the lite version from NIH Lexical Tools
  2. Extract the TGZ file that you downloaded to a temporary directory with a tool such as 7-Zip (available online). On some operating systems, such as Windows, this may take two steps: 1) decompress the gzip file and 2) extract the resulting TAR archive.
  3. Replace the directory <cTAKES_HOME>/resources/lvgresources/lvg/data/HSqlDb with data/HSqlDb from your extracted download. Replacing the entire directory is appropriate.
  4. In the future, you can upgrade to later versions of LVG by editing the <cTAKES_HOME>/resources/lvgresources/lvg/data/config/ file, replacing "lvg2008" with the name of the new release.
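On Linux, tar -xzf decompresses and extracts a .tgz file in one step. Steps 3 and 4 can then be scripted roughly as below. Both function names are hypothetical; the old HSqlDb directory is kept as HSqlDb.orig instead of being deleted; and because the config file name is not given above, set_lvg_release takes its path as an argument. It also assumes the release names contain no characters special to sed, such as /.

```bash
#!/bin/bash
# Hypothetical helper for step 3. Arguments:
#   $1  <cTAKES_HOME>
#   $2  directory where the LVG download was extracted (contains data/HSqlDb)
replace_lvg_hsqldb() {
  local target="$1/resources/lvgresources/lvg/data/HSqlDb"
  local src_dir="$2/data/HSqlDb"
  [ -d "$src_dir" ] || { echo "Not found: $src_dir" >&2; return 1; }
  if [ -e "$target" ]; then
    rm -rf "$target.orig"                 # drop any backup from an earlier run
    mv "$target" "$target.orig" || return 1
  fi
  cp -R "$src_dir" "$target"
}

# Hypothetical helper for step 4: replaces every occurrence of one LVG release
# name with another (e.g. lvg2008 -> a newer name) in a config file; keeps .bak.
set_lvg_release() {
  local config="$1" old="$2" new="$3"
  sed "s/$old/$new/g" "$config" > "$config.tmp" &&
    cp "$config" "$config.bak" &&
    mv "$config.tmp" "$config"
}
```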

Building Your Own Dictionaries

To install customized dictionaries for RxNorm, SNOMED-CT, or other vocabularies that are available through the UMLS, see the following posts on the cTAKES forums:


Some models included in cTAKES may not represent your data distribution well. If you want to build or train your own models, please read the cTAKES 2.5 Component Use Guide, particularly:

  • Training a sentence detector model
  • Training a Part of Speech (POS) tagger model (Building a model Obtaining training data)
  • Creating a Part of Speech (POS) tag dictionary (Building a tag dictionary)
  • Training a chunker model (Building a model - Prepare GENIA training data)
  • Training a dependency parser (Dependency Parser)