National Cancer Institute U.S. National Institutes of Health www.cancer.gov
NCI Wiki

cTAKES 1.2.2 Developer Install Instructions


These installation instructions for icTAKES 1.2.2 are for developers. With them you can set up your development environment with the cTAKES code, then change or extend the code, compile it, and deploy it. If you only want to use the software, go back to the overview and select the appropriate User installation instructions.

Once you have completed this installation, you will have all the source code and be able to compile and deploy it as needed. These instructions do not explain what the components do; that is covered in the User Guide. Apart from code comments, there is no training or documentation on the code itself, so to extend it you must first become familiar with the components and then study the code on your own.

To modify the source code for a cTAKES component, you must first download the code. You can then modify it in an IDE such as Eclipse or in another editor of your choice, and compile it either in Eclipse or with Ant from the command line.

Refer to the sections of this guide that match your development preferences (Eclipse or command line).

Prerequisites

In order to complete these instructions you will need the following:

  • Sun's distribution of the Java JDK version 1.6+
  • Ant 1.7.1+ (if you want to compile via the command line)

Step

Example

1. Install the JDK (not the runtime environment) of Java 1.6+.

This software can be downloaded from the Java site.

You need two things: the proper version (1.6 or later) and the full JDK, not just the Java Runtime Environment (JRE).

To check whether you have the JDK, look in the lib directory of your Java installation for the file tools.jar. If the lib directory exists and contains tools.jar, you have the JDK.

To check the version, enter the following command at any command prompt (Windows and Linux):
java -version
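On Linux, both checks can be scripted. The following is a minimal sketch and is not part of icTAKES. The helper name is_jdk is hypothetical, and the sketch assumes JAVA_HOME, which is set in the next step, points at the root of the Java installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of icTAKES): succeed if the given Java
# installation root contains lib/tools.jar, which indicates a JDK.
is_jdk() {
  [ -f "$1/lib/tools.jar" ]
}

# Check the installation that JAVA_HOME points at, if it is set.
if [ -n "${JAVA_HOME:-}" ]; then
  if is_jdk "$JAVA_HOME"; then
    echo "JDK found at $JAVA_HOME"
  else
    echo "No lib/tools.jar under $JAVA_HOME: this looks like a JRE, not a JDK"
  fi
fi

# Show the version of the java found first on the PATH (1.6 or later is required).
# java -version writes to standard error, so merge it into standard output.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1
fi
```

The tools.jar test applies to the Java 6 JDKs this page targets. JDK 9 and later no longer include tools.jar, so the check does not apply to them.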

2. Some commands and programs may find the Java installation you want on their own, but it is best to set the JAVA_HOME environment variable. Set JAVA_HOME to the absolute path of the root directory of the JDK installed in step 1. This is the Java installation that UIMA will use.

Windows
On Windows, right-click on My Computer > Properties > Advanced tab > Environment Variables button > New button under System variables. Enter JAVA_HOME as the variable name and the JDK path as the value, then keep clicking OK until you have closed all the dialogs.

Linux
On Linux use the command
export JAVA_HOME=<path>

screenshot to illustrate step
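If you are not sure what <path> should be, you can derive it from the location of the javac compiler, which only a JDK provides. The following is a minimal sketch. It assumes javac is on the PATH and that GNU readlink is available to resolve symbolic links. The helper name jdk_root_from_javac is hypothetical.

```shell
# Hypothetical helper: given the path of a javac binary (<jdk>/bin/javac),
# print the JDK root directory two levels up.
jdk_root_from_javac() {
  dirname "$(dirname "$1")"
}

# If javac is on the PATH, resolve any symlinks (readlink -f is GNU coreutils)
# and export JAVA_HOME for the current shell session only.
if command -v javac >/dev/null 2>&1; then
  JAVA_HOME=$(jdk_root_from_javac "$(readlink -f "$(command -v javac)")")
  export JAVA_HOME
  echo "JAVA_HOME=$JAVA_HOME"
fi
```

The export lasts only for the current shell session. To make it permanent, add the export line to your shell startup file, for example ~/.bashrc for bash.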

3. (for developers using Eclipse) Install Eclipse and the UIMA plug-ins. Those instructions are not included here. Follow the Eclipse installation instructions provided with UIMA, then return to this page.

Note

The UIMA plug-ins must be installed, so do not skip them after you install Eclipse. The UIMA instructions referred to in the previous step guide you through installing the UIMA plug-ins hosted on the Apache site. To check whether you have the plug-ins, go to Help > About Eclipse > Installation Details > Plug-ins. You should see a dialog similar to the one in the example screenshot.

screenshot to illustrate step

4. (for developers using command line compile) Navigate to the Ant download site.

screenshot to illustrate step

5. (for developers using command line compile) Download Ant 1.7.1 or later.

Unzip the compressed file you downloaded. This guide calls the resulting Ant directory <ANT_HOME>. Follow the instructions for installing Ant, which include setting the ANT_HOME environment variable and adding <ANT_HOME>/bin to PATH.

Tip

You need to install Ant only if you will compile the source code from the command line rather than in Eclipse.

screenshot to illustrate step
screenshot to illustrate step
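On Linux, the Ant environment setup usually takes two exports. The following is a minimal sketch. It assumes Ant was unpacked to /opt/apache-ant-1.7.1, which is a hypothetical location; substitute your own <ANT_HOME>.

```shell
# Hypothetical location where the Ant archive was unpacked; change to match yours.
ANT_HOME=/opt/apache-ant-1.7.1
export ANT_HOME

# Append Ant's bin directory to the existing PATH for this shell session.
PATH="$PATH:$ANT_HOME/bin"
export PATH

# If the ant script really exists there, print its version as a sanity check.
if [ -x "$ANT_HOME/bin/ant" ]; then
  ant -version
fi
```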

Additional Information

cTAKES can be run on documents in many forms. The Testing section covers an example of running it on documents.

Install cTAKES 1.2.2

Installation is now simply a download and unzip. icTAKES is an initiative of the Mayo Clinic NLP Program to make cTAKES easy for end users to use.

Because icTAKES is open source, you can also get the version currently in development through SVN. This is not recommended unless you are an experienced developer. To get the latest stable release, follow the directions below. The icTAKES download already includes the source code.

Step

Example

1. Navigate to the source downloads for the released version on SourceForge.

The icTAKES download is about 180 MB.

2. Download the latest version. Select the file to download based on your operating system:

Windows: download the icTAKES.zip file.
Linux: download the icTAKES.tar.gz file.

Save the file to a temporary location on your machine.

screenshot to illustrate example

3. Unzip (extract the contents of) the compressed file you downloaded into a directory that you want to be the icTAKES (aka cTAKES1.2.2) install location.
For example:
Windows: c:\cTAKES1.2.2
Linux: /usr/bin/cTAKES1.2.2

Note

Extracting the archive creates a top-level icTAKES directory inside the folder you selected. This guide calls that icTAKES folder <icTAKES_HOME>, and you will need to refer to it later. For example:
Windows: c:\cTAKES1.2.2\icTAKES
Linux: /usr/bin/cTAKES1.2.2/icTAKES

screenshot to illustrate step
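On Linux, you can extract the archive from a shell. The following sketch is illustrative only. install_ictakes is a hypothetical helper, and it assumes the archive unpacks to a single top-level icTAKES directory, as the note above describes.

```shell
# Hypothetical helper: extract an icTAKES .tar.gz archive into an install
# directory and print the resulting <icTAKES_HOME> path.
# Assumes the archive contains one top-level icTAKES directory.
install_ictakes() {
  archive=$1
  dest=$2
  mkdir -p "$dest" || return 1
  tar -xzf "$archive" -C "$dest" || return 1
  if [ -d "$dest/icTAKES" ]; then
    echo "$dest/icTAKES"
  else
    echo "no icTAKES directory found in $dest" >&2
    return 1
  fi
}

# Example call (writing under /usr/bin usually requires root):
#   ICTAKES_HOME=$(install_ictakes /tmp/icTAKES.tar.gz /usr/bin/cTAKES1.2.2)
```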

4. Set UIMA_HOME. UIMA requires a special environment variable for its commands to run.

Use UIMA_HOME for the name of the variable and the absolute path to the <icTAKES_HOME> directory in the previous step as the value.

Windows
On Windows, right-click on My Computer > Properties > Advanced tab > Environment Variables button > New button under System variables. Keep clicking OK until you have closed all the dialogs.

Linux
On Linux use the command
export UIMA_HOME=<icTAKES_HOME>
For example:
export UIMA_HOME=/usr/bin/cTAKES1.2.2/icTAKES

screenshot to illustrate step

Note

Note the underscore in the variable name. Neither the variable name nor the path it contains can include spaces.
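The following sketch shows a quick way to check the value on Linux. validate_uima_home is a hypothetical helper, and it tests only the conditions stated in this step: the path contains no spaces and names an existing directory.

```shell
# Hypothetical helper: fail if the candidate UIMA_HOME contains a space
# or is not an existing directory.
validate_uima_home() {
  case "$1" in
    *" "*) echo "UIMA_HOME must not contain spaces: $1" >&2; return 1 ;;
  esac
  if [ ! -d "$1" ]; then
    echo "UIMA_HOME is not a directory: $1" >&2
    return 1
  fi
  return 0
}

# Check the current value, if one has been set.
if [ -n "${UIMA_HOME:-}" ]; then
  validate_uima_home "$UIMA_HOME" && echo "UIMA_HOME looks usable: $UIMA_HOME"
fi
```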

5. Edit PATH. This will be used for any command line access to binaries.

Add the icTAKES bin directory to the PATH environment variable: %UIMA_HOME%\bin on Windows, $UIMA_HOME/bin on Linux.

Windows
On Windows, right-click on My Computer > Properties > Advanced tab > Environment Variables button. Edit the Path environment variable by adding ;%UIMA_HOME%\bin to the end. Keep clicking OK until you have closed all the dialogs.

Linux
On Linux use the command
export PATH=$PATH:$UIMA_HOME/bin

screenshot to illustrate step
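If the Linux export command runs more than once, for example from a startup file that is sourced repeatedly, it appends the same directory each time. The following is a minimal sketch of a guard against that, using a hypothetical helper named path_append.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                       # already present: leave PATH unchanged
    *) PATH="${PATH:+$PATH:}$1" ;;     # append, avoiding a leading ':' if PATH is empty
  esac
  export PATH
}

# Add the UIMA scripts directory once UIMA_HOME has been set.
if [ -n "${UIMA_HOME:-}" ]; then
  path_append "$UIMA_HOME/bin"
fi
```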

Eclipse

These instructions require the UIMA plug-ins, which were installed as part of the prerequisites at the start of these instructions.

Step

Example

1. In Eclipse use File > New > Java Project ...

Uncheck Use default location and navigate to <icTAKES_HOME>. Click Next > Finish.

screenshot to illustrate step
screenshot to illustrate step

2. Remove unnecessary JAR files.

Go to Project > Properties > Java Build Path > Libraries tab. Select all the entries except the JRE System Library and click the Remove button.

screenshot to illustrate step

3. Add icTAKES folders as class resources.

Click the Add Class Folder... button. Check the two class folders cTAKESdesc and resources, and click OK.

screenshot to illustrate step

4. Add JAR files from <icTAKES_HOME>/lib.

Click the Add Library... button, select User Library in the list, and click Next. Click the User Libraries... button, then the New... button. Name the new user library ctakes1.2.2lib and click OK.

Click the Add JARs... button and navigate to the lib directory inside <icTAKES_HOME>.

Select all the JAR files and click Open. Click OK. Click Finish.

screenshot to illustrate step

5. Close the User Libraries dialog.

screenshot to illustrate step
screenshot to illustrate step

6. Build icTAKES, either by running its Ant build from within Eclipse or by changing to <icTAKES_HOME> at a command line and typing ant (see Command Line below).

screenshot to illustrate step

SVN

If you are experienced with the icTAKES code and need the latest code currently under development, you must retrieve it over an SVN connection. Pre-release versions are available from the SVN code repository.

If you checked out source files from the SVN repository, you will need to generate the type system. To generate the type system from Eclipse:

  1. Select this file in the Package Explorer: <icTAKES_HOME>/cTAKESdesc/typesystem/cTAKESTypes.xml
  2. Right click on the file > Open With > Component Descriptor Editor.
  3. Click the tab Type System.
  4. Click the JCasGen button (in the center).
  5. Click Project > Build Project, unless Build Automatically is already selected in the Project menu.

Additional Information

The UIMA command for generating the type system from the command line (jcasgen) is not shipped with icTAKES at this time. Use Eclipse for this step until a future release adds the command to icTAKES.

Command Line

To compile icTAKES, change to the <icTAKES_HOME> directory and run:
ant
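A small wrapper can confirm you are in the right directory before invoking Ant. This is a sketch based on several assumptions. build_ictakes is a hypothetical helper. It assumes the build file in <icTAKES_HOME> has Ant's default name, build.xml. If no directory is passed, it falls back to UIMA_HOME, which the install steps set to <icTAKES_HOME>.

```shell
# Hypothetical helper: run Ant in <icTAKES_HOME> (argument 1, or UIMA_HOME).
# Assumes the build file there is named build.xml, Ant's default.
build_ictakes() {
  home=${1:-${UIMA_HOME:-}}
  if [ ! -f "$home/build.xml" ]; then
    echo "no build.xml in '$home'" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$home" && ant)
}
```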

Process documents Using icTAKES

You can now launch or debug the cTAKES components that you have built. You can still run them from a command prompt, as described in the user install instructions, but you can now also launch them from within Eclipse. To launch the CAS Visual Debugger (CVD) and the Collection Processing Engine (CPE) from Eclipse, perform the following step.

Step

Example

1. In Eclipse, launch the tools using their main program.

Find in the Eclipse project: src > edu.mayo.bmi.ctakes.main > cTAKESCPEGUI.java or cTAKESCVDGUI.java. Then use the Run menu to run or debug as desired. Using the tools does not change from what is documented in the user install instructions.

screenshot to illustrate step
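Outside Eclipse, the same main classes can be started with a plain java command. The following sketch only prints such a command, and both its helper name and its classpath are assumptions. The helper name ictakes_java_cmd is hypothetical. The classpath layout assumes compiled classes in <icTAKES_HOME>/bin (Eclipse's usual output folder for a project with a src folder), the cTAKESdesc and resources class folders added in the Eclipse steps, and all JARs in <icTAKES_HOME>/lib. The -Xms512M -Xmx2000M heap options are the ones suggested under Next Steps for larger dictionaries.

```shell
# Hypothetical helper (not part of icTAKES): print a java command line that
# starts one of the main classes named above.
# $1 = <icTAKES_HOME>, $2 = fully qualified main class name.
# Assumed classpath: compiled classes in bin, the cTAKESdesc and resources
# class folders, and every JAR in lib (the lib/* wildcard needs Java 6+).
# Uses ':' as the separator, so this form is for Linux.
ictakes_java_cmd() {
  classpath="$1/bin:$1/cTAKESdesc:$1/resources:$1/lib/*"
  # -Xms/-Xmx set the initial and maximum heap size.
  printf 'java -Xms512M -Xmx2000M -cp "%s" %s\n' "$classpath" "$2"
}

# Example: print the command for the CAS Visual Debugger.
ictakes_java_cmd /usr/bin/cTAKES1.2.2/icTAKES edu.mayo.bmi.ctakes.main.cTAKESCVDGUI
```

To run the printed command instead of only displaying it, wrap the call in eval "$(...)". The quotes around the classpath keep the shell from expanding lib/* itself, so Java receives the wildcard.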

Next Steps

The cTAKES 1.2 User Guide explains each of the installed cTAKES components in detail and, in some cases, how to improve them. Before you process text in production, however, note that some of the dictionaries that come with icTAKES are only small samples. Loading larger or different dictionaries is left to the user.

The following components require special attention because they will not work without a real (full) dictionary:

  • clinical documents pipeline, the original main cTAKES aggregate descriptors (one for CDA and one for plaintext)
  • Drug NER
  • Side Effect

For example, we have successfully tested the 2008 release of the full LVG dictionary. In order to use this release of the full LVG dictionary you should:

  1. Download either the full version or the lite version from NIH Lexical Tools.
  2. Extract the TGZ file that you downloaded to a temporary directory, using a tool such as 7-Zip (available online). On some operating systems, such as Windows, this takes two steps: first decompress the .tgz file, then extract the resulting .tar archive.
  3. Replace the directory <icTAKES_HOME>/resources/lvgresources/lvg/data/HSqlDb with data/HSqlDb from your extracted download. Replacing the entire directory is appropriate.
  4. In the future, you can upgrade to later versions of LVG by editing the <icTAKES_HOME>/resources/lvgresources/lvg/data/config/lvg.properties file, replacing "lvg2008" with the name of the new release.
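On Linux, steps 3 and 4 can be scripted. The following is a minimal sketch. It assumes the download has already been extracted with tar, which handles both the gzip and tar layers in one command (for example, tar -xzf <file>.tgz -C /tmp/lvg), so that the extracted release directory contains data/HSqlDb. replace_hsqldb and rename_lvg_release are hypothetical helpers. Release names are assumed to contain only letters and digits, so they are safe to use in a sed pattern.

```shell
# Hypothetical helpers (not part of icTAKES) for the LVG steps above.
# $1 is <icTAKES_HOME> in both functions.

# Step 3: replace the whole HSqlDb directory with the one from the extracted
# download. $2 is the extracted release directory that contains data/HSqlDb.
replace_hsqldb() {
  target="$1/resources/lvgresources/lvg/data/HSqlDb"
  [ -d "$2/data/HSqlDb" ] || { echo "missing $2/data/HSqlDb" >&2; return 1; }
  rm -rf "$target" || return 1
  cp -R "$2/data/HSqlDb" "$target"
}

# Step 4: replace every occurrence of the old release name ($2, e.g. lvg2008)
# with the new one ($3) in lvg.properties. A temporary file is used instead of
# sed -i because the -i option behaves differently in GNU and BSD sed.
rename_lvg_release() {
  props="$1/resources/lvgresources/lvg/data/config/lvg.properties"
  sed "s/$2/$3/g" "$props" > "$props.tmp" && mv "$props.tmp" "$props"
}
```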

Other large dictionaries are also available. To install complete dictionaries for RxNorm, SNOMED-CT, or others available through the UMLS, refer to the following posts on the cTAKES forums:

You can also obtain a production dictionary resource by contacting the Mayo Clinic NLP Program by email for a new version of lookupresources to replace <icTAKES_HOME>/resources/lookupresources. You must provide proof of a valid UMLS license.

Note that a production icTAKES installation with larger dictionaries places higher demands on hardware, especially memory. One way to make more memory available to these tools is to modify

and

Add the options -Xms512M -Xmx2000M (initial and maximum Java heap size) to the java command that launches the tool, for example:

Some models included in cTAKES may not represent your data distribution well. If you want to build your own models, please read the User Guide for information about components, particularly the information about the following:

  • Training a sentence detector model
  • Building a part-of-speech (POS) tagger model
  • Building a part-of-speech (POS) tag dictionary
  • Building a chunker model
  • Training a dependency parser