
cTAKES 1.2.1 User Install Instructions

These instructions are for end users. They cover installing cTAKES, configuring it, and using it to process text (typically text associated with a medical record). If you plan to extend or modify the code behind cTAKES, go back to the overview and select the developer install instructions instead.

These instructions cover installation and a test of the main product including trained models for sentence detection and tagging parts of speech, sample dictionaries, and a small subset of the full LVG resource. Optional components will also be described. If you do not want the optional components you can skip that section.

Once you have completed the installation of cTAKES itself, you will be able to see what cTAKES is capable of. Making fuller use of the software requires additional steps to install the dictionaries you want to use. These are described at the end of these instructions, under Next steps.

Prerequisites

Before installing cTAKES, make sure you have the following prerequisites. The steps in this section guide you through them:

  • Ability to run commands on a command line
  • Java VM version 1.5+
  • Apache UIMA 2.3.1+

Step

Example

1. Open a command prompt window.

No example for this step.

2. Make sure you have the proper version of Java. Most systems come with Java already installed, so you only need to check the version. On Windows or Linux, enter the command:
java -version
on any command line to see which version you have. If your version is lower than the one specified above, you must download and install a newer Java.
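
On Linux, the version check in this step can also be scripted. The following is a minimal sketch and not part of cTAKES or UIMA: java_version_ok is a hypothetical helper name, and the parsing assumes the usual java -version output, where the version string appears in double quotes.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a Java version string is 1.5 or later.
# Handles both the old "1.x" scheme (1.5.0_22) and the newer one (11.0.2).
java_version_ok() {
    version=$1
    major=${version%%.*}
    if [ "$major" = "1" ]; then
        rest=${version#*.}          # drop the leading "1."
        feature=${rest%%[._]*}      # keep what precedes the next "." or "_"
    else
        feature=${major%%[!0-9]*}   # strip suffixes such as "-ea"
    fi
    [ -n "$feature" ] && [ "$feature" -ge 5 ] 2>/dev/null
}

# java -version writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
    installed=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    if java_version_ok "$installed"; then
        echo "Java $installed is new enough for cTAKES"
    else
        echo "Java $installed is too old; install 1.5 or later"
    fi
else
    echo "java was not found on the PATH"
fi
```

If the script reports that java was not found, install Java first and then repeat the check in a new command prompt.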

3. Some commands and programs can find the Java runtime on their own, but it is best to set the JAVA_HOME environment variable. Set JAVA_HOME to the absolute path of the root of the Java Runtime Environment that you want UIMA to use.

Windows On Windows, right-click on My Computer > Properties > Advanced tab > Environment Variables button > New button for System variables. Enter JAVA_HOME as the variable name and the path as the value, then keep clicking OK until you are out of the dialog series.
Linux Use the command
export JAVA_HOME=<path>

4. Navigate to the UIMA Java framework & SDK from Apache UIMA 2.3.1+.

Go to the Apache UIMA Project site

5. Download the UIMA Java framework & SDK

Select the file to download based on your operating system:
Windows Download the Binary ZIP file.
Linux Download the Binary TAR.GZ file.

Save the file to a temporary location on your machine.

6. Unzip the compressed file you downloaded.

Windows On Windows, launch (double-click) the file and extract the files to a directory like c:\uimaj-2.3.1-bin\apache-uima
Linux On Linux, run the tar command and extract the files to a directory like /usr/bin/uimaj-2.3.1-bin/apache-uima

7. (recommended) Rename the base directory to indicate a cTAKES install. For example:

Windows rename uimaj-2.3.1-bin cTAKES1.2.1
Linux mv uimaj-2.3.1-bin cTAKES1.2.1

All of the example commands after this point use the renamed directory. The apache-uima directory inside it (for example, c:\cTAKES1.2.1\apache-uima) is referred to as <cTAKES_HOME>.
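
On Linux, the extract and rename steps can be combined by unpacking straight into a directory that already has the cTAKES name. This is a minimal sketch under stated assumptions: unpack_uima is a hypothetical helper name, and the archive is assumed to contain a top-level apache-uima directory, as the example paths in these steps suggest.

```shell
#!/bin/sh
# Hypothetical helper: unpack a UIMA binary archive under <parent>/cTAKES1.2.1.
# Usage: unpack_uima <archive.tar.gz> <parent-directory>
unpack_uima() {
    archive=$1
    parent=$2
    target="$parent/cTAKES1.2.1"
    mkdir -p "$target" || return 1
    # -x extract, -z gunzip, -f archive file, -C switch to the target first
    tar -xzf "$archive" -C "$target" || return 1
    # Assumption: the archive holds a top-level apache-uima directory.
    [ -d "$target/apache-uima" ] || return 1
    # Print the resulting install directory for use in later steps.
    echo "$target/apache-uima"
}
```

Called with the downloaded archive and /usr/bin as the parent directory, the helper prints /usr/bin/cTAKES1.2.1/apache-uima. Writing under /usr/bin normally requires administrator rights.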

8. Set the UIMA_HOME environment variable. UIMA requires a special environment variable for its commands to run.

Use UIMA_HOME for the name of the variable and the absolute path to the <cTAKES_HOME> directory in the previous step as the value.

Windows On Windows, right-click on My Computer > Properties > Advanced tab > Environment Variables button > New button for System variables. Keep clicking OK until you are out of the dialog series.
Linux On Linux use the command
export UIMA_HOME=<path>

Note

Notice the underscore in the name of the variable. Neither the variable name nor the path it holds may contain spaces.
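
On Linux, a candidate value can be checked against this rule before it is exported. The following is a minimal sketch: valid_uima_home is a hypothetical helper name, and the check for a bin subdirectory is an assumption based on the PATH step that follows.

```shell
#!/bin/sh
# Hypothetical helper: check a candidate UIMA_HOME value before exporting it.
# Fails when the path is empty, contains a space, or has no bin directory.
valid_uima_home() {
    candidate=$1
    [ -n "$candidate" ] || return 1
    case $candidate in
        *" "*) return 1 ;;   # the note above rules out spaces in the path
    esac
    [ -d "$candidate/bin" ]
}

# Example: only export the variable when the path passes the check.
candidate=/usr/bin/cTAKES1.2.1/apache-uima
if valid_uima_home "$candidate"; then
    export UIMA_HOME="$candidate"
    echo "UIMA_HOME set to $UIMA_HOME"
else
    echo "Refusing to use '$candidate' as UIMA_HOME"
fi
```

An export made this way lasts only for the current shell session. To keep it, add the export line to your shell startup file.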

9. An environment variable called PATH already exists. Modify that environment variable to add <cTAKES_HOME>/bin on the end of the value. For example,
Windows ;c:\cTAKES1.2.1\apache-uima\bin
Linux :/usr/bin/cTAKES1.2.1/apache-uima/bin

Note

Notice there is a semi-colon (Windows) or colon (Linux) between the existing value of the PATH and the directory you are placing on the end.
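
On Linux, the PATH change can be made so that running it twice does not add the directory twice. This is a minimal sketch: append_path_entry is a hypothetical helper name, and the directory shown is the example location used in these instructions.

```shell
#!/bin/sh
# Hypothetical helper: print <path-list> with <dir> appended, unless <dir>
# is already one of the colon-separated entries.
append_path_entry() {
    list=$1
    dir=$2
    case ":$list:" in
        *":$dir:"*) echo "$list" ;;               # already present, unchanged
        *)          echo "${list:+$list:}$dir" ;; # add a colon only if needed
    esac
}

PATH=$(append_path_entry "$PATH" "/usr/bin/cTAKES1.2.1/apache-uima/bin")
export PATH
```

Wrapping the list in colons before matching means only whole entries count as a match, so an existing entry that merely begins with the same directory name is not mistaken for it.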

10. Open a new command prompt (in order to pick up the environment variable changes). In your command prompt, change to the <cTAKES_HOME> directory and run the command to set paths. The script is found through the PATH entry added in the previous step.
Windows adjustExamplePaths.bat
Linux adjustExamplePaths.sh

Additional Information

cTAKES can process documents in many forms. The Testing section gives an example of processing one.

Install cTAKES

cTAKES comes in the form of Processing Engine ARchive (PEAR) packages or files. The cTAKES packages are deployed into the UIMA framework installed in the steps in the previous section. The steps follow.

Step

Example

1. Navigate to the source downloads for released version

No example for this step

2. Download the latest version. Select the file to download based on your operating system:
Windows Download cTAKES-1.2.1-pear.zip file.
Linux Download cTAKES-1.2.1-pear.tar.gz file

Save the file to a temporary location on your machine.

3. Unzip the compressed file you downloaded into a temporary directory, for example:
Windows c:\stuff
Linux  /tmp

4. Start the PEAR installer:

for example:
Windows c:\cTAKES1.2.1\apache-uima\bin\runPearInstaller.bat
Linux /usr/bin/cTAKES1.2.1/apache-uima/bin/runPearInstaller.sh

5. For the PEAR file field, click the Browse... button. Navigate to your temporary directory and select the file, C:\stuff\cTAKES-1.2.1-pear\core.pear.

6. For the Installation directory field, click the Browse Dir... button. Navigate to the <cTAKES_HOME> directory c:\cTAKES1.2.1\apache-uima

7. Click Install.

The text area will show you a log of what is happening. When the text says, "Installation of core completed" then you can move on to the remainder of the PEAR files.

8. Repeat the last 3 steps for each of these PEAR files.

Additional Information

Note that some of these are optional. Optional packages are discussed in the Optional components section. If you do not install them now, you will need to do so at that time.

  1. core (already done)
  2. document preprocessor
  3. POS tagger
  4. chunker
  5. context dependent tokenizer
  6. dictionary lookup
  7. LVG
  8. NE contexts
  9. clinical documents pipeline
  10. dependency parser (optional)
  11. PAD term spotter (optional)
  12. Drug NER (optional)
Note

The Installation Directory field must be the same for each PEAR file being installed. This should be easy, just don't change it in between clicking the Install button.
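
After installing the PEAR files, a quick check on Linux can confirm that each required component ended up under the same installation directory. This is a minimal sketch: missing_components is a hypothetical helper name, and it assumes each PEAR file installs into a subdirectory named after its component, as the clinical documents pipeline paths later in this guide suggest.

```shell
#!/bin/sh
# Hypothetical helper: print each required component that has no directory
# under the given installation directory. Directory names are assumed to
# match the component names in the list above.
missing_components() {
    home=$1
    for name in "core" "document preprocessor" "POS tagger" "chunker" \
                "context dependent tokenizer" "dictionary lookup" "LVG" \
                "NE contexts" "clinical documents pipeline"; do
        [ -d "$home/$name" ] || echo "$name"
    done
}
```

Running missing_components with the installation directory (for example /usr/bin/cTAKES1.2.1/apache-uima) prints nothing when every required component is present. Any name it prints was either not installed or was installed with a different Installation directory.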

9. Close the PEAR installer application.

No example for this step

10. Copy the cTAKES utilities into cTAKES_HOME.
copy <temp location>/utils <cTAKES_HOME>/utils, for example:
Windows xcopy /e c:\stuff\cTAKES-1.2.1-pear\utils c:\cTAKES1.2.1\apache-uima\utils
Linux cp -r /tmp/cTAKES-1.2.1-pear/utils /usr/bin/cTAKES1.2.1/apache-uima/utils

Testing

Process one clinical note

In order for you to get a taste of what is going on, there is a tool which will allow you to enter some text, run the pipeline, and see the results right away. This is not the tool you would use to process documents in a production environment.

Step

Example

1. Run the CAS Visual Debugger (CVD) command from <cTAKES_HOME>, passing the PEAR descriptor to use with the -desc option.

Starting in <cTAKES_HOME> allows the clinical documents pipeline to find the other analysis engines it needs. For example:

Windows cvd.bat -desc "C:\cTAKES1.2.1\apache-uima\clinical documents pipeline\clinical documents pipeline_pear.xml"
Linux cvd.sh -desc '/usr/bin/cTAKES1.2.1/apache-uima/clinical documents pipeline/clinical documents pipeline_pear.xml'

The application may take a minute to start on slower hardware.
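
Because the component name contains spaces, the descriptor path must stay quoted whenever it is passed to a command. On Linux, the launch can be sketched as follows. pear_descriptor is a hypothetical helper name that only reproduces the path pattern of the examples above, and the install location is the example one used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print the PEAR descriptor path of an installed
# component, following the pattern <home>/<component>/<component>_pear.xml.
pear_descriptor() {
    echo "$1/$2/${2}_pear.xml"
}

home=/usr/bin/cTAKES1.2.1/apache-uima
desc=$(pear_descriptor "$home" "clinical documents pipeline")

if [ -f "$desc" ] && command -v cvd.sh >/dev/null 2>&1; then
    # Start from the install directory so the pipeline finds the other engines.
    (cd "$home" && cvd.sh -desc "$desc")
else
    echo "Descriptor or cvd.sh not found; check the install and the PATH"
fi
```

The parentheses run the cd in a subshell, so your own working directory is unchanged after CVD exits.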

2. Copy the text at the right and paste the contents into the Text section of CVD, replacing the text that is already there.

This example file can also be found in test data:
<cTAKES_HOME>/clinical documents pipeline/test/data/plaintext/testpatient_plaintext_1.txt

3. From the menu bar, click Run -> Run AggregatePlaintextProcessor.

You'll get a list of all the annotations in the Analysis Results frame.

4. Named entities are now recognized in this clinical document. To find one, in the Analysis Results frame, click on the key in front of:

  • AnnotationIndex
  • uima.tcas.Annotation
  • edu.mayo.bmi.uima.core.type.IdentifiedAnnotation
  • edu.mayo.bmi.uima.core.type.NamedEntity

    Then select edu.mayo.bmi.uima.core.type.NamedEntity itself. This shows an Annotation Index in the lower frame. Select any NamedEntity in that frame and the text it covers is highlighted in the Text frame on the right. Double-click the NamedEntity in the lower left frame to see its attributes.

Process a collection of documents

Obviously, processing text by cutting and pasting into a GUI like the CAS Visual Debugger is not going to be sufficient for processing large numbers of documents. The UIMA framework provides the Collection Processing Engine (CPE) Configurator for processing multiple documents at once. Here we take you through a sample of processing a set of documents.

Note

You will notice that the command to start the CPE Configurator is long. No environment variable is available to shorten commands like this, and this release provides no script to launch the software. Such a script is being considered for a future release.

Commands that you run must include the cTAKES components in the classpath, which is passed with the "-cp" parameter of the java command. "-cp" takes a delimited list of values: the delimiter is the semicolon on Windows and the colon on Linux. If you build any of these commands yourself, use the same "-cp" parameter with the same list of values. This guide refers to the -cp parameter and its values as the <pipeline-classpath>. Because the values are relative paths, you must also run the command from the directory they are relative to, <cTAKES_HOME>.
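
To illustrate how a delimited "-cp" value is put together on Linux, here is a minimal sketch. join_classpath is a hypothetical helper name, and the entries are placeholders for illustration only. They are not the real <pipeline-classpath>, which must be taken from the command supplied with this guide.

```shell
#!/bin/sh
# Hypothetical helper: join its arguments into a Linux classpath value,
# using the colon as the delimiter.
join_classpath() {
    result=
    for entry in "$@"; do
        result="${result:+$result:}$entry"
    done
    echo "$result"
}

# Placeholder entries only; NOT the real <pipeline-classpath>.
cp=$(join_classpath "first/bin" "first/lib/example.jar" "second/bin")
echo "java -cp \"$cp\" <main class>"
```

On Windows the same list would be joined with semicolons instead. Quoting the joined value keeps entries that contain spaces, such as clinical documents pipeline, in one piece.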

Step

Example

1. Open a command prompt and change to the cTAKES_HOME directory. For example:
Windows cd \cTAKES1.2.1\apache-uima
Linux cd /usr/bin/cTAKES1.2.1/apache-uima

Note for Windows

Notice that you must change directories here. There is no environment variable you can set that will locate the cTAKES classes for this command. All the cTAKES classes put in the command are relative to cTAKES_HOME.

2. Start the CPE Configurator.

Copy the command at the right and paste it into the command prompt.

The -cp parameter and its values are referred to as the <pipeline-classpath>

Additional Information for Windows

The carets (^) in the command escape the newline characters, which lets you break a long command into multiple lines and still paste it.

Additional Information for Linux

The backslashes (\) in the command escape the newline characters, which lets you break a long command into multiple lines and still paste it.

3. This will bring up the Collection Processing Engine Configurator. In the menu bar, click File > Open CPE Descriptor.

4. Navigate to the example file
<cTAKES_HOME>/clinical documents pipeline/desc/collection_processing_engine/test1.xml and click the Open button.

5. The input and output directory fields for this CPE are set for loading into Eclipse. Since you are not doing that, they must be changed. In this case you need to add a directory to the front of both of those fields. Add clinical documents pipeline to the front of the paths so they look like this:
clinical documents pipeline\test\data
clinical documents pipeline\test\data\output

6. Click the Play button (green/blue play arrow near the bottom).

7. You should see that one document was processed. This was still a collection of documents: it contained only one, to keep the demonstration short. Close the results window.

8. Close the CPE application. You may be prompted to save changes. Since this was just a test you may click the No button.

9. Open a new command prompt and change to the <cTAKES_HOME>/utils/bin directory

No example for this step

10. The CPE does not display the results, so a comparison tool is provided to help show that they match expectations. It has the following syntax:
java edu.mayo.bmi.utils.xcas_comparison.Compare <First File> <Second File> <diff-html>
Where:
<First File> is the first file to compare
<Second File> is the second file to compare
<diff-html> is the file the results are written to

Copy and paste the example at the right which has had our example files already substituted into a command prompt to run.

Windows

Linux

11. The resulting file will open for you. Look at the comparison to see the annotations resulting from this pipeline.
Windows c:\stuff\diff-html.html
Linux /tmp/diff-html.html

Optional components

Optional components may have already been installed in the steps of the Install cTAKES section. If you chose to skip the optional components during the cTAKES install and you want to install them now, go back to the Install cTAKES section for instructions on doing so and then return here.

You can test any of the components now using the CVD or CPE, just as in the sections Process one clinical note and Process a collection of documents. Follow the same steps but use a test file from the other component. You can launch these from Eclipse or the command line.

Most components have an analysis engine to load, like:
<cTAKES_HOME>/<component name>/desc/analysis_engine/<CVD files>
and a CPE directory like:
<cTAKES_HOME>/<component name>/desc/collection_processing_engine/<CPE files>

For example, to test the dependency parser:

<cTAKES_HOME>/dependency parser/desc/analysis_engine/ClearParserPlaintextAggregate.xml
<cTAKES_HOME>/dependency parser/desc/collection_processing_engine/ClearParserTestCPE.xml

Test Drug NER:
<cTAKES_HOME>/Drug NER/desc/analysis_engine/DrugAggregatePlaintextProcessor.xml
<cTAKES_HOME>/Drug NER/desc/collection_processing_engine/DrugNER_PlainText_CPE.xml

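On Linux, the descriptors that a component provides can be listed before opening one in CVD or the CPE Configurator. This is a minimal sketch: list_descriptors is a hypothetical helper name, and it relies only on the directory pattern described above.

```shell
#!/bin/sh
# Hypothetical helper: list the XML descriptors found under a component's
# analysis_engine or collection_processing_engine directory.
# Usage: list_descriptors <cTAKES_HOME> <component name> <kind>
list_descriptors() {
    dir="$1/$2/desc/$3"
    # Fail when the component has no directory of this kind.
    [ -d "$dir" ] || return 1
    find "$dir" -name '*.xml' | sort
}
```

For example, calling list_descriptors with /usr/bin/cTAKES1.2.1/apache-uima, "Drug NER", and analysis_engine prints the descriptors that CVD can load for Drug NER, provided that component is installed there.
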
Next steps

The User Guide will help you to understand in great detail each of the cTAKES components that have been installed. In some cases you can learn how to improve the components. However, before you go on to process text in production you will need to consider that some of the dictionaries that come with cTAKES are small samples. To reduce the size of the initial download for cTAKES, it has been left to you to load larger or different dictionaries once cTAKES is installed.

The components that require special attention and will not work without a real dictionary are:

  • clinical documents pipeline, which contains the original main cTAKES aggregate descriptors (one for CDA and one for plaintext)
  • Drug NER
  • Side Effect

We have successfully tested the 2008 release of the full LVG dictionary. In order to use this release of the full LVG dictionary you should:

  1. Download either the full version or the lite version from NIH Lexical Tools
  2. Extract the TGZ file that you downloaded with a tool like 7-zip (available online) to a temporary directory. On some operating systems, like Windows, this may need to be done in 2 steps: 1) decompress the file to get a TAR file and 2) extract the TAR file.
  3. Replace the directory <cTAKES_HOME>/LVG/resources/lvg/data/HSqlDb with data/HSqlDb from your extracted download. Replacing the entire directory is appropriate.
  4. In the future, you can upgrade to later versions of LVG by editing the <cTAKES_HOME>/LVG/resources/lvg/data/config/lvg.properties file, replacing "lvg2008" with the name of the new release.
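
On Linux, steps 3 and 4 can be sketched as two small shell functions. replace_lvg_db and set_lvg_release are hypothetical helper names. The paths come from the steps above, and keeping the old directory as HSqlDb.bak is an added precaution, not something the steps require.

```shell
#!/bin/sh
# Hypothetical helper: swap in the HSqlDb directory from an extracted LVG
# download, keeping the previous directory as HSqlDb.bak.
# Usage: replace_lvg_db <cTAKES_HOME> <extracted LVG directory>
replace_lvg_db() {
    data="$1/LVG/resources/lvg/data"
    [ -d "$2/data/HSqlDb" ] || return 1
    [ -d "$data" ] || return 1
    if [ -d "$data/HSqlDb" ]; then
        rm -rf "$data/HSqlDb.bak"
        mv "$data/HSqlDb" "$data/HSqlDb.bak" || return 1
    fi
    cp -r "$2/data/HSqlDb" "$data/HSqlDb"
}

# Hypothetical helper: replace the release name in lvg.properties.
# Usage: set_lvg_release <cTAKES_HOME> <old name> <new name>
set_lvg_release() {
    props="$1/LVG/resources/lvg/data/config/lvg.properties"
    [ -f "$props" ] || return 1
    # Write to a temporary file first; in-place sed flags differ by system.
    sed "s/$2/$3/g" "$props" > "$props.tmp" && mv "$props.tmp" "$props"
}
```

Here set_lvg_release would be called with lvg2008 as the old name and the name of the newer release as the new name. The release names are assumed to contain no characters that are special to sed.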

Likewise, other large dictionaries are available. To install complete dictionaries for RxNorm, SNOMED-CT, or others that are available through the UMLS, see the following posts on the cTAKES forums:

Some models included in cTAKES may not represent your data distribution well. If you want to build your own models, please read the User Guide, particularly the information on:

  • Training a sentence detector model
  • Building a Part of Speech (POS) tagger model
  • Building a Part of Speech (POS) tag dictionary
  • Building a chunker model
  • Training a dependency parser