National Cancer Institute U.S. National Institutes of Health www.cancer.gov

cTAKES 1.3.1 User Install Instructions


These instructions are for end users. With these instructions you can install cTAKES, configure it, and use it to process text (typically text associated with a medical record). If you plan to extend or modify the code behind cTAKES, refer to the cTAKES 1.3.1 Developer Install Instructions.

These instructions will cover installation and a test of the main product including trained models for sentence detection and tagging parts of speech, sample dictionaries, and a small subset of the full LVG resource. Optional components will also be described. If you do not want the optional components you can skip that section.

Once you have completed the install of cTAKES itself, you will be able to see what cTAKES is capable of. Getting more out of the software requires additional steps that determine which dictionaries are used. These are the last steps in these instructions.

Prerequisites

Before you start the installation of cTAKES, there are a few things you will need. The instructions in this section guide you through the prerequisites:

  • Ability to run commands on a command line
  • Java VM version 1.5+
  • Apache UIMA 2.3.1+

Step

Example

1. Open a command prompt window.

No example

2. Make sure you have the proper version of Java. Most systems come with Java already installed. You simply need to check if you have the proper version. Enter the following command on any command line to see what version you have now:
Windows and Linux: java -version

If you do not have a version greater than or equal to the one specified, then you must go to java.com and install Java.
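On Linux, the version check in step 2 can also be scripted. The following is a minimal sketch and is not part of cTAKES or UIMA; the function names parse_java_version and java_at_least_1_5 are made up for this illustration. It assumes that java -version prints a line containing version "x.y.z" to standard error.

```shell
#!/bin/sh
# Pull the quoted version out of `java -version` style output,
# e.g. 'java version "1.6.0_45"' gives 1.6.0_45.
parse_java_version() {
    printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed when a version string such as 1.6.0_45 or 11.0.2 is at least 1.5.
# Returns 2 when the string cannot be parsed.
java_at_least_1_5() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    case "$major$minor" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 5 ]; }
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to standard error, so redirect it.
    found=$(parse_java_version "$(java -version 2>&1)")
    if java_at_least_1_5 "$found"; then
        echo "Java $found is new enough"
    else
        echo "Java '$found' is too old or was not recognized"
    fi
else
    echo "java was not found on the PATH"
fi
```

If the script reports that Java is missing or too old, install Java as described in step 2 before continuing.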

3. Some commands and programs may find the Java runtime on their own, but it is best to set the JAVA_HOME environment variable. Set the value of JAVA_HOME to the absolute path of the root of the Java Runtime Environment that you want UIMA to use. On Windows, right-click on My Computer > Properties > Advanced tab > Environment Variables button > New button for System variables. Keep clicking OK until you are out of the dialog series. On Linux, in a Bourne-compatible shell such as bash, use the command export JAVA_HOME=<path>

screenshot illustrating step
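Before relying on JAVA_HOME, it can help to confirm that it really points at the root of a Java runtime. The sketch below is only an illustration for Linux; looks_like_java_home is a hypothetical helper name, and the check assumes a runtime root contains an executable bin/java.

```shell
#!/bin/sh
# looks_like_java_home DIR
# Succeeds when DIR contains an executable bin/java (hypothetical helper).
looks_like_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# To set the variable for the current session (placeholder path):
#   export JAVA_HOME=/path/to/your/java/runtime

if looks_like_java_home "${JAVA_HOME:-}"; then
    echo "JAVA_HOME looks valid: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or does not contain bin/java"
fi
```

An export typed at the prompt lasts only for that shell session; adding the same line to a shell startup file makes it persistent.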

4. Navigate to the UIMA Java framework & SDK from Apache UIMA 2.3.1+.

Go to the Apache UIMA Project site

screenshot illustrating step

5. Download the UIMA Java framework & SDK

Select the file to download based on your operating system: On Windows, download the Binary ZIP file. On Linux, download the Binary TAR.GZ file. Save the file to a temporary location on your machine.

screenshot illustrating step

6. Unzip the compressed file you downloaded.

On Windows, launch (double-click) the file and extract the files to a directory like c:\uimaj-2.3.1-bin\apache-uima. On Linux, run the tar command and extract the files to a directory like /usr/bin/uimaj-2.3.1-bin/apache-uima

screenshot illustrating step

7. (recommended) Rename the base directory to indicate a cTAKES install. For example: On Windows, rename uimaj-2.3.1-bin cTAKES1.3.1. On Linux, mv uimaj-2.3.1-bin cTAKES1.3.1

All of the example commands after this point will use the modified directory name. We will refer to the apache-uima directory inside it (for example, c:\cTAKES1.3.1\apache-uima) as <cTAKES_HOME>.

screenshot illustrating step
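On Linux, steps 6 and 7 can be carried out from the command line. This is a sketch under assumptions: unpack_uima is a hypothetical helper, and the archive name uimaj-2.3.1-bin.tar.gz in the commented usage is assumed from the directory names used on this page, so substitute the name of the file you actually downloaded.

```shell
#!/bin/sh
# unpack_uima ARCHIVE TARGET_DIR
# Extracts a .tar.gz archive into TARGET_DIR, creating it if needed.
unpack_uima() {
    archive=$1
    target=$2
    mkdir -p "$target" && tar -xzf "$archive" -C "$target"
}

# Example usage (commented out; writing under /usr/bin usually needs root):
#   unpack_uima /tmp/uimaj-2.3.1-bin.tar.gz /usr/bin/uimaj-2.3.1-bin
#   mv /usr/bin/uimaj-2.3.1-bin /usr/bin/cTAKES1.3.1
```

After both commands, the apache-uima directory should sit directly under /usr/bin/cTAKES1.3.1, matching the paths used in the remaining examples.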

8. Set the UIMA_HOME environment variable. UIMA requires a special environment variable for its commands to run.

Use UIMA_HOME for the name of the variable and the absolute path to the <cTAKES_HOME> directory in the previous step as the value.

On Windows, right-click on My Computer > Properties > Advanced tab > Environment Variables button > New button for System variables. Keep clicking OK until you are out of the dialog series. On Linux, in a Bourne-compatible shell such as bash, use the command export UIMA_HOME=<path>

screenshot illustrating step

Note

There is an underscore in the name of the variable. You cannot have spaces in the variable name nor in the path represented by the variable.

9. An environment variable called PATH already exists. Modify that environment variable to add <cTAKES_HOME>/bin to the end of the value. For example, on Windows, ;c:\cTAKES1.3.1\apache-uima\bin; on Linux, :/usr/bin/cTAKES1.3.1/apache-uima/bin

screenshot illustrating step

Note

Notice there is a semi-colon (Windows) or colon (Linux) between the existing value of the PATH and the directory you are placing on the end.
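For Linux, steps 8 and 9 might look like the following in a Bourne-compatible shell. This is a minimal sketch that uses the example install location from this page; adjust the path if you installed elsewhere. The check that avoids adding the directory twice is an extra convenience, not something UIMA requires.

```shell
#!/bin/sh
# Example location from this page; change it to match your install.
UIMA_HOME=/usr/bin/cTAKES1.3.1/apache-uima
export UIMA_HOME

# The path must not contain spaces (see the note in step 8).
case "$UIMA_HOME" in
    *" "*) echo "UIMA_HOME must not contain spaces" >&2 ;;
esac

# Append <cTAKES_HOME>/bin to PATH unless it is already present.
case ":$PATH:" in
    *":$UIMA_HOME/bin:"*) ;;
    *) PATH="$PATH:$UIMA_HOME/bin" ;;
esac
export PATH

echo "UIMA_HOME=$UIMA_HOME"
```

These settings apply only to the current shell session unless the same lines are added to a shell startup file.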

10. Open a new command prompt (in order to pick up the environment variable changes). In your command prompt, change to the <cTAKES_HOME> directory and run the command to set paths. On Windows, adjustExamplePaths.bat; on Linux, adjustExamplePaths.sh

screenshot illustrating step

The documents on which you can run cTAKES will take many forms. We will cover an example of doing this in the Testing section.

Install cTAKES

cTAKES comes in the form of Processing Engine ARchive (PEAR) packages or files. The cTAKES packages are deployed into the UIMA framework installed above in the manner described below.

Step

Example

1. Navigate to the source downloads for a released version on SourceForge.

No example

2. Download the latest version. Select the file to download: cTAKES-1.3.1-pear.zip Save the file to a temporary location on your machine.

screenshot illustrating step

3. Unzip the compressed file you downloaded into a temporary directory.
For example, Windows c:\stuff; Linux /tmp

screenshot illustrating step

4. Start the PEAR installer
<cTAKES_HOME>/bin/runPearInstaller.bat (Windows) or <cTAKES_HOME>/bin/runPearInstaller.sh (Linux)

For example, Windows: c:\cTAKES1.3.1\apache-uima\bin\runPearInstaller.bat; Linux: /usr/bin/cTAKES1.3.1/apache-uima/bin/runPearInstaller.sh

5. For the PEAR file field click the Browse... button. Navigate to your temporary directory and select the file C:\stuff\cTAKES-1.3.1-pear\core.pear

screenshot illustrating step

6. For the Installation directory field click the Browse Dir... button. Navigate to the <cTAKES_HOME> directory c:\cTAKES1.3.1\apache-uima

screenshot illustrating step

7. Click Install.

The text area will show you a log of what is happening. When the text says, "Installation of core completed" then you can move on to the remainder of the PEAR files.

screenshot illustrating step

8. Repeat the last 3 steps for each of these PEAR files:

  1. core (already done)
  2. document preprocessor
  3. POS tagger
  4. chunker
  5. context dependent tokenizer
  6. dictionary lookup
  7. LVG
  8. NE contexts
  9. clinical documents pipeline
  10. dependency parser (optional)
  11. PAD term spotter (optional)
  12. Drug NER (optional)
  13. smoking status (optional)
  14. SideEffect (optional)
  15. Constituency Parser (optional)
  16. coref-resolver (optional)

Note some of these are optional. We will discuss optional packages in Optional components. If you do not install them now you will need to do it at that time.



Note

The Installation Directory field must be the same for each PEAR file being installed. This should be easy; just don't change it in between clicks of the Install button.

9. Close the PEAR installer application.

No example

10. Copy the cTAKES utilities into cTAKES_HOME.
copy <temp location>/utils <cTAKES_HOME>/utils, for example,
Windows: xcopy /e c:\stuff\cTAKES-1.3.1-pear\utils c:\cTAKES1.3.1\apache-uima\utils
Linux: cp -r /tmp/cTAKES-1.3.1-pear/utils /usr/bin/cTAKES1.3.1/apache-uima/utils

screenshot illustrating step

Testing

Process one clinical note

In order for you to get a taste of what is going on, a tool is provided which will allow you to enter some text, run the pipeline, and see the results right away. This is not the tool you would use to process documents in a production environment.

Step

Example

1. Run the CAS Visual Debugger command.
cvd.bat -desc <XML file> (Windows) or cvd.sh -desc <XML file> (Linux)
where <XML file> is the PEAR descriptor to use

Starting in <cTAKES_HOME> allows the clinical documents pipeline to find the other analysis engines it needs. For example:
Windows:
cvd.bat -desc "C:\cTAKES1.3.1\apache-uima\clinical documents pipeline\clinical documents pipeline_pear.xml"
Linux:
cvd.sh -desc '/usr/bin/cTAKES1.3.1/apache-uima/clinical documents pipeline/clinical documents pipeline_pear.xml'

The application may take a minute to start on slower hardware.

2. Copy the text in the example at the right (next cell) and paste the contents into the Text section of CVD, replacing the text that is already there.
This example file can also be found in test data:
<cTAKES_HOME>/clinical documents pipeline/test/data/plaintext/testpatient_plaintext_1.txt

Dr. Nutritious

Medical Nutrition Therapy for Hyperlipidemia

Referral from: Julie Tester, RD, LD, CNSD
Phone contact: (555) 555-1212
Height: 144 cm Current Weight: 45 kg Date of current weight: 02-29-2001
Admit Weight: 53 kg BMI: 18 kg/m2
Diet: General
Daily Calorie needs (kcals): 1500 calories, assessed as HB + 20% for activity.
Daily Protein needs: 40 grams, assessed as 1.0 g/kg.
Pt has been on a 3-day calorie count and has had an average intake of 1100 calories.
She was instructed to drink 2-3 cans of liquid supplement to help promote weight gain.
She agrees with the plan and has my number for further assessment. May want a Resting
Metabolic Rate as well. She takes an aspirin a day for knee pain.

3. From the menu bar, click Run > Run AggregatePlaintextProcessor.

You'll get a list of all the annotations in the Analysis Results frame.

screenshot illustrating step

4. Named entities are now recognized in this clinical document. To find one, in the Analysis Results frame, click on the key in front of:
AnnotationIndex
uima.tcas.Annotation 
edu.mayo.bmi.uima.core.type.IdentifiedAnnotation  
edu.mayo.bmi.uima.core.type.NamedEntity

Then select edu.mayo.bmi.uima.core.type.NamedEntity itself. This will show an Annotation Index in the lower frame. Select any NamedEntity in that frame and you will see the text discovered in the Text frame on the right. Double click the NamedEntity in the lower left frame to see the NamedEntity's attributes. You may close CVD if you wish.

screenshot illustrating step

Process a collection of documents

Obviously, processing text by cutting and pasting into a GUI like the CAS Visual Debugger is not going to be sufficient for processing large numbers of documents. The UIMA framework provides the Collection Processing Engine (CPE) Configurator for processing multiple documents at once. Here we take you through a sample of processing a set of documents.

You will notice that the command to start the CPE Configurator is long. This is because there is no environment variable set which can be used in commands like this. There is also no script provided in this release to launch the software. This function is being considered for a future release. Commands that you run must include the cTAKES components in the classpath. They are included by using the "-cp" parameter on the java command. "-cp" takes a delimited list of values. On Windows, the delimiter is the semicolon. On Linux, it is the colon. If you want to run any of the commands and build them yourself then you need to have the same "-cp" parameter with the same list of delimited values. We will refer to the -cp parameter and its values as the <pipeline-classpath>. When used in this fashion it also means that you must be in the directory where the command resides in order to run the command.
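To illustrate how a delimited -cp value is put together on Linux, here is a small sketch. The helper join_classpath and the three entries are placeholders invented for this example; they are not the real cTAKES classpath, which comes from the CPE Configurator command for your platform.

```shell
#!/bin/sh
# join_classpath SEPARATOR ENTRY...
# Joins the entries with the separator (":" on Linux, ";" on Windows).
join_classpath() {
    sep=$1
    shift
    joined=
    for entry in "$@"; do
        if [ -z "$joined" ]; then
            joined=$entry
        else
            joined="$joined$sep$entry"
        fi
    done
    printf '%s\n' "$joined"
}

# Placeholder entries only; substitute the real entries for your install.
PIPELINE_CP=$(join_classpath ":" "first/entry" "second entry/with spaces" "third/entry.jar")
echo "$PIPELINE_CP"

# Quote the value when passing it to java so embedded spaces survive:
#   java -cp "$PIPELINE_CP" <main class>
```

Because several cTAKES component directories contain spaces (for example, clinical documents pipeline), quoting the whole classpath value matters.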

Step

Example

1. Open a command prompt and change to the cTAKES_HOME directory.

For example, Windows: cd \cTAKES1.3.1\apache-uima; Linux: cd /usr/bin/cTAKES1.3.1/apache-uima

screenshot illustrating step

Note

You must change directories here. There is no environment variable you can set that will locate the cTAKES classes for this command. All the cTAKES classes put in the command are relative to cTAKES_HOME.

2. Start the CPE Configurator.
Copy the command at the right and paste it into the command prompt.
The -cp parameter and its values are referred to as the <pipeline-classpath>

Windows: The carets (^) in the command escape the newline characters, breaking a long command into multiple lines while still allowing you to paste it.


Linux: The backslashes in the command escape the newline characters, breaking a long command into multiple lines while still allowing you to paste it.

3. This will bring up the Collection Processing Engine Configurator. In the Menu bar click File > Open CPE Descriptor

screenshot illustrating step

4. Navigate to the example file
<cTAKES_HOME>/clinical documents pipeline/desc/collection_processing_engine/test1.xml and click the Open button.

screenshot illustrating step

5. The input and output directory fields for this CPE are set for loading into Eclipse. Since you are not doing that they must be changed. In this case you need to add a directory to the front of both of those fields. Add clinical documents pipeline\ to the front of the paths so they look like this: clinical documents pipeline\test\data; clinical documents pipeline\test\data\output

screenshot illustrating step

6. Click the Play button (green/blue play arrow near the bottom).

screenshot illustrating step

7. You should see that one document was processed. This was still a collection run; the collection simply contained a single document to demonstrate the process. Close the results window.

screenshot illustrating step

8. Close the CPE application. You may be prompted to save changes. Since this was just a test you may click the No button.

screenshot illustrating step

9. Open a new command prompt and change to the <cTAKES_HOME>/utils/bin directory

No example

10. To check the results (which you cannot see using the CPE), use the comparison tool, which helps show that the results match expectations. Its syntax is: java edu.mayo.bmi.utils.xcas_comparison.Compare <First File> <Second File> <diff-html>
Where: <First File> is the first file to compare; <Second File> is the second file to compare; <diff-html> is where the results are written to

Copy and paste the example at the right which has had our example files already substituted into a command prompt to run.

Windows


Linux

11. The resulting file will open for you. Look at the comparison to see the annotations resulting from this pipeline.
Windows: c:\stuff\diff-html.html; Linux: /tmp/diff-html.html

screenshot illustrating step

Optional components

Optional components may have already been installed in the install section. If you chose to skip the optional components during the cTAKES install and you want to install them now, please go back to the install section for instructions on doing so and then return here.

You can test any of the components now, just as we did in the Process one clinical note and Process a collection of documents sections, using the CVD or CPE. Follow the same steps there but use a test file from any other component. You can launch these from Eclipse or the command line.

Most components will have an analysis engine to load like:
<cTAKES_HOME>/<component name>/desc/analysis_engine/<CVD files>

and a CPE directory like:
<cTAKES_HOME>/<component name>/desc/collection_processing_engine/<CPE files>

For example:

Test the dependency parser:
<cTAKES_HOME>/dependency parser/desc/analysis_engine/ClearParserPlaintextAggregate.xml
<cTAKES_HOME>/dependency parser/desc/collection_processing_engine/ClearParserTestCPE.xml

Test Drug NER:
<cTAKES_HOME>/Drug NER/desc/analysis_engine/DrugAggregatePlaintextProcessor.xml
<cTAKES_HOME>/Drug NER/desc/collection_processing_engine/DrugNER_PlainText_CPE.xml

Test smoking status:
<cTAKES_HOME>/smoking status/desc/analysis_engine/SimulatedProdSmokingTAE.xml

Test co-ref resolver:
<cTAKES_HOME>/coref-resolver/desc/analysis_engine/CorefProcessor.xml
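Since every component follows the same directory pattern, a small Linux helper can list the descriptors available for a component. This is only a sketch: list_descriptors is a hypothetical function, and it assumes that UIMA_HOME holds your <cTAKES_HOME> path as set during the prerequisites.

```shell
#!/bin/sh
# list_descriptors CTAKES_HOME COMPONENT KIND
# KIND is analysis_engine or collection_processing_engine.
list_descriptors() {
    dir="$1/$2/desc/$3"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    for f in "$dir"/*.xml; do
        # Skip the unexpanded pattern when the directory has no XML files.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Example usage (commented out; requires the component to be installed):
#   list_descriptors "$UIMA_HOME" "Drug NER" analysis_engine
#   cvd.sh -desc "$UIMA_HOME/Drug NER/desc/analysis_engine/DrugAggregatePlaintextProcessor.xml"
```

Component names such as Drug NER contain spaces, so the arguments and the -desc path are quoted.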

Next steps

The cTAKES 1.3 User Guide will help you to understand in great detail each of the cTAKES components that have been installed. In some cases you can learn how to improve the components. However, before you go on to process text in production you will need to consider dictionaries and models.

Dictionaries

Bundled UMLS Dictionaries

cTAKES 1.3 includes the complete UMLS (SNOMED-CT and RxNorm) dictionaries.

  • An rxnorm_index database (a Lucene index) containing drug names from RxNorm
  • A UMLS database (using 2 hsqldb tables) containing anatomical sites, procedures, signs/symptoms, and disorders/diseases from SNOMED-CT (umls_ms_2011ab)

To use them, you must have a UMLS username and password, and an Internet connection. Note: If you do not have a UMLS username and password, you may request one at UMLS Terminology Services.

In order to use the complete UMLS dictionaries shipped with cTAKES you will need to do two things:

(1) Update the DictionaryLookupAnnotatorUMLS.xml Analysis Engine file with your UMLS username and password. Change the UMLSUser and UMLSPW <nameValuePair> strings in the descriptor files listed below to your UMLS username and password.

  • Dictionary Lookup: <cTAKES_HOME>/dictionary lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
  • (optional) Drug NER: <cTAKES_HOME>/Drug NER/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml

The following shows where in the files you would make the changes. (Do not change the <configurationParameters> by the same name.)

(2) Include the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within your aggregate Analysis Engine or switch to the ones provided by cTAKES. cTAKES has provided duplicates of shipped Analysis Engine descriptors, put UMLS in the name, and placed DictionaryLookupAnnotatorUMLS.xml within them for these components:

  • Dictionary Lookup
  • Clinical Documents pipeline
  • Drug NER
  • Side Effect

So you simply need to switch to using those descriptors. For example, if you were using AggregateCdaProcessor.xml in Clinical Documents pipeline you would switch to using AggregateCdaUMLSProcessor.xml instead and you will now hook into the complete dictionaries.

You can, of course, modify your own aggregate Analysis Engine files and place the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within them.
Since this is an in-memory database implementation, please be patient during the initial load as it could take approximately 20-30 seconds for the database to initialize.

If you would like to go back to using the small sample dictionaries that do not require a UMLS username, use the DictionaryLookupAnnotator.xml (UMLS is not in the file name) Analysis Engine descriptor in your aggregate. Removing your password from the DictionaryLookupAnnotatorUMLS.xml files will not work.

LVG

We have successfully tested the 2008 release of the full LVG data. In order to use this release of the full LVG data you should:

  1. Download either the full version or the lite version from NIH Lexical Tools
  2. Extract the TGZ file that you downloaded with a tool like 7-zip (available online) to a temporary directory. On some operating systems, like Windows, this may need to be done in two steps: (1) decompress the file and (2) unpack the resulting archive.
  3. Replace the directory <cTAKES_HOME>/LVG/resources/lvg/data/HSqlDb with data/HSqlDb from your extracted download. Replacing the entire directory is appropriate.
  4. In the future, you can upgrade to later versions of LVG by editing the <cTAKES_HOME>/LVG/resources/lvg/data/config/lvg.properties file, replacing "lvg2008" with the name of the new release.
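On Linux, the directory replacement in step 3 could be scripted as follows. This is a sketch under assumptions: replace_lvg_db is a hypothetical helper, and its second argument is assumed to be the directory inside your extracted download that contains data/HSqlDb. The old directory is deleted, so keep a copy first if you may want to revert.

```shell
#!/bin/sh
# replace_lvg_db CTAKES_HOME LVG_DOWNLOAD_ROOT
# Replaces <cTAKES_HOME>/LVG/resources/lvg/data/HSqlDb with
# LVG_DOWNLOAD_ROOT/data/HSqlDb. The existing directory is removed.
replace_lvg_db() {
    target="$1/LVG/resources/lvg/data/HSqlDb"
    new_db="$2/data/HSqlDb"
    if [ ! -d "$new_db" ]; then
        echo "expected directory not found: $new_db" >&2
        return 1
    fi
    mkdir -p "$(dirname "$target")" || return 1
    rm -rf "$target" || return 1
    cp -r "$new_db" "$target"
}

# Example usage (commented out; the second path is a placeholder):
#   replace_lvg_db "$UIMA_HOME" /tmp/<extracted LVG directory>
```

The function refuses to delete anything when the new data directory is missing, which guards against a mistyped download path.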

Building Your Own Dictionaries

To install customized dictionaries for RxNorm, SNOMED-CT, or other vocabularies that are available through the UMLS, see the following posts on the cTAKES forums:

Models

Some models included in cTAKES may not represent your data distribution well. If you want to build or train your own models, please read the cTAKES 1.3 User Guide, particularly:

  • Training a sentence detector model
  • Building a part-of-speech (POS) tagger model (Building a model Obtaining training data)
  • Building a part-of-speech (POS) tag dictionary (Building_a_tag_dictionary)
  • Building a chunker model (Building a model Prepare GENIA training data)
  • Training a dependency parser (Dependency Parser (optional))