National Cancer Institute U.S. National Institutes of Health www.cancer.gov

cTAKES 2.0 User Install Instructions


These instructions are for end users. With these instructions you can install cTAKES, configure it, and use it to process text (typically text associated with a medical record). If you plan to extend, change, or modify the code behind cTAKES, refer to the cTAKES 2.0 Developer Install Instructions.

These instructions cover installing and testing the main product, including trained models for sentence detection and part-of-speech tagging, sample dictionaries, and a small subset of the full LVG resource. Optional components are also described; if you do not want to use them, you can skip those sections.

Once you have finished installing cTAKES, you will be able to see what cTAKES is capable of. To take fuller advantage of the software, you may need a few additional steps concerning which dictionaries are used. These steps come last in these instructions.

Prerequisites


1. Make sure you have Java 1.6 or higher. Most systems come with Java already installed, so you may only need to check.
Run this command to check your version.

If you do not have Java 1.6 or higher, you can install it from java.com.
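From a command prompt, `java -version` reports the installed version. If you want the same check inside a setup script, the following is a minimal Java sketch (the class and method names are illustrative, not part of cTAKES). It reads the standard `java.version` system property and assumes the two common numbering schemes: the legacy `1.x` form used up to Java 1.8 (for example `1.6.0_45`) and the `N.x.y` form used from Java 9 on (for example `11.0.2`).

```java
public class JavaVersionCheck {

    /** Returns the major Java version, e.g. 6 for "1.6.0_45" or 11 for "11.0.2". */
    static int majorVersion(String version) {
        String s = version.trim();
        // Legacy scheme (Java 1.0 to 1.8): the major version is the second component.
        if (s.startsWith("1.")) {
            s = s.substring(2);
        }
        // Keep only the leading digits, e.g. "6.0_45" -> "6", "17-ea" -> "17".
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("Unrecognized Java version: " + version);
        }
        return Integer.parseInt(s.substring(0, end));
    }

    /** True if the version meets the Java 1.6 minimum stated in these instructions. */
    static boolean meetsMinimum(String version) {
        return majorVersion(version) >= 6;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("Java version: " + version);
        if (meetsMinimum(version)) {
            System.out.println("OK: Java 1.6 or higher is installed.");
        } else {
            System.out.println("Java is older than 1.6; install a newer version from java.com.");
        }
    }
}
```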

Install cTAKES


1. Navigate to the source downloads for a released version on SourceForge


2. Download the cTAKES-2.0.zip file.
Save the file to a temporary location on your machine.


3. Unzip (extract the contents of) the downloaded file into the directory you want to use as the cTAKES install location.
For example, Windows:


Linux:


We will call this folder <cTAKES_HOME>. You will need to refer to it later.
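Any archive tool works for this step. If you would rather script the extraction, the following is a minimal Java sketch using the JDK's `java.util.zip` package. The class name `UnzipCtakes` and the default paths `cTAKES-2.0.zip` and `cTAKES` are placeholders, not names cTAKES requires. The sketch also refuses archive entries whose paths would escape the target directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnzipCtakes {

    /** Extracts every entry of zipFile into targetDir, creating directories as needed. */
    static void unzip(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Unsafe zip entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry: the stream reports end-of-data at its end.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths: pass the downloaded zip and the desired <cTAKES_HOME> instead.
        Path zipFile = Paths.get(args.length > 0 ? args[0] : "cTAKES-2.0.zip");
        Path ctakesHome = Paths.get(args.length > 1 ? args[1] : "cTAKES");
        unzip(zipFile, ctakesHome);
        System.out.println("Extracted to " + ctakesHome.toAbsolutePath());
    }
}
```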


Process documents using cTAKES

This version allows you to test most components bundled in cTAKES in two different ways:

  1. Using the cTAKES CAS Visual Debugger (CVD) to run the annotators or to view results stored as XCAS files, or
  2. Using the cTAKES collection processing engine (CPE) to process the documents in the cTAKES_HOME/testdata directory.

CAS Visual Debugger (CVD)


1. Open a command prompt and change to the cTAKES_HOME directory.
Windows:


Linux:

Note

cTAKES_HOME must be your current directory unless you are comfortable setting paths on your machine.

2. Start the CAS Visual Debugger by running this command:
Windows:


Linux:


The application may take a minute to start on slower hardware.


3. To process text, you must first load an analysis engine (AE).
From the menu bar, click Run -> Load AE, and navigate to the file

Click Open.


4. Copy the example text below and paste it into the Text section of CVD, replacing the text that is already there.
This example file can also be found in the test data:

Dr. Nutritious

Medical Nutrition Therapy for Hyperlipidemia

Referral from: Julie Tester, RD, LD, CNSD
Phone contact: (555) 555-1212
Height: 144 cm Current Weight: 45 kg Date of current weight: 02-29-2001
Admit Weight: 53 kg BMI: 18 kg/m2
Diet: General
Daily Calorie needs (kcals): 1500 calories, assessed as HB + 20% for activity.
Daily Protein needs: 40 grams, assessed as 1.0 g/kg.
Pt has been on a 3-day calorie count and has had an average intake of 1100 calories.
She was instructed to drink 2-3 cans of liquid supplement to help promote weight gain.
She agrees with the plan and has my number for further assessment. May want a Resting
Metabolic Rate as well. She takes an aspirin a day for knee pain.

5. From the menu bar, click Run -> Run AggregatePlaintextProcessor.

A list of all the annotations appears in the Analysis Results frame.


6. Named entities are now recognized in this clinical document: annotations of type MedicationEventMention and EntityMention have been created. To find them, in the Analysis Results frame, click the key in front of each entry in this hierarchy:
AnnotationIndex
  uima.tcas.Annotation
    edu.mayo.bmi.uima.core.type.textsem.IdentifiedAnnotation
      edu.mayo.bmi.uima.core.type.textsem.EntityMention
      edu.mayo.bmi.uima.core.type.textsem.EventMention
        edu.mayo.bmi.uima.core.type.textsem.EventMention.MedicationEventMention

Then select edu.mayo.bmi.uima.core.type.textsem.EntityMention or edu.mayo.bmi.uima.core.type.textsem.EventMention.MedicationEventMention. An Annotation Index appears in the lower frame. Select any annotation in that lower frame to see the text it covers highlighted in the Text frame on the right. You may then close CVD.
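In UIMA, an annotation records a begin offset (inclusive) and an end offset (exclusive) into the document text, and CVD highlights the characters between them when you select it; UIMA's `getCoveredText()` returns that substring. The sketch below models the idea with a hypothetical `SimpleAnnotation` class, not the UIMA or cTAKES API, using a sentence from the example note. The offsets are computed with `indexOf` purely for illustration, and the type labels are illustrative; in cTAKES the annotations come from the pipeline and its dictionaries.

```java
import java.util.ArrayList;
import java.util.List;

public class CoveredTextDemo {

    /** Hypothetical stand-in for a UIMA annotation: a type name plus character offsets. */
    static final class SimpleAnnotation {
        final String type;
        final int begin; // inclusive offset into the document text
        final int end;   // exclusive offset into the document text

        SimpleAnnotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** Same idea as UIMA's getCoveredText(): the text in [begin, end). */
        String coveredText(String documentText) {
            return documentText.substring(begin, end);
        }
    }

    /** Annotates the first occurrence of term, or returns null if the term is absent. */
    static SimpleAnnotation annotate(String text, String type, String term) {
        int begin = text.indexOf(term);
        return begin < 0 ? null : new SimpleAnnotation(type, begin, begin + term.length());
    }

    public static void main(String[] args) {
        String text = "She takes an aspirin a day for knee pain.";
        List<SimpleAnnotation> index = new ArrayList<>();
        // Both terms occur in the text, so neither call returns null here.
        index.add(annotate(text, "MedicationEventMention", "aspirin"));
        index.add(annotate(text, "EntityMention", "knee pain"));
        for (SimpleAnnotation a : index) {
            System.out.println(a.type + " [" + a.begin + ", " + a.end + "): " + a.coveredText(text));
        }
    }
}
```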


Collection processing engine (CPE)


1. Open a command prompt and change to the cTAKES_HOME directory:
Windows:


Linux:

Note

cTAKES_HOME must be your current directory unless you are comfortable setting paths on your machine.

2. Start the collection processing engine by running this command:
Windows:


Linux:


The application may take a minute to start on slower hardware.


3. The Collection Processing Engine Configurator opens. From the menu bar, click File -> Open CPE Descriptor.


4. Navigate to the file

Click Open.


5. Click the Play button (green/blue play arrow near the bottom).


6. You should see that one document was processed. The CPE processes a collection of documents; in this case the collection contains only one document, to show how it works. Close the results window.

7. Close the CPE application. If you are prompted to save changes, you can click No, since this was just a test.


8. Open a new command prompt and change to the <cTAKES_HOME> directory.

9. To check the results, use the comparison tool, which shows whether the results match expectations. Its syntax is:

Where <First File> is the first file to compare, <Second File> is the second file to compare, and <diff-html> is the file the results are written to.

Copy the example below, in which our example files have already been substituted, and paste it into a command prompt to run it. cTAKES ships an example of the expected output for you to compare against.

Windows:

Linux:

10. The resulting file will open. Review the comparison to see the annotations produced by this pipeline.
Windows:

Linux:

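Use the comparison tool shipped with cTAKES for this step. To illustrate what such a comparison does, the following hypothetical Java sketch (not the cTAKES tool, and not its report format) compares two files line by line and writes an HTML table that highlights the lines that differ. It compares lines by position rather than computing a true diff, so a single inserted line makes every later line show as different.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class CompareOutputs {

    /** Escapes the characters that are special in HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Compares two files line by line (by position, not a true diff), writes an HTML
     * table to report, and returns the number of line positions that differ.
     */
    static int compare(Path first, Path second, Path report) throws IOException {
        List<String> a = Files.readAllLines(first, StandardCharsets.UTF_8);
        List<String> b = Files.readAllLines(second, StandardCharsets.UTF_8);
        StringBuilder html = new StringBuilder("<html><body><table border=\"1\">\n");
        html.append("<tr><th>Line</th><th>First file</th><th>Second file</th></tr>\n");
        int differences = 0;
        for (int i = 0; i < Math.max(a.size(), b.size()); i++) {
            String left = i < a.size() ? a.get(i) : "";
            String right = i < b.size() ? b.get(i) : "";
            // A line missing from one file always counts as a difference.
            boolean same = i < a.size() && i < b.size() && left.equals(right);
            if (!same) {
                differences++;
            }
            html.append(same ? "<tr>" : "<tr style=\"background:#fdd\">")
                .append("<td>").append(i + 1).append("</td>")
                .append("<td>").append(escape(left)).append("</td>")
                .append("<td>").append(escape(right)).append("</td></tr>\n");
        }
        html.append("</table></body></html>\n");
        Files.write(report, html.toString().getBytes(StandardCharsets.UTF_8));
        return differences;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("Usage: java CompareOutputs <First File> <Second File> <diff-html>");
            return;
        }
        int n = compare(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
        System.out.println(n == 0 ? "Files match." : n + " line(s) differ; see " + args[2]);
    }
}
```

The hypothetical invocation mirrors the argument order of the cTAKES tool: `java CompareOutputs <First File> <Second File> <diff-html>`.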

Using CVD and CPE as described above, you can also test the other components. The example analysis engines, collection processing engines, and test data that cTAKES ships for several of its annotators are listed below.

Clinical Document Pipeline (abbreviated: cdp)
The complete cTAKES pipeline, used to obtain the majority of cTAKES annotations.
  • Example Analysis Engine (AE): cTAKES_HOME/cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextProcessor.xml
  • Example Collection Processing Engine (CPE): cTAKES_HOME/cTAKESdesc/cdpdesc/collection_processing_engine/test_plaintext.xml
  • Example test data: cTAKES_HOME/testdata/cdptest

Chunker (abbreviated: chunker)
Obtains cTAKES chunking annotations.
  • Example Analysis Engine (AE): cTAKES_HOME/cTAKESdesc/chunkerdesc/analysis_engine/ChunkerAggregate.xml
  • Example Collection Processing Engine (CPE): cTAKES_HOME/cTAKESdesc/chunkerdesc/collection_processing_engine/ChunkerCPE.xml
  • Example test data: cTAKES_HOME/testdata/chunkertest

Dependency Parser (abbreviated: dp)
Obtains the dependency parse tree.
  • Example Analysis Engine (AE): cTAKES_HOME/cTAKESdesc/dpdesc/analysis_engine/ClearParserTokenizedInfPosAggregate.xml
  • Example Collection Processing Engine (CPE): cTAKES_HOME/cTAKESdesc/dpdesc/collection_processing_engine/ClearParserCPE.xml
  • Example test data: cTAKES_HOME/testdata/dptest

Drug NER (abbreviated: drugner)
The annotator that obtains drug annotations.
  • Example Analysis Engine (AE): cTAKES_HOME/cTAKESdesc/drugnerdesc/analysis_engine/DrugAggregatePlaintextProcesor.xml
  • Example Collection Processing Engine (CPE): cTAKES_HOME/cTAKESdesc/drugnerdesc/collection_processing_engine/DrugNER_PlainText_CPE.xml
  • Example test data: cTAKES_HOME/testdata/drugnertest

Dictionary Lookup (abbreviated: lookup)
Maps cTAKES annotations to dictionaries (e.g., SNOMED-CT or RxNorm).
  • Example Analysis Engine (AE): cTAKES_HOME/cTAKESdesc/lookupdesc/analysis_engine/TestAggregateTAE.xml
  • Example Collection Processing Engine (CPE): cTAKES_HOME/cTAKESdesc/lookupdesc/collection_processing_engine/LookupCPE.xml
  • Example test data: cTAKES_HOME/testdata/lookuptest

PAD Term Spotter (abbreviated: pad)
Identifies terms related to PAD (peripheral arterial disease).
  • Example Analysis Engine (AE): cTAKES_HOME/cTAKESdesc/paddesc/analysis_engine/Radiology_TermSpotterAnnotatorTAE.xml
  • Example Collection Processing Engine (CPE): cTAKES_HOME/cTAKESdesc/paddesc/collection_processing_engine/Radiology_Sample.xml
  • Example test data: cTAKES_HOME/testdata/padtest

Smoking Status (abbreviated: smoking)
The annotator that obtains document-level or patient-level smoking status.
  • Example Analysis Engine (AE): cTAKES_HOME/cTAKESdesc/smokingdesc/analysis_engine/SimulatedProdSmokingTAE.xml
  • Example Collection Processing Engine (CPE): cTAKES_HOME/cTAKESdesc/smokingdesc/collection_processing_engine/Sample_SmokingStatus_output_flatfile.xml
  • Example test data: cTAKES_HOME/testdata/smokingtest

Side Effect (abbreviated: sideeffect)
The annotator that finds side effect mentions and sentences in clinical documents.
  • Example Analysis Engine (AE): cTAKES_HOME/cTAKESdesc/sideeffectdesc/analysis_engine/SideEffectAggregateTAE.xml
  • Example Collection Processing Engine (CPE): cTAKES_HOME/cTAKESdesc/sideeffectdesc/collection_processing_engine/SideEffectCPE.xml
  • Example test data: cTAKES_HOME/testdata/sideeffecttest
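The paths listed for each annotator follow a regular pattern: the abbreviation names the descriptor directory (cTAKESdesc/<abbreviation>desc, with analysis_engine and collection_processing_engine subdirectories) and the test data directory (testdata/<abbreviation>test). The following Java sketch encodes that observed pattern so a script can resolve the example AE, CPE, and test data locations under a given <cTAKES_HOME>. The class and method names are hypothetical, and the pattern is inferred from the paths in this guide rather than documented as a cTAKES rule, so it only covers the annotators listed here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class CtakesExamples {

    /** Example AE and CPE descriptor file names for one annotator. */
    static final class Example {
        final String aeFile;
        final String cpeFile;

        Example(String aeFile, String cpeFile) {
            this.aeFile = aeFile;
            this.cpeFile = cpeFile;
        }
    }

    /** File names as listed in this guide, keyed by annotator abbreviation. */
    static final Map<String, Example> EXAMPLES = new LinkedHashMap<>();
    static {
        EXAMPLES.put("cdp", new Example("AggregatePlaintextProcessor.xml", "test_plaintext.xml"));
        EXAMPLES.put("chunker", new Example("ChunkerAggregate.xml", "ChunkerCPE.xml"));
        EXAMPLES.put("dp", new Example("ClearParserTokenizedInfPosAggregate.xml", "ClearParserCPE.xml"));
        EXAMPLES.put("drugner", new Example("DrugAggregatePlaintextProcesor.xml", "DrugNER_PlainText_CPE.xml"));
        EXAMPLES.put("lookup", new Example("TestAggregateTAE.xml", "LookupCPE.xml"));
        EXAMPLES.put("pad", new Example("Radiology_TermSpotterAnnotatorTAE.xml", "Radiology_Sample.xml"));
        EXAMPLES.put("smoking", new Example("SimulatedProdSmokingTAE.xml", "Sample_SmokingStatus_output_flatfile.xml"));
        EXAMPLES.put("sideeffect", new Example("SideEffectAggregateTAE.xml", "SideEffectCPE.xml"));
    }

    static Example lookup(String abbr) {
        Example e = EXAMPLES.get(abbr);
        if (e == null) {
            throw new IllegalArgumentException("Unknown annotator: " + abbr);
        }
        return e;
    }

    /** <cTAKES_HOME>/cTAKESdesc/<abbr>desc/analysis_engine/<AE file> */
    static Path analysisEngine(Path home, String abbr) {
        return home.resolve("cTAKESdesc").resolve(abbr + "desc")
                .resolve("analysis_engine").resolve(lookup(abbr).aeFile);
    }

    /** <cTAKES_HOME>/cTAKESdesc/<abbr>desc/collection_processing_engine/<CPE file> */
    static Path cpe(Path home, String abbr) {
        return home.resolve("cTAKESdesc").resolve(abbr + "desc")
                .resolve("collection_processing_engine").resolve(lookup(abbr).cpeFile);
    }

    /** <cTAKES_HOME>/testdata/<abbr>test */
    static Path testData(Path home, String abbr) {
        lookup(abbr); // reject unknown abbreviations
        return home.resolve("testdata").resolve(abbr + "test");
    }

    public static void main(String[] args) {
        Path home = Paths.get(args.length > 0 ? args[0] : "cTAKES_HOME");
        for (String abbr : EXAMPLES.keySet()) {
            System.out.println(abbr + ":");
            System.out.println("  AE:        " + analysisEngine(home, abbr));
            System.out.println("  CPE:       " + cpe(home, abbr));
            System.out.println("  Test data: " + testData(home, abbr));
        }
    }
}
```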

Next steps

The cTAKES 2.0 Component Use Guide describes each of the installed cTAKES components in detail and, in some cases, explains how to improve them. Before you process text in production, however, you will need to consider dictionaries and models.

Dictionaries

Bundled UMLS Dictionaries

cTAKES 2.0 includes complete UMLS-derived dictionaries (SNOMED-CT and RxNorm):

  • An rxnorm_index database (a Lucene index) containing drug names from RxNorm
  • A UMLS database (using two hsqldb tables) containing anatomical sites, procedures, signs/symptoms, and disorders/diseases from SNOMED-CT (umls_ms_2011ab)

To use them, you must have a UMLS username and password, and an Internet connection.

Note

If you do not have a UMLS username and password, you may request one at UMLS Terminology Services.

In order to use the complete UMLS dictionaries shipped with cTAKES you will need to do two things:

(1) Update the DictionaryLookupAnnotatorUMLS.xml Analysis Engine descriptors with your UMLS username and password: in each descriptor file listed below, replace the UMLSUser and UMLSPW <nameValuePair> strings with your UMLS username and password.

  • Dictionary Lookup: <cTAKES_HOME>/cTAKESdesc/lookupdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
  • (optional) Drug NER: <cTAKES_HOME>/cTAKESdesc/drugnerdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml

The following shows where in the files you would make the changes. (Do not change the <configurationParameters> entries with the same names.)
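You can make these edits in any text editor. If you have several descriptors to update, a small program can set the values for you. The following Java sketch, with a hypothetical class name and command-line arguments, assumes the usual UIMA descriptor layout, in which each setting is a <nameValuePair> element containing a <name> element and a <value><string>your value</string></value> element. It changes only <nameValuePair> elements, so the <configurationParameters> declarations with the same names are left alone. Rewriting the file with the JDK's XML transformer may change its formatting, so keep a backup.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SetUmlsCredentials {

    /**
     * Sets the <string> value inside every <nameValuePair> whose <name> equals paramName
     * and returns how many were updated. Parameter declarations under
     * <configurationParameters> are not <nameValuePair> elements, so they are untouched.
     */
    static int setNameValuePair(Document doc, String paramName, String newValue) {
        int updated = 0;
        NodeList pairs = doc.getElementsByTagName("nameValuePair");
        for (int i = 0; i < pairs.getLength(); i++) {
            Element pair = (Element) pairs.item(i);
            NodeList names = pair.getElementsByTagName("name");
            if (names.getLength() == 0
                    || !paramName.equals(names.item(0).getTextContent().trim())) {
                continue;
            }
            NodeList strings = pair.getElementsByTagName("string");
            if (strings.getLength() > 0) {
                strings.item(0).setTextContent(newValue);
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: java SetUmlsCredentials <descriptor.xml> <UMLS user> <UMLS password>");
            return;
        }
        File descriptor = new File(args[0]);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(descriptor);
        int updated = setNameValuePair(doc, "UMLSUser", args[1])
                + setNameValuePair(doc, "UMLSPW", args[2]);
        // Write the modified document back over the original file.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(descriptor));
        System.out.println("Updated " + updated + " value(s) in " + descriptor);
    }
}
```

Passing a password as a command-line argument can expose it to other users of the same machine, so adapt the sketch (for example, read the password from the console) if that matters in your environment.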

(2) Include the DictionaryLookupAnnotatorUMLS.xml Analysis Engine in your aggregate Analysis Engine, or switch to the UMLS aggregates provided by cTAKES. For the following components, cTAKES provides copies of the shipped aggregate Analysis Engine descriptors that have UMLS in their names and include DictionaryLookupAnnotatorUMLS.xml:

  • Dictionary Lookup
  • Clinical Documents pipeline
  • Drug NER
  • Side Effect

For these components, you simply switch to those descriptors. For example, if you were using AggregateCdaProcessor.xml in the Clinical Documents pipeline, switch to AggregateCdaUMLSProcessor.xml to use the complete dictionaries.

Alternatively, you can modify your own aggregate Analysis Engine descriptors to include the DictionaryLookupAnnotatorUMLS.xml Analysis Engine.

Because this is an in-memory database implementation, please be patient during the initial load: the database can take approximately 20-30 seconds to initialize.

If you would like to go back to the small sample dictionaries that do not require a UMLS username, use the DictionaryLookupAnnotator.xml Analysis Engine descriptor (without UMLS in the file name) in your aggregate. Removing your password from the DictionaryLookupAnnotatorUMLS.xml files will not work.

LVG

We have successfully tested the 2008 release of the full LVG data. To use this release of the full LVG data:

  1. Download either the full version or the lite version from NIH Lexical Tools
  2. Extract the downloaded TGZ file to a temporary directory with a tool such as 7-Zip. On some operating systems, such as Windows, this may take two steps: first decompress the .tgz (gzip) file, then extract the resulting .tar archive.
  3. Replace the entire directory <cTAKES_HOME>/resources/lvgresources/lvg/data/HSqlDb with the data/HSqlDb directory from your extracted download.
  4. In the future, you can upgrade to later versions of LVG by editing the <cTAKES_HOME>/resources/lvgresources/lvg/data/config/lvg.properties file, replacing "lvg2008" with the name of the new release.
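Steps 3 and 4 can also be scripted. The following Java sketch, with hypothetical class and method names, deletes the bundled HSqlDb directory, copies data/HSqlDb from your extracted download in its place, and, only if you pass a new release name, performs the plain-text substitution of "lvg2008" described in step 4. It assumes the directory you pass in is the one that contains data/HSqlDb after extraction, and that a simple substring replacement in lvg.properties is what you want. Back up <cTAKES_HOME>/resources/lvgresources before running anything like this.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UpdateLvg {

    /** Deletes target (if it exists) and copies the source directory tree in its place. */
    static void replaceDirectory(Path source, Path target) throws IOException {
        if (Files.exists(target)) {
            List<Path> old;
            try (Stream<Path> walk = Files.walk(target)) {
                // Reverse order visits children before their parent directories.
                old = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : old) {
                Files.delete(p);
            }
        }
        List<Path> fresh;
        try (Stream<Path> walk = Files.walk(source)) {
            // Pre-order: each directory appears before its contents.
            fresh = walk.collect(Collectors.toList());
        }
        for (Path p : fresh) {
            Path dest = target.resolve(source.relativize(p).toString());
            if (Files.isDirectory(p)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    /** Replaces every occurrence of oldRelease with newRelease in the given file. */
    static void renameRelease(Path file, String oldRelease, String newRelease) throws IOException {
        // ISO-8859-1 is the traditional .properties encoding and round-trips every byte.
        String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        Files.write(file, text.replace(oldRelease, newRelease).getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java UpdateLvg <cTAKES_HOME> <extracted LVG directory> [new release name]");
            return;
        }
        Path lvgData = Paths.get(args[0], "resources", "lvgresources", "lvg", "data");
        // Step 3: replace the bundled HSqlDb directory with the downloaded one.
        replaceDirectory(Paths.get(args[1], "data", "HSqlDb"), lvgData.resolve("HSqlDb"));
        // Step 4 (only when moving to a release other than lvg2008): rename the release.
        if (args.length > 2) {
            renameRelease(lvgData.resolve("config").resolve("lvg.properties"), "lvg2008", args[2]);
        }
        System.out.println("LVG data updated under " + lvgData);
    }
}
```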

Building Your Own Dictionaries

To install customized dictionaries for RxNorm, SNOMED-CT, or other vocabularies that are available through the UMLS, see the following posts on the cTAKES forums:

Models

Some models included in cTAKES may not represent your data distribution well. If you want to build or train your own models, please read the cTAKES 2.0 Component Use Guide, particularly:

  • Training a sentence detector model
  • Building a part-of-speech (POS) tagger model (see the sections Building a model and Obtaining training data)
  • Building a part-of-speech (POS) tag dictionary (see the section Building a tag dictionary)
  • Building a chunker model (see the sections Building a model and Prepare GENIA training data)
  • Training a dependency parser (see the section Dependency Parser (optional))