
cTAKES 1.2.2 User Install Instructions

These icTAKES installation instructions are for end users. With these instructions you can install icTAKES (also called cTAKES 1.2.2), configure it, and use it to process text (typically text associated with a medical record). If you plan to extend or modify the code behind icTAKES, go back to the overview and select the appropriate developer install instructions instead.

These instructions will cover installation and a test of the main product including trained models for sentence detection, tagging parts of speech, sample dictionaries, a small subset of the full LVG resource, etc. Optional components will also be described. If you do not want to utilize these components you can skip that section.

Once you have finished installing icTAKES, you will be able to see what it is capable of. Making fuller use of the software requires a few additional steps to choose which dictionaries are used. These are the last steps in these instructions.

Prerequisites

Before getting started with the actual installation of icTAKES, you must have:

  • Java VM version 1.6+

Step

Example

1. Make sure you have the proper version of Java. Most systems come with Java already installed. You simply need to check if you have the proper version.
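
A quick way to check the installed version from a Linux shell is `java -version`, which prints a line such as `java version "1.6.0_45"`. The sketch below wraps that check. The function name `java_version_ok` is a hypothetical helper and is not part of icTAKES. On Windows, running `java -version` in a command prompt and reading the result serves the same purpose.

```shell
# Hypothetical helper (not part of icTAKES): succeeds when a Java version
# string such as "1.6.0_45" or "11.0.2" is 1.6 or newer.
java_version_ok() {
  major=${1%%.*}              # text before the first dot
  rest=${1#*.}                # text after the first dot
  minor=${rest%%.*}           # second numeric field
  major=${major%%[!0-9]*}     # drop suffixes such as "-ea"
  minor=${minor%%[!0-9]*}
  [ -n "$major" ] || return 1
  [ "$major" -gt 1 ] && return 0
  [ "$major" -eq 1 ] && [ "${minor:-0}" -ge 6 ]
}

# java -version writes to stderr, so redirect it before extracting the
# quoted version string.
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  if java_version_ok "$ver"; then
    echo "Java $ver is new enough for icTAKES"
  else
    echo "Java $ver is older than 1.6"
  fi
else
  echo "java was not found on the PATH"
fi
```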

Install icTAKES

Installation is now simply a download and unzip. icTAKES is an initiative of the Mayo NLP Program to make cTAKES easy to use for end users.

Step

Example

1. Navigate to the source downloads for the released version on SourceForge.

icTAKES is about 180 MB

2. Download the latest version. Select the file to download based on your operating system.

Windows Download the icTAKES.zip file.
Linux Download the icTAKES.tar.gz file.
Save the file to a temporary location on your machine.

screenshot to illustrate step

3. Unzip (extract the contents of) the compressed file you downloaded into a directory that you want to be the icTAKES (also called cTAKES1.2.2) install location. For example:

Windows c:\cTAKES1.2.2
Linux /usr/bin/cTAKES1.2.2

Note

The extracted files include a top-level folder named icTAKES inside the directory you extracted to. These instructions refer to that folder as <icTAKES_HOME>. You will need to refer to this directory later. For example:
Windows c:\cTAKES1.2.2\icTAKES
Linux /usr/bin/cTAKES1.2.2/icTAKES
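
On Linux, the extract step can be sketched as below. The helper name `extract_ictakes` is hypothetical, and the archive location and install directory in the usage comment are only the examples used on this page. Writing under /usr/bin normally requires administrator rights, so a directory you own may be a more convenient choice.

```shell
# Hypothetical helper: unpack a .tar.gz archive into an install directory
# and print the icTAKES directory it contains (<icTAKES_HOME>).
extract_ictakes() {
  archive=$1
  install_dir=$2
  mkdir -p "$install_dir" || return 1
  tar -xzf "$archive" -C "$install_dir" || return 1
  if [ -d "$install_dir/icTAKES" ]; then
    echo "$install_dir/icTAKES"
  else
    echo "no icTAKES directory found under $install_dir" >&2
    return 1
  fi
}

# Example usage (paths are this page's examples; adjust to your machine):
#   ICTAKES_HOME=$(extract_ictakes /tmp/icTAKES.tar.gz /usr/bin/cTAKES1.2.2)
```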

Process documents using icTAKES

This version allows you to test most components bundled in icTAKES in two different ways:

  1. Using the icTAKES CAS Visual Debugger (CVD) to view results stored as XCAS files or to run the annotators, or
  2. Using the icTAKES collection processing engine (CPE) to process documents in the <icTAKES_HOME>/testdata directory

Each method is described in the next sets of steps.

CAS Visual Debugger (CVD)

Step

Example

1. Open a command prompt and change to the icTAKES_HOME directory. For example:
Windows cd c:\cTAKES1.2.2\icTAKES
Linux cd /usr/bin/cTAKES1.2.2/icTAKES

Note

icTAKES_HOME must be your current directory unless you are skilled at setting paths on your machine.

2. Start the CAS Visual Debugger by running this command:

For example:
Windows runctakesCVD.bat
Linux runctakesCVD.sh

The application may take a minute to start on slower hardware.
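
Because the launch scripts expect icTAKES_HOME to be the current directory, one option on Linux is a small wrapper that changes directory before starting them. This is only a sketch: `run_in_ictakes_home` is a hypothetical name, and it assumes the launch scripts sit directly in icTAKES_HOME and can be run with `sh`.

```shell
# Hypothetical wrapper: run a launch script with icTAKES_HOME as the current
# directory. The subshell keeps the caller's working directory unchanged.
run_in_ictakes_home() {
  home=$1
  script=$2
  if [ ! -f "$home/$script" ]; then
    echo "cannot find $script in $home" >&2
    return 1
  fi
  ( cd "$home" && sh "./$script" )
}

# Example usage (path is this page's example):
#   run_in_ictakes_home /usr/bin/cTAKES1.2.2/icTAKES runctakesCVD.sh
```

The same wrapper would work for runctakesCPE.sh, since both tools are started the same way.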

screenshot to illustrate step

3. An analysis engine (AE) needs to be loaded in order to process text. Use the Run > Load AE menu bar command. Navigate to the file <icTAKES_HOME>/cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextProcessor.xml. Click Open.

screenshot to illustrate step

4. Copy the text at the right and paste the contents into the Text section of CVD, replacing the text that is already there.

This example file can also be found in test data: <icTAKES_HOME>/testdata/cdptest/testinput/plaintext/testpatient_plaintext_1.txt

5. From the menu bar, click Run > Run AggregatePlaintextProcessor.

You'll get a list of all the annotations in the Analysis Results frame.

screenshot to illustrate step

6. Named entities are now recognized in this clinical document. To find one, in the Analysis Results frame, click on the key in front of:

  • AnnotationIndex
  • uima.tcas.Annotation
  • edu.mayo.bmi.uima.core.type.IdentifiedAnnotation
  • edu.mayo.bmi.uima.core.type.NamedEntity

    Then select edu.mayo.bmi.uima.core.type.NamedEntity itself. This will show an Annotation Index in the lower frame. Select any NamedEntity in that frame and you will see the text that was discovered shown in the Text frame.

screenshot to illustrate step

Collection processing engine (CPE)

Step

Example

1. Open a command prompt and change to the icTAKES_HOME directory. For example:
Windows cd c:\cTAKES1.2.2\icTAKES
Linux cd /usr/bin/cTAKES1.2.2/icTAKES

Note

icTAKES_HOME must be your current directory unless you are skilled at setting paths on your machine.

2. Start the collection processing engine by running this command:

For example:
Windows runctakesCPE.bat
Linux runctakesCPE.sh

The application may take a minute to start on slower hardware.

screenshot to illustrate step

3. This will bring up the Collection Processing Engine Configurator. In the menu bar, click File > Open CPE Descriptor.

screenshot to illustrate step

4. Navigate to the file <icTAKES_HOME>/cTAKESdesc/cdpdesc/collection_processing_engine/test_plaintext.xml. Click Open.

screenshot to illustrate step

5. Click the Play button (green/blue play arrow near the bottom).

screenshot to illustrate step

6. You should see that one document was processed. The CPE processes a collection of documents. In this test the collection contains only one document, to show how the process works. Close the results window.

screenshot to illustrate step

7. Close the CPE application. You may be prompted to save changes. Since this was just a test you may click the No button.

screenshot to illustrate step

8. Open a new command prompt and change to the <icTAKES_HOME> directory.

No example for this step

9. Test the results. A comparison tool helps show that the results match expectations. It has the following syntax:
java -cp cTAKES.jar edu.mayo.bmi.utils.xcas_comparison.Compare <First File> <Second File> <diff-html>
where:
<First File> is the first file to compare
<Second File> is the second file to compare
<diff-html> is where the results are written to

Copy and paste the example which has had our example files already substituted into a command prompt to run.

Windows

Linux
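
As a sketch of the syntax above on Linux, the helper below checks that both input files exist before calling the comparison tool. It assumes the class name is edu.mayo.bmi.utils.xcas_comparison.Compare and that it is run from the directory containing cTAKES.jar. The helper name `compare_xcas` and every file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: compare two XCAS files and write an HTML diff.
# Assumes cTAKES.jar is in the current directory.
compare_xcas() {
  first=$1
  second=$2
  diff_html=$3
  for f in "$first" "$second"; do
    if [ ! -f "$f" ]; then
      echo "missing input file: $f" >&2
      return 1
    fi
  done
  java -cp cTAKES.jar edu.mayo.bmi.utils.xcas_comparison.Compare \
    "$first" "$second" "$diff_html"
}

# Example usage (hypothetical file names):
#   compare_xcas expected_output.xml my_output.xml diff-output
```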

10. The resulting file will open for you. Look at the comparison to see the annotations resulting from this pipeline.
Windows c:\stuff\diff-html.html
Linux /tmp/diff-html.html

screenshot to illustrate step

Using the same CVD and CPE programs in the manner described above, you can test all the other components. The analysis engines and collection processing engines shipped with cTAKES for some of the annotators are described as follows.

Annotator: Clinical Document Pipeline
Description: the complete cTAKES pipeline to obtain majority of cTAKES annotations
Abbreviated: cdp
Example Analysis Engine (AE): icTAKES_HOME/cdpdesc/analysis_engine/AggregatePlaintextProcessor.xml
Example Collection Processing Engine (CPE): icTAKES_HOME/cdpdesc/collection_processing_engine/test_plaintext.xml
Example testdata: icTAKES_HOME/testdata/cdptest

Annotator: Chunker
Description: obtain cTAKES chunking annotations
Abbreviated: chunker
Example Analysis Engine (AE): icTAKES_HOME/chunkerdesc/analysis_engine/ChunkerAggregate.xml
Example Collection Processing Engine (CPE): icTAKES_HOME/chunkerdesc/collection_processing_engine/ChunkerCPE.xml
Example testdata: icTAKES_HOME/testdata/chunkertest

Annotator: Dependency Parser
Description: obtain dependency parsing tree
Abbreviated: dp
Example Analysis Engine (AE): icTAKES_HOME/dpdesc/analysis_engine/ClearParserTokenizedInfPosAggregate.xml
Example Collection Processing Engine (CPE): icTAKES_HOME/dpdesc/collection_processing_engine/ClearParserCPE.xml
Example testdata: icTAKES_HOME/testdata/dptest

Annotator: Drug NER
Description: the annotator to obtain drug annotations
Abbreviated: drugner
Example Analysis Engine (AE): icTAKES_HOME/drugnerdesc/analysis_engine/DrugAggregatePlaintextProcesor.xml
Example Collection Processing Engine (CPE): icTAKES_HOME/drugnerdesc/collection_processing_engine/DrugNER_PlainText_CPE.xml
Example testdata: icTAKES_HOME/testdata/drugnertest

Annotator: Dictionary Lookup
Description: mapping cTAKES annotations to dictionaries (e.g., SNOMED_CT or RxNorm)
Abbreviated: lookup
Example Analysis Engine (AE): icTAKES_HOME/lookupdesc/analysis_engine/TestAggregateTAE.xml
Example Collection Processing Engine (CPE): icTAKES_HOME/lookupdesc/collection_processing_engine/LookupCPE.xml
Example testdata: icTAKES_HOME/testdata/lookuptest

Annotator: PAD Term Spotter
Description: identifying terms related to PAD
Abbreviated: pad
Example Analysis Engine (AE): icTAKES_HOME/paddesc/analysis_engine/Radiology_TermSpotterAnnotatorTAE.xml
Example Collection Processing Engine (CPE): icTAKES_HOME/paddesc/collection_processing_engine/Radiology_Sample.xml
Example testdata: icTAKES_HOME/testdata/padtest

Annotator: Smoking Status
Description: the annotator to obtain document or patient-level smoking status
Abbreviated: smoking
Example Analysis Engine (AE): icTAKES_HOME/smokingdesc/analysis_engine/SimulatedProdSmokingTAE.xml
Example Collection Processing Engine (CPE): icTAKES_HOME/smokingdesc/collection_processing_engine/Sample_SmokingStatus_output_flatfile.xml
Example testdata: icTAKES_HOME/testdata/smokingtest

Annotator: Side Effect
Description: the annotator to find side effect mentions and sentences from clinical documents
Abbreviated: sideeffect
Example Analysis Engine (AE): icTAKES_HOME/sideeffectdesc/analysis_engine/SideEffectAggregateTAE.xml
Example Collection Processing Engine (CPE): icTAKES_HOME/sideeffectdesc/collection_processing_engine/SideEffectCPE.xml
Example testdata: icTAKES_HOME/testdata/sideeffecttest
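
Before loading one of these descriptors, it can help to confirm that the file exists in your install. The Linux sketch below reports which of a list of relative paths are missing under icTAKES_HOME. `check_descriptors` is a hypothetical helper, and the paths in the usage comment are copied from the entries above.

```shell
# Hypothetical helper: print each relative path that is missing under
# icTAKES_HOME, and fail if any of them is missing.
check_descriptors() {
  home=$1
  shift
  missing=0
  for rel in "$@"; do
    if [ ! -f "$home/$rel" ]; then
      echo "missing: $rel"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example usage (paths taken from the entries above):
#   check_descriptors /usr/bin/cTAKES1.2.2/icTAKES \
#     cdpdesc/analysis_engine/AggregatePlaintextProcessor.xml \
#     chunkerdesc/analysis_engine/ChunkerAggregate.xml
```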

Next steps

The User Guide will help you to understand in great detail each of the cTAKES components that have been installed. In some cases you can learn how to improve the components. However, before you go on to process text in production you will need to consider that some of the dictionaries that come with icTAKES are small samples. It has been left to the user to load larger or different dictionaries.

The following components require special attention and will not work without a real dictionary:

  • clinical documents pipeline, the original main cTAKES aggregate descriptors (one for CDA and one for plaintext)
  • Drug NER
  • Side Effect

For example, we have successfully tested the 2008 release of the full LVG dictionary. In order to use this release of the full LVG dictionary you should:

  1. Download either the full version or the lite version from NIH Lexical Tools
  2. Extract the TGZ file that you downloaded with a tool like 7-zip (available online) to a temporary directory. On some operating systems, like Windows, this may need to be done in two steps: 1) decompress the .tgz file and 2) extract the resulting .tar archive.
  3. Replace the directory <icTAKES_HOME>/resources/lvgresources/lvg/data/HSqlDb with data/HSqlDb from your extracted download. Replacing the entire directory is appropriate.
  4. In the future, you can upgrade to later versions of LVG by editing the <icTAKES_HOME>/resources/lvgresources/lvg/data/config/lvg.properties file, replacing "lvg2008" with the name of the new release.
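
Steps 3 and 4 can be sketched for Linux as follows. Both helper names are hypothetical, the backup to HSqlDb.bak is an extra precaution that the steps above do not require, and "lvg2012" and the download path in the usage comment are only placeholders.

```shell
# Hypothetical helper for step 3: swap in data/HSqlDb from an extracted LVG
# download, keeping the old directory as HSqlDb.bak.
replace_lvg_db() {
  home=$1
  lvg_download=$2
  target="$home/resources/lvgresources/lvg/data/HSqlDb"
  [ -d "$lvg_download/data/HSqlDb" ] || return 1
  if [ -d "$target" ]; then
    rm -rf "$target.bak"
    mv "$target" "$target.bak" || return 1
  fi
  cp -R "$lvg_download/data/HSqlDb" "$target"
}

# Hypothetical helper for step 4: replace "lvg2008" with a new release name
# in lvg.properties, writing through a temporary file for portability.
set_lvg_release() {
  props="$1/resources/lvgresources/lvg/data/config/lvg.properties"
  new_release=$2
  sed "s/lvg2008/$new_release/g" "$props" > "$props.tmp" && mv "$props.tmp" "$props"
}

# Example usage (placeholder paths and release name):
#   replace_lvg_db /usr/bin/cTAKES1.2.2/icTAKES /tmp/lvg2012
#   set_lvg_release /usr/bin/cTAKES1.2.2/icTAKES lvg2012
```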

Likewise, other large dictionaries are available. To install complete dictionaries for RxNorm, SNOMED-CT, or others that are available through the UMLS, refer to the following posts on the cTAKES forums:

You can also obtain a production dictionary resource by contacting the Mayo Clinic NLP Program by email for a new version of lookupresources to replace <icTAKES_HOME>/resources/lookupresources. You must provide proof of a valid UMLS license.

Note that the production icTAKES with larger dictionaries will place higher demands on hardware such as memory. One way to make more memory available to these tools is to modify the scripts that launch them.

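Add -Xms512M -Xmx2000M to the Java command that launches the tool, placing the options before the class name so that the JVM reads them rather than passing them to the program, for example:
java -Xms512M -Xmx2000M -cp resources;cTAKESdesc;cTAKES.jar edu.mayo.bmi.ctakes.main.cTAKESCPEGUI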
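
On Linux, two details are worth noting: classpath entries are separated by a colon instead of a semicolon, and the JVM treats -Xms and -Xmx as heap settings only when they appear before the class name. The sketch below prints such a command line for review. `cpe_command` is a hypothetical helper, and the classpath entries and class name are copied from the example above.

```shell
# Hypothetical helper: print a Linux launch command for the CPE with the
# given minimum and maximum heap sizes placed before the class name.
cpe_command() {
  printf 'java -Xms%s -Xmx%s -cp resources:cTAKESdesc:cTAKES.jar edu.mayo.bmi.ctakes.main.cTAKESCPEGUI\n' "$1" "$2"
}

# Print the command with the heap sizes suggested on this page.
cpe_command 512M 2000M
```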

Some models included in cTAKES may not represent your data distribution well. If you want to build your own models, read the User Guide for information about components, particularly the following information:

  • Training a sentence detector model
  • Building a Part of Speech (POS) tagger model
  • Building a Part of Speech (POS) tag dictionary
  • Building a chunker model
  • Training a dependency parser