
Tool Overview

Semantator (Semantic Annotator) is a tool developed at Mayo Clinic that lets users semantically annotate data of interest in plain text with respect to domain ontologies. Semantator allows users to leverage different tools and resources supported by the NCI Vocabulary and Knowledge Center (VKC) in a single environment. Currently, VKC has a terminology repository that lists a set of biomedical ontologies and terminologies. It supports the Open Health Natural Language Processing (OHNLP) Consortium, as well as NCI Protege. Semantator is implemented as a Protege plugin in which users can view a domain ontology and the document to be annotated at the same time, and can choose to leverage NLP annotation tools for automatic annotation. It provides an environment for users to view the annotation results and interactively curate them when necessary.

Querying and browsing data embedded in biomedical text is an important and challenging task. Emerging semantic web techniques envision a web of data that allows users to browse and query information of interest within documents directly. A prerequisite for this goal is to develop methods that produce good-quality structured data, i.e., that convert data originally in free text into structured formats. Fully automatic approaches to data extraction are preferred, but they do not always give satisfactory results, and relying on manual annotation may not be realistic because of the large volume of text that needs to be processed. Therefore, semi-automatic data curation, in which information is automatically extracted from biomedical text and the annotations are then refined manually, is an attractive alternative. In addition, the results of semi-automatic processes could potentially serve as training sets for automatic systems to further improve their performance.

To support semi-automatic curation we developed Semantator, a user-friendly, semantic-web-oriented environment for browsing and querying the annotated data, as well as interactively refining annotation results if needed. Semantator is implemented as a Protege plugin that allows users to view the annotation in its original context, the ontology used for annotation, and the annotation results in the same environment. Semantator provides two modes: (1) manual annotation; and (2) semi-automatic annotation. In the manual annotation mode, a human expert curator can choose a document to be annotated and a domain ontology, highlight different pieces of information in the original text, and then mark which ontology concepts the information belongs to. The system generates class instances according to the curator’s annotations and displays different class instances in different colors. Curators can also link the instances together using the properties defined in the domain ontology. In the semi-automatic annotation mode, users can choose among automatic annotation tools such as the National Center for Biomedical Ontology (NCBO) Annotator and Mayo Clinic’s Clinical Text Analysis and Knowledge Extraction System (cTAKES), which are widely used tools for annotating biomedical and clinical text. Curators can review the annotation results in the Semantator environment and modify them as needed. The annotation results are saved in RDF so that they can be used by tools developed by the semantic web community for querying and reasoning. Semantator also provides an interface where users can compare annotations made by different curators or annotation tools, determine inter-annotator agreement, and resolve conflicts among different annotations.
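
The annotation model described above (text spans marked as class instances, instances linked by ontology properties, results stored as RDF, and annotation sets compared across curators) can be illustrated with a minimal sketch. This is not Semantator's implementation or its actual RDF schema: the namespaces, the `Annotation` class, the property names (`coveredText`, `start`, `end`, `hasSymptom`), the class names, and the use of N-Triples are all assumptions made for illustration. The agreement measure is a plain Jaccard overlap on exact span-and-concept matches, swapped in as a simple stand-in for whatever measure the tool uses.

```python
from dataclasses import dataclass

# Hypothetical namespaces, chosen only for this illustration.
EX = "http://example.org/annotation/"
ONT = "http://example.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Annotation:
    """A highlighted text span marked as an instance of an ontology class."""
    instance_id: str  # local name of the generated class instance
    concept: str      # local name of the ontology class (hypothetical names)
    start: int        # character offset where the span begins
    end: int          # character offset just past the end of the span
    text: str         # the highlighted text


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(annotations, relations=()):
    """Serialize annotations and (subject, property, object) links as N-Triples."""
    lines = []
    for a in annotations:
        subj = f"<{EX}{a.instance_id}>"
        lines.append(f"{subj} <{RDF_TYPE}> <{ONT}{a.concept}> .")
        lines.append(f'{subj} <{EX}coveredText> "{escape_literal(a.text)}" .')
        lines.append(f'{subj} <{EX}start> "{a.start}"^^<{XSD_INTEGER}> .')
        lines.append(f'{subj} <{EX}end> "{a.end}"^^<{XSD_INTEGER}> .')
    for subject_id, prop, object_id in relations:
        lines.append(f"<{EX}{subject_id}> <{ONT}{prop}> <{EX}{object_id}> .")
    return "\n".join(lines)


def span_agreement(first, second):
    """Jaccard overlap of exact (start, end, concept) matches between two sets."""
    a = {(x.start, x.end, x.concept) for x in first}
    b = {(x.start, x.end, x.concept) for x in second}
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    doc = "Patient was admitted with fever."
    p, f = doc.index("Patient"), doc.index("fever")
    curator_a = [
        Annotation("inst1", "Person", p, p + len("Patient"), "Patient"),
        Annotation("inst2", "Symptom", f, f + len("fever"), "fever"),
    ]
    # A second curator (or an automatic tool) who only marked the symptom.
    curator_b = [Annotation("inst3", "Symptom", f, f + len("fever"), "fever")]

    print(to_ntriples(curator_a, relations=[("inst1", "hasSymptom", "inst2")]))
    print("agreement:", span_agreement(curator_a, curator_b))
```

Storing character offsets alongside the class assertion is what would let a viewer show each annotation in its original context, and a line-oriented format such as N-Triples keeps the output easy to load into generic semantic web tools for querying.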

What's New


Development Focus: Initial release

  • Manual instance annotation
  • Multi-select annotation
  • Semi-automatic annotation of one document via NCBO or cTAKES
  • In context view of annotations
  • Cross domain ontology use
  • Relationships between annotation instances
  • Store in RDF format

Release Date: February 16, 2012

Installation and Downloads

Semantator is implemented as a Protege plugin. To install it, first install Protege, then unzip the Semantator download and place its two files in the appropriate locations within the Protege folder structure. See the CNTRO site for downloads.

Documentation and Guides

Please go to the CNTRO Semantator page.


The CNTRO site lists contacts for support.

Presentations and Demos


Additional Resources