This document presents the NCI Enterprise Vocabulary Services (EVS) Development Path. It was proposed in early 2011 and has been updated to reflect some developments since then, but it remains under review and subject to resource constraints.
NCI EVS is actively developing important new areas of terminology content, services, and tooling, and is gathering requirements and testing in other areas. EVS has developed tools for creating and editing both value sets (sets of terms from one or more terminologies used for a particular coding purpose) and mappings between terminologies; EVS servers and browsers have been extended to store and publish them; and EVS editors have worked with several partners to create new mappings and value sets to support priority research activities. EVS is also experimenting with, and documenting needs related to, Semantic Web services, linked data, and use of the Cloud to make it easier to access and deploy terminology services.
At the same time, EVS will continue to devote much of its development effort to improving existing resources, services, and partnerships, developed over the last 15 years and vital to users in NCI and the broader cancer research and biomedical community (see EVS Use and Collaborations). Controlled terminology remains fundamental to creating meaningful information, data exchange, and semantic interoperability, very often through simple codelists and terminology standards, and EVS will focus on making these more easily and widely used even as it explores cutting-edge technologies.
This page provides an entry point for exploring and contributing to current EVS development efforts and priorities. Sections 1 to 4 describe ongoing development efforts, many with production deployments scheduled in the coming months. Section 5 covers new technologies that are still being explored and may play a significant role in the future of EVS. Details of existing EVS services are available elsewhere on this EVS Wiki site, as well as the companion EVS Web, caBIG® Web, and Vocabulary Knowledge Center pages, and are included here only where they provide context for new development efforts.
1. Users, Communities, and Partners
Since 1997, EVS has extended from a few core NCI users to become widely used by NCI and its partners in NIH, other federal agencies, and other U.S. and international biomedical, academic, standards, and research organizations. EVS is now used in basic, translational, and clinical research; clinical care; epidemiology; public health; administration; and public information. EVS plans to continue addressing terminology needs in all of these areas, with a special emphasis on the more challenging requirements of translational research. NCI has a longstanding commitment to terminology harmonization, partnerships, and standards, both nationally and internationally, as serving the best interests of both NCI and the broader biomedical community.
1.2 Requirements Gathering, Feedback and Participation
EVS exists to serve its users, and has engaged with them in many different processes of feedback and requirements gathering. The most recent broadly scoped effort was in the requirements gathering leading up to and following on from the July 2009 Semantic Infrastructure Concept of Operations, with an extensive compilation of requirements across both terminology and other areas.
The largest sustained source of requirements and feedback comes from direct working relationships with NCI and other partners. There is also a substantial flow of requests and suggestions from a wide range of other users, through browser suggestion links, Web forms, email, and other channels. Several current development efforts are designed to strengthen and broaden such channels, as well as to encourage extensions in EVS terminology development partnerships. EVS is also experimenting with collaborative terminology development tools that will enable users to contribute more directly or build their own structured terminology and ontology resources through platforms such as a semantic media wiki and a web client with a Protégé backend (see later section).
Supporting both the content and technical needs of EVS users is a growing challenge. EVS is working to strengthen and broaden user support mechanisms in both content and technical areas, with points of entry and interaction more specialized than the generic Application Support service.
Terminology content support has focused above all on NCI Thesaurus (NCIt) reference terminology content, but is also available for a variety of other EVS content. Efforts are underway in the following areas:
- Term Suggestion software and processes have been created and linked into the EVS terminology browsers and other environments, supporting a growing volume of direct user feedback on specific NCIt, NCI Metathesaurus (NCIm), Common Terminology Criteria for Adverse Events (CTCAE) and other concepts (see Terminology Tools section for more details).
- EVS staff support, working directly with EVS users and development partners, continues to be the largest single form of terminology content support. Most users do not want to learn to build and curate terminology themselves, and sustained design and implementation of a whole area of content is often the best approach to providing new terminology. Activities include:
- Adding terminology required for metadata creation and curation as needed, on an occasional basis, or working directly with the end users to create or reuse terminology in a needed domain.
- Collaborative terminology development, described elsewhere.
- Metadata that provides structured description of terminologies is being extended and standardized in several forms, providing a basis for both human and machine understanding of terminology resources. The terminologies EVS makes available are described in metadata stored in LexEVS and partly published through the EVS browsers. Additional information was also published via the Vocabulary Knowledge Center wiki. EVS worked with caBIG®, the National Center for Biomedical Ontology (NCBO), and the UK National Cancer Research Institute (NCRI) to create common representation standards and content for terminologies and ontologies of shared interest (see the Ontology Representation Work Group - Phase II Roadmap).
- Development of documentation to detail the transformation from native source data file formats into the common structures and views of LexEVS and the EVS browsers.
- Use of comprehensive back-end quality assurance processes to validate loads of vocabulary and mapping data into LexEVS for access via APIs and display in the EVS browsers.
Terminology technical support is also being developed in several areas:
- The Vocabulary Knowledge Center (VKC) has provided additional, and in several areas deeper, technical support. This includes documenting topics such as installation and information models; hosting discussion forums; developing Frequently Asked Questions (FAQs) and other resources; and providing direct support for installing and using many EVS and other terminology tools, including LexEVS, browsers, Protégé, NCI Protégé, and LexWiki. The VKC also makes available a forum for direct interaction with the programmers on technical questions about LexEVS services.
- Going forward, more effort will be placed on making it easier for non-EVS applications to use LexEVS, including directly retrieving concepts, concept hierarchies, pick lists, and value sets, as well as traversing relationships and querying terminology maps. This will include providing convenience methods and code snippets, as well as developing new and improved tools and services (see Sections 2, 4, and 5 below for more detail).
- The Protégé ontology development tool has an information page.
- Users may submit questions and issue reports to Application Support.
- Users may submit new feature requests or bug reports to the EVS GForge tracker.
- EVS has hosted training sessions and boot camps for users wishing to understand the LexEVS services. More sessions will be held in the future as resources allow.
2. Terminology Services: LexEVS 6 and HL7 Common Terminology Services (CTS 2)
The LexEVS package represents a comprehensive set of software and services to load, publish, and access terminology and ontological resources. It is built on a common information model representing multiple vocabularies. LexEVS utilizes common repositories, software components, APIs, and tools to facilitate interoperability. It is based on community standards including HL7 and ISO, and is provided as open source. LexEVS has been a key component of EVS infrastructure over the last six years, and is vital to many EVS technical and content development efforts.
LexEVS 6.0 is based on the draft standard Common Terminology Services Release 2 (CTS 2) specification. CTS 2 specifies a set of services that standardize the functional requirements of a terminology server. The scope of CTS 2 is to standardize functionality for Administration, Search and Query, Mapping Support, Value Set Support and Authoring and Maintenance. LexEVS 5.1 supported key parts of CTS 2 Administration, Search and Query, and Value Set functionality. LexEVS 6.0 expands upon 5.1 to address the remaining scope of CTS 2, particularly Mapping and Authoring. Additionally, LexEVS 6.0 includes an updated data access layer to allow for authoring and a diverse set of data sources; Web Ontology Language (OWL) loader updates; a new OWL/Resource Description Framework (RDF) exporter; load balancing; ISO 21090 datatype support for grid analytical services; and filtered export of LexEVS loaded content.
LexEVS 6.0 user documentation is available on the VKC, and the downloadable distribution files are accessible there and on the EVS download page.
Backward compatibility is a major goal for EVS systems, and changes that are not backward compatible are normally made only with major releases. While most LexEVS 5.1 APIs have been held stable, support for major new functionality required some incompatible changes to the LexEVS model and APIs. Information on these changes is available on the LexEVS 6.0 Release Highlights page, including detailed guidance on how to modify old code for LexEVS 6 as well as how to take advantage of the many new features. Sample code is being developed to help implement the most commonly used features, and the NCI Term Browser code is available as a large-scale implementation of most LexEVS features.
You may wish to refer to the following LexEVS 6.0 references:
2.1 Terminology Maps
The ability to represent mappings between terminologies to support various use cases is an important component of LexEVS 6. Mapping content is represented as an extension of LexEVS associations and will support everything from simple one-to-one equivalence maps through to complex multi-part rule-based maps for sophisticated use cases such as reimbursement. Starting with a limited number of high-quality and high-value mappings, EVS is working to better understand user needs and usable interfaces. Currently identified needs include:
- Representation of one or more sets of mappings from one terminology to one or more other terminologies.
- Rich metadata model for capturing version information, assisting in resolution of obsolete codes, and characterization of map sets by use case, comprehensiveness, complexity, and other criteria.
- Search and retrieval of mappings based on code or string criteria.
- Browsing of available mappings with links to referenced terminology content.
- Extensions to the normal browser concept view to show links to other terminologies.
- Support for common and standard mapping representations such as UMLS - Metathesaurus Release Columns and Data Elements (UMLS MRMAP) and the International Health Terminology Standards Development Organisation (IHTSDO) release format 2 (RF2) mapping RefSets.
- Representation and browsing of expression-based mappings, with resolution of individual codes within the expressions.
Tools for creating, maintaining, and publishing maps will mostly use LexEVS but remain separate from it, as will more sophisticated tools for semi-automated, criteria-based map generation between arbitrary terminologies using comprehensive terminology resources such as NCI Metathesaurus (NCIm). A mapping tool that addresses some of these requirements has recently been created to support internal EVS operations, and is available for others to adopt or extend; gathering community requirements and extending the tool to support a broader user base are important future priorities (see the Mapping Tool section below).
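As an illustration of the mapping needs listed above, the following is a minimal Python sketch of a map set with version metadata and with search by code or by string. It is not the LexEVS API: the class names, field names, and sample codes are all hypothetical, chosen only to mirror the requirements described in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MapEntry:
    """One source-to-target link in a map set (field names are illustrative)."""
    source_code: str
    source_name: str
    target_code: str
    target_name: str
    relationship: str = "SY"       # relationship label, e.g. SY, BT, NT
    score: Optional[float] = None  # optional mapping quality score


@dataclass
class MapSet:
    """A set of mappings from one terminology version to another."""
    name: str
    source_terminology: str
    source_version: str
    target_terminology: str
    target_version: str
    entries: List[MapEntry] = field(default_factory=list)

    def find_by_code(self, code: str) -> List[MapEntry]:
        """Return entries whose source or target code equals `code`."""
        return [e for e in self.entries
                if code in (e.source_code, e.target_code)]

    def find_by_text(self, text: str) -> List[MapEntry]:
        """Return entries whose source or target name contains `text`."""
        needle = text.lower()
        return [e for e in self.entries
                if needle in e.source_name.lower()
                or needle in e.target_name.lower()]


# Made-up codes and names, for demonstration only.
demo = MapSet(
    name="Demo_to_Target_Map",
    source_terminology="DemoSource", source_version="1.0",
    target_terminology="DemoTarget", target_version="2.0",
    entries=[
        MapEntry("SRC-1", "Liver", "TGT-10", "Hepatic Structure", "SY", 1.0),
        MapEntry("SRC-2", "Hepatic Lobe", "TGT-10", "Hepatic Structure", "BT", 0.7),
    ],
)
```

Calling `demo.find_by_code("TGT-10")` returns both entries, while `demo.find_by_text("liver")` returns only the first. Keeping the version fields on the map set rather than on each entry is one possible design choice; it makes it straightforward to detect that a map was built against an older release of either terminology.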
2.2 Value Sets, Subsets and Pick Lists
The ability to subset content using value sets is an important need of end users and a major component of LexEVS 6. Value set content is described in a Value Set Definition that is then expanded using referenced terminologies to provide a set of values. Definitions provide a flexible and powerful tool for creating and maintaining value sets. Definitions can include as components:
- A reference to a terminology (code system). Alone, it will include all concepts in the referenced terminology.
- A reference to another value set definition. Alone, it will include all concepts in the referenced value set.
- A reference to a terminology plus concept codes, which will include only the referenced concepts.
- A reference to a terminology plus concept codes plus additional rules (for example, leaf only, immediate children, matching property name and value), which will include concepts that satisfy the concept code and rule set criteria.
- Combinations of any of the above with OR, AND, or SUBTRACT operations.
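The way a definition expands into a set of values can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the LexEVS implementation: the terminology is a toy dictionary of made-up codes, the helper names are hypothetical, and a definition is modeled as an ordered list of (operator, component) pairs applied left to right.

```python
# Toy terminology: each code maps to its immediate children (made-up codes).
TOY = {
    "C1": ["C2", "C3"],
    "C2": ["C4"],
    "C3": [],
    "C4": [],
}


def whole_terminology(terminology):
    """Component: every concept in the referenced terminology."""
    return set(terminology)


def listed_codes(terminology, codes):
    """Component: only the referenced concepts that exist in the terminology."""
    return {code for code in codes if code in terminology}


def descendants(terminology, code):
    """All concepts below `code` in the hierarchy."""
    found, stack = set(), list(terminology.get(code, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(terminology.get(current, []))
    return found


def leaves_under(terminology, code):
    """Component with a 'leaf only' rule: descendants that have no children."""
    return {c for c in descendants(terminology, code) if not terminology.get(c)}


def resolve(definition):
    """Expand a definition given as a list of (operator, component) pairs.

    A component is either a set of codes or another definition (a list),
    which models a reference to another value set definition.
    """
    result = set()
    for operator, component in definition:
        members = resolve(component) if isinstance(component, list) else set(component)
        if operator == "OR":
            result |= members
        elif operator == "AND":
            result &= members
        elif operator == "SUBTRACT":
            result -= members
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


# Everything in the terminology except the root concept.
all_but_root = [("OR", whole_terminology(TOY)),
                ("SUBTRACT", listed_codes(TOY, ["C1"]))]

# A definition that references another definition and keeps only leaf concepts.
leaf_values = [("OR", all_but_root),
               ("AND", leaves_under(TOY, "C1"))]
```

Here `resolve(all_but_root)` yields C2, C3, and C4, and `resolve(leaf_values)` narrows that to C3 and C4. Because a definition stores rules rather than a fixed list of codes, re-resolving it against an updated terminology picks up new concepts without editing the definition.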
The ability to provide an ordered list of the entity codes and corresponding terms and presentations drawn from a value set is provided by LexEVS 6 Pick List support. The pick list is described by a Pick List Definition that, like the value set definition, provides a flexible and powerful tool for creating and maintaining pick lists. Definitions can include:
- The ability to include all the concept codes contained in the referenced value set by setting the completeSet flag to 'true'.
- The ability to include individual pickText derived from referenced concepts.
- The ability to exclude specified concepts from the pick list.
- The ability to combine any of the above.
- The ability to order members of the list.
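The pick list options above can be illustrated with a short Python sketch. It assumes a value set that has already been resolved to a mapping of code to preferred term; the class, its field names, and the sample codes are hypothetical, and only `completeSet` and `pickText` are terms taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class PickListDefinition:
    """Illustrative pick list definition over an already resolved value set."""
    value_set: Dict[str, str]            # code -> preferred term
    complete_set: bool = False           # models the completeSet flag
    pick_text: Dict[str, str] = field(default_factory=dict)  # code -> pickText
    excluded: Set[str] = field(default_factory=set)          # codes to leave out
    order: List[str] = field(default_factory=list)           # codes listed first


def resolve_pick_list(definition: PickListDefinition) -> List[Tuple[str, str]]:
    """Return ordered (code, display text) pairs for the pick list."""
    # Start from the whole value set only when the completeSet flag is set.
    items = dict(definition.value_set) if definition.complete_set else {}
    # Individual entries add or override display text for known concepts.
    for code, text in definition.pick_text.items():
        if code in definition.value_set:
            items[code] = text
    # Exclusions are applied last so they always win.
    for code in definition.excluded:
        items.pop(code, None)
    # Explicitly ordered codes come first; the rest follow alphabetically.
    rank = {code: position for position, code in enumerate(definition.order)}
    return sorted(items.items(),
                  key=lambda item: (rank.get(item[0], len(rank)), item[1]))


# Made-up codes and terms, for demonstration only.
severity = PickListDefinition(
    value_set={"C2": "Mild", "C3": "Moderate", "C4": "Severe"},
    complete_set=True,
    pick_text={"C4": "Severe event"},
    excluded={"C3"},
    order=["C4"],
)
pick_list = resolve_pick_list(severity)
```

In this example `pick_list` contains C4 with its overridden text first, followed by C2; C3 is excluded. Applying exclusions after inclusions is an assumption of this sketch rather than documented LexEVS behavior.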
2.3 Local Extensions
The ability to extend a standard terminology with a set of local terms and codes is important in several settings, and is provided by LexEVS 6 Code System Supplement support. This supports, for example, the ability for an organization to load NCI Thesaurus (NCIt) and add new local terms and codes that are not part of the existing NCIt. The additional codes are stored in a separate code system, but are linked to the base terminology for query purposes. When a search is performed, LexEVS is able to seamlessly return results from both the base terminology and the additional local codes.
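The idea of keeping local codes in a separate code system while searching both together can be sketched as follows. This is a conceptual Python model, not the LexEVS Code System Supplement API: the class names, the `search` function, and all codes and terms are hypothetical.

```python
class CodeSystem:
    """A minimal code system: a name plus a mapping of code -> term."""

    def __init__(self, name, concepts):
        self.name = name
        self.concepts = dict(concepts)


class Supplement(CodeSystem):
    """Local codes stored separately but linked to a base code system."""

    def __init__(self, name, base, concepts):
        overlap = set(concepts) & set(base.concepts)
        if overlap:
            # Local codes must not collide with codes of the base terminology.
            raise ValueError(f"codes already defined in base: {sorted(overlap)}")
        super().__init__(name, concepts)
        self.base = base


def search(supplement, text):
    """Search the base terminology and the local extension in one call."""
    needle = text.lower()
    hits = []
    for system in (supplement.base, supplement):
        for code, term in system.concepts.items():
            if needle in term.lower():
                hits.append((system.name, code, term))
    return hits


# Made-up codes and terms; these are not real NCIt content.
base = CodeSystem("BaseTerminology", {
    "B001": "Neoplasm",
    "B002": "Benign Neoplasm",
    "B003": "Infection",
})
local = Supplement("LocalExtension", base, {
    "L001": "Neoplasm, Local Registry Category",
})
results = search(local, "neoplasm")
```

Each hit carries the name of the code system it came from, so a caller can still tell standard content from local additions even though the search itself is seamless.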
3. Terminology Content
3.1 NCI Thesaurus (NCIt)
A strategic EVS development priority is continuing the growth of content in NCIt, NCI's reference terminology and core biomedical ontology. NCIt now covers some 100,000 key biomedical concepts with a rich set of terms, codes, definitions, and over 200,000 inter-concept relationships, and is used to code most NCI models and metadata. About 700 concepts are added each month in response to user requests and the requirements of dependent systems and applications, and much more content is similarly added to existing concepts. Many of these concepts include content created and maintained jointly with NCI's partners, making NCIt a shared coding and semantic infrastructure resource.
EVS makes the NCI Thesaurus available through LexEVS, the NCIt Browser, and for download in a variety of formats through the EVS ftp site. There are also subsets of the NCI Thesaurus provided to meet the needs of groups such as Clinical Data Interchange Standards Consortium (CDISC) and the Food and Drug Administration (FDA).
3.2 NCI Metathesaurus (NCIm)
Adding new terminologies and concept-based mappings between them in NCIm is another continuing development priority, responding to user needs to code, interpret, and map between a wide array of biomedical data and access it in a consistent way. NCIm has grown to include 76 biomedical terminologies whose 3,600,000 terms are mapped to 1,400,000 concepts representing their shared meanings. NCIm is updated approximately six times a year, growing by some 100,000 concepts annually. This growth involves adding new terminologies, and updated versions of existing terminologies, to meet the requirements of EVS users.
The NCI Metathesaurus can be accessed and searched through LexEVS and the NCIm Browser. It can also be downloaded in its entirety as RRF through the EVS download page by anyone who has a Unified Medical Language System (UMLS) license. (The download dialog includes an opportunity to obtain a UMLS license.)
3.3 Other Terminologies
EVS will also continue work to improve the representation and availability of other terminology content. EVS loads into LexEVS, and makes available through the NCI Term Browser, some 20 standalone terminologies including ICD-9-CM, ICD-10, CTCAE, MedDRA, SNOMED CT, NDF-RT, LOINC, RadLex, PDQ and GO. EVS works with its user community to identify other terminologies to be added to this list. EVS also works with several partners in the development and publication of new terminology content, some outlined below.
3.3.1 Common Terminology Criteria for Adverse Events (CTCAE)
EVS collaborated extensively with the NCI Cancer Therapy Evaluation Program (CTEP) and other CTCAE partners in the redesigned release 4, available through the NCI Term Browser and for download from the EVS ftp site. EVS plans to continue supporting development and release of future versions, including a release 5 planned for early 2014.
3.3.2 NanoParticle Ontology (NPO)
3.3.3 National Drug File Reference Terminology (NDF-RT)
EVS will continue to process and publish regular updates to the Veterans Health Administration National Drug File Reference Terminology (NDF-RT), publishing both the full OWL ontology and the FDA-specified Structured Product Labeling value sets through the LexEVS server, the NCI Term Browser, and the EVS ftp site.
3.4 Value Sets
EVS plans to continue extending and deepening its partnerships to create and support standardized terminology for biomedical coding. Hundreds of NCIt subsets and other code lists are maintained by EVS. EVS has prioritized making these available as CTS 2 value sets to support easier access and interoperability with other systems and standards repositories. In caDSR, EVS value sets increasingly provide a pre-curated standard set of meanings for use by metadata curators.
3.5 Terminology Mappings
EVS plans to increase efforts to create and publish pairwise mappings between term lists, value sets, and whole terminologies. Mappings such as the three that EVS currently maintains, from Physician Data Query (PDQ), Mouse Anatomy (MA), and Gene Ontology (GO) to NCIt, are in growing demand to support data translation, cross-referencing, and system integration. EVS plans to spend significant time with creators and consumers of this content to ensure that it provides accurate mappings and supports real community needs.
4. Terminology Tools
EVS has developed a new generation of more user-friendly terminology browsers, resulting in a substantial increase in users as well as adoption by other organizations. EVS plans to continue to improve the responsiveness and functionality of these browsers, including new developments outlined below.
Mapping: EVS has added mapping capabilities to the Term Browser to allow users to view mapping data efficiently and effectively. Users are able to find all concepts to which any given concept is mapped through a simple user interface. The mapping data include source and target concept code, name, terminology, relationship (such as SY, BT, NT) and, in some cases, a mapping score that indicates the quality of a mapping from a source concept to a target concept. Users are able to view source and target concepts in a mapping by clicking on the corresponding hyperlinks. EVS plans to support complex multi-part rule-based mappings and mappings whose targets are expressions of concept codes rather than simple, individual codes. Dynamic generation of mappings based on NCI Metathesaurus is also being considered.
Value sets, subsets and pick lists: EVS has added new functionalities to the Term Browser based on the value set component of LexEVS 6. Users can view the list of existing value sets available on the server, organized both by source terminology and by coding standards they support. For each value set, users can access both the value set definition and the concepts belonging to that value set, and can export the data to a file in LexGrid XML, comma delimited ASCII, or Microsoft Excel format.
Local extensions to terminologies: EVS is extending the Term Browser capabilities to account for new features in LexEVS 6 for local extensions to standard terminologies. Users will be able to view the hierarchies of the extended terminology seamlessly through the same user interface. Users will also be able to perform searches against the original base terminology.
EVS is also evaluating the possibility of creating a generic web terminology browser, based on the NCIt browser and allowing organizations to quickly implement terminology browsing functionality without the need to write code. Current code would need to be re-factored so that the back-end data source could be LexEVS, SQL, or any other data source provided a standard browser interface is implemented.
4.2 Report Writer
The EVS General Purpose Report Writer is a web-based tool that allows users to generate reports from a specified terminology data source, and to download or print those reports to a specified file type or format, e.g., Excel and tab-delimited text. The Report Writer allows for the generation of standard reports with predefined criteria and outputs.
EVS plans to extend the Report Writer to improve current value set support. EVS also plans to modify the Report Writer to allow administrative users to search and review metadata of value set definitions stored in a centralized database. An administrator will be able to select a value set definition and export its content to a file in LexGrid XML, tab-delimited text, or Microsoft Excel format based on the specification of the report provided by the user organization, such as FDA or CDISC.
4.3 Value Set Editor
EVS has developed a web-based value set application to help both administrative and other users create, modify, and delete value sets. This tool enables users (mostly internal for now) to pull from all terminologies available in a LexEVS server, with a graphical user interface providing methods intuitive to terminology users such as concept codes, search criteria, local hierarchies, and concept relationships. Results are collected and then efficiently represented by the tool in a value set definition data structure. The tool allows users to import and export both definitions and resolved concept sets in XML, tab-delimited, or Microsoft Excel format.
4.4 Mapping Tool
EVS has developed an initial tool to create and maintain mappings between terminologies. The tool provides basic support for the first two sets of functions below:
- Initial criteria-based semi-automated mapping between arbitrary sets of terms, using direct matching techniques as well as terminology mapping resources such as NCI Metathesaurus (NCIm). This will leverage NCIm structure as well as approaches like lexical mapping and scoring algorithms to suggest the quality of the generated mapping.
- User interfaces for manual review and acceptance of mapping content initially generated by semi-automated methods.
- Support for handling updates to source or target terminologies – to identify obsolete, split, or merged codes, as well as possible better matches (for example, a "narrower" mapping to a target code where the target terminology has been updated and that target code has new children that might provide a better mapping).
Extending support in the first two areas, covering the third, creating a database back end, and improving usability are key priorities for future development.
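The first and third functions above can be illustrated with a small Python sketch. It is not the EVS mapping tool: it uses a simple lexical technique (word-order normalization plus the standard library `difflib.SequenceMatcher` ratio) as a stand-in for the tool's matching and scoring algorithms, and the function names, threshold, and sample codes are assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(term):
    """Lowercase, drop commas, and sort words so word order does not matter."""
    return " ".join(sorted(term.lower().replace(",", " ").split()))


def suggest_mappings(source_terms, target_terms, threshold=0.8):
    """Suggest the best-scoring target for each source term.

    Both arguments map code -> name. Returns (source code, target code, score)
    tuples for matches at or above `threshold`, for later manual review.
    """
    normalized_targets = {code: normalize(name)
                          for code, name in target_terms.items()}
    suggestions = []
    for source_code, source_name in source_terms.items():
        source_norm = normalize(source_name)
        best_code, best_score = None, 0.0
        for target_code, target_norm in normalized_targets.items():
            score = SequenceMatcher(None, source_norm, target_norm).ratio()
            if score > best_score:
                best_code, best_score = target_code, score
        if best_code is not None and best_score >= threshold:
            suggestions.append((source_code, best_code, round(best_score, 2)))
    return suggestions


def flag_obsolete(mappings, current_target_codes):
    """Return mappings whose target code is missing from an updated target."""
    return [m for m in mappings if m[1] not in current_target_codes]


# Made-up codes and terms, for demonstration only.
source = {"S1": "Carcinoma, Lung", "S2": "Unrelated Thing"}
target = {"T1": "Lung Carcinoma", "T2": "Kidney"}
suggested = suggest_mappings(source, target)
stale = flag_obsolete(suggested, current_target_codes={"T2"})
```

In this example only S1 is matched (to T1), and that mapping is then flagged as stale once T1 disappears from the target terminology. The score is meant as a hint for a human reviewer, which is why low-scoring candidates are dropped rather than guessed.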
4.5 Terminology Editing
This section will discuss development and sharing of Protégé and Web Protégé.
4.6 User Input and Collaboration
This section will include discussion of Wikis and the EVS Term Suggestion application.
4.7 Light Weight EVS Tooling
EVS is working on new ways to make EVS terminology services more widely accessible, and easier to add to other environments and applications. Key areas currently being explored include:
- Plug-in Tools for Websites: Embeddable tools for websites, enabling related websites to include specific browser and other functionality without the need to program (similar to Google Maps).
- Small tool to search terminologies from within Microsoft Word.
- Search Portlet: a simple concept search portlet (see the caBIO Portlet).
- PDA Terminology Browser: Small tool to search terminologies from within devices such as iPhones (see: caBIO iPhone App).
5. New Technologies
5.1 Semantic Infrastructure Innovations
Earlier plans to provide terminology content and technical support for SAIF, ECCF and SOA-based services and applications are on hold. The related CTS 2 and LexEVS 6 efforts, however, are still central terminology components of the more gradually evolving semantic infrastructure as described in the sections above. For reference, preliminary draft materials from those activities set aside are available at:
5.2 Services Design and Ease of Use
EVS plans to extend current Representational State Transfer (REST) and Simple Object Access Protocol (SOAP) service interfaces, and is working with other parts of CBIIT to develop a broader redesign of information services. Actual and potential users of such services are strongly encouraged to communicate their requirements, use cases and suggestions. Some starting points of particular interest are:
- REST: Mayo Clinic planned two Platform-specific Models (PSMs) as part of its CTS 2 submission: one for SOAP, and a second for the REST architectural style. Some preliminary materials are on the Mayo Clinic website. The existing LexEVS API has a strong procedural focus, making it necessary to wrap the functionality of the API in an abstraction layer that presents the LexGrid model as a set of cohesive, traversable resources, and the services as Create, Read, Update, Delete (CRUD) operations against those resources. The REST architectural style is a good fit for terminology services because the majority of accesses are queries and reads, meaning that the native distribution, federation, and caching behavior of the web and internet can be used to ensure scalability and availability.
- LexAJAX: The Mayo Clinic and VKC developed a prototype called LexAJAX. The purpose of this project is to develop an Ajax-based service based on the LexEVS API and LexGrid data model. Information about this project is available on the VKC lexAjax page.
- LexRDF: Resource Description Framework (RDF) is an official World Wide Web Consortium (W3C) Semantic Web specification for metadata models. The Mayo Clinic has established a set of mappings between elements of the LexGrid model (the model foundation of LexEVS) and corresponding constructs in the appropriate W3C standards. This allows LexGrid-represented terminology information to be rendered as RDF triples. For more information about LexRDF you may download this PDF from the Brigham Young University website.
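To make the idea of rendering terminology content as RDF triples concrete, here is a minimal Python sketch that writes a concept in N-Triples syntax. It does not reproduce the LexRDF mappings: the choice of `rdfs:label` and `skos:broader` as target constructs, the `example.org` base URI, and the function names are assumptions made for illustration.

```python
# Property URIs from the W3C RDFS and SKOS vocabularies.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

# Hypothetical base URI for concept identifiers.
BASE = "http://example.org/terminology/"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_to_ntriples(code, label, parent_codes=()):
    """Render one concept as N-Triples lines: a label plus parent links."""
    subject = f"<{BASE}{code}>"
    lines = [f'{subject} <{RDFS_LABEL}> "{escape_literal(label)}" .']
    for parent in parent_codes:
        lines.append(f"{subject} <{SKOS_BROADER}> <{BASE}{parent}> .")
    return lines


# Made-up codes, for demonstration only.
triples = concept_to_ntriples("X2", "Child Concept", ["X1"])
```

Each returned line is one triple of subject, predicate, and object, so the output of many concepts can simply be concatenated into a file that standard RDF tools can load.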
5.3 Semantic Web
NCI's informatics community has shown a growing interest in semantic web technology. The swBIG initiative recently experimented with the conversion of metadata and terminology to OWL or RDF linked data to provide a web services mechanism to query across data available through caGRID endpoints. Another caBIG group experimented with conversion of the Life Sciences Domain Analysis Model to OWL based on interest in the community in using OWL models for reasoning and semantic web purposes. There has also been interest in precompetitive collaboration in semantic web technologies in the pharmaceutical industry to support data sharing and retrieval, and in what new research can be done with linked biological data stores. We anticipate that current semantic web technologies will not satisfy semantic infrastructure needs, but that they will provide some specific benefits in certain areas.
5.4 Cloud Computing
NCI CBIIT investigated creating a Cancer Knowledge Cloud, including use of cloud computing technologies, and the caBIG® Architecture Workspace also explored how cloud computing might be used. This effort is described on the caBIG® Architecture Workspace Cloud Computing Proof-of-Concept Activity page. EVS is seeking to take advantage of this by exploring how cloud computing might be used to better support terminology services and operations. For example, the Mayo Clinic created cloud instances of LexEVS already loaded with several terminologies as a way to enable people to create local LexEVS installations more easily. Security concerns are relatively few, and resource requirements can be quite large and variable, making at least some terminology activities good potential candidates.