
...


...

  • This Loader Framework requires LexEVS release 6.x.
  • Development systems must have the Sun Java Development Kit (JDK) or Java Runtime Environment (JRE) version 1.7 installed.
  • Maven 3.x.
  • For software and hardware dependencies for the system hosting the LexEVS runtime, refer to the Installation and Downloads section of the summary page for the latest release.

Development and Build Environment

Third Party Tools

    • Maven: Apache build manager for Java projects; see http://maven.apache.org/
    • Eclipse: see http://www.eclipse.org/

Loader Framework Code

The Loader Framework code is available in the NCI Subversion (SVN) repository. It comprises three framework projects. At the time of this writing, three other projects in the repository also use the Loader Framework.

Loader Framework Projects

  • PersistanceLayer: a Hibernate connector to the LexBIG database
  • Loader-framework: a framework that sets up build information for Maven
  • Loader-framework-core: a framework that contains all the interfaces and utilities; also contains an extendable class "AbstractSpringBatchLoader" that all new Loaders should extend

Loader Projects Using the New Framework

  • abstract-rrf-loader: a holder for common RRF-based loader code
  • meta-loader: a new loader to read the NCI Metathesaurus
  • umls-loader: a loader for reading Unified Medical Language System (UMLS) content

Maven

The preceding projects use Maven for build and dependency management. You may obtain the Maven plugin for Eclipse at http://m2eclipse.codehaus.org.

How to Use the Loader Framework: A Roadmap

...

The projects containing the Loader Framework (PersistanceLayer, loader-framework, and loader-framework-core) use Maven for dependency management and build. You will still use Eclipse as your IDE and code repository, but you will need to install a Maven plugin for Eclipse.

  1. Install the Maven plugin for Eclipse, which can be found at http://m2eclipse.sonatype.org/.
  2. Provide a URL and userid/password to a Maven repository on a server (which manages your dependencies or dependent jar files). The Maven repository at Mayo Clinic is:

    Code Block
    http://bmidev4:8282/nexus-webapp-1.3.3/index.html
  3. Import the Loader Framework classes from SVN.
  4. You will most likely see build errors about missing jars. Resolve those by right-clicking the project with errors, then selecting Maven and Resolve Dependencies. This pulls the dependent jars from the Maven repository into your local environment.
  5. To build a Maven project, right-click the project, select Maven, then select assembly:assembly.

...

The following diagram is from the Maven documentation:

[Image: screenshot of the Maven directory structure]

For more information on the Maven project, refer to the documentation at http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html.

Configure your Spring Config (myLoader.xml)

...

What follows is a brief overview of the tags related to the Loader Framework. For more detail, refer to the Spring documentation at http://static.springsource.org/spring-batch/reference/html/index.html.

Beans

The beans:beans tag is the all-encompassing tag. You define all your other tags in it. You can also define an import within this tag to import an external Spring config file. (Import is not shown in the sample image above.)

Bean

Use the beans:bean tags to define the beans to be managed by the Spring container by specifying the package-qualified class name. You can also specify initialization values and set bean properties within these tags.

Code Block
<source>
<beans:bean id="umlsCuiPropertyProcessor" parent="umlsDefaultPropertyProcessor" class="org.lexgrid.loader.processor.EntityPropertyProcessor">
  <beans:property name="propertyResolver" ref="umlsCuiPropertyResolver" />
</beans:bean>
</source>

Job

The job tag is the main unit of work. A job comprises one or more steps that define the work to be done. More advanced things can also be done within the job, such as using split and flow tags to mark steps that can run in parallel to improve performance.

Code Block
<source>
<job id="umlsJob" restartable="true">
 <step id="populateStagingTable" next="loadHardcodedValues" parent="stagingTablePopulatorStepFactory"/>
...
</source>

Step

One or more step tags make up a job and can vary from simple to complex in content. Among other things, you can specify which step should be executed next.

Tasklet

You can do anything you want within a Tasklet, such as sending an email or running a LexBIG function such as indexing. You are not limited to just database operations. The Spring documentation also has this to say about Tasklets:

The Tasklet is a simple interface that has one method, execute, which will be a called repeatedly
by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal
a failure. Each call to the Tasklet is wrapped in a transaction.
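
The repeat-until-finished contract described in the quotation can be sketched in plain Java. This is only an illustration under stated assumptions: the `Tasklet` interface and `RepeatStatus` enum below are simplified local stand-ins (the real Spring Batch `Tasklet.execute` takes a `StepContribution` and a `ChunkContext` and declares `throws Exception`), and `IndexingTasklet` and `runStep` are hypothetical names, not part of the Loader Framework.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskletSketch {

    // Stand-in for Spring Batch's RepeatStatus, which has the same two constants.
    enum RepeatStatus { CONTINUABLE, FINISHED }

    // Simplified stand-in: the real interface passes context arguments
    // and allows checked exceptions to signal failure.
    interface Tasklet {
        RepeatStatus execute();
    }

    // Hypothetical tasklet that handles one batch per call and
    // reports FINISHED once totalBatches calls have been made.
    static class IndexingTasklet implements Tasklet {
        private final int totalBatches;
        private int done = 0;
        final List<String> log = new ArrayList<String>();

        IndexingTasklet(int totalBatches) {
            this.totalBatches = totalBatches;
        }

        @Override
        public RepeatStatus execute() {
            done++;
            log.add("indexed batch " + done);
            return done < totalBatches ? RepeatStatus.CONTINUABLE : RepeatStatus.FINISHED;
        }
    }

    // Mimics the role of a TaskletStep: call execute until it returns FINISHED.
    // A thrown exception would end the loop and signal failure.
    static int runStep(Tasklet tasklet) {
        int calls = 0;
        RepeatStatus status;
        do {
            // In Spring Batch, each call is wrapped in its own transaction.
            status = tasklet.execute();
            calls++;
        } while (status == RepeatStatus.CONTINUABLE);
        return calls;
    }

    public static void main(String[] args) {
        IndexingTasklet tasklet = new IndexingTasklet(3);
        int calls = runStep(tasklet);
        System.out.println(calls + " calls: " + tasklet.log);
    }
}
```

Returning CONTINUABLE lets a long-running task be split across several transactions instead of holding one open for the whole operation.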

Chunk

Spring documentation says it best:

Spring Batch uses a "Chunk-Oriented" processing style within its most common implementation. Chunk-
oriented processing refers to reading the data one at a time, and creating "chunks" that will be
written out, within a transaction boundary. One item is read in from an ItemReader, handed to an
ItemWriter, and aggregated. Once the number of items read equals the commit interval, the entire
chunk is written out via the ItemWriter, and then the transaction is committed.
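
As a rough illustration of the chunk-oriented style quoted above, the following plain-Java sketch reads items one at a time, aggregates them, and writes a chunk whenever the commit interval is reached. The `ItemReader` and `ItemWriter` interfaces here are simplified local stand-ins for the Spring Batch interfaces of the same names, and `processInChunks` and `run` are hypothetical helpers; Spring Batch performs this loop internally and wraps each chunk in a transaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ChunkSketch {

    // Simplified stand-ins; the real Spring Batch interfaces also declare checked exceptions.
    interface ItemReader<T> {
        T read(); // returns null when the input is exhausted
    }

    interface ItemWriter<T> {
        void write(List<T> items);
    }

    // Reads one item at a time and writes a chunk each time the number of
    // buffered items reaches commitInterval. Returns the number of chunks written.
    static <T> int processInChunks(ItemReader<T> reader, ItemWriter<T> writer, int commitInterval) {
        List<T> chunk = new ArrayList<T>();
        int chunksWritten = 0;
        T item;
        while ((item = reader.read()) != null) {
            chunk.add(item);
            if (chunk.size() == commitInterval) {
                // Spring Batch would commit the transaction after this write.
                writer.write(new ArrayList<T>(chunk));
                chunksWritten++;
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            // Flush the final, partial chunk.
            writer.write(new ArrayList<T>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    // Runs the loop over an in-memory list and returns the chunks that were written.
    static List<List<String>> run(List<String> input, int commitInterval) {
        final Iterator<String> it = input.iterator();
        final List<List<String>> written = new ArrayList<List<String>>();
        ItemReader<String> reader = new ItemReader<String>() {
            public String read() {
                return it.hasNext() ? it.next() : null;
            }
        };
        ItemWriter<String> writer = new ItemWriter<String>() {
            public void write(List<String> items) {
                written.add(items);
            }
        };
        processInChunks(reader, writer, commitInterval);
        return written;
    }

    public static void main(String[] args) {
        // Five items with a commit interval of 2 yield chunks of sizes 2, 2, and 1.
        System.out.println(run(Arrays.asList("C1", "C2", "C3", "C4", "C5"), 2));
    }
}
```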

Reader

An attribute of the chunk tag. This is the class you defined that implements the Spring ItemReader interface to read data from your data file and create domain-specific objects.

Processor

Another attribute of the chunk tag. This is the class that implements the ItemProcessor interface, where other manipulations of the domain objects take place. In the case of the Loader Framework, we create LexGrid model objects from the domain objects so that they can be written to the database via Hibernate. Note that this is not a required attribute. In theory, if your data source let you create LexBIG objects directly as you read, you would not need a processor. In practice this is unlikely, and you will need to work with the data to get it into LexBIG objects.

Writer

Another attribute of the chunk tag. This class implements the Spring ItemWriter interface. In the case of the Loader Framework, these classes have been written for you. They are the LexGrid model objects that use Hibernate to write to the database.
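
The division of labor among reader, processor, and writer might look like the following plain-Java sketch. Everything here is an assumption made for illustration: `SourceRow`, `ModelEntity`, and the two-field pipe-delimited input are hypothetical and do not reflect the real RRF layout or the LexGrid model classes, and `ItemProcessor` is a simplified local stand-in for the Spring Batch interface of the same name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {

    // Hypothetical domain object the reader creates from one line of a data file.
    static class SourceRow {
        final String code;
        final String text;

        SourceRow(String code, String text) {
            this.code = code;
            this.text = text;
        }
    }

    // Hypothetical stand-in for a LexGrid model object handed to the writer.
    static class ModelEntity {
        final String entityCode;
        final String description;

        ModelEntity(String entityCode, String description) {
            this.entityCode = entityCode;
            this.description = description;
        }
    }

    // Simplified stand-in for Spring Batch's ItemProcessor<I, O>.
    interface ItemProcessor<I, O> {
        O process(I item);
    }

    // Reader role: turn one "code|text" line into a domain object.
    static SourceRow parse(String line) {
        String[] fields = line.split("\\|", -1);
        return new SourceRow(fields[0], fields[1]);
    }

    // Processor role: convert the domain object into a model object.
    static class RowToEntityProcessor implements ItemProcessor<SourceRow, ModelEntity> {
        public ModelEntity process(SourceRow row) {
            return new ModelEntity(row.code.trim(), row.text.trim());
        }
    }

    // Writer role: here the "database" is just an in-memory list.
    static List<ModelEntity> load(List<String> lines) {
        ItemProcessor<SourceRow, ModelEntity> processor = new RowToEntityProcessor();
        List<ModelEntity> written = new ArrayList<ModelEntity>();
        for (String line : lines) {
            written.add(processor.process(parse(line)));
        }
        return written;
    }

    public static void main(String[] args) {
        List<ModelEntity> entities = load(Arrays.asList("C1| first term ", "C2|second term"));
        for (ModelEntity e : entities) {
            System.out.println(e.entityCode + " -> " + e.description);
        }
    }
}
```

Keeping the conversion in a separate processor means the reader only needs to know the file format and the writer only needs to know the target model.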

...

Spring Batch gives the Loader Framework some degree of recovery from errors. Like the other features of Spring, error handling is something you need to configure in the Spring config file. Spring keeps track of the steps it has executed and makes note of any step that has failed. Those failed steps can be re-run at a later time. The Spring documentation provides additional information on this function. See ConfigureJob (http://static.springsource.org/spring-batch/reference/html/configureJob.html) and ConfigureStep (http://static.springsource.org/spring-batch/reference/html/configureStep.html).

Database Changes

None

Client

...

Spring can accommodate parallel processing to enhance performance. The Spring documentation provides a good discussion of this topic. Refer to the Scalability page at http://static.springsource.org/spring-batch/reference/html/scalability.html.

Internationalization

Not internationalized

...

Test Results

See System Testing

...
