
...

Introduction

This document is a section of the Loader Guide. It is new in LexEVS v5.1.

...

The Loader Framework uses Spring Batch to manage its Java objects and improve performance, while Hibernate provides the object-relational mapping to the LexGrid database.

[Image: the major components of the Loader Framework as described above]

Assumptions

None

...

  • abstract-rrf-loader: a holder for common RRF-based loader code
  • meta-loader: a new loader to read the NCI MetaThesaurus
  • umls-loader: a loader for reading Unified Medical Language System (UMLS) content

Maven
The above projects utilize Maven for build and dependency management. Obtain the Maven plugin for Eclipse at http://m2eclipse.codehaus.org

...

An example may help in understanding the Framework. Our discussion will refer to the illustration below. Let's say we are writing a loader to load the ICD-9-CM codes and their descriptions, which are contained in a text file. We know we'll need a data structure to hold the data after we've read it so we have a class:

Code Block
<source>
public class ICD9SourceObject {

    private String id;
    private String descr;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getDescr() { return descr; }
    public void setDescr(String descr) { this.descr = descr; }
}
</source>

The Loader Framework uses Spring Batch to manage the reading, processing, and writing of data. Spring provides classes and interfaces to help do this work, and the Loader Framework also provides utilities to help loader developers. In our example, illustrated below, we will write a class that uses the Spring ItemReader interface. It will take a line of text and return an ICD9SourceObject (shown as 1 and 2). Next we'll want to process that data into a LexEVS object such as an Entity object, so we'll write a class that implements Spring's ItemProcessor interface. It will take our ICD9SourceObject and output a LexEVS Entity object (shown as 3 and 4). Finally, we'll want to write the data to the database (shown as 5). Note that the LexEVS model objects provided in the Loader Framework are generated by Hibernate and use Hibernate to write the data to the database. This frees us from having to write SQL.

[Image: diagram illustrating the reader, processor, and writer flow described in the previous paragraph]
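Steps 1 through 4 might look roughly like the following. This is a self-contained sketch, not actual LexEVS code: the pipe-delimited input format, the class names, and the simplified stand-ins for Spring's ItemReader and ItemProcessor interfaces (and for the LexEVS Entity model object) are all assumptions for illustration.

```java
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for Spring Batch's ItemReader and ItemProcessor
// interfaces so this sketch compiles on its own; the real interfaces live
// in the org.springframework.batch.item package.
interface SimpleItemReader<T> { T read(); }                  // returns null when input is exhausted
interface SimpleItemProcessor<I, O> { O process(I item); }

// The source object from the example above, repeated here with a constructor.
class ICD9SourceObject {
    private final String id;
    private final String descr;
    ICD9SourceObject(String id, String descr) { this.id = id; this.descr = descr; }
    String getId() { return id; }
    String getDescr() { return descr; }
}

// Minimal stand-in for the LexEVS Entity model object.
class Entity {
    private String entityCode;
    private String entityDescription;
    void setEntityCode(String code) { this.entityCode = code; }
    void setEntityDescription(String descr) { this.entityDescription = descr; }
    String getEntityCode() { return entityCode; }
    String getEntityDescription() { return entityDescription; }
}

// Steps 1 and 2: read one line of text, return an ICD9SourceObject.
class ICD9SourceObjectReader implements SimpleItemReader<ICD9SourceObject> {
    private final Iterator<String> lines;

    ICD9SourceObjectReader(List<String> lines) { this.lines = lines.iterator(); }

    public ICD9SourceObject read() {
        if (!lines.hasNext()) return null;                   // no more input
        // Assumes a hypothetical "code|description" layout for each line.
        String[] fields = lines.next().split("\\|", 2);
        return new ICD9SourceObject(fields[0], fields[1]);
    }
}

// Steps 3 and 4: turn the source object into a LexEVS Entity.
class ICD9EntityProcessor implements SimpleItemProcessor<ICD9SourceObject, Entity> {
    public Entity process(ICD9SourceObject item) {
        Entity entity = new Entity();
        entity.setEntityCode(item.getId());
        entity.setEntityDescription(item.getDescr());
        return entity;
    }
}
```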

Spring

Configure Spring to be aware of your objects and to manage them. This is done via an XML configuration file. More details on the Spring config file are below.

...

The following diagram is from the Maven documentation:

[Image: the standard Maven directory structure]

For more information on the Maven project, see http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html

...

Spring is a lightweight bean management container; among other things, it contains a batch function that is used by the Loader Framework. A loader using the framework will need to work closely with Spring Batch. It does this through Spring's configuration file, where you configure beans (your loader code) and specify how that loader code should be used by Spring Batch (by configuring a Job, a Step, and other Spring Batch elements in the Spring config file). Here is sample code:

Code Block

<source>
<job id="ioSampleJob">

   <step name="step1">
      <tasklet>
         <chunk reader="fooReader" processor="fooProcessor" writer="compositeItemWriter" commit-interval="100">
         </chunk>
      </tasklet>
   </step>

</job>

<bean id="compositeItemWriter" class="...compositeItemWriter">
   <property name="delegate" ref="barWriter" />
</bean>

<bean id="barWriter" class="...barWriter" />
</source>

What follows is a brief overview of those tags related to the Loader Framework. For more detail, please see the Spring documentation at http://static.springsource.org/spring-batch/reference/html/index.html.

...

Use the beans:bean tag to define the beans to be managed by the Spring container by specifying the package-qualified class name. You can also specify initialization values and set bean properties within these tags.

Code Block

<source>
<beans:bean id="umlsCuiPropertyProcessor" parent="umlsDefaultPropertyProcessor" class="org.lexgrid.loader.processor.EntityPropertyProcessor">
   <beans:property name="propertyResolver" ref="umlsCuiPropertyResolver" />
</beans:bean>
</source>

Job

The job tag is the main unit of work. A job is composed of one or more steps that define the work to be done. Other advanced and interesting things can be done within the job, such as using the split and flow tags to indicate work that can be done in parallel steps to improve performance.

Code Block

<source>
<job id="umlsJob" restartable="true">

   <step id="populateStagingTable" next="loadHardcodedValues" parent="stagingTablePopulatorStepFactory"/>

   ...

</job>
</source>
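As an illustration of the split and flow tags mentioned above, a job might run two flows in parallel before a final step. The fragment below is a sketch modeled on the Spring Batch reference documentation; the step ids, parent step factories, and the taskExecutor bean are placeholders, not actual LexEVS configuration.

```xml
<job id="parallelStepsJob">
   <!-- The two flows inside the split run concurrently on the given task executor -->
   <split id="split1" task-executor="taskExecutor" next="finalStep">
      <flow>
         <step id="stepA" parent="stepAFactory" next="stepB"/>
         <step id="stepB" parent="stepBFactory"/>
      </flow>
      <flow>
         <step id="stepC" parent="stepCFactory"/>
      </flow>
   </split>
   <step id="finalStep" parent="finalStepFactory"/>
</job>
```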

Step

One or more step tags make up a job and can vary from simple to complex in content. Among other things, you can specify which step should be executed next.

...

You can do anything you want within a Tasklet, such as sending an email or invoking a LexBIG function such as indexing; you are not limited to database operations. The Spring documentation also has this to say about Tasklets:
<source>

The Tasklet is a simple interface that has one method, execute, which will be called repeatedly
by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal
a failure. Each call to the Tasklet is wrapped in a transaction.
</source>
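For instance, a cleanup step might be implemented as a tasklet that deletes temporary staging files once the load finishes. The sketch below is hypothetical and self-contained: SimpleTasklet and RepeatStatus are stand-ins for Spring Batch's Tasklet interface and RepeatStatus enum (the real execute method also receives a StepContribution and a ChunkContext).

```java
import java.io.File;
import java.util.List;

// Stand-ins for Spring Batch types so this sketch compiles on its own; the
// real interface is org.springframework.batch.core.step.tasklet.Tasklet.
enum RepeatStatus { CONTINUABLE, FINISHED }
interface SimpleTasklet { RepeatStatus execute() throws Exception; }

// A hypothetical cleanup tasklet: delete temporary staging files after a load.
class StagingCleanupTasklet implements SimpleTasklet {
    private final List<File> tempFiles;
    private int deletedCount;

    StagingCleanupTasklet(List<File> tempFiles) { this.tempFiles = tempFiles; }

    public RepeatStatus execute() {
        for (File f : tempFiles) {
            if (f.delete()) {
                deletedCount++;
            }
        }
        // FINISHED signals the TaskletStep not to call execute() again.
        return RepeatStatus.FINISHED;
    }

    int getDeletedCount() { return deletedCount; }
}
```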

Chunk

Spring documentation says it best:
<source>

Spring Batch uses a "Chunk-Oriented" processing style within its most common implementation. Chunk-
oriented processing refers to reading the data one at a time, and creating "chunks" that will be
written out, within a transaction boundary. One item is read in from an ItemReader, handed to an
ItemWriter, and aggregated. Once the number of items read equals the commit interval, the entire
chunk is written out via the ItemWriter, and then the transaction is committed.
</source>
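The loop the documentation describes can be sketched in plain Java. This is an illustration only: it buffers items until the commit interval is reached and omits the transaction handling, retry, and skip logic that Spring Batch performs at each commit point.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// A simplified illustration of chunk-oriented processing: items are read one
// at a time and buffered; every commitInterval items the whole chunk is
// handed to the writer at once.
class ChunkLoop {
    // Returns the chunks in the order the writer would receive them.
    static <T> List<List<T>> run(Iterator<T> reader, int commitInterval) {
        List<List<T>> written = new ArrayList<>();
        List<T> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(reader.next());                 // one item from the reader
            if (chunk.size() == commitInterval) {
                written.add(new ArrayList<>(chunk));  // chunk is full: write and "commit"
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);                       // final partial chunk
        }
        return written;
    }
}
```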

Reader

An attribute of the chunk tag. It names the class you defined, implementing the Spring ItemReader interface, to read data from your data file and create domain-specific objects.

...

Below is an image of the loader-framework-core project in Eclipse, which shows the key directories of the Loader Framework. The following is a summary of the contents of those directories.

[Image: the loader-framework-core project in Eclipse]

  • connection: Connect to LexBIG and perform LexBIG tasks such as register and activate
  • constants: Assorted constants
  • dao: Access to the LexBIG database
  • data: Directly related to data going into the LexBIG database tables
  • database: Database-specific tasks not related to data, such as determining the database type (MySQL, Oracle)
  • fieldsetter: Spring-related classes that help write to the database
  • lexbigadmin: Common tasks for LexBIG to perform, such as indexing
  • listener: Listeners you can attach to a load so that code executes at certain points in the load, such as a cleanup listener that runs when the load is finished, or a setup listener
  • logging: Access to the LexBIG logger
  • processor: Important directory: classes to which you can pass a domain-specific object and which will return a LexBIG object
  • properties: Code used internally by the Loader Framework
  • reader: Readers and reader-related tools for loader developers
  • rowmapper: Classes for reading from a database; currently experimental code
  • setup: Classes such as JobRepositoryManager that help Spring do its work; as Spring runs, it keeps tables of its internal workings. Loader developers should not need to dive into this directory.
  • staging: Helper classes to use if your loader needs to load data to the database temporarily
  • wrappers: Helper classes and data structures such as a Code/CodingScheme class
  • writer: Miscellaneous classes that write to the database. These are not the same classes you would use in your loader, i.e., the LexBIG model objects that use Hibernate. Those classes are contained in the PersistenceLayer project (shown below). It is by using those classes in the PersistenceLayer that you let the Loader Framework do some of the heavy lifting for you.

[Image: the PersistenceLayer project]

Algorithms

None

...

Spring Batch gives the Loader Framework some degree of recovery from errors. Like the other features of Spring, error handling is something you configure in the Spring config file. Spring keeps track of the steps it has executed and notes any step that has failed; those failed steps can be re-run at a later time. The Spring documentation provides additional information on this function; see http://static.springsource.org/spring-batch/reference/html/configureStep.html and http://static.springsource.org/spring-batch/reference/html/configureJob.html.

Database Changes

None

Client

...

Automated tests are run via Maven. As mentioned earlier, the projects containing the Loader Framework code are configured to work with Maven. The illustration below shows the PersistenceLayer project and its standard Maven layout. Notice that the structure of the test code mirrors the structure of the application code. To run the automated tests in our Eclipse environment, select the project, right-click, select Run As, and select Maven test. Maven does the rest.

[Image: directories in the PersistenceLayer project and the associated standard Maven layout]

Test Guidelines

...

  • group readers
  • group writers
  • writers configurable to skip certain records
  • partitionable readers to break up large source files
  • error-checking readers and writers
  • a validating framework for inspecting content before it is inserted into the database

...
