Java EE 7, Spring Standardize Batch

This month’s release of the Java EE 7 platform includes a specification for a batch processing programming model that is heavily derived from VMware’s Spring Batch project. Spring Batch was also highlighted this month with a noteworthy release that brings leaner configuration and streamlined data access.

Batch Applications for the Java Platform, also known as JSR-352, offers application developers a model for building robust batch processing systems. The core of this programming model is a development pattern borrowed from Spring Batch, coined the Reader-Processor-Writer pattern, which encourages developers to embrace a chunk-oriented style of processing.

The Reader-Processor-Writer pattern breaks down into a three-step workflow for developers to follow, as sketched in the code below:

  • an ItemReader class consumes the processing data one chunk at a time (usually a single record);
  • an ItemProcessor applies business and domain logic to the items in the chunk;
  • and finally, an ItemWriter, to which records are delegated after processing and then aggregated and written out.
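
For illustration, the sketch below shows what minimal implementations of the three artifacts might look like against the JSR-352 API (the javax.batch.api.chunk package). The class names match those referenced in the job XML later in this article, while the record data and business logic are purely hypothetical, and in practice each class would live in its own source file.

import java.util.List;
import javax.batch.api.chunk.AbstractItemReader;
import javax.batch.api.chunk.AbstractItemWriter;
import javax.batch.api.chunk.ItemProcessor;
import javax.inject.Named;

// Hypothetical reader: returns one record per call and null once the input is exhausted.
@Named("MyItemReader")
public class MyItemReader extends AbstractItemReader {
    private final String[] records = {"alpha", "beta", "gamma"};
    private int index = 0;

    @Override
    public Object readItem() {
        return index < records.length ? records[index++] : null;
    }
}

// Hypothetical processor: applies business logic to a single item.
@Named("MyItemProcessor")
public class MyItemProcessor implements ItemProcessor {
    @Override
    public Object processItem(Object item) {
        return ((String) item).toUpperCase();
    }
}

// Hypothetical writer: receives the aggregated chunk of processed items.
@Named("MyItemWriter")
public class MyItemWriter extends AbstractItemWriter {
    @Override
    public void writeItems(List<Object> items) throws Exception {
        for (Object item : items) {
            System.out.println(item);
        }
    }
}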

 

Per the JSR's specification, a Job is described through an XML document and contains the Steps of the processing workflow. Each Step is responsible for describing how a chunk of data will be processed and at what interval commits are to be registered. More complex requirements for processing a given Step of the workflow can be handled with JSR-352's notion of a batchlet. A batchlet is the JSR-352 parallel to Spring Batch's tasklet, which provides a strategy for processing a Step.
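
As a minimal sketch, a batchlet implements the javax.batch.api.Batchlet contract, most conveniently by extending AbstractBatchlet. The archiving task below is a hypothetical example of work that does not fit the chunk model.

import javax.batch.api.AbstractBatchlet;
import javax.inject.Named;

// Hypothetical batchlet: a task-oriented step, such as archiving a file,
// that runs once and reports an exit status when it finishes.
@Named("MyArchiveBatchlet")
public class MyArchiveBatchlet extends AbstractBatchlet {

    @Override
    public String process() throws Exception {
        // ... perform the unit of work here ...
        return "COMPLETED";
    }
}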

JSR-352 also employs Spring Batch's pattern for accessing and controlling jobs. Jobs are invoked through a JobOperator, and the results of a job are accessible through a JobRepository. Spring Batch keeps the JobRepository name, while its counterpart to the JobOperator is the JobLauncher.
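
For example, launching and inspecting a job through the JSR-352 runtime might look like the following sketch, assuming a job definition named myJob:

import java.util.Properties;
import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;
import javax.batch.runtime.JobExecution;

public class JobStarter {

    public void runMyJob() {
        // Obtain the container-provided JobOperator and start the job defined in myJob.xml.
        JobOperator jobOperator = BatchRuntime.getJobOperator();
        long executionId = jobOperator.start("myJob", new Properties());

        // Status and other execution metadata are read back from the job repository.
        JobExecution execution = jobOperator.getJobExecution(executionId);
        System.out.println(execution.getBatchStatus());
    }
}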

Deviating somewhat from Spring Batch's manner of defining jobs, Java EE 7 application developers are required to place their job XML documents in the META-INF/batch-jobs directory of their project. With Spring Batch, developers can declare their job configuration anywhere in the Spring application context to make it available in the container.

Job XML documents for Java EE 7 containers define the concrete Reader, Processor, and Writer classes, in addition to a buffer size, commit interval, and checkpoint policy. The checkpoint policy denotes how commits will be handled: the default value is "item", but developers may choose "time" as the commit strategy. In the former case the commit-interval describes the number of records processed; in the latter, the number of seconds.

<job id="myJob" xmlns="http://batch.jsr352/jsl">
    <step id="myStep" >
        <chunk 
                reader="MyItemReader" 
                writer="MyItemWriter" 
                processor="MyItemProcessor"
                buffer-size="5" 
                checkpoint-policy="item"
                commit-interval="10" />
    </step>
</job>
	

The Spring Batch job configuration is nearly identical to Java EE 7's, the caveat being that steps are enclosed in a tasklet directive. The reader, processor, and writer attributes on the chunk configuration are references to beans that exist in the application context. As of version 2.2.0, the commit-interval on the chunk configuration denotes how many items must be processed before a commit is registered.

<job id="myJob">
    <step name="myStep">
        <tasklet>
            <chunk 
                    reader="myItemReader" 
                    processor="myItemProcessor" 
                    writer="myItemWriter" 
                    commit-interval="2" />
        </tasklet>
    </step>
</job>  

<bean id="myItemReader" class="...MyItemReader" /> 
<bean id="myItemProcessor" class="...MyItemProcessor" />
<bean id="myItemWriter" class="...MyItemWriter" />
	

While currently working toward JSR-352 compliance, Spring Batch goes beyond the specification to offer developers seamless integration with other components of the Spring ecosystem. In the case of batch processing, Spring Data can be leveraged directly as a Reader instance in the Reader-Processor-Writer pattern, letting developers retrieve chunks from a Spring Data repository. The 2.2.0 release of Spring Batch, which also arrived this month, offers streamlined interfacing with MongoDB and Neo4j datastores through Spring Data.
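
As an illustrative sketch, the MongoItemReader added in 2.2.0 can be declared as a bean and referenced from the reader slot of a chunk. The Person type, query, and sort shown here are assumptions made for the example.

import java.util.Collections;

import org.springframework.batch.item.data.MongoItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;

@Configuration
public class MongoReaderConfig {

    @Bean
    public MongoItemReader<Person> mongoItemReader(MongoTemplate mongoTemplate) {
        // Hypothetical reader: pages Person documents out of MongoDB for chunk processing.
        MongoItemReader<Person> reader = new MongoItemReader<Person>();
        reader.setTemplate(mongoTemplate);
        reader.setTargetType(Person.class);
        reader.setQuery("{}"); // match all documents
        reader.setSort(Collections.singletonMap("lastName", Sort.Direction.ASC));
        return reader;
    }
}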

In addition to the simplified Reader interfacing, the latest release of Spring Batch adds an extension to Spring's Java configuration for batch processing. To enable it, developers need only place the @EnableBatchProcessing annotation on a class that is annotated with @Configuration. From there, batch processing features such as the JobRepository and JobLauncher can be autowired directly with no additional configuration.

@Configuration
@EnableBatchProcessing
public class AppConfig {

    @Autowired
    private JobBuilderFactory jobs;

    @Bean
    public Job job() {
        return jobs.get("myJob").start(step1()).next(step2()).build();
    }

    @Bean
    protected Step step1() {
       ...
    }

    @Bean
    protected Step step2() {
     ...
    }
}
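
A step such as step1() above could then be assembled with the StepBuilderFactory that @EnableBatchProcessing also makes autowirable. The sketch below assumes that myItemReader(), myItemProcessor(), and myItemWriter() bean methods are defined elsewhere in the same configuration.

@Autowired
private StepBuilderFactory steps;

@Bean
protected Step step1() {
    // Chunk-oriented step: items are read and processed one at a time,
    // then written and committed in chunks of two.
    return steps.get("step1")
            .<String, String>chunk(2)
            .reader(myItemReader())
            .processor(myItemProcessor())
            .writer(myItemWriter())
            .build();
}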
	

In addition to Spring Batch 2.2.0's improvements for data retrieval and configuration, the latest release has updated its dependency on the Spring Framework to a minimum version of 3.1.2. Spring developers building batch applications will need to meet this minimum version to use the latest release of Spring Batch.
