JVM code can be just as fast as the equivalent native code. This might come as a shock to some people. Yes - JVM code (like Java or COBOL) can have the same speed as native code (like C++ or COBOL).
Scenario
This is generally true. However, in this case, I am looking at batch processing, a type of programming which is very common in commercial settings. In this programming model, many discrete data processing programs are run in succession, traditionally controlled by a script of some sort. On the mainframe, that script is usually JCL (Job Control Language); on distributed computers, it is either shell or cmd (.bat) scripts.
The nature of traditional batch runs on distributed (non-mainframe) hardware means that lots of short-running processes are used. This is exactly what JVM programs are bad at. However, with good batch system architecture, this can be overcome with amazing results.
To demonstrate just how fast JVM programs can be requires overcoming a number of challenges.
The Challenges
The first challenge in proving something like this has been the lack of directly comparable languages. Compiling Java to native works with gcj; however, even though the result is often slower than running the same Java byte code on the JVM, this proves little because the gcj compiler is not a super trusted, highly optimized commercial grade compiler. No offence, it has never been developed that way.
As a senior principal developer in the JVM COBOL team at Micro Focus, I am in an almost unique position to compare our battle hardened native COBOL compiler with our soon to be general availability JVM COBOL compiler. The compiler front ends are the same! The only difference is the code generators. The native compiler has an extremely effective optimising native code (machine code) generator and the JVM compiler produces JVM bytecode in class file format directly.
Note: the Micro Focus JVM COBOL compiler generates bytecode directly, it does not go through an intermediate step.
The second challenge comes from the way JVM code runs. I explained this in detail in Tuning the JVM for Unusual Uses - Have Some Tricks Under Your Hat, which covers the interpretation/profiling, compilation, and native phases of JVM execution.
JVMs are designed to function very fast for long running processes. The longer they run, the faster they get.
Because of this, we need a way to run COBOL programs over and over again, or many separate COBOL programs, using the same JVM process.
The third challenge is how to recreate all the advantages of running a batch file, but in a way which permits very efficient use of JVM based COBOL.
The Approach
JavaScript comes to the rescue. JavaScript is an amazingly simple yet powerful programming language. There is a pure JVM implementation of it called Rhino. Rhino is Open Source and is developed by the Firefox people - Mozilla.
You can do amazing things with Rhino and JVM COBOL. I will be writing an entire post on this in the near future. However, the key thing is that we can call JVM COBOL programs directly from scripts. For example, to run the program cobol_2 (see source at bottom of this post), all that is required in JavaScript is Packages.cobol.cobol_2.main(null); and yes, it is that simple! This line looks for the program in a JVM namespace (Java people call these things packages). To put it there, all I needed to do was compile it using the Micro Focus compiler like this:
cobol cobol_2.cbl jvmgen noanim ilnamespace(cobol);
I am able to compile exactly the same code to native by doing this:
cobol cobol_2.cbl opt;
cbllink cobol_2
By so doing, I have created identical JVM and native COBOL programs. Executing the native version from JavaScript requires (runtime.exec("cobol_2")).waitFor(); instead. For more details, please look at the JavaScript source code below.
The Power of JavaScript
The power of the JavaScript approach comes from the way the script does not need re-compiling when it is changed. Wrapping compiled programs up in an interpreted scripting language is a very rapid way of developing. I first started doing things this way when I wrapped up FORTRAN quantum mechanical code in a scripting language called TCL.
JavaScript is ubiquitous (you are probably running some right now in this web page), object oriented, fast, easy to use, and powerful. It makes a great choice for the batch control logic running large operations in a high performance language like COBOL.
This project was much quicker and easier to do in JavaScript than it would have been in any compiled language.
You can think of this as 'Lego' programming. The big strong COBOL blocks can be arranged in any order by the JavaScript to make many different and useful systems without even touching the COBOL.
The key is that JavaScript can run a JVM COBOL program in the same JVM as the script. This is because JVM COBOL programs are fully JVM compliant just like Java classes.
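As a minimal sketch of the 'Lego' idea, a batch job can be written as an ordered list of steps that the script runs in turn, timing each one. The names runBatch, steps, and the placeholder step functions are hypothetical and are not part of the benchmark script in the appendix; under Rhino, a real step body would be a call such as Packages.cobol.cobol_2.main(null).

```javascript
// Hypothetical sketch: a batch job as an ordered list of named steps.
// Each step is a plain function. Under Rhino, a step body could be
// Packages.cobol.cobol_2.main(null) to run a JVM COBOL program in-process.
function runBatch(steps) {
    var log = [];
    for (var i = 0; i < steps.length; ++i) {
        var start = (new Date()).getTime();
        steps[i].run();
        var elapsed = (new Date()).getTime() - start;
        log.push({ name: steps[i].name, millis: elapsed });
    }
    return log;
}

// Placeholder steps standing in for COBOL programs (assumptions, not real programs).
var order = [];
var steps = [
    { name: "maths",   run: function () { order.push("maths"); } },
    { name: "indexed", run: function () { order.push("indexed"); } }
];

// Runs the steps in list order and returns one timing entry per step.
var log = runBatch(steps);
```

Rearranging the job is then just a matter of reordering or swapping entries in the steps list; the compiled programs themselves are untouched.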
Making the Test Realistic
Please take a look at cobol_2 and cobol_3 below. The first is a pure mathematical processing program. The second creates a file of 10,000 indexed records and then reads them back by index. This latter program really highlights why COBOL batch processing is still so popular and essential. To create and read by index 10,000 records even in a powerful relational database takes a bit of time. In COBOL, it takes a couple of seconds on a laptop!
One difference in the approach between the execution of JVM COBOL and native is the process launch overhead; this being the time taken for the Operating System to launch the native COBOL processes. To measure this, the benchmarking JavaScript was run with a COBOL program which consisted of just a single goback statement.
I performed the tests using the in-development code for Visual COBOL 1.4. The release code in the general availability version later this year should be very similar in performance. I ran the code in 32-bit mode on a Dell E6400 laptop with 64-bit Windows Enterprise installed.
The Results
To ensure fairness, I ran the test three times. Each test performed 32 runs of both programs in JVM COBOL and 32 runs of both programs in native COBOL. The time for each execution of both programs was measured in milliseconds using the JavaScript Date object. The maximum, mean, minimum, and total execution times were recorded and reported.
Run 1
In this run, the JVM approach is slightly faster overall than the native.
Results:
=========
JVM
Maximum Time: 1300
Minimum Time: 381
Mean Time: 624.3125
Total Time: 19978
Native
Maximum Time: 1404
Minimum Time: 525
Mean Time: 661.1875
Total Time: 21158
Run 2
Again, in this run, the JVM approach is slightly faster overall than the native.
Results:
=========
JVM
Maximum Time: 1283
Minimum Time: 385
Mean Time: 670.53125
Total Time: 21457
Native
Maximum Time: 1559
Minimum Time: 526
Mean Time: 705.03125
Total Time: 22561
Run 3
Here we see the native approach just pipping the JVM one. Really, there is nothing to choose between them.
Results:
=========
JVM
Maximum Time: 1320
Minimum Time: 390
Mean Time: 734.25
Total Time: 23496
Native
Maximum Time: 1782
Minimum Time: 528
Mean Time: 698.75
Total Time: 22360
CPU bound: I did one run where the file handling program was not called, so the only code running was the mathematical code. The results here were shockingly in favour of the JVM approach. If I had made a choice between native and JVM based on the first iteration, I would have thought native was faster. Based on the group of 32 iterations, JVM proves to be twice as fast as native. However, this is also misleading, because the native figure includes the process launch overhead.
Results:
=========
JVM
Maximum Time: 354
Minimum Time: 59
Mean Time: 101.09375
Total Time: 3235
Native
Maximum Time: 272
Minimum Time: 169
Mean Time: 203.53125
Total Time: 6513
First iteration:
JVM = 354
Native= 272
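The warm-up effect in the CPU bound figures above can be checked with a little arithmetic. The sketch below uses only the totals and first-iteration times reported above; the helper name meanAfterFirst is my own, and the figures in the comments are rounded.

```javascript
// Figures copied from the CPU bound results above (milliseconds, 32 iterations).
var iterations = 32;
var jvm = { total: 3235, first: 354 };
var nat = { total: 6513, first: 272 };

// Mean time per iteration once the first (warm-up) iteration is excluded.
function meanAfterFirst(r) {
    return (r.total - r.first) / (iterations - 1);
}

var jvmSteady = meanAfterFirst(jvm);       // about 92.9 ms
var natSteady = meanAfterFirst(nat);       // about 201.3 ms
var overallRatio = nat.total / jvm.total;  // about 2.0
```

The first JVM iteration is the slowest of the run, yet the remaining 31 iterations average under half the native time.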
Process Launch Overhead
Results:
=========
JVM
Maximum Time: 313
Minimum Time: 0
Mean Time: 9.84375
Total Time: 315
Native
Maximum Time: 72
Minimum Time: 54
Mean Time: 57.84375
Total Time: 1851
The process launch overhead is around 1.5 seconds over a 32 process launch cycle (1851 ms for native against 315 ms for JVM). This is insufficient to qualitatively change any of the results above. For example, if we take that 1.5 seconds off the 6.5 second native time for the CPU bound test, JVM COBOL is still about 1.8 seconds (55%) faster than native.
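The adjustment can be reproduced from the reported totals. This is a sketch of the arithmetic only, and the variable names are my own. Using the exact totals rather than figures rounded to the nearest half second gives slightly smaller numbers than the rounded ones quoted above.

```javascript
// Totals from the results above, in milliseconds over 32 iterations.
var launch = { jvm: 315, native: 1851 };   // single-goback program
var cpu    = { jvm: 3235, native: 6513 };  // CPU bound test

// Extra time the native run spends just launching processes.
var overhead = launch.native - launch.jvm;     // 1536 ms, about 1.5 s

// Native CPU bound total with that overhead removed.
var nativeAdjusted = cpu.native - overhead;    // 4977 ms

var savedMillis = nativeAdjusted - cpu.jvm;    // 1742 ms
var ratio = nativeAdjusted / cpu.jvm;          // about 1.54
```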
The Conclusions
- Running a small program in JVM and native COBOL in no way acts as a benchmark for real world performance.
- Running JVM COBOL from JavaScript is a jaw-droppingly fast and easy way to implement batch processing.
- JVM Managed COBOL has comparable performance in typical batch applications to native Micro Focus COBOL on 32 bit Intel architecture (other platforms not tested).
- Process launch overhead is significant. This approach is better than traditional batch processing because it overcomes the process launch overhead. However, even when factoring out process launch overhead, JVM COBOL performance is still no slower than native across the ranges of tests performed here.
The Appendix
JavaScript
// Collects timing statistics (in milliseconds) for one set of runs.
function bench()
{
    this.min = 10000000;
    this.max = 0;
    this.count = 0;
    this.current = 0;
    this.total = 0;
    // Fold the most recent timing (this.current) into the statistics.
    this.update = function()
    {
        if(this.current > this.max)
        {
            this.max = this.current;
        }
        if(this.current < this.min)
        {
            this.min = this.current;
        }
        ++this.count;
        this.total += this.current;
    };
    this.mean = function()
    {
        return this.total / this.count;
    };
    this.display = function()
    {
        display(" Maximum Time: " + this.max);
        display(" Minimum Time: " + this.min);
        display(" Mean Time: " + this.mean());
        display(" Total Time: " + this.total);
    };
}

var jvm = new bench();
var nat = new bench();
var its = 32;
var runtime = java.lang.Runtime.getRuntime();

// JVM COBOL: the programs run inside the same JVM as this script.
display("JVM COBOL Benchmark");
for(var i = 0; i < its; ++i)
{
    var start = (new Date()).getTime();
    Packages.cobol.cobol_2.main(null);
    Packages.cobol.cobol_4.main(null);
    jvm.current = ((new Date()).getTime()) - start;
    display("" + jvm.current);
    jvm.update();
}

// Native COBOL: each program is launched as a separate operating system process.
display("Native COBOL Benchmark");
for(var i = 0; i < its; ++i)
{
    var start = (new Date()).getTime();
    (runtime.exec("cobol_2")).waitFor();
    (runtime.exec("cobol_3")).waitFor();
    nat.current = ((new Date()).getTime()) - start;
    display("" + nat.current);
    nat.update();
}

display("");
display("Results: ");
display("=========");
display("JVM");
jvm.display();
display("Native");
nat.display();

// Print a line to standard output via the Java runtime.
function display(what)
{
    java.lang.System.out.println(what);
}
cobol_2.cbl
123456$set sourceformat(variable)
01 my-group.
03 counter pic s9(9) comp-5.
03 a pic s9(9) comp-5.
03 b pic s9(9) comp-5.
03 r pic s9(9) comp-5.
move 123456789 to a b r
perform varying counter from 1 by 1 until counter = 1000000
compute r = (a + b) / (a - b)
compute r = (r + b) / (a - b)
compute r = (r + b) / (a - b)
compute r = (r + b) / (a - b)
compute r = (r + b) / (a - b)
end-perform
.
cobol_3.cbl
123456$set sourceformat(variable)
input-output section.
file-control.
select source-file
assign to disk "count.idx"
organization indexed
access dynamic
record key is r-key
status is source-status.
data division.
file section.
fd source-file.
01 source-record.
03 raw-line pic x(256).
03 source-line redefines raw-line.
05 filler pic x(7).
05 r-key pic x(10).
05 source-body pic x(249).
working-storage section.
01 counter binary-long.
01 check binary-long.
01 source-status pic 99.
procedure division.
open output source-file
perform varying counter from 1 by 1 until counter = 10000
move counter to source-body r-key
write source-record
end-perform
close source-file
open input source-file
perform varying counter from 1 by 1 until counter = 10000
move counter to r-key
read source-file
move source-body(1:9) to check
if check not = counter
display "woops"
end-if
end-perform
close source-file
Launching JavaScript
To launch JavaScript easily, I created a batch file with this line in it:
"\Program Files (x86)\Java\jdk1.6.0_21\bin\java" -server org.mozilla.javascript.tools.shell.Main %1 %2 %3 %4 %5
A final note: I have tried very hard to compare like with like in this post. In the mathematical COBOL, I have compared the use of comp-5 group items. For native and JVM COBOL, this means that for each calculation, the program should load and store the calculated value from a block of memory allocated to working storage. If the COBOL is changed like this (remove the my-group label)...
01.
03 counter pic s9(9) comp-5.
03 a pic s9(9) comp-5.
03 b pic s9(9) comp-5.
03 r pic s9(9) comp-5.
... the compiler is able to treat the working storage items in a much more efficient way in JVM COBOL. The result for the calculation done this way is:
Results:
=========
JVM
Maximum Time: 342
Minimum Time: 0
Mean Time: 11
Total Time: 352
Native
Maximum Time: 223
Minimum Time: 183
Mean Time: 195.09375
Total Time: 6243
Yes, here JVM COBOL is running nearly 18 times faster than native. However, this is a somewhat artificial situation and so I am not including it in the results used for this post. Again, factoring in the process launch overhead only reduces this difference to about 13.5 times; JVM COBOL remains very much faster.
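For anyone who wants to check these ratios, here is a sketch of the arithmetic using the totals reported above and the launch overhead measured earlier (1851 ms native against 315 ms JVM). The variable names are my own. The exact totals give a slightly lower adjusted ratio than the one obtained from a rounded 1.5 second overhead.

```javascript
// Totals for the modified (no group label) program, milliseconds over 32 iterations.
var jvmTotal = 352;
var nativeTotal = 6243;

// Process launch overhead from the single-goback measurement.
var overhead = 1851 - 315;                                // 1536 ms

var rawRatio = nativeTotal / jvmTotal;                    // about 17.7
var adjustedRatio = (nativeTotal - overhead) / jvmTotal;  // about 13.4
```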