The previous edition of this article was spontaneous and longer than I had intended. What I really wanted to do was write about the advancements in Java technology while using a recent DVR project to give those advancements a level of context.
I broke it off at 2,000 words without even touching on Java itself; it was entirely application oriented. Part 2 will remedy that, but unfortunately, in order to stay on topic this time, I'm not going to pick up where I previously left off. I feel confident that no one really cared to read the rest of that story anyway.
Before I begin with the details of Java's current state of technology, I have to explain something important so there aren't any misconceptions about performance.
You can't compare Java to other programming languages in ways familiar to most people, because while advancements have been made, Java is still a platform-independent interpreted language. Rather than being executed directly by the operating system, Java must first be interpreted by the Java virtual machine before it can run.
I know, at this point it probably looks like I'm going to give you a lecture on how virtual machines work, but I assure you that isn't the case. I absolutely must make this point clear early on, though, because it directly affects the speed at which applications run, and it accounts for many of Java's benefits and some of its drawbacks.
You now have a layer of abstraction between the hardware and the compiled application, and that alone will make Java applications run slower than they otherwise might have, had they been compiled into native code as C and C++ programs are. While most people know this, what they don't know is that it doesn't necessarily mean Java applications will always run slower than native applications.
You'll find out why shortly.
The point here is to make sure that you understand that when you're talking about an application's performance, you need to mentally decouple three things in this order: application, VM, and language. Because native applications don't have a VM, you only have to decouple the application from the language, and I'll start with that.
Not every programmer is as talented as the next, so it is a given that not every application's performance problems are due entirely to the language in which it was written. Even the most capable language ever devised can still fall prey to an incompetent software engineer; that's a law of nature. We all know how an incompetent person can screw up the best laid plans.

Some prejudices against Java can be traced directly to the shortcomings of a program's author, though this is not always the case. You need to remember this one, because it applies to all languages, not just Java.

So when you are considering such things, remember carefully: it isn't always slow because of what it is made of; it can also be slow because of who made it.
This is not to say that performance problems have not plagued Java throughout its history -- they have. The problem wasn't so much that these shortcomings were never fixed, it's that a human being's first impression is always the strongest, and tends to make you incapable or unwilling to accept changes even when they are sitting right in front of your face.
Because we can mentally decouple the language from the VM in the case of Java, we don't actually have to consider the language itself at all. Java is strongly object oriented, more beautifully structured than C++, and contains primitives for compatibility and speed even though they really shouldn't exist in a true OO language. Few people will actually complain about the language syntax, so I'll move on to the VM.
Every successive version of Sun's JVM has seen significant improvements over the previous versions in all the ways that matter. One of the very first improvements after the 1.0 generation was a new kind of garbage collector, and you need to stay with me for a moment because this might reveal some information you aren't aware of.
Many people complain about not having direct memory control in Java and cite this as a prime example of why performance is less than spectacular. In the early VMs, the garbage collector served the primary purpose of forcing a safer programming paradigm onto software engineers.
If you can't directly allocate and free memory, you can't very well forget to free it and accidentally cause a memory leak. While this eliminates classic memory leaks entirely, as well as an entire class of faults where you reference memory that has already been freed (causing a crash), there are still issues with accidental object retention, but that's a different problem.
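The distinction is easy to see in a short sketch (the class and field names here are mine, purely for illustration): there is no free() to forget, but a long-lived collection can still quietly retain objects the program no longer needs.

```java
import java.util.ArrayList;
import java.util.List;

public class RetentionDemo {
    // A long-lived, static collection is a common source of accidental
    // retention: anything added here stays reachable, so the garbage
    // collector can never reclaim it.
    static final List<byte[]> cache = new ArrayList<>();

    static void process() {
        byte[] buffer = new byte[1024];
        cache.add(buffer);       // buffer is now reachable from a static root
        // ... use buffer ...
        // Forgetting the next line isn't a leak in the C sense (there is no
        // dangling pointer and no double free), but the memory would never
        // be reclaimed while the cache holds a reference:
        cache.remove(buffer);    // drop the reference; buffer is GC-eligible
    }

    public static void main(String[] args) {
        process();
        System.out.println("retained objects: " + cache.size());
    }
}
```

Running this prints `retained objects: 0`; comment out the `remove` call and the count grows with every call, even though the program never touches those buffers again.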
But it was slow, and to properly explain this, I will let someone who is far more knowledgeable than myself do the explaining (Brian Goetz):
The 1.0 and 1.1 JDKs used a mark-sweep collector, which did compaction on some -- but not all -- collections, meaning that the heap might be fragmented after a garbage collection. Accordingly, memory allocation costs in the 1.0 and 1.1 JVMs were comparable to that in C or C++, where the allocator uses heuristics such as "first-fit" or "best-fit" to manage the free heap space. Deallocation costs were also high, since the mark-sweep collector had to sweep the entire heap at every collection.
This fault gave rise to design patterns that effectively forced programmers to replicate old habits for managing memory. To avoid the cost of having objects caught up in the collector, people would pool objects in arrays rather than letting their references go null (making them eligible for the GC). Today this is unnecessary, yet due to old bad habits or simple ignorance, people still code like this in Java, which can cause slower performance due to unintentional gunk in the GC engine.
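Here is a sketch of that anti-pattern next to the modern style (the class and method names are mine). On a generational collector, pooled objects tend to get promoted to the old generation and add work to every full collection, while short-lived allocations in the young generation are nearly free:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PoolingDemo {
    // The old habit: keep a pool of reusable objects alive forever to
    // "spare" the garbage collector.
    static final Deque<StringBuilder> pool = new ArrayDeque<>();

    static StringBuilder borrow() {
        StringBuilder sb = pool.poll();          // reuse one if available
        return (sb != null) ? sb : new StringBuilder();
    }

    static void giveBack(StringBuilder sb) {
        sb.setLength(0);
        pool.push(sb);                           // keeps the object reachable
    }

    // The modern style: just allocate, use, and let the reference die.
    // Short-lived objects like this are collected almost for free.
    static String simple(String name) {
        return new StringBuilder("hello, ").append(name).toString();
    }

    public static void main(String[] args) {
        System.out.println(simple("world"));
    }
}
```

Pooling still makes sense for genuinely expensive resources such as database connections or threads, but for plain objects the pooled version above is more code, more shared state, and usually slower on a modern JVM.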
Today's very advanced GC can run concurrently with other active threads, essentially eliminating program pauses during memory recovery. This has been the case since at least 1.4.
Brian continues with the benefits of the 1.2 series, which is now four versions out of date.
In HotSpot JVMs (Sun JDK 1.2 and later), things got a lot better -- the Sun JDKs moved to a generational collector. Because a copying collector is used for the young generation, the free space in the heap is always contiguous so that allocation of a new object from the heap can be done through a simple pointer addition.
This is something I like to mention to people who refuse to believe that Java can be better than native programs at much of anything, much less memory management. The generational collector reduces the number of CPU instructions required to allocate an object to about ten. That is anywhere from six to ten times faster than even the best malloc() implementations in C, and it isn't merely theoretical either.
According to another article by Brian Goetz, a study showed that giving a number of native applications a very conservative garbage collector -- perl being a notable entrant -- can yield a performance increase over manual memory management.
Freeing that memory -- or, more accurately, collecting dead objects -- is, as Brian has said, essentially free.
Part 3 of this series will wrap up the advantages of the progressively better garbage collectors featured in the JVM, and introduce other advancements that give Java programs performance advantages over natively compiled applications such as escape analysis. If there is room, I will also address third-party contributions to the Swing interface library that allow native acceleration at the cost of platform independence, and Sun's own attempts to gain native interface acceleration.