"Do more with less" is a frequent mantra when supporting legacy applications. After years of accumulating cruft these same applications are now expected to support the "big data" workloads of today. In order to achieve these performance levels, it is often necessary to run profilers to find and remove the bottlenecks. Before doing any optimization however, it is very helpful if the code is easy to understand. (We assume that the code is hard to understand because otherwise the optimization would have already been done.) Once it is improved, it becomes clear where optimizations can be made while minimizing changes to the rest of the code base. It is often assumed that high performance code must be difficult to read. This leads to the false assumption that easy to read code must not be optimal.
The Java Virtual Machine (JVM) has many strategies for optimizing code at runtime, but it is very limited in the time it has for determining which ones should be applied. As a result, the JVM works better on straightforward, easy-to-understand code. This is advantageous because most developers also prefer very readable code. What follows are a few techniques for writing both readable and performant code for the JVM.
The purpose of each method and class should be quickly and easily understood. If this is not the case it may be a good idea to refactor it into much smaller pieces. Break larger methods down into smaller methods that each have a clear singular purpose.
Eclipse, IDEA, and NetBeans each have their own approach to extracting methods and refactoring classes. The time you spend learning these features for your platform will pay great dividends in the future.
There is almost no performance penalty in the JVM for small private or static methods. These methods are easily inlined at runtime. Private and static methods greatly help readability by breaking down complex tasks into easy to understand parts with limited scope.
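As a minimal sketch of this decomposition, the hypothetical `InvoiceCalculator` below (all names are illustrative, not from any real code base) keeps its entry point as a readable summary and pushes each step into a small private static helper. Small helpers like these are the kind of method the JIT compiler can usually inline.

```java
final class InvoiceCalculator {

    // Entry point: reads as a summary of the steps involved.
    static double totalDue(double[] prices, double taxRate) {
        double subtotal = sum(prices);
        double tax = taxOn(subtotal, taxRate);
        return roundToCents(subtotal + tax);
    }

    // Adds up all prices using primitives only.
    private static double sum(double[] prices) {
        double total = 0.0;
        for (double price : prices) {
            total += price;
        }
        return total;
    }

    // Computes the tax owed on the given amount.
    private static double taxOn(double amount, double taxRate) {
        return amount * taxRate;
    }

    // Rounds to two decimal places.
    private static double roundToCents(double amount) {
        return Math.round(amount * 100.0) / 100.0;
    }
}
```

Each helper has a single purpose and a limited scope, so a reader can verify it in isolation.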
Public and protected methods are also frequently inlined but a little more analysis must be done at run time to ensure the correct behavior. This is because the implementation may have been overridden and can't be inlined without an analysis of how many times and where this has been done. Note that this same problem can also frustrate the developer. Projects with too many interfaces and abstract base classes can be very difficult to trace through. This is one of the reasons that composition over inheritance is generally encouraged.
The size of the method also plays a role. By default, methods larger than 35 bytes of bytecode will not be inlined unless they are called very frequently. It is not always clear from reading the source code how many bytes of bytecode a particular Java method will compile to or how frequently it will be called. Fortunately, the JVM provides some command line options to make this easier. Add the following to the arguments when starting up the JVM (shown here for HotSpot): -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
This command will write the method inlining details to the console at runtime. Pay attention to the last part of each line. If the line ends with "hot method too big" the JVM would have inlined this method had it been smaller. Break this method down into smaller parts or simplify it until the message disappears. If the line ends with only "too big" these methods should also be considered for refactoring into smaller pieces. They were not called frequently enough to be candidates for inlining; however, they were deemed to be too large by the JVM.
Methods are easier to understand when all the data fields they use are local. This is also helpful to the JVM because everything fits closely together on the stack as opposed to distributed out in the heap. When all the fields are near at hand they can be cached and prefetched by the underlying hardware. Keep in mind that references to member variables do have a price. The address for the object must be loaded and the field dereferenced in order to be used.
In simple cases, the JVM can eliminate redundant member references but it is a better approach to eliminate the clutter and make the code more readable by using local variables. Only use member variables within methods when it is required to modify or read the state of the class. Eliminate member references inside tight loops by reading the value once into a local variable before entering the loop.
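The loop advice can be sketched as follows, assuming a hypothetical `Scaler` class with a mutable `factor` field. The field is read once into a local variable before the loop, so the loop body only touches locals.

```java
final class Scaler {

    private int factor; // mutable member state

    Scaler(int factor) {
        this.factor = factor;
    }

    void setFactor(int factor) {
        this.factor = factor;
    }

    long scaledSum(int[] values) {
        final int localFactor = factor; // read the member once, before the loop
        long sum = 0;
        for (int value : values) {
            sum += (long) value * localFactor; // only locals inside the loop
        }
        return sum;
    }
}
```

The local copy also makes it obvious to the reader that the factor cannot change while the loop runs.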
Exposing members while they are being constructed or modified should always be avoided. This helps simplify the dependencies readers need to think about and reduces the risk of external code interacting with these variables at inappropriate times.
Sometimes it is unclear how to reduce the references within a method due to the tight coupling of the algorithm in use. In this case it may be better to create a class to represent a running instance of this algorithm which will then be created and used inside the large method. This should cause the method to be smaller because the logic has been moved to the new class and it also allows for easy replacement of the algorithm in the future.
Take care to make sure that the instance of the algorithm class does not get assigned to a member variable or returned from the method. If this is done correctly, the JVM can use escape analysis to recognize that the object is only used locally. Once this is known, the JVM may skip allocating a new object and instead inline the object construction and usages. This makes use of the stack instead of the heap and eliminates the need to garbage collect this short-lived object.
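A minimal sketch of this pattern, using hypothetical names: `RunningMean` represents one running instance of the algorithm, and `Statistics.mean` creates it, uses it, and lets it go without storing or returning it. Whether the JIT compiler actually removes the allocation depends on the JVM and on the helper methods being inlined, so treat that as a possibility rather than a guarantee.

```java
final class Statistics {

    // One running instance of the algorithm, with its own small state.
    private static final class RunningMean {
        private long count;
        private double mean;

        void add(double value) {
            count++;
            mean += (value - mean) / count; // incremental mean update
        }

        double result() {
            return mean;
        }
    }

    static double mean(double[] samples) {
        // The instance never escapes: it is not stored in a field or returned.
        RunningMean running = new RunningMean();
        for (double sample : samples) {
            running.add(sample);
        }
        return running.result();
    }
}
```

Swapping in a different algorithm later only requires replacing the nested class, not the calling method.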
The happy path is the code that should be executed in the normal case when nothing goes wrong. Try to keep all the happy path code in one place where it will be sequentially easy to read. Call out as needed to methods specifically written to support the infrequent corner cases.
Modern CPUs are much faster than their memory subsystems so they prefetch data in order to maintain reasonable speeds. By ensuring the sequential nature of the happy path, the code will work in harmony with this prefetching behavior. At run time the JVM will inline those methods that are called most often. Assuming these methods are in the happy path, the prefetch will now be loading the body of these inlined methods. This now gives us fast, sequential execution and easy-to-think-about, clean readable code.
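One way to lay out a happy path is sketched below with a hypothetical `Withdrawal` class. The normal case reads top to bottom in one method, and each infrequent corner case is handed to a method written just for it.

```java
final class Withdrawal {

    // Returns the new balance in cents after a successful withdrawal.
    static long withdraw(long balanceCents, long amountCents) {
        if (amountCents <= 0) {
            return rejectInvalidAmount(amountCents);
        }
        if (amountCents > balanceCents) {
            return rejectInsufficientFunds(balanceCents, amountCents);
        }
        // Happy path: nothing went wrong.
        return balanceCents - amountCents;
    }

    // Corner case: the requested amount is zero or negative.
    private static long rejectInvalidAmount(long amountCents) {
        throw new IllegalArgumentException("amount must be positive: " + amountCents);
    }

    // Corner case: the account cannot cover the requested amount.
    private static long rejectInsufficientFunds(long balanceCents, long amountCents) {
        throw new IllegalStateException(
                "cannot withdraw " + amountCents + " from a balance of " + balanceCents);
    }
}
```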
Primitives are better for raw performance but boxed values will be necessary when using the built-in collections classes. Mixing them together however can lead to unexpected performance issues and clutter. If possible, pick one style and be consistent.
Changing between the two styles can lead to overlooking the need to ensure boxed primitives are not null or introducing unnecessary defensive null checks. Null checks clutter up the code flow and obscure the work to be completed. The best approach is to ensure nulls are not produced at any point in the code.
Mixing both styles also leads to confusion any time equivalence needs to be checked. Using == is appropriate for primitives but it is rarely the desired behavior when checking boxed values. This is because == is checking identity, ensuring both arguments are the SAME object rather than objects that hold the same value.
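The difference can be sketched with a hypothetical `BoxedEquality` class. Boxed values are compared with `equals`, which looks at the content, while primitives are compared with `==`. The example assumes the boxed arguments are never null.

```java
final class BoxedEquality {

    // Compares boxed values by content; assumes neither argument is null.
    static boolean boxedEquals(Integer left, Integer right) {
        return left.equals(right);
    }

    // Primitives have no identity, so == compares the values directly.
    static boolean primitiveEquals(int left, int right) {
        return left == right;
    }

    public static void main(String[] args) {
        Integer first = 1000;
        Integer second = 1000;
        System.out.println(boxedEquals(first, second)); // content comparison
        // Identity comparison: the result depends on whether both references
        // happen to point at the same pooled object, so avoid relying on it.
        System.out.println(first == second);
    }
}
```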
Autoboxing adds hidden costs because it becomes a method call that internally may use a pool to limit the number of objects created. This "object pooling" is common with small Integers but may not be helpful because the garbage collector on modern JVMs is very efficient at reclaiming short lived objects.
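Using new to explicitly box a primitive via the constructor (as in new Double(4.2)) skips the valueOf call and any pool lookup that autoboxing performs, so it can be faster than letting the JVM autobox, although these constructors have been deprecated since Java 9. Surprisingly, this technique is not much more expensive than using primitives due to the efficiencies of the modern garbage collector.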
Never use exceptions as a form of flow control. Limit their use to truly exceptional cases. Frequent use of 'try', 'catch' and 'finally' greatly detracts from easily understanding the code flow.
Checked exceptions have long been a pain point within Java due to the extra boilerplate they impose. A lesser-known problem with exceptions is that the just-in-time compiler treats catch blocks as rarely executed code and may not compile or optimize them at all. As a result, code within a catch block generally will not execute as quickly as other code.
The use of 'finally' can, at times, be excused because it is used to promote safety and clarity when releasing a finite resource such as a lock or connection handle. Caution should be used however because frequent use can easily lead to misunderstandings related to the order of execution. As stated previously, sequential code leverages the prefetch behavior of the underlying hardware and finally blocks can disrupt this.
When possible, refactor the body of the catch out into a single method. Then use a conditional check for that particular case. The body of the catch code will no longer be inside the happy path. The explicit check also encourages fail-fast behavior by pushing the contract requirements for the code up to the front.
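A minimal sketch of this refactoring, with hypothetical names: instead of dividing inside a try block and catching ArithmeticException, `averageOrDefault` checks for the problem case up front and delegates to a method that holds what used to be the catch body.

```java
final class Averages {

    static int averageOrDefault(int total, int count) {
        // Explicit check replaces: try { ... } catch (ArithmeticException e) { ... }
        if (count == 0) {
            return onEmptyInput();
        }
        // Happy path: plain sequential code with no exception handling.
        return total / count;
    }

    // Formerly the body of the catch block.
    private static int onEmptyInput() {
        return 0;
    }
}
```

The precondition is now visible at the top of the method, so a reader sees the contract before the work.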
Now that the code is readable you are ready to fire up the profiler and begin the optimization process. After applying the above techniques, however, it may already be performing much better than expected. The recommendations here demonstrate that there is no reason to sacrifice readability for performance when developing on the JVM.