Code generation optimization

When generating code, it is useful to consider more than just ensuring that the generated code works. Of course this is always the most important issue, but it can be very useful to tune the process as well.
One of the problems with code generation is that it invites you to generate a lot of code. That is good, as none of the generated code needs to be written by hand, but it can easily cause serious development delays during compilation. On a project I was working on, clean compiles easily took 10 minutes (depending on the development system), and that was without running the unit tests. Faster systems (especially faster disks, and using Linux instead of Windows) can partly improve this, but you still have to wait. This has nasty side effects, such as developers switching to their e-mail or starting to browse the internet, which effectively increases the “wait time” by a large factor.

One of the ways to improve this is to tune the generated code. Chances are that the generated code is largely boilerplate, and that a lot of it can be reused across different classes instead of generated entirely for each one.

  • Use inheritance. Often the generated classes contain methods which are present in each class. These should be moved to a common base class.
  • Split methods into the parts which are the same for each class (which can be moved to a base class) and the parts which differ (which stay generated in the template). You could make the base class abstract and provide an entry point for the varying parts. One use is creating instances of objects, which cannot be done from generic code (Java does not allow new T() for a type parameter T).
  • Use generics. In the transformation I did, many of the differences between classes were just differing types. This can be solved easily and powerfully using generics.
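A minimal sketch of how the three techniques above can fit together, assuming a simple data access object. All names here (GeneratedDao, CustomerDao, Customer, findOrCreate, newEntity, keyOf) are hypothetical and are not equanda's actual generated API; an in-memory list stands in for real EJB3/JPA persistence. The shared logic lives once in a generic abstract base class, and the template only has to generate the small subclass with the parts that differ, including instance creation.

```java
import java.util.ArrayList;
import java.util.List;

// Demo entry point (placed first so single-file launch finds main).
class Main {
    public static void main(String[] args) {
        CustomerDao dao = new CustomerDao();
        Customer a = dao.findOrCreate("c1");
        Customer b = dao.findOrCreate("c1");
        System.out.println(a == b);     // true: same key returns the stored instance
        System.out.println(dao.size()); // 1
    }
}

// Hypothetical shared base class: code that used to be generated into
// every class is written once here (inheritance + generics).
abstract class GeneratedDao<T, K> {
    private final List<T> store = new ArrayList<>();

    // Entry point for the varying part: generic code cannot call "new T()"
    // because of type erasure, so each generated subclass creates instances.
    protected abstract T newEntity(K key);

    // Another varying part: how to read the key from an entity.
    protected abstract K keyOf(T entity);

    // Shared logic, identical for every entity type.
    public T findOrCreate(K key) {
        for (T entity : store) {
            if (keyOf(entity).equals(key)) {
                return entity;
            }
        }
        T created = newEntity(key);
        store.add(created);
        return created;
    }

    public int size() {
        return store.size();
    }
}

// Hypothetical entity.
class Customer {
    final String id;

    Customer(String id) {
        this.id = id;
    }
}

// All that the template still needs to generate per entity.
class CustomerDao extends GeneratedDao<Customer, String> {
    @Override
    protected Customer newEntity(String key) {
        return new Customer(key);
    }

    @Override
    protected String keyOf(Customer customer) {
        return customer.id;
    }
}
```

The generated subclass shrinks to two one-line methods, which is where the savings in generation time, compilation time and jar size come from.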

Obviously you need a fully working test suite to be able to refactor your code like this. It ensures the generated code still works and can catch cases where reuse fails (for example, I had problems moving shared code to a common base class for EJB3 beans; the recommended solutions seemed to fail in my environment).
There are many advantages to be gained.

  • The code generation itself is faster, as less code needs to be written (even more so if the number of files can be reduced).
  • The compilation itself is faster. Java file size and the number of embedded (anonymous) classes in particular seem to increase compilation time, though this is just an impression; I haven’t measured it.
  • All the code which is moved out of the templates is easier to edit. IDE support is very limited when editing templates, making it difficult to use code completion, error reporting, etc. The base classes have no such problem, as they are ordinary code.
  • Code should execute faster. When code from the base classes is reused, it is also executed more often, so the HotSpot compiler will kick in sooner and do a better job of compiling these code fragments.
  • The jar file will be smaller, which means you will not need to increase permgen as quickly, and the Java heap can be bigger without causing the system to swap to disk (you should always make sure your Java programs are not swapped out; garbage collection touches the whole heap and will cause your program to thrash).

I have just finished such a cleanup operation on equanda. For the core module (which generates the EJB3/JPA data access objects), I have been able to reduce the generated jar to 77% of its original size. More importantly, the compile time was also vastly reduced. Though it took some time to do, the code review done in the process also uncovered some problems, which were fixed as well. All in all, it was well worth the effort and brings the first release of equanda a lot closer.
