Category Archives: equanda

code generation optimization

When generating code, it is useful to consider more than just whether the generated code works. Of course that is always the most important issue, but it can be very useful to tune the process as well.
One of the problems with code generation is that it invites you to generate a lot of code. That is good, as all the code which is generated does not need to be written by hand, but it can easily cause serious delays during compilation. On a project I was working on, clean compiles easily took 10 minutes (depending on the development system). And that was without unit testing. Faster systems (especially faster disks and the use of Linux instead of Windows) can partly improve this, but you still have to wait. This has some nasty side effects, such as developers switching to their e-mail or starting to browse the internet, which effectively increases the “wait time” by a large factor.

One of the ways to improve this is to tune the generated code. Chances are that the generated code is largely boilerplate, and that a lot of it can be reused across different classes instead of generated entirely for each one.

  • Use inheritance. Often the generated classes contain methods which are present in each class. These should be moved to a common base class.
  • Split methods into the parts which are the same for each class (moved to a base class) and the parts which are different (generated in the template). You could make the base class abstract and provide an entry point for the parts that change. One use is creating instances of objects (which is not possible using generic code).
  • Use generics. In the transformation I did, a lot of the differences between classes were differing types. This can easily and powerfully be solved by using generics.
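
The three techniques in the list above can be combined. The following is a minimal sketch, not equanda's actual code: `BaseSelector`, `Customer` and `CustomerSelector` are hypothetical names chosen for illustration. The shared logic lives in a hand-written generic base class, and the template only has to emit the type parameter and a factory method.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written once: everything that is identical for every generated class.
abstract class BaseSelector<T>
{
    private final List<T> results = new ArrayList<T>();

    // Entry point for the part that differs per class.
    // Generic code cannot call "new T()", so the subclass provides it.
    protected abstract T createInstance();

    // Shared logic, written once instead of once per generated class.
    public T addNew()
    {
        T instance = createInstance();
        results.add( instance );
        return instance;
    }

    public int size()
    {
        return results.size();
    }
}

// Hypothetical domain class, only here to make the sketch compile.
class Customer
{
    String name = "";
}

// What the template would still emit: only the type and the factory method.
class CustomerSelector
    extends BaseSelector<Customer>
{
    @Override
    protected Customer createInstance()
    {
        return new Customer();
    }
}
```

With this split, adding a table only adds a few generated lines, while fixes to `addNew()` are made in one ordinary source file with full IDE support.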

Obviously you need a fully working test suite to be able to refactor your code like this. It assures the generated code still works and can trap cases where reuse fails. For example, I had problems moving shared code to a common base class for EJB3 beans; the recommended solutions seem to fail in my environment.
There are many advantages to be gained.

  • The code generation itself is faster as less code needs to be written (if the number of files can be reduced, this is even better).
  • The compilation itself is faster. Java file size and the number of embedded (anonymous) classes especially seem to increase compilation time, though this is just an impression; I haven’t measured it.
  • All the code which is moved out of the templates is more practical to edit. IDE help is very limited when editing templates, making it difficult to use code completion, error reporting etc. In the base classes there is no such problem, as this is ordinary code.
  • Code should execute faster. When code from base classes is reused, that code is also executed more often, so the HotSpot compiler will kick in sooner and do a better job of compiling these code fragments.
  • The jar file will be smaller, which means you will not need to increase permgen as quickly, and the java heap can be bigger without spilling into swap (you should always make sure your java programs fit in physical memory, otherwise garbage collection will cause your program to thrash).

I have just finished such a cleanup operation on equanda. For the core module (which generates the EJB3/JPA data access objects), I have been able to reduce the generated jar to 77% of the original size. More importantly, the compile time was also vastly reduced. Though it took some time to do, the code review which was done in the process also uncovered some problems, which were fixed as well. All in all, it was well worth the effort and brings the first release of equanda a lot closer.

http post data to servlet with form-based authentication

Accessing web pages or servlets which are not authenticated or which use basic authentication is relatively easy, especially when using the commons-httpclient library.

However, things get more tricky once the page you try to access is using form based authentication. Unfortunately, httpclient does not handle that automatically, so you have to code this yourself.

This can be done with the following code which handles both the cases of BASIC and FORM based authentication.


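    // imports for commons-httpclient 3.x
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.NameValuePair;
    import org.apache.commons.httpclient.UsernamePasswordCredentials;
    import org.apache.commons.httpclient.auth.AuthScope;
    import org.apache.commons.httpclient.cookie.CookiePolicy;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.commons.httpclient.methods.PostMethod;
    import org.apache.commons.httpclient.params.HttpConnectionManagerParams;

    // ImportCredentials (an equanda class) supplies getUsername() and getPassword()
    public String getPage( String urlString, ImportCredentials credentials )
        throws Exception
    {
        HttpClient httpConn = new HttpClient();
        httpConn.getParams().setAuthenticationPreemptive( true );
        // covers the BASIC authentication case
        httpConn.getState().setCredentials( AuthScope.ANY,
            new UsernamePasswordCredentials( credentials.getUsername(), credentials.getPassword() ) );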
        HttpConnectionManagerParams conpar = new HttpConnectionManagerParams();
        conpar.setConnectionTimeout( 5 * 60000 ); // 5 minutes
        conpar.setSoTimeout( 5 * 60000 ); // 5 minutes
        httpConn.getHttpConnectionManager().setParams( conpar );
        GetMethod get = new GetMethod( urlString );
        get.getParams().setCookiePolicy( CookiePolicy.BROWSER_COMPATIBILITY );
        get.setDoAuthentication( true );
        httpConn.executeMethod( get );
        String responseBody = get.getResponseBodyAsString();
        if ( responseBody.contains( "j_security_check" ) )
        {
            NameValuePair[] data = new NameValuePair[2];
            data[ 0 ] = new NameValuePair( "j_username", credentials.getUsername() );
            data[ 1 ] = new NameValuePair( "j_password", credentials.getPassword() );
            String loginUrl = urlString;
            while ( loginUrl.length() > 0 && loginUrl.charAt( loginUrl.length() - 1 ) == '/' )
            {
                loginUrl = loginUrl.substring( 0, loginUrl.length() - 1 );
            }
            PostMethod authpost = new PostMethod( loginUrl + "/j_security_check" );
            authpost.setRequestBody( data );

            // release the connection of the first GET before reusing the client
            get.releaseConnection();

            int httpRes = httpConn.executeMethod( authpost );
            responseBody = authpost.getResponseBodyAsString();
            if ( httpRes == 301 || httpRes == 302 || httpRes == 307 )
            {
                // redirected, get content page
                get = new GetMethod( authpost.getResponseHeader( "Location" ).getValue() );
                get.setRequestHeader( "Content-Type", "text/plain; charset=UTF-8" );
                authpost.releaseConnection();
                httpConn.executeMethod( get );
                responseBody = get.getResponseBodyAsString();
                get.releaseConnection();
            }
            else
            {
                // not redirected, still release the login connection
                authpost.releaseConnection();
            }
        }
        else
        {
            get.releaseConnection();
        }
        return responseBody;
    }

Unfortunately I fear this is application server specific. It may be that some application servers do not have a page called “j_security_check”. In that case the test and authentication URL will need to be modified.
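
One way to limit the impact of such differences is to keep the two server-specific strings in a single place. This is only a sketch under assumptions: `FormLoginConfig` and its methods are hypothetical names, not part of equanda or commons-httpclient, and the defaults simply mirror the values used in the code above.

```java
// Hypothetical helper: keeps the server-specific names in one place.
class FormLoginConfig
{
    private final String loginMarker; // text that identifies the login page
    private final String loginPath;   // path the credentials are posted to

    FormLoginConfig( String loginMarker, String loginPath )
    {
        this.loginMarker = loginMarker;
        this.loginPath = loginPath;
    }

    // Defaults matching the values used in getPage() above.
    static FormLoginConfig servletDefault()
    {
        return new FormLoginConfig( "j_security_check", "/j_security_check" );
    }

    // True when the response looks like the login form instead of the page.
    boolean isLoginPage( String responseBody )
    {
        return responseBody != null && responseBody.contains( loginMarker );
    }

    // Strips trailing slashes from the URL and appends the login path.
    String loginUrl( String urlString )
    {
        String base = urlString;
        while ( base.endsWith( "/" ) )
        {
            base = base.substring( 0, base.length() - 1 );
        }
        return base + loginPath;
    }
}
```

In `getPage`, the `contains( "j_security_check" )` test and the construction of the login URL could then be replaced by calls to `isLoginPage` and `loginUrl`, so that a different server only needs a different `FormLoginConfig`.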

The example above is the simple case where you are just getting a page. It gets more tricky if the original request is an HTTP POST. Basically, you could just replace the original GET by a POST, but this can give character set problems. The servlet engine does not seem to remember the character encoding across the different requests (and explicitly setting it does not work either), so the only solution I found is to do a dummy request first, just to assure the communication is authenticated, and then do the “real” request with the correct character encoding.

The end result (from the equanda ImportUtil class) looks like this:


    public static String importData( String importData, String urlString, ImportCredentials credentials )
        throws Exception
    {
        HttpClient httpConn = new HttpClient();
        httpConn.getParams().setAuthenticationPreemptive( true );
        // covers the BASIC authentication case
        httpConn.getState().setCredentials( AuthScope.ANY,
            new UsernamePasswordCredentials( credentials.getUsername(), credentials.getPassword() ) );
        HttpConnectionManagerParams conpar = new HttpConnectionManagerParams();
        conpar.setConnectionTimeout( 5 * 60000 ); // 5 minutes
        conpar.setSoTimeout( 5 * 60000 ); // 5 minutes
        httpConn.getHttpConnectionManager().setParams( conpar );
        PostMethod post = new PostMethod( urlString );
        post.setRequestHeader( "Content-Type", "text/plain; charset=UTF-8" );
        post.getParams().setCookiePolicy( CookiePolicy.BROWSER_COMPATIBILITY );
        post.setDoAuthentication( true );

        // no data yet, need to assure the authentication is done first
        httpConn.executeMethod( post );
        //get response
        String responseBody = post.getResponseBodyAsString();
        if ( responseBody.contains( "j_security_check" ) )
        {
            NameValuePair[] data = new NameValuePair[2];
            data[ 0 ] = new NameValuePair( "j_username", credentials.getUsername() );
            data[ 1 ] = new NameValuePair( "j_password", credentials.getPassword() );
            String loginUrl = urlString;
            while ( loginUrl.length() > 0 && loginUrl.charAt( loginUrl.length() - 1 ) == '/' )
            {
                loginUrl = loginUrl.substring( 0, loginUrl.length() - 1 );
            }
            PostMethod authpost = new PostMethod( loginUrl + "/j_security_check" );
            authpost.setRequestBody( data );

            // release the connection of the dummy POST before reusing the client
            post.releaseConnection();

            int httpRes = httpConn.executeMethod( authpost );
            if ( httpRes == 301 || httpRes == 302 || httpRes == 307 )
            {
                // redirected, get content page
                GetMethod get = new GetMethod( authpost.getResponseHeader( "Location" ).getValue() );
                authpost.releaseConnection();
                httpConn.executeMethod( get );
                get.releaseConnection();
            }
            else
            {
                // not redirected, still release the login connection
                authpost.releaseConnection();
            }
        }
        else
        {
            post.releaseConnection();
        }

        // now do the real post, as otherwise the character set is not remembered
        post = new PostMethod( urlString );
        post.setRequestHeader( "Content-Type", "text/plain; charset=UTF-8" );
        post.setRequestBody( importData );
        httpConn.executeMethod( post );
        responseBody = post.getResponseBodyAsString();
        post.releaseConnection();
        
        return responseBody;
    }

generate versus annotate, code generation is still an important tool

In 2003, when I was only just starting to write JEE applications, each enterprise bean (using EJB2) required you to write a set of files. This was a pain, though fortunately there were tools like xdoclet to aid in this. You just had to add some javadoc attributes and this was used to generate a lot of the files, assuring the programmers did not need to cut and paste too much (and avoiding all the mistakes you make because it is just so plain boring).

This concept was later extended into annotations, where instead of generating java source files at compile time, the information is read at runtime and everything is generated on the fly. This is a big (development) boost as compile cycles are a lot shorter (xdoclet was not particularly fast), and the technology has been put to many good uses (see EJB3 and JPA, Tapestry, …).
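
As a rough illustration of reading metadata at runtime, here is a minimal sketch using plain Java reflection. The `@StoredIn` annotation, `CustomerRecord` and `MetadataReader` are hypothetical names invented for this example; real frameworks such as JPA define their own annotations and do far more with them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation; RUNTIME retention keeps it available to reflection.
@Retention( RetentionPolicy.RUNTIME )
@Target( ElementType.TYPE )
@interface StoredIn
{
    String table();
}

// The metadata sits next to the code, much like the old javadoc attributes.
@StoredIn( table = "CUSTOMER" )
class CustomerRecord
{
}

class MetadataReader
{
    // Reads the metadata at runtime; no source file is generated at compile time.
    // Returns null when the class carries no @StoredIn annotation.
    static String tableFor( Class<?> type )
    {
        StoredIn annotation = type.getAnnotation( StoredIn.class );
        return annotation == null ? null : annotation.table();
    }
}
```

Calling `MetadataReader.tableFor( CustomerRecord.class )` looks up the table name when the program runs, so nothing extra has to be compiled when the annotation value changes.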

To jump-start my way in the JEE world, I followed a “JBoss advanced” course. While talking to the other participants at the course, it became clear to me that most of us did not believe xdoclet to be enough. Many (including me) were developing some kind of framework to add even more metadata, improve consistency of the program and reduce work.

Even with the recent evolutions, with annotations, I still believe there is a need for such frameworks. Annotations are a great tool to add metadata, but they are limited to adding information about one class. While it would be possible to use them to generate entire new classes when needed, this would be a pain as these classes would then not be available when writing code (more exactly, the IDE would not be able to help you reference these classes, and your compiler would also be unhappy). This is a limitation that affects productivity again. There are also cases where it would be useful to be able to customize some of the behaviour which is annotated. In some cases this is more easily done using other methods.

I am still a big proponent of compile time code generation. To assure that the programmer and user interfaces to a system are consistent, there is not a lot that can beat the effectiveness of code generation. It is often forgotten just how much boilerplate code is being written all the time.

If you are going to write an enterprise application, you are going to need a data access layer, a CRUD user interface, probably some web services to access the data layer, some basic (preferably user customizable) reporting, possibly integration with a full text search engine etc. Just for the CRUD user interface, you need access control and rights management (who is allowed to see or edit which fields, partly determined by the administrator and partly by the users themselves), you need consistency so that for example linking records is always done in the same way, easy navigation inside the forms, aids for fast keyboard entry of data etc.

That is a lot of boilerplate code to produce, and a lot of code to maintain and modify each time you add a table or a field in your data layer.

Unfortunately, these modifications, or more generally the evolution of the software, are often also one of the weak points of code generation. It happens a lot that code generation is static: the generated code has a certain behaviour, and it can only be changed by editing the generated code. The end result is one-time generated code, where the generated stuff is committed to the source repository and maintained like any other code. This way you can easily generate a (possibly very powerful) prototype, but the problem of maintaining the software has not become any easier (probably the reverse, as you now need to maintain foreign code). In a demonstration given at JavaOne afterglow this year, it seemed that Ruby on Rails, for example, suffers from this problem.

Better would be to assure that the generated code has enough hooks to allow customization, and that the generator is smart enough not to remove or overwrite these customizations when the code is generated again (so they do not need modifications). This is for example how it is handled in torque, one of the first tools I have seen to implement this idea.
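
One common way to get such hooks is to split each class in two: a base class which is regenerated every time, and a subclass which is created once and then left alone. The sketch below only illustrates that idea; `OrderBase`, `Order` and the hook name are hypothetical and not taken from torque or equanda.

```java
// GENERATED on every run of the generator; never edited by hand.
abstract class OrderBase
{
    private int quantity;

    public void setQuantity( int quantity )
    {
        beforeSetQuantity( quantity ); // hook, called before the value changes
        this.quantity = quantity;
    }

    public int getQuantity()
    {
        return quantity;
    }

    // Hook with a do-nothing default, so subclasses only override what they need.
    protected void beforeSetQuantity( int newValue )
    {
    }
}

// HAND-WRITTEN: emitted once as an empty skeleton and never overwritten afterwards.
class Order
    extends OrderBase
{
    @Override
    protected void beforeSetQuantity( int newValue )
    {
        if ( newValue < 0 )
        {
            throw new IllegalArgumentException( "quantity must not be negative" );
        }
    }
}
```

Adding a field to the model then only changes `OrderBase`; the validation in `Order` survives regeneration because the generator never touches that file.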

While I believe there are probably quite a few frameworks based on advanced code generation in use to aid in software development and maintenance, I assume most of them have been developed in-house as part of specific projects and are not available for general use. On the open source front, I know only of tools which help with some of the aspects listed above, usually the persistence problem. Only equanda seems to have the intention of covering the whole spectrum of boilerplate code: many of the examples above already work (though not yet the web services or full text search engine integration), with some other stuff thrown in for good measure.