
Managing Java logs with logstash and Kibana

Logstash is a tool to analyze and manage logs: you can collect, parse, store and forward log entries. The data is stored in elasticsearch, and you can then analyse, visualise and search through the logs using Kibana. To make it easy to try, the basic logstash distribution bundles all three tools in a single install.

The main strength of logstash compared to many other log management solutions is its ability to dissect your log lines and split them into fields, allowing you to search and correlate logs better.

For example, a log line like

~myApp 2013-12-06T17:43:04.234+0100 [0.0.0.0-http-10.32.92.147-8080-3] INFO  b.v.a.d.l.MyClass - logging something

can be split into application, timestamp, server, thread, severity, class and message fields.

Combinations of lines can also be recognized, so stack traces and other multi-line log messages are not split up. Take for example the following log entry for a REST call.

~dc-devt 2013-12-06T17:43:04.234+0100 [0.0.0.0-http-10.32.92.147-8080-3] INFO  b.v.a.d.l.PreProcessLoggingInterceptor - 
Service: GET http://10.32.92.147:8080/appContext/rest/service
UserId: itsmeagain
Response types application/json
Query Parameters: 
	limit -> [10]
	sortColumn -> [number]
	start -> [0]
Path parameters: 
Reply type: class myapp.PagedList
Output document:
{...contents snipped...}
Duration: 0.078s
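
Such an entry comes from a single logging call whose message spans several lines. A minimal sketch of what that could look like on the Java side, assuming SLF4J on top of logback; the class and method names are purely illustrative and the real interceptor is not shown here:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RestCallLogging {

    private static final Logger LOG = LoggerFactory.getLogger(RestCallLogging.class);

    public void logRestCall(String endpoint, String userId, double durationInSeconds) {
        // One logging call produces one (multi-line) log event.
        LOG.info("\nService: GET {}\nUserId: {}\nDuration: {}s", endpoint, userId, durationInSeconds);
    }

    public void logFailure(Exception problem) {
        // Stack traces are multi-line as well; the %xEx conversion word in the
        // logback pattern below renders them after the message.
        LOG.error("something failed", problem);
    }
}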

To make this possible, you have to match your logging format with your logstash configuration.
I use the following logback format configuration:

<pattern>~%property{log.appkey} %date{yyyy-MM-dd'T'HH:mm:ss.SSSZ} [%property{log.server}-%thread] %-5level %logger{36} - %msg%n%xEx</pattern>

The resulting log lines are quite long as they contain additional information which allows aggregating logs from many applications and servers on one logstash server; they are not really meant for direct consumption.
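
The %property{log.appkey} and %property{log.server} values in the pattern are logback context properties. Purely as an illustration (the values below are made up, and in a real setup the properties can just as well be defined in the logback configuration file), they could be set programmatically like this:

import ch.qos.logback.classic.LoggerContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogPropertySetup {

    public static void main(String[] args) {
        // Hypothetical values; they identify the application and the server instance
        // so logs from many sources can be aggregated on one logstash server.
        LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
        context.putProperty("log.appkey", "myApp");
        context.putProperty("log.server", "0.0.0.0-http-10.32.92.147-8080");

        Logger log = LoggerFactory.getLogger(LogPropertySetup.class);
        log.info("logging something"); // rendered using the pattern above
    }
}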

In logstash, you can configure how to process the input. Let’s look at the configuration matching the format above.
Let’s start by defining the log sources.

input {
  file {
    type => "syslog"
    path => [ "/var/log/*.log", "/var/log/messages", "/var/log/syslog" ]
  }
 
  file {
    type => "logfile"
    path => [ "/home/joachim/temp/*.log", "/home/joachim/temp/logs/*.log" ]
  }
}

In this case we just define the files which need to be scanned; it is very practical that wildcards are supported.
I also did a test using UDP to send and receive messages. This is crazy fast (as expected) but unfortunately most of the messages never reached logstash (I expect they were dropped while logstash was still handling a previous message).

The output is sent to elasticsearch for storage. This will allow Kibana to access the data so you can visualize the result.

output {
  elasticsearch 
  {
    embedded => true
  }
}

Now we need to add the filtering of the log messages. First I want to combine the lines which are part of the same log message. To allow us to figure out which lines belong together, I included a tilde as a marker to indicate that a new log message starts. The filter basically says that all lines which do not start with a tilde need to be appended to the previous line.

filter {
 
  multiline {
    pattern => "^[\[~]"
    negate => true
    what => "previous"
  }
 
  # add other filters
}

We already defined the filter block, so additional filters need to be added inside that block.

We need to dissect the log fields according to the pattern we configured.

grok {
  match => [ 
    "message", "~%{NOTSPACE:application} %{TIMESTAMP_ISO8601:timestamp} \[%{DATA:server}\-%{DATA:thread}\] %{LOGLEVEL:severity}\s+%{JAVAFILE:category} \- %{GREEDYDATA:shortmessage}"
  ]
}

I also add some rules to extract additional fields when present. These extract data from specific log messages, like the logging of the REST calls above.

grok {
  match => [ "message", "Duration: %{NUMBER:duur:float}s" ]
  tag_on_failure => []
}

grok {
  match => [ "message", "UserId: %{WORD:ldapid}" ]
  tag_on_failure => []
}

grok {
  match => [ "message", "Service: %{WORD:http_command} %{URI:endpoint}" ]
  tag_on_failure => []
}

Finally, make sure the timestamp from the message is used as the timestamp of the log entry (otherwise the time at which logstash processes the message is used).

date {
  # 2013-09-23T11:27:14.177+0200
  match => [
      "timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
  ]
}

With these settings in place and some logs processed, the logs can be investigated and interesting things can be done.

Kibana, the dashboard which is included in the logstash distribution, allows you to customize the view of the log data in your elasticsearch database.
You can customize Kibana by adding several dashboards. In each dashboard you can add rows, and in each row you can add panels which display the information in a specific way. To access the data, you can define filters and queries, and these can be combined. For example, the following two rows in the dashboard show (based on the configuration above) an overview of events, an overview of logged errors and the number of REST calls together with their total, mean and maximum duration.

logstash-graphics

When viewing the details of the events you can choose which columns are visible and, by clicking on a line, see the full details of a log entry.

logstash-events

There is also a view of the fields (used to select the event columns) which also allows quick summaries to be displayed.

logstash-fields

What is Geomajas?

Geomajas, a GIS framework for the web, focuses on efficient data delivery, security and performance.

Geomajas is open source: anyone can see how the solution is built and there is a world-wide community active on the project. This openness guarantees high quality as there are many eyes looking at how things are done. It is distributed under the GNU Affero General Public License (AGPL).

Integrate, transform and secure GIS data

Geomajas is an application framework which is focused on building spatial applications. You can easily and generically access spatial data. Data sources are abstracted to raster or vector data.

The command pattern is used for remote access to the framework. Internally, everything is wired using Spring services (1). Certain complex operations are implemented using a pipeline. This combination makes it possible to customize almost everything. It is easy to add extra commands to the framework and it is easy to customize the pipelines. The latter can be data source (2) specific.
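
To give an idea of what adding a command amounts to, here is a rough sketch using the Command interface and the generic request/response classes from the Geomajas back-end API as I understand them; the bean name, class and behaviour are purely illustrative:

import org.geomajas.command.Command;
import org.geomajas.command.CommandResponse;
import org.geomajas.command.EmptyCommandRequest;
import org.springframework.stereotype.Component;

// Hypothetical example command; real commands do actual work and usually
// return a more specific response object.
@Component("command.example.Ping")
public class PingCommand implements Command<EmptyCommandRequest, CommandResponse> {

    public CommandResponse getEmptyCommandResponse() {
        return new CommandResponse();
    }

    public void execute(EmptyCommandRequest request, CommandResponse response) throws Exception {
        // Nothing to do; an empty response without errors indicates success.
    }
}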

Geomajas structure

The framework is built using many plug-ins. The back-end provides a basic set of commands, services and wiring code. Specific implementations (for example data sources) or extensions (like security implementations, printing,…) are handled in plug-ins. This gives you a lot of flexibility: you can choose to include only the functionality you need, thus keeping your application lean.

Most of the spatial services are implemented using a pipeline. This is a construct which can be extended or replaced specifically for each data source. Extension can be done by adding interceptors around a group of steps or by adding individual steps.

Pipeline structure

Services can be transactional and you can combine information from several data sources, add (business) rules to evaluate,…

Various data sources are available to access spatial databases and standard GIS data sources.

The data which is accessible in Geomajas is grouped in applications. Each application can contain a set of maps. Each map is itself built from a list of layers. A layer is a description of a data source with some additional configuration (like styles). A map can be part of several applications and a layer can be included in many maps.

The framework is cloud-ready. If the load is too large to handle on one machine, you can simply add extra machines to handle more load. There is no need for sharing state between machines except for caching. The caching itself uses a distributed cache (3) to ensure this can also be scaled out.

Security

Security is fully integrated in the Geomajas framework. When accessing the commands, the framework uses a security token to request the policies from a policy server. The token itself is provided by an authentication service. The framework itself assures that the policies are enforced.

Security context

The framework does not itself handle passwords or store policies (though a specific security plug-in might do so). This handling can be redirected to specialized services (4).

Security login

Because of the deep integration in the framework, all data can be affected by security. For example, raster images may have parts cleared when the area is not allowed to be visible.

Caching

Caching is integrated into Geomajas (5). This integration has several advantages. It is security aware: the security context is included in cache checks. Also, the cache is integrated with both the read and write paths of the data sources. Cached entries are automatically invalidated when data is written to the data source. This way the data can be cached and still be up-to-date. The need to clear caches regularly to prevent cached data from becoming too stale is also removed (6).

Faces

To make data available again, Geomajas uses faces. Faces convert data and commands to allow access using a client specific protocol. While this has focused on providing map interaction in the browser so far, in the future there will be support for various standard formats like WMS, WFS etc.

The faces fully tap into the power of the framework. The client only has to support one format for communication with the server. All conversions to and from that format are handled on the server. This format can be very efficient, reducing transfer time and easing processing on the client, making the client more light-weight. For security, you only need to provide your credentials once (see above). No credentials are stored on the client so they cannot be intercepted. All the server enhancements which are configured on the server are seamlessly available. This includes caching, rasterization,…

Standard functionality

Geomajas currently has the following set of plug-ins:

  • Faces
    • GWT face using the SmartGWT widget toolkit. Ideal face for developing rich browser based applications in Java.
    • Pure GWT face. A face for building browser based applications which does not depend on a specific widget library. This is best for people who do not want to use SmartGWT and for mobile applications (as it is more lightweight than the GWT face).
    • Dojo face, a face for building web applications for Geomajas using JavaScript and the dojo library. This is no longer recommended (use one of the GWT faces instead).
    • REST face which allows communicating to Geomajas using GeoJSON.
  • Data sources
    • GeoTools layer: provides access to all vector data sources supported by GeoTools, most significantly to WFS data and data stored in ArcGIS.
    • Google layer: provides access to Google maps raster data.
    • OpenStreetMap layer: provides access to rendered OpenStreetMap data (various renderings). This can be customized to access many tile servers which use the Mercator projection.
  • Other
    • Caching: provides powerful caching which can stay up-to-date with regards to changes in the data source and which is fully integrated in the framework, including security.
    • Rasterizing: fast access to vector data by offloading the actual rasterizing to the server.
    • Geocoder: geocoder which allows location coordinates to be found based on a string representation like address or POI name.
    • Printing: simple printing plug-in which allows a map to be exported as a PDF file.
    • Reporting: integration of Geomajas and JasperReports, allowing maps to be included in a report. This report can be exported to various formats including PDF, Word, Open Document Format,…
    • Static security: security provider which contains a static configuration of the roles.

There are also a couple of plug-ins which contain additional widgets for the GWT face.

Geomajas is supported by Geosparc. There is a Geosparc Edition which is more stable and for which commercial support is available. The Geosparc Edition is available under the AGPL or a commercial license. Geosparc provides training, SLAs and consulting services through its world-wide partner network.

More details:
http://geomajas.org/
http://geosparc.com/

(1) Dependency injection or Inversion of Control.
(2) Data sources are often called layers.
(3) Infinispan
(4) Examples include OpenAM (formerly OpenSSO), Kerberos, PERMIS,…
(5) In fact, the caching shows the power of configuration in Geomajas. All the caching is handled in a plug-in. There is nothing caching specific in the core, only configuration hooks.
(6) This obviously assumes all writes are through the Geomajas services. If this is not the case, limiting the cache validity is highly recommended.

Activiti BPM first steps

After many years of following what is happening in the BPM world, I have finally gotten round to properly integrating BPM in a real project. This is a project which includes an approval process with spatial evaluations. The plan was to build it around Activiti and Geomajas. Because of budget restrictions, the two parts are conceived as independent applications which are connected using mutual links.

The good news is that it all worked. There are some things I would do differently if I knew what I know now.

I started building the process in the Activiti Modeler. This is a very nice web application which allows you to easily build your process diagram. There is nothing to install (apart from Activiti itself), it works really nicely and it is user-friendly. Until you want to make your process runnable, that is. The process I built could not be loaded by Activiti. So I switched to the Activiti Designer application. This is supposed to import the models from the Activiti Modeler without problems. Apparently not. You are better off immediately starting with the Activiti Designer. It takes a bit of getting used to if you are not used to Eclipse, but the application itself is really nice and works well.

On to the next step, building the BPM dashboard. The plan was to reuse and customize the Activiti Explorer. I expected to be able to download the application and use it as a war overlay. That doesn't work. So I copied (should I say forked) the original project to customize it, trying to build one war for the front-end. That turned out to be a lot of work (meaning I gave up). As an end result, I just copied the Activiti Explorer and Activiti REST wars produced by the Activiti installer and explained what changes are necessary to make them fit in our environment. Previously, in their pre-Activiti days, I heard some of the Activiti developers claim that there is no need to develop your own BPM user interface. It seems they are no longer going down that path.

Once the initial hurdles were overcome, it was more straightforward. The process was running nicely. What remained was connecting the two applications. From the BPM dashboard, you need to be able to go to the mapping application to investigate some data. In some cases, the actual work is done in the mapping application and the “finish task” should actually be handled there instead of in the BPM dashboard.

To link to the mapping dashboard, a Spring bean was written which knows the URL of the other component and builds a link which includes the necessary query parameters. Unfortunately, tweaking Activiti Explorer to use a Spring-enabled engine is non-trivial. The following steps were needed:

  • I had to replace the ActivitiWebScript and ActivitiStreamingWebScript classes by variants which allow the processEngine to be set using Spring.
  • I had to copy activiti-spring, spring-tx and spring-jdbc into the WEB-INF/lib folder of the Activiti REST application.
  • In Activiti REST’s WEB-INF/classes/web-application-context.xml I inserted the following
    <context:component-scan base-package="mypackage.bpm" />
    
    <bean id="ktunaxa" class="org.ktunaxa.bpm.KtunaxaConfiguration">
        <property name="mapDashboardBaseUrl" value="http://localhost:8080/map/"/>
        <property name="bpmDashboardBaseUrl" value="http://localhost:8080/activiti-explorer/"/>
    </bean>
    
    <bean id="dataSource" class="org.springframework.jdbc.datasource.SimpleDriverDataSource">
        <property name="driverClass" value="org.postgresql.Driver"/>
        <property name="url" value="jdbc:postgresql://localhost:5432/db"/>
        <property name="username" value="un"/>
        <property name="password" value="pw"/>
    </bean>
    
    <bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
        <property name="dataSource" ref="dataSource"/>
    </bean>
    
    <bean id="processEngineConfiguration" class="org.activiti.spring.SpringProcessEngineConfiguration">
        <property name="dataSource" ref="dataSource"/>
        <property name="transactionManager" ref="transactionManager">
        <property name="databaseSchemaUpdate" value="true">
    
        <property name="jobExecutorActivate" value="true">
    
        <property name="dbCycleUsed" value="true"/>
    
        <property name="deploymentResources" value="classpath*:diagrams/.*.bpmn20.xml"/>
    </bean>
    
    <bean id="processEngine" class="org.activiti.spring.ProcessEngineFactoryBean">
        <property name="processEngineConfiguration" ref="processEngineConfiguration" />
    </bean>
    
  • In the same file, for both the activitiWebScript and activitiStreamingWebScript beans, add the following setter
    <property name="processEngine" ref="processEngine"/>
  • To allow using HTML links in the task name or description we had to disable HTML code escaping. This was done in components/tasks/task-list.js in the activiti-explorer webapp. Lines 196 and 197 (Activiti 5.4) contain some $html() calls which should just be replaced by whatever is inside the brackets.

Building the link to go to the mapping application is done using this code

public String getUrl(DelegateExecution execution) {
    String objectId = (String) execution.getVariable(VAR_OBJECT_ID);
    return mapDashboardBaseUrl + "?" + QUERY_OBJECT_ID + "=" + objectId + "&" + QUERY_TASK_ID + "=" + execution.getId();
}

This code lives in a Spring-wired class so it can be used as ${myBean.getUrl(execution)} in the process definition.
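
For clarity, here is a sketch of what such a class could look like. The class name and constant values are hypothetical (they are not taken from the project), and the mapDashboardBaseUrl property corresponds to the one set in the XML bean definition shown earlier:

import org.activiti.engine.delegate.DelegateExecution;

// Hypothetical sketch of the Spring-wired bean behind ${myBean.getUrl(execution)};
// the constant values below are made up for illustration only.
public class MapDashboardLinkBuilder {

    private static final String VAR_OBJECT_ID = "objectId";   // process variable, hypothetical name
    private static final String QUERY_OBJECT_ID = "objectId"; // query parameter, hypothetical name
    private static final String QUERY_TASK_ID = "taskId";     // query parameter, hypothetical name

    private String mapDashboardBaseUrl;

    public void setMapDashboardBaseUrl(String mapDashboardBaseUrl) {
        this.mapDashboardBaseUrl = mapDashboardBaseUrl; // injected from the XML configuration
    }

    public String getUrl(DelegateExecution execution) {
        String objectId = (String) execution.getVariable(VAR_OBJECT_ID);
        return mapDashboardBaseUrl + "?" + QUERY_OBJECT_ID + "=" + objectId
                + "&" + QUERY_TASK_ID + "=" + execution.getId();
    }
}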

When the actual task finishing needs to be done in the mapping component, a form like the following is used

<div>
  <p><a href="${myBean.getUrl(execution)}">Object in the mapping component</a></p>

<script type="text/javascript">
<!--
window.location = "${myBean.getUrl(execution)}";
//-->
</script>
</div>

Using that execution id in the mapping component to finish the task can be done using code like

Task task = taskService.createTaskQuery().executionId(taskExecutionId).singleResult();
if (task != null) {
  taskService.complete(task.getId());
}
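
The taskService used above is the standard Activiti TaskService. A minimal sketch of how the mapping component might obtain it, assuming it has a Spring-wired ProcessEngine pointing at the same database (the wiring details depend on your Spring configuration):

import org.activiti.engine.ProcessEngine;
import org.activiti.engine.TaskService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

// Minimal sketch; the ProcessEngine is assumed to come from the plain Spring engine
// configuration of the mapping component mentioned below.
@Component
public class TaskServiceProvider {

    @Autowired
    private ProcessEngine processEngine;

    public TaskService getTaskService() {
        return processEngine.getTaskService();
    }
}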

The application then redirects back to the BPM dashboard.

Deploying works nicely in the Activiti Probe application. I was expecting to have it easier by using the deploymentResources engine configuration. I may have done something wrong, but unfortunately it did not work. The process artifacts need to be installed both as a jar in WEB-INF/lib of Activiti REST and deployed using Activiti Probe. The mapping component uses a plain Spring engine configuration which accesses the same database as the Activiti REST web application.

Conclusion: it works! However, the customer now wants the BPM front-end to be customized (e.g. allowing sorting of issues, reassigning tasks etc). The original effort of trying to figure out how to combine the independent web applications might have been better spent building my own BPM front-end.