Managing Java logs with logstash and Kibana

Logstash is a tool to analyse and manage logs: you can collect, parse, store and forward log entries. The data is stored in elasticsearch, and you can then analyse, visualise and search through the logs using Kibana. To make it easy to try, the basic logstash distribution combines all three tools in a single download.
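With the 1.x "flatjar" distribution of that time, a single command starts the logstash agent, the embedded elasticsearch instance and the Kibana web interface (the version number below is just an example):

java -jar logstash-1.3.2-flatjar.jar agent -f logstash.conf -- web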

The main strength of logstash compared to many other log management solutions is the ability to dissect your log lines and split them into fields, allowing you to search and correlate logs better.

For example, a log line like

~myApp 2013-12-06T17:43:04.234+0100 [0.0.0.0-http-10.32.92.147-8080-3] INFO  b.v.a.d.l.MyClass - logging something

can be split into the application, timestamp, server, thread, severity, class and message.

Combinations of lines can also be recognized, so stack traces and other multi-line log messages are not split up. Consider for example the following log entry for a REST call.

~dc-devt 2013-12-06T17:43:04.234+0100 [0.0.0.0-http-10.32.92.147-8080-3] INFO  b.v.a.d.l.PreProcessLoggingInterceptor - 
Service: GET http://10.32.92.147:8080/appContext/rest/service
UserId: itsmeagain
Response types application/json
Query Parameters: 
	limit -> [10]
	sortColumn -> [number]
	start -> [0]
Path parameters: 
Reply type: class myapp.PagedList
Output document:
{...contents snipped...}
Duration: 0.078s

To make this possible, your logstash configuration has to match your logging format.
I use the following logback pattern:

<pattern>~%property{log.appkey} %date{yyyy-MM-dd'T'HH:mm:ss.SSSZ} [%property{log.server}-%thread] %-5level %logger{36} - %msg%n%xEx</pattern>

The resulting log lines are quite long, as they contain additional information that allows aggregating logs from many applications and servers on one logstash server; they are not really meant for direct consumption.
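For completeness, a minimal logback.xml sketch showing where this pattern and the properties could live; the property values and the file path are placeholders matching the examples in this post:

<configuration>
  <!-- placeholder values; set these per application and per server -->
  <property name="log.appkey" value="myApp" />
  <property name="log.server" value="server1" />

  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>/home/joachim/temp/logs/myApp.log</file>
    <encoder>
      <pattern>~%property{log.appkey} %date{yyyy-MM-dd'T'HH:mm:ss.SSSZ} [%property{log.server}-%thread] %-5level %logger{36} - %msg%n%xEx</pattern>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="FILE" />
  </root>
</configuration>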

In logstash, you can configure how to process the input. Let’s look at the configuration matching the format above.
Let’s start by defining the log sources.

input {
  file {
    type => "syslog"
    path => [ "/var/log/*.log", "/var/log/messages", "/var/log/syslog" ]
  }
 
  file {
    type => "logfile"
    path => [ "/home/joachim/temp/*.log", "/home/joachim/temp/logs/*.log" ]
  }
}

In this case, we just define the files which need to be scanned. It is very practical that wildcards are supported.
I also did a test using UDP to send and receive messages. This is crazy fast (as expected), but unfortunately most of the messages never reached logstash (I expect they were dropped while logstash was still handling a previous message).
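For reference, the UDP listener I experimented with looked roughly like this (the port number is arbitrary):

input {
  # listen for log events sent over UDP; fast, but messages can get lost
  udp {
    type => "udplog"
    port => 9999
  }
}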

The output is sent to elasticsearch for storage. This will allow Kibana to access the data so you can visualize the result.

output {
  elasticsearch 
  {
    embedded => true
  }
}
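While tuning the filters, it can be handy to temporarily print the parsed events to the console as well, for example with the rubydebug codec:

output {
  # print each parsed event, useful while debugging grok patterns
  stdout {
    codec => rubydebug
  }
}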

Now we need to add the filtering of the log messages. First I want to combine the lines which are part of the same log message. To figure out which lines belong together, I included a tilde as a marker to indicate that a new log message starts. The filter basically says that all lines which do not start with a tilde (the pattern also accepts lines starting with an opening square bracket) need to be appended to the previous line.

filter {
 
  multiline {
    pattern => "^[\[~]"
    negate => true
    what => "previous"
  }
 
  # add other filters
}

We already defined the filter block, so additional filters need to be added inside that block.
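Putting it together, the filter block ends up with this rough structure (the details of each filter follow below):

filter {
  multiline { ... }   # merge multi-line log messages first
  grok { ... }        # split the main log line into fields
  grok { ... }        # optional: extract extra fields from specific messages
  date { ... }        # use the timestamp from the log line itself
}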

We need to dissect the log fields according to the pattern we configured.

grok {
  match => [ 
    "message", "~%{NOTSPACE:application} %{TIMESTAMP_ISO8601:timestamp} \[%{DATA:server}\-%{DATA:thread}\] %{LOGLEVEL:severity}\s+%{JAVAFILE:category} \- %{GREEDYDATA:shortmessage}"
  ]
}
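Applied to the example line from the beginning of this post, this grok filter yields roughly the following fields (note that DATA is non-greedy, so the server/thread split happens at the first dash inside the brackets):

application  => "myApp"
timestamp    => "2013-12-06T17:43:04.234+0100"
server       => "0.0.0.0"
thread       => "http-10.32.92.147-8080-3"
severity     => "INFO"
category     => "b.v.a.d.l.MyClass"
shortmessage => "logging something"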

I also add some rules to extract additional fields when present. These extract data from specific log messages, like the logging of the REST calls above.

grok {
  match => [ "message", "Duration: %{NUMBER:duur:float}s" ]
  tag_on_failure => []
}
 
grok {
  match => [ "message", "UserId: %{WORD:ldapid}" ]
  tag_on_failure => []
}
 
grok {
  match => [ "message", "Service: %{WORD:http_command} %{URI:endpoint}" ]
  tag_on_failure => []
}

Finally, make sure the timestamp from the message is used as the timestamp of the log entry (otherwise the time at which the message was processed is used).

date {
  # 2013-09-23T11:27:14.177+0200
  match => [
      "timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
  ]
}

With these settings in place and some logs processed, the logs can be investigated and interesting things can be done.

Kibana, the dashboard which is included in the logstash distribution, allows you to customize the view on the log data in your elasticsearch database.
You can customize Kibana by adding several dashboards. In each dashboard you can add rows, and to each row you can add panels which display the information in a specific way. To access the data, you can define filters and queries, and these can be combined. For example, the following two rows in the dashboard show (based on the configuration above) an overview of events, an overview of logged errors, and the total number of REST calls together with their mean and maximum duration.

[Screenshot: logstash-graphics]
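The queries behind such panels are plain Lucene query strings on the fields extracted above, for example:

severity:ERROR                                      (all logged errors)
duur:[1 TO *]                                       (REST calls taking longer than one second)
category:"b.v.a.d.l.PreProcessLoggingInterceptor"   (only the REST call logging)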

When viewing the details of the events, you can choose which columns are visible, and by clicking on a line you can see the full details of a log entry.

[Screenshot: logstash-events]

There is also a view of the fields (used to select the event columns) which also allows quick summaries to be displayed.

[Screenshot: logstash-fields]