Our application produces a lot of logs. Sometimes we want to extract specific information from them, for example the timestamps and durations of the REST calls, or a list of all the errors (possibly excluding some common ones).
You can use Logstash for this kind of filtering; in this case, extracting information from one or more log files into specific output files.
Let’s start by defining the input files.
input {
  file {
    type => "logfile"
    path => [ "/home/joachim/temp/logs/*.log" ]
    sincedb_path => "/dev/null"
  }
}
We just specify where the logs we want to process can be found. Note that, by default, only changes which occur after Logstash is started will be processed.
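If you also want the content that is already in the files to be processed, the file input's `start_position` option can be set to `"beginning"`. Combined with `sincedb_path => "/dev/null"`, which prevents Logstash from remembering how far it already read, the files are then reprocessed from the start on each run. A minimal sketch:

```
input {
  file {
    type => "logfile"
    path => [ "/home/joachim/temp/logs/*.log" ]
    # read existing content instead of only new lines
    start_position => "beginning"
    # do not persist the read position between runs
    sincedb_path => "/dev/null"
  }
}
```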
Our logged items can span multiple lines, so those lines need to be combined. The data is also structured, so the fields can be extracted. This uses the format I already explained earlier.
filter {
  multiline {
    pattern => "^[\[~]"
    negate => true
    what => "previous"
    # enable_flush is not recommended but assures that the last
    # (multi-line) log statement of the file is also processed
    enable_flush => true
  }
  grok {
    match => [ "message", "~%{NOTSPACE:application} %{TIMESTAMP_ISO8601:timestamp} \[%{DATA:server}\-%{DATA:thread}\] %{LOGLEVEL:severity}\s+%{JAVAFILE:category} \- %{GREEDYDATA:shortmessage}" ]
  }
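As an illustration, a log line which this grok pattern would match could look like the following. The application name, server, thread and category shown here are made up for the example:

```
~myapp 2013-09-23T11:27:14.177+0200 [server1-thread-42] INFO  b.v.a.d.s.rest.SomeResource - Service: GET /rest/some/endpoint
```

The leading `~` and `[` characters are also what the multiline pattern keys on: any line which does not start with one of them is appended to the previous event.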
Now extract some additional fields which are not present in every message.
  grok {
    match => [ "message", "Duration: %{NUMBER:duration:float}s" ]
    tag_on_failure => []
  }
  grok {
    match => [ "message", "ldapid: %{WORD:ldapid}" ]
    tag_on_failure => []
  }
  grok {
    match => [ "message", "role: %{WORD:role}" ]
    tag_on_failure => [] # this line does not always match
  }
  grok {
    match => [ "message", "organisation: %{WORD:organisation}" ]
    tag_on_failure => [] # this line does not always match
  }
  grok {
    #match => [ "message", "Service: %{WORD:http_command} %{URI:endpoint}" ]
    match => [ "message", "Service: %{WORD:http_command} %{NOTSPACE:endpoint}" ]
    tag_on_failure => []
  }
Specify the date format used to parse the timestamp.
  date {
    # 2013-09-23T11:27:14.177+0200
    match => [ "timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
  }
Now add some tags based on the type of output. These are very practical for determining the output files. This is the last part of the filter section in the configuration file.
  # add tag for the type of output
  if [severity] == "ERROR" {
    mutate { add_tag => "error" }
  }
  if "b.v.a.d.s.util.ProfilingSchedule" in [message] {
    mutate { add_tag => "profile" }
  }
  if [endpoint] =~ /http/ {
    mutate { add_tag => "endpoint" }
  }
}
Now write the details we want to different files: one file which contains only the errors, one which contains the profiling info, and one which contains a space-separated CSV of the REST requests.
output {
  if "error" in [tags] and ( "huidige_gebruiker" not in [endpoint] ) {
    file {
      path => "dc-errors.txt"
      max_size => 20000000
      message_format => "%{message}"
    }
  }
  if "profile" in [tags] {
    file {
      path => "dc-profile.txt"
      max_size => 20000000
      message_format => "%{message}"
    }
  }
  if "endpoint" in [tags] {
    file {
      path => "dc-requests.txt"
      max_size => 20000000
      message_format => "%{timestamp} %{http_command} %{endpoint} %{duration}"
    }
  }
  # uncomment the lines below for debugging
  #stdout {
  #  message_format => "%{timestamp}"
  #}
}
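Assuming the snippets above are saved together in a single file, here called `extract.conf` (the filename is arbitrary), the pipeline can be run with the stock Logstash binary:

```
bin/logstash -f extract.conf
```

Logstash will then tail the configured log files and write the filtered output files until it is stopped.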