Logstash GROK Filter

Grok is a Logstash filter that parses unstructured log data into a structured format that Elasticsearch can query. Logstash ships with about 120 patterns by default. Grok can also parse logs that the shipped patterns do not cover, such as custom logs from your application.

http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/ are useful tools for building and testing grok patterns against any log format.

Grok works by combining text patterns into something that matches your logs.

The syntax for a grok pattern is %{SYNTAX:SEMANTIC}

SYNTAX is the name of the pattern that matches a piece of text in the log, and SEMANTIC is the identifier (the field name) given to the matched text.

For example, given the log input “100” and the grok pattern %{NUMBER:duration} %{IP:ipaddress}, the data is parsed as:

duration : 100

ipaddress :

An optional data type conversion can be applied at parse time. For example, %{NUMBER:number:int} stores the number field as an integer instead of a string, which is the default for every captured value. The supported conversions are int and float.
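As a minimal sketch of how the conversion suffix might look inside a filter block, the fragment below reuses two field names from the access log example in this article. It assumes the raw log line is in the message field.

```conf
filter {
  grok {
    # Without a suffix, both captures would be stored as strings.
    # :int and :float cast the captured text while the line is parsed.
    match => { "message" => "%{NUMBER:bytes_transfered:int} %{NUMBER:duration:float}" }
  }
}
```

Converting at parse time means the fields reach Elasticsearch as numbers, so they can be used in range queries and numeric aggregations.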

Visit http://grokdebug.herokuapp.com/patterns for a list of patterns

Apache access log file parsing

Log sample : GET /index.php 5685 0.063

The pattern for this could be:

%{IP:sourceip} %{WORD:http_method} %{URIPATHPARAM:requested_url} %{NUMBER:bytes_transfered} %{NUMBER:duration}

input {
  file {
    path => "/var/log/http.log"
  }
}

filter {
  grok {
    match => { "message" => "%{IP:sourceip} %{WORD:http_method} %{URIPATHPARAM:requested_url} %{NUMBER:bytes_transfered} %{NUMBER:duration}" }
  }
}

This configuration reads /var/log/http.log one line at a time, and each event that leaves the filter carries the following extra fields:


http_method: GET

requested_url: /index.php

bytes_transfered: 5685

duration: 0.063
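One way to try a pattern like this before pointing it at a real file is to read lines from standard input and print each parsed event. The following is a minimal sketch and not part of the original setup: the stdin input, the stdout output with the rubydebug codec, and the file name grok-test.conf are assumptions chosen for illustration.

```conf
# grok-test.conf (hypothetical file name)
input {
  # Read log lines typed or pasted into the terminal
  stdin { }
}

filter {
  grok {
    match => { "message" => "%{IP:sourceip} %{WORD:http_method} %{URIPATHPARAM:requested_url} %{NUMBER:bytes_transfered} %{NUMBER:duration}" }
  }
}

output {
  # Print every event with all of its fields in a readable form
  stdout { codec => rubydebug }
}
```

When Logstash is started with bin/logstash -f grok-test.conf, each line entered in the terminal is printed back as an event. A line that does not match the pattern is tagged _grokparsefailure by default, which is a quick signal that the pattern needs adjusting.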

Grok is built on regular expressions, so any regular expression is also valid inside a grok pattern. See https://github.com/kkos/oniguruma/blob/master/doc/RE for the supported regex syntax.
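When no shipped pattern fits a piece of the log, a raw named capture can be mixed into the same match string as ordinary grok patterns. The sketch below is illustrative only: the field name request_id and its 16-character hex format are hypothetical and do not come from the log sample in this article.

```conf
filter {
  grok {
    # (?<request_id>...) is an Oniguruma named capture; the matched text
    # is stored in a field called request_id (hypothetical name and format).
    match => { "message" => "(?<request_id>[0-9a-f]{16}) %{WORD:http_method} %{URIPATHPARAM:requested_url}" }
  }
}
```

The named capture behaves like %{SYNTAX:SEMANTIC} with an inline pattern: the group name plays the role of SEMANTIC, and the regular expression inside the group plays the role of SYNTAX.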