Logstash GROK Filter
Grok is a Logstash filter that parses unstructured log data into structured fields that Elasticsearch can query. Logstash ships with about 120 patterns by default. Grok can also parse logs that the shipped patterns do not cover, such as custom logs from your own application.
http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/ are useful tools for building and testing grok patterns against any log format.
Grok works by combining text patterns into something that matches your logs.
The syntax for a grok pattern is %{SYNTAX:SEMANTIC}
SYNTAX is the name of the pattern that matches a piece of text in the log line, and SEMANTIC is the identifier (field name) given to the matched text.
For example, given the log input “100 192.168.10.10” and the grok filter %{NUMBER:duration} %{IP:ipaddress}, the data will be parsed as:
duration : 100
ipaddress : 192.168.10.10
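To try this example, a minimal pipeline sketch could look like the following. The stdin input and the stdout output with the rubydebug codec are assumptions chosen for quick experimentation; they are not part of the original example.

```
input { stdin { } }
filter {
  grok {
    # Split a line such as "100 192.168.10.10" into duration and ipaddress
    match => { "message" => "%{NUMBER:duration} %{IP:ipaddress}" }
  }
}
# Print each parsed event so the extracted fields are visible
output { stdout { codec => rubydebug } }
```

Typing the sample line on standard input should produce an event that carries the duration and ipaddress fields alongside the original message.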
An optional data type conversion can be applied at parse time. By default every captured value is stored as a string. For example, %{NUMBER:number:int} stores the captured number field as an integer. The only supported conversions are int and float.
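As a sketch of type conversion, the two-field example from earlier can store duration as an integer. The field names are the ones already used in that example; only the :int suffix is new.

```
filter {
  grok {
    # ":int" stores duration as an integer; ipaddress has no suffix and stays a string
    match => { "message" => "%{NUMBER:duration:int} %{IP:ipaddress}" }
  }
}
```

Using :float in place of :int would be the natural choice for fractional values such as response times.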
Visit http://grokdebug.herokuapp.com/patterns for a list of patterns
Apache access log file parsing
Log sample : 10.10.5.1 GET /index.php 5685 0.063
The pattern for this could be:
%{IP:sourceip} %{WORD:http_method} %{URIPATHPARAM:requested_url} %{NUMBER:bytes_transfered} %{NUMBER:duration}
input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => { "message" => "%{IP:sourceip} %{WORD:http_method} %{URIPATHPARAM:requested_url} %{NUMBER:bytes_transfered} %{NUMBER:duration}" }
  }
}
This configuration reads /var/log/http.log one line at a time, and each event emitted by the filter has the following extra fields:
sourceip: 10.10.5.1
http_method: GET
requested_url: /index.php
bytes_transfered: 5685
duration: 0.063
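The sample line above is a simplified format. For logs written in the standard Apache combined log format, the patterns shipped with Logstash include COMBINEDAPACHELOG, so a sketch like the following may be enough. The file path is a hypothetical example, and the exact field names produced depend on the pattern definitions in your Logstash version.

```
input {
  file {
    # Hypothetical path; point this at your own Apache access log
    path => "/var/log/apache2/access.log"
  }
}
filter {
  grok {
    # One shipped pattern covers the whole combined-format line
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
```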
Grok is built on regular expressions, so any regular expression is also valid in a grok pattern. Logstash uses the Oniguruma regex library; see https://github.com/kkos/oniguruma/blob/master/doc/RE for the supported syntax.
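As an illustration of mixing a raw regular expression with shipped patterns, the sketch below uses an Oniguruma named capture of the form (?<field_name>regex). The request_id field and the "req-" line format are hypothetical, invented only for this example.

```
filter {
  grok {
    # Intended for a line such as "req-4f2a91 10.10.5.1" (hypothetical format)
    # (?<request_id>...) is an inline named capture; %{IP:sourceip} is a shipped pattern
    match => { "message" => "(?<request_id>req-[0-9a-f]{6}) %{IP:sourceip}" }
  }
}
```

An inline capture like this is convenient for a one-off field; a pattern that is reused in several places is usually easier to maintain as a named custom pattern.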