Data in Splunk Enterprise transitions through several phases.

  • Input
  • Parsing
  • Indexing
  • Search

Splunk Enterprise performs three key functions as it moves data through the data pipeline.  First, it consumes data from files, the network, or elsewhere. Next, it indexes that data (it first parses and then indexes the data). Finally, it runs interactive or scheduled searches on the indexed data.

You might create a deployment with many instances that only consume data, several other instances that index the data, and one or more instances that handle search requests.   These are the component types available for use in a distributed environment:

  • Indexer
  • Forwarder
  • Search Head
  • Deployment Server
  • License Master
  • Cluster Master

The indexer is the Splunk Enterprise component that creates and manages indexes.

The primary functions of an indexer are:

  • Indexing incoming data
  • Searching the indexed data

Universal forwarders:  These have a very light footprint and forward only unparsed data.

Heavy forwarders:  These have a larger footprint but can parse, and even index, data before forwarding it.
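As a sketch of how a universal forwarder is pointed at an indexer, the forwarder's outputs.conf names a target group and server (the host name and port below are examples, not values from these notes):

```
# outputs.conf on the universal forwarder -- host and port are illustrative
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer1.example.com:9997
```

The indexer side would enable a matching receiving port (9997 by convention) so the forwarded stream has somewhere to land.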

Splunk Data Sources

Files :                     Splunk can monitor specific files or directories.

Network :              Splunk can listen on TCP or UDP ports.

Scripted inputs:  Splunk can read the machine data output by programs or scripts, such as a UNIX command or a custom script.
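The three source types above map directly onto inputs.conf stanzas. A hedged sketch, with example paths, ports, and script names:

```
# inputs.conf -- all paths, ports, and script names here are illustrative
[monitor:///var/log/messages]
sourcetype = syslog

[tcp://514]
sourcetype = syslog

[script://$SPLUNK_HOME/etc/apps/myapp/bin/cpu_stats.sh]
interval = 60
```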

The data that you start with is called raw data. Splunk indexes raw data by creating a time-based map of the words in the data without modifying the data itself. Splunk divides a stream of machine data into individual events.


Four Default Fields

Source :               Where did the data come from?

Source Type :      What kind of data is it?

Host :                  Which host or machine did the data come from?

_time :                 When did the event happen?

These default fields are indexed along with the raw data.

The timestamp (_time) field is special because the Splunk indexer uses it to order events, enabling Splunk to efficiently retrieve events within a time range.
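Because events are ordered by _time, a search can be scoped to a time range cheaply. For example (the index name is assumed for illustration):

```
index=web earliest=-24h latest=now
| sort - _time
```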


Splunk Enterprise stores all of the data it processes in indexes.  An index is a collection of databases, which are subdirectories located in $SPLUNK_HOME/var/lib/splunk.

Indexes consist of two types of files: rawdata files and index files.


The set of index stanzas in indexes.conf must be identical across all peers.
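A sketch of what one such index stanza in indexes.conf looks like; the index name and paths below are examples, not values from these notes:

```
# indexes.conf -- illustrative custom index definition
[web_logs]
homePath   = $SPLUNK_HOME/var/lib/splunk/web_logs/db
coldPath   = $SPLUNK_HOME/var/lib/splunk/web_logs/colddb
thawedPath = $SPLUNK_HOME/var/lib/splunk/web_logs/thaweddb
```

In a cluster, this same stanza would need to appear identically on every peer.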


There are two key types of files in a bucket:

The processed external data in compressed form (rawdata).

Index files that point to the rawdata.



Configuration files consist of one or more stanzas, or sections. Each stanza begins with a stanza header in square brackets. This header identifies the settings held within that stanza. Each setting is an attribute/value pair that specifies a particular configuration value.
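The general shape, with placeholder names:

```
[stanza_name]
attribute1 = value1
attribute2 = value2
```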


When you look at the Data Summary in the search view, you see tabs for the Hosts, Sources, and Source Types that describe the type of data you added to your Splunk index.

The host of an event is the hostname, IP address, or fully qualified domain name of the network machine from which the event originated.

The source of an event is the file or directory path, network port, or script from which the event originated.

The source type of an event tells you what kind of data it is, usually based on how it is formatted.  (access_combined_wcookie: Apache web server logs)
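A source type can be assigned at input time in props.conf; a hedged sketch, where the source path is hypothetical:

```
# props.conf -- the log path below is an example
[source::/var/log/apache/access.log]
sourcetype = access_combined_wcookie
```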


Fields exist in machine data in many forms.  Often, a field is a value (with a fixed, delimited position on the line) or a name and value pair, where there is a single value for each field name. A field can be multivalued; that is, it can appear more than once in an event, with a different value for each appearance.

Keywords:  purchase

Fields: status = 200

Booleans:  AND            OR               NOT

Phrases: “failed password”

Wildcard:  50* (matches 500, 501, 502, 503, etc.)

Comparison operators:  =, !=, <, <=, >, >= (status>300)
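Putting several of these elements together in one search string (the index name and the `action` field are assumptions for illustration):

```
index=web sourcetype=access_combined_wcookie action=purchase status!=200 NOT "failed password"
```

This combines a field filter, a comparison, a NOT boolean, and a quoted phrase in a single search.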