What’s in an index?

Splunk Enterprise stores all of the data it processes in indexes. An index is a collection of databases, which are subdirectories located in $SPLUNK_HOME/var/lib/splunk. Indexes consist of two types of files: rawdata files and index files.

Default set of indexes

Splunk Enterprise comes with a number of preconfigured indexes, including:

  • main: This is the default Splunk Enterprise index.
  • _internal: Stores Splunk Enterprise internal logs and processing metrics.
  • _audit: Contains events related to the file system change monitor, auditing, and all user search history.

When you index a data source, Splunk assigns metadata values.

  • The metadata is applied to the entire source
  • Splunk applies defaults if values are not specified
  • You can override them on a per-event basis (during the parsing phase)
Metadata     Default
Source       Path of the input file, network hostname:port, or script name
Host         Splunk hostname of the inputting instance (forwarder)
Sourcetype   Uses the source filename if Splunk cannot automatically determine it
Index        Defaults to main
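
A minimal sketch of overriding these metadata defaults at the input level, in inputs.conf (the monitored path and values here are hypothetical; per-event overrides during parsing are configured separately in props.conf and transforms.conf):

    [monitor:///var/log/messages]
    # send this input's events to a non-default index
    index = oslog
    # explicit sourcetype and host instead of the defaults
    sourcetype = syslog
    host = myhost01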
  • Splunk stores events in indexes
  • Splunk users can specify which index to search

(index=main sourcetype=access_combined_wcookie action=purchase)

  • All new inputs to Splunk are stored in the main index.
  • The default location is /opt/splunk/var/lib/splunk.

 

Index            Purpose
main             All processed data is stored here unless otherwise specified
summary          For the summary indexing system
_internal        Splunk indexes its own logs and processing metrics here
_audit           For audit trails
_introspection   Tracks system performance and resource usage data of Splunk
_thefishbucket   Contains checkpoint information for file monitoring inputs
  • It is good practice to create separate indexes for access control and segregation of duties.
  • By using multiple indexes, you can set granular retention times, as in the example below.
Log source          Daily volume  Retention (days)  Index name  Access control
Palo Alto firewall  10 GB         30                fwlog       Firewall Team, Security Team
Linux syslog        20 GB         60                oslog       Admin Team, Security Team
Windows logs        10 GB         60                oslog       Admin Team, Security Team
Proxy logs          15 GB         90                weblog      Web Team, Security Team, Audit Team
Application logs    10 GB         90                applog      App Team
Web logs            35 GB         90                weblog      Web Team, Security Team, Audit Team
Daily total         100 GB
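
A minimal sketch of how per-index retention such as the above could be set in indexes.conf (paths are hypothetical; frozenTimePeriodInSecs is the retention window in seconds, so 30 days = 2592000):

    [fwlog]
    homePath = $SPLUNK_DB/fwlog/db
    coldPath = $SPLUNK_DB/fwlog/colddb
    thawedPath = $SPLUNK_DB/fwlog/thaweddb
    # events older than 30 days roll to frozen (archived or deleted)
    frozenTimePeriodInSecs = 2592000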

 

  • An index stores events in units called buckets
  • A bucket is a directory containing a set of raw data and index files
  • Buckets have a time span and a maximum data size (see the sketch below)
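
A minimal sketch of the indexes.conf settings that control these limits, with hypothetical values (maxHotSpanSecs caps a hot bucket's time span; maxDataSize its size):

    [fwlog]
    # roll a hot bucket once it spans one day of event time
    maxHotSpanSecs = 86400
    # let Splunk size buckets for a high-volume index (~10 GB per bucket)
    maxDataSize = auto_high_volume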
How Data Flows Through an Index

Inputs -> Hot -> Warm -> Cold -> Frozen (archive or delete)
 

Bucket stage  Description                                                 AWS storage (where you put your buckets)
Hot           The newest buckets, open for write                          EBS General Purpose SSD (gp2)
Warm          Recently indexed data; buckets are closed (read only)       EBS General Purpose SSD (gp2)
Cold          Oldest data still in the index (read only)                  EBS Throughput Optimized HDD (st1)
Frozen        Data ready for archive or deletion (no longer searchable)   Glacier

EBS Provisioned IOPS SSD (io1) volume types provide the highest performance and are ideal for special use cases.

 

Hot Buckets

Data is read and parsed, and it goes through the license meter. Each event is written into a hot bucket. A hot bucket is closed when it reaches its time span or maximum size, and is then converted to warm status.

Hot buckets live in the index's db directory, with names beginning with "hot_".

Buckets are renamed when they roll from hot to warm.

When Splunk rolls a bucket, it moves the entire bucket subdirectory.

Hot and warm buckets are searched first and should be on the fastest disks.

 

To create or edit an index in Splunk Web, select Settings > Indexes.

Example

Index Settings

Setting                  Value
Index name               fwlog
Home path                /opt/dataidx/fw/db
Cold path                /opt/dataidx/fw/colddb
Thawed path              /opt/dataidx/fw/thaweddb
Max size                 500000 MB
Max size hot/warm/cold   10000 MB
Frozen archive path      /opt/dataidx/frozen

OR

Edit the stanza in indexes.conf for more advanced options.
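
A minimal sketch of the equivalent indexes.conf stanza for the example above (the mapping of the two "max size" settings to the per-path attributes is an assumption):

    [fwlog]
    homePath = /opt/dataidx/fw/db
    coldPath = /opt/dataidx/fw/colddb
    thawedPath = /opt/dataidx/fw/thaweddb
    # total size cap for the whole index, in MB
    maxTotalDataSizeMB = 500000
    # size caps for the hot/warm and cold portions, in MB
    homePath.maxDataSizeMB = 10000
    coldPath.maxDataSizeMB = 10000
    # roll frozen buckets to the archive path instead of deleting them
    coldToFrozenDir = /opt/dataidx/frozen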

Indexing Activity

Search and Reporting > Reports > License Usage Data Cube

How to Inspect Buckets

Search:  | dbinspect index=<name>  with an optional span or timeformat argument

Display a chart with the span size of 1 day, using the command line interface (CLI):

  • | dbinspect index=_internal span=1d

Default dbinspect output for a local _internal index.

  • | dbinspect index=_internal

Check for corrupt buckets

Use the corruptonly argument to display information about corrupted buckets, instead of information about all buckets.

The output fields that display are the same with or without the corruptonly argument.

  • | dbinspect index=_internal corruptonly=true

 Count the number of buckets for each Splunk server

Use this search to verify that the Splunk servers in your distributed environment are included in the dbinspect results. It counts the number of buckets for each server.

  • | dbinspect index=_internal | stats count by splunk_server

Find the index size of buckets in GB

Use dbinspect to find the index size of buckets in GB.

For current numbers, run this search over a recent time range.

  • | dbinspect index=_internal | eval GB=sizeOnDiskMB/1024 | stats sum(GB)

Deleting Events

To delete events, you need the can_delete role.

  • Use the delete command to make unwanted data stop showing up in searches (events are not removed from disk)
  • index=web host=myhost sourcetype=access_combined_wcookie | delete

splunk clean eventdata -index <indexname> wipes out all data from the index. Run it while Splunk is stopped.
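
A minimal sketch of the full sequence, run from $SPLUNK_HOME/bin (the index name fwlog is hypothetical; cleaning requires Splunk to be stopped):

    ./splunk stop
    # remove all events from the fwlog index only
    ./splunk clean eventdata -index fwlog
    ./splunk start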

How can you tell your indexer is working?

index=[your_index_name]

To check the license usage, search:

index=_internal LicenseUsage idx=[your_index_name]

To check indexing throughput, search:

index=_internal Metrics series=[your_index_name] | stats sum(kbps)

index=_internal Metrics group="per_sourcetype_thruput" series=access* | timechart span=1h sum(kb) by series

index=_internal Metrics group="per_sourcetype_thruput" series=access* | timechart span=1h sum(kb) by series | sort - sum(kb)

Determine how many active sources are being indexed.

Search:  | dbinspect index=main  (or index=[your_index_name])

How to calculate the data compression rate of a bucket

Search:

| dbinspect index=[your_index]
| where eventCount > 10000
| fields index, id, state, eventCount, rawSize, sizeOnDiskMB, sourceTypeCount
| eval TotalRawMB=(rawSize / 1024 / 1024)
| eval compression=tostring(round(sizeOnDiskMB / TotalRawMB * 100, 2)) + "%"
| table index, id, state, sourceTypeCount, TotalRawMB, sizeOnDiskMB, compression