What’s in an index?
Splunk Enterprise stores all of the data it processes in indexes. An index is a collection of databases,
which are subdirectories located in $SPLUNK_HOME/var/lib/splunk.
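For example, listing that directory on a typical Linux installation shows one subdirectory per index; note that some directory names differ from the index name (main lives in defaultdb, _internal in _internaldb). A representative listing, exact contents vary by version and configuration:

  $ ls $SPLUNK_HOME/var/lib/splunk
  audit  defaultdb  fishbucket  _internaldb  ...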
Indexes consist of two types of files: rawdata files and index files.
Default set of indexes
Splunk Enterprise comes with a number of preconfigured indexes, including:
- main: This is the default Splunk Enterprise index.
- _internal: Stores Splunk Enterprise internal logs and processing metrics.
- _audit: Contains events related to the file system change monitor, auditing, and all user search history.
When you index a data source, Splunk assigns metadata values.
- The metadata is applied to the entire source.
- Splunk applies defaults if values are not specified.
- You can override them on a per-event basis (during the parsing phase).
Metadata | Default
Source | Path of input file, network hostname:port, or script name
Host | Splunk hostname of the inputting instance (forwarder)
Sourcetype | Uses the source filename if Splunk cannot automatically determine one
Index | Defaults to main
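All of these metadata values can also be set explicitly on an input. A minimal inputs.conf sketch, assuming a hypothetical monitored firewall log (the path, host, and sourcetype values are examples):

  [monitor:///var/log/fw/fw.log]
  host = fw01
  sourcetype = fw_syslog
  index = fwlog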
- Splunk stores events in indexes.
- Splunk users can specify which index to search:
  (index=main sourcetype=access_combined_wcookie action=purchase)
- All new inputs to Splunk are stored in the main index by default.
- The default index location is /opt/splunk/var/lib/splunk (that is, $SPLUNK_HOME/var/lib/splunk).
Index | Purpose
main | All processed data is stored here unless otherwise specified
summary | For the summary indexing system
_internal | Splunk indexes its own logs and processing metrics here
_audit | Audit trails
_introspection | Tracks system performance and resource usage data of Splunk
_thefishbucket | Contains checkpoint information for file monitoring inputs
- It is good practice to create separate indexes for access control and segregation of duties.
- By using multiple indexes, you can set granular retention times.
Log Source | Daily Volume | Retention (days) | Index Name | Access Control
Palo Alto Firewall | 10 GB | 30 | fwlog | Firewall Team, Security Team
Linux syslog | 20 GB | 60 | oslog | Admin Team, Security Team
Windows logs | 10 GB | 60 | oslog | Admin Team, Security Team
Proxy logs | 15 GB | 90 | weblog | Web Team, Security Team, Audit Team
Application logs | 10 GB | 90 | applog | App Team
Web logs | 35 GB | 90 | weblog | Web Team, Security Team, Audit Team
Daily Total | 100 GB | | |
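Access control like this is typically enforced per role. A minimal authorize.conf sketch (the role name is hypothetical; the index comes from the table above):

  [role_firewall_team]
  srchIndexesAllowed = fwlog
  srchIndexesDefault = fwlog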
- An index stores events in units called buckets.
- A bucket is a directory containing a set of raw data and indexing data.
- Buckets have a time span and a max data size.
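Both limits are configurable per index in indexes.conf. A minimal sketch, with illustrative values:

  [fwlog]
  # max hot bucket size in MB (or auto / auto_high_volume)
  maxDataSize = auto
  # max time span of a hot bucket, in seconds
  maxHotSpanSecs = 86400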
How Data Flows Through an Index

Inputs -> Hot -> Warm -> Cold -> Archive or Delete

Stage | Description | AWS Storage Type (where you put your buckets)
Hot | The newest buckets, open for WRITE | EBS General Purpose SSD (gp2)
Warm | Recently indexed data; buckets are closed (read only) | EBS General Purpose SSD (gp2)
Cold | Oldest data still in the index (read only) | EBS Throughput Optimized HDD (st1)
Frozen | Where data goes when it is ready for archive or deletion (no longer searchable) | Glacier

EBS Provisioned IOPS SSD (io1) volumes provide the highest performance and are ideal for special use cases.
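Retention, that is, when cold buckets roll to frozen, is also configured per index in indexes.conf. A minimal sketch, with values matching the fwlog example later in this section:

  [fwlog]
  # roll buckets to frozen once their newest event is older than 30 days
  frozenTimePeriodInSecs = 2592000
  # archive destination for frozen buckets; if unset, frozen data is deleted
  coldToFrozenDir = /opt/dataidx/frozen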
Hot Buckets
Data is read and parsed, and goes through the license meter. Events are written into a hot bucket. A hot bucket is closed when it reaches its time span or max size, and is then converted to warm status. Hot buckets live in the index's db directory, with names beginning with "hot_"; they are renamed when rolled from hot to warm. When Splunk rolls a bucket, it moves the entire bucket subdirectory. Hot and warm buckets are searched first and should be on the fastest disks.
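To see how many buckets an index currently has in each lifecycle state, the state field of dbinspect output can be aggregated:

- | dbinspect index=_internal | stats count by state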
To create or manage an index in Splunk Web, select Settings > Indexes.
Example
Index Settings
Setting | Value
Index Name | fwlog
Home Path | /opt/dataidx/fw/db
Cold Path | /opt/dataidx/fw/colddb
Thawed Path | /opt/dataidx/fw/thaweddb
Max Size | 500000 MB
Max Size of Hot/Warm/Cold Bucket | 10000 MB
Frozen Archive Path | /opt/dataidx/frozen
Or edit the index's stanza in indexes.conf for more advanced options.
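A sketch of the equivalent indexes.conf stanza for the example above, assuming "Max Size of Hot/Warm/Cold Bucket" maps to the per-bucket maxDataSize setting:

  [fwlog]
  homePath = /opt/dataidx/fw/db
  coldPath = /opt/dataidx/fw/colddb
  thawedPath = /opt/dataidx/fw/thaweddb
  # max total index size in MB
  maxTotalDataSizeMB = 500000
  # max bucket size in MB
  maxDataSize = 10000
  # frozen archive destination
  coldToFrozenDir = /opt/dataidx/frozen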
Indexing Activity
Search and Reporting > Reports > License Usage Data Cube
How to Inspect Buckets
Search: | dbinspect index=<index_name> [span=<interval> | timeformat=<format>]
Display a chart with a span size of 1 day (this search can also be run from the command line interface, CLI):
- | dbinspect index=_internal span=1d
Default dbinspect output for a local _internal index.
- | dbinspect index=_internal
Check for corrupt buckets
Use the corruptonly argument to display information about corrupted buckets, instead of information about all buckets.
The output fields that display are the same with or without the corruptonly argument.
- | dbinspect index=_internal corruptonly=true
Count the number of buckets for each Splunk server
Use this command to verify that the Splunk servers in your distributed environment are included in the dbinspect command.
Counts the number of buckets for each server.
- | dbinspect index=_internal | stats count by splunk_server
Find the index size of buckets in GB
Use dbinspect to find the index size of buckets in GB.
For current numbers, run this search over a recent time range.
- | dbinspect index=_internal | eval GB=sizeOnDiskMB/1024 | stats sum(GB)
Deleting Events
To delete events, you must have the can_delete role.
- The delete command makes unwanted data stop showing up in searches (it does not reclaim disk space).
- index=web host=myhost sourcetype=access_combined_wcookie | delete
- splunk clean eventdata -index <index_name> wipes out all data from the index.
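The clean command only works while Splunk is stopped. A typical sequence, using the fwlog index from the earlier example:

  $SPLUNK_HOME/bin/splunk stop
  $SPLUNK_HOME/bin/splunk clean eventdata -index fwlog
  $SPLUNK_HOME/bin/splunk start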
How can you tell your indexer is working?
index=[your_index_name]
index=_internal LicenseUsage idx=[your_index_name]
To check indexing throughput, search:
index=_internal Metrics series=[your_index_name] | stats sum(kbps)
index=_internal Metrics group="per_sourcetype_thruput" series=access* | timechart span=1h sum(kb) by series
index=_internal Metrics group="per_sourcetype_thruput" series=access* | stats sum(kb) by series | sort - sum(kb)
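A per-index variant of the same metrics search, using the standard per_index_thruput metrics group:

index=_internal Metrics group="per_index_thruput" series=[your_index_name] | timechart span=1h sum(kb)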
Determine how many active sources are being indexed.
Search: | dbinspect index=main (replace main with [your_index_name] as needed)
How to calculate the data compression rate of a bucket
Search (replace main with your index as needed):
| dbinspect index=main | where eventCount > 10000 | fields index, id, state, eventCount, rawSize, sizeOnDiskMB, sourceTypeCount | eval TotalRawMB = (rawSize / 1024 / 1024) | eval compression = tostring(round(sizeOnDiskMB / TotalRawMB * 100, 2)) + "%" | table index, id, state, sourceTypeCount, TotalRawMB, sizeOnDiskMB, compression