A step-by-step space estimation method covers:
- Total number of data sources
- Verify raw log sizes
- Daily, peak, retained, future volume
- Total number of nodes
- Confirm estimates and create baselines
- Compression rates (typically 25% to 30% of original size)
- Retention needs for compliance (PCI DSS or similar)
- How much searching (concurrent and total)
- Dashboard and ad-hoc searches
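The checklist above can be folded into a rough sizing sketch. This is a minimal illustration, not a Splunk tool; the function name and the 50% compression figure (from the rule of thumb later in these notes) are assumptions:

```python
# Rough retained-storage estimator based on the checklist above.
# All names and default figures are illustrative assumptions.

def estimate_storage_gb(daily_raw_gb, retention_days, compression_ratio=0.5):
    """Estimate retained storage: daily raw volume * compression * retention.

    compression_ratio=0.5 reflects the rule of thumb that compressed
    rawdata plus index files occupy ~50% of the original size.
    """
    daily_indexed_gb = daily_raw_gb * compression_ratio
    return daily_indexed_gb * retention_days

# Example: 100 GB/day of raw logs retained for 90 days of compliance.
print(estimate_storage_gb(100, 90))  # 4500.0
```

Peak volume and future growth from the checklist would be modeled the same way, with a larger `daily_raw_gb`.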
It is time to transform the deployment into a multi-server distributed deployment when any of the following apply:
- Daily indexing volume increases to over 100 GB/day
- More than eight saved searches execute concurrently
- The number of concurrent users is greater than four
- More than 500 GB of total storage is required
- You need to search large quantities of data for a small result set (less than 5 percent of results)
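The thresholds above can be sketched as a simple check. The function name and call sites are illustrative; the threshold values are taken from the list:

```python
def needs_distributed_deployment(daily_gb, concurrent_searches,
                                 concurrent_users, total_storage_gb):
    """Return True if any single-server limit from the list is exceeded."""
    return (daily_gb > 100               # over 100 GB/day indexed
            or concurrent_searches > 8   # more than 8 concurrent saved searches
            or concurrent_users > 4      # more than 4 concurrent users
            or total_storage_gb > 500)   # more than 500 GB total storage

print(needs_distributed_deployment(120, 2, 2, 300))  # True (volume exceeded)
print(needs_distributed_deployment(50, 3, 2, 200))   # False
```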
You need to gather the following information:
- Who will be the users of Splunk?
- What are their roles in using Splunk?
- What is the main reason for deploying the Splunk environment?
- What are the requirements for use cases?
- What is the current logging environment?
- What is the current physical environment?
Indexer minimum requirements
| Component | Minimum specification |
| --- | --- |
| CPU | 2 x 6-core at 2 GHz per core |
| Memory | 16 GB RAM |
| Disk | 800 IOPS, RAID 1+0 |
| Network | 1 Gb Ethernet NIC |
Estimating your storage requirements
• A rule of thumb for syslog-type data: once it has been compressed and indexed in Splunk, it occupies approximately 50% of its original size:
  - 15% for the compressed raw data (rawdata)
  - 35% for the associated index files
For a better estimate, you can test specific types of data.
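The 15%/35% rule of thumb can be expressed directly. A minimal sketch, with an illustrative function name:

```python
def syslog_disk_usage_gb(raw_gb):
    """Apply the syslog rule of thumb: ~15% rawdata + ~35% index files."""
    rawdata = raw_gb * 0.15      # compressed raw data
    index_files = raw_gb * 0.35  # associated index files
    return rawdata, index_files, rawdata + index_files

print(syslog_disk_usage_gb(100))  # (15.0, 35.0, 50.0)
```

Testing with a sample of your own data, as noted above, would replace these default percentages with measured ones.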
Storage requirement for Clustering
With index clustering, you must consider the replication factor and search factor to arrive at total storage requirements.
• Replication factor includes copies of the raw data.
• Search factor includes searchable (indexed) copies of the data.
site_replication_factor = origin:1, site1:2, site2:1, total:3
site_search_factor = origin:1, site1:2, site2:1, total:3
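In a multisite cluster, these factors are set in `server.conf` on the cluster manager. A minimal sketch, assuming two sites named to match the example values above:

```ini
# server.conf on the cluster manager (illustrative values;
# older Splunk versions use "mode = master")
[general]
site = site1

[clustering]
mode = manager
multisite = true
available_sites = site1,site2
site_replication_factor = origin:1,site1:2,site2:1,total:3
site_search_factor = origin:1,site1:2,site2:1,total:3
```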
Total rawdata disk usage = Rawdata total * replication factor
Total index disk usage = Index data total * search factor
Example: 100 GB/day of syslog data coming into Splunk, with 3 peers, replication factor = 3, and search factor = 2, requires a total of 115 GB across all peers per day:
Total rawdata = (15 GB * 3) = 45 GB/day
Total index files = (35 GB * 2) = 70 GB/day
NOTE: This does not include disk space needed to rebuild search factor if required.
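The two formulas above can be sketched as a small calculator; the function name is illustrative, and the percentages are the syslog rule of thumb from earlier:

```python
def cluster_storage_gb(daily_raw_gb, replication_factor, search_factor):
    """Per-day cluster disk usage from the rawdata/index formulas above."""
    rawdata = daily_raw_gb * 0.15 * replication_factor  # raw data copies
    index_files = daily_raw_gb * 0.35 * search_factor   # searchable copies
    return rawdata, index_files, rawdata + index_files

# 100 GB/day of syslog data, replication factor 3, search factor 2:
raw, idx, total = cluster_storage_gb(100, 3, 2)
print(raw, idx, total)  # 45.0 70.0 115.0
```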
| Daily index data: 300 GB | Replication factor 3, search factor 2 | Replication factor 3, search factor 3 |
| --- | --- | --- |
| Index files (35%) | 105 * 2 = 210 GB | 105 * 3 = 315 GB |
| Rawdata (15%) | 45 * 3 = 135 GB | 45 * 3 = 135 GB |
| Total size for cluster | 345 GB | 450 GB |
| Per-indexer storage per day | 345 GB / 3 = 115 GB | 450 GB / 6 = 75 GB |