Storage requirements for Logs – Splunk

For a step-by-step space estimation method, work through the deployment planning checklist below (a sizing sketch follows the list):
Deployment Planning

  1. Total number of data sources
  2. Verify raw log sizes
  3. Daily, peak, retained, and future volume
  4. Total number of nodes
  5. Confirm estimates and create baselines
  6. Compression rates (typically 25% to 30% of the original size)
  7. Retention needs for compliance requirements such as PCI-DSS
  8. How much searching will occur (concurrent and total)
  9. Dashboard and ad-hoc searches
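
As a rough illustration of the checklist above, the sketch below estimates daily and retained volume. The data sources, peak and growth factors, and retention window are hypothetical example values; only the rawdata compression from step 6 is applied here, and index-file overhead is covered later in this document.

```python
# Minimal sizing sketch for the checklist above.
# Source volumes, peak/growth factors, and retention window are hypothetical examples.

daily_gb_per_source = {      # steps 1-3: raw log volume per data source, in GB/day
    "firewall_syslog": 40,
    "web_access_logs": 25,
    "windows_events": 15,
}
peak_factor = 1.3            # headroom for peak days (assumed)
growth_factor = 1.2          # expected growth over the planning horizon (assumed)
retention_days = 90          # step 7: e.g. a compliance-driven retention window
compressed_fraction = 0.30   # step 6: rawdata typically compresses to 25-30% of original size

daily_gb = sum(daily_gb_per_source.values())
planned_daily_gb = daily_gb * peak_factor * growth_factor
retained_gb = planned_daily_gb * retention_days * compressed_fraction

print(f"Current daily raw volume:  {daily_gb:.0f} GB/day")
print(f"Planned daily raw volume:  {planned_daily_gb:.0f} GB/day")
print(f"Retained (compressed):     {retained_gb:.0f} GB over {retention_days} days")
```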
It is time to transform the deployment into a multi-server distributed deployment when:

  • Daily indexing volume increases to over 100 GB/day
  • More than eight saved searches are executing concurrently
  • The number of concurrent users is greater than four
  • More than 500 GB of total storage is required
  • You need to search large quantities of data for a small result set (less than 5 percent of results)

 

You need to gather the following information:

  • Who will be the users of Splunk?
  • What are their roles in using Splunk?
  • What is the main reason for deploying the Splunk environment?
  • What are the requirements for use cases?
  • What is the current logging environment?
  • What is the current physical environment?

Indexer minimum requirements

CPU: 2 x 6-core, 2 GHz per core
Memory: 16 GB RAM
Disk: 800 IOPS, RAID 1+0
Network: 1 Gb Ethernet NIC

Estimating your storage requirements
• As a rule of thumb, syslog-type data, once compressed and indexed in Splunk, occupies approximately 50% of its original size:
 15% for the compressed raw data
 35% for the associated index files
For a better estimate, you can test with your specific types of data; a quick calculation sketch follows.
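
A minimal sketch of this rule of thumb, assuming the 15%/35% split above (the function and variable names are illustrative):

```python
# Rule-of-thumb on-disk usage for syslog-type data:
# roughly 15% of the raw size as compressed rawdata plus 35% as index files.
RAW_FRACTION = 0.15
INDEX_FRACTION = 0.35

def disk_usage_gb(daily_raw_gb: float) -> dict:
    """Estimated on-disk usage for one day of ingested data, in GB."""
    return {
        "rawdata_gb": daily_raw_gb * RAW_FRACTION,
        "index_files_gb": daily_raw_gb * INDEX_FRACTION,
        "total_gb": daily_raw_gb * (RAW_FRACTION + INDEX_FRACTION),
    }

usage = disk_usage_gb(100)   # 100 GB/day of raw syslog data
print(f"rawdata: {usage['rawdata_gb']:.0f} GB, "
      f"index files: {usage['index_files_gb']:.0f} GB, "
      f"total: {usage['total_gb']:.0f} GB")   # ~15 GB + ~35 GB = ~50 GB on disk
```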

Storage requirements for clustering
With index clustering, you must consider the replication factor and the search factor to arrive at the total storage requirements.
• The replication factor sets the number of copies of the raw data.
• The search factor sets the number of searchable (indexed) copies of the data.

In a multisite cluster, for example, the factors are specified per site:

site_replication_factor = origin:1, site1:2, site2:1, total:3
site_search_factor = origin:1, site1:2, site2:1, total:3

Total rawdata disk usage = Rawdata total * replication factor
Total index disk usage = Index data total * search factor
Example: 100 GB/day of syslog data coming into Splunk
with 3 peers, replication factor = 3, and search factor = 2 requires about 115 GB/day across all peers (reproduced in the sketch below):
 Total rawdata = (15 GB * 3) = 45 GB/day
 Total index files = (35 GB * 2) = 70 GB/day

NOTE: This does not include the disk space needed to rebuild the search factor (additional searchable copies) if required.
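
The example above can be reproduced with a small sketch of the two formulas, again assuming the 15%/35% rule of thumb:

```python
# Clustered daily disk usage: rawdata replicated by the replication factor,
# index files kept for each searchable copy (search factor).
def cluster_daily_gb(daily_raw_gb: float, replication_factor: int, search_factor: int) -> float:
    rawdata_gb = daily_raw_gb * 0.15                  # compressed rawdata per copy
    index_gb = daily_raw_gb * 0.35                    # index files per searchable copy
    total_rawdata = rawdata_gb * replication_factor   # total rawdata disk usage
    total_index = index_gb * search_factor            # total index disk usage
    return total_rawdata + total_index

# 100 GB/day, 3 peers, replication factor 3, search factor 2
total = cluster_daily_gb(100, replication_factor=3, search_factor=2)
print(f"{total:.0f} GB/day across the cluster")   # ~115 GB/day
```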
http://docs.splunk.com/Documentation/Splunk/latest/indexer/Systemrequirments

Daily index data: 300 GB

Replication factor 3 and search factor 2:
  Index files (35%): 105 GB * 2 = 210 GB
  Rawdata (15%): 45 GB * 3 = 135 GB
  Total size for the cluster: 345 GB
  Per-indexer storage per day: 345 GB / 3 = 115 GB

Replication factor 3 and search factor 3:
  Index files (35%): 105 GB * 3 = 315 GB
  Rawdata (15%): 45 GB * 3 = 135 GB
  Total size for the cluster: 450 GB
  Per-indexer storage per day: 450 GB / 6 = 75 GB
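
The figures above can be re-derived with the same approach; the indexer counts of 3 and 6 follow the per-indexer rows shown above:

```python
# Re-derive the 300 GB/day figures for the two replication/search factor combinations.
DAILY_RAW_GB = 300
scenarios = [
    {"rf": 3, "sf": 2, "indexers": 3},
    {"rf": 3, "sf": 3, "indexers": 6},
]

for s in scenarios:
    index_gb = DAILY_RAW_GB * 0.35 * s["sf"]     # index files (35%) x search factor
    rawdata_gb = DAILY_RAW_GB * 0.15 * s["rf"]   # rawdata (15%) x replication factor
    total_gb = index_gb + rawdata_gb
    per_indexer_gb = total_gb / s["indexers"]
    print(f"RF={s['rf']} SF={s['sf']}: cluster total {total_gb:.0f} GB/day, "
          f"{per_indexer_gb:.0f} GB/day per indexer ({s['indexers']} indexers)")

# Expected output:
# RF=3 SF=2: cluster total 345 GB/day, 115 GB/day per indexer (3 indexers)
# RF=3 SF=3: cluster total 450 GB/day, 75 GB/day per indexer (6 indexers)
```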