A step-by-step space estimation method covers:
- Total number of data sources
- Verify raw log sizes
- Daily, peak, retained, future volume
- Total number of nodes
- Confirm estimates and create baselines
- Compression rates (typically 25% to 30% of original size)
- Retention needs for compliance (PCI DSS or similar)
- How much searching (concurrent and total)
- Dashboard and ad-hoc searches
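The checklist above can be folded into a rough sizing sketch. This is a minimal illustration, not a Splunk tool; the function name and the 50% compression figure (from the rule of thumb later in these notes) are assumptions:

```python
# Rough retained-storage estimator based on the checklist above.
# All names and default figures are illustrative assumptions.

def estimate_storage_gb(daily_raw_gb, retention_days, compression_ratio=0.5):
    """Estimate retained storage: daily raw volume * compression * retention.

    compression_ratio=0.5 reflects the rule of thumb that compressed
    rawdata plus index files occupy ~50% of the original size.
    """
    daily_indexed_gb = daily_raw_gb * compression_ratio
    return daily_indexed_gb * retention_days

# Example: 100 GB/day of raw logs retained for 90 days of compliance.
print(estimate_storage_gb(100, 90))  # 4500.0
```

Peak volume and future growth from the checklist would be modeled the same way, with a larger `daily_raw_gb`.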
It is time to transform the deployment into a multi-server distributed deployment when any of the following apply:
- Daily indexing volume increases to over 100 GB/day
- More than eight saved searches execute concurrently
- The number of concurrent users is greater than four
- More than 500 GB of total storage is required
- You need to search large quantities of data for a small result set (less than 5 percent of results)
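The thresholds above can be sketched as a simple check. The function name and call sites are illustrative; the threshold values are taken from the list:

```python
def needs_distributed_deployment(daily_gb, concurrent_searches,
                                 concurrent_users, total_storage_gb):
    """Return True if any single-server limit from the list is exceeded."""
    return (daily_gb > 100               # over 100 GB/day indexed
            or concurrent_searches > 8   # more than 8 concurrent saved searches
            or concurrent_users > 4      # more than 4 concurrent users
            or total_storage_gb > 500)   # more than 500 GB total storage

print(needs_distributed_deployment(120, 2, 2, 300))  # True (volume exceeded)
print(needs_distributed_deployment(50, 3, 2, 200))   # False
```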
You need to gather the following information:
- Who will be the users of Splunk?
- What are their roles in using Splunk?
- What is the main reason for deploying the Splunk environment?
- What are the requirements for use cases?
- What is the current logging environment?
- What is the current physical environment?
Indexer minimum requirements
| Component | Minimum specification |
| --- | --- |
| CPU | 2 x 6-core at 2 GHz per core |
| Memory | 16 GB RAM |
| Disk | 800 IOPS, RAID 1+0 |
| Network | 1 Gb Ethernet NIC |
Estimating your storage requirements
• A rule of thumb for syslog-type data: once it has been compressed and indexed in Splunk, it occupies approximately 50% of its original size:
  - 15% for the compressed raw data (rawdata)
  - 35% for the associated index files
For a better estimate, you can test specific types of data.
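The 15%/35% rule of thumb can be expressed directly. A minimal sketch, with an illustrative function name:

```python
def syslog_disk_usage_gb(raw_gb):
    """Apply the syslog rule of thumb: ~15% rawdata + ~35% index files."""
    rawdata = raw_gb * 0.15      # compressed raw data
    index_files = raw_gb * 0.35  # associated index files
    return rawdata, index_files, rawdata + index_files

print(syslog_disk_usage_gb(100))  # (15.0, 35.0, 50.0)
```

Testing with a sample of your own data, as noted above, would replace these default percentages with measured ones.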
Storage requirement for Clustering
With index clustering, you must consider the replication factor and search factor to arrive at total storage requirements.
• Replication factor includes copies of the raw data.
• Search factor includes searchable (indexed) copies of the data.
site_replication_factor = origin:1, site1:2, site2:1, total:3
site_search_factor = origin:1, site1:2, site2:1, total:3
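In a multisite cluster, these factors are set in `server.conf` on the cluster manager. A minimal sketch, assuming two sites named to match the example values above:

```ini
# server.conf on the cluster manager (illustrative values;
# older Splunk versions use "mode = master")
[general]
site = site1

[clustering]
mode = manager
multisite = true
available_sites = site1,site2
site_replication_factor = origin:1,site1:2,site2:1,total:3
site_search_factor = origin:1,site1:2,site2:1,total:3
```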
Total rawdata disk usage = Rawdata total * replication factor
Total index disk usage = Index data total * search factor
Example: 100 GB/day of syslog data coming into Splunk, with 3 peers, replication factor = 3, and search factor = 2, requires a total of 115 GB across all peers per day:
Total rawdata = (15 GB * 3) = 45 GB/day
Total index files = (35 GB * 2) = 70 GB/day
NOTE: This does not include disk space needed to rebuild search factor if required.
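The two formulas above can be sketched as a small calculator; the function name is illustrative, and the percentages are the syslog rule of thumb from earlier:

```python
def cluster_storage_gb(daily_raw_gb, replication_factor, search_factor):
    """Per-day cluster disk usage from the rawdata/index formulas above."""
    rawdata = daily_raw_gb * 0.15 * replication_factor  # raw data copies
    index_files = daily_raw_gb * 0.35 * search_factor   # searchable copies
    return rawdata, index_files, rawdata + index_files

# 100 GB/day of syslog data, replication factor 3, search factor 2:
raw, idx, total = cluster_storage_gb(100, 3, 2)
print(raw, idx, total)  # 45.0 70.0 115.0
```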
| Daily index data: 300 GB | Replication factor 3, search factor 2 | Replication factor 3, search factor 3 |
| --- | --- | --- |
| Index files (35%) | 105 * 2 = 210 GB | 105 * 3 = 315 GB |
| Rawdata (15%) | 45 * 3 = 135 GB | 45 * 3 = 135 GB |
| Total size for cluster | 345 GB | 450 GB |
| Per-indexer storage per day | 345 GB / 3 = 115 GB | 450 GB / 6 = 75 GB |