Splunk is ‘Google’ for our machine-generated data. It’s a software/engine that can be used for searching, visualizing, monitoring, reporting, etc. of our enterprise data. Splunk takes valuable machine data and turns it into powerful operational intelligence by providing real-time insights into our data through charts, alerts, reports, etc.
This is one of the most frequently asked Splunk interview questions. Below are the components of Splunk:
* Search Head: Provides the GUI for searching
* Indexer: Indexes the machine data
* Forwarder: Forwards logs to the Indexer
* Deployment Server: Manages Splunk components in a distributed environment
Splunk 8.2.1 (as of June 21, 2021)
Splunk Indexer is the Splunk Enterprise component that creates and manages indexes. The primary functions of an indexer are:
* Indexing incoming data
* Searching the indexed data
There are two types of Splunk Forwarders as below:
* Universal Forwarder (UF): The Splunk agent installed on a non-Splunk system to gather data locally; it can’t parse or index data.
* Heavyweight Forwarder (HWF): A full instance of Splunk with advanced functionalities.
It generally works as a remote collector, intermediate forwarder, and possible data filter, and since it parses data, it is not recommended for production systems.
* props.conf
* indexes.conf
* inputs.conf
* transforms.conf
* server.conf
* Enterprise license
* Free license
* Forwarder license
* Beta license
* Licenses for search heads (for distributed search)
* Licenses for cluster members (for index replication)
Splunk app is a container/directory of configurations, searches, dashboards, etc. in Splunk.
$SPLUNK_HOME/etc/system/default
Splunk Free does not include the following features:
* Authentication and scheduled searches/alerting
* Distributed search
* Forwarding in TCP/HTTP (to non-Splunk)
* Deployment management
If the license master is not available, the license slave starts a 72-hour timer, after which search is blocked on that license slave (though indexing continues). Users will not be able to search data on that slave until it can reach the license master again.
The default summary index, named summary, is the index that Splunk Enterprise uses for summary data if we do not indicate another one.
If we plan to run a variety of summary index reports, we may need to create additional summary indexes.
Splunk DB Connect is a generic SQL database plugin for Splunk that allows us to easily integrate database information with Splunk queries and reports.
There are multiple ways in which we can extract the IP address from logs. Below are a few examples:
By using a regular expression:
rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"
OR
rex field=_raw "(?<ip_address>([0-9]{1,3}[.]){3}[0-9]{1,3})"
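To check such a pattern outside Splunk, the second expression can be tried in Python. This is only a sketch: the sample log line is made up, and Python writes named groups as `(?P<name>...)` rather than `(?<name>...)`.

```python
import re

# Same pattern as the second rex example, with Python's named-group syntax.
IP_PATTERN = re.compile(r"(?P<ip_address>([0-9]{1,3}[.]){3}[0-9]{1,3})")


def extract_ip(raw_event):
    """Return the first IPv4-looking substring in raw_event, or None."""
    match = IP_PATTERN.search(raw_event)
    return match.group("ip_address") if match else None


# Hypothetical log line, invented for illustration.
sample = "Jan 10 12:00:01 web01 sshd[411]: Failed password for root from 192.168.1.25 port 22"
print(extract_ip(sample))
```

Note that the pattern only checks the shape of the address (four groups of one to three digits), so a value such as 999.999.999.999 would also match.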
This is another frequently asked Splunk interview question, used to test a developer’s or engineer’s knowledge. The transaction command is most useful in two specific cases:
>> When the unique ID (from one or more fields) alone is not sufficient to discriminate between two transactions. This is the case when the identifier is reused, for example, web sessions identified by a cookie/client IP. In this case, the time span or pauses are also used to segment the data into transactions. In other cases where an identifier is reused, say in DHCP logs, a particular message identifies the beginning or end of a transaction.
>> When it is desirable to see the raw text of events combined rather than an analysis of the constituent fields of the events.
In other cases, it’s usually better to use stats:
>> The performance of the stats command is higher, especially in a distributed search environment.
>> If there is a unique ID, the stats command can be used.
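The difference can be pictured with a small Python sketch. It is not how Splunk implements either command. It only assumes events are (timestamp, id) pairs, and the `max_pause` parameter is a hypothetical stand-in for the pause-based segmentation described above.

```python
def stats_by_id(events):
    """stats-style: one row per ID with (count, earliest time, latest time)."""
    rows = {}
    for ts, key in events:
        count, first, last = rows.get(key, (0, ts, ts))
        rows[key] = (count + 1, min(first, ts), max(last, ts))
    return rows


def transactions_by_id(events, max_pause):
    """transaction-style: split an ID's events whenever the gap exceeds max_pause."""
    open_groups = {}   # id -> timestamps of the transaction currently being built
    finished = []
    for ts, key in sorted(events):
        group = open_groups.get(key)
        if group is not None and ts - group[-1] > max_pause:
            finished.append((key, group))   # pause too long: close the transaction
            group = None
        if group is None:
            group = []
            open_groups[key] = group
        group.append(ts)
    finished.extend(open_groups.items())
    return sorted(finished)


# Hypothetical events: cookieA is reused for a second session much later.
events = [(0, "cookieA"), (10, "cookieA"), (500, "cookieA"), (505, "cookieA"), (5, "cookieB")]
print(stats_by_id(events))
print(transactions_by_id(events, max_pause=60))
```

With a reused identifier, the stats-style grouping merges both cookieA sessions into one row, while the pause-based grouping keeps them apart.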
It is the component of Splunk Enterprise that creates and manages indexes. The primary functions of an indexer are 1) indexing raw data into an index and 2) searching and managing indexed data.
Some disadvantages of using Splunk tool are:
* Splunk can prove expensive for large data volumes.
* Dashboards are functional but not as effective as some other monitoring tools.
* Its learning curve is steep, and you need Splunk training as it’s a multi-tier architecture, so you need to spend a lot of time learning this tool.
* Searches are difficult to understand, especially regular expressions and search syntax.
The advantages of getting data into Splunk via forwarders are TCP connection, bandwidth throttling, and secure SSL connection for transferring crucial data from a forwarder to an indexer.
License master in Splunk ensures that the right amount of data gets indexed. It ensures that the environment remains within the limits of the purchased volume as Splunk license depends on the data volume, which comes to the platform within a 24-hour window.
Commonly used Splunk configuration files are:
* Inputs file
* Transforms file
* Server file
* Indexes file
* Props file
It is a warning that occurs when you exceed the licensed daily data limit. The warning persists for 14 days. With a commercial license, you may have 5 warnings within a rolling 30-day window before search results and reports on your Indexer stop triggering.
However, with the free version, only 3 warnings are allowed.
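The rolling-window rule can be sketched in Python as follows. This is an illustration, not Splunk’s own logic: the function name and parameters are hypothetical, the window is assumed to be 30 days, and reaching the warning limit is treated as the violation.

```python
from datetime import date, timedelta


def in_violation(warning_days, today, max_warnings=5, window_days=30):
    """True once max_warnings warning days fall inside the rolling window ending today."""
    window_start = today - timedelta(days=window_days - 1)
    # A day counts once, however many times it appears in the input.
    recent = [d for d in set(warning_days) if window_start <= d <= today]
    return len(recent) >= max_warnings


# Hypothetical warning dates for a commercial license (limit 5).
today = date(2021, 6, 30)
warnings = [date(2021, 6, d) for d in (2, 9, 15, 21, 28)]
print(in_violation(warnings, today))
print(in_violation(warnings[:3], today, max_warnings=3))  # free-license limit
```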
Alerts can be used when you have to monitor for and respond to specific events. For example, sending an email notification to the user when there are more than three failed login attempts in a 24-hour period.
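The failed-login condition from this example can be written as a small Python check. It is a sketch of the alert condition only: the function name and its parameters are hypothetical, and sending the email is left out.

```python
from datetime import datetime, timedelta


def should_alert(failed_login_times, now, threshold=3, window=timedelta(hours=24)):
    """True when more than `threshold` failed logins fall in the window ending at `now`."""
    recent = [t for t in failed_login_times if now - window <= t <= now]
    return len(recent) > threshold


# Hypothetical timestamps: four failures within the last four hours.
now = datetime(2021, 6, 21, 12, 0)
failures = [now - timedelta(hours=h) for h in (1, 2, 3, 4)]
print(should_alert(failures, now))
```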
Map-reduce is a technique used by Splunk to increase data searching speed. It is inspired by two functional programming functions: 1) map() and 2) reduce().
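The general idea behind the two functions can be shown with a short Python sketch. It illustrates map and reduce in general, not Splunk’s internal implementation. The events are made up, and using the first word as the key is an assumption for the example.

```python
from functools import reduce


def map_phase(events):
    """Map: turn each raw event into a (key, 1) pair; the key is the first word."""
    return [(event.split()[0], 1) for event in events]


def reduce_phase(pairs):
    """Reduce: fold the (key, value) pairs into per-key totals."""
    def combine(totals, pair):
        key, value = pair
        totals[key] = totals.get(key, 0) + value
        return totals
    return reduce(combine, pairs, {})


# Hypothetical events, each starting with a log level.
events = ["ERROR disk full", "INFO started", "ERROR timeout"]
print(reduce_phase(map_phase(events)))
```

Because the map step treats every event independently, it can run in parallel on separate machines, and only the smaller intermediate results need to be combined in the reduce step.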
Logstash, Loggly, LogLogic, Sumo Logic, etc. are some of the top direct competitors to Splunk.
Splunk licenses specify how much data we can index per calendar day.
In terms of licensing, for Splunk, 1 day is from midnight to midnight on the clock of the license master.
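The midnight-to-midnight accounting can be sketched in Python. This is only an illustration: the record format, the function names, and the quota value are hypothetical, and timestamps are assumed to already be on the license master’s clock.

```python
from collections import defaultdict
from datetime import datetime


def daily_indexed_bytes(records):
    """Sum indexed bytes per calendar day (midnight to midnight)."""
    totals = defaultdict(int)
    for indexed_at, size_bytes in records:
        totals[indexed_at.date()] += size_bytes
    return dict(totals)


def days_over_quota(records, daily_quota_bytes):
    """Return, in order, the calendar days whose total exceeds the quota."""
    totals = daily_indexed_bytes(records)
    return sorted(day for day, total in totals.items() if total > daily_quota_bytes)


# Hypothetical records: two events only two minutes apart land on different license days.
records = [
    (datetime(2021, 6, 21, 23, 59), 600),
    (datetime(2021, 6, 22, 0, 1), 600),
    (datetime(2021, 6, 22, 13, 0), 500),
]
print(days_over_quota(records, daily_quota_bytes=1000))
```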
They are included with Splunk, so there is no need to purchase them separately.
This is another frequently asked interview question about Splunk commands, so get a thorough idea of them. We can restart the Splunk web server by using the following command:
splunk restart splunkweb
Search factor determines the number of searchable copies of data maintained by the indexer cluster, that is, how many searchable copies of each bucket are available.
Replication factor determines the number of copies of data maintained by the cluster as well as the number of copies that each site maintains.
The lookup command is generally used when you want to get some fields from an external file. It enriches the search results by referencing fields in an external file that match fields in your event data.
There are 5 default fields that are stored with every event indexed into Splunk. They are: 1) host, 2) source, 3) sourcetype, 4) index, and 5) timestamp (_time).
Fields can be extracted through the UI from the fields sidebar, the event list, or the Settings menu.
Another way to extract fields in Splunk is to write your regular expressions in a props configuration file.
A summary index is a special index that stores the results calculated by Splunk. It is a fast and cheap way to run a query over a longer period of time.
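You can prevent events, such as debug messages, from being indexed by routing them to the null queue. The null queue routing is defined in transforms.conf (and referenced from props.conf) on the instance that parses the data, which means a heavy forwarder or an indexer, since a universal forwarder does not parse data.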
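The effect of null queue routing can be modelled in a few lines of Python. This is a simplified picture, not Splunk configuration: the DEBUG pattern and the function names are hypothetical, and only the queue names mirror the ones Splunk uses.

```python
import re

# Hypothetical pattern: treat any event containing the word DEBUG as noise.
DEBUG_PATTERN = re.compile(r"\bDEBUG\b")


def route_event(raw_event):
    """Return the queue an event would go to in this simplified model."""
    return "nullQueue" if DEBUG_PATTERN.search(raw_event) else "indexQueue"


def events_to_index(raw_events):
    """Keep only the events that are not routed to the null queue."""
    return [e for e in raw_events if route_event(e) == "indexQueue"]


# Made-up events for illustration.
incoming = ["2021-06-21 DEBUG cache miss", "2021-06-21 ERROR disk full"]
print(events_to_index(incoming))
```

Events sent to the null queue are discarded before indexing, so they do not count against the license volume.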
It is a SQL database plugin that enables us to import tables, rows, and columns from a database. Splunk DB Connect helps provide reliable and scalable integration between databases and Splunk Enterprise.
The alert manager adds workflow to Splunk. Its purpose is to provide a common app with dashboards to search for alerts or events.
The main components of Splunk are:
1. Indexer: Indexes machine data from the application server logs
2. Forwarder: Installed on the application server, forwards its logs to the indexer
3. Search head: Provides the GUI for searching once the indexer and forwarders are in place
4. Deployment Server (Management Console Host): Manages the Splunk components (indexer, forwarder and search head) in a distributed environment
Three ways to troubleshoot Splunk performance issues:
* Check for server performance issues.
* Look for errors in splunkd.log.
* Install Splunk app and check for warnings and errors in the dashboard.
Index time is the period from when the data is consumed to the point when it is written to disk. Search time is when the search is run, as events are collected by the search.
There are three different kinds of Splunk dashboards:
* Real-time dashboards
* Dynamic form-based dashboards
* Dashboards for scheduled reports
Set value OFFENSIVE=Less in splunk-launch.conf
Splunk Btool is a command-line tool that helps us troubleshoot configuration file issues or just see what values are being used by our Splunk Enterprise installation in the existing environment.
Both contain preconfigured configurations, reports, etc., but a Splunk add-on does not have a visual app. A Splunk app, on the other hand, has a preconfigured visual app.
File precedence is as follows:
System local directory — highest priority
App local directories
App default directories
System default directory — lowest priority
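The precedence order above can be pictured as layered dictionaries, where a higher-priority directory overrides a lower one. The Python below is a simplified sketch: the directory labels and setting names are hypothetical, and the per-stanza merging and the ordering among multiple apps are ignored.

```python
# Directory labels from highest to lowest priority, matching the list above.
PRECEDENCE = ["system/local", "app/local", "app/default", "system/default"]


def resolve_settings(layers):
    """Merge settings so that a higher-priority directory overrides lower ones.

    `layers` maps a directory label to a dict of setting name -> value.
    """
    merged = {}
    # Apply the lowest priority first so that later, higher-priority updates win.
    for directory in reversed(PRECEDENCE):
        merged.update(layers.get(directory, {}))
    return merged


# Hypothetical setting names and values, for illustration only.
layers = {
    "system/default": {"setting_a": "1", "setting_b": "default"},
    "app/local": {"setting_a": "2"},
    "system/local": {"setting_b": "override"},
}
print(resolve_settings(layers))
```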
This kind of question is asked to gauge the scope of your knowledge. You can answer by saying that Splunk has a lot of competition in the market for analyzing machine logs, doing business intelligence, performing IT operations, and providing security. However, no single tool other than Splunk covers all of these operations, and that is where Splunk stands out. With Splunk, you can easily scale up your infrastructure and get professional support from the company backing the platform. Some of its competitors are Sumo Logic in the cloud log management space and ELK in the open source category. You can refer to the table below to see how Splunk fares against other popular tools feature-wise. The detailed differences between these tools are covered in the blog Splunk vs ELK vs Sumo Logic.
Both are features provided by Splunk for high availability of the search head in case any search head goes down. However, search head clustering is the newer feature, and search head pooling will be removed in upcoming versions.
A search head cluster is managed by a captain, which coordinates the other cluster members. Search head clustering is more reliable and efficient than search head pooling.
If you feed the data into a Splunk instance via Splunk Forwarders, you can reap three significant benefits – TCP connection, bandwidth throttling, and an encrypted SSL connection to transfer data from a Forwarder to an Indexer. Splunk’s architecture is such that the data forwarded to the Indexer is load-balanced by default.
So, even if one Indexer goes down for some reason, the data can quickly re-route itself via another Indexer instance. Furthermore, Splunk Forwarders cache the events locally before forwarding them, thereby creating a temporary backup of the data.
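The re-routing behaviour can be sketched in Python as a rotation over indexers that skips the ones that are down. This is only a model of the idea: the function, the batch names, and the `is_up` check are hypothetical, and a real forwarder decides when to switch indexers on its own schedule.

```python
from itertools import cycle


def assign_batches(batches, indexers, is_up):
    """Send each batch to the next reachable indexer, rotating through the list."""
    if not any(is_up(name) for name in indexers):
        raise RuntimeError("no indexer reachable; events stay in the local cache")
    rotation = cycle(indexers)
    assignment = []
    for batch in batches:
        target = next(rotation)
        while not is_up(target):   # skip indexers that are down
            target = next(rotation)
        assignment.append((batch, target))
    return assignment


# Hypothetical deployment: three indexers, one of them currently down.
down = {"idx2"}
plan = assign_batches(["b1", "b2", "b3", "b4"], ["idx1", "idx2", "idx3"], lambda n: n not in down)
print(plan)
```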
Below are the steps to add folder access logs to Splunk:
* Enable Object Access Audit through group policy on the Windows machine on which the folder is located
* Enable auditing on a specific folder for which we want to monitor logs
* Install Splunk universal forwarder on the Windows machine
* Configure universal forwarder to send security logs to Splunk indexer
In Splunk, the Summary Index refers to the default Splunk index that stores data resulting from scheduled searches over time. Essentially, it is the index that Splunk Enterprise uses if a user does not specify or indicate another one.
The most significant advantage of the Summary Index is that it allows you to retain the analytics and reports even after your data has aged.
In Splunk, the License Master ensures that the right amount of data gets indexed. Since the Splunk license is based on the data volume that reaches the platform within a 24hr-window, the License Master ensures that your Splunk environment stays within the constraints of the purchased volume.
If the License Master remains unreachable, users will eventually be unable to search the data. However, this does not affect the data flowing into the Indexer: data continues to flow into the Splunk deployment, and the Indexers continue to index it. Separately, if the indexing volume is exceeded, a warning message appears at the top of the Search Head, and the user must either reduce the amount of data flowing in or purchase additional Splunk license capacity.