Curing Alert Fatigue Through Suppression and Fidelity

A few of us at my organization were lucky enough to attend DerbyCon last week. This was my first time at the conference, and it was amazing. I was able to meet some fellow infosec community members, including @HackingDave, as well as attend some great talks. One of the talks we attended was by @subtee and @kwm, called "Blue Team Keeping Tempo with Offense". This talk really hit home for me. In our organization, our team has both red teamers and blue teamers who partner to improve our security posture. This was a fundamental aspect of their talk and something we value as a security threat team.

In order to "keep up with offense", Casey and Keith spoke at length about where blue teams tend to break down. This captured my attention as a blue teamer trying to gain an advantage. One of their main discussion points was that we as blue teamers tend to end up with too many alerts, too much noise, and too many false positives. That got me thinking about how we have tackled this problem in our organization, and I thought it would be useful to share our strategy with others.

We use Splunk as our logging solution, along with the Enterprise Security application. One of the first things we did when building our alerting platform was disable all of the default correlation searches. They generated way too much noise and did not reflect what actually needed to be alerted on in our environment. For example, they were flagging our mail servers as having too many services enabled. Instead, we did a thorough investigation of our logs and created approximately 35 custom correlation searches that have proven very accurate in our environment. We continue to add new indicators and dashboards regularly based on the threat hunting we are performing.

One of the points that Casey and Keith made was about embracing false positives. We are not afraid of dealing with a few false positives to catch the really bad events. However, towards the end of last year we ran a project to evaluate the fidelity (the true positive vs. false positive ratio) of our alerts, and one of the tasks that came out of that effort was the creation of whitelisting capabilities for the "noisier" alerts.

In order to tackle this problem, one of my coworkers (@brkr19) created a template dashboard where we can add values for different variables in our alerts to suppress very specific events once they are proven to be false positives. This has dramatically improved our alert fidelity so we can focus on the truly bad.

I think the easiest way to demonstrate the power of this solution is with an example. In this case, we have two custom rules in our AV client that block files from executing in local Temp directories. If you don't have this type of rule, enable one now; it has saved our environment many times over. However, even though the rule blocks the activity, we want to have eyes on that block so we can do further analysis on the malware that is attempting to install and make sure there are no other indicators on the machine. So we created an alert to ensure we are monitoring this activity. The full alert code is below, but first a quick overview of what the rule is doing:

  • Search for any hits to the rule in the AV solution management index
  • Rename fields to be CIM compliant and extract new fields
  • Whitelist lookup (more on this below)
  • Override lookup (more on this below)
  • Stats to pull field values
  • Evaluation of alert severity and alerting mechanism for the primary threat responder
    • Used for email vs. SMS alert handling

Correlation Search:

index=AVSOLUTION (rule="block process from running in local temp | Launch Process Attempts" OR rule="block process from running in Temp | Launch Process Attempts")
| rename Parameter as path
| rename Caller_MD5 as hash
| rex field=process ".*[\\\/](?<file_name>.*)"
| search NOT [
| inputlookup ess_whitelist_temp_directory_blocking.csv
| eval current_date=strftime(now(), "%Y-%m-%d")
| eval expire_date=strptime(expire_date, "%Y-%m-%d")
| eval expire_date=strftime(expire_date, "%Y-%m-%d")
| where expire_date > current_date
| fields process path hash]
| lookup ess_override_temp_directory_blocking path process hash OUTPUT severity alert_action
| fillnull value="low" severity
| fillnull value="ON-CALL-EMAIL-ALERT" alert_action
| stats values(_time) as _time, values(Begin_Time) as event_time, values(user) as user, values(file_name) as file_name, values(path) as path, values(hash) as hash, values(File_Size) as file_size, values(Caller_Process_ID) as pid, values(process) as process, values(action) as action, values(alert_action) as alert_action, values(severity) as severity, values(dest_ip) as dest_ip, values(rule) as signature, count by dest
| eval severity=case(isnotnull(mvfind(severity, "critical")),"critical", isnotnull(mvfind(severity, "high")),"high", isnotnull(mvfind(severity, "medium")),"medium", isnotnull(mvfind(severity, "low")),"low", isnotnull(mvfind(severity, "informational")),"informational", 1=1, severity)
| eval alert_action=case(isnotnull(mvfind(alert_action, "ON-CALL-SMS-ALERT")),"ON-CALL-SMS-ALERT", isnotnull(mvfind(alert_action, "ON-CALL-EMAIL-ALERT")),"ON-CALL-EMAIL-ALERT", 1=1, alert_action)

The key pieces of this alert are the whitelisting and override capabilities. For whitelisting, we copied our whitelisting template and renamed the field variables accordingly. Many people are concerned about whitelisting because it can grow to be unmanageable. However, we built an expiration feature into our whitelist so we can add entries for a short period of time (or longer if need be) and get alerted on them again in the future. For example, a team might be installing new software for the next week, but it installs out of Temp. We can whitelist the software using very specific variables (user, hash, directory, process, etc.) and have that entry expire in a week so alerting resumes afterward. We can even use wildcards. These whitelists take the form of dashboards, so adding entries during alert handling is very quick.
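
To make the expiration piece concrete, here is what an entry in the whitelist lookup might look like. The process, path, hash, and expire_date column names come from the search above; the description column and all of the values are made up purely for illustration:

process, path, hash, expire_date, description
C:\Users\*\AppData\Local\Temp\vendor_setup.exe, C:\Users\*\AppData\Local\Temp\*, 5d41402abc4b2a76b9719d911017c592, 2016-10-07, Vendor rollout - expires in one week

Because the subsearch filters with "search NOT", wildcards in these values behave like wildcards in any other Splunk search, and the where clause drops any entry whose expire_date has passed, so alerting resumes automatically.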

To make it even quicker, we created a workflow action that auto-populates the fields directly from the Incident Review dashboard.

The last critical component of our fidelity improvement project was to introduce an override template. This allows us to change the severity and alerting method of an alert based on specific criteria. For example, if the process attempting to launch from Temp is powershell.exe, we change the alert severity to high and send an SMS alert to our threat responders.
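
As a sketch of what that looks like in practice, the override lookup from the search above (ess_override_temp_directory_blocking) matches on path, process, and hash and returns severity and alert_action. A hypothetical entry for the PowerShell case might look like this (the values are illustrative, and wildcard matching in a lookup only works if the lookup definition sets match_type = WILDCARD(...) for those fields in transforms.conf):

path, process, hash, severity, alert_action
*, *\powershell.exe, *, high, ON-CALL-SMS-ALERT

Anything that does not match an override row simply falls through to the fillnull defaults of "low" and "ON-CALL-EMAIL-ALERT".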

Code for the whitelisting template can be found here: https://github.com/security-storm/SplunkBits/blob/master/Template-WhiteList.xml

Code for the override template can be found here: https://github.com/security-storm/SplunkBits/blob/master/Template-Override.xml

Link in the workflow action: https://splunk.com:8000/en-US/app/SplunkEnterpriseSecuritySuite/threatintelesswhitelisttempdirectoryblocking?form.input_timeframe=%2B2w&form.input_path=$path$&form.input_process=$process$&form.input_hash=$hash$&form.input_description=&earliest=0&latest=
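
A workflow action along these lines can be defined through the Splunk UI (Settings > Fields > Workflow actions) or directly in workflow_actions.conf. The stanza below is only a rough sketch of the latter; the stanza name and label are made up, and the URI is the one shown above:

[ess_whitelist_temp_directory_blocking]
type = link
label = Whitelist Temp directory block
link.method = get
link.target = blank
link.uri = https://splunk.com:8000/en-US/app/SplunkEnterpriseSecuritySuite/threatintelesswhitelisttempdirectoryblocking?form.input_timeframe=%2B2w&form.input_path=$path$&form.input_process=$process$&form.input_hash=$hash$&form.input_description=&earliest=0&latest=
fields = path, process, hash
display_location = both

The $path$, $process$, and $hash$ tokens are replaced with the field values from the event you launch the action on, which is what pre-populates the whitelist dashboard form.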

To use these dashboards, simply create a new dashboard, copy the source from the code above, and paste it into the source editor of your Splunk dashboard. Then change the field names to match your data. Once you add something to your whitelist, it will automatically create the lookup file that your searches pull from.
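
I have not reproduced the template internals here, but conceptually, adding an entry boils down to appending a row to the lookup. If you ever want to seed or add an entry by hand outside the dashboard, a search along these lines would do it (the field values are made up; the lookup and column names match the whitelist example above):

| makeresults
| eval process="C:\\Users\\*\\AppData\\Local\\Temp\\vendor_setup.exe", path="C:\\Users\\*\\AppData\\Local\\Temp\\*", hash="5d41402abc4b2a76b9719d911017c592", expire_date="2016-10-07", description="Vendor rollout"
| fields - _time
| outputlookup append=true ess_whitelist_temp_directory_blocking.csv

With append=true, outputlookup adds the row to the lookup instead of overwriting the whole file.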

If you add this snippet to your alerts, it will eliminate any events that match the specific variables in a whitelist entry. You will just need to adjust the field names to match your search; they also need to match the fields you are using as variables in your whitelist.

| search NOT [
| inputlookup ess_whitelist_temp_directory_blocking.csv
| eval current_date=strftime(now(), "%Y-%m-%d")
| eval expire_date=strptime(expire_date, "%Y-%m-%d")
| eval expire_date=strftime(expire_date, "%Y-%m-%d")
| where expire_date > current_date
| fields process path hash]

This snippet will use the override:

| lookup ess_override_temp_directory_blocking path process hash OUTPUT severity alert_action
| fillnull value="low" severity
| fillnull value="ON-CALL-EMAIL-ALERT" alert_action

We are still very energized from DerbyCon and are excited to share more of our Splunk searches in the near future.