This article was contributed by Anastasia Yevdokimova who works as a digital writer at SOC Prime, analyzing and reporting on technical and cybersecurity topics.
Cybersecurity teams work around the clock to keep an organization's environment safe. While the number of new threats keeps growing, another problem stands in the way: the false positive rate. Recent research by Fastly offers metrics that show why false alerts are a serious issue that needs immediate attention:
- 45% of all the incoming alerts on average are false positives.
- 75% of SOC teams spend as much or more time on them as on actual attacks.
- False positives caused as much of the observed downtime as actual cyber-attacks did.
Ignoring this network noise can cost the organization dearly: alert fatigue may cause security professionals to overlook a true threat. The silver lining is that there are ways to reduce false positives while gaining greater visibility into the actual threat landscape. Below, we take a look at some of the most popular ones.
Learn to Discern False Positives
When it comes to the noise of false positives, it helps to understand where it comes from and why it happens. Some team members, especially at the executive level, often focus on reducing the number of false positives statistically without giving enough attention to the cause-and-effect relationship.
However, knowing why something happens is always more effective for tackling the problem than trying to squeeze statistical data into mentally satisfying percentages.
First of all, it is necessary to define what a malicious event is. How is it different from a legitimate one, and why? If virtually any rare event in your environment is treated as a sign of an attack, false alerts may flood the network.
To reduce the number of false positives, it is useful to divide them into two major groups: structured and unstructured. The methods for identifying and neutralizing each group differ.
Structured false positives are triggered by the persistent unusual behavior of a small number of legitimate users. Why are they false, and how do they differ from true threats? The reason is surprisingly simple. The behavior of a small number of network hosts (for example, users with specific technical roles) is often drastically different from the rest, which is why they trigger lots of false positives.
For example, they might use rare applications for performing software updates, make requests to unusual APIs, call unknown numbers in the middle of the night, etc. Yet this behavior is regular, so it is possible to whitelist it and thus eliminate the false positives. The issue is that such lists are highly specific, and it is difficult to create them before deploying detection rules.
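As a rough illustration of the whitelisting idea, here is a minimal Python sketch. The `Alert` type, the host names, and the rule names are all hypothetical; a real SIEM or EDR would express the same exclusion in its own rule language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    host: str
    rule: str


# Hypothetical whitelist: (host, rule) pairs that analysts have reviewed
# and accepted as regular, legitimate behavior for that specific host.
WHITELIST = {
    ("build-server-01", "rare_updater_process"),
    ("admin-ws-07", "unusual_api_request"),
}


def filter_alerts(alerts):
    """Drop alerts whose (host, rule) pair is whitelisted."""
    return [a for a in alerts if (a.host, a.rule) not in WHITELIST]


alerts = [
    Alert("build-server-01", "rare_updater_process"),
    Alert("laptop-113", "rare_updater_process"),
    Alert("admin-ws-07", "unusual_api_request"),
]
remaining = filter_alerts(alerts)
print(remaining)
```

Keying the list on the host and rule together, rather than on the rule alone, keeps the exclusion narrow: the same rule still fires for any host that has not been reviewed.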
Unstructured false positives are random short-term events that can be found throughout the network, unlike the previous kind, which is limited to a few hosts. Such false positives are typically proportional to the traffic volume. The behaviors that trigger them are widespread, evenly distributed, and common (like web browsing).
These events can create a lot of false alarms unless they are captured by rules that exclude such white noise. Machine-learning models can also help automate this process so that the security staff is free to perform more intelligent work.
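One simple way to tell the two kinds apart is to check how concentrated a rule's confirmed false positives are across hosts. The sketch below is only a heuristic under assumed parameters: the function name, the `top_n` cutoff, and the 80% concentration threshold are illustrative choices, not values taken from the research above.

```python
from collections import Counter


def classify_false_positives(hosts, top_n=3, concentration=0.8):
    """Classify one rule's false positives by host concentration.

    hosts: one host name per confirmed false positive for the rule.
    Returns "structured" when the top_n hosts account for at least
    `concentration` of all false positives, else "unstructured".
    """
    if not hosts:
        return "none"
    counts = Counter(hosts)
    top_share = sum(c for _, c in counts.most_common(top_n)) / len(hosts)
    return "structured" if top_share >= concentration else "unstructured"


# A few hosts with special roles produce nearly all of the noise.
few_hosts = ["admin-ws-07"] * 12 + ["build-01"] * 6 + ["laptop-1", "laptop-2"]
# The noise is spread evenly across many hosts.
many_hosts = [f"host-{i}" for i in range(20)]

print(classify_false_positives(few_hosts))
print(classify_false_positives(many_hosts))
```

Note that a rule that has only ever fired on three or fewer hosts is always reported as structured here, so the result is more meaningful once a reasonable number of alerts has been collected.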
Distinguishing between different kinds of false positives is helpful for eliminating the cause and thus reducing their inflow. Once it's done, SOC experts need to make sure that they deploy the right rules and have a proper incident response plan.
Confusion often arises at the stage of deploying rules in a SIEM or EDR environment. Some organizations assume that the more rules they have, the better their security posture. However, this approach often ends with false positives piling up and requiring a lot of money and human resources to deal with.
On the other hand, not having enough rules also leads to a lack of the necessary security. Consequently, there is a need to strike the right balance when it comes to rules deployment while making sure that they are being properly modified to fit the local context.
For example, if SOC engineers deploy rules that are based on wildcards, especially ones that match common words in strings, this may lead to an increased number of false positives. Let's say there is application code that contains words such as “from”, “where”, and “select”, and there is a rule designed to find strings with these words in order to detect SQL injections.
Since the code is built on these common commands, false alerts will fire quite often. What's more, the rule might be unnecessary if another rule is already deployed for such cases, and the two rules may even conflict with each other, resulting in confusing alerts.
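The difference between a broad keyword rule and a narrower one can be sketched in Python. Both patterns below are illustrative assumptions written for this example; the narrower one covers only a few classic injection shapes and is not a complete SQL injection detector.

```python
import re

# Broad rule: fires on any common SQL keyword anywhere in the string.
BROAD = re.compile(r"select|from|where", re.IGNORECASE)

# Narrower illustrative rule: looks for a few typical injection shapes
# instead of keywords that also appear in legitimate code and plain text.
NARROW = re.compile(
    r"'\s*or\s+'?\d+'?\s*=\s*'?\d+"   # e.g. ' OR '1'='1
    r"|union\s+select"                # e.g. UNION SELECT ...
    r"|;\s*drop\s+table",             # e.g. ; DROP TABLE ...
    re.IGNORECASE,
)


def broad_rule(text):
    return BROAD.search(text) is not None


def narrow_rule(text):
    return NARROW.search(text) is not None


samples = [
    "SELECT name FROM users WHERE id = 42",   # legitimate application query
    "Greetings from the support team",        # ordinary text
    "id=1' OR '1'='1",                        # injection attempt
    "1 UNION SELECT password FROM users",     # injection attempt
]
for s in samples:
    print(f"{s!r}: broad={broad_rule(s)} narrow={narrow_rule(s)}")
```

In this sample the broad rule fires on the two legitimate strings and, because it needs one of its three keywords, misses the first injection attempt entirely. The narrow rule flags only the two injection attempts.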
So continuous auditing, customization, and updating of rules comes in handy. If the organization struggles to keep up with writing new rules on a regular basis, it might use SOC Prime's Detection as Code platform, where seasoned specialists constantly supply the highest-quality detection rules.
The platform also has a Quick Hunt module that allows instant checks for the newest threats. These checks do not require extensive expertise and can be run even by a beginner-level SOC employee.
The modern threat landscape leaves no time for rest, which is why security departments face constant pressure to keep up. According to a commonly cited estimate, a hacker attack occurs every 39 seconds on average. A high influx of false-positive alerts only makes things worse. As a result, SOC engineers not only lack time to deal with new threats but also have to waste time on cases that are effectively pointless.
To get a more accurate picture of what's happening on the defense line, security experts may use thresholds and prioritization (also known as triage). For example, a router in a VoIP call center may send a certain number of data packets to a physical port, which is normal, but if that number suddenly jumps unusually high, it may indicate a DoS attack.
An organization should set its own thresholds for packet volumes that are normal, volumes that fall within its risk tolerance, and volumes that need an instant reaction. Furthermore, it should prioritize alerts by type and set up incident response rules for each type. So, in a call center, router issues could be marked as “infrastructure”, and voice transfer through VoIP could be marked as “end users' services”.
As a result, each alert type should have its own response plan. For instance, a SOC team needs to make sure that both routers and user accounts are protected from DoS attacks, yet the alert thresholds for them would be entirely different, both in numbers and in nature.
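The per-type thresholds described above can be sketched as a small lookup table. The category names follow the call center example, while the packet rates and the response labels are made-up placeholders; every organization would need to derive its own numbers from its normal traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    tolerated: int  # packets/second at which traffic is elevated but acceptable
    critical: int   # packets/second at which an instant reaction is needed


# Hypothetical values for illustration only.
THRESHOLDS = {
    "infrastructure": Thresholds(tolerated=50_000, critical=200_000),
    "end_user_services": Thresholds(tolerated=2_000, critical=10_000),
}


def triage(category, packets_per_second):
    """Map a packet rate to a response level for the given alert type."""
    t = THRESHOLDS[category]
    if packets_per_second >= t.critical:
        return "respond_now"
    if packets_per_second >= t.tolerated:
        return "monitor"
    return "normal"


# The same packet rate is handled differently depending on the alert type.
print(triage("infrastructure", 5_000))
print(triage("end_user_services", 5_000))
print(triage("infrastructure", 250_000))
```

Keeping the thresholds in data rather than in the rule logic makes it easier to tune each alert type separately as the environment changes.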
An increased number of false positives might also occur due to the organizational silos between technologies and teams. If the team is divided and doesn't share the knowledge between different levels of SOC, there might be a duplication of efforts and conflicting results that only grow the number of false alerts that are hard to deal with.
An efficient security operations center often consists of a stack of technological solutions that need to complement each other and work together as one. Various levels of SOC experts have different experiences and areas of responsibility and it is critical for them to understand each other and maintain a continuous feedback loop for creating better outcomes.
For quick results, they might apply automated solutions that smooth out the performance of security platforms by adding quality rules translated from another format. For example, there's Uncoder.IO, a free online content translation engine that instantly converts Sigma rules into the SIEM or EDR format that you might need. In the longer term, however, members of the SOC team should communicate seamlessly to ensure they are on the same page.
Teams should be encouraged to share knowledge with each other to improve the ways of reducing false positives in the future. For example, they could record instances of particular false alerts in documentation or special repositories, or discuss them in meetings.
If group discussions don't prove efficient enough and there's no time to fill out the documentation, the feedback process can be automated within existing workflows, which makes it easier and clearer how to improve performance over time.
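As one possible shape for such automated feedback, the sketch below assumes that analysts already mark each closed alert as a false positive or not, and it uses those verdicts to list the rules that most need tuning. The function name, the rule names, and the cutoffs are hypothetical.

```python
from collections import Counter


def rules_to_review(verdicts, max_rate=0.5, min_alerts=4):
    """Return rules whose false-positive rate exceeds max_rate.

    verdicts: (rule_name, is_false_positive) pairs recorded when analysts
    close alerts. Rules with fewer than min_alerts verdicts are skipped
    because their rate is not yet meaningful.
    """
    totals = Counter(rule for rule, _ in verdicts)
    false_hits = Counter(rule for rule, is_fp in verdicts if is_fp)
    return sorted(
        rule
        for rule, total in totals.items()
        if total >= min_alerts and false_hits[rule] / total > max_rate
    )


verdicts = (
    [("sqli_keywords", True)] * 5
    + [("sqli_keywords", False)]
    + [("brute_force", False)] * 4
    + [("rare_process", True)] * 2
)
flagged = rules_to_review(verdicts)
print(flagged)
```

A report like this could be generated on a schedule and shared across SOC levels, so that rule tuning is driven by recorded verdicts rather than by memory.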
SOC teams have to move fast and never stop, and they can only do so if they waste as little time as possible on things that don't matter. Ultimately, reducing false positives leads to better use of talent, time, and material resources.