Embracing randomness to detect threats through entropy

By Bhabesh Raj Rai, Security Research

Contents

What is Entropy?

The calculation

Use cases using the entropy method

Layer your defenses to detect evasion and blindspots

TL;DR

Adversaries are constantly shifting tactics and uncovering new ways to attack businesses. One way analysts can defend against the continuously changing threat landscape is by layering defenses to help eliminate unknown blind spots.

Here we explain how Logpoint analysts use randomness calculations through the entropy plugin to create another layer of defense against attackers. ‘Entropy’ is a measure of randomness in a system and is used to quantify uncertainty in a random variable. We also give an example: malware uses a technique called a domain generation algorithm (DGA) to generate a large set of domain names for C2 communication, which makes it difficult for defenders to block the domains. Defenders can, however, exploit the randomness of DGA output to create detections using entropy as a parameter.

Install the new plugin on your Logpoint SIEM to begin using the entropy process command in your queries.

Using randomness calculations, Logpoint analysts can create another layer of defense against attackers. With the release of the entropy plugin, they can deploy use cases that are solved through randomness calculations.

What is Entropy?

In essence, entropy is a measure of the randomness of a system. More specifically, we are using entropy to quantify uncertainty in a random variable.

You may have heard of malware using a domain generation algorithm (DGA) to periodically generate a large set of domain names for C2 communication. This process masks the real C2 domains, which makes denylisting individual domains futile. Defenders can exploit this inherent randomness of DGA output to create detections using entropy as a parameter.

The calculation

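We use Claude Shannon’s famous Shannon entropy formula:

H(X) = -Σ p(x) log2 p(x)

Here p(x) is the relative frequency of each distinct character x in the string being scored, and the sum runs over all distinct characters.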

In layman’s terms, the Shannon entropy formula assigns higher numbers to rarer events and lower numbers to common events. A string whose characters rarely repeat therefore scores higher than one dominated by a few repeated characters. We will exploit this behavior of Shannon entropy to detect randomized data. For example, the entropy of login.microsoftonline.com is 3.47 while that of eywonbdkjgmvsstgkblztpkfxhi.ru is 4.48.
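To make the calculation concrete, here is a minimal Python sketch of per-character Shannon entropy. It is not the plugin's implementation, which is not shown here. It assumes the entropy is computed over the characters of the string using a base-2 logarithm, and the function name `shannon_entropy` is our own. Under that assumption it yields values close to the two figures quoted above.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)  # occurrences of each distinct character
    total = len(text)
    # Sum -p * log2(p) over the relative frequency p of each character.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for domain in ("login.microsoftonline.com", "eywonbdkjgmvsstgkblztpkfxhi.ru"):
        print(f"{domain}: {shannon_entropy(domain):.2f}")
```

Note that the maximum possible score grows with the length of the string, so very short values can never score high no matter how random they look.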

Analysts can use entropy as an alternative method of detecting known indicators. Some examples of use cases that can also be solved using entropy are:

detection of domain generation algorithm (DGA) domains [T1637.001]
detection of DNS tunneling domains [T1071.004]
detection of random process names
detection of obfuscated PowerShell script executions [T1027]

Using entropy for the above use cases gives analysts an additional detection method and adds redundancy to their defense. One detection method may be signature-based while another is backed by statistical analysis via entropy.

Let’s dive into some of the use cases in detail.

Use cases using the entropy method

First up is detection of domains generated by domain generation algorithms (DGAs) [T1637.001]. As mentioned earlier, various malware families have used a DGA to periodically generate a large number of domain names for C2 communication. Malware may rotate through up to 1000 domains per day using a DGA. Malware using the same DGA settings will contact the same domain at a given point in time.

Since the domains are randomly generated by a DGA, their entropy is abnormally high compared to most of the normal traffic. Use historical data to identify a normal baseline value for your environment beforehand. A lower threshold value lowers your false negative rate but increases false positives.

Using entropy to detect DGA domains

We stress the “compared to most” part because some benign traffic also has high entropy. Allowlisting is necessary to remove this benign traffic. Setting the entropy threshold is up to the analysts and depends on several factors, such as how well they have allowlisted the normal traffic.
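The baseline, threshold, and allowlist steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the Logpoint query itself. The threshold rule (mean plus `k` standard deviations of historical entropy), the suffix-based allowlist match, the helper names, and every domain under example.com are hypothetical choices of ours.

```python
import math
import statistics
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def baseline_threshold(historical_domains, k=3.0):
    """Assumed rule: mean + k * standard deviation of historical entropy."""
    values = [shannon_entropy(d) for d in historical_domains]
    return statistics.mean(values) + k * statistics.pstdev(values)


def is_allowlisted(domain, allowlist):
    """True if `domain` equals an allowlist entry or is a subdomain of one."""
    domain = domain.lower().rstrip(".")
    return any(domain == entry or domain.endswith("." + entry) for entry in allowlist)


def flag_domains(domains, threshold, allowlist=()):
    """Return non-allowlisted domains whose entropy exceeds the threshold."""
    return [
        d for d in domains
        if not is_allowlisted(d, allowlist) and shannon_entropy(d) > threshold
    ]


# Hypothetical historical traffic used to derive the baseline.
historical = [
    "login.microsoftonline.com",
    "www.google.com",
    "github.com",
    "update.microsoft.com",
    "mail.example.org",
]
threshold = baseline_threshold(historical)

observed = [
    "github.com",
    "x7k2q9vbn4mzt1p8.cdn.example.com",  # benign but random-looking (hypothetical)
    "eywonbdkjgmvsstgkblztpkfxhi.ru",
]

if __name__ == "__main__":
    print(f"threshold: {threshold:.2f}")
    print(flag_domains(observed, threshold, allowlist={"cdn.example.com"}))
```

Lowering `k` lowers the threshold, which catches more DGA domains at the cost of more benign hits that then have to be allowlisted.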

The same logic can be used to detect DNS tunneling [T1071.004] for C2 communication. Adversaries send a large number of DNS requests containing the exfiltrated data. The activity is difficult to detect because it blends well into legitimate traffic and because the compromised hosts do not connect directly to the C2 server.
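As a rough sketch of how this could look, the Python below scores only the subdomain part of each query and counts high-entropy queries per parent domain, since tunneling produces many such requests. Several parts are simplifying assumptions of ours: the naive "last two labels" split (which mishandles suffixes such as co.uk), the threshold values, the function names, and the made-up hex-like query names.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def split_query(qname):
    """Split a query name into (subdomain text, parent domain).

    Assumption: the parent domain is simply the last two labels.
    """
    labels = qname.lower().rstrip(".").split(".")
    return "".join(labels[:-2]), ".".join(labels[-2:])


def tunneling_suspects(queries, entropy_threshold=3.5, min_queries=3):
    """Return parent domains that received many high-entropy subdomain queries."""
    hits = Counter()
    for qname in queries:
        subdomain, parent = split_query(qname)
        if subdomain and shannon_entropy(subdomain) > entropy_threshold:
            hits[parent] += 1
    return {parent: n for parent, n in hits.items() if n >= min_queries}


# Hypothetical query log: three encoded-looking queries plus normal traffic.
queries = [
    "0123456789abcdef0123456789abcdef.tunnel.example",
    "f0e1d2c3b4a5968778695a4b3c2d1e0f.tunnel.example",
    "00112233445566778899aabbccddeeff.tunnel.example",
    "www.example.com",
    "mail.example.com",
    "login.microsoftonline.com",
]

if __name__ == "__main__":
    print(tunneling_suspects(queries))
```

Requiring a minimum number of hits per parent domain is one way to reduce noise from a single random-looking hostname.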

Using entropy to detect DNS tunneling activity of dnscat2

Apart from DNS traffic, you can use entropy to detect the randomness of HTTP host values.

Using entropy to detect highly random HTTP host values

Lastly, you can use entropy to calculate the randomness of the contents of executed PowerShell scripts. To get this data source, you first have to enable PowerShell script block logging, which records the content of all script blocks that PowerShell processes.

Naturally, obfuscated scripts, such as those generated by amsi.fail to disable AMSI [T1562.001] or by Invoke-Obfuscation, have high entropy that can seem abnormal compared to other scripts. Even unobfuscated but long scripts like the widely used PowerView can be detected by this method.

Using entropy to detect obfuscated PowerShell scripts

However, just like in the above use cases, analysts need to allowlist legitimate scripts that have higher entropy than the threshold value in use.
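A minimal sketch of this idea in Python follows, assuming script block logging events have been exported as dictionaries. The field names `ScriptBlockText` and `host`, the choice to allowlist by SHA-256 hash of the script text, the threshold of 5.0, and the function names are all assumptions made for illustration. They are not taken from the Logpoint plugin or its alert rules.

```python
import hashlib
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def script_hash(text: str) -> str:
    """SHA-256 of the script text, used here as the allowlist key (assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def suspicious_script_blocks(events, threshold, allowlisted_hashes=frozenset()):
    """Return (host, entropy) for non-allowlisted scripts above the threshold."""
    hits = []
    for event in events:
        text = event.get("ScriptBlockText", "")
        if script_hash(text) in allowlisted_hashes:
            continue  # known-good script with naturally high entropy
        score = shannon_entropy(text)
        if score > threshold:
            hits.append((event.get("host"), round(score, 2)))
    return hits


# Synthetic stand-ins: a plain command and two strings that use many
# distinct characters, as heavily obfuscated scripts tend to do.
noisy = "".join(chr(c) for c in range(33, 127))
events = [
    {"host": "ws01", "ScriptBlockText": "Get-Process"},
    {"host": "ws02", "ScriptBlockText": noisy},
    {"host": "ws03", "ScriptBlockText": noisy[::-1]},  # treated as legitimate
]
allowlist = {script_hash(noisy[::-1])}

if __name__ == "__main__":
    print(suspicious_script_blocks(events, threshold=5.0, allowlisted_hashes=allowlist))
```

Hash-based allowlisting is strict: any change to a legitimate script produces a new hash, so the allowlist has to be maintained as scripts are updated.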

Layer your defenses to detect evasion and blindspots

There are several other methods to enable the use cases described above, but entropy provides a statistical method which, if done right, is harder for adversaries to bypass. We recommend that analysts have overlapping defenses in place to anticipate evasion and bypass attempts.

As an example, consider an enterprise blue team that has deployed our alert, Discovery via PowerSploit Recon Module Detected, from our Alert Rules to detect execution of the PowerView tool. That detection rule uses a static list of PowerView cmdlet names to monitor hits. The blue team knows that adversaries can evade this detection if they change the cmdlet names in the source code. So, they deploy another detection rule using the entropy method described above to work as a second layer of defense.
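The layering in this example can be sketched in Python as two independent checks on the same script text. The cmdlet names below are a small illustrative subset of PowerView function names, not the list used by the Logpoint alert. The threshold of 5.0, the function names, and the sample scripts are hypothetical.

```python
import math
from collections import Counter

# Illustrative subset only; the real alert rule's list is not reproduced here.
POWERVIEW_CMDLET_NAMES = ("Get-NetUser", "Get-NetComputer", "Invoke-ShareFinder")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def layered_detect(script_text, entropy_threshold=5.0):
    """Return the names of the detection layers that fired for this script."""
    reasons = []
    lowered = script_text.lower()
    # Layer 1: static match on known cmdlet names (evadable by renaming).
    if any(name.lower() in lowered for name in POWERVIEW_CMDLET_NAMES):
        reasons.append("cmdlet_name")
    # Layer 2: statistical check that does not depend on any name.
    if shannon_entropy(script_text) > entropy_threshold:
        reasons.append("entropy")
    return reasons


# Synthetic stand-in for a renamed script that uses many distinct characters.
renamed_tool = "".join(chr(c) for c in range(33, 127))

if __name__ == "__main__":
    print(layered_detect("Get-NetUser -Domain corp.local"))
    print(layered_detect(renamed_tool))
    print(layered_detect("Get-Date"))
```

Keeping the two checks separate means a script that evades one layer can still be caught by the other, and the returned reasons tell the analyst which layer fired.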

Analysts should also keep in mind that their detections can have unknown blindspots, which makes layering different detection methods a very useful practice.

For help with implementation contact Global Services.