Article

Advanced Persistent Threat Group Correlation Analysis via Attack Behavior Patterns and Rough Sets

School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(6), 1106; https://doi.org/10.3390/electronics13061106
Submission received: 25 January 2024 / Revised: 6 March 2024 / Accepted: 14 March 2024 / Published: 18 March 2024
(This article belongs to the Section Computer Science & Engineering)

Abstract

In recent years, advanced persistent threat (APT) attacks have become a significant network security threat due to their concealment and persistence. Correlation analysis of APT groups is vital for understanding the global network security landscape and accurately attributing threats. Current studies on threat attribution rely on experts or advanced technology to identify evidence linking attack incidents to known APT groups. However, there is a lack of research focused on automatically discovering potential correlations between APT groups. This paper proposes a method using attack behavior patterns and rough set theory to quantify APT group relevance. It extracts two types of features from threat intelligence: APT attack objects and behavior features. To address the issues of inconsistency and limitations in threat intelligence, this method uses rough set theory to model APT group behavior and designs a link prediction method to infer correlations among APT groups. Experimental results on publicly available APT analysis reports show a correlation precision of 90.90%. The similarity coefficient accurately reflects the correlation strength, validating the method’s efficacy and accuracy.

1. Introduction

APT attacks refer to persistent and targeted intrusions by an advanced group against specific targets. Unlike generic hacker attacks, APT attacks aim to compromise infrastructure and steal sensitive intelligence. They exhibit a strong national strategic intention, which seriously threatens a nation’s cyberspace security [1]. In recent years, APT attacks have demonstrated a clear trend towards cyberwarfare. Some APT groups have started collaborating and sharing intelligence to pursue common strategic goals. In 2010, the “Stuxnet” incident [2] set a precedent for nation-state APT attacks. U.S. and Israeli intelligence agencies planned and developed the attack [3]. Germany and France contributed supply chain intelligence, and Dutch agents facilitated payload delivery [4]. In December 2016, the U.S. Department of Homeland Security attributed attacks on the Democratic National Committee and U.S. election campaigns to coordinated efforts by APT28 and APT29, linked to Russian intelligence agencies [5]. In 2019, Symantec reported that the Waterbug (Turla) group had used APT34’s infrastructure in its attacks [6]. Understanding the correlations between APT groups can help to discover potential collaboration and information sharing between them and provide direction for APT attack discovery and traceability.
Existing research [7,8,9,10,11,12] mainly focuses on APT attack profiling, detection, and attribution. These studies typically discover and trace attack behaviors by analyzing the characteristics of specific APT groups. Few studies have focused on analyzing the correlations between APT groups. Some methods [13,14,15,16] explore correlations between attack behaviors to uncover associations between incidents. However, individual incident relevance is subject to change and does not fully represent APT group attack patterns. More accurate measurement requires assessing multiple attack characteristics comprehensively.
In industry, APT group correlation analysis is largely a manual process carried out by security analysts, heavily reliant on their expertise. Analysts establish correlations by examining attack resources like IP addresses, domains, vulnerabilities, or by comparing similarities in signatures and code strings. However, expert-based methods can be biased and may not handle the growing scale of attacks effectively. Analyzing APT group correlations has several challenges. Firstly, fragmented information makes effective organization difficult, leading to errors in correlation analyses. Secondly, knowledge-based methods relying on expert experience require constant knowledge base updates, consuming time and resources and causing delays in relation disclosures. Lastly, there is a lack of an effective calculation method to measure APT group correlations.
In this paper, we introduce an APT group correlation analysis method using attack behavior patterns and rough set theory. According to reference [17], attackers are defined based on key indicators such as tradecraft, infrastructure, malware, intention, and external information. This paper synthesizes fragmented information from multiple dimensions to create a sample set of APT groups. We extract dynamic and static behavior features from APT malware to gather information about tradecraft, infrastructure, and malware characteristics. We also gather APT security events to obtain information about intentions. To address the issue of inconsistent and inaccurate information in raw data, we employ rough set theory, a mathematical tool used to handle imprecise and incomplete information [18], to model APT group behavior patterns. To uncover potential correlations, we design a link prediction method, which is crucial for revealing hidden relations [19].
The main contributions of this paper are as follows:
(1) This paper presents an APT group knowledge model, generated semi-automatically from threat intelligence. It integrates the features of attack objects and attack behaviors for more comprehensive APT group profiling.
(2) This paper proposes an innovative method for APT group correlation analysis, leveraging rough set theory. To our knowledge, this is the first academic study automating the analysis of APT group correlations. The method dynamically creates APT group behavior patterns by approximating upper and lower bounds and considers fuzzy behavior to design a link prediction method for correlation inference. The correlation precision is up to 90.90%.
(3) This paper analyzes the development and evolution trends of APT attacks by observing the changes in correlations among different groups in a dataset covering the period from 2008 to 2022.
The remainder of the paper is structured as follows. Section 2 discusses related work on attack attribution and cyber security incident correlation analyses. Section 3 presents our APT group correlation analysis approach. In Section 4, we provide a detailed summary of our approach’s experimental results. Section 5 showcases the evolution patterns of APT group correlations through case studies. Finally, Section 6 concludes the paper and outlines future work and prospects.

2. Related Work

This section introduces related work on attack attribution and existing methods for cyber security incident correlation analysis. The limitations of these existing methods are then analyzed and discussed.

2.1. Attack Attribution

Attribution refers to the process of attributing cyber attack actions to a specific entity, attacker, or group. Researchers typically utilize various attack-related data to model attackers, such as malicious code, indicators of compromise (IOCs), and tactics, techniques, and procedures (TTPs). They then employ association rules or classification algorithms to identify attackers.
Malicious code serves as a crucial tool in APT attacks. Using malicious code to identify threat actors holds significant importance in attack tracing. Several studies have proposed feature dimensions for malicious code attribution from different perspectives. Son et al. [20] proposed nine factors for relation analysis of malware, considering perspectives such as attack propagation, malware, and attack sources. Haddadpajouh et al. [21] developed twelve views from four perspectives: opcode, bytecode, system calls, and titles. Parunak [22] defined malware behaviors as event sequences generated by a specific grammar and estimated the similarity between different ransomware based on the generated strings. Kida et al. [23] used the NATO phonetic alphabet as embedded fuzzy hashes to identify similarities. Liras et al. [24] extracted dynamic and static features from malware for identifying APT attacks. However, these features are tailored to specific APT groups and cannot cover all APT groups. Several studies have improved malware attribution models. Li et al. [25] proposed a model that integrates SMOTE and random forest algorithms for dealing with imbalanced data from different groups. Dib et al. [26] designed a deep learning model that integrates multi-layer features from both strings and images. Wang et al. [27] used semantic analysis to identify attack groups. Black et al. [28] proposed a contextual similarity technique that strengthens the results of function similarity. However, malicious code only exists in specific stages of the APT attack process, and a single malware instance cannot provide a comprehensive view of the entire attack.
IOCs serve as forensic evidence of potential intrusions of a host system or network. Different groups have their own unique IOC libraries. Zhao et al. [29] developed an automated IOC extraction method based on word embedding and a syntactic dependency analysis of threat texts for identifying IOC domains. However, the large number of IOCs are prone to frequent changes, with attackers often altering their IOCs within a short timeframe.
TTPs are the most important feature to distinguish APT groups as they reflect the behavior patterns of attackers. Compared to malicious code and indicators, patterns are more resistant to change. FireEye [30] used the existing knowledge of TTPs to cluster unknown groups using cosine similarity. Noor et al. [31] utilized latent semantic analysis methods to index threat intelligence into TTPs and trained machine learning classifiers for APT group identification. Kim et al. [32] extracted TTPs from sandbox reports and calculated their correlation with threat actors using vector similarity. They also introduced IOCs to refine the correlation results. While TTPs are valuable for attacker profiling, they cannot be directly derived from data and require expert judgment. This process consumes significant manpower and time, leading to delays in threat intelligence.
APT attacks typically involve multiple views and encompass various correlations, such as those between attack targets, attack resources, and attack behaviors. However, existing attack attribution methods often focus on describing the attacker from a singular dimension. This limited perspective hinders a comprehensive understanding of the attacker’s profile and can lead to deviations or even errors in correlation analysis results.

2.2. APT Group Correlation Analysis

Some studies have explored correlation analysis methods for cyber security incidents. In 2017, Perl [13] introduced multi-type attributes for attack correlation and suggested different similarity calculation methods based on attribute types. Rezapour et al. [14] proposed combining blacklist and victim similarity measures to assess attacker similarity. Karafili et al. [15] developed an automated argumentation-based reasoner using both technical and social evidence. Xu et al. [16] proposed a dependence probability algorithm to analyze and quantify the correlation between network attack traces. However, these methods heavily rely on pre-defined feature sets or association rules based on expert knowledge, which may not adapt well to the dynamic changes in the network environment.
The utilization of rough set theory in this study enables the definition of behavior patterns for APT groups, facilitating the dynamic expansion of new attack knowledge. Furthermore, this paper surpasses the limitations of inherent association rules by introducing a link prediction method to construct an APT group relation network.

3. Methodology

This section contains a description of an approach for modeling the behavior patterns of APT groups and a method for analyzing the correlation between groups. The model is structured into three principal components: APT knowledge representation generation, construction of behavior patterns of APT groups, and APT group correlation analysis. The model framework is shown in Figure 1.
The APT knowledge representation generation module collects security events and malware samples from available threat intelligence sources. This module extracts features related to attack objects and attack behaviors exhibited by APT groups from the collected samples and conducts knowledge representation to facilitate the comprehensive profiling of APT groups.
The module for the construction of the behavior patterns of APT groups utilizes rough set operators to establish the behavior pattern of APT groups. This module employs the concept of upper and lower approximations and constructs precise domains, fuzzy domains, and irrelevant domains to effectively describe the behavior patterns of APT groups.
The APT group correlation analysis module introduces a measurement method based on link prediction to quantify the similarity between different APT groups. This module enables the detection of potential hidden correlations and facilitates the construction of the APT group relation network.

3.1. Knowledge Representation of APT Groups

This paper combines attack objects and behavior features to profile APT groups, defining the knowledge representation of APT groups.
Definition 1. 
Knowledge representation of APT groups (Group) refers to the knowledge representation system used to describe the behavior pattern of APT groups, including the feature representation of an attack object, which is composed of security event feature sequences, and the feature representation of attack behavior, which is composed of malware feature sequences.
$$Group = Object \cup Behavior$$
Definition 2. 
Feature representation of attack objects (Object) refers to a knowledge representation system composed of a set of security event feature sequences. Each feature sequence consists of a set of security event attributes.
The feature representation of an attack object is defined as the following four-tuple [33]:
$$Object = \langle U_o, R_o, V_o, f_o \rangle, \quad R_o = C_o \cup D$$
$U_o = \{e_1, e_2, \ldots, e_M\}$ is the set of security event feature sequence objects, also called a universe.
$R_o$ is a finite nonempty set of attack object attributes, and the subsets $C_o$ and $D$ are called the condition attribute set and the decision attribute set, respectively. $C_o$ = {time, target, target_class, location, method, tool, carrier, vulnerability}. A detailed explanation of the different attributes follows:
  • target denotes the type of the targeted entity in the attack, including individuals, organizations, devices, operating systems, and so on.
  • target_class denotes the industry to which the target entity belongs.
  • location denotes the geographical location of the targeted entity in the attack.
  • method denotes the attack methods employed by the attacker against target entities, including backdoor attacks, phishing emails, exploit attacks, and so on.
  • tool denotes the specific tool, malware name, or family employed by the attacker against target entities.
  • carrier denotes the means utilized by attackers to propagate their attacks, including the vectors for malware delivery and vulnerability exploitation, and so on.
  • vulnerability denotes the CVE number of the vulnerability exploited by the attacker.
  • time denotes the disclosure time of the security event. In cases where a specific time is not available, the time stated in the technical report takes precedence.
Let $D = \{g_1, g_2, \ldots, g_K\}$ denote the set of APT groups that launched these security events.
$V_o = \bigcup_{a \in R_o} V_a$, where $V_a$ is the set of values of attribute $a$, and $f_o: R_o \to V_o$ is an information or description function.
Take the example of a reported security event in threat intelligence: “In 2017, researchers from ESET uncovered Gazer, a new malware tool used by the infamous threat actor Turla to spy on embassies and consulates in Europe”. In this case, the feature sequence of this security event can be represented as $e_1$ = [id, Organization, Diplomacy, Europe, Malware, Gazer, None, None, 2017, Turla].
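To make this representation concrete, the record below sketches how such a feature sequence could be stored programmatically. This is an illustrative sketch only: the class name, field names, and the use of a Python dataclass are our own assumptions, not part of the paper’s implementation.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class SecurityEvent:
    """One feature sequence e_i in U_o: the condition attributes C_o plus the decision attribute (group)."""
    target: str                   # type of targeted entity
    target_class: str             # industry of the target
    location: str                 # geographical location of the target
    method: str                   # attack method (e.g., phishing email, exploit)
    tool: Optional[str]           # malware name or family used
    carrier: Optional[str]        # propagation vector
    vulnerability: Optional[str]  # CVE identifier, if any
    time: int                     # disclosure year
    group: str                    # decision attribute D: the attributed APT group

# The Gazer/Turla event quoted above, encoded as one feature sequence e_1.
e1 = SecurityEvent(
    target="Organization", target_class="Diplomacy", location="Europe",
    method="Malware", tool="Gazer", carrier=None, vulnerability=None,
    time=2017, group="Turla",
)
print(asdict(e1))
```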
Definition 3. 
Feature representation of attack behavior (Behavior) refers to a knowledge representation system composed of a set of malware feature sequences. Each feature sequence consists of a set of malware attributes.
The feature representation of attack behavior is defined as the following four-tuple:
$$Behavior = \langle U_b, R_b, V_b, f_b \rangle, \quad R_b = C_b \cup D$$
$U_b = \{m_1, m_2, \ldots, m_j, \ldots, m_N\}$ is the set of malware feature sequence objects.
$R_b$ is a finite nonempty set of attack behavior attributes, and the subset $C_b$ is the condition attribute set. $C_b$ = {time, Static, Dynamic, Vulnerability} is the attribute set of malware. Thus,
  • Static denotes the static feature set of malware, including attributes such as the import and export functions of executable files, APIs, language types, resource items, etc. These features provide insights into the coding intentions and language preferences exhibited by malicious code.
  • Dynamic denotes the dynamic feature set of malware, including attributes such as commands, files, registry entries, processes, and dynamic link libraries observed during the execution of malware samples. These features aid in refining the understanding of the attack process.
  • Vulnerability denotes the vulnerability feature set of malware.
  • time denotes the date on which the malware was initially detected in VirusTotal (www.virustotal.com, accessed on 15 August 2023).
$V_b = \bigcup_{a \in R_b} V_a$, where $V_a$ is the set of values of attribute $a$. Each $value$ in $V_a$ is defined as follows:
$$value = \begin{cases} \{\, a.fre(x)/100 \mid x \in U_b \,\}, & a \in Dynamic \\ \{0, 1\}, & a \in Static \\ \{0, 1\}, & a \in Vulnerability \\ \{\, year(x) \mid x \in U_b \,\}, & a = time \end{cases}$$
Dynamic feature values are defined as the occurrence frequency divided by 100, enabling the differentiation of high-, medium-, and low-frequency features. The static and vulnerability features are encoded as either 0 or 1, depending on their presence. The time feature corresponds to the year when the malware was initially detected.
$f_b: R_b \to V_b$ is an information or description function.
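As a minimal sketch of the value encoding above, assuming the raw frequencies and attribute kinds have already been extracted from the analysis reports; the function name and example values are illustrative, not the authors’ code.

```python
def encode_value(attribute_kind: str, raw) -> float:
    """Map a raw observation to the value set V_a defined for its attribute kind."""
    if attribute_kind == "dynamic":
        return raw / 100.0                  # occurrence frequency divided by 100 (high/medium/low bands)
    if attribute_kind in ("static", "vulnerability"):
        return 1.0 if raw else 0.0          # presence/absence encoded as 1/0
    if attribute_kind == "time":
        return float(raw)                   # year of first detection in VirusTotal
    raise ValueError(f"unknown attribute kind: {attribute_kind}")

print(encode_value("dynamic", 250))         # 2.5
print(encode_value("static", True))         # 1.0
print(encode_value("vulnerability", 0))     # 0.0
print(encode_value("time", 2019))           # 2019.0
```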
The malware behavior features collected from VirusTotal often contain significant redundancy. To address this issue, this paper utilizes a feature selection algorithm based on mutual information [34] to reduce the size of the condition attribute set. Mutual information provides a measure of the dependency between condition attributes and decision attributes. For any condition attribute $c_i \in C_b$ and decision attribute set $D$, the mutual information of $c_i$ and $D$ is defined as
$$I(c_i, D) = \sum_{c \in c_i} \sum_{g_j \in D} p(c, g_j) \log \frac{p(c, g_j)}{p(c)\,p(g_j)}$$
Here, $c$ denotes an attribute value of $c_i$, $g_j$ denotes an APT group category in $D$, and $p(c, g_j)$ represents the joint probability distribution of condition attribute $c_i$ and the decision attribute set $D$, while $p(c)$ and $p(g_j)$ denote the marginal probability distributions of condition attribute $c_i$ and the decision attribute set $D$, respectively. When $I(c_i, D) = 0$, $c_i$ and $D$ are independent of each other. When the mutual information between two random variables is large, the two variables are strongly dependent. The mutual information values between each condition attribute $c_i$ and the decision attribute set $D$ are calculated and sorted. The top $\kappa$ condition attributes with the highest mutual information values are selected to form the condition attribute subset, thus achieving a reduction in the condition attribute set size. The effectiveness of attribute reduction is examined in Section 4.2.
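One way to realize this top-κ selection is scikit-learn’s mutual-information scorer. The snippet below is a sketch under the assumption that the behavior features have already been encoded into a numeric matrix X with group labels y; the placeholder data and variable names are ours.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((500, 300))         # (samples, encoded behavior features) -- placeholder data
y = rng.integers(0, 10, size=500)  # APT group label per sample (decision attribute D)

kappa = 172                        # the paper retains 172 attributes after reduction
selector = SelectKBest(score_func=mutual_info_classif, k=kappa)
X_reduced = selector.fit_transform(X, y)

# Indices of the top-kappa condition attributes by mutual information with D.
top_idx = np.argsort(selector.scores_)[::-1][:kappa]
print(X_reduced.shape, top_idx[:5])
```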

3.2. Behavior Pattern Construction of APT Groups Based on Rough Sets

In threat intelligence, experts typically attribute APT security events and malware based on their experience, resulting in inaccuracies and inconsistencies in the raw data. APT attacks involve various techniques and tactics, and it is thus hard to create an accurate formula to describe APT group behavior. This paper takes inspiration from Loia and Orciuoli’s work [35] and uses rough set theory to define APT group behavior patterns. One of the basic ideas of rough set theory is to discover knowledge by classifying objects through equivalence relations and approximating the target concept [33]. The equivalence relation is an indiscernible relation, which can be defined as follows.
Definition 4. 
Indiscernible relation. Given a subset of attributes $A \subseteq R$, an indiscernible relation $ind(A)$ on the universe $U$ can be defined as follows:
$$ind(A) = \{(x, y) \in U^2 \mid \forall a \in A,\; a(x) = a(y)\}$$
The equivalence class of an object $x$ is denoted by $[x]_{ind(A)}$, or simply $[x]_A$ and $[x]$ if no confusion arises. The pair $(U, ind(A))$ is called an approximation space.
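A minimal sketch of the indiscernible relation: two objects fall into the same equivalence class when they agree on every attribute in A. The toy universe and attribute names below are illustrative assumptions, not data from the paper.

```python
from collections import defaultdict

def equivalence_classes(universe: dict, A: list) -> list:
    """Partition `universe` (object id -> attribute dict) into equivalence classes [x]_A under ind(A)."""
    classes = defaultdict(list)
    for obj_id, obj in universe.items():
        key = tuple(obj[a] for a in A)   # objects sharing this key are indiscernible w.r.t. A
        classes[key].append(obj_id)
    return list(classes.values())

U = {
    "e1": {"method": "phishing", "location": "Europe"},
    "e2": {"method": "phishing", "location": "Europe"},
    "e3": {"method": "exploit",  "location": "Asia"},
}
print(equivalence_classes(U, ["method", "location"]))   # [['e1', 'e2'], ['e3']]
```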
Further, we define the upper and lower approximation sets of the sample set of group $g_k$:
$$\overline{apr}(S(g_k)) = \{x \in U \mid [x] \cap S(g_k) \neq \emptyset\}$$
$$\underline{apr}(S(g_k)) = \{x \in U \mid [x] \subseteq S(g_k)\}$$
$S(g_k) = \{x \in U \mid D(x) = g_k\}$ denotes the sample set of group $g_k$. $\underline{apr}(S(g_k))$ is the lower approximation of $S(g_k)$, which encompasses the unique feature sequences specific to APT group $g_k$. $\overline{apr}(S(g_k))$ is the upper approximation of $S(g_k)$, which encompasses all feature sequences that can be associated with APT group $g_k$. When $\overline{apr}(S(g_k)) \neq \underline{apr}(S(g_k))$, $S(g_k)$ is called a rough set.
The rough membership [36] of each feature sequence with respect to the sample set $S(g_k)$ is calculated as shown in Equation (9). The universe is then divided into three distinct regions for each APT group, forming the behavior patterns of the APT groups.
$$\mu(x) = \frac{|S(g_k) \cap [x]|}{|[x]|}, \quad x \in U$$
Definition 5. 
Behavior pattern of APT groups (Pattern) refers to the approximate description of the behavior of APT groups, which is determined by the rough membership of the feature sequences. It consists of three components: the precise domain, the fuzzy domain, and the irrelevant domain.
$$Pattern(g_k) = \langle Prec(g_k), Fuzz(g_k), Irr(g_k) \rangle$$
$$Prec(g_k) = \{x \in U \mid \mu(x) = 1\}$$
$$Fuzz(g_k) = \{x \in U \mid 0 < \mu(x) < 1\}$$
$$Irr(g_k) = \{x \in U \mid \mu(x) = 0\}$$
Prec denotes the precise domain, which consists of the unique feature sequences belonging to a specific APT group.
Fuzz represents the fuzzy domain, containing feature sequences associated with multiple APT groups. These sequences do not only represent a single APT group but also overlap with others. Their presence in the fuzzy domain indicates behavior similarities between APT groups.
Irr denotes the irrelevant domain, which consists of feature sequences that are entirely unrelated to the behavior pattern of group $g_k$.
Based on Equation (9) and Equations (11)–(13), the precise domain, fuzzy domain, and irrelevant domain are calculated as follows:
$$Prec(g_k) = \underline{apr}(S(g_k))$$
$$Fuzz(g_k) = \overline{apr}(S(g_k)) - \underline{apr}(S(g_k))$$
$$Irr(g_k) = U - \overline{apr}(S(g_k))$$
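Putting the rough membership and the three domains together, the pattern for one group can be computed directly from the equivalence classes. The sketch below uses a toy universe and illustrative attribute names; it is not the authors’ implementation.

```python
from collections import defaultdict

def behavior_pattern(universe: dict, labels: dict, group: str, attributes: list):
    """Return (Prec, Fuzz, Irr) for `group`, using indiscernibility over `attributes`."""
    # Equivalence classes [x] under ind(attributes).
    classes = defaultdict(set)
    for obj_id, obj in universe.items():
        classes[tuple(obj[a] for a in attributes)].add(obj_id)

    sample_set = {o for o, g in labels.items() if g == group}   # S(g_k)

    prec, fuzz, irr = set(), set(), set()
    for eq_class in classes.values():
        mu = len(sample_set & eq_class) / len(eq_class)          # rough membership of the class
        if mu == 1:
            prec |= eq_class    # lower approximation: sequences unique to g_k
        elif mu > 0:
            fuzz |= eq_class    # boundary region: sequences shared with other groups
        else:
            irr |= eq_class     # outside the upper approximation
    return prec, fuzz, irr

U = {
    "m1": {"api": "CreateRemoteThread", "carrier": "macro"},
    "m2": {"api": "CreateRemoteThread", "carrier": "macro"},
    "m3": {"api": "RegSetValue",        "carrier": "usb"},
}
labels = {"m1": "APT29", "m2": "CosmicDuke", "m3": "APT28"}
print(behavior_pattern(U, labels, "APT29", ["api", "carrier"]))
# m1 and m2 land in the fuzzy domain of APT29; m3 is irrelevant to it.
```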

3.3. Correlation Measurement Method Based on Link Prediction

The analysis of APT group correlations relies on an effective similarity calculation function. In reference [37], a general form of the rough set similarity measurement function is proposed, as shown in Equation (17):
$$F(A, B) = \frac{\psi_1(|A \cap B|)}{\psi_2(|A|, |B|, |A \cap B|)}$$
Considering the proportion of similar feature sequences within fuzzy domains of different groups, we use the Jaccard coefficient as a direct correlation measure, which is calculated as follows:
$$DIR(g_i, g_j) = \frac{|Fuzz(g_i) \cap Fuzz(g_j)|}{|Fuzz(g_i) \cup Fuzz(g_j)|}, \quad g_i, g_j \in D$$
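A minimal sketch of this direct correlation coefficient: the Jaccard coefficient over the fuzzy domains produced in Section 3.2 (the set contents and names below are illustrative).

```python
def direct_correlation(fuzz_i: set, fuzz_j: set) -> float:
    """Jaccard similarity between the fuzzy domains of two APT groups."""
    union = fuzz_i | fuzz_j
    return len(fuzz_i & fuzz_j) / len(union) if union else 0.0

print(direct_correlation({"s1", "s2", "s3"}, {"s2", "s3", "s4"}))   # 0.5
```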
Given the limitations of threat intelligence feeds, it is possible that some correlations remain undiscovered through direct behavior analysis. Link prediction addresses the challenge of predicting the existence of unknown connections based on uncertain structural information within a network [38], which can estimate the likelihood of a connection between two nodes that are not currently linked within the known network [39]. Consider an undirected network $G(V, E)$, where $V$ is the node set and $E$ is the edge set, with $n = |V|$ and $m = |E|$ representing the number of nodes and edges, respectively. The goal is to identify missing links or predict the emergence of future links from the set of non-existing ones.
Lü and Zhou [40] have summarized different node similarity indices in link prediction. Table 1 shows a partial list of these indices.
Considering that attackers who share common collaborators are more inclined to cooperate, this paper incorporates the network structure of APT groups, using the number of neighboring nodes and path weights to devise a new node similarity index. Figure 2 shows the connected sub-network graph, with a diameter of 3, obtained after expanding the pair of groups $(g_i, g_j)$ by one layer.
Let us define the set of all paths passing through nodes $g_i$, $g_k$, and $g_j$ as $path(g_i, g_k, g_j)$. Each path in this set consists of a series of adjacent node pairs, denoted as $p = \{\langle u, v \rangle, \ldots, \langle x, y \rangle, \langle y, z \rangle \mid u, v, x, y, z \in D\}$. The weight of the entire path is determined by dividing the minimum distance in the path by the number of hops. For instance, considering Figure 2, the weight of the path $(g_i \to g_5 \to g_j)$ is calculated as 0.3/2 = 0.15.
$$w(p) = \frac{\min(\{dist(u, v) \mid \langle u, v \rangle \in p\})}{|p|}$$
The common neighbor correlation coefficient $CN(g_i, g_j)$ between two group nodes is defined over all $g_k \in (\Gamma(g_i) \cap \Gamma(g_j))$. It is calculated by summing the weights of all paths passing through their common neighbor nodes, with the node pair $\langle g_i, g_j \rangle$ as the start and end points. Since the scales of the local networks where different nodes are located may vary, this method normalizes the result by dividing it by the total number of common neighbors. The calculation of $CN(g_i, g_j)$ is as follows:
$$CN(g_i, g_j) = \frac{\sum_{g_k \in (\Gamma(g_i) \cap \Gamma(g_j))} \sum_{p \in path(g_i, g_k, g_j)} w(p)}{|\Gamma(g_i) \cap \Gamma(g_j)|}$$
The similarity between APT groups is determined by $CN(g_i, g_j)$ and $DIR(g_i, g_j)$:
$$Sim(g_i, g_j) = \frac{CN(g_i, g_j) + DIR(g_i, g_j)}{2}$$
In order to better show the correlation between different APT groups, we define the APT group relation network as $N = (D, E)$, where $D$ represents the node set consisting of APT groups, and $E = \{Sim(g_i, g_j) \mid g_i, g_j \in D,\; Sim(g_i, g_j) > \gamma\}$ represents the edge set. The weight of each edge corresponds to the similarity between the related groups. This network provides insights into the level of correlation among different APT groups and reveals clusters of related groups.
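The path weight, common-neighbor coefficient, final similarity, and thresholded relation network can be sketched as follows. The toy graph, its edge weights, the example DIR values, and the restriction to the two-hop paths $g_i \to g_k \to g_j$ visible in Figure 2 are our own simplifying assumptions, not the authors’ implementation.

```python
from itertools import product

# Toy adjacency: edge distances between group nodes (illustrative values only).
graph = {
    ("APT29", "CosmicDuke"): 0.6,
    ("APT29", "APT28"): 0.3,
    ("CosmicDuke", "APT28"): 0.4,
    ("APT28", "Turla"): 0.2,
}

def dist(u, v):
    return graph.get((u, v)) or graph.get((v, u))

def neighbors(x):
    return {b if a == x else a for a, b in graph if x in (a, b)}

def path_weight(path):
    hops = list(zip(path, path[1:]))
    return min(dist(u, v) for u, v in hops) / len(hops)       # minimum distance / number of hops

def cn(gi, gj):
    common = neighbors(gi) & neighbors(gj)
    if not common:
        return 0.0
    total = sum(path_weight((gi, gk, gj)) for gk in common)   # two-hop paths via each common neighbor
    return total / len(common)                                # normalized by the number of common neighbors

def sim(gi, gj, direct):
    return (cn(gi, gj) + direct) / 2                          # average of CN and DIR

# Build the relation network: keep edges whose similarity exceeds gamma.
gamma = 0.2
groups = ["APT29", "CosmicDuke", "APT28", "Turla"]
DIR = {("APT29", "CosmicDuke"): 0.5}                          # direct correlations (illustrative)
edges = {}
for gi, gj in product(groups, groups):
    if gi < gj:
        s = sim(gi, gj, DIR.get((gi, gj), 0.0))
        if s > gamma:
            edges[(gi, gj)] = round(s, 3)
print(edges)   # {('APT29', 'CosmicDuke'): 0.325} with the toy values above
```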

4. Experiment

4.1. Data Gathering and Preprocessing

To validate this method, a large dataset of samples from 76 well-known APT groups was gathered, covering the period from 2008 to 2022. This dataset was composed of two sub-datasets: the security event sample set and the malware sample set.
Regarding security event samples, 432 of them were collected from the HACKMAGEDDON website (https://www.hackmageddon.com, accessed on 15 August 2023). Furthermore, we gathered 1164 additional security event samples from APT technical reports. These reports were obtained from an open-source technical report platform (https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections, accessed on 15 August 2023). Hence, a total of 1596 APT security event samples were considered. Based on the feature representation of the attack object, we manually extracted the security event samples and transformed them into structured data.
The malware sample set was composed of two sub-datasets that were later joined: a first dataset containing APT malware samples, and a second dataset containing “general” malware samples. The samples in these two datasets were analyzed from static, dynamic, and vulnerability points of view using VirusTotal. VirusTotal is a renowned and trusted virus scanner that analyzes nearly one million distinct files daily using over 50 different tools [41]. APT malware samples were collected from the APT technical reports mentioned above; this dataset consists of 3219 samples from 76 well-known APT groups. General malware samples are defined as those that, up to this point, are not known to belong to any APT group. A total of 12,510 general malware samples were crawled from VirusShare (https://www.virusshare.com/, accessed on 15 August 2023); of these, 8025 were classified as trojans, 2261 as adware, 1231 as worms, 246 as downloaders, 673 as viruses, 23 as hacktools, 15 as ransomware, and 25 as bankers. This classification uses the names given to each sample by VirusTotal.
Considering all the information obtained from static and dynamic analyses, we obtained a total of 162,750 features from the corresponding analysis reports. However, this includes some repetitive and redundant features. We performed a data preprocessing procedure as follows: (1) Feature normalization was applied to standardize the format and merge features that represent the same operation. (2) Features irrelevant to the attack were deleted. The preprocessed feature set forms the attribute set of attack behavior feature representation. It consists of 42,090 features distributed among 47 categories, which include 6 static feature categories, 40 dynamic feature categories, and vulnerability features.

4.2. Feature Verification and Analysis

To mitigate the curse of dimensionality caused by high-dimensional data spaces in the attribute set of attack behavior feature representation, we employed a feature selection algorithm based on mutual information to reduce the attribute set.
The goal of these experiments is to assess the selected features’ ability to represent various APT group behavior patterns. All instances from the malware sample set were used in these experiments. We designed four classification tasks to compare feature selection algorithms based on TF-IDF, high-frequency words, and mutual information in order to evaluate their attribution accuracy and correlation results. The classification tasks are shown in Table 2. The dataset was randomly divided into a training set and a testing set, with four-fifths of the dataset used for training and one-fifth for testing. We trained a K-nearest neighbor (KNN) model for sample attribution, experimenting with 3, 5, and 10 neighbors; the best cross-validation accuracy was obtained for k = 3. In the correlation experiment, we focused solely on the direct correlation coefficient to measure the similarity between APT groups, with the correlation threshold set to γ = 0.2. The results are shown in Table 3.
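A sketch of this attribution experiment with scikit-learn, assuming a reduced feature matrix X and group labels y as in Section 3.1; the placeholder data are random, and only the split ratio, the candidate neighbor counts, and k = 3 being reported as best follow the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, f1_score

rng = np.random.default_rng(42)
X = rng.random((2000, 172))          # reduced behavior features (placeholder data)
y = rng.integers(0, 76, size=2000)   # APT group labels (classification task D)

# 4/5 training, 1/5 testing, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Pick k among the values tried in the text (the paper reports k = 3 as best).
best_k = max((3, 5, 10), key=lambda k: cross_val_score(
    KNeighborsClassifier(n_neighbors=k), X_tr, y_tr, cv=5).mean())

clf = KNeighborsClassifier(n_neighbors=best_k).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
print(best_k,
      accuracy_score(y_te, y_pred),
      precision_score(y_te, y_pred, average="macro", zero_division=0),
      f1_score(y_te, y_pred, average="macro"))
```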
The method based on high-frequency words only considers the frequency of features and does not consider the correlation between features and decision attributes, resulting in poor results. In experiments A and B, the TF-IDF method achieves the best results, with an F1 value as high as 96.41%. This is because when the number of categories is small and the data are relatively balanced, the TF-IDF method can filter out features with a low frequency in the entire dataset by calculating the feature frequency and weight, which improves the generalization ability of the classifier. However, after adding the APT group classification, a sample imbalance problem occurs. The TF-IDF method may delete important features in some small sample categories, affecting the performance of the classifier and causing confusion between different APT groups, which reduces the correlation precision. In contrast, the method based on mutual information can more accurately select the features highly related to the decision attributes, so it can better distinguish different classes and achieves the best results in both the attribution and correlation experiments.
After performing attribute reduction based on mutual information, the attribute set retained a total of 172 features. The correlation among these attributes is displayed in Figure 3. Certain attributes exhibit strong correlations. This is often because they are frequently combined to facilitate the execution of an attack process. The following examples illustrate the reasons for the formation of strongly correlated attribute groups.
A strongly correlated group of attributes corresponds to an Office macro virus attack. The attack process is as follows: (1) The attacker executes winword.exe to launch Microsoft Word. (2) They invoke vbe.dll, vbeintl.dll, riched20.dll, x86_microsoft.vc.crt, and other libraries to execute the macro code. (3) Subsequently, dwmapi.dll and desktop.ini are utilized to modify the registry settings under hku\software\policies\microsoft\control panel\desktop, thereby hijacking desktop shortcuts. (4) Malicious payloads are downloaded to the %AppData% directory. (5) Upon clicking the shortcut, the malware initiates execution, employing rpcrtremote.dll for remote connection and communication. (6) Finally, registry entries such as hku\software\microsoft\office\word\resiliency\startupitems are modified to eliminate traces and clear the history of unsuccessful Office opening attempts.
The relations network between different APT groups and various types of general malware is shown in Figure 4. General malware exhibits fewer correlations with APT groups. The correlation between viruses and certain APT groups may be due to virus attacks being a common penetration method of APT groups; these groups have previously employed similar viruses. Since this paper primarily focuses on the correlation among APT groups, the subsequent experiments do not consider general malware samples.

4.3. Correlation Analysis

To assess the effectiveness of the correlation measurement method based on rough set theory, this study compiled a dataset consisting of 79 pairs of APT group relationships that were disclosed in public reports by reputable security firms. These relationships encompassed various aspects, including homogeneity, cooperation, and affiliation with the same intelligence organization, among others. However, it is important to clarify that we do not verify the veracity of these relationships: malicious actors may deliberately mislead attribution, and investigators may intentionally manipulate information, and this paper takes no position on such claims. The paper’s objective is to explore the key technologies used in APT group correlation analysis using publicly available data and to demonstrate their effectiveness against publicly disclosed findings.
Drawing upon the APT group correlation analysis concepts presented in previous relevant studies and various widely used inter-class distance measurement methods, we conducted three comparative experiments.
1. Average distance method: the Euclidean distance is used to assess the similarity between two samples. The distance between two groups is calculated as
$$D_{k,l}^2 = \frac{1}{n_k n_l} \sum_{x_i \in k,\, x_j \in l} d_{ij}^2$$
which averages the squared distances between all sample pairs drawn from the two groups. The similarity between the two groups is then determined as
$$similarity_{ave} = e^{-\frac{1}{2}D}$$
A minimal numerical sketch of this baseline is given after the list.
2. Center distance method: We utilize k-means to cluster the sample space into 100 clusters. We select the top three clusters in each group, compute the centroids of these clusters, and subsequently employ the average distance method to calculate the similarity between the groups.
3. Set similarity method: without applying rough sets, we calculate a similarity measure based on link prediction for the entire APT sample set.
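A minimal numerical sketch of baseline 1 (the average distance method) with placeholder data; taking D as the square root of $D_{k,l}^2$ before the exponential is our assumption.

```python
import numpy as np

def average_distance_similarity(samples_k: np.ndarray, samples_l: np.ndarray) -> float:
    """Average squared Euclidean distance between all cross-group sample pairs, mapped to a similarity."""
    diffs = samples_k[:, None, :] - samples_l[None, :, :]   # pairwise differences
    d2 = float(np.mean(np.sum(diffs ** 2, axis=-1)))        # D^2_{k,l}
    return float(np.exp(-0.5 * np.sqrt(d2)))                # similarity_ave = e^{-D/2}

rng = np.random.default_rng(1)
group_k = rng.normal(0.0, 1.0, size=(30, 172))   # samples of group k (placeholder)
group_l = rng.normal(0.5, 1.0, size=(40, 172))   # samples of group l (placeholder)
print(average_distance_similarity(group_k, group_l))
```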
It is important to emphasize that the dataset in this paper represents only a subset of the data within the realm of threat intelligence. As a result, this experiment evaluates precision as the primary performance metric. We set the threshold range for the correlation coefficient to be [0.1, 0.9], selecting the association scheme with the highest precision for each method. The results are shown in Table 4. Line graphs depicting the association precision and the number of relationships were plotted based on the threshold variations, as illustrated in Figure 5.
As can be seen from this figure, the method based on the center distance has the worst effect. This is because each APT group manifests multiple behavior patterns, and only considering the cluster center distance ignores many similar behaviors. The methods based on the average distance and the set similarity take into account every sample within the APT group dataset. However, the inclusion of unique or irrelevant samples can lead to increased variance, a lower organization similarity, and a higher error. The method based on rough sets only focuses on the fuzzy behavior patterns of APT groups, which results in a higher similarity between related groups and can ensure greater precision when identifying more relationships.
This paper compares the HPI and HDI in Table 1, as well as the correlation results without using the link prediction method. The correlation between the number of recognized relationships and precision based on different similarity indices is shown in Figure 6.
As shown in Figure 6, since the HPI and HDI only consider the number of common neighbors and do not take into account the correlation coefficient between groups, they tend to enhance the correlation between weakly related groups, which can increase false positives. Compared with the method that does not introduce the neighbor correlation coefficient, the new similarity coefficient proposed in this paper achieves higher precision for the same number of identified correlated group pairs, which proves that it produces fewer false correlations. Moreover, as the similarity threshold decreases, the number of correlated group pairs that can be obtained increases. Compared to the method that does not introduce the neighbor correlation coefficient (the green line in Figure 6), the precision of the proposed method (the red line in Figure 6) decreases less, which further proves that this method can eliminate some false predictions.
The APT group relation network generated from the behavior patterns of APT groups is shown in Figure 7. To achieve better clarity in the visualization, we utilize consistent colors to indicate APT groups belonging to the same country. It can be observed that the calculation results effectively quantify the degree of correlation between different APT groups. Related groups demonstrate a high intra-cluster similarity and a greater number of connections, while the inter-cluster similarity remains relatively low.

4.4. Evaluation Analysis

To evaluate the results of the evolution analysis, the APT group samples were categorized into four temporal stages based on their time attribute, namely (1) [2008–2011], (2) [2012–2015], (3) [2016–2018], and (4) [2019–2022]. The data distribution across these stages is presented in Table 5.
The experimental results are presented in Table 6. When comparing the correlation results based on single-dimensional and multi-dimensional features, we observe that the precision based on security event samples is 66.66%, the precision based on malware samples is 86.66%, and the fusion of features achieves a precision of 90.90%. This method successfully identifies 20 true positive group pairs, surpassing the sum of correlation results from the two individual dimensions. These findings indicate that multi-dimensional features effectively complement APT group characteristics from various perspectives.
Upon careful examination of the experimental results presented in Table 6, it is evident that certain group pairs are classified as false positives due to the lag in threat intelligence. For instance, in our experiment, the group pair <APT33, MUDDYWATER> was identified during the [2016–2018] period. However, during this timeframe, no security company reported any correlation between these groups. It was only in 2019 when a security blog disclosed that their infrastructures overlapped [42]. Manual analysis by security experts often requires a significant amount of time, making it challenging to identify correlations immediately. Further investigation revealed that out of the nine false positive group pairs identified during [2016–2018], six were subsequently disclosed to have potential correlations several years later, while only three pairs did not exhibit any apparent correlation. This demonstrates that our method can detect APT group correlations more promptly than expert analysis, even without an extensive knowledge base.

5. Case Study

To further illustrate the proposed method, we present a case study that exemplifies the process of APT group correlation and evolution analyses.

5.1. Correlation Analysis

Using APT29 and CosmicDuke as examples, we conducted a study of their attack processes and extracted a similar attack process to create a TTP attack graph, as depicted in Figure 8. Applying the method proposed in this paper to identify the common fuzzy domain between CosmicDuke and APT29, we constructed a TTP attack graph based on their feature sequences, as illustrated in Figure 9.
Figure 9 illustrates several key aspects of their attack behavior. First, the attackers utilize ntdll.dll to load shared modules and execute malicious payloads (T1129). They also employ software restriction policies to repeatedly execute malicious payloads for persistence (T1543.003). Furthermore, they employ masquerading techniques (T1036), sleep commands (T1497.003), and hidden files (T1564.001) to evade defenses. Finally, they gather information about system devices (T1120) and time (T1124). After comparing with Figure 8, we can conclude that Figure 9 is a subset of Figure 8. This demonstrates that the simplified attack behavior feature representation in this paper can still preserve the original behavior patterns and their correlations with other groups.

5.2. Temporal Evolution Analysis

The temporal evolution of the relation network reflects the potential changes in the behavior patterns of APT groups over time. Groups that consistently exhibit similar behavior patterns over an extended period are more closely related. Figure 10 illustrates the evolutionary graph of the APT group relation network, using a set of prominent APT groups as an example.
By analyzing the changes in the APT group relation network, several observations can be made. From 2008 to 2011, which represents the initial emergence and development of APT groups, there were isolated instances of APT-related malicious samples and attack incidents. However, no clear correlations were observed among the various groups during this period. Subsequently, from 2012 to 2015, specific APT groups began to form. There were exchanges and cooperation among APT groups sharing the same national background. Transitioning to the period from 2016 to 2018, the relationships between APT groups became more chaotic. Security companies conducted extensive analyses and disclosed numerous APT attacks, which resulted in the exposure of significant attack tools. The similarity of attack behaviors among APT groups sharply increased. Using coordinated attacks to launch cyber warfare has gradually become a trend in the game of great powers. Finally, from 2019 to 2022, the majority of APT groups had developed their own distinctive attack patterns and tendencies, and the relationships tended to become more stable.
Take the Russian and Iranian APT groups as an example. Russian APT groups appeared earlier; they have been operating since at least 2008. Being a branch of APT29, CosmicDuke exhibits a strong correlation with APT29, while both of them have weaker correlations with APT28. Iranian APT groups emerged after 2014, and INSIKT GROUP stated that there was overlap between APT33, Charming Kitten, and MUDDYWATER before 2019 [42]. In 2019, the NSA [43] reported that Russian APT groups had employed a substantial number of attack tools of Iranian origin. Concurrently, a significant volume of APT34’s tool codes were leaked, and a considerable number of MuddyWater’s tool codes were offered for sale [44]. The Russian APT group Turla also targeted some of APT34’s infrastructure and leveraged APT34’s infrastructure for their own attack activities [6]. This allowed the APT groups in these two countries to establish certain ties over a long period of time.

6. Conclusions

This paper presents a novel approach to define the behavior patterns of APT groups using rough set theory, thereby enabling the measurement and analysis of correlations between APT groups. By extracting the feature representation of attack objects and attack behaviors from threat intelligence, we construct a behavior pattern model for APT groups. Furthermore, we propose a similarity calculation method based on link prediction to quantify the correlations between different attack groups. The validity and precision of our method are verified by comparing the obtained correlations with those disclosed by security companies. Through case studies, we explore correlations from various perspectives and examine the temporal evolution of APT group correlations. In future research, we plan to address the following aspects: (1) Incorporating the malicious software behavior sequences as a component of the attack behavior features and exploring the contexts between different attack steps. This will capture the potential correlations that may be overlooked by discrete features. (2) Conducting an analysis of the semantic-level similarity of security events and aggregating similar events to provide a comprehensive understanding of the security landscape. (3) Designing a graph evolution algorithm to automate the analysis of the evolving APT group relation network. By addressing these future research directions, we aim to further enhance our understanding of APT group behaviors, correlations, and evolution, ultimately contributing to the advancement of cybersecurity knowledge and practices.

Author Contributions

Conceptualization, J.L. (Jingwen Li), J.L. (Jianyi Liu) and R.Z.; Funding acquisition, J.L. (Jianyi Liu) and R.Z.; Methodology, J.L. (Jingwen Li); Software, J.L. (Jingwen Li); Supervision, J.L. (Jianyi Liu) and R.Z.; Validation, J.L. (Jingwen Li); Writing—original draft, J.L. (Jingwen Li); Writing—review and editing, J.L. (Jingwen Li), J.L. (Jianyi Liu) and R.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China Key Program (NO. U21B2020, U1936216) and the Beijing University of Posts and Telecommunications Fundamental Research Funds for the Central Universities (2021XD-A11-1).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xiang, G.; Shi, C.; Zhang, Y. An APT event extraction method based on BERT-BiGRU-CRF for APT attack detection. Electronics 2023, 12, 3349. [Google Scholar] [CrossRef]
  2. Wikipedia. Stuxnet. [EB/OL]. Available online: https://en.wikipedia.org/wiki/Stuxnet#cite_note-57 (accessed on 28 November 2022).
  3. Kushner, D. The real story of stuxnet. IEEE Spectr. 2013, 50, 48–53. [Google Scholar] [CrossRef]
  4. Zetter, K.; Modderkolk, H. Revealed: How a Secret Dutch Mole Aided the U.S.-Israeli Stuxnet Cyberattack on Iran. [EB/OL]. Available online: https://news.yahoo.com/revealed-how-a-secret-dutch-mole-aided-the-us\israeli-stuxnet-cyber-attack-on-iran-160026018.html (accessed on 3 September 2019).
  5. NCCIC. Grizzly Steppe—Russian Malicious Cyber Activity. [EB/OL]. Available online: https://www.cisa.gov/uscert/sites/default/files/publications/JAR_16-20296A_GRIZZLY%20STEPPE-2016-1229.pdf (accessed on 29 December 2019).
  6. Symantec DeepSight Adversary Intelligence Team. Waterbug: Espionage Group Rolls out Brand-New Toolset in Attacks against Governments [EB/OL]. Available online: https://symantec-enterprise-blogs.security.com/blogs/threat-intelligence/waterbug-espionage-governments (accessed on 21 June 2019).
  7. Youn, J.; Kim, K.; Kang, D.; Lee, J.; Park, M.; Shin, D. Research on Cyber ISR Visualization Method Based on BGP Archive Data through Hacking Case Analysis of North Korean Cyber-Attack Groups. Electronics 2022, 11, 4142. [Google Scholar] [CrossRef]
  8. Alkhpor, H.K.; Alserhani, F.M. Collaborative Federated Learning-Based Model for Alert Correlation and Attack Scenario Recognition. Electronics 2023, 12, 4509. [Google Scholar] [CrossRef]
  9. Lajevardi, A.M.; Amini, M. Big knowledge-based semantic correlation for detecting slow and low-level advanced persistent threats. J. Big Data 2021, 8, 148. [Google Scholar] [CrossRef]
  10. Wei, R.; Cai, L.; Zhao, L.; Yu, A.; Meng, D. Deephunter: A graph neural network based approach for robust cyber threat hunting. In Proceedings of the 17th EAI International Conference on Security and Privacy in Communication Networks, Online, 6–9 September 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 3–24. [Google Scholar]
  11. Han, X.; Pasquier, T.; Bates, A.; Mickens, J.; Seltzer, M. Unicorn: Runtime Provenance-Based Detector for Advanced Persistent Threats. In Proceedings of the 27th Annual Network and Distributed System Security Symposium, San Diego, CA, USA, 23–26 February 2020; The Internet Society: Reston, VA, USA, 2020. [Google Scholar]
  12. Luh, R.; Janicke, H.; Schrittwieser, S. Aidis: Detecting and classifying anomalous behavior in ubiquitous kernel processes. Comput. Secur. 2019, 84, 120–147. [Google Scholar] [CrossRef]
  13. Kurtz, Z.; Perl, S. Measuring similarity between cyber security incident reports. In Proceedings of the 2017 Forum of Incident Response Security Teams (FIRST) Conference, San Juan, Puerto Rico, 11–16 June 2017. [Google Scholar]
  14. Rezapour, A.; Tzeng, W.G. A robust algorithm for predicting attacks using collaborative security logs. J. Inf. Sci. Eng. 2020, 36, 597–619. [Google Scholar]
  15. Karafili, E.; Wang, L.; Lupu, E.C. An argumentation-based reasoner to assist digital investigation and attribution of cyber-attacks. Forensic Sci. Int. Digit. Investig. 2020, 32, 300925. [Google Scholar] [CrossRef]
  16. Xu, J.; Yun, X.; Zhang, Y.; Sang, Y.; Cheng, Z. Networktrace: Probabilistic relevant pattern recognition approach to attribution trace analysis. In Proceedings of the 2017 IEEE Trustcom/BigDataSE/ICESS, Sydney, Australia, 1–4 August 2017; pp. 691–698. [Google Scholar]
  17. Office of the Director of National Intelligence. A Guide to Cyber Attribution; Office of the Director of National Intelligence: Washington, DC, USA, 2018.
  18. Zhang, P.; Li, T.; Wang, G.; Luo, C.; Chen, H.; Zhang, J.; Wang, D.; Yu, Z. Multi-source information fusion based on rough set theory: A review. Inf. Fusion 2021, 68, 85–117. [Google Scholar] [CrossRef]
  19. Biswas, A.; Biswas, B. Community-based link prediction. Multimed. Tools Appl. 2017, 76, 18619–18639. [Google Scholar] [CrossRef]
  20. Son, K.H.; Kim, B.I.; Lee, T.J. Cyber-attack group analysis method based on association of cyber-attack information. KSII Trans. Internet Inf. Syst. 2020, 14, 260–280. [Google Scholar]
  21. Haddadpajouh, H.; Azmoodeh, A.; Dehghantanha, A.; Parizi, R.M. Mvfcc: A multi-view fuzzy consensus clustering model for malware threat attribution. IEEE Access 2020, 8, 139188–139198. [Google Scholar] [CrossRef]
  22. Van Dyke Parunak, H. A grammar-based behavioral distance measure between ransomware variants. IEEE Trans. Comput. Soc. Syst. 2021, 9, 8–17. [Google Scholar] [CrossRef]
  23. Kida, M.; Olukoya, O. Nation-state threat actor attribution using fuzzy hashing. IEEE Access 2022, 11, 1148–1165. [Google Scholar] [CrossRef]
  24. Liras, L.F.M.; De Soto, A.R.; Prada, M.A. Feature analysis for data-driven apt-related malware discrimination. Comput. Secur. 2021, 104, 102202. [Google Scholar] [CrossRef]
  25. Li, S.; Zhang, Q.; Wu, X.; Han, W.; Tian, Z. Attribution classification method of apt malware in iot using machine learning techniques. Secur. Commun. Netw. 2021, 2021, 9396141. [Google Scholar] [CrossRef]
  26. Dib, M.; Torabi, S.; Bou-Harb, E.; Assi, C. A multi-dimensional deep learning framework for iot malware classification and family attribution. IEEE Trans. Netw. Serv. Manag. 2021, 18, 1165–1177. [Google Scholar] [CrossRef]
  27. Wang, H.; Zhang, W.; He, H.; Liu, P.; Luo, D.X.; Liu, Y.; Jiang, J.; Li, Y.; Zhang, X.; Liu, W.; et al. An evolutionary study of iot malware. IEEE Internet Things J. 2021, 8, 15422–15440. [Google Scholar] [CrossRef]
  28. Black, P.; Gondal, I.; Vamplew, P.; Lakhotia, A. Function similarity using family context. Electronics 2020, 9, 1163. [Google Scholar] [CrossRef]
  29. Zhao, J.; Yan, Q.; Li, J.; Shao, M.; He, Z.; Li, B. Timiner: Automatically extracting and analyzing categorized cyber threat intelligence from social data. Comput. Secur. 2020, 95, 101867. [Google Scholar] [CrossRef]
  30. Berninger, M. Going Atomic: Clustering and Associating Attacker Activity at Scale [EB/OL]. Available online: https://www.mandiant.com/resources/blog/clustering-and-associating-attacker-activity-at-scale (accessed on 12 May 2019).
  31. Noor, U.; Anwar, Z.; Amjad, T.; Choo, K.-K.R. A machine learning-based fintech cyber threat attribution framework using high-level indicators of compromise. Future Gener. Comput. Syst. 2019, 96, 227–242. [Google Scholar] [CrossRef]
  32. Kim, K.; Shin, Y.; Lee, J.; Lee, K. Automatically attributing mobile threat actors by vectorized att&ck matrix and paired indicator. Sensors 2021, 21, 6522. [Google Scholar]
  33. Zhang, Q.; Xie, Q.; Wang, G. A survey on rough set theory and its applications. CAAI Trans. Intell. Technol. 2016, 1, 323–333. [Google Scholar] [CrossRef]
  34. García, D.E.; DeCastro-García, N. Optimal feature configuration for dynamic malware detection. Comput. Secur. 2021, 105, 102250. [Google Scholar] [CrossRef]
  35. Loia, V.; Orciuoli, F. Understanding the composition and evolution of terrorist group networks: A rough set approach. Future Gener. Comput. Syst. 2019, 101, 983–992. [Google Scholar] [CrossRef]
  36. Sun, L.; Wang, L.; Ding, W.; Qian, Y.; Xu, J. Feature selection using fuzzy neighborhood entropy-based uncertainty measures for fuzzy neighborhood multigranulation rough sets. IEEE Trans. Fuzzy Syst. 2020, 29, 19–33. [Google Scholar] [CrossRef]
  37. Yang, Q.; Li, Y.L.; Chin, K.S. Constructing novel operational laws and information measures for proportional hesitant fuzzy linguistic term sets with extension to PHFL-VIKOR for group decision making. Int. J. Comput. Intell. Syst. 2019, 12, 998–1018. [Google Scholar] [CrossRef]
  38. Shang, K.K.; Small, M.; Yan, W.S. Link direction for link prediction. Phys. A Stat. Mech. Its Appl. 2017, 469, 767–776. [Google Scholar] [CrossRef]
  39. Guo, T.; Jiye, Z. A new measurement of link prediction based on common neighbors. J. China Univ. Metrol. 2016, 27, 121–124. [Google Scholar]
  40. Lü, L.; Zhou, T. Link prediction in complex networks: A survey. Phys. A Stat. Mech. Its Appl. 2011, 390, 1150–1170. [Google Scholar] [CrossRef]
  41. Rhode, M.; Burnap, P.; Jones, K. Early-stage malware prediction using recurrent neural networks. Comput. Secur. 2018, 77, 578–594. [Google Scholar] [CrossRef]
  42. Insikt Group. Iranian Threat Actor Amasses Large Cyber Operations Infrastructure Network to Target Saudi Organizations. [EB/OL]. Available online: https://www.recordedfuture.com/iranian-cyber-operations-infrastructure (accessed on 26 June 2019).
  43. National Security Agency. Turla Group Exploits Iranian APT To Expand Coverage Of Victims. [EB/OL]. Available online: https://media.defense.gov/2019/Oct/18/2002197242/-1/-1/0/NSA_CSA_TURLA_20191021%20VER%203%20-%20COPY.PDF (accessed on 21 October 2019).
  44. GROUP-IB. Catching Fish in Muddy Waters. [EB/OL]. Available online: https://www.group-ib.com/blog/muddywater/ (accessed on 29 May 2019).
Figure 1. APT group correlation analysis model.
Figure 2. Analysis diagram of node correlation.
Figure 3. Heat map of the conditional attribute set of attack behavior feature representation.
Figure 4. The relation network between different APT groups and various types of general malware.
Figure 5. The correlation between the number of recognized relationships and precision based on different correlation analysis methods.
Figure 6. The correlation between the number of recognized relationships and precision based on different similarity indices.
Figure 7. APT group relation network generated from behavior patterns of APT groups.
Figure 8. TTP attack graph generated from similar attack processes of CosmicDuke and APT29.
Figure 9. TTP attack graph generated from common fuzzy behavior feature sequences of CosmicDuke and APT29.
Figure 10. Evolution graph of the APT group relation network.
Table 1. Well-known node similarity indices.

Name | Equation
Common Neighbors (CN) | $CN(x, y) = |\Gamma(x) \cap \Gamma(y)|$ ¹
Jaccard Index | $Jaccard(x, y) = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$
Hub-Promoted Index (HPI) | $HPI(x, y) = \frac{|\Gamma(x) \cap \Gamma(y)|}{\min\{k_x, k_y\}}$ ²
Hub-Depressed Index (HDI) | $HDI(x, y) = \frac{|\Gamma(x) \cap \Gamma(y)|}{\max\{k_x, k_y\}}$
Katz Index | $Katz(x, y) = \sum_{l=1}^{\infty} \beta^l \cdot |path_{xy}^{l}|$ ³
¹ $\Gamma(x)$ denotes the set of neighbors of x. ² $k_x$ is the degree of node x. ³ $path_{xy}^{l}$ is the set of all paths with length l connecting x and y; $\beta$ is a free parameter controlling the path weights.
Table 2. Setting of the malware attribution experiment for APT groups.

Experiment Number | Classification Task
A | 2 classifications: general and APT malware.
B | General malware and 76 classifications of APT groups.
C | 8 classifications of general malware types and 76 classifications of APT groups.
D | 76 classifications of APT groups.
Table 3. Attribution and correlation results for different feature selection methods.

Feature Selection Method | Number of Features | Scaling Ratio | Classification Task | Attribution Accuracy | Attribution Precision | Attribution F1 | Correlation Precision | Correlation Accuracy | Number of Connections
TF-IDF | 410 | 103:1 | A | 0.9642 | 0.9640 | 0.9641 | - | - | -
 | | | B | 0.9251 | 0.9297 | 0.9244 | - | - | -
 | | | C | 0.7620 | 0.7581 | 0.7552 | - | - | -
 | | | D | 0.7488 | 0.7609 | 0.7442 | 0.5384 | 0.9810 | 7
High-frequency word | 332 | 127:1 | A | 0.9628 | 0.9625 | 0.9625 | - | - | -
 | | | B | 0.9214 | 0.9225 | 0.9193 | - | - | -
 | | | C | 0.7699 | 0.7720 | 0.7657 | - | - | -
 | | | D | 0.7251 | 0.7298 | 0.7164 | 0.6250 | 0.9819 | 10
Mutual information | 172 | 245:1 | A | 0.9475 | 0.9474 | 0.9474 | - | - | -
 | | | B | 0.9166 | 0.9157 | 0.9114 | - | - | -
 | | | C | 0.7722 | 0.7758 | 0.7688 | - | - | -
 | | | D | 0.7509 | 0.7619 | 0.7455 | 0.8571 | 0.9838 | 12
Table 4. Experimental results based on different correlation analysis methods.

Method | Precision | Accuracy | TP | FP
Average distance | 0.6923 | 0.9781 | 18 | 8
Center distance | 0.1860 | 0.9658 | 8 | 35
Set similarity | 0.8750 | 0.9781 | 14 | 2
Rough set | 0.9090 | 0.9801 | 20 | 2
Table 5. Data distribution at different time stages.

Time | 2008–2011 | 2012–2015 | 2016–2018 | 2019–2022 | 2008–2022
Security event samples | 45 | 346 | 772 | 433 | 1582
Malware samples | 82 | 1495 | 1363 | 279 | 3219
Table 6. Evaluation analysis results.

Dataset | Time | Precision | Accuracy | TP | FP
Security events | 2008–2011 | - | - | - | -
 | 2012–2015 | 0.5000 | 0.9972 | 1 | 1
 | 2016–2018 | 0.3333 | 0.9886 | 2 | 4
 | 2019–2022 | 0.5000 | 0.9801 | 2 | 2
 | 2008–2022 | 0.6666 | 0.9829 | 4 | 2
Malware | 2008–2011 | - | - | - | -
 | 2012–2015 | 0.7500 | 0.9987 | 3 | 1
 | 2016–2018 | 0.5000 | 0.9958 | 6 | 6
 | 2019–2022 | 0.4285 | 0.9920 | 3 | 4
 | 2008–2022 | 0.8666 | 0.9851 | 13 | 2
APT groups | 2008–2011 | - | - | - | -
 | 2012–2015 | 0.6666 | 0.9984 | 4 | 2
 | 2016–2018 | 0.5000 | 0.9946 | 9 | 9
 | 2019–2022 | 0.4545 | 0.9895 | 5 | 6
 | 2008–2022 | 0.9090 | 0.9844 | 20 | 2
