In the Evaluation section, we evaluate the proposed model. We introduce the datasets used in our experiments, describe the experimental setup and the chosen parameters, and detail the metrics employed for performance evaluation. We then compare the precision, accuracy, F1 score, recall, and average time consumption of GATv2, GraphSAGE, GCN, and our method. Moreover, we discuss how varying the number of clusters and the number of layers affects the performance of the proposed approach.
5.1. Dataset
In general, we can construct a network topology graph from a set of network flow data. In this graph, we represent the IP address of each network device as a node, and the network communication flows between network devices are constructed as edges in the graph. In formal terms, we can define the network topology graph as $G = (V, E)$, where graph $G$ consists of a node set $V$ and an edge set $E$. We can represent such a graph in the format of an adjacency matrix, where a network topology graph with $n$ nodes and its network flows is represented as $A \in \{0, 1\}^{n \times n}$. If there exists a network flow between network device $i$ and network device $j$, then $A_{ij} = 1$.
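As an illustration of this construction, the following minimal sketch builds such an adjacency matrix from a list of flow records. The (src_ip, dst_ip) tuple layout is an assumption made for illustration and does not reflect the actual CTU-13 schema.

```python
# Minimal sketch: build an undirected network topology graph from flow
# records. Each record is assumed to be a (src_ip, dst_ip) pair; this
# field layout is illustrative, not the CTU-13 schema.
import numpy as np

def build_adjacency(flows):
    # Map each distinct IP address to a node index.
    ips = sorted({ip for src, dst in flows for ip in (src, dst)})
    index = {ip: i for i, ip in enumerate(ips)}

    # A[i, j] = 1 iff at least one flow was observed between nodes i and j.
    n = len(ips)
    A = np.zeros((n, n), dtype=np.int8)
    for src, dst in flows:
        i, j = index[src], index[dst]
        A[i, j] = A[j, i] = 1
    return A, index

flows = [("10.0.0.1", "10.0.0.2"), ("10.0.0.2", "10.0.0.3")]
A, index = build_adjacency(flows)
print(A)
```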
We used three publicly available botnet network graph datasets: C2, P2P, and Chord. These botnet network graph datasets were generated using the original network flow data from CTU-13 [49]. The C2 and P2P botnet network traffic [39] was generated from real malicious software traffic samples, while the Chord botnet network traffic was generated from synthetic malicious software traffic; the dataset is formed by embedding a P2P botnet into real traffic. All the graphs in the datasets mix botnet nodes and botnet network topological patterns with background traffic collected in 2018 from the Center for Applied Internet Data Analysis (CAIDA). In the C2 and P2P datasets, each graph contains approximately 3000 botnet nodes, while in Chord, each graph contains 10,000 botnet nodes. All the graphs are attribute-less, meaning that the node attributes are vectors of all ones. The statistical information for the C2, P2P, and Chord datasets is presented in Table 1, Table 2, and Table 3.
5.2. Experimental Setup
Our experiments were conducted on a Linux server with two 2.1 GHz 16-core Intel(R) Xeon(R) processors, 256 GB of memory, and a Tesla A800 GPU. The proposed model was developed in Python using several deep learning packages, including Scikit-learn, PyTorch Geometric, and PyTorch. For hyperparameter tuning, a grid search was performed to ensure the optimal settings were used. Our grid search values are given in Table 4.
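As a rough illustration of this tuning procedure, the sketch below drives a grid search with Scikit-learn's ParameterGrid. The parameter names, value ranges, and the train_and_validate stand-in are hypothetical placeholders and do not reproduce the actual values in Table 4.

```python
# Illustrative grid search driver. The parameter names and value ranges
# below are placeholders, not the actual contents of Table 4.
import random
from sklearn.model_selection import ParameterGrid

def train_and_validate(learning_rate, hidden_dim, num_layers):
    # Stand-in for one full training run: a real implementation would
    # train the model with these settings and return its validation F1.
    random.seed(hash((learning_rate, hidden_dim, num_layers)) % (2**32))
    return random.random()

param_grid = {
    "learning_rate": [1e-2, 1e-3],
    "hidden_dim": [32, 64],
    "num_layers": [6, 8],
}

best_score, best_params = -1.0, None
for params in ParameterGrid(param_grid):
    score = train_and_validate(**params)
    if score > best_score:
        best_score, best_params = score, params

print("best configuration:", best_params, "score:", best_score)
```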
5.3. Effectiveness
We evaluated the performance of our proposed method on three datasets: C2, P2P, and Chord. The evaluation metrics are precision, accuracy, F1 score, and recall. We also considered the average runtime as a reference for comparison with GATv2, GraphSAGE, and GCN.
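For reference, these four metrics can be computed with Scikit-learn as in the following toy sketch; the node labels here are made-up values (1 = botnet node, 0 = background), not results from our experiments.

```python
# Computing the four reported metrics with scikit-learn on toy labels.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # ground-truth node labels
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]  # predicted node labels

print("precision:", precision_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
print("f1 score: ", f1_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
```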
The performance evaluation results are presented in Table 5. On the C2 dataset, GATv2, GraphSAGE, and our approach all demonstrate high precision, each exceeding 99%. Among them, our approach achieves the highest precision of 99.66%, surpassing GraphSAGE by 0.1%. Moreover, all three methods achieve an accuracy of over 99.95%; relative to precision, the difference in accuracy is approximately 1%. The F1 score follows a trend similar to that of precision. While our approach does not achieve the highest recall, the gap to the best result is only around 0.04%. In terms of average runtime (ave_time), our approach takes approximately 33% longer than GraphSAGE.
On the other two P2P-based datasets, P2P and Chord, precision, accuracy, and F1 score follow the same pattern as above. In terms of recall, GATv2 and our method each achieve the best result on one dataset. However, GraphSAGE, which performed best on the C2 dataset, achieves a recall of only 76.16% on the Chord dataset. Apart from this exception, recall fluctuates only minimally across all four methods.
It is worth noting that our method achieves the lowest ave_time on the Chord dataset, but on the P2P dataset, GraphSAGE still outperforms it. We speculate that this is related to how the graph is partitioned and stored after METIS partitioning. In comparison, GraphSAGE performs fixed-neighbor sampling for each node in the graph, allowing it to efficiently capture local neighborhood information and reduce training time. However, this sampling strategy is not conducive to distributed detection of botnets.
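To make this contrast concrete, the sketch below shows both mini-batching styles in PyTorch Geometric: METIS-based partitioning via ClusterData/ClusterLoader (the Cluster-GCN scheme our approach builds on) versus GraphSAGE-style fixed-neighbor sampling via NeighborLoader. The toy graph and all numeric settings are illustrative only.

```python
# Two mini-batching strategies in PyTorch Geometric on a toy graph.
# ClusterData requires METIS support (e.g., via torch-sparse/pyg-lib).
import torch
from torch_geometric.data import Data
from torch_geometric.loader import ClusterData, ClusterLoader, NeighborLoader

edge_index = torch.randint(0, 100, (2, 400))  # random toy graph
data = Data(x=torch.ones(100, 1), edge_index=edge_index, num_nodes=100)

# Cluster-GCN style: METIS splits the graph into parts, and each batch
# is the subgraph induced by a few parts.
parts = ClusterData(data, num_parts=10)
cluster_batches = ClusterLoader(parts, batch_size=2, shuffle=True)

# GraphSAGE style: sample a fixed number of neighbors per layer around
# a batch of seed nodes.
sage_batches = NeighborLoader(data, num_neighbors=[10, 10], batch_size=32)

for batch in cluster_batches:
    print(batch)  # each batch is a small induced subgraph
    break
```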
Overall, the performance evaluation results indicate that our approach achieves the best precision and F1 score across all datasets, reaching 99.71% and 99.74%, respectively. For the Chord dataset, our approach outperforms other methods in all three aspects, even achieving an average runtime as low as 125.67 s.
In addition, we compared the impact of clustering and non-clustering on the model, as shown in Figure 6. Across different numbers of model layers, the accuracy with clustering is higher than without it. We also assessed the impact of different cluster numbers on our approach through 3-fold cross-validation, as shown in Table 6. During the METIS partitioning process, we set the cluster number to 100 and 1000, with batch sizes of 10, 20, and 25 for the former and 5, 10, 20, and 50 for the latter. We separately discuss the influence of the cluster number on the detection results at batch sizes of 10 and 20, as well as the impact of different batch sizes on the detection results at a fixed cluster number.
In Table 6, when the batch size is 10, the precision with a cluster number of 1000 is 0.086% higher than with a cluster number of 100. However, as the cluster number increases, accuracy, F1 score, and recall all decrease to some degree for batch sizes of both 10 and 20.
When the cluster number is fixed, precision fluctuates only minimally in both test groups. With a cluster number of 100, accuracy, F1 score, and recall increase with batch size, although the improvement is not significant. With a cluster number of 1000, accuracy, F1 score, and recall reach their maximum values at a batch size of 50.
To summarize, as the cluster number increases, precision improves slightly, but the remaining metrics decline to varying degrees. When the cluster number is fixed, accuracy, F1 score, and recall all increase with batch size, showing a positive correlation with it.
In Table 7, we evaluate the impact of the number of layers on our approach. Seven different depths were tested: 6, 7, 8, 9, 10, 12, and 16 layers. Even at 16 layers, our approach was still able to effectively distinguish between malicious and non-malicious nodes.
Similar to Table 6, as the number of layers increases, Table 7 shows that precision increases by between 0.04% and 0.29%, while accuracy, F1 score, and recall decrease by 0.044%, 0.416%, and 1.176%, respectively. At the same time, the average time consumed per layer increases by 60.39%. Compared with the gain in precision as depth grows, the negative impact on accuracy, F1 score, recall, and time consumption is more prominent.
To enhance the propagation of information from neighboring nodes and avoid gradient explosion, we utilize diagonal enhancement to process the embedding representation of each layer. As shown in Figure 7, comparing diagonal enhancement with JumpingKnowledge across different depths shows that at six layers, the former achieves higher accuracy than the latter on all three datasets.
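A minimal sketch of this diagonal-enhancement step, following the propagation rule proposed for Cluster-GCN, is given below. The scaling factor lam and all tensor shapes are illustrative assumptions, not the settings used in our experiments.

```python
# Diagonal-enhancement propagation, as proposed for Cluster-GCN: the
# normalized adjacency matrix is augmented with a scaled copy of its own
# diagonal before each layer's propagation. lam is an illustrative value.
import torch

def diag_enhanced_propagate(A_norm, X, W, lam=1.0):
    # Computes relu((A_norm + lam * diag(A_norm)) @ X @ W).
    A_enh = A_norm + lam * torch.diag(torch.diagonal(A_norm))
    return torch.relu(A_enh @ X @ W)

n, d_in, d_out = 5, 4, 3
A_norm = torch.rand(n, n)   # stand-in for the normalized adjacency matrix
X = torch.rand(n, d_in)     # node embeddings from the previous layer
W = torch.rand(d_in, d_out) # layer weight matrix
H = diag_enhanced_propagate(A_norm, X, W)
print(H.shape)              # torch.Size([5, 3])
```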
Based on the above findings, the proposed model generalizes best when the graph convolutional network has six layers; we therefore take six as the optimal number of layers across the three datasets.
The loss curves of the model on both the training and validation sets are plotted in Figure 8. The losses on both sets show a steady downward trend as the number of training epochs increases. Notably, the difference between the training and validation losses remains very small, indicating that the designed Cluster-GCN model performs well on the dataset without significant signs of overfitting.