1. Introduction
Blockchain is a distributed ledger that records the financial transactions taking place within decentralized networks [1]. With the emergence of Ethereum smart contracts, blockchain has entered the era of programmable finance [2]. In recent years, blockchain technology has surpassed its original scope in finance and achieved success in a variety of industries and sectors. For instance, within the energy sector, blockchain is transforming conventional approaches to energy management and distribution [3]. In the field of agriculture, blockchain technology offers novel solutions for ensuring data integrity and traceability in supply chains, thereby enhancing the food safety and traceability of agricultural products [4]. In healthcare, it contributes to better patient care and efficient medical record management by securely storing and sharing patient data [5]. Furthermore, in the field of e-government, blockchain serves as a means to improve the transparency and efficiency of public services, leading to more effective and secure management of these services [6]. The success of blockchain technology is closely linked to smart contracts, which were initially proposed for the dissemination, validation, and enforcement of contracts in an informational manner [7]. Compared to traditional contracts, smart contracts enable users to codify their agreements and trust relations by providing automated transactions without the supervision of a central authority [8].
With the widespread application of smart contracts in finance, industry, commerce, and other fields, the security of smart contracts is becoming increasingly important [9]. If vulnerabilities exist in a smart contract, attackers can exploit them to maliciously penetrate blockchain networks and steal tens of millions of dollars in cryptocurrency. This not only undermines the reliability of the blockchain system, but also seriously harms the interests of smart contract holders. A typical example is the DAO attack, in which an attacker exploited a reentrancy vulnerability to successfully abscond with millions of Ether [10]. This event seriously undermined the credibility of the blockchain system, and attacks based on smart contract vulnerabilities continue to occur [11]. Undoubtedly, these attacks highlight the urgent need for smart contract vulnerability detection. By detecting potential vulnerabilities in a smart contract, the security and stability of the blockchain system can be effectively improved [12]. Therefore, designing efficient vulnerability detection models is an important problem and challenge that needs to be solved.
Traditional methods for detecting vulnerabilities in smart contracts suffer from the path explosion problem, which leads to a large increase in execution time and memory overhead. Moreover, these methods cannot effectively parse the complex data structures of programs, which may result in higher false positive and false negative rates. In recent years, researchers have used deep learning techniques to improve the accuracy of detecting vulnerabilities in smart contracts [13]. By segmenting the source code of smart contracts into code snippets and using a BLSTM model, Qian et al. designed ReChecker to detect reentrancy vulnerabilities [14]. However, the defined slice criteria are not comprehensive, and the generated code snippets may implicitly ignore critical segments. In addition, for longer code snippets, BLSTM may not efficiently capture global semantic information [15], bringing about high false positive rates. Based on the Convolutional Neural Network (CNN) and Bidirectional Gated Recurrent Unit (BiGRU) network models, Zhang et al. devised the CBGRU hybrid deep learning model to detect vulnerabilities [16]. However, because the CNN cannot fully extract local syntactic and semantic detail features and the BiGRU has limited perception of global structural features, the detection results can contain false negatives. To detect reentrancy vulnerabilities, Wu et al. proposed an approach for extracting critical data flow graphs from smart contract code [17]. However, this method cannot detect other types of vulnerabilities and lacks generality.
To overcome the above shortcomings, we propose a novel Multi-Scale Encoder Vulnerability Detection (MEVD) approach. Firstly, the original feature set contains excessive semantic information that is irrelevant to vulnerability features [18], which may weaken the correlation of the global semantic information and the coherence of the contextual structure. To address this problem, we design a Surface Feature Encoder (SFE) based on a gating mechanism. This mechanism controls the flow of critical semantic information, suppresses noisy data, and enhances the global semantic information of the features. Second, by combining a Base Transformer Encoder (BTE) built from Transformer blocks with a Detail CNN Encoder (DCE), we devise a dual-branch encoder to comprehensively explore vulnerability patterns from both global structural and local detail features: the BTE extracts long-range dependencies and global structural features in the code, while the DCE focuses on analyzing the syntax and semantics within the code to extract local detail features. Third, to improve the quality of the code vector representation, we introduce Deep Residual Shrinkage Networks (DRSNs) to resist the noise generated by vulnerability-unrelated variables. By using deep learning to automatically determine the threshold, the model can focus more on vulnerability-related characteristics. Because a large amount of irrelevant information may exist in the set of contract codes [19], in the data preprocessing phase we use the Vulnerability Syntax-driven Slice (VSS) to eliminate redundant information. VSS provides a more comprehensive set of slice criteria for identifying different vulnerability characteristics while maintaining the data dependency and control relevance of code statements. We conducted extensive experiments on three different types of vulnerability datasets: reentrancy, timestamp dependency, and infinite loop vulnerabilities. The experimental results show that our approach outperforms state-of-the-art methods in terms of detection performance. The major contributions of this paper are as follows:
We present a vulnerability syntax-driven slice method, which simplifies the contract code by removing statements unrelated to vulnerability characteristics while preserving the data and control dependencies between statements.
We propose a novel MEVD approach to detect vulnerabilities in smart contracts. The global structural and local detail features are captured by multi-scale encoders, compensating for the limited feature extraction capability of a single model.
We have compared MEVD with state-of-the-art vulnerability detection methods. Experimental results show that the proposed MEVD outperforms existing methods for detecting reentrancy, timestamp dependency, and infinite loop vulnerabilities.
This paper is organized as follows. First, in Section 2, we discuss the background knowledge of smart contract vulnerabilities. Next, in Section 3, we introduce the related work. In Section 4, we provide a detailed explanation of our proposed method. Section 5 presents our experimental results. Finally, we conclude the paper and discuss future directions in Section 6.
3. Related Work
In this section, we will discuss the current smart contract vulnerability detection methods, including traditional detection methods and deep learning-based methods. Then, we will introduce some deep learning models.
Traditional Detection Methods: Traditional methods for detecting vulnerabilities in smart contracts primarily include symbolic execution, formal verification [23], and fuzzing [13]. Oyente leverages static symbolic execution to detect potential vulnerabilities in smart contracts [24]. Mythril combines static symbolic execution, taint analysis, and control flow checking to further improve the accuracy of vulnerability detection [25]. Slither detects vulnerabilities by transforming contract code into an intermediate representation [26]. ContractFuzzer is the first vulnerability detection framework to apply fuzz testing techniques to smart contracts [22].
Deep Learning Detection Methods: In recent years, deep learning technology has achieved remarkable success in various fields, and a number of deep learning methods for vulnerability detection now exist. ReChecker [14] employs code slicing of smart contract source code and uses a BLSTM-ATT sequence model to detect reentrancy vulnerabilities. To fully capture key statements associated with a vulnerability, Yu et al. improved the slicing technique by introducing the concept of Vulnerability Candidate Slices (VCSs), which are fed into various temporal neural network models for vulnerability detection [27]. Cai et al. constructed a novel contract graph, incorporating Abstract Syntax Trees (ASTs), Control Flow Graphs (CFGs), and Program Dependency Graphs (PDGs), and employed graph neural network models for vulnerability detection [15]. Liu et al. introduced AME, a new method for representing code graphs that classifies nodes into core and normal nodes, then extracts deep graph features and fuses global graph features with local expert patterns to improve accuracy [28]. VulnSense combines three types of features from smart contracts and uses a multi-modal learning approach for comprehensive vulnerability detection [29].
Our proposed MEVD approach is based on multi-scale encoders. Compared with ReChecker’s single feature extraction, MEVD can extract the semantic and syntactic features of the code more comprehensively, effectively combining global structures with local details. In addition, compared with the VCS method proposed by Yu et al., we have introduced the VSS, which contains more comprehensive vulnerability slicing criteria. The VSS can take into consideration the characteristics of different types of vulnerabilities in depth, and the extracted code snippets contain richer semantic information, which effectively reduces false positives caused by insufficient crucial information in the vulnerability code snippets. Finally, MEVD has demonstrated excellent detection performance compared to current state-of-the-art methods through extensive experimental validation.
Deep Learning Models: In recent years, many outstanding network models have emerged, such as Long Short-Term Memory (LSTM) [30], Graph Convolutional Networks (GCNs) [31], and CNNs. These models have achieved significant success in their respective domains. Beyond these basic models, more complex model structures have appeared, such as the Transformer model [32]. The core components of the Transformer are encoders and decoders that use self-attention mechanisms to establish global correlations within input sequences, thereby better capturing contextual structural information. The Transformer has made significant breakthroughs in natural language processing, particularly machine translation. By refining the Transformer's encoder and decoder modules, Restormer can better capture interactions between distant pixels [33]. RepVGG introduces a multi-branch structure to extract richer features [34]. DRSN is a network structure that improves feature learning for highly noisy vibration signal analysis, enabling more accurate fault diagnosis [35]. CDDFuse designs a novel autoencoder to address the problem of multi-modal image fusion [36].
In the field of smart contract vulnerability detection, we have summarized traditional methods (such as symbolic execution, formal verification, and fuzz testing) and deep learning methods (such as ReChecker and AME). Traditional methods emphasize static symbolic execution techniques, while deep learning methods include various models such as BLSTM-ATT, graph neural networks, and self-attention mechanisms. Finally, we introduce outstanding deep learning models, such as LSTM, Transformer, and Restormer, which have made significant progress in various domains. These methods play an important role in enhancing the security of smart contracts.
4. Method
In this section, we introduce a novel method for slicing smart contract code and present the design of the MEVD framework.
4.1. Overview
As shown in Figure 1, the MEVD proposed in this paper consists of three phases: a data preprocessing phase, a model training phase, and a vulnerability detection phase. Firstly, in the data preprocessing phase, we remove redundant information from the smart contract code, such as blank lines, comments, and non-ASCII characters. Then, to simplify the contract, the source code is sliced into VSSs and normalized with uniform naming rules. Next, in the model training phase, we use the Word2Vec word embedding technique [37] to convert code slices into vector forms suitable for input to the model. The model consists of multi-scale encoders that enrich the semantic information of the feature vectors and capture the global structural and local detail features used to improve the performance of vulnerability detection. Finally, in the vulnerability detection phase, the two feature vectors processed by the multiple encoders are fed into a fusion layer to obtain a vector containing both global structural and local detail features. This vector is then fed into a fully connected layer, and a SoftMax activation function outputs the vulnerability detection probabilities.
4.2. Data Preprocessing
4.2.1. Code Slice Criteria
It has been discovered that a number of code statements in the contract source code are unrelated to vulnerabilities [38]. These irrelevant code statements can degrade vulnerability detection performance, so it is important to extract key code statements accurately. However, existing definitions of slice criteria have certain limitations when dealing with smart contract code, which can result in the loss of implicit information about vulnerabilities. Therefore, as shown in Table 1, we propose an entirely new set of slice criteria that covers three types of vulnerabilities.
Taking the reentrancy vulnerability as an example, we have defined several slice criteria. Firstly, code statements that include call.value() and fallback() are considered critical code statements susceptible to reentrancy vulnerabilities, because they allow smart contracts to undergo state transitions as they interact with other contracts. Such a state transition can cause the contract to continue executing after an external call, which may introduce vulnerabilities. In addition, fund transfer functions that include call.value calls require special attention: these functions are widely used in the transfer and invocation processes of smart contracts and serve as conditions for launching reentrancy attacks. We have also focused on variables related to user balances, as these variables are equally important in fund transfers. Similarly, we have defined slice criteria for the other two types of vulnerabilities. This novel set of slice criteria enables more comprehensive vulnerability-related information to be captured within the contract code, avoiding the omission of critical implicit details.
4.2.2. Code Slice Generation
To address the impact of redundant information in smart contract source code on the accuracy of vulnerability detection, we propose a novel vulnerability syntax-driven slice method, as shown in Figure 2.
Taking a smart contract containing call.value() as an example: in step 2, we first remove comments, non-ASCII characters, and blank lines from the smart contract; removing this irrelevant information does not affect the results. Next, in step 3, based on the vulnerability type associated with the code, we identify the key statements according to the slice criteria in Table 1. For example, the ninth line of code contains the keyword call.value(), and the sixth line operates on the user's balance, so we consider these to be key sensitive statements. The code statements relevant to the key statements are then extracted and organized into code snippets by analyzing the control and data dependencies between statements and variables. Then, to normalize the code snippets and avoid the potential impact of different naming rules, in step 4 we map user-defined variables to symbolic names (e.g., "VAR1", "VAR2") and user-defined functions to symbolic names (e.g., "FUN1", "FUN2"). This yields the final generated VSS. Finally, to convert the code snippets into a format that neural networks can accept, we use Word2Vec to convert them into vector representations.
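The symbolic renaming in step 4 can be sketched in a few lines of Python. This is a minimal illustration, not the actual VSS implementation: the function name `normalize_snippet` and the assumption that the user-defined identifiers have already been collected by a parser are ours.

```python
import re

def normalize_snippet(code, var_names, fun_names):
    """Map user-defined variables/functions to symbolic names (VAR1, FUN1, ...)."""
    mapping = {}
    for i, name in enumerate(var_names, start=1):
        mapping[name] = f"VAR{i}"
    for i, name in enumerate(fun_names, start=1):
        mapping[name] = f"FUN{i}"
    # Replace whole-word occurrences only, longest names first, so that a
    # name that is a prefix of another name is not clobbered.
    for name in sorted(mapping, key=len, reverse=True):
        code = re.sub(rf"\b{re.escape(name)}\b", mapping[name], code)
    return code

snippet = "function withdraw(uint amount) { balances[msg.sender] -= amount; }"
print(normalize_snippet(snippet, ["amount", "balances"], ["withdraw"]))
# function FUN1(uint VAR1) { VAR2[msg.sender] -= VAR1; }
```

The normalized snippet can then be tokenized and passed to Word2Vec for embedding.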
4.3. MEVD Model
In this section, we will introduce the model structure of MEVD. The MEVD model consists of three modules: the surface feature encoder, the dual-branch Transformer-CNN encoder, and the deep residual shrinkage network.
4.3.1. Surface Feature Encoder
We use the Word2Vec algorithm to convert tokens to vectors, but converting code directly to vectors as input may ignore semantic features within the code. To solve this problem, we designed the SFE, which enhances the relevance of global semantics within the features and suppresses the influence of irrelevant information, thus preserving richer syntactic and semantic information. The detailed design of the SFE module is shown in Figure 3.
SFE leverages three design strategies to enhance semantic features: a gating mechanism, dilated convolutions, and depth-wise convolutions. Gating mechanisms are typically used to control the flow of information by enhancing attention to critical information. Specifically, the gating mechanism is formulated as the element-wise product of two parallel paths of linear transformation layers. One path is activated by the non-linear activation function GELU, while the other path applies the reverse operation to the vector. The outputs of these two paths are then subjected to element-wise multiplication, allowing for information from one path to modulate the strength of the other, thereby enhancing focus on important semantic details and reducing the impact of irrelevant variables. On the other hand, the SFE employs dilated convolutions and depth-wise convolutions. Dilated convolutions extend the receptive field to cover a wider range of input information, helping to capture different scales and structural features within the code. Meanwhile, depth-wise convolutions focus on the local context and detailed information in the input data. They efficiently capture local interdependencies between different channels in the input data, helping the SFE to enrich the semantic information within the features. Given the input vector (feature matrix)
$X \in \mathbb{R}^{L \times D}$, where $L$ is the length of the token sequence and $D$ is the dimension of the token embedding, the SFE can be written as

$$Y = \phi\left(W_{d}\,\mathrm{LN}(X)\right) \odot W_{dw}\,\mathrm{Reverse}\left(\mathrm{LN}(X)\right),$$

where $W_{d}$ is the 3 × 3 dilated convolution, $W_{dw}$ is the 3 × 3 depth-wise convolution, ⊙ denotes element-wise multiplication, $\phi$ represents the GELU non-linearity, LN is the layer normalization, and Reverse denotes the reversal of rows in a feature matrix, e.g., swapping the first row with the last. In summary, the design of the SFE allows each level to attend to details that are complementary to the other levels, focusing on enriching features with code semantic context information.
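The gating computation described above can be sketched in NumPy. This is a minimal illustration under our reading of the SFE (GELU on the dilated-convolution path, row reversal on the depth-wise path); the convolutions are passed in as callables and replaced by identities in the toy run.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU non-linearity
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def sfe_gate(x, conv_dilated, conv_depthwise):
    """Gating mechanism of the SFE: one path is GELU-activated, the other is
    row-reversed, and the two are fused by element-wise multiplication."""
    xn = layer_norm(x)
    path_a = gelu(conv_dilated(xn))        # dilated-convolution path
    path_b = conv_depthwise(xn[::-1, :])   # depth-wise path on reversed rows
    return path_a * path_b                 # element-wise gating

# Toy demonstration with identity "convolutions" on an L x D feature matrix.
x = np.arange(12, dtype=float).reshape(4, 3)
y = sfe_gate(x, lambda t: t, lambda t: t)
print(y.shape)  # (4, 3)
```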
4.3.2. Dual-Branch Encoder
The dual-branch encoder consists of two parallel encoders: the base Transformer encoder and the detail CNN encoder. These two modules are used to further extract vector features after the SFE. The BTE is used to effectively capture global structural information, allowing it to capture dependencies between code statements and contextual logical structural features. The DCE, on the other hand, focuses on exposing crucial latent features within the code, while preserving more syntactic and semantic details. This dual-branch design strategy allows the model to simultaneously consider global structure and fine-grained features, leading to a more comprehensive understanding of smart contract code and significantly improving the accuracy and performance of vulnerability detection.
The BTE consists of multiple self-attention layers, each containing multi-head self-attention and a feedforward network. It uses the multi-head self-attention of the Transformer encoder to capture contextual information and long-range dependencies within code sequences, thereby extracting richer global structural features. The multi-head self-attention mechanism is the core of the BTE, allowing it to establish correlations between different positions in the input sequence and effectively capture contextual information. The structure of the BTE is shown in Figure 4.
BTE achieves parallel computation of multiple attentional heads, which allows the model to simultaneously focus on information from different positions within the input sequence. It then combines the outputs of the different heads using weighted fusion to obtain a more comprehensive feature representation. The calculation process of the BTE encoder is as follows:
First, positional encodings are added to the input to obtain $X' = X + \mathrm{PE}$. Each attention head then applies linear transformations to $X'$:

$$Q = X'W^{Q}, \quad K = X'W^{K}, \quad V = X'W^{V},$$

where $Q$ is the query vector, $K$ is the key vector, $V$ is the value vector, and $W^{Q}$, $W^{K}$, and $W^{V}$ are the corresponding weight matrices. The attention output of each head is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V,$$

and the multi-head output is obtained by concatenating the heads and applying an output projection:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \ldots, \mathrm{head}_{h})\,W^{O}.$$

The multi-head output is then combined with the input through a residual connection and layer normalization. Finally, the result is processed through a feedforward neural network with another residual connection to obtain feature vectors carrying global structural and semantic information. This design enables the model to better understand the overall structure and global semantic relationships of smart contract code and to effectively handle complex smart contracts.
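The core attention computation of the BTE follows the standard Transformer formulation, which can be sketched in NumPy for a single head (a simplified illustration, not the Keras implementation used in the paper):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

def self_attention_head(X, W_q, W_k, W_v):
    # Each head projects the input with its own learned weight matrices.
    return scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)

rng = np.random.default_rng(0)
L, D, d_k = 5, 8, 4                 # sequence length, embedding dim, head dim
X = rng.normal(size=(L, D))
W_q, W_k, W_v = (rng.normal(size=(D, d_k)) for _ in range(3))
out = self_attention_head(X, W_q, W_k, W_v)
print(out.shape)  # (5, 4)
```

In the full multi-head layer, several such heads run in parallel and their outputs are concatenated and projected by $W^{O}$.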
b. Detail CNN Encoder (DCE)
BTE focuses primarily on capturing global semantic information and structural features, while DCE complements it by extracting more intricate features: it delves into the latent syntactic, semantic, and local features of the code, further enriching the feature representation of the code vectors. DCE consists of a multi-branch convolutional structure, as shown in Figure 5, with three branches: a 3 × 3 dilated convolution layer with Batch Normalization (BN), a 1 × 1 dilated convolution layer with BN, and an independent BN layer. Dilated convolution allows different dilation rates to be set, providing a larger receptive field without increasing the number of parameters. This helps the model to better understand the structure and relationships within the code, thus capturing more intricate details. In addition, the different kernel sizes in these convolutions allow the DCE to capture information at different scales. Finally, by fusing the outputs of the branches, the DCE integrates information from different scales and levels of detail, improving the model's perception of fine-grained details and combining features from different hierarchical levels for a better understanding of the code.
We use the following notation: $K_{3}$ denotes the convolution kernel of the 3 × 3 dilated convolution layer, and $K_{1}$ denotes the convolution kernel of the 1 × 1 dilated convolution layer. Additionally, $\mu_{i}$, $\sigma_{i}$, $\gamma_{i}$, and $\beta_{i}$ respectively denote the accumulated mean, standard deviation, learned scaling factor, and bias of the BN layer in each branch, while $X$ denotes the input and $Y$ the output. The DCE can then be written as

$$Y = \mathrm{BN}_{3}(X * K_{3}) + \mathrm{BN}_{1}(X * K_{1}) + \mathrm{BN}_{0}(X), \qquad \mathrm{BN}_{i}(x) = \gamma_{i}\,\frac{x - \mu_{i}}{\sigma_{i}} + \beta_{i},$$

where $*$ denotes convolution and the three branch outputs are fused by summation.
In summary, the use of DCE enhances the model’s ability to perceive and capture these intricate and complex features, thereby helping the model to better distinguish between vulnerable code and normal code.
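A minimal NumPy sketch may make the three-branch fusion concrete. The convolutions are passed in as callables (identity stand-ins in the toy run), and the per-branch BN is reduced to a whole-matrix normalization for brevity:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Simplified BN: normalize over the whole matrix with learned scale/shift.
    mu, sigma = x.mean(), x.std()
    return gamma * (x - mu) / (sigma + eps) + beta

def dce_block(x, conv3, conv1):
    """Three parallel branches fused by summation, RepVGG-style:
    3x3 dilated conv + BN, 1x1 conv + BN, and a BN-only identity branch."""
    return batch_norm(conv3(x)) + batch_norm(conv1(x)) + batch_norm(x)

x = np.linspace(0, 1, 16).reshape(4, 4)
y = dce_block(x, lambda t: t, lambda t: t)  # identity stand-ins for the convs
print(y.shape)  # (4, 4)
```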
4.3.3. Deep Residual Shrinkage Network
The DRSN module integrates soft thresholding and deep learning, introducing a method for adaptive thresholding [35]. Through DRSN, the model can effectively reduce irrelevant information during feature learning. The soft thresholding operation is calculated as follows:

$$y = \begin{cases} x - \tau, & x > \tau \\ 0, & -\tau \le x \le \tau \\ x + \tau, & x < -\tau \end{cases}$$

The soft thresholding calculation is shown in Figure 6b, where $x$ represents the input features, $y$ represents the output features, and $\tau$ is the threshold, a positive parameter. As the equation shows, the essence of soft thresholding is to discard features with smaller absolute values and shrink features with larger absolute values, thereby reducing information unrelated to the current task.
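Soft thresholding itself is a one-line operation; a NumPy sketch:

```python
import numpy as np

def soft_threshold(x, tau):
    """y = x - tau (x > tau); 0 (|x| <= tau); x + tau (x < -tau)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
print(soft_threshold(x, 1.0).tolist())  # [-1.0, -0.0, 0.0, 0.0, 0.5]
```

Values inside the threshold band are zeroed out, while larger values are shrunk toward zero by exactly $\tau$.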
The structure of DRSN is shown in Figure 6c. First, the input vector passes through two convolution layers, each accompanied by a normalization layer and a ReLU activation function. The output of the convolution layers is a matrix denoted as $X \in \mathbb{R}^{N \times K}$, where $N$ is the number of token sets and $K$ is the number of convolution kernels. The resulting matrix undergoes an absolute value operation, followed by global average pooling to generate a one-dimensional vector $a \in \mathbb{R}^{K}$:

$$a_{k} = \frac{1}{N}\sum_{i=1}^{N} \left| x_{i,k} \right|,$$

where $|x_{i,k}|$ denotes the absolute value of the $i$-th row feature in the $k$-th channel of $X$. Subsequently, $a$ is fed through a fully connected layer, followed by batch normalization, a ReLU activation function, and a second fully connected layer. A sigmoid activation function then scales the threshold parameter for each channel into the range (0, 1), yielding the scaling parameter $\alpha$, so that the threshold $\tau$ can be calculated as

$$\tau_{k} = \alpha_{k} \cdot a_{k}.$$
DRSN processing eliminates residual redundant information and ensures that the feature vectors retain only the information related to the vulnerability characteristics. This helps improve the performance of the model, enabling it to better distinguish between vulnerable and non-vulnerable samples, thereby increasing the accuracy and effectiveness of vulnerability detection.
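The adaptive threshold computation can be sketched end-to-end in NumPy. This is a simplified single-sample illustration (the two FC layers are reduced to plain matrix products with ReLU, and batch normalization is omitted); the weight matrices are random stand-ins:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def drsn_threshold(features, W1, W2):
    """Channel-wise adaptive threshold: global average pooling of |x|,
    a small FC network with sigmoid scaling, then tau_k = alpha_k * a_k."""
    a = np.abs(features).mean(axis=0)            # GAP over positions -> (K,)
    alpha = sigmoid(np.maximum(a @ W1, 0) @ W2)  # FC -> ReLU -> FC -> sigmoid
    return alpha * a                             # per-channel threshold tau

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(1)
feats = rng.normal(size=(10, 4))                 # N positions x K channels
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
tau = drsn_threshold(feats, W1, W2)
shrunk = soft_threshold(feats, tau)
print(tau.shape, shrunk.shape)  # (4,) (10, 4)
```

Because the sigmoid keeps each $\alpha_k$ in (0, 1), the learned threshold never exceeds the channel's average absolute activation, so the shrinkage cannot zero out an entire informative channel.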
4.4. Vulnerability Detection
In the model training phase, we obtain two feature vectors: $B$ from the BTE and $D$ from the DCE. These two vectors are individually subjected to soft thresholding through DRSN, resulting in processed feature vectors $B'$ and $D'$. We flatten these two vectors into 1D vectors and concatenate them into $Z$, which is then fed into a fully connected layer with a ReLU activation function for non-linear transformation. Finally, the transformed vector is fed into a SoftMax layer to determine whether the code snippet contains vulnerabilities:

$$Z = \mathrm{concat}\big(\mathrm{flatten}(B'),\, \mathrm{flatten}(D')\big), \qquad \hat{y} = \mathrm{SoftMax}\big(\mathrm{FC}(\mathrm{ReLU}(\mathrm{FC}(Z)))\big),$$

where $\hat{y}$ represents the predicted probability, FC denotes a fully connected layer, concat denotes the vector concatenation operation, and SoftMax transforms real-valued vectors into probability vectors indicating the probability of each category in the classification problem.
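The fusion and classification head can be sketched in NumPy (a simplified illustration with random stand-in weights, not the trained Keras model):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def detect(b_feat, d_feat, W_fc, b_fc, W_out, b_out):
    """Flatten the two branch outputs, concatenate, FC + ReLU, then SoftMax."""
    z = np.concatenate([b_feat.ravel(), d_feat.ravel()])  # fusion layer
    h = np.maximum(z @ W_fc + b_fc, 0.0)                  # FC + ReLU
    return softmax(h @ W_out + b_out)                     # class probabilities

rng = np.random.default_rng(2)
B, D = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))   # toy branch outputs
W_fc, b_fc = rng.normal(size=(24, 8)), np.zeros(8)
W_out, b_out = rng.normal(size=(8, 2)), np.zeros(2)
p = detect(B, D, W_fc, b_fc, W_out, b_out)
print(p.shape, round(p.sum(), 6))  # (2,) 1.0
```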
The MEVD model proposed in this paper is a framework that integrates several encoders with different functions. In addition, we introduce a novel code-slicing method for data preprocessing. Through an extensive series of experiments, the proposed method demonstrates outstanding performance in smart contract vulnerability detection. In the following sections, we present the experimental results to comprehensively substantiate the effectiveness of our approach.
5. Experiments
In this section, we introduce the datasets and configuration parameters used in the experiments, provide a detailed description of the experimental procedures, and present the experimental results. To evaluate the performance of our proposed method, we answer the following questions:
RQ1: Can the proposed method effectively detect reentrancy, infinite loop, and timestamp dependency vulnerabilities? How do its accuracy, precision, recall, and F1-score compare to state-of-the-art vulnerability detection methods?
To answer this question, we compared our method with state-of-the-art detection tools. Through these comparisons, we were able to assess the performance of our approach relative to existing technology.
RQ2: Does the proposed vulnerability syntax-driven slice method improve detection performance compared to using the unprocessed contract code?
To answer this question, we divided the dataset into two parts: the original dataset and the Vulnerability Syntax-Driven Slice (VSS) dataset. We conducted experiments using these two datasets as model inputs and compared their performance separately.
RQ3: How does each module of MEVD contribute to the overall detection performance?
To answer this question, we conducted experiments comparing the performance of models with different combinations of components in terms of metrics such as accuracy, precision, recall, and F1-score.
5.1. Datasets
Our research dataset covers three types of vulnerabilities: reentrancy, timestamp dependency, and infinite loop vulnerabilities. The primary sources of these datasets are: (i) the SmartBugs Wild dataset, which consists of 47,398 Solidity files containing a total of approximately 203,716 contracts with known vulnerabilities [39]; (ii) the ESC (Ethereum Smart Contracts) dataset, which consists of 307,396 smart contract functions extracted from 40,932 smart contracts; and (iii) the datasets published by Qian et al. [40].
5.2. Experimental Settings
All experiments were performed on a computer equipped with 32 GB of RAM and an NVIDIA GTX 1080 Ti GPU. We implemented our method using the Keras and TensorFlow frameworks. In our experiments, 80% of the contracts form the training set and the remaining 20% form the test set. The parameters used in the model are detailed in Table 2.
5.3. Evaluation Metrics
To measure the performance of our approach, we use the following four widely used evaluation metrics:
Accuracy: the proportion of correctly predicted samples out of the total number of samples; it measures the overall accuracy of the model's predictions.
Precision: the proportion of true positives among the samples predicted as positive by the model; it measures the accuracy of the model in predicting positive samples.
Recall: the proportion of true positives among the actual positive samples; it measures the ability of the model to detect positive samples.
F1-score: a composite score that combines precision and recall to assess the overall performance of the model.
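These four metrics follow directly from the confusion-matrix counts (TP, FP, TN, FN); a short Python sketch with made-up counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts only, not results from the paper.
acc, prec, rec, f1 = classification_metrics(tp=80, fp=10, tn=95, fn=15)
print(f"acc={acc:.4f} prec={prec:.4f} rec={rec:.4f} f1={f1:.4f}")
```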
5.4. Results Analysis
Result for Answering RQ1: To evaluate the performance of MEVD in detecting these three types of vulnerabilities, we compared it with deep learning-based approaches (ReChecker [14], DR-GCN [40], TMP [40], CGE [21]). Table 3 shows the comparison with state-of-the-art deep learning-based methods. These comparisons allow us to assess the performance of our proposed MEVD in vulnerability detection.
Based on the results presented in Table 3 and Figure 7, we can compare the detection performance of the different models and draw the following conclusions. Firstly, the ReChecker model shows relatively poor detection performance compared to the other models, particularly for infinite loop vulnerabilities, where its accuracy is only 68.79%; when the MEVD model is used, the accuracy improves by 18.15%. Notably, our approach not only excels at detecting infinite loop vulnerabilities, but also provides satisfactory improvements on the other metrics. For example, in detecting timestamp dependency vulnerabilities, the accuracy of the MEVD model is 3.09% higher than that of the graph neural network-based CGE. In addition, the MEVD model achieves an accuracy of 92.13% for the detection of reentrancy vulnerabilities, well above the performance of the other methods.
Comparing these experimental results leads to a clear conclusion: MEVD delivers outstanding performance across the different vulnerability types and shows significant advantages over state-of-the-art deep learning-based detection approaches.
Result for Answering RQ2: To evaluate the effectiveness of our proposed code slicing method, we divided the dataset into two parts: the unprocessed original dataset and the sliced dataset (VSS + Dataset). We then ran experiments on each dataset separately, as shown in Table 4, which allows us to evaluate the impact of code slicing on model performance.
Analyzing the results in Table 4, we observe that running experiments on the sliced dataset, as opposed to the unprocessed dataset, significantly improves model performance. For example, for reentrancy vulnerability detection, experiments on the sliced dataset yield an accuracy of 92.13%, an improvement of 5.81% over the unprocessed dataset. Likewise, for timestamp dependency vulnerabilities, the F1-score reaches 91.04%, an improvement of 4.8%. These comparisons support a definitive conclusion: the slice criteria defined by our approach effectively cover the crucial statements within the contract code. Furthermore, by extracting sliced code along control-flow and data-flow dependencies, we expose critical code structure information and strengthen the semantic relationships among statements, which helps the models learn vulnerability patterns more effectively.
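The core of slicing along such dependencies can be illustrated with a minimal sketch: a backward slice collects every statement that the slice criterion transitively depends on. The toy dependency graph below is purely illustrative (the actual VSS method operates on Solidity source code, and the statement numbering here is an assumption):

```python
def backward_slice(deps, criterion):
    """Collect all statements the criterion transitively depends on.
    Data-flow and control-flow edges are merged into one dependency map."""
    sliced, stack = set(), [criterion]
    while stack:
        stmt = stack.pop()
        if stmt in sliced:
            continue
        sliced.add(stmt)
        stack.extend(deps.get(stmt, ()))
    return sliced

# Toy contract: statement id -> ids of statements it depends on
deps = {
    4: [2, 3],  # external call depends on the balance check and the amount
    3: [1],     # amount is read from a mapping declared in statement 1
    2: [1],     # require(balances[msg.sender] >= amount)
    5: [],      # unrelated statement, excluded from the slice
}
print(sorted(backward_slice(deps, 4)))  # -> [1, 2, 3, 4]
```

Statement 5 carries no dependency relation to the criterion and is dropped, which mirrors how slicing removes vulnerability-irrelevant code before model training.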
Result for Answering RQ3: To evaluate the effectiveness of the different modules in MEVD, we performed substitution experiments between modules. Specifically, we conducted the following experiments individually: (1) removing only the SFE module (the MEVD-S model); (2) replacing the Transformer-CNN dual-branch encoder module with a CNN model (the MEVD-T model); (3) removing only the DRSN module (the MEVD-D model).
Table 5 shows the experimental results for these different modules. By comparing these results, we can determine the impact of the different modules on the model performance.
From the analysis of the experimental results in Table 5, a clear conclusion emerges: when individual modules are removed, the performance of all model variants generally decreases. Particularly noteworthy is the MEVD-T model, whose F1-score drops by 8.29% relative to the MEVD model for timestamp dependency vulnerability detection. For reentrancy vulnerability detection, the accuracy of the MEVD-T model is only 84.31%, a decrease of 7.82% compared to the MEVD model, and its other evaluation metrics also fall short. This is because CNN models cannot adequately capture code relationships and thus fail to extract vulnerability features effectively. In contrast, our Transformer-CNN dual-branch encoder module comprehensively extracts both the global code structure and fine-grained syntactic-semantic features, and fuses them into multi-scale feature vectors. This enables the model to learn vulnerability patterns and significantly improves detection performance. Further analysis of the MEVD-S and MEVD-D results shows that the SFE and DRSN modules enhance semantic contextual details and effectively eliminate redundant information from the feature vectors. For example, in reentrancy vulnerability detection, the SFE module improves accuracy by 3.42%. This improvement is attributed to SFE’s unique gating mechanism, which allows it to learn bidirectional semantic features from the code; by controlling the transmission of important semantic information, SFE increases the focus on critical semantic details. In addition, unlike traditional convolutional layers, SFE uses dilated convolutional layers for feature extraction, flexibly adjusting dilation rates to capture a wider range of contextual semantic information and preserve the integrity of the code’s semantics.
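The gated dilated-convolution idea behind SFE can be sketched in a few lines of NumPy. The kernel sizes, dilation rate, weights, and the single sigmoid gate below are illustrative assumptions, not the paper’s exact configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dilated_conv1d(x, w, dilation):
    """'Same'-padded 1-D convolution whose kernel taps are spaced
    `dilation` positions apart, widening the receptive field."""
    k = len(w)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([
        sum(w[j] * xp[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ])

def gated_dilated_block(x, w_feat, w_gate, dilation=2):
    """Gating: a sigmoid branch decides, elementwise, how much of the
    dilated-convolution features to pass through."""
    feat = dilated_conv1d(x, w_feat, dilation)
    gate = sigmoid(dilated_conv1d(x, w_gate, dilation))
    return gate * feat

x = np.array([0.1, 0.5, -0.2, 0.8, 0.3])   # toy token-embedding channel
w_feat = np.array([0.2, 0.5, 0.2])          # illustrative weights
w_gate = np.array([0.1, 0.3, 0.1])
out = gated_dilated_block(x, w_feat, w_gate)
```

Because the gate lies in (0, 1), each output feature is an attenuated copy of the convolved feature, which is how the gate suppresses less important semantic information while letting critical details through.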
The DRSN module, meanwhile, improves the F1-score by 3.91% for infinite loop vulnerability detection. This improvement is attributed to DRSN’s adaptive soft-thresholding strategy: the adaptive thresholds allow the model to flexibly capture vulnerability features at different positions in the code structure, so the network can better adapt to and learn different vulnerability patterns. By suppressing information unrelated to the current task, DRSN reduces the influence of vulnerability-irrelevant features on the model, resulting in improved detection performance.
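The soft-thresholding operation at the heart of DRSN has a simple closed form. The sketch below illustrates it in NumPy; in DRSN the scale factor is produced by a small learned sub-network per sample, whereas here a fixed `alpha` stands in for that learned value:

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink each feature toward zero by tau; features with
    |x| <= tau (treated as task-irrelevant noise) are zeroed out."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def adaptive_tau(x, alpha=0.5):
    """Adaptive threshold as a per-sample scale of the mean absolute
    feature value (alpha is a stand-in for the learned coefficient)."""
    return alpha * np.mean(np.abs(x))

features = np.array([0.05, -0.8, 0.02, 1.2, -0.03])
tau = adaptive_tau(features)          # 0.21 for this sample
denoised = soft_threshold(features, tau)
# small entries are zeroed; large ones survive, shrunk by tau
```

Because the threshold scales with each sample’s own feature magnitudes, strong vulnerability-related activations survive while weak, irrelevant ones are removed, matching the behavior described above.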