A Systematic Review on Deep-Learning-Based Phishing Email Detection

Thakur, Kutub; Ali, Md Liakat; Obaidat, Muath A.; Kamruzzaman, Abu

doi:10.3390/electronics12214545

¹ Department of Professional Security Studies, New Jersey City University, Jersey City, NJ 07305, USA

² Department of Computer Science & Physics, Rider University, 2083 Lawrenceville Rd, Lawrenceville, NJ 08648, USA

³ Department of Computer Science, City University of New York, New York, NY 10019, USA

⁴ Department of Business and Economics, York College/CUNY, Jamaica, NY 11451, USA

* Author to whom correspondence should be addressed.

Electronics 2023, 12(21), 4545; https://doi.org/10.3390/electronics12214545

Submission received: 2 October 2023 / Revised: 20 October 2023 / Accepted: 31 October 2023 / Published: 5 November 2023

(This article belongs to the Special Issue Cyber-Security in Smart Cities: Challenges and Solution)

Abstract

Phishing attacks are a growing concern for individuals and organizations alike, with the potential to cause significant financial and reputational damage. Traditional methods for detecting phishing attacks, such as blacklists and signature-based techniques, have limitations that have led to developing more advanced techniques. In recent years, machine learning and deep learning techniques have gained attention for their potential to improve the accuracy of phishing detection. Deep learning algorithms, such as CNNs and LSTMs, are designed to learn from patterns and identify anomalies in data, making them more effective in detecting sophisticated phishing attempts. To develop a comprehensive understanding of the current state of research on the use of deep learning techniques for phishing detection, a systematic literature review is necessary. This review aims to identify the various deep learning techniques used for phishing detection, their effectiveness, and areas for future research. By synthesizing the findings of relevant studies, this review identifies the strengths and limitations of different approaches and provides insights into the challenges that need to be addressed to improve the accuracy and effectiveness of phishing detection. This review aims to contribute to developing a coherent and evidence-based understanding of the use of deep learning techniques for phishing detection. The review identifies gaps in the literature and informs the development of future research questions and areas of focus. With the increasing sophistication of phishing attacks, applying deep learning in this area is a critical and rapidly evolving field. This systematic literature review aims to provide insights into the current state of research and identify areas for future research to advance the field of phishing detection using deep learning.

Keywords:

deep learning; phishing email detection; email security; spam filtering; malicious email detection; email structure analysis; privacy preservation

1. Introduction

Phishing is a type of cyber-attack that targets individuals or organizations with the aim of stealing sensitive information such as passwords, credit card details, and other personal information [1]. According to recent reports, phishing attacks are on the rise and have become more sophisticated, causing significant financial and reputational damage [2]. Despite the availability of various traditional phishing detection methods, including blacklists [3] and signature-based techniques [4], their limitations have necessitated the development of more advanced techniques. Machine learning and deep learning techniques have gained increasing attention in recent years due to their potential to overcome the limitations of traditional techniques and improve the accuracy of phishing detection [5]. Phishing attacks are often carried out through emails, social media, and other online channels, and they are designed to deceive users into clicking on links or opening attachments that appear to be legitimate but are in fact malicious [6].

Traditional phishing detection methods often rely on the analysis of the content of the message, such as the sender, subject line, and text, to identify phishing attempts. However, these methods have become less effective as phishing attacks become more sophisticated [7]. In contrast, machine learning and deep learning techniques are designed to learn from patterns and identify data anomalies, making them more effective in detecting phishing attempts [8]. In recent years, deep learning techniques have become increasingly popular in cybersecurity, including in phishing detection [9]. Deep learning algorithms, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, have been shown to improve the accuracy of phishing detection by analyzing the content and structure of phishing emails and other online communications [10]. These techniques can identify patterns and features in the data that are not easily recognizable by traditional methods. Furthermore, deep learning algorithms can learn from large amounts of data and improve their accuracy as new data becomes available. Systematic literature reviews are widely recognized as an important tool for identifying and synthesizing the evidence base on a given research question or topic [11].

The use of deep learning techniques for phishing detection is a rapidly evolving field that has gained significant attention in recent years due to its potential to overcome traditional methods’ limitations and improve detection accuracy. However, there is a need for a systematic and comprehensive overview of the current state of research in this area, which can help to identify the strengths and limitations of different deep learning techniques and inform the development of future research. By synthesizing the findings of relevant studies and identifying gaps in the literature, a systematic literature review can develop a coherent and evidence-based understanding of the use of deep learning techniques for phishing detection. Specifically, such a review can offer insights into the effectiveness of different deep learning techniques and their limitations and potential areas for improvement. Additionally, a systematic review can identify gaps in the literature and inform the development of future research questions and areas of focus.

1.1. Our Contribution

There is a lack of a comprehensive and up-to-date understanding of the effectiveness, limitations, and potential areas for improvement of deep learning techniques for phishing detection. While there have been many studies on the topic, few have focused specifically on using deep learning techniques for phishing detection. Given the rapid evolution of this field and the importance of accurate phishing detection, a systematic review is necessary to synthesize the existing evidence, identify gaps in the literature, and inform the development of future research. The contributions of this research are given below:

1. This systematic literature review aims to provide a comprehensive overview of the current state of research on the use of deep learning techniques for phishing detection.
2. The review explores the various deep learning techniques used for phishing detection, their effectiveness, and areas for future research.
3. By synthesizing the findings of relevant studies, this review helps to identify the strengths and limitations of different approaches and provide insights into the challenges that need to be addressed to improve the accuracy and effectiveness of phishing detection.

In summary, the use of deep learning techniques for phishing detection has shown great potential in improving the accuracy and effectiveness of detection. With the increasing sophistication of phishing attacks, the application of deep learning in this area is a critical and rapidly evolving field. This systematic literature review aims to provide insights into the current state of research and identify areas for future research to advance the field of phishing detection using deep learning.

1.2. Organization of the Document

This paper is organized as follows. Section 2 provides a detailed description of the methodology. Section 3 presents the literature survey and its findings. Section 4 discusses the results and findings of this research and offers our suggestions on research directions in the field of phishing email detection using deep learning. Finally, Section 5 presents the conclusion and limitations of this research. The organization of the paper is shown in Figure 1.

2. Methodology

Systematic literature reviews (SLRs) are an essential part of research as they help in summarizing and evaluating the available literature in a particular research area. In this section, we provide a detailed methodology for conducting an SLR of deep-learning-based phishing detection.

2.1. Research Question and Search Strategy

The first step in conducting an SLR is to define the research question and develop a search strategy. In this study, the research question is focused on identifying state-of-the-art deep-learning-based phishing detection techniques. The search strategy involved searching for relevant literature in databases such as IEEE, ACM, ScienceDirect, and SpringerLink. The search terms used were “Deep Learning”, “Phishing Detection”, “Neural Networks”, “Convolutional Neural Networks”, “Recurrent Neural Networks”, “LSTM”, “GRU”, and “Attention Mechanism”.
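As an illustration only, the listed search terms could be combined into a Boolean query string for a database search form. The review does not state how the terms were combined, so the grouping into a topic term and technique terms, and the helper name `build_query`, are assumptions made for this sketch.

```python
def build_query(topic_terms, technique_terms):
    """Join each group with OR, then require one term from each group (AND)."""
    def group(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return group(topic_terms) + " AND " + group(technique_terms)


# Search terms as listed in the review; the split into two groups is an assumption.
TOPIC = ["Phishing Detection"]
TECHNIQUES = [
    "Deep Learning", "Neural Networks", "Convolutional Neural Networks",
    "Recurrent Neural Networks", "LSTM", "GRU", "Attention Mechanism",
]

query = build_query(TOPIC, TECHNIQUES)
print(query)
```

Individual databases differ in their query syntax, so a string like this would normally need adapting per database.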

2.2. Study Selection

The next step is to select relevant studies based on the defined inclusion and exclusion criteria. In this study, we included papers that were published in the English language and contained empirical results on deep-learning-based phishing detection. We excluded papers that were not related to the research question or were of low quality. Text-based phishing detection is a critical cybersecurity measure aimed at identifying and preventing phishing attacks that rely on deceptive text-based communication. Core components of text-based phishing detection encompass keyword examination, sender authentication, URL scrutiny, content analysis, user behavior monitoring, machine learning, real-time threat intelligence, and email header inspection. The effectiveness of text-based phishing detection lies in the combination of these methods, which forms a multifaceted defense strategy that minimizes susceptibility to phishing schemes. Most of the work reviewed in this study concerns text-based phishing emails.
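To make a few of these components concrete, the following minimal sketch extracts simple signals for keyword examination, URL scrutiny, and a basic header check. The keyword list, the feature names, and the function `extract_signals` are hypothetical and chosen only for illustration; they are not taken from any of the reviewed systems.

```python
import re
from urllib.parse import urlparse

# Illustrative keyword list only; real systems use much richer vocabularies.
SUSPICIOUS_KEYWORDS = {"verify", "urgent", "password", "suspended", "confirm"}
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_signals(sender, reply_to, body):
    """Return a small dictionary of hand-crafted phishing signals."""
    words = set(re.findall(r"[a-z]+", body.lower()))
    urls = URL_PATTERN.findall(body)
    hosts = [urlparse(u).hostname or "" for u in urls]
    return {
        # Keyword examination: how many suspicious words appear in the body.
        "keyword_hits": len(words & SUSPICIOUS_KEYWORDS),
        # URL scrutiny: number of links, and links that use a raw IP address.
        "url_count": len(urls),
        "ip_host_urls": sum(1 for h in hosts if IPV4_HOST.match(h)),
        # Header inspection: sender and Reply-To domains disagree.
        "reply_to_mismatch": sender.split("@")[-1].lower()
        != reply_to.split("@")[-1].lower(),
    }


signals = extract_signals(
    "support@bank.example",
    "help@evil.example",
    "Urgent: verify your password at http://192.168.0.1/login now",
)
print(signals)
```

Signals of this kind can serve as inputs to a classifier, whereas the deep learning approaches surveyed in Section 3 typically learn their representations from the raw text.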

2.3. Data Extraction and Analysis

The next step is to extract data from the selected papers and analyze them. In this study, we extracted data such as the year of publication, study design, sample size, deep learning techniques used, and the results obtained. We analyzed the data and identified common themes and trends across the studies. We used qualitative analysis techniques to synthesize the findings from the selected studies.

2.4. Quality Assessment

The final step is to assess the quality of the included studies. In this study, we used the Quality Assessment Tool for Quantitative Studies (QATQS) to assess the quality of the studies. We evaluated the studies based on their design, bias, confounding factors, and statistical analysis. We provided an overall quality rating for each study.

2.5. Inclusion and Exclusion Criteria

2.5.1. Inclusion Criteria

The paper must contain empirical results on deep-learning-based phishing detection.
The paper must be published in the English language.

2.5.2. Exclusion Criteria

The paper is not available in full-text format.
The paper is not related to the research question.
The paper is a duplicate publication.
The paper is a review article or a meta-analysis.
The paper is a conference abstract or poster presentation.
The paper is a book, book chapter, or thesis.
The paper is of low quality, as determined by the QATQS.
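
The criteria above can be read as a screening function applied to each candidate record. The sketch below is one possible encoding; the `Paper` fields and the `screen` helper are hypothetical names introduced here, and the quality flag is assumed to come from the separate quality assessment step.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    language: str
    related_to_question: bool
    has_empirical_dl_results: bool
    full_text_available: bool
    publication_type: str       # e.g. "journal", "conference", "review", "thesis"
    is_duplicate: bool = False
    low_quality: bool = False   # assumed outcome of the quality assessment


EXCLUDED_TYPES = {
    "review", "meta-analysis", "abstract", "poster",
    "book", "book chapter", "thesis",
}


def screen(paper):
    """Return (keep, reason): inclusion criteria first, then exclusion criteria."""
    if paper.language != "English":
        return False, "not in English"
    if not paper.has_empirical_dl_results:
        return False, "no empirical deep-learning results"
    if not paper.full_text_available:
        return False, "full text unavailable"
    if not paper.related_to_question:
        return False, "unrelated to research question"
    if paper.is_duplicate:
        return False, "duplicate"
    if paper.publication_type in EXCLUDED_TYPES:
        return False, f"excluded type: {paper.publication_type}"
    if paper.low_quality:
        return False, "low quality"
    return True, "included"
```

Returning a reason alongside the decision makes it easy to tabulate how many records were dropped at each step.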

In conclusion, the methodology for conducting an SLR of deep-learning-based phishing detection involves defining the research question, developing a search strategy, selecting relevant studies, extracting data, analyzing the data, and assessing the quality of the studies. This approach ensures that the review is comprehensive, transparent, and replicable and the results obtained are of high quality.

3. Literature Survey and Findings

3.1. Research Papers Published in 2017 and Before

Nosseir, Khaled, and Taj-Eddin [12] presented a new method for phishing detection employing a character-word-based approach and a multi-NN classifier. They discussed that weight normalization, based on the ASCII value of word characters, trained each neural network. Their experimental results showed a high FPR and a low TNR. Future work for their suggested approach will involve using a larger database sample to evaluate the classifier’s performance thoroughly. ALmomani et al. [13] suggested a phishing detection approach named “Phishing Dynamic Evolving Neural Fuzzy (PDENF)”, which employed a hybrid learning approach combining supervised and unsupervised learning. Their suggested framework was designed to detect unknown zero-day phishing emails through an adaptive online learning mechanism enhanced by offline learning. The authors aimed to achieve high performance in terms of true negative, true positive, overall accuracy, sensitivity, F-measure, and precision. The authors claimed that PDENF had a low memory footprint and required minimal rules for email classification. Their next work might include evaluating the system on a bigger and more diversified dataset and comparing its performance to other SOTA phishing detection technologies.

Hamid, Abawajy, and Kim [14] proposed a behavior-driven approach to identify phishing emails using features of sender behavior. The authors extracted features using Mbox2xml as a disassembly tool. They observed sender behavior and used the naïve Bayes method to classify the emails in the datasets as ham or phishing. Their suggested approach showed promising results with an accuracy of 94% using eight features. The limitations of their suggested approach motivated further study to examine attackers’ behavior and profile their modus operandi, particularly in the message-ID field. Jameel and George [15] used a feed-forward neural network to identify emails as legitimate or phishing based on features extracted from the HTML header and body. Their suggested model demonstrated a high accuracy of 98.72% with a fast processing time of 0.00067 milliseconds. Future directions of their study could include exploring other machine learning techniques or incorporating additional email features for improved performance.

Soni [16] proposed an RCNN-based model named THEMIS with staggered vectors and an attention mechanism for email phishing detection. Their suggested model simultaneously considered the email header, word level, character level, and body. They used an imbalanced dataset with a good proportion of phishing and legitimate emails. They claimed their proposed model showed 99.848% accuracy and 0.043% FPR. Their future work aimed to enhance the model’s performance in detecting phishing messages without email headers. Zhang and Yuan [17] evaluated the effectiveness of multilayer feed-forward neural networks (NNs) for phishing email detection. They designed a feature set, processed a phishing dataset, and implemented NN systems. They also analyzed the performance of the NNs through cross-validation and compared it with other prominent ML algorithms. They claimed that NNs achieved the highest recall rate while maintaining precision over 95%, demonstrating their proficiency in detecting phishing emails while misclassifying only a small portion of legitimate emails.

Kufandirimbwa and Gotora [18] introduced a new approach to spam filtering based on a single perceptron. Their proposed approach was designed to learn the distinction between illegitimate and legitimate sending server parameter values, messages, and emails. They claimed that the perceptron algorithm showed favorable results due to its continuous learning feature. In future research, the authors plan to implement the perceptron algorithm in a filtering server, conduct more performance tests to improve its blocking statistics, observe the filtering technique over a longer period, and incorporate the use of mail filtering server logs to track filtering events. Abu-Nimeh et al. [19] compared the accuracy of several DL and ML methods, including RF, SVM, LR, BART, NN, RT, and CT, for predicting phishing emails. They utilized a dataset of 2889 legitimate and phishing emails with 43 features for training and testing the classifiers in their study. They claimed that NN outperformed all other methods with an AUC of 0.9448. The experimental results of their study suggested further research to improve predictive accuracy by adding more variables to the dataset.
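The continuous learning behavior attributed to a single perceptron can be illustrated with the classic online update rule, in which the weights change only when a message is misclassified. This is a generic textbook sketch, not the authors' implementation; the two binary features (`has_link`, `sender_known`) and the toy data are assumptions made for illustration.

```python
def predict(weights, bias, x):
    """Return 1 (spam) if the weighted sum is positive, else 0 (legitimate)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def update(weights, bias, x, label, lr=1.0):
    """One online perceptron step; nothing changes when the prediction is right."""
    error = label - predict(weights, bias, x)
    new_weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return new_weights, bias + lr * error


# Toy, linearly separable data: features are [has_link, sender_known].
data = [([1, 0], 1), ([0, 0], 0), ([0, 1], 0), ([1, 1], 0)]

weights, bias = [0.0, 0.0], 0.0
for _ in range(10):                      # a few passes over the stream
    for x, label in data:
        weights, bias = update(weights, bias, x, label)

predictions = [predict(weights, bias, x) for x, _ in data]
print(weights, bias, predictions)
```

Because each update uses only the current message, the same `update` call can keep running as new mail arrives, which is the property the authors highlight.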

Chandan et al. [20] suggested an approach to phishing detection that classified the security of a webpage by analyzing the source code. They extracted various phishing traits and evaluated the website’s security by decreasing the initial secure weight for each detected phishing trait. They calculated the total security percentage, with higher percentages indicating a secure website and lower percentages indicating a potential phishing website. The experimental findings of their suggested approach were demonstrated by comparing the final values of phished and legitimate websites. In the future, they wanted to employ a large sample database to assess the classifier’s performance extensively. Alkaht and Bassel [21] presented a method for filtering SPAM employing a multi-stage NN. They indicated that their strategy outperformed others while using fewer computational resources. They undertook a series of experiments using self-organizing global ranking and feed-forward NNs. Their suggested method showed 95.82% accuracy and a 93.43% F1 score. In the future, the authors will investigate the application of the proposed method to other domains and evaluate its performance against other innovative techniques for filtering SPAM.
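The weight-decreasing scheme described for source-code analysis can be sketched as follows. The trait names and penalty values are hypothetical, since the survey does not list the actual traits or weights used by Chandan et al.; only the overall idea of starting from a fully secure weight and subtracting per detected trait is taken from the description.

```python
# Hypothetical traits and penalties; the source does not give the real ones.
PENALTIES = {
    "form_posts_to_other_domain": 30.0,
    "ip_address_in_links": 25.0,
    "hidden_iframe": 20.0,
    "no_https": 15.0,
}


def security_percentage(detected_traits, initial_weight=100.0):
    """Start from a fully secure weight and subtract a penalty per detected trait."""
    score = initial_weight
    for trait in detected_traits:
        score -= PENALTIES.get(trait, 0.0)   # unknown traits carry no penalty
    return 100.0 * max(score, 0.0) / initial_weight


print(security_percentage([]))
print(security_percentage(["hidden_iframe", "no_https"]))
```

A page with no detected traits keeps the full percentage, and each additional trait lowers it, so a threshold on the final value separates likely phishing pages from likely legitimate ones.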

3.2. Research Papers Published in 2018

Coyotes, Crypt, et al. [22] introduced a DL-based method for phishing email classification that outperformed traditional rule-based and machine-learning techniques. Their proposed architecture, comprising CNN, MLP, and RNN along with Word2vec embeddings, showed high accuracy during training; however, the test accuracy was relatively low because of the dataset’s highly imbalanced nature. They used two datasets in their study, referred to as subtask 1 and subtask 2. They claimed that CNN performed better on subtask 1 with an accuracy of 95.2%, while RNN performed better on subtask 2 with an accuracy of 93.1%. Their study highlighted the need for further research to address the issue of imbalanced datasets. Smadi, Aslam, and Zhang [23] presented an algorithm that combined a NN with reinforcement learning to identify phishing attacks in real time. Their proposed method adapts and improves over time through reinforcement learning and can handle new phishing behaviors. They conducted several tests on well-known datasets and claimed a TPR of 99.07%, an accuracy of 98.63%, a low FPR of 1.81%, an FNR of 0.93%, and a TNR of 98.19%. They concluded that future research could enrich the offline dataset to enhance the model’s performance and extend it to identify phishing, spam, and ham emails.
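The rates reported throughout this survey are all derived from the same confusion matrix, so TPR and FNR always sum to 100%, as do TNR and FPR. The short sketch below computes them from raw counts, with phishing taken as the positive class. The counts are hypothetical and serve only to illustrate the identities and the effect of class imbalance on accuracy.

```python
def rates(tp, fn, fp, tn):
    """Standard confusion-matrix rates, with phishing as the positive class."""
    return {
        "TPR": tp / (tp + fn),
        "FNR": fn / (tp + fn),
        "FPR": fp / (fp + tn),
        "TNR": tn / (fp + tn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical balanced test set.
balanced = rates(tp=95, fn=5, fp=2, tn=98)

# Hypothetical imbalanced test set: a model that labels everything as
# legitimate still reaches 99% accuracy while catching no phishing at all.
always_legitimate = rates(tp=0, fn=10, fp=0, tn=990)

print(balanced)
print(always_legitimate)
```

The second case shows why accuracy alone can be misleading on highly imbalanced datasets, and why the surveyed papers also report TPR, FPR, or F1.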

Hiransha, M., et al. [24] presented a model that utilized CNN and Keras word embedding to classify legitimate and phishing emails. They obtained dense vector representations of words by combining these two techniques. After that, they used the hybrid representation to classify emails in the dataset. They showed that the proposed model performed well in header and non-header tasks with a high detection rate for phishing emails. The authors suggested that adding additional data sources could further improve the phishing email detection accuracy of their methodology. Barushka and Hajek [25] presented a deep belief network with random deep NN and a rectified linear unit (DBB-RDNN-ReL) for filtering spam. Their study’s experimental results showed that the proposed method outperformed other existing spam filters on three out of four datasets in terms of accuracy. They observed that DBB-RDNN-ReL performed exceptionally well on imbalanced and non-linear spam datasets with an AUC of 0.961. One limitation of this study is that the processing time required for the DBB-RDNN-ReL method may not be fast enough for some real-time applications. Their future research aimed to explore the method’s performance in multi-objective optimization. Table 1 provides the summary of papers in 2018.

3.3. Research Papers Published in 2019

Fang, Yong, et al. [26] suggested an innovative RCNN-based model of phishing email detection, THEMIS, with an attention mechanism and multi-level vectors. The authors analyzed the email structure and modeled the emails at the word, character, header, and body levels. Their work utilized a dataset with an unbalanced ratio of legitimate to phishing emails. They claimed that their proposed method showed 99.848% overall accuracy with a low FPR of 0.043%. The authors intended to enhance the model to identify phishing emails that have no header and only an email body. Harikrishnan et al. [27] investigated the effect of random and time split pre-processing methods for classifying malicious or phishing URLs employing deep learning techniques such as DNN, RNN, and CNN and classical ML techniques such as NB, DT, AB, and RF. Their experimental results showed that time split with the DT classifier and tf-idf representation achieved 88.5% accuracy. They highlighted that the tf-idf representation was more effective than feature hashing and embedding. One of the limitations of this study is that they used a limited dataset in their evaluation. Their future work may aim to train the model with a larger volume of actual data by creating a scalable method to gather URL data from several sources.
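The difference between a random split and a time split can be shown in a few lines. In a time split, every training sample is older than every test sample, which mimics deployment on URLs that appear after training. The sample layout (dictionaries with a `first_seen` field) and the 80/20 ratio are assumptions made for this sketch and are not details reported by Harikrishnan et al.

```python
import random


def random_split(samples, train_fraction=0.8, seed=0):
    """Shuffle, then cut: training and test samples are mixed in time."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def time_split(samples, train_fraction=0.8):
    """Sort by timestamp, then cut: the test set is strictly the newest data."""
    ordered = sorted(samples, key=lambda s: s["first_seen"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical URL records; "first_seen" is a day index.
samples = [
    {"url": f"http://site{i}.example/", "first_seen": i, "label": i % 2}
    for i in range(10)
]

train, test = time_split(samples)
print([s["first_seen"] for s in test])
```

A time split usually gives a more conservative estimate than a random split because the model cannot see examples from the period it is tested on.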

Ali and Ahmed [28] proposed a hybrid approach to predict phishing websites using a DNN with genetic algorithm (GA)-based feature selection and weighting. To improve prediction accuracy, they employed the GA to discover the most important characteristics and optimum weights. The experimental results of their study showed that this hybrid approach achieved a higher classification accuracy of 95%. Their suggested approach provided a reliable solution for predicting phishing websites, and future research can explore using faster evolutionary algorithms (EAs) for more efficient performance. Ona et al. [29] investigated the feasibility of detecting phishing emails by analyzing their structural properties and the technical content used by phishers. Their study utilized the Agile Scrum methodology and the Matlab process tool for software implementation, which included machine learning, feature selection, and neural network algorithms. They claimed that their proposed model showed an average accuracy of 93.9%. In further studies, they planned to explore deep learning and Bayesian NNs for improved phishing detection. Nguyen, Nguyen, and Nguyen [30] investigated cutting-edge text classification algorithms in NLP to combat email phishing. The authors provided a paradigm for modeling emails at the word and sentence levels that combines hierarchical LSTM networks and attention mechanisms. They claimed that their suggested model showed a precision of 0.8934. The authors claimed that the proposed framework illustrates deep learning’s potential in solving cybersecurity concerns. The study highlights the importance of future research in this area.

Wei, Bo, et al. [31] suggested a multi-spatial CNN system for the detection of phishing. They implemented a Raspberry Pi prototype for real-time detection. Their proposed method achieved a true detection rate of 86.63% and a 30% reduction in execution time. Their study only considered using a single DL model and did not compare its performance to other SOTA methods. Their study suggests future work on a DL-based webpage-content phishing detection model for resource-constrained sensors. Vinayakumar et al. [32] suggested a framework for spam and phishing detection employing DL architectures and traditional ML algorithms. They conducted experiments using email and URL datasets with 1000 epochs and varied learning rates. They claimed that their proposed framework could detect malicious activities, but its performance could be improved by integrating sub-modules for DNS malware and log analysis, which will be a significant future direction. Yang, Zhao, and Zeng [33] presented a multidimensional feature approach to phishing detection using DL. Their proposed method involved character sequence feature extraction and quick classification. Their suggested method showed a 98.99% accuracy and a low 0.59% FPR. The limitation of their study was that it focused on a single type of phishing attack (website-based) and did not address other types of phishing attacks that may use different techniques and tactics. For further studies, the authors aim to improve the technique by using other deep learning methods to extract text and code features from web pages. Table 2 provides the summary of papers in 2019.
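Character sequence feature extraction usually begins by turning each URL into a fixed-length sequence of integer ids that a neural network can consume. The sketch below shows one common way to do this; the alphabet, the reserved ids, and the maximum length are assumptions for illustration and are not the settings used in [33].

```python
import string

# Assumed alphabet: lowercase letters, digits, and characters common in URLs.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}  # 0 = padding, 1 = unknown


def encode_url(url, max_len=32):
    """Map a URL to a fixed-length list of character ids (truncate or zero-pad)."""
    ids = [CHAR_TO_ID.get(ch, 1) for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


print(encode_url("http://a.b", max_len=12))
```

The resulting id sequences are typically fed to an embedding layer followed by convolutional or recurrent layers.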

3.4. Research Papers Published in 2020

Saha et al. [34] proposed a DL-based model to detect phishing web pages. Their suggested model showed promising results with a 95% training accuracy and 93% test accuracy. They explained that the gap between training and test accuracy was narrow, meaning that the presented model could effectively recognize unfamiliar webpages. They showed that the detection accuracy for legitimate websites was 98.4%, outperforming existing phishing detection systems. The authors aimed to improve the model’s ability to detect phishing websites in future studies by adding more layers in the NNs and using more advanced NNs, such as a backpropagation NN. Thapa et al. [35] suggested a phishing email detection system combining RCNN and BERT. Their study analyzed the performance of the developed methods in the detection of phishing emails under several settings such as asymmetrical and balanced data scalability and distribution. They showed that global RNN-based model accuracy decreased by 1.8%, while BERT accuracy rose by 0.6% when increasing organizational counts. They highlighted that the overall performance of their introduced method improved with an increase in the email dataset but was affected by highly asymmetric data distribution. In the future, the authors planned to enhance their suggested framework for phishing email detection by considering highly asymmetrical email dataset distributions and studying the impact of increasing organizational counts on performance.

Adebowale, Lwin, and Hossain [36] proposed a novel approach, IPDS, to differentiate phishing and legitimate URLs by combining CNN and LSTM techniques. They developed the IPDS classifier using a DL approach to analyze pictures and text comprehensively. Their suggested approaches were tested against a dataset of 1,000,000 URLs, 13,000 characteristics, and 10,000 pictures gathered from authentic and phishing websites. Although this dataset provided a comprehensive analysis of phishing URLs, ensuring the availability and quality of such datasets would be a challenge. The experimental results of their study showed that IPDS achieved a 93.28% classification accuracy. In the future, the authors want to improve the accuracy of IPDS and create a web browser plugin that utilizes DL to detect online phishing in real time. Alotaibi, Al-Turaiki, and Alakeel [37] introduced the CNNPD, a CNN-based framework for email phishing detection. They used a huge dataset of legitimate and phishing emails. Their suggested method showed promising performance with an accuracy of 99.42%. In the future, the authors intend to optimize the hyperparameters, explore more deep-learning architectures, and evaluate the models on a larger dataset.

Baccouche et al. [38] suggested a multi-label model of LSTM that combined two different datasets for spam and fraud detection in emails and social media posts. The authors used a shared dataset of common bigrams from different datasets to train their developed system. Their proposed model showed an accuracy of 92.7%. One limitation of their study was not comparing its results with other cutting-edge techniques for malicious text detection. The authors intended to evaluate other NLP techniques to boost their model’s precision in the future. Soon et al. [39] examined the NNs usage in phishing detection. They evaluated deep learning neural network (DLNN) and ensemble feed-forward neural network (EFFNN) on the CSDMC2010 SPAM dataset using 18 unique features. Their experimental results showed that the EFFNN outperformed the DLNN with an accuracy of 94.41%. One limitation of this study could be the limited size and diversity of the dataset used for evaluation. In future work, the authors want to explore other machine-learning techniques and test the NNs on a larger and more diverse dataset to improve phishing detection further.

Alauthman [40] suggested a GRU-RNN and SVM approach for botnet spam email detection. The Spambase dataset was used in this work. The author claimed that the developed method showed an accuracy of 98.7%. The research had a limited scope since it utilized only one dataset to evaluate the performance of the suggested model. The author concluded that the proposed approach demonstrated strong capability in detecting spam emails; however, future directions include exploring other multiclass classifiers to enhance the impact of the GRU model. Eryılmaz, Sahin, and Kilic [41] presented a hybrid DL system to detect spam emails in a Turkish dataset. They used a dataset of 800 emails, with half classified as spam. They employed the Keras deep learning library and the LSTM algorithm in this work. They claimed that their suggested model achieved 100% accuracy in detecting Turkish spam emails. They observed that the use of the “adam” optimizer and activation functions, including relu and tanh, also contributed to the success of the Keras system. One limitation of this study was that it evaluated the suggested method using only a limited dataset of 800 emails, half of which were classified as spam. In future studies, the researchers plan to further improve the results by conducting experiments using other deep learning algorithms, activation functions, and optimizer functions.

Halgas, Agrafiotis, and Nurse [42] proposed a novel automated phishing email detection system using RNNs. The results of their study showed superior performance over traditional expert feature selection methods and highlighted the potential of considering overlooked email information. The proposed RNN solution outperformed state-of-the-art systems and was easily generalized to basic spam email classification. This study opens up prospects for further automated phishing threat mitigation. Işık et al. [43] provided an alternative solution to email classification using deep learning architectures applied to two feature selection methods, MI and WMI. The system involved feature reduction, feature vector construction with BoW, and performance analysis using ANN, LSTM, and BILSTM models. Their experimental results showed 100% accuracy with LSTM and BILSTM combined with MI or WMI for the Turkish language, with WMI having higher cross-validation performance. Their research only focused on the Turkish language. In future work, they want to combine MI with other feature selection methods for email classification. Kim Soon et al. [39] compared the performance of three AI methods, namely feed-forward NN, RNN, and ensemble NN, in detecting phishing emails. The results showed that the ensemble neural network outperformed the other two with slightly better accuracy. Their study was based on a limited dataset, which may not represent the diversity and complexity of phishing attacks in real-world scenarios. Further investigation could focus on improving the accuracy of these AI methods or exploring other AI models for phishing email detection.
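Mutual-information-based feature selection over a bag-of-words representation can be sketched with the textbook definition of MI between a term's presence and the class label. This is a generic illustration on toy data, not the pipeline of [43]; the weighted variant (WMI) is not shown because its definition is not given in this survey.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI in bits between the presence of `term` and the class label."""
    n = len(docs)
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    px = Counter(term in doc for doc in docs)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_terms(docs, labels, k):
    """Keep the k vocabulary terms with the highest MI score."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: mutual_information(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Toy corpus: each document is a set of tokens (bag-of-words presence).
docs = [{"win", "prize"}, {"prize", "now"}, {"meeting", "now"}, {"meeting", "notes"}]
labels = ["spam", "spam", "ham", "ham"]

print(select_terms(docs, labels, k=2))
```

Terms that appear equally often in both classes, such as "now" in this toy corpus, score zero and are dropped first, which reduces the dimensionality of the feature vectors passed to the classifier.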

AlEroud and Karabatis [44] presented a GAN-based approach for generating URL-driven phishing examples that could fool black-box phishing detectors. Their system was evaluated using actual phishing datasets and showed promising results in deceiving both simple and sophisticated ML-based phishing detection models. Their study was based on a controlled environment and may not accurately reflect the real-world situation where attackers constantly evolve their techniques to bypass detection. They aimed to analyze the suggested approach against graph-based phishing detection techniques for a comprehensive evaluation in the future. Castillo et al. [45] discussed using ML, DL, and NLP to identify phishing content in emails. The authors tested various neural network designs using word embedding representations to distinguish suspicious messages. Their experimental results showed that backpropagation with and without RNN layers outperformed existing techniques. Their study only focused on limited datasets. Future work calls for more training datasets, as well as the evaluation of different messaging genres and languages.

Kumar, Chatterjee, and Díaz [46] developed a hybrid phishing detection strategy that combined SVM classification and feature extraction. They observed that probabilistic NN would more accurately distinguish spam from genuine emails with the selected features. Their presented method works on a small range of phishing and non-phishing emails. Their future work aims to improve the system by expanding the dataset with a wider range of phishing and non-phishing emails to reflect real-world situations better and adapt to evolving phishing tactics. Their ultimate goal is to develop a practical system for widespread use in organizations and for individuals to prevent phishing attacks. Opara, Wei, and Chen [47] introduced HTMLPhish, a DL-based method for automatic phishing webpage classification. Their proposed method utilized CNN algorithms to learn from the HTML document’s textual contents without the need for human feature engineering. They claimed that their proposed approach achieved over 93% accuracy on a dataset of 50,000 HTML documents. Their proposed method is limited to HTML document analysis; future work is needed to compare it with other models. Their future research will compare the results with other models that use feature engineering and plan to turn their work into a browser extension. Table 3 provides the summary of papers in 2020.

3.5. Research Papers Published in 2021

AbdulNabi and Yaseen [48] examined the use of word embedding for spam email classification. The authors fine-tuned a pre-trained BERT model for this task, with results compared to a DNN model and classic classifiers such as naïve Bayes and k-NN. Their proposed strategy achieved a 98.67% accuracy and 98.66% F1 score on two open-source datasets. One of the limitations of this study is that the authors used a sparse input sequence length, which was limited by GPU memory. The survey conducted by Otter et al. [49] offers a concise but comprehensive introduction to the field of deep learning in computational linguistics. It begins by providing a swift overview of various deep learning architectures and methodologies. The survey then delves into the vast array of recent studies, summarizing numerous noteworthy contributions. These contributions encompass core linguistic processing topics and an array of applications within the field of computational linguistics. It concludes by presenting recommendations for potential avenues of future research, highlighting the evolving nature of the field and the need for ongoing exploration and innovation. Alhogail and Alsabih [50] proposed a deep-learning-based phishing email classifier model. Their suggested model applied GCN and NLP to enhance phishing detection on the email body text. Their suggested classifier achieved 98.2% accuracy with a 0.015 low FPR. This study demonstrated the effectiveness of their suggested classifier in detecting the phishing emails employing body text. The authors only worked on the English language and plan to add non-English datasets in the future. Bagui et al. [51] suggested a novel approach for identifying phishing emails. They used deep semantic analysis to extract the intrinsic properties of the email text body. They employed a one-hot encoding approach with DL and ML algorithms to classify the emails. 
The authors compared various parameters and hyperparameters of DL models and presented results of ML models such as DT, NB, and SVM, as well as DL models like CNN and LSTM. They claimed that CNN with word embedding was the most effective model, with an accuracy of 96.34%. They could focus on further improving the accuracy while reducing computation time as a future direction of their study.

Lee et al. [52] proposed a comprehensive and multi-modular system named D-Fence to detect phishing emails. They used structure, text, and URL modules to detect phishing in various email components. D-Fence successfully detected phishing attacks with 99% accuracy. They reported that D-Fence maintained a high detection rate while saving considerable computational time. Their study relied on multiple modules for phishing detection, which would increase the complexity and cost of the system, making it less practical for real-world deployment. The researchers aimed to enhance the accuracy and performance of D-Fence by incorporating user feedback into the model in future studies. Manaswini and Srinivasu [53] developed an email phishing detection model named Themis. Their presented model analyzed the email structure by combining the email header and body. They applied a CBOW Multi-level Word2Vec mechanism and an improvised RCNN algorithm with an attention approach. Their suggested model achieved a 99.87% overall accuracy with a low FPR of 0.042%. The limitation of this study is that it only focused on analyzing the email structure and did not consider other factors, such as the sender’s reputation and behavior, that could contribute to phishing detection. The authors plan to improve their model’s accuracy further by incorporating additional features, such as email structure and sender reputation.

Ghaleb et al. [54] developed a spam detection system (SDS) through the combination of multilayer perceptron (MLP) and six different variants of enhanced evolutionary algorithms. The authors aimed to train the MLP with evolutionary algorithms to identify emails as non-spam or spam. Their suggested model was evaluated on three datasets: UK-2011 Webspam, SpamBase, and SpamAssassin. Their experiments showed that E2GOAMLP achieved the best classification accuracies of 95.6%, 96.9%, and 98.1% and detection rates of 96.6%, 97.2%, and 97.8% on the UK-2011 Webspam, SpamBase, and SpamAssassin datasets, respectively. Their spam detection study was inadequate because it had no specified way to choose features. To improve it, they intended to use fewer features and a combination of different classifiers. Eckhardt and Bagui [55] compared the performance of CNN and LSTM models for textual data classification. They showed that LSTM achieved the highest accuracy of 98.32% and a 96.57% ROC score. Their comparison is limited to textual data classification. They claimed that the Adam optimizer outperformed the SGD optimizer for both models. They also observed that the ReLU activation function performed best with CNN, while the sigmoid activation function performed better on average with LSTM. Sheneamer [56] compared DL algorithms such as LSTM and CNN, with and without GloVe, to classify non-spam and spam email messages. They trained the models on email data with automatic feature extraction. They claimed that CNN with GloVe achieved the best accuracy of 96.52%. They used a limited dataset in their study, which might not represent the entire population of emails. Future work includes incorporating image data in deep learning classifiers to improve results.

Dubey et al. [57] utilized data and web mining to uncover patterns and extract textual information from web pages. They applied this to a phishing system aimed at detecting malicious messages susceptible to terrorism and diverting them to a spam folder. Their suggested system was beneficial for law enforcement to raise awareness among the public and track those involved in terrorism. The results showed the effectiveness of using data and web mining in detecting harmful messages and providing a solution to prevent the spread of terrorism. Their proposed method may not be feasible to implement the proposed system of phishing detection in real time, as the data and web mining is time-consuming and computationally expensive. Further research can focus on improving the system’s accuracy and expanding its scope to other forms of online malicious activities. Table 4 provides the summary of papers in 2021.

3.6. Research Papers Published in 2022

Samarthrao and Rohokale [58] developed an innovative model for detecting email spam in order to improve cybersecurity. Their model included several phases: dataset acquisition, feature extraction, optimal feature selection, and detection. They performed optimal feature selection using evolutionary algorithms. The experimental results of their study showed that the suggested approach had 12.24% better accuracy than KNN, 14.93% better accuracy than DT, 12.24% better accuracy than NN, and 10.32% better accuracy than SVM. They concluded that some misclassification still existed despite the advantages of textual and visual features. This will be addressed in future work by developing advanced deep learning models and incorporating more image datasets. Dewis and Viana [59] explored the issue of phishing and spam in their research and proposed a hybrid machine learning solution named Phish Responder. Their suggested solution combined DL and NLP algorithms to detect spam and phishing emails. They claimed that their suggested solution showed 99% average accuracy for the text-based datasets using LSTM and 94% for numerical-based datasets using MLP. They observed that Phish Responder was statistically significantly better than existing solutions through comparison and an independent t-test. They used a limited dataset in their work. They concluded that future work might involve combining deep learning methods and improvements on Phish Responder’s text-based technique to make it statistically significant.

Khan et al. [60] proposed a new fuzzy-logic-based evaluation metric. Their suggested methods conducted a preliminary empirical analysis using BERT and LSTM models from deep learning and three benchmark datasets. Their experiments showed that LSTM performed better for the Enron and PU datasets, with μO values ranging from 0.88 to 0.96, while BERT had better values ranging from 0.94 to 0.96 for the Lingspam dataset. Their paper presented a promising approach to evaluating the performance of email phishing detection models using a multi-criteria fuzzy logic-based method and can be further expanded in the future. Malhotra and Malik [61] investigated the different techniques for identifying spam and legitimate emails using ML and DL classifiers. They conducted several tests and claimed that the BiLSTM classifier achieved the highest accuracy of 98.5% and an F1-measure of 96%. The authors concluded that future research should extend the presented techniques’ applicability to other domains, such as e-commerce and job-profile-based websites that are easily vulnerable to email phishing. They also recommended exploring the possibility of implementing the classifier in real time and developing a smartphone application that lets users quickly detect fake information.

Korkmaz et al. [62] introduced TshPhish, a hybrid model to detect phishing attacks. They used a large URL and content dataset, including 51,316 legitimate and 36,173 phishing links from PhishTank. They utilized a fivefold cross-validation method to evaluate URL-based, content-based, and TshPhish models. The experimental findings of their study showed that the URL-based GCNN model achieved 97.68% accuracy, while the DNN model in the content-based model achieved 93.39% accuracy. They claimed that their proposed method TshPhish showed the highest accuracy of 98.37%. Their research attempted to enhance the dataset size and improve the feature selection process through evolutionary algorithms. Zhu et al. [63] introduced a phishing detection model named CCBLA. Their proposed method combined CNN, bi-directional LSTM, and an attention mechanism. They conducted several experiments on two datasets. They claimed that their suggested method showed a total accuracy of 99.85% in detecting phishing attacks with minimal time consumption. One limitation of this study is that the presented model is too complex and computationally intensive. Their future work will focus on efficient methods to learn features from URLs as phishing attacks continuously change. Nooraee and Ghaffari [64] introduced a deep learning approach using a combination of an LSTM neural network and the GloVe word embedding method to detect spam emails. Their proposed model showed accuracies of 98.39% and 99.49% on two different datasets. The authors stated that in the future, they would train the model on a bigger dataset to enhance accuracy and investigate its applicability in identifying spam in other languages.
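As an illustration of the fivefold cross-validation protocol mentioned above, the sketch below splits a dataset of the size reported for [62] (51,316 legitimate plus 36,173 phishing links) into five disjoint test folds. The function name, the fixed seed, and the shuffling strategy are assumptions for illustration; the original study may have split its data differently (for example, with stratification by class).

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Dataset size taken from the counts reported above.
n_urls = 51_316 + 36_173

splits = list(k_fold_indices(n_urls, k=5))
test_sizes = [len(test_idx) for _, test_idx in splits]
```

Each sample appears in exactly one test fold, so every model is evaluated on data it was not trained on in that round, and the five per-fold scores are typically averaged to give the reported accuracy.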

Prosun, Alam, and Bhowmik [65] discussed two voting architectures based on ML models and ensemble classifiers. The authors investigated the performance of numerous ensemble approaches and individual classifiers utilizing various feature retrieval algorithms. Their proposed ML-based voting model performed well, with an accuracy of 98%. Their future work will include implementing other benchmark datasets and comparing them with other feature extraction models such as seq2seq and word2vec. Jafar et al. [66] presented an innovative method for identifying phishing URLs by employing the GRU. They claimed that their suggested method was a highly accurate and fast phishing classifier with an accuracy of 98.30%. Their study focused on detecting phishing attacks during the COVID-19 crisis, which would limit its generalizability to other types of crises or phishing scenarios. For further studies, the authors aim to expand its scope and improve accuracy and efficiency with advanced machine-learning approaches. Quang Do et al. [67] reviewed the use of DL for phishing detection. They analyzed the limitations and strengths of current methods and discussed the challenges faced by deep learning in this field. They conducted an empirical review to examine the effectiveness of several deep learning algorithms in real-world settings. To address these challenges, they aimed to optimize parameters and incorporate less explored DL techniques, such as GANs or DRLs, in future studies. In their study, Min-Gang Zhou and colleagues [68] introduced a novel quantum neural network model that harnesses the power of artificial intelligence, quantum communication, and quantum computing for enhanced security measures. Their innovative approach employs classical control over single-qubit operations and measurements on real-world quantum systems, mitigating the challenges associated with environmental decoherence.
The results demonstrate exceptional nonlinear classification capabilities and noise resilience, expanding the potential of quantum computing beyond conventional applications and paving the way for the earlier development of quantum neural computers.

Rafat et al. [69] discussed the text pre-processing impact on email classification using ML and DL algorithms. They used the Spamassassin corpus and compared the results of ML and DL algorithms, with and without text pre-processing. They observed that DL algorithms consistently outperformed ML models, as LSTM achieved the precision of 95.26%, recall of 97.18%, and F1-score of 96% without text pre-processing. Their proposed study only focused on the limitations of ML and DL algorithms in detecting encrypted communication and did not explore other possible solutions to tackle this issue. Their future directions include broader email content analysis, deploying spam filters near main servers, and extending email spam filtering beyond textual context. Rathee and Mann [70] explored the use of both DL and ML in detecting phishing emails. They analyzed different ML and DL models introduced over the past few years to identify phishing emails. The study discussed the issue and future directions and showed that DL techniques, like CNN and RNNs, need to be employed in detecting phishing emails. Despite the constant upgrade of countermeasures, phishing emails are increasing, requiring more advanced detection technology. However, this sector lacks tools and resources, necessitating more study to evaluate DL approaches for detecting phishing emails.
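Because precision, recall, F1-score, and FPR recur throughout the reviewed studies, a short worked example may help relate them. The sketch below applies the standard definitions; it recomputes the F1-score from the precision and recall reported for LSTM in [69], and the confusion-matrix counts in the second part are hypothetical values chosen only for illustration.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return _ratio(2 * precision * recall, precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics, treating phishing/spam as the positive class."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # also called detection rate or TPR
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fpr": _ratio(fp, fp + tn),  # legitimate emails wrongly flagged
        "f1": f1_from(precision, recall),
    }


# Precision and recall as reported above for LSTM without pre-processing.
reported_f1 = f1_from(0.9526, 0.9718)

# Hypothetical counts for 1000 emails (95 phishing, 905 legitimate).
example = classification_metrics(tp=90, fp=10, fn=5, tn=895)
```

The recomputed F1 is approximately 0.962, which agrees with the rounded 96% stated above. The hypothetical counts also show why accuracy alone can mislead on imbalanced data: accuracy is 98.5% even though one in ten flagged emails is a false alarm.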

Mughaid et al. [71] introduced a detection system employing DL techniques that utilize features such as email text and other characteristics to identify an email as phishing or non-phishing. They claimed that their suggested method showed accuracies of 0.97, 0.88, and 1.00 using a boosted decision tree on three different datasets. Their proposed method cannot effectively handle modern phishing techniques. However, in the future, they want to improve the feature selection techniques to better handle the modern phishing techniques used by phishers. The authors recommended developing an automated tool to extract new characteristics and enhance the detection of phishing emails. Butt et al. [72] conducted a study to identify phishing emails employing different classification algorithms, SVM, NB, and LSTM, on a modified dataset with various data sizes and features. Their experimental results showed that SVM achieved 99.62% accuracy, NB achieved 97% accuracy, and LSTM achieved 98% accuracy. The authors suggested that the methodology could be improved by combining phishing and non-phishing emails to establish a unique dataset that reflects real-life scenarios where fraudsters continuously evolve their techniques. In the future, their proposed framework could be implemented across organizations and used confidentially to protect customers from phishing attacks.

Logavarshini and Yogalakshmi [73] analyzed email structure and introduced a novel phishing detection model. They used an improved RCNN with attention mechanisms and multilevel vectors. Their proposed method examined emails at multiple levels, including the header, body, character, and words. Their suggested method is limited to detecting phishing emails with headers. The authors claimed that their model achieved high accuracy in detecting phishing emails. The authors intend to improve the algorithm for identifying phishing emails without a header in future work. Ghaleb et al. [74] developed a wrapper approach based on the EGOA algorithm for MLP training and an evolutionary algorithm for feature extraction to improve the SDS performance. They tested the proposed system using UK-2011, SpamBase, and SpamAssassin datasets and showed better results than other established practices by up to 96.4%, 97.5%, and 98.3%, respectively. Their introduced method is limited to detecting spam; future work is needed to detect other cyber-attacks. Babu [75] proposed a method of phishing detection in emails employing a multi-convolutional NN fusion to classify phishing and legitimate URLs. Their approach validated the URLs without accessing the content. Their experimental results showed the potential for constructing a robust security defense against attackers. Their suggested work is limited to URL validation without accessing content; future work is needed to compare with other ML and DL techniques. Shmalko et al. [76] evaluated the effectiveness of the Profiler against an ML-ensemble strategy using cutting-edge techniques on 9000 legitimate and 900 phishing email datasets. The experimental results of their study showed that the Profiler’s horizontal method resulted in 30% fewer false positives and 25% fewer false negatives than the ML-ensemble approach. 
Their study also demonstrated the Profiler’s ability to handle concept drift by testing it on a dataset of 3300 file-sharing phishing emails. Their work is limited to email phishing detection, and future work is needed to handle concept drift. The summary of papers is shown in Table 5.

3.7. Research Papers Published in 2023

Muralidharan and Nissim [77] presented a method for completely automated phishing email detection by analyzing email segments employing deep ensemble learning. Their suggested framework eliminated the need for manual feature engineering and outperformed existing methods with an AUC of 0.993 and TPR of 5%. Future research will focus on incorporating federated learning for privacy preservation. Bountakas and Xenakis [78] proposed a new phishing email detection technique named HELPHED. They combined methods of ensemble learning with hybrid features that integrated content and textual traits of emails. They used two methods in their work: the first was stacking ensemble learning, and the second was soft voting ensemble learning. Their proposed method achieved superior results in thoroughly evaluating a rich imbalanced dataset, yielding an F1-score of 0.9942. The authors concluded that further research is necessary to evaluate the effectiveness of the proposed methodology in detecting phishing emails, using a larger and more diverse dataset.
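The difference between soft voting and majority (hard) voting, as used in ensemble approaches like those above, can be sketched in a few lines. This is a generic illustration and not the HELPHED implementation: the function names, the 0.5 threshold, and the three example probabilities are assumptions, and stacking (which trains a meta-classifier on the base models' outputs) is not shown.

```python
def soft_vote(phish_probs, weights=None, threshold=0.5):
    """Average the base models' phishing probabilities, then threshold."""
    if weights is None:
        weights = [1.0] * len(phish_probs)
    avg = sum(w * p for w, p in zip(weights, phish_probs)) / sum(weights)
    return int(avg >= threshold), avg


def hard_vote(phish_probs, threshold=0.5):
    """Each base model casts one vote; a strict majority decides."""
    votes = sum(p >= threshold for p in phish_probs)
    return int(votes * 2 > len(phish_probs))


# Hypothetical outputs of three base classifiers for one email.
probs = [0.90, 0.40, 0.45]

soft_label, soft_score = soft_vote(probs)
hard_label = hard_vote(probs)
```

With these example values, soft voting labels the email as phishing (average probability about 0.58) while hard voting labels it as legitimate (only one of three models votes phishing). Soft voting lets one confident model outweigh two uncertain ones, which is one reason it is often preferred when the base models output calibrated probabilities.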

Wen et al. [79] suggested an innovative LSTM-FCN and BP NN-based phishing scam account detection model named LBPS for detecting phishing scams in blockchain financial security. Their suggested model combined the BP NN for implicit feature analysis and the LSTM-FCN NN for temporal feature analysis of transaction records. The experimental results of their study using Ethereum data showed that the selected features effectively identified phishing scam accounts with a 97.86% F1-score. Their future research aimed to enhance the dataset with more authoritative sources for improved generalization and extend the LBPS model’s application to identify phishing scams on other blockchain technologies such as Bitcoin. Table 6 provides the summary of papers in 2023.

4. Results and Analysis

4.1. Findings of Data Analysis

This subsection summarizes how the papers for this review were collected and screened. The initial dataset consisted of 223 papers, of which 62 were deemed irrelevant and 35 were found to be duplicates. After removing these papers, the remaining dataset comprised 126 papers. Figure 2 shows the conference vs. journal distribution of our collected data.

Further analysis of the remaining 126 papers revealed that 55 were journal papers, 33 were conference papers, 10 were reviews, 19 were workshops, and 9 were of low quality. The papers that were categorized as reviews, workshops, and low-quality conferences were subsequently discarded, leaving a total of 88 papers that were deemed suitable for the literature review process. Of these 88 papers, 55 were journal papers and 33 were conference papers. Looking at the 55 selected journal papers, the data showed that 12 papers were published in 2022, followed by 6 papers in 2021, and 5 papers in 2020. The years 2019 and 2018 each had six papers, while 2017 and 2016 each had four papers, 2015 had three papers, and 2014 and 2013 each had two papers. Only one paper each was selected from 2011 and 2007, as shown in Figure 3.
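The screening arithmetic described above can be checked with a few lines of code. The counts are taken directly from the text; the variable names are chosen only for this illustration.

```python
# Counts reported in the text for the paper selection process.
identified = 223
irrelevant = 62
duplicates = 35

# Step 1: remove irrelevant and duplicate papers.
after_screening = identified - irrelevant - duplicates

# Step 2: categorize what remains.
categories = {
    "journal": 55,
    "conference": 33,
    "review": 10,
    "workshop": 19,
    "low_quality": 9,
}

# Step 3: keep only journal and conference papers for the review.
excluded = categories["review"] + categories["workshop"] + categories["low_quality"]
included = after_screening - excluded

print(f"after screening: {after_screening}, excluded: {excluded}, included: {included}")
```

The category counts sum to the 126 papers that remained after screening, and excluding the 38 reviews, workshop papers, and low-quality papers leaves the 88 papers used in the review.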

On the other hand, the data show that six conference papers were published in 2022, followed by five papers in 2020 and four papers in 2019. The years 2018 and 2013 each had three papers, and 2017, 2016, 2015, and 2014 each had two papers. Only one paper each was selected from 2008 and 2002, as shown in Figure 4.

4.2. Limitations Found

The literature review on phishing email detection using deep learning revealed several limitations that need to be addressed in future research. This section discusses these limitations and their implications for the proposed phishing detection models. One of the significant limitations is the lack of focus on privacy preservation in the proposed models. While the models aim to detect phishing emails, they may also reveal sensitive user information. Therefore, future research should focus on preserving user privacy in phishing detection models. Another limitation is the misclassification of phishing emails, which indicates that the models are not yet accurate enough. The models may incorrectly classify legitimate emails as phishing emails, or vice versa. This can lead to false positives and negatives, which can harm user trust in the models. Therefore, researchers need to address this limitation by improving the accuracy of the models. Moreover, the studies focus only on analyzing the email structure and do not consider other factors, such as the sender’s reputation and behavior. Therefore, researchers need to explore how to incorporate additional email features.

Another limitation of the literature review is the limited data used in the studies. The studies did not provide a detailed explanation of the dataset employed, which makes it difficult to evaluate the models’ performance. Therefore, future research should include a more comprehensive dataset and provide a detailed explanation of the dataset’s characteristics. The studies also suggest that the models can be further expanded in the future, indicating that there is room for improvement. Researchers need to explore new approaches to phishing detection, optimize the feature selection mechanism using evolutionary algorithms, and increase the dataset size to enhance the models’ efficiency. The studies also indicate that the models have minimal time consumption, making them ideal for real-time phishing detection. However, the models are limited to one language, which reduces their effectiveness in a multilingual context. Therefore, future research should explore how to make the models more language-agnostic.

The studies also suggest that more benchmark datasets and feature extraction models are needed to improve the models’ performance. This requires researchers to focus on collecting and analyzing diverse datasets, including different types of phishing attacks. Furthermore, the studies focus on detecting phishing attacks during the COVID-19 crisis, limiting their generalizability to other types of crises or phishing scenarios. Therefore, future research should explore how the models can be applied in different contexts. The studies also focus only on text-based spam email detection, which ignores other types of phishing attacks. Researchers need to develop models that can detect different types of phishing attacks, including image-based and voice-based attacks.

Moreover, the studies lack tools and resources, requiring further research to develop better phishing detection models. Therefore, researchers need to focus on developing better tools and resources for phishing detection. Another limitation is the models’ inability to handle modern phishing techniques effectively. This suggests that researchers need to focus on developing models that can adapt to evolving phishing techniques. The studies are also limited to detecting phishing emails with headers, which reduces their effectiveness in detecting sophisticated phishing attacks that may not include headers. Therefore, researchers need to develop models that can detect phishing emails without headers. Furthermore, the studies focus only on email phishing detection, ignoring other types of cyber-attacks. Researchers need to develop models that can detect different types of cyber-attacks, including malware and ransomware attacks.

Some studies are also limited to URL validation without accessing content, which reduces their effectiveness in detecting phishing attacks that use sophisticated techniques. Therefore, future research should compare the proposed models with other machine learning and deep learning techniques to enhance their efficiency. Additionally, the studies are limited to supervised learning and tested only with an English corpus. Future research should explore other machine learning techniques and evaluate the models’ performance with diverse datasets in different languages. The studies also rely on multiple modules for phishing detection, increasing the complexity and cost of the system. Researchers need to develop simpler models that are more practical for real-world deployment.

4.3. Future Direction

Based on the limitations outlined in the literature review, the following are some potential future directions for research in the field of phishing email detection using deep learning.

4.3.1. Privacy Preservation

Future research should focus on incorporating privacy preservation techniques into phishing email detection systems to ensure that sensitive user information is not compromised. By combining strong encryption, user consent mechanisms, and data anonymization, we can improve the security of email communications while protecting personal data from unauthorized access and phishing attacks. This comprehensive approach can help create a safer digital environment for everyone globally.
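As one small illustration of the data anonymization idea mentioned above, the sketch below replaces email addresses in a message with salted hash tokens before the text is stored or used for training. This is an assumption-laden example rather than a method from the reviewed studies: the regular expression, the token format, and the salt are hypothetical, and salted hashing is strictly pseudonymization, which on its own does not guarantee anonymity.

```python
import hashlib
import re

# Simplified address pattern (an assumption; real address syntax is broader).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize_addresses(text, salt):
    """Replace each email address with a stable, salted hash token."""

    def replace(match):
        address = match.group(0).lower()  # same mailbox maps to the same token
        digest = hashlib.sha256((salt + address).encode("utf-8")).hexdigest()
        return f"<addr:{digest[:12]}>"

    return EMAIL_RE.sub(replace, text)


# Hypothetical message and salt.
message = (
    "From: alice@example.com\n"
    "To: bob@example.org\n"
    "Please reply to Alice@Example.com today."
)
redacted = pseudonymize_addresses(message, salt="demo-salt")
tokens = re.findall(r"<addr:[0-9a-f]{12}>", redacted)
```

Because the same address always maps to the same token, a detector can still learn sender-level patterns (for example, repeated senders) without seeing the actual addresses.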

4.3.2. Increasing Dataset Size and Optimizing Feature Selection

To improve the performance of phishing email detection systems, we must increase the dataset size and optimize feature selection. Future research should focus on expanding the dataset to include more phishing email variations and using evolutionary algorithms to intelligently select the most informative features. This will make these crucial cybersecurity tools more efficient and accurate.

4.3.3. Broader Email Content Analysis

Future research could focus on expanding beyond text-based spam email detection to include broader email content analysis, deploying spam filters near main servers, and extending email spam filtering beyond textual context. To improve spam detection and filtering, we need to incorporate advanced machine learning techniques to identify spam based on multimedia content, such as images and videos. We also need to develop new strategies to counter evolving spam tactics, such as deepfakes and voice-based spam.

4.3.4. Handling Modern Phishing Techniques

Phishing techniques are constantly evolving and remain a persistent challenge in cybersecurity. As cybercriminals become more sophisticated, future research should focus on developing phishing email detection systems capable of identifying and blocking these increasingly complex threats. This proactive approach is essential to protect individuals and organizations from falling victim to the ever-evolving phishing attack landscape.

4.3.5. Handling Concept Drift

Future work is needed to handle concept drift and ensure that the phishing email detection system remains effective over time. Concept drift is a major challenge in phishing email detection. We need to develop adaptive models and algorithms that can continuously learn and evolve to detect new phishing tactics and patterns as they emerge. This research will lead to more robust and reliable email security solutions.
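A very simple way to notice concept drift in a deployed detector is to monitor its recent error rate against a baseline. The sketch below is a minimal heuristic under stated assumptions and not one of the established drift detection algorithms: the class name, the window size, the tolerance, and the simulated error stream are all hypothetical.

```python
from collections import deque


class DriftMonitor:
    """Flag drift when the recent error rate exceeds the baseline by a margin."""

    def __init__(self, window=100, tolerance=0.10):
        self.window = window
        self.tolerance = tolerance
        self.baseline = None  # error rate of the first full window
        self.recent = deque(maxlen=window)

    def update(self, was_error):
        """Record one prediction outcome; return True if drift is suspected."""
        self.recent.append(1 if was_error else 0)
        if len(self.recent) < self.window:
            return False  # not enough evidence yet
        rate = sum(self.recent) / self.window
        if self.baseline is None:
            self.baseline = rate
            return False
        return rate - self.baseline > self.tolerance


# Simulated outcomes: 5% errors at first, then 25% errors after tactics change.
stream = [i % 20 == 0 for i in range(100)] + [i % 4 == 0 for i in range(200)]

monitor = DriftMonitor(window=100, tolerance=0.10)
alarms = [monitor.update(outcome) for outcome in stream]
```

In this simulation the monitor stays silent during the first 100 predictions and raises an alarm once enough of the higher-error outcomes have entered the sliding window. In practice, such an alarm would trigger retraining or incremental updating of the model on recent labeled emails.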

4.3.6. Consideration of Additional Factors

Consideration of additional factors in phishing detection is crucial for enhancing overall security measures. Future research should not only focus on email structure analysis but also delve into aspects like the sender’s reputation and past behavior, as these factors can significantly influence the effectiveness of anti-phishing strategies.

4.3.7. Comparison with State-of-the-Art Techniques

Comparison with state-of-the-art techniques is a crucial step in evaluating a proposed model's effectiveness in malicious text detection. Future work should include a comprehensive comparison of each model's results against other state-of-the-art techniques, thereby validating its performance and competitiveness in identifying malicious content in text data. This comparative analysis will help ascertain a model's true potential and guide further refinements toward more robust and accurate detection methods.
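The studies summarized in this review report different measures (accuracy, precision, recall, F1-score, TPR, AUC), which makes direct comparison difficult. A fair comparison computes the same set of measures for every model on the same test set. The sketch below shows the standard definitions from confusion-matrix counts; the function name `classification_metrics` and the example counts are illustrative assumptions, not values from any reviewed paper.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification measures, with phishing as the
    positive class. Undefined ratios (zero denominator) are reported as 0.0."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # recall is also the TPR
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical imbalanced test set: 950 legitimate and 50 phishing emails,
# scored by a degenerate model that labels every email as legitimate.
always_legitimate = classification_metrics(tp=0, fp=0, tn=950, fn=50)
```

In this hypothetical case the degenerate model reaches 95% accuracy while detecting no phishing emails at all (recall of zero), which is why accuracy alone is a weak basis for comparison on imbalanced datasets.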

4.3.8. Hyperparameter Optimization and More Deep Learning Architectures

To further improve deep learning model performance, future research should explore new architectures, in addition to optimizing hyperparameters. This will push the boundaries of neural network design and innovation, enabling models to learn more effectively and solve more complex problems. By combining advanced architectures with effective hyperparameter tuning, the field of deep learning can continue to evolve and achieve new levels of success in a wide range of applications.
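As an illustration of what hyperparameter optimization involves, the sketch below runs an exhaustive grid search over a small search space. It is a minimal example under stated assumptions: `grid_search` is a hypothetical helper, and `toy_validation_score` is a stand-in for the expensive step of training a model and measuring its validation accuracy; neither comes from the reviewed studies.

```python
import itertools


def grid_search(search_space: dict, evaluate) -> tuple:
    """Evaluate every combination of hyperparameter values and return
    (best_params, best_score), where higher scores are better."""
    names = list(search_space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params: dict) -> float:
    """Stand-in for 'train the model and return validation accuracy'.
    By construction it peaks at learning_rate=0.001, hidden_units=128."""
    return (1.0
            - abs(params["learning_rate"] - 0.001) * 100
            - abs(params["hidden_units"] - 128) / 1000)


space = {"learning_rate": [0.01, 0.001, 0.0001], "hidden_units": [64, 128, 256]}
best_params, best_score = grid_search(space, toy_validation_score)
```

Grid search scales poorly as the number of hyperparameters grows, so random search or Bayesian optimization is often preferred for deep learning models; the loop structure stays the same, with only the way candidate settings are proposed changing.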

4.3.9. Real-Time Dataset and Processing

Future research should aim to develop a real-time dataset and processing system that can detect phishing emails as they arrive. Such a system could defend individuals and organizations preemptively against constantly evolving phishing threats and enable a faster, more effective response to cyberattacks. It has the potential to significantly reduce the risks and consequences of phishing incidents.

4.3.10. Exploration of Other Machine Learning Techniques

Exploring other machine learning techniques is a crucial avenue for further research, as it could lead to better email classification. To improve the performance and accuracy of email management systems, future work should also integrate a broader range of email features. This diversified approach to machine learning could lead to even more effective email management tools.

4.3.11. Incorporating Additional Data Sources

Phishing email detection systems can be made more accurate and effective by using more data sources. By including a wider range of data, such as user behavior patterns and threat intelligence feeds, organizations can significantly improve their ability to find and stop phishing attacks, making their cybersecurity more resilient.

4.3.12. Enriching the Dataset

Enriching the email security dataset is essential for improving the accuracy and capabilities of the model. By increasing the size and diversity of the dataset, the model can better distinguish between different types of emails, including spam, ham, and phishing emails. This enhancement will significantly improve the model’s real-world performance in email security and filtering, making email communication safer and more efficient for users.
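Before enriching a dataset, it helps to quantify how unevenly its classes are represented, since several of the reviewed studies list imbalanced data as a limitation. The sketch below reports each class's share of the data together with an inverse-frequency weight, computed as n_samples / (n_classes × class_count), a common heuristic for weighting rare classes during training. The function name `class_balance` and the example label counts are hypothetical.

```python
from collections import Counter


def class_balance(labels: list) -> dict:
    """Return, for each class, its share of the dataset and an
    inverse-frequency weight n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {
        label: {
            "share": count / n_samples,
            "weight": n_samples / (n_classes * count),
        }
        for label, count in counts.items()
    }


# Hypothetical corpus: 80 ham, 15 spam, and 5 phishing emails.
labels = ["ham"] * 80 + ["spam"] * 15 + ["phishing"] * 5
balance = class_balance(labels)
```

A large spread between the weights signals which classes most need additional samples; here the phishing class, with the smallest share, receives the largest weight.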

4.3.13. Exploring Attackers’ Behavior and Modus Operandi

Exploring attackers’ behavior and modus operandi is a critical area of research in cybersecurity. By closely examining the tactics, techniques, and procedures employed by malicious actors, we can gain valuable insights into their strategies and enhance our defenses. This is especially important in email security, where phishing attacks are prevalent. Focusing on elements such as the Message-ID header field, which attackers often manipulate, can help us design more robust and effective phishing email detection systems. Such research can ultimately strengthen our ability to thwart cyber threats and protect sensitive information.
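As a concrete illustration of a Message-ID-based feature, the sketch below uses Python's standard `email` package to compare the domain in the Message-ID header with the domain of the From address. This is a minimal example under stated assumptions, not a technique taken from the reviewed studies: the function name `message_id_domain_mismatch` and the sample messages are hypothetical, and a mismatch is only a weak signal, because legitimate mail sent through third-party services often shows one as well.

```python
from email import message_from_string
from email.utils import parseaddr


def message_id_domain_mismatch(raw_email: str) -> bool:
    """Return True when the Message-ID domain differs from the From domain
    (subdomains are accepted), or when either header is missing or malformed."""
    msg = message_from_string(raw_email)
    _, from_addr = parseaddr(msg.get("From", ""))
    from_domain = from_addr.rpartition("@")[2].lower() if "@" in from_addr else ""
    message_id = msg.get("Message-ID", "").strip().strip("<>")
    id_domain = message_id.rpartition("@")[2].lower() if "@" in message_id else ""
    if not from_domain or not id_domain:
        return True  # treat missing or malformed headers as suspicious
    same = id_domain == from_domain or id_domain.endswith("." + from_domain)
    return not same


# Hypothetical sample messages for illustration only.
consistent = ("From: Alice <alice@example.com>\n"
              "Message-ID: <123@mail.example.com>\n"
              "Subject: Hi\n\nBody")
inconsistent = ("From: Bank Support <support@bank.example>\n"
                "Message-ID: <abc@evil.test>\n"
                "Subject: Urgent\n\nBody")
```

In practice such a check would be one input feature among many for a learned classifier rather than a stand-alone rule.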

4.3.14. Testing on Other Domains

Future research should test proposed models on other domains to ensure they are effective across different email accounts. This broader evaluation will help confirm that a model can reliably categorize emails from a wide range of email accounts, and thus its real-world applicability. Researchers can improve the performance and adaptability of email classification models by exploring various domains, assessing potential biases, and adapting the models to address unique challenges. This approach would also make the models more robust and dependable for a broader user base.

Overall, many avenues exist for future research in phishing email detection using deep learning. Addressing the limitations outlined in the literature review will be critical to improving the performance and practicality of these systems. Moreover, a network relying on quantum encryption and quantum digital signatures can not only bolster network security but also assist in identifying phishing emails [80]. This is another direction for future investigation.

5. Conclusions

In conclusion, this literature review has identified several limitations and future directions for phishing email detection using deep learning. While deep learning has shown promising results in detecting phishing emails, there is room for improvement. Key limitations include reliance on limited datasets that are not clearly described, persistent misclassification, high time consumption, restriction to a single language, and a lack of the tools and resources required for research.

To overcome these limitations, future research can focus on privacy preservation in phishing email detection, increasing dataset size, optimizing feature selection mechanisms using evolutionary algorithms, and expanding the research to other languages and types of crises. More benchmark datasets and feature extraction models are needed to improve the efficiency of these systems. Furthermore, to enhance the detection rate of phishing emails, it is necessary to deploy spam filters near main servers, extend email spam filtering beyond textual context, and broaden email content analysis. Modern phishing techniques need to be handled effectively, and concept drift should be addressed in future work. The limited input sequence lengths and unbalanced datasets used in some studies can also be improved, and the gap between training and test accuracy needs to be reduced.

Moreover, future research can explore other machine learning techniques or incorporate additional email features to improve performance. Deep learning architectures need to be enhanced, and evaluation on larger datasets is needed. The performance of proposed models should also be compared with other state-of-the-art techniques for malicious text detection. The detection rate of phishing emails can be further improved by adding sub-modules for DNS log analysis and malware analysis.

In summary, although this literature review has highlighted several limitations and future directions in detecting phishing emails using deep learning, these limitations can be addressed through further research. The proposed improvements can help to enhance the accuracy and efficiency of phishing email detection systems, ultimately providing better cybersecurity and protection for individuals and organizations against phishing attacks.

Author Contributions

Conceptualization, K.T., M.L.A. and M.A.O.; methodology, K.T., M.L.A. and M.A.O.; validation, K.T., M.A.O. and A.K.; formal analysis, K.T., M.A.O. and A.K.; investigation, M.A.O. and A.K.; resources, A.K.; writing—original draft preparation, K.T., M.L.A. and M.A.O.; writing—review and editing, M.A.O.; visualization, A.K.; supervision, K.T.; project administration, M.A.O. and M.L.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Alshingiti, Z.; Alaqel, R.; Al-Muhtadi, J.; Haq, Q.E.U.; Saleem, K.; Faheem, M.H. A Deep Learning-Based Phishing Detection System Using CNN, LSTM, and LSTM-CNN. Electronics 2023, 12, 232.
[2] Tsohou, A.; Diamantopoulou, V.; Gritzalis, S.; Lambrinoudakis, C. Cyber insurance: State of the art, trends and future directions. Int. J. Inf. Secur. 2023, 22, 737–748.
[3] Sheng, S.; Wardman, B.; Warner, G.; Cranor, L.; Hong, J.; Zhang, C. An Empirical Analysis of Phishing Blacklists. In Proceedings of the Sixth Conference on Email and Anti-Spam, Mountain View, CA, USA, 16–17 July 2009.
[4] Edge, M.E.; Sampaio, P.R.F. A survey of signature based methods for financial fraud detection. Comput. Secur. 2009, 28, 381–394.
[5] Safi, A.; Singh, S. A systematic literature review on phishing website detection techniques. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 590–611.
[6] Aldawood, H.; Skinner, G. An Advanced Taxonomy for Social Engineering Attacks. Int. J. Comput. Appl. 2020, 177, 1–11.
[7] Aleroud, A.; Zhou, L. Phishing environments, techniques, and countermeasures: A survey. Comput. Secur. 2017, 68, 160–196.
[8] Kocher, G.; Kumar, G. Machine learning and deep learning methods for intrusion detection systems: Recent developments and challenges. Soft Comput. 2021, 25, 9731–9763.
[9] Chen, D.; Wawrzynski, P.; Lv, Z. Cyber security in smart cities: A review of deep learning-based applications and case studies. Sustain. Cities Soc. 2021, 66, 102655.
[10] Adebowale, M.A.; Lwin, K.T.; Hossain, M.A. Deep learning with convolutional neural network and long short-term memory for phishing detection. In Proceedings of the 2019 13th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), Island of Ulkulhas, Maldives, 26–28 August 2019; pp. 1–8.
[11] Thomas, B.; Ciliska, D.; Dobbins, M.; Micucci, S. A Process for Systematically Reviewing the Literature: Providing the Research Evidence for Public Health Nursing Interventions. Worldviews Evid.-Based Nurs. 2004, 1, 176–184.
[12] Nosseir, A.; Nagati, K.; Taj-Eddin, I. Intelligent word-based spam filter detection using multi-neural networks. Int. J. Comput. Sci. Issues (IJCSI) 2013, 10 Pt 1, 17.
[13] Almomani, A.; Gupta, B.B.; Wan, T.C.; Altaher, A.; Manickam, S. Phishing dynamic evolving neural fuzzy framework for online detection zero-day phishing email. Indian J. Sci. Technol. 2013, 6, 3960–3964.
[14] Hamid, I.R.A.; Abawajy, J.; Kim, T.H. Using feature selection and classification scheme for automating phishing email detection. Stud. Inform. Control. 2013, 22, 61–70.
[15] Jameel, N.G.M.; George, L.E. Detection of phishing emails using feed forward neural network. Int. J. Comput. Appl. 2013, 77, 10–15.
[16] Soni, A.N. Spam e-mail detection using advanced deep convolution neural network algorithms. J. Innov. Dev. Pharm. Tech. Sci. 2019, 2, 74–80.
[17] Zhang, N.; Yuan, Y. Phishing Detection Using Neural Network. Available online: http://cs229.stanford.edu/proj2012/ZhangYuan-PhishingDetectionUsingNeuralNetwork.pdf (accessed on 1 October 2023).
[18] Kufandirimbwa, O.; Gotora, R. Spam detection using artificial neural networks (perceptron learning rule). Online J. Phys. Environ. Sci. Res. 2012, 1, 22–29.
[19] Abu-Nimeh, S.; Nappa, D.; Wang, X.; Nair, S. A comparison of machine learning techniques for phishing detection. In Proceedings of the Anti-Phishing Working Groups 2nd Annual eCrime Researchers Summit, Pittsburgh, PA, USA, 4–5 October 2007; pp. 60–69.
[20] Chandan, C.J.; Chheda, H.P.; Gosar, D.M.; Shah, H.R.; Bhave, P.U. A Machine learning approach for detection of phished websites using neural networks. Int. J. Recent Innov. Trends Comput. Commun. 2014, 2, 4205–4209.
[21] Alkaht, I.J.; Al Khatib, B. Filtering SPAM Using Several Stages Neural Networks. Int. Rev. Comput. Softw. (IRECOS) 2016, 11, 123–132.
[22] Coyotes, C.; Mohan, V.S.; Naveen, J.; Vinayakumar, R.; Soman, K.P.; Verma, A.D.R. ARES: Automatic rogue email spotter. In Proceedings of the 1st AntiPhishing Shared Pilot at 4th ACM International Workshop on Security and Privacy Analytics (IWSPA), Tempe, AZ, USA, 1–11 March 2018.
[23] Smadi, S.; Aslam, N.; Zhang, L. Detection of online phishing email using dynamic evolving neural network based on reinforcement learning. Decis. Support Syst. 2018, 107, 88–102.
[24] Hiransha, M.; Unnithan, N.A.; Vinayakumar, R.; Soman, K.; Verma, A.D.R. Deep learning based phishing e-mail detection. In Proceedings of the 1st AntiPhishing Shared Pilot at 4th ACM International Workshop on Security and Privacy Analytics (IWSPA), Tempe, AZ, USA, 1–11 March 2018; pp. 1–5.
[25] Barushka, A.; Hajek, P. Spam filtering using integrated distribution-based balancing approach and regularized deep neural networks. Appl. Intell. 2018, 48, 3538–3556.
[26] Fang, Y.; Zhang, C.; Huang, C.; Liu, L.; Yang, Y. Phishing Email Detection Using Improved RCNN Model With Multilevel Vectors and Attention Mechanism. IEEE Access 2019, 7, 56329–56340.
[27] Harikrishnan, N.B.; Vinayakumar, R.; Soman, K.P.; Poornachandran, P. Time split based pre-processing with a data-driven approach for malicious URL detection. Cybersecur. Secur. Inf. Syst. Chall. Solut. Smart Environ. 2019, 43–65.
[28] Ali, W.; Ahmed, A.A. Hybrid intelligent phishing website prediction using deep neural networks with genetic algorithm-based feature selection and weighting. IET Inf. Secur. 2019, 13, 659–669.
[29] Oña, D.; Zapata, L.; Fuertes, W.; Rodríguez, G.; Benavides, E.; Toulkeridis, T. Phishing attacks: Detecting and preventing infected e-mails using machine learning methods. In Proceedings of the 2019 3rd Cyber Security in Networking Conference (CSNet), IEEE, Quito, Ecuador, 23–25 October 2019; pp. 161–163.
[30] Nguyen, M.; Nguyen, T.; Nguyen, T.H. A deep learning model with hierarchical LSTMs and supervised attention for anti-phishing. CEUR Workshop Proc. 2018, 2124, 29–38.
[31] Wei, B.; Hamad, R.A.; Yang, L.; He, X.; Wang, H.; Gao, B.; Woo, W.L. A deep-learning-driven light-weight phishing detection sensor. Sensors 2019, 19, 4258.
[32] Vinayakumar, R.; Soman, K.P.; Poornachandran, P.; Akarsh, S.; Elhoseny, M. Deep learning framework for cyber threat situational awareness based on email and URL data analysis. In Cybersecurity and Secure Information Systems: Challenges and Solutions in Smart Environments; Springer: Berlin/Heidelberg, Germany, 2019; pp. 87–124.
[33] Yang, P.; Zhao, G.; Zeng, P. Phishing Website Detection Based on Multidimensional Features Driven by Deep Learning. IEEE Access 2019, 7, 15196–15209.
[34] Saha, I.; Sarma, D.; Chakma, R.J.; Alam, M.N.; Sultana, A.; Hossain, S. Phishing attacks detection using deep learning approach. In Proceedings of the 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), IEEE, Tirunelveli, India, 20–22 August 2020; pp. 1180–1185.
[35] Thapa, C.; Tang, J.W.; Abuadbba, A.; Gao, Y.; Camtepe, S.; Nepal, S.; Almashor, M.; Zheng, Y. Evaluation of Federated Learning in Phishing Email Detection. Sensors 2023, 23, 4346.
[36] Adebowale, M.A.; Lwin, K.T.; Hossain, M.A. Intelligent phishing detection scheme using deep learning algorithms. J. Enterp. Inf. Manag. 2020, 36, 747–766.
[37] Alotaibi, R.; Al-Turaiki, I.; Alakeel, F. Mitigating email phishing attacks using convolutional neural networks. In Proceedings of the 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS), IEEE, Riyadh, Saudi Arabia, 19–21 March 2020; pp. 1–6.
[38] Baccouche, A.; Ahmed, S.; Sierra-Sosa, D.; Elmaghraby, A. Malicious text identification: Deep learning from public comments and emails. Information 2020, 11, 312.
[39] Soon, G.K.; On, C.K.; Rusli, N.M.; Fun, T.S.; Alfred, R.; Guan, T.T. Comparison of simple feedforward neural network, recurrent neural network and ensemble neural networks in phishing detection. J. Phys. Conf. Ser. 2020, 1502, 012033.
[40] Alauthman, M. Botnet Spam E-Mail Detection Using Deep Recurrent Neural Network. Int. J. Emerg. Trends Eng. Res. 2020, 8, 1979–1986.
[41] Eryılmaz, E.E.; Şahin, D.Ö.; Kılıç, E. Filtering Turkish spam using LSTM from deep learning techniques. In Proceedings of the 2020 8th International Symposium on Digital Forensics and Security, ISDFS, IEEE, Beirut, Lebanon, 1–2 June 2020; pp. 1–6.
[42] Halgaš, L.; Agrafiotis, I.; Nurse, J.R. Catching the Phish: Detecting phishing attacks using recurrent neural networks (RNNs). In Proceedings of the Information Security Applications: 20th International Conference, WISA 2019, Jeju Island, Republic of Korea, 21–24 August 2019; pp. 219–233.
[43] Isik, S.; Kurt, Z.; Anagun, Y.; Ozkan, K. Recurrent Neural Networks for Spam E-mail Classification on an Agglutinative Language. Int. J. Intell. Syst. Appl. Eng. 2020, 8, 221–227.
[44] AlEroud, A.; Karabatis, G. Bypassing detection of URL-based phishing attacks using generative adversarial deep neural networks. In Proceedings of the Sixth International Workshop on Security and Privacy Analytics, New Orleans, LA, USA, 18 March 2020; pp. 53–60.
[45] Castillo, E.; Dhaduvai, S.; Liu, P.; Thakur, K.S.; Dalton, A.; Strzalkowski, T. Email threat detection using distinct neural network approaches. In Proceedings of the First International Workshop on Social Threats in Online Conversations: Understanding and Management, Marseille, France, 11–16 May 2020; pp. 48–55.
[46] Kumar, A.; Chatterjee, J.M.; Díaz, V.G. A novel hybrid approach of SVM combined with NLP and probabilistic neural network for email phishing. Int. J. Electr. Comput. Eng. (IJECE) 2020, 10, 486–493.
[47] Opara, C.; Wei, B.; Chen, Y. HTMLPhish: Enabling phishing web page detection by applying deep learning techniques on HTML analysis. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8.
[48] AbdulNabi, I.; Yaseen, Q. Spam Email Detection Using Deep Learning Techniques. Procedia Comput. Sci. 2021, 184, 853–858.
[49] Otter, D.W.; Medina, J.R.; Kalita, J.K. A Survey of the Usages of Deep Learning for Natural Language Processing. IEEE Trans. Neural Networks Learn. Syst. 2020, 32, 604–624.
[50] Alhogail, A.; Alsabih, A. Applying machine learning and natural language processing to detect phishing email. Comput. Secur. 2021, 110, 102414.
[51] Bagui, S.; Nandi, D.; Bagui, S.; White, R.J. Machine learning and deep learning for phishing email classification using one-hot encoding. J. Comput. Sci. 2021, 17, 610–623.
[52] Lee, J.; Tang, F.; Ye, P.; Abbasi, F.; Hay, P.; Divakaran, D.M. D-Fence: A flexible, efficient, and comprehensive phishing email detection system. In Proceedings of the 2021 IEEE European Symposium on Security and Privacy (EuroS&P), IEEE, Vienna, Austria, 7–11 September 2021; pp. 578–597.
[53] Manaswini, M.; Srinivasu, D.N. Phishing Email Detection Model using Improved Recurrent Convolutional Neural Networks and Multilevel Vectors. Ann. Rom. Soc. Cell Biol. 2021, 25, 16674–16681.
[54] Ghaleb, S.A.A.; Mohamad, M.; Fadzli, S.A.; Ghanem, W.A.H.M. Training Neural Networks by Enhance Grasshopper Optimization Algorithm for Spam Detection System. IEEE Access 2021, 9, 116768–116813.
[55] Eckhardt, R.; Bagui, S. Convolutional Neural Networks and Long Short Term Memory for Phishing Email Classification. Int. J. Comput. Sci. Inf. Secur. 2021, 19, 27–35.
[56] Sheneamer, A. Comparison of Deep and Traditional Learning Methods for Email Spam Filtering. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 560–565.
[57] Dubey, K.A.; Ganesh, K.B.; Gowtham, V.; Balakrishnan, M.D. Phishing email detection. Int. J. Emerg. Technol. Comput. Sci. Electron. (IJETCSE) 2021, 28, 1–4.
[58] Samarthrao, K.V.; Rohokale, V.M. Enhancement of email spam detection using improved deep learning algorithms for cyber security. J. Comput. Secur. 2022, 30, 231–264.
[59] Dewis, M.; Viana, T. Phish Responder: A Hybrid Machine Learning Approach to Detect Phishing and Spam Emails. Appl. Syst. Innov. 2022, 5, 73.
[60] Khan, S.A.; Iqbal, K.; Mohammad, N.; Akbar, R.; Ali, S.S.A.; Siddiqui, A.A. A Novel Fuzzy-Logic-Based Multi-Criteria Metric for Performance Evaluation of Spam Email Detection Algorithms. Appl. Sci. 2022, 12, 7043.
[61] Malhotra, P.; Malik, S. Spam Email Detection Using Machine Learning and Deep Learning Techniques. In Proceedings of the International Conference on Innovative Computing & Communication (ICICC), Delhi, India, 24 June 2022.
[62] Korkmaz, M.; Koçyiğit, E.; Şahingöz, Ö.; Diri, B. A Hybrid Phishing Detection System by Using Deep Learning-Based URL and Content Analysis. Elektron. Ir Elektrotechnika 2022, 28, 80–89.
[63] Zhu, E.; Yuan, Q.; Chen, Z.; Li, X.; Fang, X. CCBLA: A Lightweight Phishing Detection Model Based on CNN, BiLSTM, and Attention Mechanism. Cogn. Comput. 2022, 15, 1320–1333.
[64] Nooraee, M.; Ghaffari, H. Optimization and Improvement of Spam Email Detection Using Deep Learning Approaches. J. Comput. Robot. 2022, 15, 61–70.
[65] Prosun, P.R.K.; Alam, K.S.; Bhowmik, S. Improved Spam Email Filtering Architecture Using Several Feature Extraction Techniques. In Proceedings of the International Conference on Big Data, IoT, and Machine Learning: BIM 2021, Cox’s Bazar, Bangladesh, 23–25 September 2021; Springer: Singapore, 2021; pp. 665–675.
[66] Jafar, M.T.; Al-Fawa’reh, M.; Barhoush, M.; Alshira’H, M.H. Enhanced Analysis Approach to Detect Phishing Attacks During COVID-19 Crisis. Cybern. Inf. Technol. 2022, 22, 60–76.
[67] Do, N.Q.; Selamat, A.; Krejcar, O.; Herrera-Viedma, E.; Fujita, H. Deep Learning for Phishing Detection: Taxonomy, Current Challenges and Future Directions. IEEE Access 2022, 10, 36429–36463.
[68] Zhou, M.-G.; Liu, Z.-P.; Yin, H.-L.; Li, C.-L.; Xu, T.-K.; Chen, Z.-B. Quantum Neural Network for Quantum Neural Computing. Research 2023, 6, 0134.
[69] Rafat, K.F.; Xin, Q.; Javed, A.R.; Jalil, Z.; Ahmad, R.Z. Evading obscure communication from spam emails. Math. Biosci. Eng. 2021, 19, 1926–1943.
[70] Rathee, D.; Mann, S. Detection of E-Mail Phishing Attacks – using Machine Learning and Deep Learning. Int. J. Comput. Appl. 2022, 183, 1–7.
[71] Mughaid, A.; AlZu’bi, S.; Hnaif, A.; Taamneh, S.; Alnajjar, A.; Abu Elsoud, E. An intelligent cyber security phishing detection system using deep learning techniques. Clust. Comput. 2022, 25, 3819–3828.
[72] Butt, U.A.; Amin, R.; Aldabbas, H.; Mohan, S.; Alouffi, B.; Ahmadian, A. Cloud-based email phishing attack using machine and deep learning algorithm. Complex Intell. Syst. 2022, 9, 3043–3070.
[73] Logavarshini, G.; Yogalakshmi, S. E-Mail Spam Classification Via Deep Learning and Natural Language Processing. Int. J. Res. Publ. Rev. 2022, 2582, 7421.
[74] Ghaleb, S.A.A.; Mohamad, M.; Ghanem, W.A.H.M.; Nasser, A.B.; Ghetas, M.; Abdullahi, A.M.; Saleh, S.A.M.; Arshad, H.; Omolara, A.E.; Abiodun, O.I. Feature Selection by Multiobjective Optimization: Application to Spam Detection System by Neural Networks and Grasshopper Optimization Algorithm. IEEE Access 2022, 10, 98475–98489.
[75] Babu, D.K. Phishing Detection in Emails Using Multi-Convolutional Neural Network Fusion. Ph.D. Thesis, National College of Ireland, Dublin, Ireland, 2022.
[76] Shmalko, M.; Abuadbba, A.; Gaire, R.; Wu, T.; Paik, H.Y.; Nepal, S. Profiler: Profile-Based Model to Detect Phishing Emails. arXiv 2022, arXiv:2208.08745.
[77] Muralidharan, T.; Nissim, N. Improving malicious email detection through novel designated deep-learning architectures utilizing entire email. Neural Networks 2023, 157, 257–279.
[78] Bountakas, P.; Xenakis, C. HELPHED: Hybrid Ensemble Learning PHishing Email Detection. J. Netw. Comput. Appl. 2023, 210, 103545.
[79] Wen, T.; Xiao, Y.; Wang, A.; Wang, H. A novel hybrid feature fusion model for detecting phishing scam on Ethereum using deep neural network. Expert Syst. Appl. 2023, 211, 118463.
[80] Liu, Z.-P.; Zhou, M.-G.; Liu, W.-B.; Li, C.-L.; Gu, J.; Yin, H.-L.; Chen, Z.-B. Automated machine learning for secure key rate in discrete-modulated continuous-variable quantum key distribution. Opt. Express 2022, 30, 15024–15036.

Figure 1. Paper organization.

Figure 2. Percentage of conference vs. journal papers.

Figure 3. Year-wise distribution of journal papers.

Figure 4. Year-wise distribution of conference papers.

Table 1. Summary of research papers published in 2018.

Ref	Method	Data	Result	Innovations	Limitations
[22]	CNN, MLP, RNN	Self-generated emails dataset	Accuracy: 93.1%	Highlighted issues related to imbalanced data	Highly imbalanced nature of the dataset
[23]	NN	SpamAssassin	Accuracy: 99.07%	Provided guidelines to improve offline data	Needed to enrich the offline dataset to enhance model performance
[24]	CEN-Deepspam	Self-generated emails dataset	Accuracy: 95.5%	Larger dataset could improve accuracy	Additional dataset required to validate the result
[25]	DBB-RDNN-ReL	Enron, SpamAssassin, SMS Spam Collection	Accuracy: 96.1%	DBB-RDNN-ReL model outperformed other models	Slow processing

Table 2. Summary of research papers published in 2019.

Ref	Method	Data	Result	Innovations	Limitations
[26]	THEMIS	Enron and SpamAssassin	Accuracy: 99.85%	Utilized unbalanced dataset	Limited to detecting phishing emails with header
[27]	NB, DT, AB, RF, DNN, RNN, CNN	PhishTank	Accuracy: 88.5%	Tf-idf presentation is better than feature hashing and embedding	Limited real-time dataset
[28]	DNN	UCI phishing websites	Accuracy: 95%	Hybrid model performs better for classification	Feature selection requires longer time
[29]	NN	Debian and PhishTank	Accuracy: 93.9%	Better accuracy	Limited use of deep learning
[30]	LSTM	Data-no-header and data-full-header	Accuracy: 89.34%	-	Low effectiveness
[31]	Multi-spatial CNN	Self-generated emails dataset	Accuracy: 86.63%	30% reduction in the execution time	Did not compare model’s performance with other state-of-the-art methods
[32]	CNN, RNN, CNN-RNN, CNN-LSTM	Spam dataset. URL dataset	Recall: 99%	Better performance in detecting malware	Performance could be improved by adding sub-modules
[33]	CNN, RNN, LSTM, CNN-RNN	Self-generated emails dataset	Accuracy: 98.99%	High accuracy and low FPR	Focused on a single type of phishing attack

Table 3. Summary of research papers published in 2020.

Ref	Method	Data	Result	Innovations	Limitations
[36]	IPDS	URLs	Accuracy: 93.28%	Novel approach to differentiate phishing and legitimate URLs	Ensuring the availability of the dataset would be challenging
[37]	CNN	PhishingCorpus and SpamAssassin	Accuracy: 99.42%	Used a huge dataset to detect phishing emails	Used a smaller dataset
[38]	Multi-label LSTM	Self-generated emails dataset	Accuracy: 92.7%	Used combined dataset	No comparison of the results
[40]	GRU-RNN+SVM	Spambase dataset	Accuracy: 98.7%	Claimed higher accuracy	Limited to one dataset
[41]	LSTM+Keras	800 Turkish emails dataset	Accuracy: 100%	Proposed hybrid model	Limited dataset
[42]	RNNs	SA-JN and En-JN datasets	Accuracy: 98.91% and 96.74%	Outperformed state-of-the-art systems	Unrealistically hard
[43]	ANN, LSTM, and BILSTM	Self-generated Turkish emails dataset	Accuracy: 100%	Highest accuracy	Focused on the Turkish language only
[44]	GAN-based	PhishTank and MillerSmiles	TPR: 97%	Used an actual phishing dataset	Controlled environment
[45]	ML, DL, NLP	Enron, APWG	Accuracy: 93%	-	Limited dataset
[46]	SVM combined with NLP and PNN	Self-generated emails dataset	Accuracy: 89%	Probabilistic NN would be more accurate in phishing detection	Only works on a small phishing dataset
[47]	CNN	HTML documents	Accuracy: 93%	Automatic phishing web page detection	Limited to HTML document analysis

Table 4. Summary of research papers published in 2021.

Ref	Method	Data	Result	Innovations	Limitations
[50]	GCN+NLP	Self-generated email body text dataset	Accuracy: 98.2%	Enhance phishing detection on the email body text	Tested only English corpus
[51]	CNN and LSTM	Self-generated emails dataset	Accuracy: 96.34%	CNN with word embedding is most accurate	Tested only English corpus
[52]	D-Fence	Self-generated emails dataset	Accuracy: 99%	D-Fence maintained a high detection rate	Relied on multiple modules
[53]	Themis	Self-generated emails dataset	Accuracy: 99.87%	Combined email head and body	Focused only on analyzing the email structure
[54]	MLP	SpamBase, SpamAssassin, UK-2011 Webspam	Accuracy: 98.1%	Used several datasets and features	Spam detection study is inadequate
[55]	CNN and LSTM	Two datasets	Accuracy: 98.3%	Adam optimizer outperformed the SGD optimizer	Comparison limited to textual data classification
[56]	CNN	Self-generated emails dataset	Accuracy: 96.52%	Automated features extraction	Limited datasets

Table 5. Summary of machine learning-based phishing detection research published in 2022.

Ref	Method	Data	Result	Innovations	Limitations
[58]	Fitness-oriented, Levy improvement-based Dragonfly	N/A	Accuracy: 14.93%	Better performance than DT, KNN, and SVM	Misclassification existed
[59]	DL+NLP	Text-based and numerical-based datasets	Accuracy: 99% (text-based) and 94% (numerical-based)	Phish Responder better than other models	Limited data used; no explanation on the dataset employed
[61]	ML and DL	N/A	Accuracy: 98.5%	BiLSTM classifier performed better	Dataset did not contain variety of spam emails
[62]	TshPhish	PhishTank	Accuracy: 98.37%	Improved feature selection through evolutionary algorithms	Low recall rate
[63]	CCBLA	Two datasets	Accuracy: 99.85%	Combined CNN, bi-directional LSTM, and attention mechanism	Huge time consumption
[64]	LSTM and Glove word embedding	Two datasets	Accuracy: 98.39% and 99.49%	Used multiple datasets	Limited to one language
[65]	ML-based voting model	N/A	Accuracy: 98%	Used various feature retrieval algorithms	Lack of benchmark datasets
[66]	GRU-based Phishing URL detection	Phishing URLs	Accuracy: 98.30%	Highly accurate classifier	Limited detection of phishing attacks during COVID-19
[67]	Deep learning	N/A	Accuracy: 92%	Incorporated less explored DL techniques	No details of empirical analysis
[69]	ML and DL	SpamAssassin	Precision: 95.26%, recall: 97.18%, F1-score: 96%	Focused on the limitations of ML and DL algorithms	Broader email content analysis
[71]	DL	Email text	Accuracy: 88–100%	-	Cannot effectively handle modern phishing techniques
[73]	RCNN	Email Structure	N/A	Examined emails at multiple levels, including the header, body, character, and words	Limited to detecting phishing emails with header
[74]	Multiobjective optimization	SpamBase, SpamAssassin, and UK-2011 datasets	Accuracy: 97.5%, 98.3%, and 96.4%	-	Limited to detecting spam

Table 6. Summary of research papers published in 2023.

Ref	Method	Data	Result	Innovations	Limitations
[77]	Deep ensemble learning	Email segments	AUC of 0.993 and TPR of 5%	Higher AUC result	Privacy preservation left for future work.
[78]	HELPHED	Imbalanced dataset	F1-score: 99.42%	Superior results on the imbalanced dataset	Focused on detection and did not address prevention or mitigation of attacks. The dataset was imbalanced.
[79]	LBPS	Ethereum data	F1-score: 97.86%	Phishing scam account detection model	Tested the LBPS model only on Ethereum data.
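
Tables 5 and 6 report a mix of accuracy, precision, recall, and F1-score, which makes direct comparison across studies difficult. As a minimal sketch, the standard F1 definition (the harmonic mean of precision and recall) can be used to check that a row's reported values are mutually consistent. The function name `f1_score` below is a hypothetical helper written for this illustration, not code from any of the surveyed studies; the inputs are the precision and recall listed for [69] in Table 5.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for [69] in Table 5, converted from percentages to fractions.
precision = 0.9526
recall = 0.9718

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 96.21%"
```

The computed value of about 96.2% agrees with the rounded F1-score of 96% given in the table. The same check cannot be applied to rows that report accuracy alone, since F1 cannot be recovered from accuracy without the underlying confusion matrix.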

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

