Machine Learning for Cybersecurity Threats, Challenges, and Opportunities III

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 9344

Special Issue Editors

Department of Software Engineering and Artificial Intelligence (DISIA), Faculty of Computer Science and Engineering, Office 431, Universidad Complutense de Madrid (UCM), 28040 Madrid, Spain
Interests: computer security; cyber security; privacy; information security; cryptography; intrusion detection; malware; trust; anonymity
Group of Analysis, Security and Systems (GASS), Universidad Complutense de Madrid (UCM), 28040 Madrid, Spain
Interests: computer and network security; multimedia forensics; error-correcting codes; information theory

Special Issue Information

Dear Colleagues,

Cybersecurity has become a major priority for every organization. The right controls and procedures must be put in place to detect potential attacks and protect against them. However, cyber-attacks will always outnumber the people trying to defend against them. New threats are discovered daily, making it harder for current solutions to cope with the large volume of data to be analyzed. Machine-learning systems can be trained to find attacks that are similar to known attacks. In this way, we can detect even the first intrusions of their kind and develop better security measures.

The sophistication of threats has also increased substantially. Sophisticated zero-day attacks may go undetected for months at a time. Attack patterns may be engineered to take place over extended periods of time, making them very difficult for traditional intrusion-detection technologies to detect. Even worse, new attack tools and strategies can now be developed using adversarial machine learning techniques, requiring a rapid co-evolution of defenses that matches the speed and sophistication of machine-learning-based offensive techniques. With this motivation, this Special Issue aims to provide a forum for people from academia and industry to communicate their latest results on theoretical advances and industrial case studies that combine machine learning techniques, such as reinforcement learning, adversarial machine learning, and deep learning, with significant problems in cybersecurity. Research papers may focus on offensive or defensive applications of machine learning to security. The potential topics of interest to this Special Issue are listed below. Submissions may present original research, rigorous dataset collection and benchmarking, or critical surveys.

Potential topics include, but are not limited to, the following:

  • Adversarial training and defensive distillation;
  • Attacks against machine learning;
  • Black-box attacks against machine learning;
  • Challenges of machine learning for cyber-security;
  • Ethics of machine learning for cyber-security applications;
  • Generative adversarial models;
  • Graph representation learning;
  • Machine-learning forensics;
  • Machine-learning threat intelligence;
  • Malware detection;
  • Neural graph learning;
  • One-shot learning; continuous learning;
  • Scalable machine learning for cyber security;
  • Steganography and steganalysis based on machine-learning techniques;
  • Strengths and shortcomings of machine learning for cyber-security.

Prof. Dr. Luis Javier García Villalba
Dr. Ana Lucila Sandoval Orozco
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (6 papers)

Research

22 pages, 6639 KiB  
Article
BlockMatch: A Fine-Grained Binary Code Similarity Detection Approach Using Contrastive Learning for Basic Block Matching
by Zhenhao Luo, Pengfei Wang, Wei Xie, Xu Zhou and Baosheng Wang
Appl. Sci. 2023, 13(23), 12751; https://doi.org/10.3390/app132312751 - 28 Nov 2023
Viewed by 713
Abstract
Binary code similarity detection (BCSD) plays a vital role in computer security and software engineering. Traditional BCSD methods heavily rely on specific features and necessitate rich expert knowledge, which are sensitive to code alterations. To improve the robustness against minor code alterations, recent research has shifted towards machine learning-based approaches. However, existing BCSD approaches mainly focus on function-level matching and face challenges related to large batch optimization and high quality sample selection at the basic block level. To overcome these challenges, we propose BlockMatch, a novel fine-grained BCSD approach that leverages natural language processing (NLP) techniques and contrastive learning for basic block matching. We treat instructions of basic blocks as a language and utilize a DeBERTa model to capture relative position relations and contextual semantics for encoding instruction sequences. For various operands in binary code, we propose a root operand model pre-training task to mitigate semantic missing of unseen operands. We then employ a mean pooling layer to generate basic block embeddings for detecting binary code similarity. Additionally, we propose a contrastive training framework, including a block augmentation model to generate high-quality training samples, improving the effectiveness of model training. Inspired by contrastive learning, we adopt the NT-Xent loss as our objective function, which allows larger sample sizes for model training and mitigates the convergence issues caused by limited local positive/negative samples. By conducting extensive experiments, we evaluate BlockMatch and compare it against state-of-the-art approaches such as PalmTree and SAFE. The results demonstrate that BlockMatch achieves a recall@1 of 0.912 at the basic block level under the cross-compiler scenario (pool size = 10), which outperforms PalmTree (0.810) and SAFE (0.798). Furthermore, our ablation study shows that the proposed contrastive training framework and root operand model pre-training task help our model achieve superior performance.
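
For readers unfamiliar with the NT-Xent objective named in the abstract above, the following is a minimal sketch of the general loss, not the authors' implementation. The function names, the toy two-dimensional embeddings, and the temperature value of 0.5 are illustrative assumptions; the paper's actual embeddings come from a DeBERTa encoder with mean pooling.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """Normalized temperature-scaled cross-entropy over N positive pairs.

    view_a[i] and view_b[i] are two embeddings of the same item (a positive
    pair); every other embedding in the batch acts as a negative.
    The temperature default is an assumption chosen for illustration.
    """
    n = len(view_a)
    embeddings = list(view_a) + list(view_b)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of item i
        # Scaled similarities to every embedding except the anchor itself.
        logits = {
            k: cosine(embeddings[i], embeddings[k]) / temperature
            for k in range(2 * n)
            if k != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


# Toy batch: two items, each with two slightly different views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.1], [0.1, 1.0]]
aligned = nt_xent_loss(view_a, view_b)
mismatched = nt_xent_loss(view_a, list(reversed(view_b)))
```

In this toy batch the correctly paired views yield a lower loss than the deliberately mismatched ones, which is the behavior a contrastive objective is meant to reward.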

22 pages, 6849 KiB  
Article
Leveraging Explainable Artificial Intelligence in Real-Time Cyberattack Identification: Intrusion Detection System Approach
by Xavier Larriva-Novo, Carmen Sánchez-Zas, Víctor A. Villagrá, Andrés Marín-Lopez and Julio Berrocal
Appl. Sci. 2023, 13(15), 8587; https://doi.org/10.3390/app13158587 - 26 Jul 2023
Cited by 1 | Viewed by 915
Abstract
Cyberattacks are part of the continuous race, where research in computer science both contributes to discovering new threats and vulnerabilities and also mitigates them. When new vulnerabilities are not reported but sold to attackers, they are called “zero-days,” and are particularly difficult to identify. Modern intrusion detection systems (IDS) that leverage artificial intelligence (AI) and machine learning (ML) are becoming essential in identifying these cyber threats. This study presents the design of an IDS using ML and Explainable AI (XAI) techniques for real-time classification of various detected cyberattacks. By utilizing frameworks such as Apache Kafka and Spark, along with libraries such as Scikit-learn and SHAP, the system identifies and classifies normal or anomalous network traffic in real-time. The XAI offers the IDS the option to explain the rationale behind each classification. The primary aim of this research is to develop a flexible and scalable IDS that can provide clear explanations for its decisions. The second aim is to compare and analyze different ML models to achieve the best results in terms of accuracy, f1, recall, and precision. Random Forest models proposed in this research article obtained the best results in figuring out the key features identified by the XAI model, which includes Ct_state_ttl, Sttl, Dmean, and Dbytes from the UNSW-NB15 dataset. Finally, this research work introduces different machine learning algorithms with superior performance metrics compared to other real-time classification methods.

13 pages, 1297 KiB  
Article
Malware API Calls Detection Using Hybrid Logistic Regression and RNN Model
by Abdulaziz Almaleh, Reem Almushabb and Rahaf Ogran
Appl. Sci. 2023, 13(9), 5439; https://doi.org/10.3390/app13095439 - 27 Apr 2023
Cited by 1 | Viewed by 2019
Abstract
Behavioral malware analysis is a powerful technique used against zero-day and obfuscated malware. Additionally referred to as dynamic malware analysis, this approach employs various methods to achieve enhanced detection. One such method involves using machine learning and deep learning algorithms to learn from the behavior of malware. However, the task of weight initialization in neural networks remains an active area of research. In this paper, we present a novel hybrid model that utilizes both machine learning and deep learning algorithms to detect malware across various categories. The proposed model achieves this by recognizing the malicious functions performed by the malware, which can be inferred from its API call sequences. Failure to detect these malware instances can result in severe cyberattacks, which pose a significant threat to the confidentiality, privacy, and availability of systems. We rely on a secondary dataset containing API call sequences, and we apply logistic regression to obtain the initial weight that serves as input to the neural network. By utilizing this hybrid approach, our research aims to address the challenges associated with traditional weight initialization techniques and to improve the accuracy and efficiency of malware detection based on API calls. The integration of both machine learning and deep learning algorithms allows the proposed model to capitalize on the strengths of each approach, potentially leading to a more robust and versatile solution to malware detection. Moreover, our research contributes to the ongoing efforts in the field of neural networks, by offering a novel perspective on weight initialization techniques and their impact on the performance of neural networks in the context of behavioral malware analysis. Experimental results using a balanced dataset showed 83% accuracy and a 0.44 loss, which outperformed the baseline model in terms of the minimum loss. The imbalanced dataset’s accuracy was 98%, and the loss was 0.10, which exceeded the state-of-the-art model’s accuracy. This demonstrates how well the suggested model can handle malware classification.

17 pages, 4418 KiB  
Article
A Study on Detection of Malicious Behavior Based on Host Process Data Using Machine Learning
by Ryeobin Han, Kookjin Kim, Byunghun Choi and Youngsik Jeong
Appl. Sci. 2023, 13(7), 4097; https://doi.org/10.3390/app13074097 - 23 Mar 2023
Cited by 5 | Viewed by 1673
Abstract
With the rapid increase in the number of cyber-attacks, detecting and preventing malicious behavior has become more important than ever before. In this study, we propose a method for detecting and classifying malicious behavior in host process data using machine learning algorithms. One of the challenges in this study is dealing with high-dimensional and imbalanced data. To address this, we first preprocessed the data using Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) to reduce the dimensions of the data and visualize the distribution. We then used the Adaptive Synthetic (ADASYN) and Synthetic Minority Over-sampling Technique (SMOTE) to handle the imbalanced data. We trained and evaluated the performance of the models using various machine learning algorithms, such as K-Nearest Neighbor, Naive Bayes, Random Forest, Autoencoder, and Memory-Augmented Deep Autoencoder (MemAE). Our results show that the preprocessed datasets using both ADASYN and SMOTE significantly improved the performance of all models, achieving higher precision, recall, and F1-Score values. Notably, the best performance was obtained when using the preprocessed dataset (SMOTE) with the MemAE model, yielding an F1-Score of 1.00. The evaluation was also conducted by measuring the Area Under the Receiver Operating Characteristic Curve (AUROC), which showed that all models performed well with an AUROC of over 90%. Our proposed method provides a promising approach for detecting and classifying malicious behavior in host process data using machine learning algorithms, which can be used in various fields such as anomaly detection and medical diagnosis.
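
The core idea behind SMOTE, mentioned in the abstract above, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the function name `smote_oversample`, the default `k=2`, the fixed seed, and the toy data are all hypothetical, and a production system would more likely rely on an established library.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples; k and seed are illustrative.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic sample is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class.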

16 pages, 1187 KiB  
Article
Leveraging Adversarial Samples for Enhanced Classification of Malicious and Evasive PDF Files
by Fouad Trad, Ali Hussein and Ali Chehab
Appl. Sci. 2023, 13(6), 3472; https://doi.org/10.3390/app13063472 - 08 Mar 2023
Viewed by 1195
Abstract
The Portable Document Format (PDF) is considered one of the most popular formats due to its flexibility and portability across platforms. Although people have used machine learning techniques to detect malware in PDF files, the problem with these models is their weak resistance against evasion attacks, which constitutes a major security threat. The goal of this study is to introduce three machine learning-based systems that enhance malware detection in the presence of evasion attacks by substantially relying on evasive data to train malware and evasion detection models. To evaluate the robustness of the proposed systems, we used two testing datasets, a real dataset containing around 100,000 PDF samples and an evasive dataset containing 500,000 samples that we generated. We compared the results of the proposed systems to a baseline model that was not adversarially trained. When tested against the evasive dataset, the proposed systems provided an increase of around 80% in the f1-score compared to the baseline. This proves the value of the proposed approaches towards the ability to deal with evasive attacks.

Review

19 pages, 792 KiB  
Review
Analysis of Machine Learning Techniques for Information Classification in Mobile Applications
by Sandra Pérez Arteaga, Ana Lucila Sandoval Orozco and Luis Javier García Villalba
Appl. Sci. 2023, 13(9), 5438; https://doi.org/10.3390/app13095438 - 27 Apr 2023
Cited by 4 | Viewed by 2051
Abstract
Due to the daily use of mobile technologies, we live in constant connection with the world through the Internet. Technological innovations in smart devices have allowed us to carry out everyday activities such as communicating, working, studying or using them as a means of entertainment, which has led to smartphones displacing computers as the most important device connected to the Internet today, causing users to demand smarter applications or functionalities that allow them to meet their needs. Artificial intelligence has been a major innovation in information technology that is transforming the way users use smart devices. Using applications that make use of artificial intelligence has revolutionised our lives, from making predictions of possible words based on typing in a text box, to being able to unlock devices through pattern recognition. However, these technologies face problems such as overheating and battery drain due to high resource consumption, low computational capacity, memory limitations, etc. This paper reviews the most important artificial intelligence algorithms for mobile devices, emphasising the challenges and problems that can arise when implementing these technologies in low-resource devices.
