Hybrid Data Processing by Combining Machine Learning, Expert, Safety and Security

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 30 June 2024 | Viewed by 9250

Special Issue Editors


Guest Editor
Faculty of Digital Science and Technology, Macau Millennium College, Macau, China
Interests: software engineering; data processing and analysis; system modeling

Guest Editor
Institute for Data Engineering and Sciences, University of Saint Joseph, Macau, China
Interests: structure learning; Bayesian networks; artificial intelligence

Guest Editor
School of Computer & Information Technology, Beijing Jiaotong University, Beijing 100044, China
Interests: data and knowledge engineering; artificial intelligence and applications

Guest Editor
Faculty of Data Science, City University of Macau, Macau 999078, China
Interests: blockchain; federated learning; attribute encryption

Special Issue Information

Dear Colleagues,

Data contain important information and knowledge that can advance human endeavour. As science and technology progress, the demands placed on data mining and data analysis continue to grow. At present, machine learning and deep learning are widely used in data processing and have achieved good results. In some cases, experienced experts can reason and negotiate with machine-assisted and deep learning methods to reach a balance that saves resources and improves results. While mining and analyzing data, the issues of data safety, data security, and data privacy also demand broad attention.

Therefore, how to reasonably integrate machine learning with expert systems, data safety, and data security into hybrid data intelligence is a problem worth studying.

This Special Issue welcomes all related theoretical and applied works that involve the above areas, in particular the combination of machine learning with these other concerns.

Original research articles and comments are welcome in this Special Issue. Research fields may include (but are not limited to) the following: machine learning, data mining, data safety, and data security. The goal of the Special Issue is to promote the combination of machine learning with expert systems, data safety, and data security.

We look forward to receiving your contribution.

Prof. Dr. Zhiming Cai
Prof. Dr. Wencai Du
Dr. Zuobin Ying
Prof. Dr. Zhihai Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning 
  • data intelligence 
  • data safety 
  • data security 
  • expert system

Published Papers (9 papers)

Research

20 pages, 4716 KiB  
Article
SSGCL: Simple Social Recommendation with Graph Contrastive Learning
by Zhihua Duan, Chun Wang and Wending Zhong
Mathematics 2024, 12(7), 1107; https://doi.org/10.3390/math12071107 - 07 Apr 2024
Viewed by 355
Abstract
As user–item interaction information is typically limited, collaborative filtering (CF)-based recommender systems often suffer from the data sparsity issue. To address this issue, recent recommender systems have turned to graph neural networks (GNNs) due to their superior performance in capturing high-order relationships. Furthermore, some of these GNN-based recommendation models also attempt to incorporate other information. They either extract self-supervised signals to mitigate the data sparsity problem or employ social information to assist with learning better representations under a social recommendation setting. However, only a few methods can take full advantage of these different aspects of information. Based on some testing, we believe most of these methods are complex and redundantly designed, which may lead to sub-optimal results. In this paper, we propose SSGCL, which is a recommendation system model that utilizes both social information and self-supervised information. We design a GNN-based propagation strategy that integrates social information with interest information in a simple yet effective way to learn user–item representations for recommendations. In addition, a specially designed contrastive learning module is employed to take advantage of the self-supervised signals for a better user–item representation distribution. The contrastive learning module is jointly optimized with the recommendation module to benefit the final recommendation result. Experiments on several benchmark data sets demonstrate the significant improvement in performance achieved by our model when compared with baseline models. Full article
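
As a rough illustration of the joint objective described in the abstract, the sketch below combines a BPR-style recommendation loss with an InfoNCE-style contrastive term between two embedding views (here labelled social_view and interest_view, which are illustrative assumptions). It is a minimal PyTorch sketch, not the authors' SSGCL implementation.

import torch
import torch.nn.functional as F

def info_nce(view_a: torch.Tensor, view_b: torch.Tensor, temperature: float = 0.2) -> torch.Tensor:
    """InfoNCE-style contrastive loss between two embedding views of the same nodes."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature          # pairwise similarities
    labels = torch.arange(a.size(0))          # positives are the matching rows
    return F.cross_entropy(logits, labels)

def joint_loss(user_emb, pos_item_emb, neg_item_emb, social_view, interest_view, alpha=0.1):
    """BPR recommendation loss jointly optimized with a contrastive term (alpha is illustrative)."""
    pos_scores = (user_emb * pos_item_emb).sum(dim=-1)
    neg_scores = (user_emb * neg_item_emb).sum(dim=-1)
    bpr = -F.logsigmoid(pos_scores - neg_scores).mean()
    return bpr + alpha * info_nce(social_view, interest_view)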

13 pages, 1061 KiB  
Article
GA-CatBoost-Weight Algorithm for Predicting Casualties in Terrorist Attacks: Addressing Data Imbalance and Enhancing Performance
by Yuxiang He, Baisong Yang and Chiawei Chu
Mathematics 2024, 12(6), 818; https://doi.org/10.3390/math12060818 - 11 Mar 2024
Viewed by 439
Abstract
Terrorism poses a significant threat to international peace and stability. The ability to predict potential casualties resulting from terrorist attacks, based on specific attack characteristics, is vital for protecting the safety of innocent civilians. However, conventional data sampling methods struggle to effectively address the challenge of data imbalance in textual features. To tackle this issue, we introduce a novel algorithm, GA-CatBoost-Weight, designed for predicting whether terrorist attacks will lead to casualties among innocent civilians. Our approach begins with feature selection using the RF-RFE method, followed by leveraging the CatBoost algorithm to handle diverse modal features comprehensively and to mitigate data imbalance. Additionally, we employ Genetic Algorithm (GA) to finetune hyperparameters. Experimental validation has demonstrated the superior performance of our method, achieving a sensitivity of 92.68% and an F1 score of 90.99% with fewer iterations. To the best of our knowledge, our study is the pioneering research that applies CatBoost to address the prediction of terrorist attack outcomes. Full article
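
To make the pipeline in the abstract concrete, the following is a minimal sketch of a genetic-algorithm-style hyperparameter search around CatBoost; the search space, fitness function (F1 on a validation split), and mutation scheme are illustrative assumptions, not the paper's GA-CatBoost-Weight configuration.

import random
from catboost import CatBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def evaluate(params, X, y):
    """Fitness = F1 score of a CatBoost model trained with the candidate hyperparameters."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    model = CatBoostClassifier(iterations=int(params["iterations"]),
                               depth=int(params["depth"]),
                               learning_rate=params["learning_rate"],
                               verbose=False)
    model.fit(X_tr, y_tr)
    return f1_score(y_val, model.predict(X_val))

def ga_search(X, y, generations=5, population=6):
    """Very small genetic search over three hyperparameters (mutation only, elitist selection)."""
    sample = lambda: {"iterations": random.choice([200, 400, 800]),
                      "depth": random.randint(4, 10),
                      "learning_rate": random.uniform(0.01, 0.3)}
    pop = [sample() for _ in range(population)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda p: evaluate(p, X, y), reverse=True)
        parents = scored[: population // 2]                       # keep the fittest half
        children = [dict(p, depth=max(4, min(10, p["depth"] + random.choice([-1, 1]))))
                    for p in parents]                             # mutate depth slightly
        pop = parents + children
    return max(pop, key=lambda p: evaluate(p, X, y))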

21 pages, 1178 KiB  
Article
CLG: Contrastive Label Generation with Knowledge for Few-Shot Learning
by Han Ma, Baoyu Fan, Benjamin K. Ng and Chan-Tong Lam
Mathematics 2024, 12(3), 472; https://doi.org/10.3390/math12030472 - 01 Feb 2024
Viewed by 579
Abstract
Training large-scale models needs big data. However, the few-shot problem is difficult to resolve due to inadequate training data. It is valuable to perform a task with only a few training samples, since collecting big data is often impractical in application scenarios because of cost and resource constraints. To tackle this problem, we present a simple and efficient method, contrastive label generation with knowledge for few-shot learning (CLG). Specifically, we: (1) propose contrastive label generation to align the label with the data input and enhance feature representations; (2) propose a label knowledge filter to avoid noise during the injection of explicit knowledge into the data and labels; (3) employ a label logits mask to simplify the task; (4) employ a multi-task fusion loss to learn different perspectives from the training set. The experiments demonstrate that CLG achieves an accuracy of 59.237%, about 3% higher than the best baseline. This shows that CLG obtains better features and gives the model more information about the input sentences to improve the classification ability. Full article
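
The sketch below illustrates, under stated assumptions, how a masked classification loss can be fused with a contrastive label-alignment term; the weighting, temperature, and tensor names are illustrative and not taken from the CLG paper.

import torch
import torch.nn.functional as F

def masked_fusion_loss(logits, labels, label_mask, emb, label_emb, weight=0.5, temperature=0.1):
    """Multi-task fusion: masked classification loss + a contrastive label-alignment term.

    `label_mask` marks which label logits are valid for the current task (a logits mask);
    `emb` / `label_emb` are sentence and label embeddings that should be pulled together.
    """
    # Mask out logits of labels that are irrelevant to the task before the softmax.
    masked_logits = logits.masked_fill(~label_mask, float("-inf"))
    ce = F.cross_entropy(masked_logits, labels)

    # Contrastive alignment between input representations and their label representations.
    a = F.normalize(emb, dim=-1)
    b = F.normalize(label_emb, dim=-1)
    sim = a @ b.t() / temperature
    contrast = F.cross_entropy(sim, torch.arange(a.size(0)))

    return ce + weight * contrast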

20 pages, 522 KiB  
Article
Predicting Typhoon Flood in Macau Using Dynamic Gaussian Bayesian Network and Surface Confluence Analysis
by Shujie Zou, Chiawei Chu, Weijun Dai, Ning Shen, Jia Ren and Weiping Ding
Mathematics 2024, 12(2), 340; https://doi.org/10.3390/math12020340 - 19 Jan 2024
Viewed by 897
Abstract
A typhoon passing through or making landfall in a coastal city may result in seawater intrusion and continuous rainfall, which may cause urban flooding. The urban flood disaster caused by a typhoon is a dynamic process that changes over time, and a dynamic Gaussian Bayesian network (DGBN) is used to model these time series events in this paper. The scene data generated by each typhoon are different, which means that each typhoon has different characteristics. This paper establishes multiple DGBNs based on the historical data of Macau flooding caused by multiple typhoons, and a similarity analysis is made between the scene data related to the current flooding to be predicted and the scene data of historical flooding. The DGBN most similar to the scene characteristics of the current flooding is selected as the prediction network for the current flooding. According to the topography, the influence of the surface confluence is considered, and a Manning-formula-based analysis method is proposed. The Manning formula is combined with the DGBN to obtain the final prediction model, DGBN-m, which takes into account the effects of time-series and non-time-series factors. The flooding data provided by the Macau Meteorological Bureau are used to carry out experiments, and it is shown that the proposed model can predict the flooding depth well in a specific area of Macau with only a small amount of data, with the best prediction accuracy reaching 84%. Finally, a generalization analysis is performed to further confirm the validity of the proposed model. Full article
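
For reference, the Manning formula mentioned in the abstract is the standard open-channel flow relation V = (1/n) * R^(2/3) * S^(1/2); the short sketch below computes velocity and discharge from roughness, hydraulic radius, and slope. The example values are illustrative and unrelated to the paper's DGBN-m model.

def manning_velocity(n: float, hydraulic_radius: float, slope: float) -> float:
    """Manning formula (SI units): V = (1/n) * R^(2/3) * S^(1/2)."""
    return (1.0 / n) * hydraulic_radius ** (2.0 / 3.0) * slope ** 0.5

def manning_discharge(n: float, area: float, hydraulic_radius: float, slope: float) -> float:
    """Discharge Q = A * V, used to estimate how much surface runoff converges on an area."""
    return area * manning_velocity(n, hydraulic_radius, slope)

# Example (illustrative values): concrete channel (n = 0.015), R = 0.8 m, slope = 0.002, A = 3 m^2.
print(manning_discharge(0.015, 3.0, 0.8, 0.002))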

13 pages, 560 KiB  
Article
Healthcare Cost Prediction Based on Hybrid Machine Learning Algorithms
by Shujie Zou, Chiawei Chu, Ning Shen and Jia Ren
Mathematics 2023, 11(23), 4778; https://doi.org/10.3390/math11234778 - 27 Nov 2023
Cited by 1 | Viewed by 1176
Abstract
Healthcare cost is an issue of concern right now. While many complex machine learning algorithms have been proposed to analyze healthcare cost and address the shortcomings of linear regression and reliance on expert analyses, these algorithms do not take into account whether each characteristic variable contained in the healthcare data has a positive effect on predicting healthcare cost. This paper uses hybrid machine learning algorithms to predict healthcare cost. First, network structure learning algorithms (a score-based algorithm, constraint-based algorithm, and hybrid algorithm) for a Conditional Gaussian Bayesian Network (CGBN) are used to learn the isolated characteristic variables in healthcare data without changing the data properties (i.e., discrete or continuous). Then, the isolated characteristic variables are removed from the original data and the remaining data used to train regression algorithms. Two public healthcare datasets are used to test the performance of the proposed hybrid machine learning algorithm model. Experiments show that when compared to popular single machine learning algorithms (Long Short Term Memory, Random Forest, etc.) the proposed scheme can obtain similar or higher prediction accuracy with a reduced amount of data. Full article
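
A minimal sketch of the "remove isolated variables, then regress" idea follows; it assumes the structure-learning step has already produced an edge list (e.g. from a CGBN learner) and simply drops feature columns that end up with no edges before fitting a standard regressor. The column names in the usage comment are hypothetical.

import networkx as nx
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def drop_isolated_features(df: pd.DataFrame, edges, target: str) -> pd.DataFrame:
    """Remove feature columns that are isolated (no edges) in a learned network structure.

    `edges` is assumed to come from a separate structure-learning step; it is treated as given here.
    """
    g = nx.Graph()
    g.add_nodes_from(df.columns)
    g.add_edges_from(edges)
    isolated = [node for node in nx.isolates(g) if node != target]
    return df.drop(columns=isolated)

# Usage sketch (hypothetical names):
# pruned = drop_isolated_features(data, learned_edges, target="charges")
# model = RandomForestRegressor().fit(pruned.drop(columns=["charges"]), pruned["charges"])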

19 pages, 2159 KiB  
Article
Enhancing the Security and Privacy in the IoT Supply Chain Using Blockchain and Federated Learning with Trusted Execution Environment
by Linkai Zhu, Shanwen Hu, Xiaolian Zhu, Changpu Meng and Maoyi Huang
Mathematics 2023, 11(17), 3759; https://doi.org/10.3390/math11173759 - 01 Sep 2023
Cited by 1 | Viewed by 1190
Abstract
Federated learning has emerged as a promising technique for the Internet of Things (IoT) in various domains, including supply chain management. It enables IoT devices to collaboratively learn without exposing their raw data, ensuring data privacy. However, federated learning faces the threats of local data tampering and upload process attacks. This paper proposes an innovative framework that leverages Trusted Execution Environment (TEE) and blockchain technology to address the data security and privacy challenges in federated learning for IoT supply chain management. Our framework achieves the security of local data computation and the tampering resistance of data update uploads using TEE and the blockchain. We adopt Intel Software Guard Extensions (SGXs) as the specific implementation of TEE, which can guarantee the secure execution of local models on SGX-enabled processors. We also use consortium blockchain technology to build a verification network and consensus mechanism, ensuring the security and tamper resistance of the data upload and aggregation process. Finally, each cluster can obtain the aggregated parameters from the blockchain. To evaluate the performance of our proposed framework, we conducted several experiments with different numbers of participants and different datasets and validated the effectiveness of our scheme. We tested the final global model obtained from federated training on a test dataset and found that increasing both the number of iterations and the number of participants improves its accuracy. For instance, it reaches 94% accuracy with one participant and five iterations and 98.5% accuracy with ten participants and thirty iterations. Full article
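
The aggregation step described above is essentially federated averaging; a minimal sketch follows, covering only the sample-weighted parameter averaging and not the SGX enclave execution or blockchain verification that the paper builds around it.

from typing import Dict, List
import numpy as np

def federated_average(updates: List[Dict[str, np.ndarray]],
                      sample_counts: List[int]) -> Dict[str, np.ndarray]:
    """FedAvg-style aggregation: weight each client's parameters by its number of samples."""
    total = float(sum(sample_counts))
    keys = updates[0].keys()
    return {
        k: sum(w * update[k] for w, update in zip(sample_counts, updates)) / total
        for k in keys
    }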

16 pages, 973 KiB  
Article
Parallel Dense Video Caption Generation with Multi-Modal Features
by Xuefei Huang, Ka-Hou Chan, Wei Ke and Hao Sheng
Mathematics 2023, 11(17), 3685; https://doi.org/10.3390/math11173685 - 26 Aug 2023
Cited by 1 | Viewed by 1094
Abstract
The task of dense video captioning is to generate detailed natural-language descriptions for an original video, which requires deep analysis and mining of semantic captions to identify events in the video. Existing methods typically follow a localisation-then-captioning sequence within given frame sequences, resulting in caption generation that is highly dependent on which objects have been detected. This work proposes a parallel-based dense video captioning method that can simultaneously address the mutual constraint between event proposals and captions. Additionally, a deformable Transformer framework is introduced to reduce or eliminate the manual thresholding of hyperparameters in such methods. An information transfer station is also added for representation organisation; it receives the hidden features extracted from a frame and implicitly generates multiple event proposals. The proposed method also adopts an LSTM (long short-term memory) network with deformable attention as the main layer for caption generation. Experimental results show that the proposed method outperforms other methods in this area to a certain degree on the ActivityNet Captions dataset, providing competitive results. Full article

11 pages, 4951 KiB  
Article
Optimal Multi-Attribute Auctions Based on Multi-Scale Loss Network
by Zefeng Zhao, Haohao Cai, Huawei Ma, Shujie Zou and Chiawei Chu
Mathematics 2023, 11(14), 3240; https://doi.org/10.3390/math11143240 - 24 Jul 2023
Viewed by 871
Abstract
There is a strong demand for multi-attribute auctions in real-world scenarios, since non-price attributes allow participants to express their preferences and the item's value. However, this also makes it difficult to perform calculations with incomplete information, as a single attribute, price, no longer determines the revenue. At the same time, the mechanism must satisfy individual rationality (IR) and incentive compatibility (IC). This paper proposes an innovative dual network to solve these problems. A shared MLP module is constructed to extract bidder features, and a multi-scale loss is used to assess network status and guide updates. The method was tested on real and extended cases, showing that the approach effectively improves the auctioneer's revenue without compromising the bidders. Full article
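
As a loose illustration of a shared-MLP dual network, the sketch below extracts bidder features once and feeds them to two heads, then combines two differently weighted loss terms. The specific heads and loss terms are assumptions for illustration only and do not reproduce the paper's multi-scale loss or its IR/IC treatment.

import torch
import torch.nn as nn

class DualAuctionNet(nn.Module):
    """Shared MLP feature extractor with two heads (illustratively: allocation and payment scores)."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden), nn.ReLU())
        self.alloc_head = nn.Linear(hidden, 1)
        self.pay_head = nn.Linear(hidden, 1)

    def forward(self, bids: torch.Tensor):
        features = self.shared(bids)                 # shared bidder features
        return self.alloc_head(features), self.pay_head(features)

def combined_loss(alloc, pay, revenue_target, weights=(1.0, 0.5)):
    """Combine two loss terms at different weights into one training signal (illustrative)."""
    revenue_loss = torch.mean((pay.sum(dim=0) - revenue_target) ** 2)
    regularity_loss = torch.mean(torch.relu(-alloc))  # penalise negative allocation scores
    return weights[0] * revenue_loss + weights[1] * regularity_loss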

22 pages, 1082 KiB  
Article
From Replay to Regeneration: Recovery of UDP Flood Network Attack Scenario Based on SDN
by Yichuan Wang, Junxia Ding, Tong Zhang, Yeqiu Xiao and Xinhong Hei
Mathematics 2023, 11(8), 1897; https://doi.org/10.3390/math11081897 - 17 Apr 2023
Cited by 1 | Viewed by 1334
Abstract
In recent years, various network attacks have emerged. These attacks are often recorded in the form of Pcap data, which contains many attack details and characteristics that cannot be analyzed through traditional methods alone. Therefore, restoring the network attack scenario through scene reconstruction to achieve data regeneration has become an important entry point for detecting and defending against network attacks. However, current network attack scenarios mainly reproduce the attacker’s attack steps by building a sequence collection of attack scenarios, constructing an attack behavior diagram, or simply replaying the captured network traffic. These methods still have shortcomings in terms of traffic regeneration. To address this limitation, this paper proposes an SDN-based network attack scenario recovery method. By parsing Pcap data and utilizing network topology reconstruction, probability, and packet sequence models, network traffic data can be regenerated. The experimental results show that the proposed method is closer to the real network, with a higher similarity between the reconstructed and actual attack scenarios. Additionally, this method allows for adjusting the intensity of the network attack and the generated topology nodes, which helps network defenders better understand the attackers’ posture and analyze and formulate corresponding security strategies. Full article
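
A minimal sketch of the Pcap-parsing and topology-reconstruction step follows, using scapy to read packets and networkx to rebuild a directed source-to-destination graph weighted by UDP packet counts; the probability and packet-sequence models from the paper are not included.

import networkx as nx
from scapy.all import rdpcap
from scapy.layers.inet import IP, UDP

def build_topology(pcap_path: str) -> nx.DiGraph:
    """Rebuild a simple source->destination topology graph from captured UDP traffic."""
    graph = nx.DiGraph()
    for pkt in rdpcap(pcap_path):
        if IP in pkt and UDP in pkt:
            src, dst = pkt[IP].src, pkt[IP].dst
            # Count packets per directed flow; heavy edges hint at flood traffic.
            if graph.has_edge(src, dst):
                graph[src][dst]["packets"] += 1
            else:
                graph.add_edge(src, dst, packets=1)
    return graph

# Usage sketch (hypothetical file name):
# g = build_topology("udp_flood.pcap")
# heavy = sorted(g.edges(data=True), key=lambda e: e[2]["packets"], reverse=True)[:10]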
