Machine Learning and Data Analysis

A special issue of Symmetry (ISSN 2073-8994). This special issue belongs to the section "Computer".

Deadline for manuscript submissions: closed (31 May 2023) | Viewed by 43516

Special Issue Editor


Dr. Marcin Michalak
Guest Editor
Department of Computer Networks and Systems, Silesian University of Technology, 44-100 Gliwice, Poland
Interests: image processing; data mining; machine learning; pattern recognition; rough set theory; biclustering

Special Issue Information

Dear Colleagues,

There is no need to convince anyone about the huge influence of theoretical models of machine learning or data analysis techniques on our present way of living. In particular, the last year was an unparalleled occasion for solving the new, unexpected challenges connected with the spread of the COVID-19 pandemic: modelling it, predicting its scale and course, and working to limit its scale. Moreover, the resulting large- and mid-scale lockdowns rapidly changed our habits, ways of working, and ways of living, which in turn influenced the data that describe these activities.

This Special Issue is devoted to the problems of machine learning and data analysis in general; all interesting papers in this area of computer science (ML and DA) are invited for submission.

The topics of the Special Issue include, but are not limited to, the following:

  • Supervised learning
  • Unsupervised learning
  • Time series analysis
  • Descriptive analysis
  • Biclustering
  • Genetic algorithms
  • ML and DM applications
  • Artificial neural networks
  • Deep learning
  • Decision support systems
  • Anomaly detection
  • Image analysis
  • Pattern recognition

Dr. Marcin Michalak
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Symmetry is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • data analysis
  • process modelling
  • time series prediction

Published Papers (20 papers)

Editorial

4 pages, 180 KiB  
Editorial
Special Issue: Machine Learning and Data Analysis
by Marcin Michalak
Symmetry 2023, 15(7), 1397; https://doi.org/10.3390/sym15071397 - 11 Jul 2023
Viewed by 806
Abstract
This Special Issue contains 2 reviews and 17 research papers related to the following topics:

  • Time series forecasting [1,2,3,4,5]
  • Image analysis [6]
  • Medical applications [7,8]
  • Knowledge graph analysis [9,10]
  • Cybersecurity [11,12,13]
  • Traffic analysis [14,15]
  • Agriculture [16]
  • Environmental data analysis [...]
(This article belongs to the Special Issue Machine Learning and Data Analysis)

Research

23 pages, 2535 KiB  
Article
JKRL: Joint Knowledge Representation Learning of Text Description and Knowledge Graph
by Guoyan Xu, Qirui Zhang, Du Yu, Sijun Lu and Yuwei Lu
Symmetry 2023, 15(5), 1056; https://doi.org/10.3390/sym15051056 - 10 May 2023
Cited by 2 | Viewed by 1296
Abstract
The purpose of knowledge representation learning is to learn the vector representation of research objects projected by a matrix in low-dimensional vector space and explore the relationship between embedded objects in low-dimensional space. However, most methods only consider the triple structure in the knowledge graph and ignore the additional information related to the triple, especially the text description information. In this paper, we propose a knowledge graph representation model with a symmetric architecture called Joint Knowledge Representation Learning of Text Description and Knowledge Graph (JKRL), which models the entity description and relationship description of the triple structure for joint representation learning of knowledge and balances the contribution of the triple structure and text description in the process of vector learning. First, we adopt the TransE model to learn the structural vector representations of entities and relations, and then use a CNN model to encode the entity description to obtain the text representation of the entity. To semantically encode the relation descriptions, we designed an Attention-Bi-LSTM text encoder, which introduces an attention mechanism into the Bi-LSTM model to calculate the semantic relevance between each word in the sentence and different relations. In addition, we introduce position features into word features in order to better encode word order information. Finally, we define a joint evaluation function to learn the joint representation of structural and textual representations. The experiments show that compared with the baseline methods, our model achieves the best performance on both Mean Rank and Hits@10 metrics. The accuracy of the triple classification task on the FB15K dataset reached 93.2%.
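
The abstract above names the TransE model and the Mean Rank and Hits@10 metrics. The following is a minimal sketch of how these are commonly computed, not the authors' implementation: the L1 form of the TransE distance, the tie handling, and all function names are assumptions made for illustration.

```python
def transe_score(head, relation, tail):
    """L1 distance ||h + r - t||; under TransE, lower means more plausible."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def rank_of_true_tail(head, relation, true_tail, candidate_tails):
    """1-based rank of the true tail among the candidates.

    Assumption: ties do not push the true tail down, so only strictly
    better-scoring candidates are counted.
    """
    true_score = transe_score(head, relation, true_tail)
    better = sum(
        1 for tail in candidate_tails
        if transe_score(head, relation, tail) < true_score
    )
    return better + 1


def mean_rank(ranks):
    """Average rank of the correct entity over all test triples."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)
```

A lower Mean Rank and a higher Hits@10 both indicate better link prediction, which is why the two are usually reported together.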

18 pages, 7372 KiB  
Article
Ollivier–Ricci Curvature Based Spatio-Temporal Graph Neural Networks for Traffic Flow Forecasting
by Xing Han, Guowei Zhu, Ling Zhao, Ronghua Du, Yuhan Wang, Zhe Chen, Yang Liu and Silu He
Symmetry 2023, 15(5), 995; https://doi.org/10.3390/sym15050995 - 27 Apr 2023
Cited by 3 | Viewed by 1811
Abstract
Traffic flow forecasting is a basic function of intelligent transportation systems, and the accuracy of prediction is of great significance for traffic management and urban planning. The main difficulty of traffic flow prediction is that there is complex underlying spatiotemporal dependence in traffic flow; thus, the existing spatiotemporal graph neural network (STGNN) models need to model both temporal dependence and spatial dependence. Graph neural networks (GNNs) are adopted to capture the spatial dependence in traffic flow, which can model the symmetric or asymmetric spatial relations between nodes in the traffic network. The transmission process of traffic features in GNNs is guided by the node-to-node relationship (e.g., adjacency or spatial distance), ignoring the spatial dependence caused by local topological constraints in the road network. To further consider the influence of local topology on the spatial dependence of road networks, in this paper, we introduce Ollivier–Ricci curvature information between connected edges in the road network, which is based on optimal transport theory and makes comprehensive use of the neighborhood-to-neighborhood relationship to guide the transmission process of traffic features between nodes in STGNNs. Experiments on real-world traffic datasets show that the models with Ollivier–Ricci curvature information outperform those based on only node-to-node relationships by ten percent on average in the RMSE metric. This study indicates that by utilizing complex topological features in road networks, spatial dependence can be captured more fully, further improving the predictive ability of traffic forecasting models.

27 pages, 9267 KiB  
Article
A Data-Driven Two-Phase Multi-Split Causal Ensemble Model for Time Series
by Zhipeng Ma, Marco Kemmerling, Daniel Buschmann, Chrismarie Enslin, Daniel Lütticke and Robert H. Schmitt
Symmetry 2023, 15(5), 982; https://doi.org/10.3390/sym15050982 - 26 Apr 2023
Cited by 1 | Viewed by 1428
Abstract
Causal inference is a fundamental research topic for discovering the cause–effect relationships in many disciplines. Inferring causality means identifying asymmetric relations between two variables. In real-world systems, e.g., finance, healthcare, and industrial processes, time series data from sensors and other data sources offer an especially good basis to infer causal relationships. Therefore, many different time series causal inference algorithms have been proposed in recent years. However, not all algorithms are equally well-suited for a given dataset. For instance, some approaches may only be able to identify linear relationships, while others are applicable for non-linearities. Algorithms further vary in their sensitivity to noise and their ability to infer causal information from coupled vs. non-coupled time series. As a consequence, different algorithms often generate different causal relationships for the same input. In order to achieve a more robust causal inference result, this publication proposes a novel data-driven two-phase multi-split causal ensemble model to combine the strengths of different causality base algorithms. In comparison to existing approaches, the proposed ensemble method reduces the influence of noise through a data partitioning scheme in a first phase. To achieve this, the data are initially divided into several partitions and the base causal inference algorithms are applied to each partition. Subsequently, Gaussian mixture models are used to identify the causal relationships derived from the different partitions that are likely to be valid. In the second phase, the identified relationships from each base algorithm are then merged based on three combination rules. The proposed ensemble approach is evaluated using multiple metrics, among them a newly developed evaluation index for causal ensemble approaches. 
We perform experiments using three synthetic datasets with different volumes and complexity, which have been specifically designed to test causality detection methods under different circumstances while knowing the ground truth causal relationships. In these experiments, our causality ensemble outperforms each of its base algorithms. In practical applications, the use of the proposed method could hence lead to more robust and reliable causality results.

16 pages, 3875 KiB  
Article
Median-KNN Regressor-SMOTE-Tomek Links for Handling Missing and Imbalanced Data in Air Quality Prediction
by Winoto Chandra, Bambang Suprihatin and Yulia Resti
Symmetry 2023, 15(4), 887; https://doi.org/10.3390/sym15040887 - 09 Apr 2023
Cited by 6 | Viewed by 1837
Abstract
The Air Quality Index (AQI) dataset contains information on measurements of pollutants and ambient air quality conditions at certain locations that can be used to predict air quality. Unfortunately, this dataset often has many missing observations and imbalanced classes. Both of these problems can affect the performance of the prediction model. In particular, predictions for the minority class are very important because inaccurate predictions can be fatal or cause big losses. Moreover, the missing data may lead to biased results. This paper proposes the single imputation of the median and the multiple imputations of the k-Nearest Neighbor (KNN) regressor to handle missing values of less than or equal to 10% and more than 10%, respectively. At the same time, the SMOTE-Tomek Links address the imbalanced class. These proposed approaches to handle both issues are then used to assess the air quality prediction of the India AQI dataset using Naive Bayes (NB), KNN, and C4.5. The five treatments show that the proposed method of the Median-KNN regressor-SMOTE-Tomek Links is able to improve the performance of the India air quality prediction model. In other words, the proposed method succeeds in overcoming the problems of missing values and class imbalance.
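
The routing rule in the abstract above (median imputation at 10% missing or less, KNN-based imputation above that) can be sketched as follows. This is an illustrative simplification, not the authors' method: it runs a single KNN pass rather than multiple imputations, uses only the median-filled columns as predictors, assumes at least one column falls under the threshold, and leaves out the SMOTE-Tomek Links resampling step. The table layout and every function name are hypothetical.

```python
from math import dist
from statistics import mean, median

MISSING_THRESHOLD = 0.10  # from the abstract: median at <= 10% missing, KNN above


def missing_fraction(column):
    """Share of entries in a column that are missing (None)."""
    return sum(value is None for value in column) / len(column)


def median_fill(column):
    """Replace every None with the median of the observed values."""
    fill = median(value for value in column if value is not None)
    return [fill if value is None else value for value in column]


def knn_fill(column, predictors, k=3):
    """Replace every None with the mean target of the k nearest observed rows.

    Distances are Euclidean over `predictors`, a list of complete columns.
    """
    points = list(zip(*predictors))
    known = [i for i, value in enumerate(column) if value is not None]
    filled = list(column)
    for i, value in enumerate(column):
        if value is None:
            nearest = sorted(known, key=lambda j: dist(points[i], points[j]))[:k]
            filled[i] = mean(column[j] for j in nearest)
    return filled


def impute(table, k=3):
    """Impute a table given as a dict of column name -> list of floats or None."""
    # Pass 1: columns at or below the threshold get the median.
    result = {
        name: median_fill(column)
        for name, column in table.items()
        if missing_fraction(column) <= MISSING_THRESHOLD
    }
    # Pass 2: the remaining columns are filled from their nearest neighbours.
    predictors = list(result.values())
    for name, column in table.items():
        if name not in result:
            result[name] = knn_fill(column, predictors, k)
    return result
```

Filling the sparse columns first gives the KNN step complete predictor columns to measure distance on, which is the main reason for the two-pass order in this sketch.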

16 pages, 2375 KiB  
Article
AutoEncoder and LightGBM for Credit Card Fraud Detection Problems
by Haichao Du, Li Lv, An Guo and Hongliang Wang
Symmetry 2023, 15(4), 870; https://doi.org/10.3390/sym15040870 - 06 Apr 2023
Cited by 7 | Viewed by 2878
Abstract
This paper proposes a method called autoencoder with probabilistic LightGBM (AED-LGB) for detecting credit card fraud. This deep learning-based AED-LGB algorithm first extracts low-dimensional feature data from high-dimensional bank credit card feature data using the characteristics of an autoencoder which has a symmetrical network structure, enhancing the ability of feature representation learning. The credit card fraud dataset comes from a real dataset anonymized by a bank and is highly imbalanced, with normal data far greater than fraud data. For this situation, the SMOTE algorithm is used to resample the data before putting the extracted feature data into LightGBM, making the amount of fraud data and non-fraud data equal. After comparing the resampled and non-resampled data, it was found that the performance of the AED-LGB algorithm was not improved after resampling, and it was concluded that the AED-LGB algorithm is more suitable for imbalanced data. Finally, the AED-LGB algorithm is compared with other commonly used machine learning algorithms, such as KNN and LightGBM, and it has an overall improvement of 2% in terms of the ACC index compared to LightGBM and KNN. When the threshold is set to 0.2, the MCC index of AED-LGB is 4% higher than that of the second-highest LightGBM algorithm and 30% higher than that of KNN. This shows that the AED-LGB algorithm has higher performance in accuracy, true positive rate, true negative rate, and Matthews correlation coefficient.
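
The abstract above reports accuracy (ACC) and the Matthews correlation coefficient (MCC) at a decision threshold of 0.2. As a rough sketch of how these metrics depend on the threshold, the code below uses the standard MCC formula; the convention that a score at or above the threshold counts as fraud, and all function names, are assumptions for illustration rather than details taken from the paper.

```python
from math import sqrt


def confusion_counts(labels, scores, threshold=0.5):
    """Return (tp, fp, tn, fn), treating score >= threshold as fraud (label 1)."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

On highly imbalanced data, accuracy can stay high even when most fraud cases are missed, whereas MCC uses all four cells of the confusion matrix, which is a common reason for reporting both.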

16 pages, 2995 KiB  
Article
Bio-Inspired Machine Learning Approach to Type 2 Diabetes Detection
by Marwan Al-Tawil, Basel A. Mahafzah, Arar Al Tawil and Ibrahim Aljarah
Symmetry 2023, 15(3), 764; https://doi.org/10.3390/sym15030764 - 20 Mar 2023
Cited by 10 | Viewed by 1821
Abstract
Type 2 diabetes is a common life-changing disease that has been growing rapidly in recent years. According to the World Health Organization, approximately 90% of patients with diabetes worldwide have type 2 diabetes. Although there is no permanent cure for type 2 diabetes, this disease needs to be detected at an early stage to provide prognostic support to allied health professionals and develop an effective prevention plan. This can be accomplished by analyzing medical datasets using data mining and machine-learning techniques. Due to their efficiency, metaheuristic algorithms are now utilized on medical datasets for detecting chronic diseases, with better results than traditional methods. The main goal is to improve the performance of the existing approaches for the detection of type 2 diabetes. A bio-inspired metaheuristic algorithm called cuttlefish was used to select the essential features in the medical data preprocessing stage. The performance of the proposed approach was compared to that of a well-known bio-inspired metaheuristic feature selection algorithm called the genetic algorithm. The features selected from the cuttlefish and genetic algorithms were used with different classifiers. The implementation was applied to two datasets: the Pima Indian diabetes dataset and the hospital Frankfurt diabetes dataset; generally, these datasets are asymmetric, but some of the features in these datasets are close to symmetric. The results show that the cuttlefish algorithm has better accuracy rates, particularly when the number of instances in the dataset increases.

29 pages, 5993 KiB  
Article
Short-Term Photovoltaic Power Forecasting Based on a Novel Autoformer Model
by Yuanshao Huang and Yonghong Wu
Symmetry 2023, 15(1), 238; https://doi.org/10.3390/sym15010238 - 15 Jan 2023
Cited by 4 | Viewed by 1909
Abstract
Deep learning techniques excel at capturing and understanding the symmetry inherent in data patterns and non-linear properties of photovoltaic (PV) power; therefore, they achieve excellent performance on short-term PV power forecasting. In order to produce more precise and detailed forecasting results, this research suggests a novel Autoformer model with De-Stationary Attention and Multi-Scale framework (ADAMS) for short-term PV power forecasting. In this approach, the multi-scale framework is applied to the Autoformer model to capture the inter-dependencies and specificities of each scale. Furthermore, the de-stationary attention is incorporated into an auto-correlation mechanism for more efficient non-stationary information extraction. Based on the operational data from a 1058.4 kW PV facility in Central Australia, the ADAMS model and six baseline models are compared on PV power predictions at 5 min and 1 h temporal resolution. The results show that, in terms of four performance measurements, the proposed method can handle the task of projecting short-term PV output more effectively than other methods. Taking the result of predicting the PV energy in the next 24 h based on the 1 h resolution data as an example, MSE is 0.280, MAE is 0.302, RMSE is 0.529, and adjusted R-squared is 0.824.
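
The four measurements quoted in the abstract above follow standard definitions, sketched below. RMSE is the square root of MSE, and the reported values agree with that relation: sqrt(0.280) ≈ 0.529. The function names and the `n_features` parameter are assumptions for illustration; the abstract does not state how many predictors enter the adjusted R-squared.

```python
from math import sqrt


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-square error, the square root of MSE."""
    return sqrt(mse(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    average = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - average) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(actual, predicted, n_features):
    """R-squared penalised for the number of predictors (n_features)."""
    n = len(actual)
    r2 = r_squared(actual, predicted)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
```

MSE and RMSE weight large errors more heavily than MAE does, so a forecast with a few large misses can score well on MAE and poorly on RMSE.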

15 pages, 2261 KiB  
Article
Increasing the Accuracy of Soil Nutrient Prediction by Improving Genetic Algorithm Backpropagation Neural Networks
by Yanqing Liu, Cuiqing Jiang, Cuiping Lu, Zhao Wang and Wanliu Che
Symmetry 2023, 15(1), 151; https://doi.org/10.3390/sym15010151 - 04 Jan 2023
Cited by 5 | Viewed by 1448
Abstract
Soil nutrient prediction has been eliciting increasing attention in agricultural production. Backpropagation (BP) neural networks have demonstrated remarkable ability in many prediction scenarios. However, directly utilizing BP neural networks in soil nutrient prediction may not yield promising results due to the random assignment of initial weights and thresholds and the tendency to fall into local extreme points. In this study, a BP neural network model optimized by an improved genetic algorithm (IGA) was proposed to predict soil nutrient time series with high accuracy. First, the crossover and mutation operations of the genetic algorithm (GA) were improved. Next, the IGA was used to optimize the BP model. The symmetric nature of the model lies in its feedforward and feedback connections, i.e., the same weights must be used for the forward and backward passes. An empirical evaluation was performed using annual soil nutrient data from China. Soil pH, total nitrogen, organic matter, fast-acting potassium, and effective phosphorus were selected as evaluation indicators. The prediction results of the IGA–BP, GA–BP, and BP neural network models were compared and analyzed. For the IGA–BP prediction model, the coefficient of determination for soil pH was 0.8, while those for total nitrogen, organic matter, fast-acting potassium, and effective phosphorus were all greater than 0.98, exhibiting a strong generalization ability. The root-mean-square errors of the IGA–BP prediction models were reduced to 50% of those of the BP models. The results indicated that the IGA–BP method can accurately predict soil nutrient content for future time series.

26 pages, 4282 KiB  
Article
Determination of Air Traffic Complexity Most Influential Parameters Based on Machine Learning Models
by Francisco Pérez Moreno, Víctor Fernando Gómez Comendador, Raquel Delgado-Aguilera Jurado, María Zamarreño Suárez, Dominik Janisch and Rosa María Arnaldo Valdés
Symmetry 2022, 14(12), 2629; https://doi.org/10.3390/sym14122629 - 12 Dec 2022
Cited by 4 | Viewed by 1839
Abstract
Today, aircraft demand is exceeding the capacity of the Air Traffic Control (ATC) system. As a result, airspace is becoming a very complex environment to control. The complexity of airspace is thus closely related to the workload of controllers and is a topic of great interest. The major concern is that variables that are related to complexity are currently recognised, but there is still a debate about how to define complexity. This paper attempts to define which variables determine airspace complexity. To do so, a novel methodology based on the use of machine learning models is used. In this way, it tries to overcome one of the main disadvantages of the current complexity models: the subjectivity of the models based on expert opinion. This study has determined that the main indicator that defines complexity is the number of aircraft in the sector, together with the occupancy of the traffic flows and the vertical distribution of aircraft. This research can support numerous studies on both air traffic complexity assessment and Air Traffic Controller (ATCO) workload. This model can also help to study the behaviour of air traffic and to verify that there is symmetry in structure and the origin of the complexity in the different ATC sectors. This would be of great benefit to ATM, as it would allow progress to be made in solving the existing capacity problem.

25 pages, 3032 KiB  
Article
Flow-Based IDS Features Enrichment for ICMPv6-DDoS Attacks Detection
by Omar E. Elejla, Mohammed Anbar, Shady Hamouda, Bahari Belaton, Taief Alaa Al-Amiedy and Iznan H. Hasbullah
Symmetry 2022, 14(12), 2556; https://doi.org/10.3390/sym14122556 - 03 Dec 2022
Cited by 3 | Viewed by 1649
Abstract
Internet Protocol version 6 (IPv6) and its core protocol, Internet Control Message Protocol version 6 (ICMPv6), need to be secured from attacks, such as Denial of Service (DoS) and Distributed DoS (DDoS), in order to be reliable for deployment. Several Intrusion Detection Systems (IDSs) have been built and proposed to detect ICMPv6-based DoS and DDoS attacks. However, these IDSs suffer from several drawbacks, such as the inability to detect novel attacks and a low detection accuracy due to their reliance on packet-based traffic representation. Furthermore, the existing IDSs that rely on flow-based traffic representation use simple heuristic features that do not contribute to detecting ICMPv6-based DoS and DDoS attacks. This paper proposes a flow-based IDS by enriching the existing features with a set of new features to improve the detection accuracy. The flow consists of packets with similar attributes (i.e., packets with the same source and destination IP address) and features that can differentiate between normal and malicious traffic behavior, such as the source IP address’s symmetry and the whole flow’s symmetry. The experimental results reveal that the enriched features significantly improved the IDS’s detection accuracy by 16.02% and that the false positive rate decreased by 19.17% compared with state-of-the-art IDSs.

18 pages, 1321 KiB  
Article
PeerAmbush: Multi-Layer Perceptron to Detect Peer-to-Peer Botnet
by Arkan Hammoodi Hasan Kabla, Achmad Husni Thamrin, Mohammed Anbar, Selvakumar Manickam and Shankar Karuppayah
Symmetry 2022, 14(12), 2483; https://doi.org/10.3390/sym14122483 - 23 Nov 2022
Cited by 5 | Viewed by 1919
Abstract
Due to emerging internet technologies that mostly depend on the decentralization concept, such as cryptocurrencies, cyber attackers also use the decentralization concept to develop P2P botnets. P2P botnets are considered one of the most serious and challenging threats to internet infrastructure security. Consequently, several open issues still need to be addressed, such as improving botnet intrusion detection systems, because botnet detection is essentially a confrontational problem. This paper presents PeerAmbush, a novel approach for detecting P2P botnets using, for the first time, one of the most effective deep learning techniques, which is the Multi-Layer Perceptron, with certain parameter settings to detect this type of botnet, unlike most current research, which is entirely based on machine learning techniques. The reason for employing machine learning/deep learning techniques, besides data analysis, is that the bots under the same botnet have a symmetrical behavior, which makes them recognizable compared to benign network traffic. PeerAmbush also takes on the challenge of detecting P2P botnets with fewer selected features compared to the existing related works by proposing a novel feature engineering method based on Best First Union (BFU). The proposed approach showed considerable results, with a very high detection accuracy of 99.9%, with no FPR. The experimental results showed that PeerAmbush is a promising approach, and we look forward to building on it to develop better security defenses.

16 pages, 600 KiB  
Article
Personalized Relationships-Based Knowledge Graph for Recommender Systems with Dual-View Items
by Zhifeng Liu, Xianzhan Zhong and Conghua Zhou
Symmetry 2022, 14(11), 2386; https://doi.org/10.3390/sym14112386 - 11 Nov 2022
Cited by 1 | Viewed by 1532
Abstract
The knowledge graph has received a lot of interest in the field of recommender systems as side information because it can address the sparsity and cold start issues associated with collaborative filtering-based recommender systems. However, when incorporating entities from a knowledge graph to represent semantic information, most current KG-based recommendation methods are unaware of the relationships between these users and items. As such, the learned semantic information representation of users and items cannot fully reflect the connectivity between users and items. In this paper, we present the PRKG-DI symmetry model, a Personalized Relationships-based Knowledge Graph for recommender systems with Dual-view Items that explores user-item relatedness by mining associated entities in the KG from a user-oriented entity view and an item-oriented entity view to augment item semantic information. Specifically, PRKG-DI utilizes a heterogeneous propagation strategy to gather information on higher-order user-item interactions and an attention mechanism to generate the weighted representation of entities. Moreover, PRKG-DI provides a score feature as a filter for individualized relationships to evaluate users’ potential interests. The empirical results demonstrate that our approach significantly outperforms several state-of-the-art baselines by 1.6%, 2.1%, and 0.8% on AUC, and 1.8%, 2.3%, and 0.8% on F1 when applied to three real-world scenarios for music, movie, and book recommendations, respectively.

16 pages, 3955 KiB  
Article
ConvLSTM Coupled Economics Indicators Quantitative Trading Decision Model
by Yong Qi, Hefeifei Jiang, Shaoxuan Li and Junyu Cao
Symmetry 2022, 14(9), 1896; https://doi.org/10.3390/sym14091896 - 10 Sep 2022
Cited by 2 | Viewed by 1642
Abstract
Time series prediction methods based on deep learning have been widely used in quantitative trading. However, the price of virtual currency represented by Bitcoin has random fluctuation characteristics, which is extremely misleading for time series prediction. In this paper, a virtual currency quantitative trading model is established, which uses a convolutional long short-term memory (ConvLSTM) deep learning method to predict the transaction price, and uses an evaluation model composed of the Chande momentum oscillator (CMO), percentage price oscillator (PPO), stop and reverse (SAR) and other economic indicators to make further decisions. The model quantitatively classifies the random wandering characteristics by fusing economic indicators and extracts the symmetric economic laws among them, making full use of deep learning methods to extract spatial and temporal features within the data. The 2016–2021 Bitcoin value dataset published on Kaggle was used for simulated investment. The results show that, compared with other existing decision models, the proposed model shows better performance and robustness, and shows good stability in dealing with the interdependence of long-term and short-term data. Our work provides a new idea for short-term prediction of long time series data affected by multiple complex factors: coupling deep learning methods with prior knowledge to complete prediction and decision making. Full article

21 pages, 1807 KiB  
Article
Computational Study of Methods for Determining the Elasticity of Red Blood Cells Using Machine Learning
by Samuel Molčan, Monika Smiešková, Hynek Bachratý and Katarína Bachratá
Symmetry 2022, 14(8), 1732; https://doi.org/10.3390/sym14081732 - 19 Aug 2022
Cited by 3 | Viewed by 1481
Abstract
The red blood cell (RBC) membrane is a highly elastic structure, and proper modelling of this elasticity is essential for biomedical applications that involve computational experiments with blood flow. In this work, we present a new method for estimating one of the key parameters of red blood cell elasticity, which uses a neural network trained on the simulation outputs. We test a classic LSTM (Long Short-Term Memory) architecture for the time series regression task, and we also experiment with a novel CNN-LSTM (Convolutional Neural Network) architecture. We pay special attention to investigating the impact of the way the three-dimensional training data are reduced to their two-dimensional projections. Such a comparison is possible thanks to working with simulation outputs that are equivalently defined for all dimensions and their combinations. The obtained results can be used as recommendations for an appropriate way to record real experiments for which the reduced dimension of the acquired data is essential. Full article

31 pages, 5580 KiB  
Article
A Novel Deep Learning Model for Sea State Classification Using Visual-Range Sea Images
by Muhammad Umair, Manzoor Ahmed Hashmani, Syed Sajjad Hussain Rizvi, Hasmi Taib, Mohd Nasir Abdullah and Mehak Maqbool Memon
Symmetry 2022, 14(7), 1487; https://doi.org/10.3390/sym14071487 - 20 Jul 2022
Cited by 3 | Viewed by 2488
Abstract
Wind-waves exhibit variations both in shape and steepness, and their asymmetrical nature is a well-known feature. One of the important characteristics of the sea surface is the front-back asymmetry of wind-wave crests. The wind-wave conditions on the surface of the sea constitute a sea state, which is listed as an essential climate variable by the Global Climate Observing System and is considered a critical factor for structural safety and optimal operations of offshore oil and gas platforms. Methods such as statistical representations of sensor-based wave parameters observations and numerical modeling are used to classify sea states. However, for offshore structures such as oil and gas platforms, these methods induce high capital expenditures (CAPEX) and operating expenses (OPEX), along with extensive computational power and time requirements. To address this issue, in this paper, we propose a novel, low-cost deep learning-based sea state classification model using visual-range sea images. Firstly, a novel visual-range sea state image dataset was designed and developed for this purpose. The dataset consists of 100,800 images covering four sea states. The dataset was then benchmarked on state-of-the-art deep learning image classification models. The highest classification accuracy of 81.8% was yielded by NASNet-Mobile. Secondly, a novel sea state classification model was proposed. The model took design inspiration from GoogLeNet, which was identified as the optimal reference model for sea state classification. Systematic changes in GoogLeNet’s inception block were proposed, which resulted in an 8.5% overall classification accuracy improvement in comparison with NASNet-Mobile and a 7% improvement from the reference model (i.e., GoogLeNet). Additionally, the proposed model took 26% less training time, and its per-image classification time remains competitive. Full article

16 pages, 4936 KiB  
Article
Application of Feature Selection Based on Multilayer GA in Stock Prediction
by Xiaoning Li, Qiancheng Yu, Chen Tang, Zekun Lu and Yufan Yang
Symmetry 2022, 14(7), 1415; https://doi.org/10.3390/sym14071415 - 10 Jul 2022
Cited by 6 | Viewed by 1541
Abstract
This paper proposes a feature selection model based on a multilayer genetic algorithm (GA) to select the features of a high stock dividend (HSD) and eliminate the relatively redundant features in the optimal solution by using layer-by-layer information transfer and two dimensionality reduction methods. Combining the ensemble model and a time-series split cross-validation (TSCV) indicator as the fitness function solves the problem of selecting the fitness function for each layer. The symmetry character of the model is fully utilized in the two dimensionality reduction processes, and the corresponding TSCV indicators are set according to the change in data dimensions and the unbalanced characteristics of the HSD. We built seven ensemble prediction models for actual stock trading data for comparison experiments. The results show that the feature selection model based on multilayer GA can effectively eliminate the relatively redundant features after dimensionality reduction and significantly improve the balanced accuracy, precision and AUC performance of the seven ensemble learning models. Finally, adversarial validation is used to analyze the differences in the balanced accuracy of the training and test sets caused by the inconsistent distribution of the data sets. Full article

12 pages, 1934 KiB  
Article
Short Time Series Forecasting: Recommended Methods and Techniques
by Mariel Abigail Cruz-Nájera, Mayra Guadalupe Treviño-Berrones, Mirna Patricia Ponce-Flores, Jesús David Terán-Villanueva, José Antonio Castán-Rocha, Salvador Ibarra-Martínez, Alejandro Santiago and Julio Laria-Menchaca
Symmetry 2022, 14(6), 1231; https://doi.org/10.3390/sym14061231 - 14 Jun 2022
Cited by 7 | Viewed by 5958
Abstract
This paper tackles the problem of forecasting real-life crime. However, the collected data only produced thirty-five short crime time series for three urban areas. We present a comparative analysis of four simple and four machine-learning-based ensemble forecasting methods. Additionally, we propose five forecasting techniques that manage the seasonal component of the time series. Furthermore, we used the symmetric mean absolute percentage error and a Friedman test to compare the performance of the forecasting methods and proposed techniques. The results showed that the simple moving average with seasonal removal techniques produces the best performance for these series. It is important to highlight that a high percentage of the time series have no auto-correlation and a high level of symmetry, which is deemed as white noise and, therefore, difficult to forecast. Full article
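For readers unfamiliar with the error measure named in this abstract, the following is a minimal sketch of the symmetric mean absolute percentage error (sMAPE) together with a one-step simple moving average forecast. The abstract does not state which sMAPE variant the authors use, so the 0 to 200% form below is an assumption, and the function names `smape` and `sma_forecast` are hypothetical. The seasonal removal techniques from the paper are not reproduced here.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Assumed variant: mean of |F - A| / ((|A| + |F|) / 2), scaled by 100,
    which ranges from 0 to 200. Terms where both values are zero count as 0.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom != 0:
            total += abs(f - a) / denom
    return 100 * total / len(actual)


def sma_forecast(series, window):
    """One-step-ahead forecast: the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


# Usage: forecast the next value of a short series, then score it.
history = [12, 15, 11, 14]
prediction = sma_forecast(history, window=2)   # mean of 11 and 14
error = smape([13], [prediction])
```

Because the denominator averages the magnitudes of the actual and forecast values, swapping the two arguments leaves the score unchanged, which is the sense in which the measure is symmetric.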

Review

45 pages, 6452 KiB  
Review
Time Series Analysis Based on Informer Algorithms: A Survey
by Qingbo Zhu, Jialin Han, Kai Chai and Cunsheng Zhao
Symmetry 2023, 15(4), 951; https://doi.org/10.3390/sym15040951 - 21 Apr 2023
Cited by 4 | Viewed by 4743
Abstract
Long series time forecasting has become a popular research direction in recent years, due to the ability to predict weather changes, traffic conditions and so on. This paper provides a comprehensive discussion of long series time forecasting techniques and their applications, using the Informer algorithm model as a framework. Specifically, we examine sequential time prediction models published in the last two years, including the tightly coupled convolutional transformer (TCCT) algorithm, Autoformer algorithm, FEDformer algorithm, Pyraformer algorithm, and Triformer algorithm. Researchers have made significant improvements to the attention mechanism and Informer algorithm model architecture in these different neural network models, resulting in recent approaches such as wavelet enhancement structure, auto-correlation mechanism, and depth decomposition architecture. In addition, attention algorithms and many of these models show potential for mechanical vibration prediction. In recent state-of-the-art studies, researchers have used the Informer algorithm model as an experimental control, and it can be seen that the algorithm model itself has research value. The Informer algorithm model performs relatively well on various data sets and has become a more typical algorithm model for time series forecasting, and its model value is worthy of in-depth exploration and research. This paper discusses the structures and innovations of five representative models, including Informer, and reviews the performance of different neural network structures. The advantages and disadvantages of each model are discussed and compared, and finally, the future research direction of long series time forecasting is discussed. Full article

15 pages, 2075 KiB  
Review
Recent Synergies of Machine Learning and Neurorobotics: A Bibliometric and Visualized Analysis
by Chien-Liang Lin, Yu-Hui Zhu, Wang-Hui Cai and Yu-Sheng Su
Symmetry 2022, 14(11), 2264; https://doi.org/10.3390/sym14112264 - 28 Oct 2022
Cited by 1 | Viewed by 1604
Abstract
Over the past decade, neurorobotics-integrated machine learning has emerged as a new methodology to investigate and address related problems. The combined use of machine learning and neurorobotics allows us to solve problems and find explanatory models that would not be possible with traditional techniques, which are basic within the principles of symmetry. Hence, neurorobotics has become a new research field. Accordingly, this study aimed to classify existing publications on neurorobotics via content analysis and knowledge mapping. The study also aimed to effectively understand the development trend of neurorobotics-integrated machine learning. Based on data collected from the Web of Science, 46 references were obtained, and bibliometric data from 2013 to 2021 were analyzed to identify the most productive countries, universities, authors, journals, and prolific publications in neurorobotics. CiteSpace was used to visualize the analysis based on co-citations, bibliographic coupling, and co-occurrence. The study also used keyword network analysis to discuss the current status of research in this field and determine the primary core topic network based on cluster analysis. Through the compilation and content analysis of specific bibliometric analyses, this study provides a specific explanation for the knowledge structure of the relevant subject area. Finally, the implications and future research context are discussed as references for future research. Full article