Machine Learning and Knowledge Extraction

20 pages, 878 KiB

Open Access Article

Multi-Task Representation Learning for Renewable-Power Forecasting: A Comparative Analysis of Unified Autoencoder Variants and Task-Embedding Dimensions

by Chandana Priya Nivarthi, Stephan Vogt and Bernhard Sick

Mach. Learn. Knowl. Extr. 2023, 5(3), 1214-1233; https://doi.org/10.3390/make5030062 - 20 Sep 2023

Viewed by 1266

Abstract

Typically, renewable-power-generation forecasting using machine learning involves creating separate models for each photovoltaic or wind park, known as single-task learning models. However, transfer learning has gained popularity in recent years, as it allows for the transfer of knowledge from source parks to target parks. Nevertheless, determining the most similar source park(s) for transfer learning can be challenging, particularly when the target park has limited or no historical data samples. To address this issue, we propose a multi-task learning architecture that employs a Unified Autoencoder (UAE) to initially learn a common representation of input weather features among tasks and then utilizes a Task-Embedding layer in a Neural Network (TENN) to learn task-specific information. This proposed UAE-TENN architecture can be easily extended to new parks with or without historical data. We evaluate the performance of our proposed architecture and compare it to single-task learning models on six photovoltaic and wind farm datasets consisting of a total of 529 parks. Our results show that the UAE-TENN architecture significantly improves power-forecasting performance by 10 to 19% for photovoltaic parks and 5 to 15% for wind parks compared to baseline models. We also demonstrate that UAE-TENN improves forecast accuracy for a new park by 19% for photovoltaic parks, even in a zero-shot learning scenario where there is no historical data. Additionally, we propose variants of the Unified Autoencoder with convolutional and LSTM layers, compare their performance, and provide a comparison among architectures with different numbers of task-embedding dimensions. Finally, we demonstrate the utility of trained task embeddings for interpretation and visualization purposes. Full article

(This article belongs to the Special Issue Deep Learning and Applications)

19 pages, 3814 KiB

Open Access Article

Early Thyroid Risk Prediction by Data Mining and Ensemble Classifiers

by Mohammad H. Alshayeji

Mach. Learn. Knowl. Extr. 2023, 5(3), 1195-1213; https://doi.org/10.3390/make5030061 - 18 Sep 2023

Cited by 2 | Viewed by 1675

Abstract

Thyroid disease is among the most prevalent endocrinopathies worldwide. As the thyroid gland controls human metabolism, thyroid illness is a matter of concern for human health. To save time and reduce error rates, an automatic, reliable, and accurate machine-learning (ML) system for identifying thyroid disease is essential. The proposed model aims to address limitations of existing work, such as the lack of detailed feature analysis and visualization and limited prediction accuracy and reliability. Here, a public thyroid illness dataset containing 29 clinical features from the University of California, Irvine ML repository was used. The clinical features helped us to build an ML model that can predict thyroid illness by analyzing early symptoms, replacing the manual analysis of these attributes. Feature analysis and visualization facilitate an understanding of the role of features in thyroid prediction tasks. In addition, the overfitting problem was eliminated by 5-fold cross-validation and data balancing using the synthetic minority oversampling technique (SMOTE). Ensemble learning ensures prediction model reliability owing to the involvement of multiple classifiers in the prediction decisions. The proposed model achieved 99.5% accuracy, 99.39% sensitivity, and 99.59% specificity with the boosting method, which is applicable to real-time computer-aided diagnosis (CAD) systems to ease diagnosis and promote early treatment. Full article
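For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how accuracy, sensitivity, and specificity are derived from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage lines are hypothetical illustrations; they are not taken from the article or its dataset.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard metrics from confusion-matrix counts.

    tp/fn: diseased cases predicted positive/negative.
    tn/fp: healthy cases predicted negative/positive.
    """
    total = tp + fn + tn + fp
    return {
        # Share of all cases that were classified correctly.
        "accuracy": (tp + tn) / total,
        # Share of diseased cases that were detected (recall).
        "sensitivity": tp / (tp + fn),
        # Share of healthy cases that were correctly cleared.
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = binary_metrics(tp=90, fn=10, tn=180, fp=20)
print(example)
```

With imbalanced classes, accuracy alone can look high while sensitivity stays low, which is one reason abstracts such as this one report all three values.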

(This article belongs to the Section Data)

19 pages, 1760 KiB

Open Access Review

Gradient-Based Neural Architecture Search: A Comprehensive Evaluation

by Sarwat Ali and M. Arif Wani

Mach. Learn. Knowl. Extr. 2023, 5(3), 1176-1194; https://doi.org/10.3390/make5030060 - 14 Sep 2023

Cited by 1 | Viewed by 1389

Abstract

One of the challenges in deep learning involves discovering the optimal architecture for a specific task. This is effectively tackled through Neural Architecture Search (NAS). Neural Architecture Search encompasses three prominent approaches—reinforcement learning, evolutionary algorithms, and gradient descent—that have demonstrated noteworthy potential in identifying good candidate architectures. However, approaches based on reinforcement learning and evolutionary algorithms often necessitate extensive computational resources, requiring hundreds of GPU days or more. Therefore, we confine this work to a gradient-based approach due to its lower computational resource demands. Our objective is to identify the best gradient-based NAS method and to pinpoint opportunities for future enhancements. To achieve this, we provide a comprehensive evaluation of four major gradient descent-based architecture search methods for discovering the best neural architecture for image classification tasks. An overview of these gradient-based methods, i.e., DARTS, PDARTS, Fair DARTS and Att-DARTS, is presented. A theoretical comparison, based on search spaces, continuous relaxation strategy and bi-level optimization, for deriving the best neural architecture is then provided. The strong and weak features of these methods are also listed. Experimental results comparing the error rate and computational cost of these gradient-based methods are analyzed. These experiments used the benchmark datasets CIFAR-10, CIFAR-100 and ImageNet. The results show that PDARTS performs best and is the fastest among the examined methods, making it a potent candidate for automating Neural Architecture Search. By effectively conducting a comparative analysis, our research provides valuable insights and future research directions to address the criticism and gaps in the literature. Full article

(This article belongs to the Special Issue Deep Learning and Applications)

27 pages, 1127 KiB

Open Access Article

Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models

by Taja Kuzman, Igor Mozetič and Nikola Ljubešić

Mach. Learn. Knowl. Extr. 2023, 5(3), 1149-1175; https://doi.org/10.3390/make5030059 - 12 Sep 2023

Viewed by 2152

Abstract

Massive text collections are the backbone of large language models, the main ingredient of the current significant progress in artificial intelligence. However, as these collections are mostly collected using automatic methods, researchers have few insights into what types of texts they consist of. Automatic genre identification is a text classification task that enriches texts with genre labels, such as promotional and legal, providing meaningful insights into the composition of these large text collections. In this paper, we evaluate machine learning approaches for the genre identification task based on their generalizability across different datasets to assess which model is the most suitable for the downstream task of enriching large web corpora with genre information. We train and test multiple fine-tuned BERT-like Transformer-based models and show that merging different genre-annotated datasets yields superior results. Moreover, we explore the zero-shot capabilities of large GPT Transformer models in this task and discuss the advantages and disadvantages of the zero-shot approach. We also publish the best-performing fine-tuned model that enables automatic genre annotation in multiple languages. In addition, to promote further research in this area, we plan to share, upon request, a new benchmark for automatic genre annotation, ensuring the non-exposure of the latest large language models. Full article

(This article belongs to the Topic Artificial Intelligence and Computational Methods: Modeling, Simulations and Optimization of Complex Systems)

17 pages, 5385 KiB

Open Access Article

Cyberattack Detection in Social Network Messages Based on Convolutional Neural Networks and NLP Techniques

by Jorge E. Coyac-Torres, Grigori Sidorov, Eleazar Aguirre-Anaya and Gerardo Hernández-Oregón

Mach. Learn. Knowl. Extr. 2023, 5(3), 1132-1148; https://doi.org/10.3390/make5030058 - 01 Sep 2023

Cited by 1 | Viewed by 1602

Abstract

Social networks have captured the attention of many people worldwide. However, these services have also attracted a considerable number of malicious users whose aim is to compromise the digital assets of other users by using messages as an attack vector to execute different types of cyberattacks against them. This work presents an approach based on natural language processing tools and a convolutional neural network architecture to detect and classify four types of cyberattacks in social network messages, including malware, phishing, spam, and even one whose aim is to deceive a user into spreading malicious messages to other users, which, in this work, is identified as a bot attack. One notable feature of this work is that it analyzes textual content without depending on any characteristics from a specific social network, making its analysis independent of particular data sources. Finally, this work was tested on real data, demonstrating its results in two stages. The first stage detected the existence of any of the four types of cyberattacks within the message, achieving an accuracy value of 0.91. After detecting a message as a cyberattack, the next stage was to classify it as one of the four types of cyberattack, achieving an accuracy value of 0.82. Full article

(This article belongs to the Section Privacy)

13 pages, 2012 KiB

Open Access Article

Comparing the Performance of Machine Learning Algorithms in the Automatic Classification of Psychotherapeutic Interactions in Avatar Therapy

by Alexandre Hudon, Kingsada Phraxayavong, Stéphane Potvin and Alexandre Dumais

Mach. Learn. Knowl. Extr. 2023, 5(3), 1119-1131; https://doi.org/10.3390/make5030057 - 24 Aug 2023

Cited by 2 | Viewed by 2243

Abstract

(1) Background: Avatar Therapy (AT) is currently being studied to help patients suffering from treatment-resistant schizophrenia. Facilitating annotations of immersive verbatims in AT by using classification algorithms could be an interesting avenue to reduce the time and cost of conducting such analysis and adding objective quantitative data in the classification of the different interactions taking place during the therapy. The aim of this study is to compare the performance of machine learning algorithms in the automatic annotation of immersive session verbatims of AT. (2) Methods: Five machine learning algorithms were implemented over a dataset as per the Scikit-Learn library: Support vector classifier, Linear support vector classifier, Multinomial Naïve Bayes, Decision Tree, and Multi-layer perceptron classifier. The dataset consisted of the 27 different types of interactions taking place in AT for the Avatar and the patient for 35 patients who underwent eight immersive sessions as part of their treatment in AT. (3) Results: The Linear SVC performed best over the dataset as compared with the other algorithms with the highest accuracy score, recall score, and F1-Score. The regular SVC performed best for precision. (4) Conclusions: This study presented an objective method for classifying textual interactions based on immersive session verbatims and gave a first comparison of multiple machine learning algorithms on AT. Full article

43 pages, 12327 KiB

Open Access Article

Analyzing Quality Measurements for Dimensionality Reduction

by Michael C. Thrun, Julian Märte and Quirin Stier

Mach. Learn. Knowl. Extr. 2023, 5(3), 1076-1118; https://doi.org/10.3390/make5030056 - 21 Aug 2023

Cited by 1 | Viewed by 1776

Abstract

Dimensionality reduction methods can be used to project high-dimensional data into low-dimensional space. If the output space is restricted to two dimensions, the result is a scatter plot whose goal is to present insightful visualizations of distance- and density-based structures. The topological invariance of dimension indicates that the two-dimensional similarities in the scatter plot cannot coercively represent high-dimensional distances. In practice, projections of several datasets with distance- and density-based structures show a misleading interpretation of the underlying structures. The examples outline that the evaluation of projections remains essential. Here, 19 unsupervised quality measurements (QMs) are grouped into semantic classes with the aid of graph theory. We use three representative benchmark datasets to show that QMs fail to evaluate the projections of straightforward structures when common methods such as Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), or t-distributed stochastic neighbor embedding (t-SNE) are applied. This work shows that unsupervised QMs are biased towards assumed underlying structures. Based on insights gained from graph theory, we propose a new quality measurement called the Gabriel Classification Error (GCE). This work demonstrates that GCE can make an unbiased evaluation of projections. The GCE is accessible within the R package DRquality available on CRAN. Full article

(This article belongs to the Section Visualization)

21 pages, 1082 KiB

Open Access Article

Tabular Machine Learning Methods for Predicting Gas Turbine Emissions

by Rebecca Potts, Rick Hackney and Georgios Leontidis

Mach. Learn. Knowl. Extr. 2023, 5(3), 1055-1075; https://doi.org/10.3390/make5030055 - 14 Aug 2023

Cited by 2 | Viewed by 1678

Abstract

Predicting emissions for gas turbines is critical for monitoring harmful pollutants being released into the atmosphere. In this study, we evaluate the performance of machine learning models for predicting emissions for gas turbines. We compared an existing predictive emissions model, a first-principles-based Chemical Kinetics model, against two machine learning models we developed based on the Self-Attention and Intersample Attention Transformer (SAINT) and eXtreme Gradient Boosting (XGBoost), with the aim to demonstrate the improved predictive performance of nitrogen oxides (NOx) and carbon monoxide (CO) using machine learning techniques and determine whether XGBoost or a deep learning model performs the best on a specific real-life gas turbine dataset. Our analysis utilises a Siemens Energy gas turbine test bed tabular dataset to train and validate the machine learning models. Additionally, we explore the trade-off between incorporating more features to enhance the model complexity, and the resulting presence of increased missing values in the dataset. Full article

19 pages, 773 KiB

Open Access Perspective

Defining a Digital Twin: A Data Science-Based Unification

by Frank Emmert-Streib

Mach. Learn. Knowl. Extr. 2023, 5(3), 1036-1054; https://doi.org/10.3390/make5030054 - 12 Aug 2023

Cited by 2 | Viewed by 1899

Abstract

The concept of a digital twin (DT) has gained significant attention in academia and industry because of its perceived potential to address critical global challenges, such as climate change, healthcare, and economic crises. The concept was originally introduced in manufacturing, and many attempts have since been made to define it properly. Unfortunately, there remains a great deal of confusion surrounding the underlying concept, with many scientists still uncertain about the distinction between a simulation, a mathematical model and a DT. The aim of this paper is to propose a formal definition of a digital twin. To achieve this goal, we utilize a data science framework that facilitates a functional representation of a DT and other components that can be combined to form a larger entity we refer to as a digital twin system (DTS). In our framework, a DT is an open dynamical system with an updating mechanism, also referred to as a complex adaptive system (CAS). Its primary function is to generate data via simulations, ideally indistinguishable from data of its physical counterpart. A DTS, on the other hand, provides techniques for analyzing data and making decisions based on the generated data. Interestingly, we find that a DTS shares similarities with the principles of general systems theory. This multi-faceted view of a DTS explains its versatility in adapting to a wide range of problems in various application domains such as engineering, manufacturing, urban planning, and personalized medicine. Full article

(This article belongs to the Section Data)

13 pages, 1314 KiB

Open Access Review

Artificial Intelligence Ethics and Challenges in Healthcare Applications: A Comprehensive Review in the Context of the European GDPR Mandate

by Mohammad Mohammad Amini, Marcia Jesus, Davood Fanaei Sheikholeslami, Paulo Alves, Aliakbar Hassanzadeh Benam and Fatemeh Hariri

Mach. Learn. Knowl. Extr. 2023, 5(3), 1023-1035; https://doi.org/10.3390/make5030053 - 07 Aug 2023

Cited by 8 | Viewed by 5740

Abstract

This study examines the ethical issues surrounding the use of Artificial Intelligence (AI) in healthcare, specifically nursing, under the European General Data Protection Regulation (GDPR). The analysis delves into how GDPR applies to healthcare AI projects, encompassing data collection and decision-making stages, to reveal the ethical implications at each step. A comprehensive review of the literature categorizes research investigations into three main categories: Ethical Considerations in AI; Practical Challenges and Solutions in AI Integration; and Legal and Policy Implications in AI. The analysis uncovers a significant research deficit in this field, with a particular focus on data owner rights and AI ethics within GDPR compliance. To address this gap, the study proposes new case studies that emphasize the importance of comprehending data owner rights and establishing ethical norms for AI use in medical applications, especially in nursing. This review makes a valuable contribution to the AI ethics debate and assists nursing and healthcare professionals in developing ethical AI practices. The insights provided help stakeholders navigate the intricate terrain of data protection, ethical considerations, and regulatory compliance in AI-driven healthcare. Lastly, the study introduces a case study of a real AI health-tech project named SENSOMATT, spotlighting GDPR and privacy issues. Full article

(This article belongs to the Topic Secure Applications with Blockchain and Artificial Intelligence)

13 pages, 1436 KiB

Open Access Article

Improving Spiking Neural Network Performance with Auxiliary Learning

by Paolo G. Cachi, Sebastián Ventura and Krzysztof J. Cios

Mach. Learn. Knowl. Extr. 2023, 5(3), 1010-1022; https://doi.org/10.3390/make5030052 - 05 Aug 2023

Viewed by 1845

Abstract

The use of the backpropagation through time learning rule has enabled the supervised training of deep spiking neural networks to process temporal neuromorphic data. However, their performance is still below that of non-spiking neural networks. Previous work pointed out that one of the main causes is the limited amount of neuromorphic data currently available, which is also difficult to generate. With the goal of overcoming this problem, we explore the usage of auxiliary learning as a means of helping spiking neural networks to identify more general features. Tests are performed on the neuromorphic DVS-CIFAR10 and DVS128-Gesture datasets. The results indicate that training with auxiliary learning tasks improves their accuracy, albeit slightly. Different scenarios, including manual and automatic combination losses using implicit differentiation, are explored to analyze the usage of auxiliary tasks. Full article

(This article belongs to the Collection Extravaganza Feature Papers on Hot Topics in Machine Learning and Knowledge Extraction)

31 pages, 3978 KiB

Open Access Article

Identifying the Regions of a Space with the Self-Parameterized Recursively Assessed Decomposition Algorithm (SPRADA)

by Dylan Molinié, Kurosh Madani, Véronique Amarger and Abdennasser Chebira

Mach. Learn. Knowl. Extr. 2023, 5(3), 979-1009; https://doi.org/10.3390/make5030051 - 04 Aug 2023

Viewed by 1245

Abstract

This paper introduces a non-parametric methodology based on classical unsupervised clustering techniques to automatically identify the main regions of a space, without requiring the objective number of clusters, so as to identify the major regular states of unknown industrial systems. Indeed, useful knowledge on real industrial processes entails the identification of their regular states, and their historically encountered anomalies. Since both should form compact and salient groups of data, unsupervised clustering generally performs this task fairly accurately; however, this often requires the number of clusters upstream, knowledge which is rarely available. As such, the proposed algorithm operates a first partitioning of the space, then it estimates the integrity of the clusters, and splits them again and again until every cluster obtains an acceptable integrity; finally, a step of merging based on the clusters’ empirical distributions is performed to refine the partitioning. Applied to real industrial data obtained in the scope of a European project, this methodology proved able to automatically identify the main regular states of the system. Results show the robustness of the proposed approach in the fully-automatic and non-parametric identification of the main regions of a space, knowledge which is useful to industrial anomaly detection and behavioral modeling. Full article

(This article belongs to the Topic Artificial Intelligence and Computational Methods: Modeling, Simulations and Optimization of Complex Systems)

22 pages, 30347 KiB

Open Access Article

Behavior-Aware Pedestrian Trajectory Prediction in Ego-Centric Camera Views with Spatio-Temporal Ego-Motion Estimation

by Phillip Czech, Markus Braun, Ulrich Kreßel and Bin Yang

Mach. Learn. Knowl. Extr. 2023, 5(3), 957-978; https://doi.org/10.3390/make5030050 - 03 Aug 2023

Cited by 2 | Viewed by 1469

Abstract

With the ongoing development of automated driving systems, the crucial task of predicting pedestrian behavior is attracting growing attention. The prediction of future pedestrian trajectories from the ego-vehicle camera perspective is particularly challenging due to the dynamically changing scene. Therefore, we present Behavior-Aware Pedestrian Trajectory Prediction (BA-PTP), a novel approach to pedestrian trajectory prediction for ego-centric camera views. It incorporates behavioral features extracted from real-world traffic scene observations such as the body and head orientation of pedestrians, as well as their pose, in addition to positional information from body and head bounding boxes. For each input modality, we employed independent encoding streams that are combined through a modality attention mechanism. To account for the ego-motion of the camera in an ego-centric view, we introduced Spatio-Temporal Ego-Motion Module (STEMM), a novel approach to ego-motion prediction. Compared to the related works, it utilizes spatial goal points of the ego-vehicle that are sampled from its intended route. We experimentally validated the effectiveness of our approach using two datasets for pedestrian behavior prediction in urban traffic scenes. Based on ablation studies, we show the advantages of incorporating different behavioral features for pedestrian trajectory prediction in the image plane. Moreover, we demonstrate the benefit of integrating STEMM into our pedestrian trajectory prediction method, BA-PTP. BA-PTP achieves state-of-the-art performance on the PIE dataset, outperforming prior work by 7% in MSE-1.5 s and C_MSE as well as 9% in CF_MSE. Full article

(This article belongs to the Special Issue Deep Learning and Applications)

20 pages, 1544 KiB

Open Access Article

Alternative Formulations of Decision Rule Learning from Neural Networks

by Litao Qiao, Weijia Wang and Bill Lin

Mach. Learn. Knowl. Extr. 2023, 5(3), 937-956; https://doi.org/10.3390/make5030049 - 03 Aug 2023

Viewed by 1152

Abstract

This paper extends recent work on decision rule learning from neural networks for tabular data classification. We propose alternative formulations to trainable Boolean logic operators as neurons with continuous weights, including trainable NAND neurons. These alternative formulations provide uniform treatments to different trainable logic neurons so that they can be uniformly trained, which enables, for example, the direct application of existing sparsity-promoting neural net training techniques like reweighted $L_{1}$ regularization to derive sparse networks that translate to simpler rules. In addition, we present an alternative network architecture based on trainable NAND neurons by applying De Morgan’s law to realize a NAND-NAND network instead of an AND-OR network, both of which can be readily mapped to decision rule sets. Our experimental results show that these alternative formulations can also generate accurate decision rule sets that achieve state-of-the-art performance in terms of accuracy in tabular learning applications. Full article
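The De Morgan equivalence mentioned in this abstract can be checked exhaustively on a small example. The sketch below is illustrative only: it uses fixed Boolean gates rather than the trainable, continuously weighted neurons of the article, it omits negated literals for brevity, and the rule set `RULES` and all function names are hypothetical.

```python
from itertools import product


def nand(*bits: bool) -> bool:
    """NAND gate: false only when every input is true."""
    return not all(bits)


def and_or(groups, x) -> bool:
    """AND-OR form: the output fires if any rule (an AND of inputs) fires."""
    return any(all(x[i] for i in group) for group in groups)


def nand_nand(groups, x) -> bool:
    """NAND-NAND form: a NAND over the NANDs of each rule's inputs."""
    return nand(*(nand(*(x[i] for i in group)) for group in groups))


# Hypothetical rule set over four binary features:
# (x0 AND x1) OR (x2) OR (x1 AND x3)
RULES = [(0, 1), (2,), (1, 3)]


def equivalent(groups, n_inputs: int) -> bool:
    """Compare both forms on every possible Boolean input vector."""
    return all(
        and_or(groups, x) == nand_nand(groups, x)
        for x in product([False, True], repeat=n_inputs)
    )


print(equivalent(RULES, 4))
```

Because NOT(NOT a AND NOT b) equals a OR b, the outer NAND applied to the inner NAND outputs reproduces the OR over rules, which is why either network can be read off as the same decision rule set.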

(This article belongs to the Special Issue Advances in Explainable Artificial Intelligence (XAI))

15 pages, 2157 KiB

Open Access Article

Achievable Minimally-Contrastive Counterfactual Explanations

by Hosein Barzekar and Susan McRoy

Mach. Learn. Knowl. Extr. 2023, 5(3), 922-936; https://doi.org/10.3390/make5030048 - 03 Aug 2023

Viewed by 1470

Abstract

Decision support systems based on machine learning models should be able to help users identify opportunities and threats. Popular model-agnostic explanation models can identify factors that support various predictions, answering questions such as “What factors affect sales?” or “Why did sales decline?”, but do not highlight what a person should or could do to get a more desirable outcome. Counterfactual explanation approaches address intervention, and some even consider feasibility, but none consider their suitability for real-time applications, such as question answering. Here, we address this gap by introducing a novel model-agnostic method that provides specific, feasible changes that would impact the outcomes of a complex Black Box AI model for a given instance and assess its real-world utility by measuring its real-time performance and ability to find achievable changes. The method uses the instance of concern to generate high-precision explanations and then applies a secondary method to find achievable minimally-contrastive counterfactual explanations (AMCC) while limiting the search to modifications that satisfy domain-specific constraints. Using a widely recognized dataset, we evaluated the classification task to ascertain the frequency and time required to identify successful counterfactuals. For a 90% accurate classifier, our algorithm identified AMCC explanations in 47% of cases (38 of 81), with an average discovery time of 80 ms. These findings verify the algorithm’s efficiency in swiftly producing AMCC explanations, suitable for real-time systems. The AMCC method enhances the transparency of Black Box AI models, aiding individuals in evaluating remedial strategies or assessing potential outcomes. Full article

(This article belongs to the Special Issue Advances in Explainable Artificial Intelligence (XAI))

31 pages, 5585 KiB

Open AccessReview

Capsule Network with Its Limitation, Modification, and Applications—A Survey

by Mahmood Ul Haq, Muhammad Athar Javed Sethi and Atiq Ur Rehman

Mach. Learn. Knowl. Extr. 2023, 5(3), 891-921; https://doi.org/10.3390/make5030047 - 02 Aug 2023

Cited by 2 | Viewed by 3435

Abstract

Numerous advancements in various fields, including pattern recognition and image classification, have been made thanks to modern computer vision and machine learning methods. The capsule network is one of the advanced machine learning algorithms that encodes features based on their hierarchical relationships. A capsule network is a type of neural network that performs inverse graphics to represent an object as a set of parts and to capture the relationships between those parts, unlike CNNs, which lose most of the evidence related to spatial location and require large amounts of training data. We present a comparative review of capsule network architectures used in various applications. The paper’s main contribution is that it summarizes and explains the significant published capsule network architectures with their advantages, limitations, modifications, and applications. Full article

23 pages, 903 KiB

Open AccessArticle

Autoencoder Feature Residuals for Network Intrusion Detection: One-Class Pretraining for Improved Performance

by Brian Lewandowski and Randy Paffenroth

Mach. Learn. Knowl. Extr. 2023, 5(3), 868-890; https://doi.org/10.3390/make5030046 - 31 Jul 2023

Cited by 1 | Viewed by 1054

Abstract

The proliferation of novel attacks and growing amounts of data have caused practitioners in the field of network intrusion detection to constantly work towards keeping up with this evolving adversarial landscape. Researchers have been seeking to harness deep learning techniques in efforts to detect zero-day attacks and allow network intrusion detection systems to more efficiently alert network operators. The technique outlined in this work uses a one-class training process to shape autoencoder feature residuals for the effective detection of network attacks. Compared to an original set of input features, we show that autoencoder feature residuals are a suitable replacement, and often perform at least as well as the original feature set. This quality allows autoencoder feature residuals to remove the need for extensive feature engineering without reducing classification performance. Additionally, it is found that without generating new data compared to an original feature set, using autoencoder feature residuals often improves classifier performance. Practical side effects from using autoencoder feature residuals emerge by analyzing the potential data compression benefits they provide. Full article

(This article belongs to the Special Issue Deep Learning and Applications)

21 pages, 1211 KiB

Open AccessArticle

Efficient Latent Space Compression for Lightning-Fast Fine-Tuning and Inference of Transformer-Based Models

by Ala Alam Falaki and Robin Gras

Mach. Learn. Knowl. Extr. 2023, 5(3), 847-867; https://doi.org/10.3390/make5030045 - 30 Jul 2023

Viewed by 1361

Abstract

This paper presents a technique to reduce the number of parameters in a transformer-based encoder–decoder architecture by incorporating autoencoders. To discover the optimal compression, we trained different autoencoders on the embedding space (encoder’s output) of several pre-trained models. The experiments reveal that reducing the embedding size has the potential to dramatically decrease the GPU memory usage while speeding up the inference process. The proposed architecture was included in the BART model and tested for summarization, translation, and classification tasks. The summarization results show that a 60% decoder size reduction (from 96 M to 40 M parameters) makes inference twice as fast and uses less than half of the GPU memory during the fine-tuning process, with only a 4.5% drop in R-1 score. The same trend is visible for translation and partially for classification tasks. Our approach reduces the GPU memory usage and processing time of large-scale sequence-to-sequence models for fine-tuning and inference. The implementation and checkpoints are available on GitHub. Full article

(This article belongs to the Special Issue Deep Learning and Applications)

17 pages, 653 KiB

Open AccessArticle

Low Cost Evolutionary Neural Architecture Search (LENAS) Applied to Traffic Forecasting

by Daniel Klosa and Christof Büskens

Mach. Learn. Knowl. Extr. 2023, 5(3), 830-846; https://doi.org/10.3390/make5030044 - 28 Jul 2023

Viewed by 1198

Abstract

Traffic forecasting is an important task for transportation engineering as it helps authorities to plan and control traffic flow, detect congestion, and reduce environmental impact. Deep learning techniques have gained traction in handling such complex datasets, but require expertise in neural architecture engineering, often beyond the scope of traffic management decision-makers. Our study aims to address this challenge by using neural architecture search (NAS) methods. These methods, which simplify neural architecture engineering by discovering task-specific neural architectures, have only recently been applied to traffic prediction. We specifically focus on the performance estimation of neural architectures, a computationally demanding sub-problem of NAS that often hinders the real-world application of these methods. Extending prior work on evolutionary NAS (ENAS), our work evaluates the utility of zero-cost (ZC) proxies, recently emerged cost-effective evaluators of network architectures. These proxies operate without training, thereby circumventing the computational bottleneck, albeit at a slight cost to accuracy. Our findings indicate that, when integrated into the ENAS framework, ZC proxies can accelerate the search process by two orders of magnitude at a small cost in accuracy. These results establish the viability of ZC proxies as a practical solution to accelerate NAS methods while maintaining model accuracy. Our research contributes to the domain by showcasing how ZC proxies can enhance the accessibility and usability of NAS methods for traffic forecasting, despite potential limitations in neural architecture engineering expertise. This novel approach significantly aids in the efficient application of deep learning techniques in real-world traffic management scenarios. Full article

(This article belongs to the Special Issue Deep Learning and Applications)

27 pages, 8245 KiB

Open AccessArticle

Classification Confidence in Exploratory Learning: A User’s Guide

by Peter Salamon, David Salamon, V. Adrian Cantu, Michelle An, Tyler Perry, Robert A. Edwards and Anca M. Segall

Mach. Learn. Knowl. Extr. 2023, 5(3), 803-829; https://doi.org/10.3390/make5030043 - 21 Jul 2023

Viewed by 1356

Abstract

This paper investigates the post-hoc calibration of confidence for “exploratory” machine learning classification problems. The difficulty in these problems stems from the continuing desire to push the boundaries of which categories have enough examples to generalize from when curating datasets, and confusion regarding the validity of those categories. We argue that for such problems the “one-versus-all” approach (top-label calibration) must be used rather than the “calibrate-the-full-response-matrix” approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation using only the test set and the final model. Chief among these methods is the use of kernel density ratios for confidence calibration including a novel algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, may be performed using only the test dataset, and should be sanity-checked visually. Full article

(This article belongs to the Collection Extravaganza Feature Papers on Hot Topics in Machine Learning and Knowledge Extraction)

21 pages, 1507 KiB

Open AccessArticle

A Probabilistic Transformation of Distance-Based Outliers

by David Muhr, Michael Affenzeller and Josef Küng

Mach. Learn. Knowl. Extr. 2023, 5(3), 782-802; https://doi.org/10.3390/make5030042 - 18 Jul 2023

Cited by 4 | Viewed by 1678

Abstract

The scores of distance-based outlier detection methods are difficult to interpret, and it is challenging to determine a suitable cut-off threshold between normal and outlier data points without additional context. We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable and increases the contrast between normal and outlier data points. Determining distance relationships between data points is necessary to identify the nearest-neighbor relationships in the data, yet most of the computed distances are typically discarded. We show that the distances to other data points can be used to model distance probability distributions and, subsequently, use the distributions to turn distance-based outlier scores into outlier probabilities. Over a variety of tabular and image benchmark datasets, we show that the probabilistic transformation does not impact outlier ranking (ROC AUC) or detection performance (AP, F1), and increases the contrast between normal and outlier score distributions (statistical distance). The experimental findings indicate that it is possible to transform distance-based outlier scores into interpretable probabilities with increased contrast between normal and outlier samples. Our work generalizes to a wide range of distance-based outlier detection methods, and, because existing distance computations are used, it adds no significant computational overhead. Full article

(This article belongs to the Section Data)

19 pages, 4338 KiB

Open AccessSystematic Review

Deep Learning and Autonomous Vehicles: Strategic Themes, Applications, and Research Agenda Using SciMAT and Content-Centric Analysis, a Systematic Review

by Fábio Eid Morooka, Adalberto Manoel Junior, Tiago F. A. C. Sigahi, Jefferson de Souza Pinto, Izabela Simon Rampasso and Rosley Anholon

Mach. Learn. Knowl. Extr. 2023, 5(3), 763-781; https://doi.org/10.3390/make5030041 - 13 Jul 2023

Cited by 6 | Viewed by 2627

Abstract

Applications of deep learning (DL) in autonomous vehicle (AV) projects have gained increasing interest from both researchers and companies. This has caused a rapid expansion of scientific production on DL-AV in recent years, encouraging researchers to conduct systematic literature reviews (SLRs) to organize knowledge on the topic. However, a critical analysis of the existing SLRs on DL-AV reveals some methodological gaps, particularly regarding the use of bibliometric software, a powerful tool for analyzing large amounts of data and for providing a holistic understanding of the structure of knowledge in a particular field. This study aims to identify the strategic themes and trends in DL-AV research using the Science Mapping Analysis Tool (SciMAT) and content analysis. Strategic diagrams and cluster networks were developed using SciMAT, allowing the identification of motor themes and research opportunities. The content analysis allowed categorization of the contribution of the academic literature on DL applications in AV project design; neural networks and AI models used in AVs; and transdisciplinary themes in DL-AV research, including energy, legislation, ethics, and cybersecurity. Potential research avenues are discussed for each of these categories. The findings presented in this study can benefit both experienced scholars who can gain access to condensed information about the literature on DL-AV and new researchers who may be attracted to topics related to technological development and other issues with social and environmental impacts. Full article

(This article belongs to the Section Thematic Reviews)

17 pages, 3106 KiB

Open AccessArticle

The Value of Numbers in Clinical Text Classification

by Kristian Miok, Padraig Corcoran and Irena Spasić

Mach. Learn. Knowl. Extr. 2023, 5(3), 746-762; https://doi.org/10.3390/make5030040 - 07 Jul 2023

Cited by 1 | Viewed by 2041

Abstract

Clinical text often includes numbers of various types and formats. However, most current text classification approaches do not take advantage of these numbers. This study aims to demonstrate that using numbers as features can significantly improve the performance of text classification models. This study also demonstrates the feasibility of extracting such features from clinical text. Unsupervised learning was used to identify patterns of number usage in clinical text. These patterns were analyzed manually and converted into pattern-matching rules. Information extraction was used to incorporate numbers as features into a document representation model. We evaluated text classification models trained on such representation. Our experiments were performed with two document representation models (vector space model and word embedding model) and two classification models (support vector machines and neural networks). The results showed that even a handful of numerical features can significantly improve text classification performance. We conclude that commonly used document representations do not represent numbers in a way that machine learning algorithms can effectively utilize them as features. Although we demonstrated that traditional information extraction can be effective in converting numbers into features, further community-wide research is required to systematically incorporate number representation into the word embedding process. Full article

(This article belongs to the Collection Extravaganza Feature Papers on Hot Topics in Machine Learning and Knowledge Extraction)

21 pages, 9088 KiB

Open AccessArticle

Research on Forest Fire Detection Algorithm Based on Improved YOLOv5

by Jianfeng Li and Xiaoqin Lian

Mach. Learn. Knowl. Extr. 2023, 5(3), 725-745; https://doi.org/10.3390/make5030039 - 28 Jun 2023

Viewed by 1368

Abstract

Forest fires are one of the world’s deadliest natural disasters. Early detection of forest fires can help minimize the damage to ecosystems and forest life. In this paper, we propose YOLOv5-IFFDM, an improved fire detection method based on YOLOv5. Firstly, the fire and smoke detection accuracy and the network perception accuracy of small targets are improved by adding an attention mechanism to the backbone network. Secondly, the loss function is improved and the SoftPool pyramid pooling structure is used to improve the regression accuracy, detection performance, and robustness of the model. In addition, a random mosaic augmentation technique is used to enhance the data and increase the generalization ability of the model, and re-clustering of the a priori frames for flame and smoke detection is used to improve the accuracy and speed. Finally, the parameters of the convolutional and normalization layers of the trained model are homogeneously merged to further reduce the model processing load and to improve the detection speed. Experimental results on self-built forest-fire and smoke datasets show that this algorithm has high detection accuracy and fast detection speed, with average accuracy of up to 90.5% for fire and up to 84.3% for smoke, and detection speed of up to 75 FPS (frames per second), which can meet the requirements of real-time and efficient fire detection. Full article

(This article belongs to the Special Issue Deep Learning in Image Analysis and Pattern Recognition)

12 pages, 767 KiB

Open AccessArticle

Using Machine Learning with Eye-Tracking Data to Predict if a Recruiter Will Approve a Resume

by Angel Pina, Corbin Petersheim, Josh Cherian, Joanna Nicole Lahey, Gerianne Alexander and Tracy Hammond

Mach. Learn. Knowl. Extr. 2023, 5(3), 713-724; https://doi.org/10.3390/make5030038 - 28 Jun 2023

Viewed by 1975

Abstract

When job seekers are unsuccessful in getting a position, they often do not get feedback to inform them on how to develop a better application in the future. Therefore, there is a critical need to understand what qualifications recruiters value in order to help applicants. To address this need, we utilized eye-trackers to measure and record visual data of recruiters screening resumes to gain insight into which Areas of Interest (AOIs) influenced recruiters’ decisions the most. Using just this eye-tracking data, we trained a machine learning classifier to predict whether or not a recruiter would move a resume on to the next level of the hiring process with an AUC of 0.767. We found that features associated with recruiters looking outside the content of a resume were most predictive of their decision as well as total time viewing the resume and time spent on the Experience and Education sections. We hypothesize that this behavior is indicative of the recruiter reflecting on the content of the resume. These initial results show that applicants should focus on designing clear and concise resumes that are easy for recruiters to absorb and think about, with additional attention given to the Experience and Education sections. Full article

(This article belongs to the Collection Extravaganza Feature Papers on Hot Topics in Machine Learning and Knowledge Extraction)

29 pages, 4857 KiB

Open AccessArticle

CovC-ReDRNet: A Deep Learning Model for COVID-19 Classification

by Hanruo Zhu, Ziquan Zhu, Shuihua Wang and Yudong Zhang

Mach. Learn. Knowl. Extr. 2023, 5(3), 684-712; https://doi.org/10.3390/make5030037 - 27 Jun 2023

Viewed by 1405

Abstract

Since the COVID-19 pandemic outbreak, over 760 million confirmed cases and over 6.8 million deaths have been reported globally, according to the World Health Organization. While the SARS-CoV-2 virus carried by COVID-19 patients can be identified through the reverse transcription–polymerase chain reaction (RT-PCR) test with high accuracy, clinical misdiagnosis between COVID-19 and pneumonia patients remains a challenge. Therefore, we developed a novel CovC-ReDRNet model to distinguish COVID-19 patients from pneumonia patients as well as normal cases. ResNet-18 was introduced as the backbone model and tailored for the feature representation afterward. In our feature-based randomized neural network (RNN) framework, the feature representation automatically pairs with the deep random vector function link network (dRVFL) as the optimal classifier, producing a CovC-ReDRNet model for the classification task. Results based on five-fold cross-validation reveal that our method achieved 94.94%, 97.01%, 97.56%, 96.81%, and 95.84% MA sensitivity, MA specificity, MA accuracy, MA precision, and MA F1-score, respectively. Ablation studies evidence the superiority of ResNet-18 over different backbone networks, RNNs over traditional classifiers, and deep RNNs over shallow RNNs. Moreover, our proposed model achieved a better MA accuracy than the state-of-the-art (SOTA) methods, the highest score of which was 95.57%. To conclude, our CovC-ReDRNet model could be perceived as an advanced computer-aided diagnostic model with high speed and high accuracy for classifying and predicting COVID-19 diseases. Full article

Mach. Learn. Knowl. Extr., Volume 5, Issue 3 (September 2023) – 26 articles
