Algorithms

Editorial

3 pages, 134 KiB

Open Access Editorial

Special Issue “Algorithms in Data Classification”

by Ioannis G. Tsoulos

Algorithms 2024, 17(1), 5; https://doi.org/10.3390/a17010005 - 22 Dec 2023

Viewed by 1273

Abstract

Data classification is a well-known procedure, with many applications to real-world problems [...]

(This article belongs to the Special Issue Algorithms in Data Classification)

Research

14 pages, 1085 KiB

Open Access Article

On the Influence of Data Imbalance on Supervised Gaussian Mixture Models

by Luca Scrucca

Algorithms 2023, 16(12), 563; https://doi.org/10.3390/a16120563 - 11 Dec 2023

Viewed by 1575

Abstract

Imbalanced data present a pervasive challenge in many real-world applications of statistical and machine learning, where the instances of one class significantly outnumber those of the other. This paper examines the impact of class imbalance on the performance of Gaussian mixture models in classification tasks and establishes the need for a strategy to reduce the adverse effects of imbalanced data on the accuracy and reliability of classification outcomes. We explore various strategies to address this problem, including cost-sensitive learning, threshold adjustments, and sampling-based techniques. Through extensive experiments on synthetic and real-world datasets, we evaluate the effectiveness of these methods. Our findings emphasize the need for effective mitigation strategies for class imbalance in supervised Gaussian mixtures, offering valuable insights for practitioners and researchers in improving classification outcomes.
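
Of the strategies named in this abstract, threshold adjustment is the simplest to illustrate. The sketch below is not taken from the paper: the scores, labels, candidate thresholds, and function names are hypothetical, and balanced accuracy is assumed as the selection criterion. It only shows how moving the decision threshold away from 0.5 can help when one class is rare.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = minority class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def best_threshold(scores, y_true, candidates):
    """Return the candidate threshold with the highest balanced accuracy."""
    def score_at(threshold):
        preds = [1 if s >= threshold else 0 for s in scores]
        return balanced_accuracy(y_true, preds)
    return max(candidates, key=score_at)


# Hypothetical minority-class posterior probabilities (illustrative data only).
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.40, 0.60]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5]

threshold = best_threshold(scores, labels, candidates)
default_score = balanced_accuracy(labels, [1 if s >= 0.5 else 0 for s in scores])
```

On this toy data the default threshold of 0.5 misses one of the two minority instances, while a lower threshold recovers it at the cost of one false positive. In practice the threshold would be chosen on held-out data rather than on the training scores.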

25 pages, 9322 KiB

Open Access Article

Blood Cell Revolution: Unveiling 11 Distinct Types with ‘Naturalize’ Augmentation

by Mohamad Abou Ali, Fadi Dornaika and Ignacio Arganda-Carreras

Algorithms 2023, 16(12), 562; https://doi.org/10.3390/a16120562 - 10 Dec 2023

Cited by 1 | Viewed by 1849

Abstract

Artificial intelligence (AI) has emerged as a cutting-edge tool, simultaneously accelerating, securing, and enhancing the diagnosis and treatment of patients. One example of this capability is the analysis of peripheral blood smears (PBS). In university medical centers, hematologists routinely examine hundreds of PBS slides daily to validate or correct outcomes produced by advanced hematology analyzers assessing samples from potentially problematic patients. This process may understandably lead to erroneous PBC readings, posing risks to patient health. AI functions as a transformative tool, significantly improving the accuracy and precision of readings and diagnoses. This study reshapes the parameters of blood cell classification, harnessing the capabilities of AI and broadening the scope from 5 to 11 specific blood cell categories with the challenging 11-class PBC dataset. This transformation facilitates a more profound exploration of blood cell diversity, surpassing prior constraints in medical image analysis. Our approach combines state-of-the-art deep learning techniques, including pre-trained ConvNets, ViTb16 models, and custom CNN architectures. We employ transfer learning, fine-tuning, and ensemble strategies, such as CBAM and Averaging ensembles, to achieve unprecedented accuracy and interpretability. Our fully fine-tuned EfficientNetV2 B0 model sets a new standard, with a macro-average precision, recall, and F1-score of 91%, 90%, and 90%, respectively, and an average accuracy of 93%. This breakthrough underscores the transformative potential of 11-class blood cell classification for more precise medical diagnoses. Moreover, our groundbreaking “Naturalize” augmentation technique produces remarkable results. The 2K-PBC dataset generated with “Naturalize” boasts a macro-average precision, recall, and F1-score of 97%, along with an average accuracy of 96% when leveraging the fully fine-tuned EfficientNetV2 B0 model. This innovation not only elevates classification performance but also addresses data scarcity and bias in medical deep learning. Our research marks a paradigm shift in blood cell classification, enabling more nuanced and insightful medical analyses. The “Naturalize” technique’s impact extends beyond blood cell classification, emphasizing the vital role of diverse and comprehensive datasets in advancing healthcare applications through deep learning.
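
The abstract reports macro-average precision, recall, and F1-score. As a reminder of what those figures mean, here is a minimal sketch of the usual macro-averaging definition (per-class metrics averaged with equal weight). The labels and the function name are hypothetical and unrelated to the PBC dataset; whether the authors computed their metrics in exactly this way is an assumption.

```python
def macro_metrics(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over the classes in y_true."""
    classes = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    # Every class counts equally, regardless of how many samples it has.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical three-class example (illustrative labels only).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
macro_p, macro_r, macro_f1 = macro_metrics(y_true, y_pred)
```

Because each class contributes equally, macro averages are sensitive to poor performance on rare classes, which is why they are commonly reported alongside plain accuracy for imbalanced multi-class problems.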

21 pages, 4889 KiB

Open Access Article

A Case-Study Comparison of Machine Learning Approaches for Predicting Student’s Dropout from Multiple Online Educational Entities

by José Manuel Porras, Juan Alfonso Lara, Cristóbal Romero and Sebastián Ventura

Algorithms 2023, 16(12), 554; https://doi.org/10.3390/a16120554 - 3 Dec 2023

Viewed by 1718

Abstract

Predicting student dropout is a crucial task in online education. Traditionally, each educational entity (institution, university, faculty, department, etc.) creates and uses its own prediction model starting from its own data. However, that approach is not always feasible or advisable and may depend on the availability of data, local infrastructure, and resources. In those cases, there are various machine learning approaches for sharing data and/or models between educational entities, using a classical centralized machine learning approach or other more advanced approaches such as transfer learning or federated learning. In this paper, we used data from three different LMS Moodle servers representing homogeneous educational entities of different sizes. We tested the performance of the different machine learning approaches for the problem of predicting student dropout with multiple educational entities involved. We used a deep learning algorithm as a predictive classifier method. Our preliminary findings provide useful information on the benefits and drawbacks of each approach, as well as suggestions for enhancing performance when there are multiple institutions. In our case, repurposed transfer learning, stacked transfer learning, and centralized approaches produced similar or better results than the locally trained models for most of the entities.
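
Federated learning, one of the model-sharing approaches mentioned in this abstract, is often introduced through federated averaging: each entity trains locally and only parameters are combined, weighted by local sample counts. The sketch below shows that aggregation step only. It is an assumption that a scheme of this kind applies here; the parameter vectors, sample counts, and function name are hypothetical and not taken from the paper.

```python
def federated_average(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Three hypothetical educational entities of different sizes.
client_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
client_sizes = [100, 300, 600]
global_params = federated_average(client_params, client_sizes)
```

The raw student records never leave each entity in this scheme, which is the main reason it is considered when centralizing data is not feasible.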

18 pages, 4392 KiB

Open Access Article

Assessing Algorithms Used for Constructing Confidence Ellipses in Multidimensional Scaling Solutions

by Panos Nikitas and Efthymia Nikita

Algorithms 2023, 16(12), 535; https://doi.org/10.3390/a16120535 - 24 Nov 2023

Viewed by 1221

Abstract

This paper assesses algorithms proposed for constructing confidence ellipses in multidimensional scaling (MDS) solutions and proposes a new approach to interpreting these confidence ellipses via hierarchical cluster analysis (HCA). It is shown that the most effective algorithm for constructing confidence ellipses involves the generation of simulated distances based on the original multivariate dataset and then the creation of MDS maps that are scaled, reflected, rotated, translated, and finally superimposed. For this algorithm, the stability measure of the average areas tends to zero with increasing sample size n following the power model A·n^(−B), with positive B values ranging from 0.7 to 2 and high R-squared fitting values around 0.99. This algorithm was applied to create confidence ellipses in the MDS plots of squared Euclidean and Mahalanobis distances for continuous and binary data. It was found that plotting confidence ellipses in MDS plots offers a better visualization of the distance map of the populations under study compared to plotting single points. However, the confidence ellipses cannot eliminate the subjective selection of clusters in the MDS plot based simply on the proximity of the MDS points. To overcome this subjective selection, we should quantify the formation of clusters of proximal samples. Thus, in addition to the algorithm assessment, we propose a new approach that estimates all possible cluster probabilities associated with the confidence ellipses by applying HCA using distance matrices derived from these ellipses.
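
The power model in this abstract, a stability measure decaying as A times n to the power −B, is linear on a log-log scale, so A and B can be estimated by ordinary least squares on log-transformed values. The sketch below illustrates that standard approach on synthetic, noise-free data. The function name and data are hypothetical, and the paper's actual fitting procedure is not stated in the abstract.

```python
import math


def fit_power_model(ns, ys):
    """Fit y = A * n**(-B) by least squares on (log n, log y); return (A, B, R^2)."""
    xs = [math.log(n) for n in ns]
    ls = [math.log(y) for y in ys]
    mean_x = sum(xs) / len(xs)
    mean_l = sum(ls) / len(ls)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_x
    # R^2 is computed in log space, where the model is linear.
    ss_res = sum((l - (intercept + slope * x)) ** 2 for x, l in zip(xs, ls))
    ss_tot = sum((l - mean_l) ** 2 for l in ls)
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(intercept), -slope, r_squared


# Synthetic data generated from A = 3.0, B = 1.2 (illustrative values only).
ns = [20, 50, 100, 200, 500]
ys = [3.0 * n ** -1.2 for n in ns]
A, B, r2 = fit_power_model(ns, ys)
```

Note that an R-squared computed in log space is not the same quantity as one computed on the original scale, so the two should not be compared directly.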

17 pages, 8845 KiB

Open Access Article

Utilizing Mixture Regression Models for Clustering Time-Series Energy Consumption of a Plastic Injection Molding Process

by Massimo Pacella, Matteo Mangini and Gabriele Papadia

Algorithms 2023, 16(11), 524; https://doi.org/10.3390/a16110524 - 15 Nov 2023

Cited by 1 | Viewed by 1386

Abstract

Considering the issue of energy consumption reduction in industrial plants, we investigated a clustering method for mining the time-series data related to energy consumption. The industrial case study considered in our work is one of the most energy-intensive processes in the plastics industry: the plastic injection molding process. Concerning the industrial setting, the energy consumption of the injection molding machine was monitored across multiple injection molding cycles. The collected data were then analyzed to establish patterns and trends in the energy consumption of the injection molding process. To this end, we considered mixtures of regression models given their flexibility in modeling heterogeneous time series and clustering time series in an unsupervised machine learning framework. Given the assumption of autocorrelated data and exogenous variables in the mixture model, we implemented an algorithm for model fitting that combined autocorrelated observations with spline and polynomial regressions. Our results demonstrate an accurate grouping of energy-consumption profiles, where each cluster is related to a specific production schedule. The clustering method also provides a unique profile of energy consumption for each cluster, depending on the production schedule and regression approach (i.e., spline and polynomial). According to these profiles, information related to the shape of energy consumption was identified, providing insights into reducing the electrical demand of the plant.

15 pages, 936 KiB

Open Access Article

An Intelligent Injury Rehabilitation Guidance System for Recreational Runners Using Data Mining Algorithms

by Theodoros Tzelepis, George Matlis, Nikos Dimokas, Petros Karvelis, Paraskevi Malliou and Anastasia Beneka

Algorithms 2023, 16(11), 523; https://doi.org/10.3390/a16110523 - 15 Nov 2023

Viewed by 1127

Abstract

In recent years, the number of people who exercise every day has increased dramatically. More precisely, due to the COVID period, many people have become recreational runners. Recreational running is a regular way to keep active and healthy at any age. Additionally, running is a popular physical exercise that offers numerous health advantages. However, recreational runners report a high incidence of musculoskeletal injuries due to running. The healthcare industry has been compelled to use information technology due to the quick rate of growth and developments in electronic systems, the internet, and telecommunications. Our proposed intelligent system uses data mining algorithms for the rehabilitation guidance of recreational runners with musculoskeletal discomfort. The system classifies recreational runners based on a questionnaire that has been built according to the severity, irritability, nature, stage, and stability model and advises them on the appropriate treatment plan/exercises to follow. Through rigorous testing across various case studies, our method has yielded highly promising results, underscoring its potential to significantly contribute to the well-being and rehabilitation of recreational runners facing musculoskeletal challenges.

16 pages, 3965 KiB

Open Access Article

Grammatical Evolution-Driven Algorithm for Efficient and Automatic Hyperparameter Optimisation of Neural Networks

by Gauri Vaidya, Meghana Kshirsagar and Conor Ryan

Algorithms 2023, 16(7), 319; https://doi.org/10.3390/a16070319 - 29 Jun 2023

Viewed by 1980

Abstract

Neural networks have revolutionised the way we approach problem solving across multiple domains; however, their effective design and efficient use of computational resources is still a challenging task. One of the most important factors influencing this process is model hyperparameters, which vary significantly with models and datasets. Recently, there has been an increased focus on automatically tuning these hyperparameters to reduce complexity and to optimise resource utilisation. From traditional human-intuitive tuning methods to random search, grid search, Bayesian optimisation, and evolutionary algorithms, significant advancements have been made in this direction that promise improved performance while using fewer resources. In this article, we propose HyperGE, a two-stage model for automatically tuning hyperparameters driven by grammatical evolution (GE), a bioinspired population-based machine learning algorithm. GE provides an advantage in that it allows users to define their own grammar for generating solutions, making it ideal for defining search spaces across datasets and models. We test HyperGE to fine-tune VGG-19 and ResNet-50 pre-trained networks using three benchmark datasets. We demonstrate that the search space is reduced by approximately 90% in Stage 2, with fewer trials. HyperGE could become an invaluable tool within the deep learning community, allowing practitioners greater freedom when exploring complex problem domains for hyperparameter fine-tuning.
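
The central mechanism of grammatical evolution, mapping an integer genome to a solution through a user-defined grammar, can be sketched briefly. The grammar below is hypothetical and much smaller than anything a real tuning run would use; it is not HyperGE's grammar. The mapping is a simplified version of the usual scheme: the leftmost non-terminal is expanded with the production selected by the next codon modulo the number of choices, and the genome is reused (wrapped) a limited number of times.

```python
# Hypothetical grammar over three hyperparameters (illustrative only).
GRAMMAR = {
    "<config>": [["lr=", "<lr>", " batch=", "<batch>", " opt=", "<opt>"]],
    "<lr>": [["0.1"], ["0.01"], ["0.001"]],
    "<batch>": [["16"], ["32"], ["64"], ["128"]],
    "<opt>": [["sgd"], ["adam"]],
}


def map_genome(genome, grammar, start="<config>", max_wraps=2):
    """Map a list of integer codons to a string by expanding leftmost non-terminals."""
    symbols = [start]
    output = []
    used = 0
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in grammar:
            output.append(symbol)  # terminal: emit as-is
            continue
        choices = grammar[symbol]
        if len(choices) == 1:
            chosen = choices[0]  # no codon is consumed when there is no choice
        else:
            if used >= limit:
                raise ValueError("genome exhausted before mapping finished")
            codon = genome[used % len(genome)]  # wrap around the genome
            used += 1
            chosen = choices[codon % len(choices)]
        symbols = list(chosen) + symbols
    return "".join(output)


phenotype = map_genome([7, 10, 3], GRAMMAR)
# phenotype == "lr=0.01 batch=64 opt=adam"
```

Changing the search space then only means editing the grammar; the evolutionary operators act on the integer genomes and do not need to know what the hyperparameters are.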

16 pages, 3361 KiB

Open Access Article

Distributed Fuzzy Cognitive Maps for Feature Selection in Big Data Classification

by K. Haritha, M. V. Judy, Konstantinos Papageorgiou, Vassilis C. Georgiannis and Elpiniki Papageorgiou

Algorithms 2022, 15(10), 383; https://doi.org/10.3390/a15100383 - 19 Oct 2022

Cited by 4 | Viewed by 1745

Abstract

The features of a dataset play an important role in the construction of a machine learning model. Because big datasets often have a large number of features, they may contain features that are less relevant to the machine learning task, which makes the process more time-consuming and complex. In order to facilitate learning, it is always recommended to remove the less significant features. The process of eliminating the irrelevant features and finding an optimal feature set involves comprehensively searching the dataset and considering every subset in the data. In this research, we present a distributed wrapper method for feature selection, based on fuzzy cognitive map learning, that is able to extract those features from a dataset that play the most significant role in decision making. Fuzzy cognitive maps (FCMs) represent a hybrid computing technique combining elements of both fuzzy logic and cognitive maps. Using Spark’s resilient distributed datasets (RDDs), the proposed model can work effectively in a distributed manner for quick, in-memory processing along with effective iterative computations. According to the experimental results, when the proposed model is applied to a classification task, the features selected by the model help to expedite the classification process. The selection of relevant features using the proposed algorithm is on par with existing feature selection algorithms. In conjunction with a random forest classifier, the proposed model produced an average accuracy above 90%, as opposed to 85.6% accuracy when no feature selection strategy was adopted.
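
For readers unfamiliar with fuzzy cognitive maps, the sketch below shows one commonly used inference rule: each concept's activation is updated from its own value plus the weighted activations of the concepts that influence it, then squashed by a sigmoid. This is a generic textbook formulation assumed for illustration. The weight matrix, initial state, and function names are hypothetical, and the paper's learning procedure and Spark-based distribution are not shown.

```python
import math


def sigmoid(x, lam=1.0):
    """Logistic squashing function keeping activations in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def map_step(state, weights):
    """One synchronous update: A_i <- f(A_i + sum over j != i of w[j][i] * A_j)."""
    n = len(state)
    return [
        sigmoid(state[i] + sum(weights[j][i] * state[j] for j in range(n) if j != i))
        for i in range(n)
    ]


def map_run(state, weights, max_iters=100, tol=1e-6):
    """Iterate until activations change by less than tol or max_iters is reached."""
    for _ in range(max_iters):
        new_state = map_step(state, weights)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state
        state = new_state
    return state


# weights[j][i] is the hypothetical influence of concept j on concept i.
weights = [
    [0.0, 0.6, 0.0],
    [0.0, 0.0, -0.4],
    [0.3, 0.0, 0.0],
]
final_state = map_run([0.5, 0.5, 0.5], weights)
```

In a feature-selection setting, concepts would typically correspond to features and a decision concept, with the learned weights indicating how strongly each feature influences the decision.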

28 pages, 6356 KiB

Open Access Article

A Novel Adaptive FCM with Cooperative Multi-Population Differential Evolution Optimization

by Amit Banerjee and Issam Abu-Mahfouz

Algorithms 2022, 15(10), 380; https://doi.org/10.3390/a15100380 - 17 Oct 2022

Cited by 1 | Viewed by 1406

Abstract

Fuzzy c-means (FCM), the fuzzy variant of the popular k-means, has been used for data clustering when cluster boundaries are not well defined. The choice of initial cluster prototypes (or the initialization of cluster memberships) and the fact that the number of clusters needs to be defined a priori are two major factors that can affect the performance of FCM. In this paper, we review algorithms and methods used to overcome these two specific drawbacks. We propose a new cooperative multi-population differential evolution method with elitism to identify near-optimal initial cluster prototypes and also determine the optimal number of clusters in the data. The differential evolution populations use a smaller subset of the dataset, one that captures the same structure of the dataset. We compare the proposed methodology to newer methods proposed in the literature, with simulations performed on standard benchmark data from the UCI machine learning repository. Finally, we present a case study for clustering time-series patterns from sensor data related to real-time machine health monitoring using the proposed method. Simulation results are promising and show that the proposed methodology can be effective in clustering a wide range of datasets.
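
To make the dependence on initial prototypes concrete, the following sketch implements the two standard fuzzy c-means update steps (memberships, then prototypes) for one-dimensional data. It shows plain FCM only, not the differential evolution method proposed in the paper. The data, starting prototypes, and function names are hypothetical.

```python
def cmeans_memberships(points, centers, m=2.0):
    """Return u[k][i], the membership of point k in cluster i (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a prototype: split full membership among those.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
        memberships.append(row)
    return memberships


def cmeans_centers(points, memberships, m=2.0):
    """Recompute each prototype as the mean of the points weighted by u**m."""
    n_clusters = len(memberships[0])
    centers = []
    for i in range(n_clusters):
        weights = [row[i] ** m for row in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers


points = [1.0, 2.0, 9.0, 10.0]
centers = [0.0, 5.0]  # deliberately poor starting prototypes
for _ in range(20):
    u = cmeans_memberships(points, centers)
    centers = cmeans_centers(points, u)
```

Both the starting prototypes and the number of clusters (two here) are supplied by hand, which are exactly the two choices the abstract identifies as affecting FCM performance.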

19 pages, 1297 KiB

Open Access Article

Detection and Classification of Unannounced Physical Activities and Acute Psychological Stress Events for Interventions in Diabetes Treatment

by Mohammad Reza Askari, Mahmoud Abdel-Latif, Mudassir Rashid, Mert Sevil and Ali Cinar

Algorithms 2022, 15(10), 352; https://doi.org/10.3390/a15100352 - 27 Sep 2022

Cited by 10 | Viewed by 1984

Abstract

Detection and classification of acute psychological stress (APS) and physical activity (PA) in the daily lives of people with chronic diseases can provide precision medicine for the treatment of chronic conditions such as diabetes. This study investigates the classification of different types of APS and PA, along with their concurrent occurrences, using the same subset of feature maps via physiological variables measured by a wristband device. Random convolutional kernel transformation is used to extract a large number of feature maps from the biosignals measured by a wristband device (blood volume pulse, galvanic skin response, skin temperature, and 3D accelerometer signals). Three different feature selection techniques (principal component analysis, partial least squares-discriminant analysis (PLS-DA), and sequential forward selection) as well as four approaches for addressing imbalanced sizes of classes (upsampling, downsampling, adaptive synthetic sampling (ADASYN), and weighted training) are evaluated for maximizing detection and classification accuracy. A long short-term memory recurrent neural network model is trained to estimate PA (sedentary state, treadmill run, stationary bike) and APS (non-stress, emotional anxiety stress, mental stress) from wristband signals. The balanced accuracy scores for various combinations of data balancing and feature selection techniques range between 96.82% and 99.99%. The combination of PLS-DA for feature selection and ADASYN for data balancing provides the best overall performance. The detection and classification of APS and PA types along with their concurrent occurrences can provide precision medicine approaches for the treatment of diabetes.

19 pages, 443 KiB

Open Access Article

QFC: A Parallel Software Tool for Feature Construction, Based on Grammatical Evolution

by Ioannis G. Tsoulos

Algorithms 2022, 15(8), 295; https://doi.org/10.3390/a15080295 - 21 Aug 2022

Cited by 4 | Viewed by 2131

Abstract

This paper presents and analyzes a programming tool that implements a method for classification and function regression problems. This method builds new features from existing ones with the assistance of a hybrid algorithm that makes use of artificial neural networks and grammatical evolution. The implemented software exploits modern multi-core computing units for faster execution. The method has been applied to a variety of classification and function regression problems, and an extensive comparison with other methods of computational intelligence is made.

20 pages, 15507 KiB

Open Access Article

Exploring the Efficiencies of Spectral Isolation for Intelligent Wear Monitoring of Micro Drill Bit Automatic Regrinding In-Line Systems

by Ugochukwu Ejike Akpudo and Jang-Wook Hur

Algorithms 2022, 15(6), 194; https://doi.org/10.3390/a15060194 - 6 Jun 2022

Viewed by 2048

Abstract

Despite the increasing digitalization of equipment diagnostic/condition monitoring systems, it remains a challenge to accurately harness discriminant information from multiple sensors with unique spectral (and transient) behaviors. High-precision systems such as the automatic regrinding in-line equipment provide intelligent regrinding of micro drill bits; however, immediate monitoring of the grinder during the grinding process has become necessary because ignoring it directly affects the drill bit’s life and the equipment’s overall utility. Vibration signals from the frame and the high-speed grinding wheels reflect the different health stages of the grinding wheel and can be exploited for intelligent condition monitoring. The spectral isolation technique as a preprocessing tool ensures that only the critical spectral segments of the inputs are retained for improved diagnostic accuracy at reduced computational costs. This study explores artificial intelligence-based models for learning the discriminant spectral information stored in the vibration signals and considers the accuracy and cost implications of spectral isolation of the critical spectral segments of the signals for accurate equipment monitoring. Results from one-dimensional convolutional neural networks (1D-CNN) and multi-layer perceptron (MLP) neural networks reveal that spectral isolation offers a higher condition monitoring accuracy at reduced computational costs. Experimental results using different 1D-CNN and MLP architectures reveal 4.6% and 7.5% improved diagnostic accuracy by the 1D-CNNs and MLPs, respectively, at about 1.3% and 5.71% reduced computational costs, respectively.
