Statistical Data Modeling and Machine Learning with Applications, 3rd Edition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 15 December 2025 | Viewed by 5333

Special Issue Editor


Prof. Dr. Snezhana Gocheva-Ilieva
Guest Editor
Department of Mathematical Analysis, Faculty of Mathematics and Informatics, University of Plovdiv Paisii Hilendarski, 24 Tzar Assen St., 4000 Plovdiv, Bulgaria
Interests: computational statistics; applied mathematics; data mining; computer modeling in physics and engineering

Special Issue Information

Dear Colleagues,

Statistics and machine learning are two intertwined fields of mathematics and computer science. In recent years, very powerful classification and predictive methods have been developed in this area. These advances create substantial opportunities both for developing further methods and approaches and for applying them effectively to practical problems.

This Special Issue aims to publish review papers, research articles, and communications that present new original methods, applications, data analyses, case studies, comparative studies, and other results. Topics of particular interest include, but are not limited to, the theory and application of statistical data modeling and machine learning in diverse areas such as computer science, economics, industry, medicine, environmental sciences, forex and finance, education, engineering, marketing, and agriculture.

Prof. Dr. Snezhana Gocheva-Ilieva
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Mathematics is an international, peer-reviewed, open access, semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computational statistics
  • dimensionality reduction and variable selection
  • nonparametric statistical modeling
  • supervised learning (classification, regression)
  • clustering methods
  • financial statistics and econometrics
  • statistical algorithms
  • time series analysis and forecasting
  • machine learning algorithms
  • decision trees
  • ensemble methods
  • neural networks
  • deep learning
  • hybrid models
  • data analysis

Published Papers (6 papers)


Research

24 pages, 5099 KiB  
Article
Predicting Compressive Strength of High-Performance Concrete Using Hybridization of Nature-Inspired Metaheuristic and Gradient Boosting Machine
by Nhat-Duc Hoang, Van-Duc Tran and Xuan-Linh Tran
Mathematics 2024, 12(8), 1267; https://doi.org/10.3390/math12081267 - 22 Apr 2024
Viewed by 355
Abstract
This study proposes a novel integration of the Extreme Gradient Boosting Machine (XGBoost) and Differential Flower Pollination (DFP) for constructing an intelligent method to predict the compressive strength (CS) of high-performance concrete (HPC) mixes. The former is employed to generalize a mapping function between the mechanical property of concrete and its influencing factors. DFP, as a metaheuristic algorithm, is employed to optimize the learning phase of XGBoost and reach a fine balance between the two goals of model building: reducing the prediction error and maximizing the generalization capability. To construct the proposed method, a historical dataset consisting of 400 samples was collected from previous studies. The model’s performance is reliably assessed via multiple experiments and Wilcoxon signed-rank tests. The hybrid DFP-XGBoost is able to achieve good predictive outcomes with a root mean square error of 5.27, a mean absolute percentage error of 6.74%, and a coefficient of determination of 0.94. Additionally, quantile regression based on XGBoost is performed to construct interval predictions of the CS of HPC. Notably, an asymmetric error loss is used to diminish overestimations committed by the model. It was found that this loss function successfully reduced the percentage of overestimated CS values from 47.1% to 27.5%. Hence, DFP-XGBoost can be a promising approach for accurately and reliably estimating the CS of untested HPC mixes. Full article
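
The asymmetric-loss idea in this abstract can be illustrated with the pinball (quantile) loss, a standard loss for quantile regression. This is only a minimal sketch: the abstract does not state the exact loss the authors used, so the pinball loss, the quantile level tau = 0.25, and the toy strength values below are all assumptions chosen for illustration. With tau below 0.5, an overestimate costs more than an underestimate of the same size, which pushes the fitted value downward.

```python
def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss for one prediction at quantile level tau."""
    diff = y_true - y_pred
    # Underestimates (diff >= 0) are weighted by tau,
    # overestimates (diff < 0) are weighted by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1) * diff


def mean_pinball_loss(targets, prediction, tau):
    """Average pinball loss of a single constant prediction over all targets."""
    return sum(pinball_loss(y, prediction, tau) for y in targets) / len(targets)


# Hypothetical toy "strength" values, not data from the paper.
strengths = [float(v) for v in range(1, 11)]
tau = 0.25  # assumed quantile level; below 0.5 penalizes overestimation more

over = pinball_loss(10.0, 11.0, tau)   # prediction too high by 1
under = pinball_loss(10.0, 9.0, tau)   # prediction too low by 1

# The constant that minimizes the mean pinball loss is the tau-quantile of the sample.
best_constant = min(strengths, key=lambda c: mean_pinball_loss(strengths, c, tau))

print(over, under, best_constant)
```

In this toy case an overestimate by one unit costs three times as much as an underestimate by one unit, and the loss-minimizing constant sits in the lower part of the sample rather than at its median.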

15 pages, 5583 KiB  
Article
Hybrid Model of Natural Time Series with Neural Network Component and Adaptive Nonlinear Scheme: Application for Anomaly Detection
by Oksana Mandrikova and Bogdana Mandrikova
Mathematics 2024, 12(7), 1079; https://doi.org/10.3390/math12071079 - 03 Apr 2024
Viewed by 437
Abstract
It is often difficult to describe natural time series due to implicit dependences and correlated noise. During anomalous natural processes, anomalous features appear in data. They have a nonstationary structure and do not allow us to apply traditional methods for time series modeling. In order to solve these problems, new models, adequately describing natural data, are required. A new hybrid model of a time series (HMTS) with a nonstationary structure is proposed in this paper. The HMTS has regular and anomalous components. The HMTS regular component is determined on the basis of an autoencoder neural network. To describe the HMTS anomalous component, an adaptive nonlinear approximating scheme (ANAS) is used on a wavelet basis. HMTS is considered in this investigation for the problem of neutron monitor data modeling and anomaly detection. Anomalies in neutron monitor data indicate negative factors in space weather. The timely detection of these factors is critically important. This investigation showed that the developed HMTS adequately describes neutron monitor data and has satisfactory results from the point of view of numeric performance. The MSE model values are close to 0 and errors are white Gaussian noise. In order to optimize the estimate of the HMTS anomalous component, the likelihood ratio test was applied. Moreover, the wavelet basis, giving the least losses during ANAS construction, was determined. Statistical modeling results showed that HMTS provides a high accuracy of anomaly detection. When the signal/noise ratio is 1.3 and anomaly durations are more than 60 counts, the probability of their detection is close to 90%. This is a high rate in the problem domain under consideration and provides solution reliability of the problem of anomaly detection in neutron monitor data. Moreover, the processing of data from several neutron monitor stations showed the high sensitivity of the HMTS. 
This shows the possibility to minimize the number of engaged stations, maintaining anomaly detection accuracy compared to the global survey method widely used in this field. This result is important as the continuous operation of neutron monitor stations is not always provided. Thus, the results show that the developed HMTS has the potential to address the problem of anomaly detection in neutron monitor data even when the number of operating stations is small. The proposed HMTS can help us to decrease the risks of the negative impact of space weather anomalies on human health and modern infrastructure. Full article

17 pages, 1883 KiB  
Article
Analysis of a Predictive Mathematical Model of Weather Changes Based on Neural Networks
by Boris V. Malozyomov, Nikita V. Martyushev, Svetlana N. Sorokova, Egor A. Efremenkov, Denis V. Valuev and Mengxu Qi
Mathematics 2024, 12(3), 480; https://doi.org/10.3390/math12030480 - 02 Feb 2024
Viewed by 1268
Abstract
In this paper, we investigate mathematical models of meteorological forecasting based on the work of neural networks, which allow us to calculate presumptive meteorological parameters of the desired location on the basis of previous meteorological data. A new method of grouping neural networks to obtain a more accurate output result is proposed. An algorithm is presented, based on which the most accurate meteorological forecast was obtained based on the results of the study. This algorithm can be used in a wide range of situations, such as obtaining data for the operation of equipment in a given location and studying meteorological parameters of the location. To build this model, we used data obtained from personal weather stations of the Weather Underground company and the US National Digital Forecast Database (NDFD). Also, a Google remote learning machine was used to compare the results with existing products on the market. The algorithm for building the forecast model covered several locations across the US in order to compare its performance in different weather zones. Different methods of training the machine to produce the most effective weather forecast result were also considered. Full article

14 pages, 613 KiB  
Article
Beyond Traditional Assessment: A Fuzzy Logic-Infused Hybrid Approach to Equitable Proficiency Evaluation via Online Practice Tests
by Todorka Glushkova, Vanya Ivanova and Boyan Zlatanov
Mathematics 2024, 12(3), 371; https://doi.org/10.3390/math12030371 - 24 Jan 2024
Viewed by 583
Abstract
This article presents a hybrid approach to assessing students’ foreign language proficiency in a cyber–physical educational environment. It focuses on the advantages of the integrated assessment of student knowledge by considering the impact of automatic assessment, learners’ independent work, and their achievements to date. An assessment approach is described using the mathematical theory of fuzzy functions, which are employed to ensure the fair evaluation of students. The largest possible number of students whose reevaluation of test results will not affect the overall performance of the student group is automatically determined. The study also models the assessment process in the cyber–physical educational environment through the formal semantics of calculus of context-aware ambients (CCAs). Full article

13 pages, 2193 KiB  
Article
Enhanced Checkerboard Detection Using Gaussian Processes
by Michaël Hillen, Ivan De Boi, Thomas De Kerf, Seppe Sels, Edgar Cardenas De La Hoz, Jona Gladines, Gunther Steenackers, Rudi Penne and Steve Vanlanduit
Mathematics 2023, 11(22), 4568; https://doi.org/10.3390/math11224568 - 07 Nov 2023
Viewed by 981
Abstract
Accurate checkerboard detection is of vital importance for computer vision applications, and a variety of checkerboard detectors have been developed in the past decades. While some detectors are able to handle partially occluded checkerboards, they fail when a large occlusion completely divides the checkerboard. We propose a new checkerboard detection pipeline for occluded checkerboards that has a robust performance under varying levels of noise, blurring, and distortion, and for a variety of imaging modalities. This pipeline consists of a checkerboard detector and checkerboard enhancement with Gaussian processes (GP). By learning a mapping from local board coordinates to image pixel coordinates via a Gaussian process, we can fill in occluded corners, expand the board beyond the image borders, allocate detected corners that do not fit an initial grid, and remove noise on the detected corner locations. We show that our method can improve the performance of other publicly available state-of-the-art checkerboard detectors, both in terms of accuracy and the number of corners detected. Our code and datasets are made publicly available. The checkerboard detector pipeline is contained within our Python checkerboard detection library, called PyCBD. The pipeline itself is modular and easy to adapt to different use cases. Full article

15 pages, 2194 KiB  
Article
Enhancing Medical Image Segmentation: Ground Truth Optimization through Evaluating Uncertainty in Expert Annotations
by Georgios Athanasiou, Josep Lluis Arcos and Jesus Cerquides
Mathematics 2023, 11(17), 3771; https://doi.org/10.3390/math11173771 - 02 Sep 2023
Viewed by 1194
Abstract
The surge of supervised learning methods for segmentation lately has underscored the critical role of label quality in predicting performance. This issue is prevalent in the domain of medical imaging, where high annotation costs and inter-observer variability pose significant challenges. Acquiring labels commonly involves multiple experts providing their interpretations of the “true” segmentation labels, each influenced by their individual biases. The blind acceptance of these noisy labels as the ground truth restricts the potential effectiveness of segmentation algorithms. Here, we apply coupled convolutional neural network approaches to a small-sized real-world dataset of bovine cumulus oocyte complexes. This is the first time these methods have been applied to a real-world annotation medical dataset, since they were previously tested only on artificially generated labels of medical and non-medical datasets. This dataset is crucial for healthy embryo development. Its application revealed an important challenge: the inability to effectively learn distinct confusion matrices for each expert due to large areas of agreement. In response, we propose a novel method that focuses on areas of high uncertainty. This approach allows us to understand the individual characteristics better, extract their behavior, and use this insight to create a more sophisticated ground truth using maximum likelihood. These findings contribute to the ongoing discussion of leveraging machine learning algorithms for medical image segmentation, particularly in scenarios involving multiple human annotators. Full article
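
The general idea of combining per-expert confusion matrices with maximum likelihood can be sketched for a single pixel as follows. This is not the authors' method, which couples convolutional neural networks and concentrates on high-uncertainty areas; it is a simplified illustration that assumes the confusion matrices are already known, that experts err independently, and that all classes are equally likely a priori. The matrices and annotations are hypothetical.

```python
import math


def ml_label(annotations, confusions):
    """Return the true label that maximizes the likelihood of the annotations.

    confusions[a][t][o] is the assumed probability that expert a reports
    label o when the true label is t (all entries must be positive).
    """
    n_classes = len(confusions[0])

    def log_likelihood(true_label):
        # Experts are assumed independent, so log-probabilities add up.
        return sum(
            math.log(confusions[a][true_label][observed])
            for a, observed in enumerate(annotations)
        )

    return max(range(n_classes), key=log_likelihood)


def majority_label(annotations):
    """Plain majority vote, shown for comparison."""
    return max(set(annotations), key=annotations.count)


# Hypothetical confusion matrices: one reliable expert, two near-random ones.
confusions = [
    [[0.95, 0.05], [0.05, 0.95]],
    [[0.55, 0.45], [0.45, 0.55]],
    [[0.55, 0.45], [0.45, 0.55]],
]
annotations = [1, 0, 0]  # the reliable expert disagrees with the other two

fused = ml_label(annotations, confusions)
voted = majority_label(annotations)
print(fused, voted)
```

Here the maximum-likelihood label follows the reliable expert even though a majority vote would side with the two unreliable ones, which is the kind of behavior that treating noisy labels as equally trustworthy cannot capture.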