Machine/Statistical Learning and Modeling with Potential Applications in Entropy, Information Theory, and Artificial Intelligence

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Statistical Physics".

Deadline for manuscript submissions: closed (30 April 2021) | Viewed by 29224

Special Issue Editor


Prof. Dr. Victor Leiva
Guest Editor
School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
Interests: advanced applied multivariate analysis; artificial intelligence, deep learning, and machine learning; big data, business intelligence, data mining, and data science; statistical learning and modeling

Special Issue Information

Dear Colleagues,

Today, regression is a supervised technique widely used in data science, data mining, machine learning, and statistical learning. Although the focus of this Special Issue is machine/statistical learning and modeling, we welcome contributions in artificial intelligence, classification, and unsupervised learning, as well as in the topics detailed below. We strongly encourage interdisciplinary works with real data.

This Special Issue welcomes submissions in, but not limited to, the following areas:

(i) Machine learning and clustering;
(ii) Artificial intelligence;
(iii) Big data, high dimensionality, and large-scale data analysis in supervised learning;
(iv) Multivariate analysis with emphasis on dimensionality reduction, such as PCA, PLS, and others;
(v) Genetic algorithms, particle swarm optimization, and others, for supervised learning;
(vi) Applications of supervised learning and data science in entropy and information theory;
(vii) Bayesian methods;
(viii) Global and local influence diagnostics in supervised learning.

Prof. Dr. Victor Leiva
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Artificial intelligence
  • Bayesian methods
  • Big data, data mining, and data science
  • Clustering
  • Entropy and information theory
  • Global and local diagnostics
  • Machine learning
  • PLS regression and PCA regression
  • Statistical learning and modeling

Published Papers (7 papers)


Research

17 pages, 4586 KiB  
Article
Silhouette Analysis for Performance Evaluation in Machine Learning with Applications to Clustering
by Meshal Shutaywi and Nezamoddin N. Kachouie
Entropy 2021, 23(6), 759; https://doi.org/10.3390/e23060759 - 16 Jun 2021
Cited by 76 | Viewed by 8273
Abstract
Grouping objects based on their similarities is an important and common task in machine learning applications. Many clustering methods have been developed; among them, k-means-based methods have been broadly used, and several extensions, such as k-means++ and kernel k-means, have been proposed to improve the original k-means clustering method. K-means is a linear clustering method; that is, it divides the objects into linearly separable groups, while kernel k-means is a non-linear technique. Kernel k-means projects the elements to a higher-dimensional feature space using a kernel function and then groups them. Different kernel functions may not perform similarly in clustering a given data set and, in turn, choosing the right kernel for an application can be challenging. In our previous work, we introduced a weighted majority voting method for clustering based on normalized mutual information (NMI). NMI is a supervised measure, since the true labels of a training set are required to calculate it. In this study, we extend our previous work on aggregating clustering results to develop an unsupervised weighting function for settings where a training set is not available. The proposed weighting function is based on the Silhouette index, an unsupervised criterion, so no training set is required to compute it. This makes the new method more consistent with the unsupervised nature of clustering.
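As a rough illustration of the weighting idea described above, the sketch below scores several candidate clusterings of synthetic data with the Silhouette index and normalizes the scores into weights. It is not the authors' code: scikit-learn's SpectralClustering stands in for kernel k-means, and the data set and kernel choices are assumptions.

```python
# A minimal sketch (not the paper's implementation) of Silhouette-based weighting
# of clustering candidates; kernel choices and synthetic data are illustrative.
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Candidate clusterings: plain k-means and two "kernelized" alternatives.
clusterings = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "rbf": SpectralClustering(n_clusters=3, affinity="rbf",
                              random_state=0).fit_predict(X),
    "knn": SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                              random_state=0).fit_predict(X),
}

# Unsupervised weights: Silhouette index of each candidate (no true labels needed).
scores = {name: silhouette_score(X, labels) for name, labels in clusterings.items()}
total = sum(scores.values())
weights = {name: s / total for name, s in scores.items()}
print(weights)  # higher weight -> candidate contributes more to the consensus
```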

17 pages, 1401 KiB  
Article
A New Two-Stage Algorithm for Solving Optimization Problems
by Sajjad Amiri Doumari, Hadi Givi, Mohammad Dehghani, Zeinab Montazeri, Victor Leiva and Josep M. Guerrero
Entropy 2021, 23(4), 491; https://doi.org/10.3390/e23040491 - 20 Apr 2021
Cited by 31 | Viewed by 2958
Abstract
Optimization seeks to find the inputs of an objective function that result in its maximum or minimum. Optimization methods are divided into exact and approximate (algorithmic) ones. Several optimization algorithms imitate natural phenomena, laws of physics, and the behavior of living organisms. Optimization based on such algorithms underlies machine learning, from logistic regression to training neural networks for artificial intelligence. In this paper, a new algorithm called two-stage optimization (TSO) is proposed. The TSO algorithm updates the population members in two steps at each iteration. For this purpose, a group of good population members is selected, and then two members of this group are randomly used to update the position of each population member: the update is based on the first selected good member at the first stage and on the second selected good member at the second stage. We describe the stages of the TSO algorithm and model them mathematically. The performance of the TSO algorithm is evaluated on twenty-three standard objective functions. To compare the optimization results of the TSO algorithm, eight competing algorithms are considered, including genetic, gravitational search, grey wolf, marine predators, particle swarm, teaching-learning-based, tunicate swarm, and whale approaches. The numerical results show that the new algorithm is superior and more competitive in solving optimization problems when compared with the other algorithms.
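The sketch below conveys the flavor of a two-stage, population-based update on a standard test function (the sphere). The move rule, parameters, and greedy acceptance step are illustrative assumptions, not the TSO equations given in the paper.

```python
# Schematic sketch of a two-stage population update in the spirit of TSO;
# the update rule below is a generic illustration, not the paper's formulas.
import numpy as np

def sphere(x):                       # one of the standard benchmark functions
    return np.sum(x**2)

rng = np.random.default_rng(0)
dim, pop_size, n_good, iters = 5, 30, 5, 200
pop = rng.uniform(-10, 10, size=(pop_size, dim))

for _ in range(iters):
    fitness = np.array([sphere(x) for x in pop])
    good = pop[np.argsort(fitness)[:n_good]]          # group of good members
    for i in range(pop_size):
        g1, g2 = good[rng.choice(n_good, 2, replace=False)]
        # Stage 1: move toward the first selected good member (keep if better).
        candidate = pop[i] + rng.random(dim) * (g1 - pop[i])
        if sphere(candidate) < sphere(pop[i]):
            pop[i] = candidate
        # Stage 2: move toward the second selected good member (keep if better).
        candidate = pop[i] + rng.random(dim) * (g2 - pop[i])
        if sphere(candidate) < sphere(pop[i]):
            pop[i] = candidate

print(min(sphere(x) for x in pop))   # best objective value found
```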

23 pages, 602 KiB  
Article
Knowledge Discovery for Higher Education Student Retention Based on Data Mining: Machine Learning Algorithms and Case Study in Chile
by Carlos A. Palacios, José A. Reyes-Suárez, Lorena A. Bearzotti, Víctor Leiva and Carolina Marchant
Entropy 2021, 23(4), 485; https://doi.org/10.3390/e23040485 - 20 Apr 2021
Cited by 50 | Viewed by 6993
Abstract
Data mining is employed to extract useful information and to detect patterns from often large data sets, and is closely related to knowledge discovery in databases and data science. In this investigation, we formulate models based on machine learning algorithms to extract relevant information for predicting student retention at various levels, using higher-education data and specifying the relevant variables involved in the modeling. Then, we utilize this information to support the process of knowledge discovery. We predict student retention at each of three levels, during the first, second, and third years of study, obtaining models with an accuracy exceeding 80% in all scenarios. These models allow us to adequately predict the level at which dropout occurs. The machine learning algorithms used in this work are decision trees, k-nearest neighbors, logistic regression, naive Bayes, random forest, and support vector machines, of which the random forest technique performs the best. We detect that the secondary-education score and the community poverty index are important predictive variables, which had not previously been reported in educational studies of this type. The dropout assessment at various levels reported here is valid for higher-education institutions around the world with conditions similar to the Chilean case, where dropout rates affect the efficiency of such institutions. Having the ability to predict dropout from students' data enables these institutions to take preventive measures, avoiding dropouts. In the case study, balancing the majority and minority classes improves the performance of the algorithms.
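A minimal sketch of the best-performing setup described above, a random forest with class balancing, on synthetic data; the study's higher-education data and explicit balancing procedure are not reproduced, and the feature importances only stand in for the variable-relevance analysis.

```python
# Sketch of a class-balanced random forest for an imbalanced "retained vs. dropped
# out" target; synthetic data replaces the study's institutional data set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=12, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights classes; the study balances classes explicitly.
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
print(accuracy_score(y_te, clf.predict(X_te)))
print(clf.feature_importances_)   # importances flag the most predictive variables
```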

17 pages, 413 KiB  
Article
Change Point Test for the Conditional Mean of Time Series of Counts Based on Support Vector Regression
by Sangyeol Lee and Sangjo Lee
Entropy 2021, 23(4), 433; https://doi.org/10.3390/e23040433 - 07 Apr 2021
Cited by 2 | Viewed by 1515
Abstract
This study considers support vector regression (SVR) and twin SVR (TSVR) for time series of counts, wherein the hyperparameters are tuned using the particle swarm optimization (PSO) method. For prediction, we employ the framework of integer-valued generalized autoregressive conditional heteroskedasticity (INGARCH) models. As an application, we consider change point problems, using the cumulative sum (CUSUM) test based on the residuals obtained from the PSO-SVR and PSO-TSVR methods. We conduct Monte Carlo simulation experiments to illustrate the validity of the methods with various linear and nonlinear INGARCH models. Subsequently, a real data analysis, with the return times of extreme events constructed from the daily log-returns of Goldman Sachs stock prices, is conducted to exhibit the scope of application.
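To make the residual-based CUSUM idea concrete, the toy sketch below fits a plain SVR to lagged counts with a simulated mean shift and computes a CUSUM statistic from the residuals. The PSO tuning, the TSVR variant, and the INGARCH framework of the paper are not reproduced; the one-lag fit is only a crude conditional-mean proxy.

```python
# Toy residual CUSUM for a count series with a mean shift; not the paper's test.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
y = rng.poisson(lam=np.r_[np.full(150, 3.0), np.full(150, 6.0)])  # shift at t = 150

# One-step-ahead style fit: predict y_t from y_{t-1} as a rough conditional mean.
X, target = y[:-1].reshape(-1, 1), y[1:]
resid = target - SVR(C=10.0, gamma="scale").fit(X, target).predict(X)

# CUSUM of centered residuals; a large maximum suggests a change point.
s = np.cumsum(resid - resid.mean()) / (resid.std() * np.sqrt(len(resid)))
print(np.abs(s).max(), np.argmax(np.abs(s)))  # statistic and estimated change location
```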

15 pages, 545 KiB  
Article
Co-Training for Visual Object Recognition Based on Self-Supervised Models Using a Cross-Entropy Regularization
by Gabriel Díaz, Billy Peralta, Luis Caro and Orietta Nicolis
Entropy 2021, 23(4), 423; https://doi.org/10.3390/e23040423 - 01 Apr 2021
Cited by 7 | Viewed by 2213
Abstract
Automatic recognition of visual objects using a deep learning approach has been successfully applied to multiple areas. However, deep learning techniques require a large amount of labeled data, which is usually expensive to obtain. An alternative is to use semi-supervised models, such as co-training, where multiple complementary views are combined using a small amount of labeled data. A simple way to associate views with visual objects is through the application of a degree of rotation or a type of filter. In this work, we propose a co-training model for visual object recognition using deep neural networks, adding layers of self-supervised neural networks as intermediate inputs to the views, where the views are diversified through cross-entropy regularization of their outputs. Since the model merges the concepts of co-training and self-supervised learning by considering the differentiation of outputs, we call it Differential Self-Supervised Co-Training (DSSCo-Training). This paper presents experiments applying the DSSCo-Training model to well-known image datasets, such as MNIST, CIFAR-100, and SVHN. The results indicate that the proposed model is competitive with state-of-the-art models and shows an average relative improvement of 5% in accuracy across several datasets, despite its greater simplicity with respect to more recent approaches.
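A toy sketch of the kind of cross-entropy term between two views' outputs that the abstract refers to. The network architecture, the self-supervised layers, and the exact sign and weighting of the regularizer are assumptions here, not the DSSCo-Training loss.

```python
# Toy cross-entropy term between two view heads; sign/weighting is illustrative only.
import torch
import torch.nn.functional as F

logits_view1 = torch.randn(8, 10, requires_grad=True)  # stand-ins for two view heads
logits_view2 = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))

# Standard supervised loss on both views.
supervised = F.cross_entropy(logits_view1, labels) + F.cross_entropy(logits_view2, labels)

# Cross-entropy between the two output distributions: H(p1, p2) = -sum p1 * log p2.
p1 = F.softmax(logits_view1, dim=1)
log_p2 = F.log_softmax(logits_view2, dim=1)
between_views = -(p1 * log_p2).sum(dim=1).mean()

# Illustrative combination: subtracting the term pushes the views apart (diversity).
loss = supervised - 0.1 * between_views
loss.backward()
```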

16 pages, 1031 KiB  
Article
Research on the Prediction of A-Share “High Stock Dividend” Phenomenon—A Feature Adaptive Improved Multi-Layers Ensemble Model
by Yi Fu, Bingwen Li, Jinshi Zhao and Qianwen Bi
Entropy 2021, 23(4), 416; https://doi.org/10.3390/e23040416 - 31 Mar 2021
Cited by 2 | Viewed by 1637
Abstract
Since the “high stock dividend” of A-share companies in China often leads to short-term stock price increases, the prediction of this phenomenon has received wide attention from academia and industry. In this study, a new multi-layer stacking ensemble algorithm is proposed. Unlike the classic stacking ensemble algorithm, which focuses on the differentiation of base models, this paper uses an equal-weight comprehensive feature evaluation method to select features before fitting the base models and uses a genetic algorithm to match the optimal feature subset to each base model. After the base models produce their predictions, the LightGBM (LGB) model is added to the algorithm as a secondary information-extraction layer. Finally, the algorithm feeds the extracted information into a logistic regression (LR) model to complete the prediction of the “high stock dividend” phenomenon. Using A-share market data from 2010 to 2019 for simulation and evaluation, the proposed model improves the AUC (area under the curve) and F1 score by 0.173 and 0.303, respectively, compared to the baseline model. The prediction results shed light on event-driven investment strategies.
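A hedged sketch of the layered idea described above: base models feed a LightGBM information-extraction layer, whose output goes to a final logistic regression. The per-model genetic feature selection and the equal-weight feature evaluation of the paper are omitted, synthetic data replaces the A-share data, and the lightgbm and scikit-learn packages are assumed installed.

```python
# Sketch of a three-layer stack: base models -> LightGBM -> logistic regression.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Layer 1: base models produce out-of-fold probability features.
base = [RandomForestClassifier(random_state=0), SVC(probability=True, random_state=0)]
meta_tr = np.column_stack([cross_val_predict(m, X_tr, y_tr, method="predict_proba")[:, 1]
                           for m in base])
meta_te = np.column_stack([m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base])

# Layer 2: LightGBM extracts higher-order information from the base outputs.
lgb = LGBMClassifier(random_state=0).fit(meta_tr, y_tr)

# Layer 3: logistic regression makes the final call from LightGBM's output.
lr = LogisticRegression().fit(lgb.predict_proba(meta_tr)[:, 1].reshape(-1, 1), y_tr)
print(lr.score(lgb.predict_proba(meta_te)[:, 1].reshape(-1, 1), y_te))
```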

12 pages, 938 KiB  
Article
Breakpoint Analysis for the COVID-19 Pandemic and Its Effect on the Stock Markets
by Karime Chahuán-Jiménez, Rolando Rubilar, Hanns de la Fuente-Mella and Víctor Leiva
Entropy 2021, 23(1), 100; https://doi.org/10.3390/e23010100 - 12 Jan 2021
Cited by 30 | Viewed by 4198
Abstract
In this research, statistical models are formulated to study the effect of the health crisis arising from COVID-19 on global markets. Breakpoints in the price series of stock indexes are considered. Such indexes are used as an approximation of the stock markets in different countries, taking into account that they are indicative of these markets because of their composition. The main results of this investigation highlight that countries with better institutional and economic conditions are less affected by the pandemic. In addition, the health index included in the models is associated with non-significant parameters, because the health index used in the modeling does not capture the different capacities of the countries analyzed to respond efficiently to the pandemic. Therefore, contagion is the preponderant factor when analyzing the structural breakdown that occurred in the world economy.
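As one possible illustration of breakpoint detection in a return series, the sketch below applies the ruptures package to synthetic returns with a simulated crisis-like shift. The paper's econometric breakpoint tests and actual index data are not reproduced here.

```python
# Illustrative breakpoint detection on synthetic index returns using ruptures;
# the paper's structural-break methodology differs from this toy example.
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(0)
# Synthetic daily returns with a mean/volatility shift mimicking a crisis onset.
returns = np.r_[rng.normal(0.0005, 0.01, 250), rng.normal(-0.002, 0.03, 100)]

algo = rpt.Binseg(model="l2").fit(returns)
print(algo.predict(n_bkps=1))  # estimated breakpoint index (plus the series end)
```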
