
Data Science: Measuring Uncertainties II

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Multidisciplinary Applications".

Deadline for manuscript submissions: closed (31 October 2022) | Viewed by 16,299

Special Issue Editors


Prof. Dr. Carlos Alberto de Bragança Pereira
Guest Editor
Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, São Paulo 05508-900, Brazil
Interests: Bayesian statistics; controversies and paradoxes in probability and statistics; Bayesian reliability; Bayesian analysis of discrete data (BADD); applied statistics

Prof. Dr. Adriano Polpo
Guest Editor
School of Physics, Maths and Computing, Mathematics and Statistics, University of Western Australia, Crawley, WA 6009, Australia
Interests: Bayesian inference; data science; foundations of statistics; model selection; reliability and survival analysis; significance test

Dr. Agatha Rodrigues
Guest Editor
Department of Statistics, Federal University of Espirito Santo, Vitória, ES 29075-910, Brazil
Interests: data analysis; statistical analysis; statistical modeling; applied statistics; R statistical package

Dr. Débora Corrêa
Guest Editor
School of Physics, Maths and Computing, Computer Science, University of Western Australia, Crawley, WA 6009, Australia
Interests: data analysis; complex systems; machine learning; time series analysis; applied dynamical systems; statistical analysis

Special Issue Information

Dear Colleagues,

The demand for data analysis is increasing day by day, and this is reflected in the large number of jobs and the high volume of published articles. New solutions to these problems seem to multiply at a massive rate. A new era is coming! The dazzle is so great that many of us do not stop to check whether the solutions suit the problems they are intended to solve. Current and future challenges require greater care in creating new solutions that respect the rationale of each type of problem. Labels such as big data, data science, machine learning, statistical learning, and artificial intelligence demand more sophistication in their foundations and in the way they are applied.

This Special Issue is dedicated to solutions for, and discussions of, measuring uncertainties in data analysis problems. For example, whether considering the large amount of data in an IoT (Internet of Things) problem or the small sample size of a high-dimensional biological study, one must show how to properly understand the data, how to develop the best process of analysis, and, finally, how to apply the solutions that were obtained theoretically. We seek to respond to these challenges and to publish papers that consider both the reasons for choosing a solution and how to apply it. Papers can cover existing methodologies by elucidating questions related to the reasons for their selection and their uses.

We are open to innovative solutions and theoretical works that justify the use of a method and to applied works that describe a good implementation of a theoretical method.

Prof. Dr. Carlos Alberto de Bragança Pereira
Prof. Dr. Adriano Polpo
Dr. Agatha Rodrigues
Dr. Débora Corrêa
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.


Published Papers (7 papers)


Research

20 pages, 667 KiB  
Article
From p-Values to Posterior Probabilities of Null Hypotheses
by Daiver Vélez Ramos, Luis R. Pericchi Guerra and María Eglée Pérez Hernández
Entropy 2023, 25(4), 618; https://doi.org/10.3390/e25040618 - 06 Apr 2023
Cited by 1 | Viewed by 1157
Abstract
Minimum Bayes factors are commonly used to transform two-sided p-values to lower bounds on the posterior probability of the null hypothesis, in particular the bound e·p·log(p). This bound is easy to compute and explain; however, it does not behave as a Bayes factor. For example, it does not change with the sample size. This is a very serious defect, particularly for moderate to large sample sizes, which is precisely the situation in which p-values are the most problematic. In this article, we propose adjusting this minimum Bayes factor with the information to approximate an exact Bayes factor, not only when p is a p-value but also when p is a pseudo-p-value. Additionally, we develop a version of the adjustment for linear models using the recent refinement of the Prior-Based BIC. Full article
(This article belongs to the Special Issue Data Science: Measuring Uncertainties II)
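For readers who want to see the calibration quoted above in action, here is a minimal sketch (not code from the paper) of how the −e·p·log(p) bound, valid for p < 1/e, converts a two-sided p-value into a lower bound on the posterior probability of the null hypothesis under given prior odds; the function names are illustrative.

```python
import numpy as np

def min_bayes_factor(p):
    """Lower bound on the Bayes factor in favour of H0 given a two-sided
    p-value, using the -e * p * log(p) calibration (valid for p < 1/e)."""
    if not 0 < p < 1 / np.e:
        raise ValueError("the calibration requires 0 < p < 1/e")
    return -np.e * p * np.log(p)

def posterior_prob_null(p, prior_odds=1.0):
    """Corresponding lower bound on P(H0 | data), assuming prior odds
    P(H0)/P(H1) = prior_odds (equal odds by default)."""
    bf = min_bayes_factor(p)          # Bayes factor lower bound for H0
    post_odds = prior_odds * bf       # posterior odds lower bound
    return post_odds / (1.0 + post_odds)

# Example: a "significant" p-value of 0.05 still leaves substantial
# posterior probability on the null under this bound.
for p in (0.05, 0.01, 0.001):
    print(f"p = {p:6.3f}  ->  P(H0|data) >= {posterior_prob_null(p):.3f}")
```

Note that, as the abstract points out, this bound does not change with the sample size, which is precisely the shortcoming the paper's adjustment addresses.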

21 pages, 1562 KiB  
Article
Causal Confirmation Measures: From Simpson’s Paradox to COVID-19
by Chenguang Lu
Entropy 2023, 25(1), 143; https://doi.org/10.3390/e25010143 - 10 Jan 2023
Cited by 2 | Viewed by 1924
Abstract
When we compare the influences of two causes on an outcome and the conclusion from every group contradicts the conclusion from the pooled data, we say that Simpson's Paradox is present. The Existing Causal Inference Theory (ECIT) can make the overall conclusion consistent with the grouping conclusion by removing the confounder's influence, thereby eliminating the paradox. The ECIT uses the relative risk difference Pd = max(0, (R − 1)/R) (where R denotes the risk ratio) as the probability of causation. In contrast, the philosopher Fitelson uses the confirmation measure D (posterior probability minus prior probability) to measure the strength of causation. Fitelson concludes that, from the perspective of Bayesian confirmation, we should directly accept the overall conclusion without considering the paradox. The author previously proposed a Bayesian confirmation measure b* similar to Pd. To overcome the contradiction between the ECIT and Bayesian confirmation, the author uses the semantic information method with the minimum cross-entropy criterion to derive the causal confirmation measure Cc = (R − 1)/max(R, 1). Cc is like Pd but has a normalizing property (it lies between −1 and 1) and cause symmetry. It especially fits cases where a cause restrains an outcome, such as a COVID-19 vaccine controlling infection. Examples (on kidney stone treatments and COVID-19) reveal that Pd and Cc are more reasonable than D, and that Cc is more useful than Pd. Full article
(This article belongs to the Special Issue Data Science: Measuring Uncertainties II)
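The two measures quoted in the abstract are elementary functions of the risk ratio R, so they are easy to compare directly. The following sketch (illustrative only, not the author's code) contrasts Pd and Cc, including the case R < 1 where a cause restrains the outcome and only Cc goes negative.

```python
def prob_of_causation_pd(risk_ratio):
    """Pd = max(0, (R - 1) / R), the relative risk difference used by the
    existing causal inference theory (ECIT) as the probability of causation."""
    return max(0.0, (risk_ratio - 1.0) / risk_ratio)

def causal_confirmation_cc(risk_ratio):
    """Cc = (R - 1) / max(R, 1), the causal confirmation measure from the
    abstract; it is normalized to [-1, 1] and can be negative when the
    'cause' actually restrains the outcome (R < 1)."""
    return (risk_ratio - 1.0) / max(risk_ratio, 1.0)

# Illustrative risk ratios: R > 1 (cause promotes the outcome) and
# R < 1 (cause restrains the outcome, e.g. an effective vaccine).
for R in (4.0, 1.0, 0.2):
    print(f"R = {R:4.1f}   Pd = {prob_of_causation_pd(R):5.2f}   "
          f"Cc = {causal_confirmation_cc(R):5.2f}")
```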

24 pages, 1317 KiB  
Article
TFD-IIS-CRMCB: Telecom Fraud Detection for Incomplete Information Systems Based on Correlated Relation and Maximal Consistent Block
by Ran Li, Hongchang Chen, Shuxin Liu, Kai Wang, Biao Wang and Xinxin Hu
Entropy 2023, 25(1), 112; https://doi.org/10.3390/e25010112 - 05 Jan 2023
Cited by 5 | Viewed by 1461
Abstract
Telecom fraud detection is of great significance in online social networks. Yet the massive, redundant, incomplete, and uncertain network information makes it a challenging task to handle. Hence, this paper mainly uses the correlation of attributes by entropy function to optimize the data quality and then solves the problem of telecommunication fraud detection with incomplete information. First, to filter out redundancy and noise, we propose an attribute reduction algorithm based on max-correlation and max-independence rate (MCIR) to improve data quality. Then, we design a rough-gain anomaly detection algorithm (MCIR-RGAD) using the idea of maximal consistent blocks to deal with missing incomplete data. Finally, the experimental results on authentic telecommunication fraud data and UCI data show that the MCIR-RGAD algorithm provides an effective solution for reducing the computation time, improving the data quality, and processing incomplete data. Full article
(This article belongs to the Special Issue Data Science: Measuring Uncertainties II)
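The abstract does not spell out the MCIR or MCIR-RGAD algorithms, so no faithful implementation is attempted here. Purely as a hedged illustration of the underlying idea, ranking attributes by an entropy-based association with the fraud label, a generic mutual-information filter on toy categorical data might look like this (data and names are invented for the example):

```python
import numpy as np

def entropy(values):
    """Shannon entropy (in bits) of a discrete attribute."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete attributes."""
    joint = np.array([f"{a}|{b}" for a, b in zip(x, y)])
    return entropy(x) + entropy(y) - entropy(joint)

# Toy categorical attributes and a binary fraud label (illustrative only).
rng = np.random.default_rng(0)
label = rng.integers(0, 2, size=200)
informative = np.where(rng.random(200) < 0.8, label, 1 - label)  # correlated
noise = rng.integers(0, 3, size=200)                             # irrelevant

for name, attr in [("informative", informative), ("noise", noise)]:
    print(f"I({name}; label) = {mutual_information(attr, label):.3f} bits")
```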

20 pages, 557 KiB  
Article
Adaptive Significance Levels in Tests for Linear Regression Models: The e-Value and P-Value Cases
by Alejandra E. Patiño Hoyos, Victor Fossaluza, Luís Gustavo Esteves and Carlos Alberto de Bragança Pereira
Entropy 2023, 25(1), 19; https://doi.org/10.3390/e25010019 - 22 Dec 2022
Cited by 2 | Viewed by 1278
Abstract
The full Bayesian significance test (FBST) for precise hypotheses is a Bayesian alternative to the traditional significance tests based on p-values. The FBST is characterized by the e-value as an evidence index in favor of the null hypothesis (H). An important practical issue for the implementation of the FBST is to establish how small the evidence against H must be in order to decide for its rejection. In this work, we present a method to find a cutoff value for the e-value in the FBST by minimizing the linear combination of the averaged type-I and type-II error probabilities for a given sample size and also for a given dimensionality of the parameter space. Furthermore, we compare our methodology with the results obtained from the test with adaptive significance level, which presents the capital-P P-value as a decision-making evidence measure. For this purpose, the scenario of linear regression models with unknown variance under the Bayesian approach is considered. Full article
(This article belongs to the Special Issue Data Science: Measuring Uncertainties II)
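For readers unfamiliar with the FBST, a minimal Monte Carlo sketch of the e-value for a sharp hypothesis H: θ = θ0 in a conjugate normal-mean model may help; this is a generic textbook setting, not the linear-regression scenario or the cutoff optimization studied in the paper, and all names are illustrative.

```python
import numpy as np
from scipy import stats

def fbst_e_value(data, theta0, prior_mean=0.0, prior_sd=10.0, sigma=1.0,
                 n_draws=100_000, seed=0):
    """Monte Carlo e-value for H: theta = theta0 in a normal model with
    known sigma and a conjugate normal prior on theta.

    ev(H) = 1 - posterior probability of the tangential set
    T = {theta : posterior density(theta) > posterior density(theta0)}.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    # Conjugate update: the posterior for theta is normal.
    prec = 1.0 / prior_sd**2 + n / sigma**2
    post_sd = np.sqrt(1.0 / prec)
    post_mean = (prior_mean / prior_sd**2 + data.sum() / sigma**2) / prec
    posterior = stats.norm(post_mean, post_sd)

    draws = posterior.rvs(size=n_draws, random_state=seed)
    dens_at_theta0 = posterior.pdf(theta0)
    # Fraction of posterior mass with density above the density at theta0.
    tangential_mass = np.mean(posterior.pdf(draws) > dens_at_theta0)
    return 1.0 - tangential_mass

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.3, scale=1.0, size=30)  # true mean 0.3
print("e-value for H: theta = 0 ->", round(fbst_e_value(sample, 0.0), 3))
```

The paper's contribution is how small this e-value must be before rejecting H, by minimizing a linear combination of averaged type-I and type-II error probabilities; that optimization is not reproduced here.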

16 pages, 490 KiB  
Article
Modeling Overdispersed Dengue Data via Poisson Inverse Gaussian Regression Model: A Case Study in the City of Campo Grande, MS, Brazil
by Erlandson Ferreira Saraiva, Valdemiro Piedade Vigas, Mariana Villela Flesch, Mark Gannon and Carlos Alberto de Bragança Pereira
Entropy 2022, 24(9), 1256; https://doi.org/10.3390/e24091256 - 07 Sep 2022
Viewed by 1634
Abstract
Dengue fever is a tropical disease transmitted mainly by the female Aedes aegypti mosquito that affects millions of people every year. As there is still no safe and effective vaccine, currently the best way to prevent the disease is to control the proliferation of the transmitting mosquito. Since the proliferation and life cycle of the mosquito depend on environmental variables such as temperature and water availability, among others, statistical models are needed to understand the existing relationships between environmental variables and the recorded number of dengue cases and predict the number of cases for some future time interval. This prediction is of paramount importance for the establishment of control policies. In general, dengue-fever datasets contain the number of cases recorded periodically (in days, weeks, months or years). Since many dengue-fever datasets tend to be of the overdispersed, long-tail type, some common models like the Poisson regression model or negative binomial regression model are not adequate to model it. For this reason, in this paper we propose modeling a dengue-fever dataset by using a Poisson-inverse-Gaussian regression model. The main advantage of this model is that it adequately models overdispersed long-tailed data because it has a wider skewness range than the negative binomial distribution. We illustrate the application of this model in a real dataset and compare its performance to that of a negative binomial regression model. Full article
(This article belongs to the Special Issue Data Science: Measuring Uncertainties II)
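As background to the abstract, the Poisson-inverse-Gaussian model arises by mixing a Poisson count over an inverse-Gaussian distributed rate. The simulation sketch below (synthetic data, not the paper's dengue series or fitted regression) illustrates the resulting overdispersion relative to a plain Poisson with the same mean; the parameter values are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 50_000
mean_rate = 5.0

# Inverse-Gaussian mixing distribution for the Poisson rate.
# scipy's invgauss(mu, scale=s) has mean mu * s; here the overall mean is
# mean_rate, and a smaller `shape` gives a heavier right tail (more
# overdispersion in the resulting counts).
shape = 2.0
rates = stats.invgauss.rvs(mu=mean_rate / shape, scale=shape,
                           size=n, random_state=rng)

pig_counts = rng.poisson(rates)             # Poisson-inverse-Gaussian counts
poisson_counts = rng.poisson(mean_rate, n)  # plain Poisson with the same mean

for name, y in [("Poisson", poisson_counts), ("PIG mixture", pig_counts)]:
    print(f"{name:12s} mean = {y.mean():5.2f}  variance = {y.var():6.2f}")
# The PIG counts show variance well above the mean (overdispersion),
# which is the feature the abstract exploits for dengue case counts.
```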

26 pages, 560 KiB  
Article
On the Choice of the Item Response Model for Scaling PISA Data: Model Selection Based on Information Criteria and Quantifying Model Uncertainty
by Alexander Robitzsch
Entropy 2022, 24(6), 760; https://doi.org/10.3390/e24060760 - 27 May 2022
Cited by 11 | Viewed by 2025
Abstract
In educational large-scale assessment studies such as PISA, item response theory (IRT) models are used to summarize students’ performance on cognitive test items across countries. In this article, the impact of the choice of the IRT model on the distribution parameters of countries (i.e., mean, standard deviation, percentiles) is investigated. Eleven different IRT models are compared using information criteria. Moreover, model uncertainty is quantified by estimating model error, which can be compared with the sampling error associated with the sampling of students. The PISA 2009 dataset for the cognitive domains mathematics, reading, and science is used as an example of the choice of the IRT model. It turned out that the three-parameter logistic IRT model with residual heterogeneity and a three-parameter IRT model with a quadratic effect of the ability θ provided the best model fit. Furthermore, model uncertainty was relatively small compared to sampling error regarding country means in most cases but was substantial for country standard deviations and percentiles. Consequently, it can be argued that model error should be included in the statistical inference of educational large-scale assessment studies. Full article
(This article belongs to the Special Issue Data Science: Measuring Uncertainties II)
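Fitting the eleven IRT models compared in the paper is beyond a short snippet, but the model-selection step the abstract relies on, comparing candidate models by information criteria computed from their maximized log-likelihoods, can be sketched generically. The log-likelihoods, parameter counts, and sample size below are invented placeholders, not PISA results.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_lik

# Placeholder fits for three candidate IRT models on the same response data
# (values invented for illustration; n_obs treated as the number of students).
candidates = {
    "1PL (Rasch)": {"log_lik": -54210.0, "n_params": 120},
    "2PL":         {"log_lik": -53890.0, "n_params": 240},
    "3PL":         {"log_lik": -53815.0, "n_params": 360},
}
n_obs = 5000

for name, fit in candidates.items():
    print(f"{name:12s} AIC = {aic(fit['log_lik'], fit['n_params']):10.1f}  "
          f"BIC = {bic(fit['log_lik'], fit['n_params'], n_obs):10.1f}")

best = min(candidates, key=lambda m: bic(candidates[m]["log_lik"],
                                         candidates[m]["n_params"], n_obs))
print("Model preferred by BIC:", best)
```

With these placeholder numbers AIC and BIC can disagree, which is exactly the kind of model uncertainty the paper quantifies alongside sampling error.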

17 pages, 1100 KiB  
Article
Neural Networks for Financial Time Series Forecasting
by Kady Sako, Berthine Nyunga Mpinda and Paulo Canas Rodrigues
Entropy 2022, 24(5), 657; https://doi.org/10.3390/e24050657 - 07 May 2022
Cited by 10 | Viewed by 5510
Abstract
Financial and economic time series forecasting has never been an easy task due to its sensitivity to political, economic and social factors. For this reason, people who invest in financial markets and currency exchange are usually looking for robust models that can help them maximize their profits and minimize their losses as much as possible. Fortunately, various recent studies have suggested that a special type of Artificial Neural Networks (ANNs), called Recurrent Neural Networks (RNNs), can improve the predictive accuracy for the behavior of financial data over time. This paper aims to forecast: (i) the closing price of eight stock market indexes; and (ii) the closing price of six currency exchange rates related to the USD, using the RNN model and its variants: the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). The results show that the GRU gives the overall best results, especially for the univariate out-of-sample forecasting of the currency exchange rates and the multivariate out-of-sample forecasting of the stock market indexes. Full article
(This article belongs to the Special Issue Data Science: Measuring Uncertainties II)
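As a rough sketch of the kind of recurrent model the abstract compares (not the authors' exact architecture, data, or preprocessing), a minimal univariate GRU forecaster in Keras could look like the following; swapping the GRU layer for LSTM or SimpleRNN gives the other variants.

```python
import numpy as np
import tensorflow as tf

def make_windows(series, window):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X)[..., np.newaxis], np.array(y)

# Synthetic stand-in for a (scaled) closing-price series.
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 1000)).astype("float32")
prices = (prices - prices.mean()) / prices.std()   # simple standardization

window = 30
X, y = make_windows(prices, window)
split = int(0.8 * len(X))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, 1)),
    tf.keras.layers.GRU(32),          # could be swapped for LSTM or SimpleRNN
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:split], y[:split], epochs=5, batch_size=32, verbose=0)

preds = model.predict(X[split:], verbose=0).ravel()
rmse = float(np.sqrt(np.mean((preds - y[split:]) ** 2)))
print("out-of-sample RMSE (standardized units):", round(rmse, 4))
```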
