Topic Editors

Prof. Dr. Jürgen Pilz
Institut für Statistik, Alpen-Adria Universität Klagenfurt, Universitätsstraße 65, 9020 Klagenfurt, Austria
Dr. Noelle I. Samia
Department of Statistics and Data Science, Northwestern University, Evanston, IL 60208, USA
Prof. Dr. Dirk Husmeier
School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8QQ, UK

Interfacing Statistics, Machine Learning and Data Science from a Probabilistic Modelling Viewpoint

Abstract submission deadline
31 July 2024
Manuscript submission deadline
31 December 2024
Viewed by
2582

Topic Information

Dear Colleagues,

Modern statistics is the science of learning from data. As a discipline, it is concerned with the collection, analysis, and interpretation of data, as well as the effective communication and presentation of results relying on data. Statistics is a highly interdisciplinary field; in developing methods and studying the theory that underlies the methods, statisticians draw on a great variety of mathematical and computational tools.

Today, vast amounts of data are transforming the world and the way we live in it. Statistical methods and theories are used everywhere, from health, science and business to managing traffic and studying sustainability and climate change. This, in turn, creates the need for much closer collaboration between statisticians, mathematicians, computer scientists and domain scientists. The call for a new generation of data scientists working at this interface is growing louder, and there is a strong need to develop data-science university curricula.

Undoubtedly, fundamental statistical research has laid important foundations upon which Data Science approaches have been established. Conversely, modern (applied) statistics is continuing to pave a broad road to its data-science future.

Machine Learning has substantially advanced through statistical learning. Two fundamental ideas in the field of statistical learning are uncertainty and variation. The common basis for dealing with these complex issues is probabilistic modelling of the problems at hand.

The aim of this Topic is to encourage researchers in applied mathematics and statistics, engineering science disciplines, and bio-, geo- and environmental sciences to present original and recent developments in interfacing statistical inference with advanced machine learning and data science concepts and approaches for model selection, data analysis, estimation and prediction, uncertainty quantification and risk analysis. We particularly welcome novel applications of these concepts for the following:

  • Statistical process control in industrial manufacturing;
  • Predicting natural hazards and climate change processes;
  • Graph modelling for energy, telecommunication and environmental monitoring;
  • Development of efficient numerical algorithms for big data analysis;
  • Model estimation (including variable selection) and validation;
  • Regularisation methods;
  • Causal inference and targeted learning;
  • Ensemble learning methods.

Prof. Dr. Jürgen Pilz
Dr. Noelle I. Samia
Prof. Dr. Dirk Husmeier
Topic Editors

Keywords

  • probability and stochastic processes
  • statistical inference
  • information theory
  • statistical learning
  • regression and classification
  • estimation and prediction
  • hypothesis testing
  • time-series analysis
  • causal inference
  • uncertainty quantification

Participating Journals

Journal Name   Impact Factor   CiteScore   Launched Year   First Decision (median)   APC
Entropy        2.7             4.7         1999            20.8 Days                 CHF 2600
Mathematics    2.4             3.5         2013            16.9 Days                 CHF 2600
Modelling      -               -           2020            15.8 Days                 CHF 1000
Stats          1.3             0.3         2018            15.8 Days                 CHF 1600

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (2 papers)

18 pages, 1520 KiB  
Article
Utility in Time Description in Priority Best–Worst Discrete Choice Models: An Empirical Evaluation Using Flynn’s Data
by Sasanka Adikari and Norou Diawara
Stats 2024, 7(1), 185-202; https://doi.org/10.3390/stats7010012 - 19 Feb 2024
Viewed by 890
Abstract
Discrete choice models (DCMs) are applied in many fields and in the statistical modelling of consumer behavior. This paper focuses on a form of choice experiment, best–worst scaling in discrete choice experiments (DCEs), and the transition probability of a consumer's choice over time. The analysis was conducted by using simulated data (choice pairs) based on data from Flynn’s (2007) ‘Quality of Life Experiment’. Most of the traditional approaches assume the choice alternatives are mutually exclusive over time, which is a questionable assumption. We introduced a new copula-based model (CO-CUB) for the transition probability, which can handle the dependence structure of best–worst choices while applying a very practical constraint. We used a conditional logit model to calculate the utility at consecutive time points and spread it to future time points under dynamic programming. We suggest that the CO-CUB transition probability algorithm is a novel way to analyze and predict choices at future time points by expressing human choice behavior. The numerical results inform decision making and help formulate strategy and learning algorithms under dynamic utility in time for best–worst DCEs.

22 pages, 1145 KiB  
Article
Transfer Learning in Multiple Hypothesis Testing
by Stefano Cabras and María Eugenia Castellanos Nueda
Entropy 2024, 26(1), 49; https://doi.org/10.3390/e26010049 - 04 Jan 2024
Viewed by 1033
Abstract
In this investigation, a synthesis of Convolutional Neural Networks (CNNs) and Bayesian inference is presented, leading to a novel approach to the problem of Multiple Hypothesis Testing (MHT). Diverging from traditional paradigms, this study introduces a sequence-based uncalibrated Bayes factor approach to test many hypotheses using the same family of sampling parametric models. A two-step methodology is employed: initially, a learning phase is conducted utilizing simulated datasets encompassing a wide spectrum of null and alternative hypotheses, followed by a transfer phase applying this fitted model to real-world experimental sequences. The outcome is a CNN model capable of navigating the complex domain of MHT with improved precision over traditional methods, also demonstrating robustness under varying conditions, including the number of true nulls and dependencies between tests. Although the empirical evaluations presented indicate that the methodology will prove useful, more work is required to provide a full evaluation from a theoretical perspective. The potential of this innovative approach is further illustrated within the critical domain of genomics. Although formal proof of the consistency of the model remains elusive due to the inherent complexity of the algorithms, this paper also provides some theoretical insights and advocates for continued exploration of this methodology.