Recent Advances in Computational Statistics

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Computational and Applied Mathematics".

Deadline for manuscript submissions: closed (30 November 2023) | Viewed by 22323

Special Issue Editors

Department of Management Science and Statistics, University of Texas at San Antonio, San Antonio, TX 7824, USA
Interests: Bayesian statistics; computational statistics; high-dimensional inference; statistical machine learning

Guest Editor
Department of Mathematics & Statistics, Saint Louis University, St. Louis, MO 63103, USA
Interests: statistics; machine learning; deep learning; model checking; bioinformatics

Guest Editor
Faculty of Science, Kunming University of Science and Technology, Kunming 650500, China
Interests: Bayesian statistics; computational statistics; mixture models; variable selection

Guest Editor
Department of Mathematics, Missouri State University, Springfield, MO 65897, USA
Interests: pattern recognition and machine learning; applied statistics; image analysis and bioinformatics; statistical learning theory

Special Issue Information

Dear Colleagues,

I am pleased to announce a Special Issue on “Recent Advances in Computational Statistics.” With the rapid development of computing facilities, computational statistics is now widely used for data analysis in a broad range of fields, such as actuarial science, biometrics, biomedical engineering, econometrics, environmental science, and financial markets. This Special Issue will emphasize the contribution and influence of computing on statistics and will present a collection of the latest developments in computational statistics, with a focus on statistical computing, algorithms, data analysis, machine learning, parallel computing, and simulation. Topics include, but are not limited to, new developments in the following:

  • Advanced statistical analysis with data applications.
  • Bayesian computing algorithms.
  • Computational and statistical models.
  • Computational methods of modeling and simulating real-world problems (e.g., the COVID-19 epidemic).
  • Data mining and pattern recognition methods.
  • Parametric and nonparametric machine learning algorithms.
  • Statistical software development.

I look forward to receiving your submissions.

Dr. Min Wang
Dr. Haijun Gong
Dr. Liucang Wu
Dr. Songfeng Zheng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computational statistics
  • data analytics
  • statistical methodology

Published Papers (15 papers)

Research

19 pages, 337 KiB  
Article
A Bayesian Variable Selection Method for Spatial Autoregressive Quantile Models
by Yuanying Zhao and Dengke Xu
Mathematics 2023, 11(4), 987; https://doi.org/10.3390/math11040987 - 15 Feb 2023
Viewed by 1002
Abstract
In this paper, a Bayesian variable selection method for spatial autoregressive (SAR) quantile models is proposed on the basis of spike and slab prior for regression parameters. The SAR quantile models, which are more generalized than SAR models and quantile regression models, are specified by adopting the asymmetric Laplace distribution for the error term in the classical SAR models. The proposed approach could perform simultaneously robust parametric estimation and variable selection in the context of SAR quantile models. Bayesian statistical inferences are implemented by a detailed Markov chain Monte Carlo (MCMC) procedure that combines Gibbs samplers with a probability integral transformation (PIT) algorithm. In the end, empirical numerical examples including several simulation studies and a Boston housing price data analysis are employed to demonstrate the newly developed methodologies.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
16 pages, 405 KiB  
Article
Variable Selection and Allocation in Joint Models via Gradient Boosting Techniques
by Colin Griesbach, Andreas Mayr and Elisabeth Bergherr
Mathematics 2023, 11(2), 411; https://doi.org/10.3390/math11020411 - 12 Jan 2023
Cited by 2 | Viewed by 1185
Abstract
Modeling longitudinal data (e.g., biomarkers) and the risk for events separately leads to a loss of information and bias, even though the underlying processes are related to each other. Hence, the popularity of joint models for longitudinal and time-to-event-data has grown rapidly in the last few decades. However, it is quite a practical challenge to specify which part of a joint model the single covariates should be assigned to as this decision usually has to be made based on background knowledge. In this work, we combined recent developments from the field of gradient boosting for distributional regression in order to construct an allocation routine allowing researchers to automatically assign covariates to the single sub-predictors of a joint model. The procedure provides several well-known advantages of model-based statistical learning tools, as well as a fast-performing allocation mechanism for joint models, which is illustrated via empirical results from a simulation study and a biomedical application.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
14 pages, 422 KiB  
Article
Robust Estimation for Semi-Functional Linear Model with Autoregressive Errors
by Bin Yang, Min Chen, Tong Su and Jianjun Zhou
Mathematics 2023, 11(2), 277; https://doi.org/10.3390/math11020277 - 05 Jan 2023
Viewed by 912
Abstract
It is well known that the traditional functional regression model is mainly based on the least squares or likelihood method. These methods usually rely on some strong assumptions, such as error independence and normality, that are not always satisfied. For example, the response variable may contain outliers, and the error term may be serially correlated. Violation of assumptions can result in unfavorable influences on model estimation. Therefore, a robust estimation procedure of a semi-functional linear model with autoregressive error is developed to solve this problem. We compare the efficiency of our procedure to the least squares method through a simulation study and two real data analyses. The conclusion illustrates that the proposed method outperforms the least squares method, provided that the random errors follow a heavy-tailed distribution.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
15 pages, 1402 KiB  
Article
Bayesian Inference Algorithm for Estimating Heterogeneity of Regulatory Mechanisms Based on Single-Cell Data
by Wenlong He, Peng Xia, Xinan Zhang and Tianhai Tian
Mathematics 2022, 10(24), 4748; https://doi.org/10.3390/math10244748 - 14 Dec 2022
Viewed by 1015
Abstract
The rapid progress in biological experimental technologies has generated a huge amount of experimental data to investigate complex regulatory mechanisms. Various mathematical models have been proposed to simulate the dynamic properties of molecular processes using the experimental data. However, it is still difficult to estimate unknown parameters in mathematical models for the dynamics in different cells due to the high demand for computing power. In this work, we propose a population statistical inference algorithm to improve the computing efficiency. In the first step, this algorithm clusters single cells into a number of groups based on the distances between each pair of cells. In each cluster, we then infer the parameters of the mathematical model for the first cell. We propose an adaptive approach that uses the inferred parameter values of the first cell to formulate the prior distribution and acceptance criteria of the following cells. Three regulatory network models were used to examine the efficiency and effectiveness of the designed algorithm. The computational results show that the new method reduces the computational time significantly and provides an effective algorithm to infer the parameters of regulatory networks in a large number of cells.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
13 pages, 494 KiB  
Article
An MM Algorithm for the Frailty-Based Illness Death Model with Semi-Competing Risks Data
by Xifen Huang, Jinfeng Xu, Hao Guo, Jianhua Shi and Wenjie Zhao
Mathematics 2022, 10(19), 3702; https://doi.org/10.3390/math10193702 - 10 Oct 2022
Cited by 2 | Viewed by 1298
Abstract
For analyzing multiple events data, the illness death model is often used to investigate the covariate–response association for its easy and direct interpretation as well as the flexibility to accommodate the within-subject dependence. The resulting estimation and inferential procedures often depend on the subjective specification of the parametric frailty distribution. For certain frailty distributions, the computation can be challenging as the estimation involves both the nonparametric component and the parametric component. In this paper, we develop efficient computational methods for analyzing semi-competing risks data in the illness death model with the general frailty, where the Minorization–Maximization (MM) principle is employed for yielding accurate estimation and inferential procedures. Simulation studies are conducted to assess the finite-sample performance of the proposed method. An application to a real data set is also provided for illustration.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
11 pages, 290 KiB  
Article
Mixture Modeling of Time-to-Event Data in the Proportional Odds Model
by Xifen Huang, Chaosong Xiong, Jinfeng Xu, Jianhua Shi and Jinhong Huang
Mathematics 2022, 10(18), 3375; https://doi.org/10.3390/math10183375 - 16 Sep 2022
Viewed by 1009
Abstract
Subgroup analysis with survival data is essential for detailed assessment of the risks of medical products in heterogeneous population subgroups. In this paper, we developed a semiparametric mixture modeling strategy in the proportional odds model for simultaneous subgroup identification and regression analysis of survival data that flexibly allows the covariate effects to differ among several subgroups. Neither the membership nor the subgroup-specific covariate effects are known a priori. The nonparametric maximum likelihood method together with a pair of MM algorithms with monotone ascent property are proposed to carry out the estimation procedures. Then, we conducted two series of simulation studies to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of German breast cancer data is further provided for illustrating the proposed methodology.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
17 pages, 366 KiB  
Article
Efficient Estimation and Inference in the Proportional Odds Model for Survival Data
by Xifen Huang, Chaosong Xiong, Tao Jiang, Junfeng Lu and Jinfeng Xu
Mathematics 2022, 10(18), 3362; https://doi.org/10.3390/math10183362 - 16 Sep 2022
Viewed by 1177
Abstract
In modeling time-to-event data with long-term survivors, the proportional hazards model is widely used for its easy and direct interpretation as well as the flexibility to accommodate the past information and allow time-varying predictors. This becomes most relevant when the mortality of individuals converges with time, and the estimation and inference based upon the proportional odds model can often yield more accurate and reasonable results than the classical Cox’s proportional hazards model. Along with the fast development of data science technologies, computational challenges for survival data with increasing sample size and diverging parameter dimension exist. Currently, existing methods for analyzing such data are computationally inconvenient. In this paper, we propose efficient computational methods for analyzing survival data in the proportional odds model, where the nonparametric maximum likelihood approach is combined with the minorization-maximization (MM) algorithm and the regularization scheme to yield fast and accurate estimation and inferential procedures. The methodology is illustrated using extensive simulation studies and an application to the Veterans’ Administration lung cancer data.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
14 pages, 349 KiB  
Article
Generalized Accelerated Failure Time Models for Recurrent Events
by Xiaoyi Wen and Jinfeng Xu
Mathematics 2022, 10(15), 2662; https://doi.org/10.3390/math10152662 - 28 Jul 2022
Viewed by 966
Abstract
For analyzing recurrent event data, we consider a generalization of the classical accelerated failure time model. In the proposed approach, the general function is no longer assumed to be a singleton but allowed to be time-varying. This is in the same spirit as in quantile regression, and the counting process techniques can be utilized. Theoretical properties such as consistency and asymptotic normality are obtained. The methodology is illustrated using simulation studies and an application to the bladder cancer data.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
10 pages, 373 KiB  
Article
Nonparametric Sieve Maximum Likelihood Estimation of Semi-Competing Risks Data
by Xifen Huang and Jinfeng Xu
Mathematics 2022, 10(13), 2248; https://doi.org/10.3390/math10132248 - 27 Jun 2022
Viewed by 1502
Abstract
In biomedical studies involving time-to-event data, a subject may experience distinct types of events. We consider the problem of estimating the transition functions for a semi-competing risks model under the illness-death model framework. We propose to estimate the intensity functions by maximizing a B-spline based sieve likelihood. The method yields smooth estimates without parametric assumptions. Our proposed approach facilitates easy computation of the covariance of the model parameters and yields direct interpretation. Compared with existing approaches, our proposed method requires neither the subjective specification of the frailty distribution nor the Markov or semi-Markov assumption, which may be unmet in real applications. We establish the consistency, the convergence rate, and the asymptotic normality of the proposed estimators under some regularity conditions. We also provide simulation studies to assess the finite-sample performance of the proposed modeling and estimation strategy. A real data application is further used to illustrate the proposed methodology.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
13 pages, 3381 KiB  
Article
An Alternative Approach for Identifying Nonlinear Dynamics of the Cascade Logistic-Cubic System
by Yanan Liao, Kai Yang, Hua Wang and Qingtai Xiao
Mathematics 2022, 10(12), 2080; https://doi.org/10.3390/math10122080 - 15 Jun 2022
Cited by 2 | Viewed by 1354
Abstract
The 0-1 test for chaos, which is a simple binary method, has been widely used to detect the nonlinear behaviors of the non-cascade chaotic dynamics. In this paper, a validity check of the 0-1 test for chaos on the popular cascade Logistic-Cubic (L-C) system is conducted by exploring the effects of sensitivity parameters. Results show that the periodic, weak-chaotic, and strong-chaotic states of the cascade L-C system can be effectively identified by the introduced simple method for detecting chaos. Nevertheless, the two sensitivity parameters, including the frequency ω and the amplitude α, are critical for the chaos indicator (i.e., the median of asymptotic growth rate, Km) when the cascade dynamic is detected by the method. It is found that the effect of α is more sensitive than that of ω on Km regarding the three dynamical states of the cascade L-C system. Meanwhile, it is recommended that the three states are identified according to the change of K with α from zero to ten since the periodic and weak-chaotic states cannot be identified when α is greater than a certain constant. In addition, the modified mean square displacement Dc*(n) fails to distinguish its periodic and weak-chaotic states, whereas it can obviously distinguish the above two and strong-chaotic states. This work is therefore invaluable to gaining insight into the understanding of the complex nonlinearity of other different cascade dynamical systems with indicator comparison.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
19 pages, 1815 KiB  
Article
Classical and Bayesian Inference of a Progressive-Stress Model for the Nadarajah–Haghighi Distribution with Type II Progressive Censoring and Different Loss Functions
by Refah Alotaibi, Faten S. Alamri, Ehab M. Almetwally, Min Wang and Hoda Rezk
Mathematics 2022, 10(9), 1602; https://doi.org/10.3390/math10091602 - 08 May 2022
Cited by 2 | Viewed by 2688
Abstract
Accelerated life testing (ALT) is a time-saving technology used in a variety of fields to obtain failure time data for test units in a fraction of the time required to test them under normal operating conditions. This study investigated progressive-stress ALT with progressive type II filtering with the lifetime of test units following a Nadarajah–Haghighi (NH) distribution. It is assumed that the scale parameter of the distribution obeys the inverse power law. The maximum likelihood estimates and estimated confidence intervals for the model parameters were obtained first. The Metropolis–Hastings (MH) algorithm was then used to build Bayes estimators for various squared error loss functions. We also computed the highest posterior density (HPD) credible ranges for the model parameters. Monte Carlo simulations were used to compare the outcomes of the various estimation methods proposed. Finally, one data set was analyzed for validation purposes.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
19 pages, 2398 KiB  
Article
Bayesian Influence Analysis of the Skew-Normal Spatial Autoregression Models
by Yuanyuan Ju, Yan Yang, Mingxing Hu, Lin Dai and Liucang Wu
Mathematics 2022, 10(8), 1306; https://doi.org/10.3390/math10081306 - 14 Apr 2022
Cited by 4 | Viewed by 1506
Abstract
In spatial data analysis, outliers or influential observations have a considerable influence on statistical inference. This paper develops Bayesian influence analysis, including the local influence approach and case influence measures in skew-normal spatial autoregression models (SSARMs). The Bayesian local influence method is proposed to evaluate the impact of small perturbations in data, the distribution of sampling and prior. To measure the extent of different perturbations in SSARMs, the Bayes factor, the ϕ-divergence and the posterior mean distance are established. A Bayesian case influence measure is presented to examine the influence points in SSARMs. The potential influence points in the models are identified by Cook’s posterior mean distance and Cook’s posterior mode distance ϕ-divergence. The Bayesian influence analysis formulation of spatial data is given. Simulation studies and examples verify the effectiveness of the presented methodologies.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
19 pages, 1095 KiB  
Article
Permutation Variation and Alternative Hyper-Sphere Decomposition
by Qingze Li and Jianxin Pan
Mathematics 2022, 10(4), 562; https://doi.org/10.3390/math10040562 - 11 Feb 2022
Viewed by 1273
Abstract
Current covariance modeling methods work well in longitudinal data analysis. In the analysis of data with no natural order, a common covariance modeling method would be inadequate. In this paper, a study is implemented to investigate the effects of permutations of data on the estimation of covariance matrix Σ. Based on the Hyper-sphere decomposition method (HPC), this study suggests that the change of data’s permutation breaks the consistency of covariance estimation. An alternative Hyper-sphere decomposition method with permutation invariance is introduced later in this paper. The alternative method’s consistency and asymptotic normality are studied when the observations follow a normal distribution. These results are tested using some example studies. Furthermore, a real data analysis is conducted for illustration purposes.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
21 pages, 384 KiB  
Article
Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data
by Xifen Huang, Jinfeng Xu and Yunpeng Zhou
Mathematics 2022, 10(4), 538; https://doi.org/10.3390/math10040538 - 09 Feb 2022
Cited by 1 | Viewed by 1293
Abstract
Motivated by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data, the objective of integration of important biomarkers for the early detection of Mild Cognitive Impairment (MCI) to Alzheimer’s disease (AD) as a therapeutic intervention is most likely to be beneficial in the early stages of disease progression. Developing predictors for MCI to AD comes down to genotype variables such that the dimension of predictors increases as the sample becomes large. Thus, we consider the sparsity concept of coefficients in a high-dimensional regression model with clustered failure time data such as ADNI, which enables enhancing predictive performances and facilitates the model’s interpretability. In this study, we first propose two MM algorithms (profile and non-profile) for the shared frailty survival model and then extend the two proposed MM algorithms to regularized estimation in a sparse high-dimensional regression model. The convergence properties of our proposed estimators are also established. Furthermore, simulation studies and an analysis of ADNI data illustrate the proposed methods.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)
22 pages, 1336 KiB  
Article
Bivariate Continuous Negatively Correlated Proportional Models with Applications in Schizophrenia Research
by Yuan Sun, Guoliang Tian, Shuixia Guo, Lianjie Shu and Chi Zhang
Mathematics 2022, 10(3), 353; https://doi.org/10.3390/math10030353 - 24 Jan 2022
Viewed by 1726
Abstract
Bivariate continuous negatively correlated proportional data defined in the unit square (0,1)^2 often appear in many different disciplines, such as medical studies, clinical trials and so on. To model this type of data, the paper proposes two new bivariate continuous distributions (i.e., negatively correlated proportional inverse Gaussian (NPIG) and negatively correlated proportional gamma (NPGA) distributions) for the first time and provides corresponding distributional properties. Two mean regression models are further developed for data with covariates. The normalized expectation–maximization (N-EM) algorithm and the gradient descent algorithm are combined to obtain the maximum likelihood estimates of parameters of interest. Simulation studies are conducted, and a data set of cortical thickness for schizophrenia is used to illustrate the proposed methods. According to our analysis between patients and controls of cortical thickness in typical mutual inhibitory brain regions, we verified the compensatory of cortical thickness in patients with schizophrenia and found its negative correlation with age.
(This article belongs to the Special Issue Recent Advances in Computational Statistics)