New Advances in High-Dimensional and Non-asymptotic Statistics

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: 31 December 2024 | Viewed by 10566

Special Issue Editors


Guest Editor: Dr. Huiming Zhang
1. Institute of Artificial Intelligence, Beihang University, Beijing 100083, China
2. Zhuhai UM Science & Technology Research Institute, Zhuhai 519015, China
Interests: high-dimensional statistics; non-asymptotic theory; functional data analysis; robust statistical learning; the mathematics of deep learning; concentration inequalities; subsampling

Guest Editor: Prof. Dr. Ting Yan
School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
Interests: network models; random graph theory; high-dimensional statistics; paired comparisons

Special Issue Information

Dear Colleagues,

With the advent of high-throughput data, statistical modeling of complex data, such as high-dimensional data, has become one of the most active directions in modern statistical research. A common challenge across the sciences is that measurements are costly, so data sets often contain only tens or hundreds of observations; sometimes, limited computing power likewise restricts analyses to moderate finite samples. Recently, computer scientists in machine learning have shown renewed interest in rigorous, high-probability error bounds for learning procedures when the sample size is small; for example, the generalization bounds and excess risk bounds in deep learning. These settings have motivated modern statisticians and data scientists to shift their interest from asymptotic analysis to non-asymptotic analysis; see "Zhang, H., Chen, S. X. (2021). Concentration Inequalities for Statistical Inference. Communications in Mathematical Research, 37(1), 1–85" for a review. Non-asymptotic theory can handle the case where the sample size is small but the model dimension is very large.

The primary goal of this Special Issue is to collect recent, novel results on high-dimensional statistics, non-asymptotic inference, concentration inequalities, and small-sample learning, in which the theory is grounded in the non-asymptotic properties of the proposed estimators. Purely asymptotic results are outside the scope of this Special Issue. For any stochastic model, we also hope to conduct inference for the proposed estimator, quantified by a set of credible values; using concentration of measure, uncertainty quantification can be carried out by constructing confidence intervals that hold with high probability. This Special Issue also aims to publish new advances and applications in small-sample inference, exact confidence intervals, exact tests, and uncertainty quantification for statistical models.
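As a toy illustration of the non-asymptotic viewpoint described above (not taken from any paper in this issue), the following sketch builds a finite-sample confidence interval for a bounded mean directly from Hoeffding's inequality; its coverage guarantee holds at every sample size n, with no limiting argument. The data and confidence level are illustrative assumptions.

```python
import math
import random

def hoeffding_ci(xs, a, b, alpha=0.05):
    """Non-asymptotic confidence interval for the mean of i.i.d.
    observations bounded in [a, b]: by Hoeffding's inequality the
    interval covers the true mean with probability >= 1 - alpha
    for every finite sample size n."""
    n = len(xs)
    mean = sum(xs) / n
    # Hoeffding: P(|mean - mu| >= t) <= 2 * exp(-2 * n * t^2 / (b - a)^2)
    t = (b - a) * math.sqrt(math.log(2 / alpha) / (2 * n))
    return mean - t, mean + t

random.seed(0)
xs = [random.random() for _ in range(50)]  # Uniform(0, 1) draws, true mean 0.5
lo, hi = hoeffding_ci(xs, 0.0, 1.0)
print(lo, hi)
```

Note that the interval width shrinks at the root-n rate and depends only on the range [a, b] and alpha, not on any asymptotic approximation of the sampling distribution.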

Dr. Huiming Zhang
Prof. Dr. Ting Yan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • oracle inequality and high-probability bounds
  • finite sample theory
  • small sample learning
  • concentration inequalities
  • exact confidence interval and test
  • non-asymptotic inference
  • uncertainty quantification and conformal inference

Published Papers (8 papers)


Research

19 pages, 392 KiB  
Article
Optimal Non-Asymptotic Bounds for the Sparse β Model
by Xiaowei Yang, Lu Pan, Kun Cheng and Chao Liu
Mathematics 2023, 11(22), 4685; https://doi.org/10.3390/math11224685 - 17 Nov 2023
Viewed by 530
Abstract
This paper investigates the sparse β model with 𝓁1 penalty in the field of network data models, which is a hot topic in both statistical and social network research. We present a refined algorithm designed for parameter estimation in the proposed model. Its effectiveness is highlighted through its alignment with the proximal gradient descent method, stemming from the convexity of the loss function. We study the estimation consistency and establish an optimal bound for the proposed estimator. Empirical validations facilitated through meticulously designed simulation studies corroborate the efficacy of our methodology. These assessments highlight the prospective contributions of our methodology to the advanced field of network data analysis.
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)
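The paper's own estimation algorithm is not reproduced here, but the proximal gradient idea it aligns with can be sketched on a generic smooth loss plus 𝓁1 penalty. The quadratic toy loss, step size, and penalty level below are illustrative assumptions, not the β-model likelihood.

```python
def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1, applied coordinate-wise."""
    return [max(abs(x) - lam, 0.0) * (1 if x > 0 else -1) for x in v]

def ista(grad, x0, step, lam, iters=500):
    """Proximal gradient descent (ISTA): alternate a gradient step on
    the smooth part of the loss with soft-thresholding for the l1 part."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)], step * lam)
    return x

# Toy smooth loss 0.5 * sum((x - target)^2); the l1 penalty shrinks
# small coordinates of the minimizer all the way to zero.
target = [3.0, -0.2, 0.0, 1.5]
grad = lambda x: [xi - ti for xi, ti in zip(x, target)]
print(ista(grad, [0.0] * 4, step=1.0, lam=0.5))
```

The convexity of the loss is what makes this scheme converge to the penalized minimizer, which is the same structural fact the paper exploits.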

12 pages, 298 KiB  
Article
Sharper Concentration Inequalities for Median-of-Mean Processes
by Guangqiang Teng, Yanpeng Li, Boping Tian and Jie Li
Mathematics 2023, 11(17), 3730; https://doi.org/10.3390/math11173730 - 30 Aug 2023
Viewed by 787
Abstract
The Median-of-Mean (MoM) estimation is an efficient statistical method for handling data with contamination. In this paper, we propose a variance-dependent MoM estimation method using the tail probability of a binomial distribution. The bound of this method is better than the classical Hoeffding method under mild conditions. This method is then used to study the concentration of variance-dependent MoM empirical processes and sub-Gaussian intrinsic moment norm. Finally, we give the bound of the variance-dependent MoM estimator with distribution-free contaminated data.
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)
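A minimal sketch of the basic median-of-means estimator that this line of work builds on (the block count k and the toy contaminated data are illustrative assumptions):

```python
import statistics

def median_of_means(xs, k):
    """Median-of-means: split the sample into k equal-sized blocks,
    average within each block, and return the median of the block
    means. A few contaminated observations can corrupt at most a few
    blocks, so the median of the block means stays near the truth."""
    m = len(xs) // k
    block_means = [sum(xs[i * m:(i + 1) * m]) / m for i in range(k)]
    return statistics.median(block_means)

# One gross outlier wrecks the sample mean but not the MoM estimate.
data = [1.0] * 29 + [1000.0]
print(sum(data) / len(data))       # pulled far from 1 by the outlier
print(median_of_means(data, k=5))  # outlier is confined to one block
```

The variance-dependent bounds studied in the paper sharpen the guarantees for exactly this kind of estimator.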
16 pages, 315 KiB  
Article
On a Low-Rank Matrix Single-Index Model
by The Tien Mai
Mathematics 2023, 11(9), 2065; https://doi.org/10.3390/math11092065 - 26 Apr 2023
Cited by 1 | Viewed by 739
Abstract
In this paper, we conduct a theoretical examination of a low-rank matrix single-index model. This model has recently been introduced in the field of biostatistics, but its theoretical properties for jointly estimating the link function and the coefficient matrix have not yet been fully explored. In this paper, we make use of the PAC-Bayesian bounds technique to provide a thorough theoretical understanding of the joint estimation of the link function and the coefficient matrix. This allows us to gain a deeper insight into the properties of this model and its potential applications in different fields.
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)
14 pages, 295 KiB  
Article
Non-Asymptotic Bounds of AIPW Estimators for Means with Missingness at Random
by Fei Wang and Yuhao Deng
Mathematics 2023, 11(4), 818; https://doi.org/10.3390/math11040818 - 06 Feb 2023
Viewed by 1086
Abstract
The augmented inverse probability weighting is well known for its double robustness in missing data and causal inference. If either the propensity score model or the outcome regression model is correctly specified, the estimator is guaranteed to be consistent. Another important property of the augmented inverse probability weighting is that it can achieve first-order equivalence to the oracle estimator in which all nuisance parameters are known, even if the fitted models do not converge at the parametric root-n rate. We explore the non-asymptotic properties of the augmented inverse probability weighting estimator to infer the population mean with missingness at random. We also consider inferences of the mean outcomes on the observed group and on the unobserved group.
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)
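A plain-vanilla AIPW point estimate of a mean under missingness can be sketched as follows. The propensity scores and outcome predictions are taken as given inputs here (in practice they come from fitted models), and all numbers are illustrative.

```python
def aipw_mean(y, r, propensity, outcome_pred):
    """Augmented inverse probability weighting estimate of E[Y] when
    y[i] is observed only if r[i] == 1. It averages the outcome-model
    prediction m(x) plus an inverse-probability-weighted residual
    correction; it is consistent if EITHER the propensity model or the
    outcome model is correct (double robustness)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        total += outcome_pred[i]
        if r[i] == 1:
            total += (y[i] - outcome_pred[i]) / propensity[i]
    return total / n

y = [2.0, 4.0, 0.0, 6.0]   # y[2] is missing (r == 0); its value is unused
r = [1, 1, 0, 1]
e = [0.5, 0.5, 0.5, 0.75]  # assumed-known propensity scores
m = [2.0, 4.0, 3.0, 6.0]   # outcome-model predictions
print(aipw_mean(y, r, e, m))
```

With outcome_pred set to all zeros, the same function reduces to the pure inverse-probability-weighting estimator, which makes the "augmentation" term easy to see.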

23 pages, 586 KiB  
Article
Representation Theorem and Functional CLT for RKHS-Based Function-on-Function Regressions
by Hengzhen Huang, Guangni Mo, Haiou Li and Hong-Bin Fang
Mathematics 2022, 10(14), 2507; https://doi.org/10.3390/math10142507 - 19 Jul 2022
Viewed by 1144
Abstract
We investigate a nonparametric, varying-coefficient regression approach for modeling and estimating the regression effects between two functionally correlated datasets. Because modern biomedical technology can measure multiple patient features over a time interval, or intermittently at several discrete time points, to probe underlying biological mechanisms, statistical models that do not properly incorporate interventions and their dynamic responses may lead to biased estimates of the intervention effects. We propose a shared-parameter change-point function-on-function regression model to evaluate the pre- and post-intervention time trends, and we develop a likelihood-based method for estimating the intervention effects and other parameters. We also propose new methods for estimating and hypothesis testing of regression parameters for functional data via reproducing kernel Hilbert spaces. The estimators of the regression parameters are in closed form, without computation of the inverse of a large matrix, and hence are less computationally demanding and more broadly applicable. By establishing a representation theorem and a functional central limit theorem, we obtain the asymptotic properties of the proposed estimators and propose the corresponding hypothesis tests. The application and statistical properties of our method are demonstrated through an immunotherapy clinical trial of advanced myeloma and through simulation studies.
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)

29 pages, 551 KiB  
Article
Sharper Sub-Weibull Concentrations
by Huiming Zhang and Haoyu Wei
Mathematics 2022, 10(13), 2252; https://doi.org/10.3390/math10132252 - 27 Jun 2022
Cited by 9 | Viewed by 1584
Abstract
Constant-specified and exponential concentration inequalities play an essential role in the finite-sample theory of machine learning and high-dimensional statistics. We obtain sharper, constants-specified concentration inequalities for the sum of independent sub-Weibull random variables, which leads to a mixture of two tails: sub-Gaussian for small deviations and sub-Weibull for large deviations from the mean. These bounds are new and improve existing bounds with sharper constants. In addition, a new sub-Weibull parameter is proposed, which enables recovering the tight concentration inequality for a random variable (vector). For statistical applications, we give an ℓ2-error bound for estimated coefficients in negative binomial regressions when the heavy-tailed covariates are sub-Weibull distributed with sparse structures, which is a new result for negative binomial regressions. In applications to random matrices, we derive non-asymptotic versions of Bai–Yin's theorem for sub-Weibull entries with exponential tail bounds. Finally, by demonstrating a sub-Weibull confidence region for a log-truncated Z-estimator without the second-moment condition, we discuss and define the sub-Weibull-type robust estimator for independent observations {X_i}_{i=1}^{n} without exponential-moment conditions.
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)
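The "mixture of two tails" shape can be illustrated schematically: a bound that decays like a Gaussian for small deviations and like a stretched exponential (sub-Weibull) for large ones. The constants v, K, and theta below are illustrative placeholders, not the sharper constants derived in the paper.

```python
import math

def mixed_tail_bound(t, v=1.0, K=1.0, theta=2.0):
    """Schematic two-regime tail bound: sub-Gaussian decay
    exp(-t^2 / (2v)) for small t, sub-Weibull decay
    exp(-(t / K) ** (1 / theta)) for large t. The overall bound is the
    larger (i.e., weaker) of the two, hence exp of minus the minimum."""
    return math.exp(-min(t ** 2 / (2 * v), (t / K) ** (1 / theta)))

# Small deviations fall in the Gaussian regime, large ones in the
# Weibull regime; the crossover depends on v, K, and theta.
for t in (0.5, 1.0, 4.0, 16.0):
    print(t, mixed_tail_bound(t))
```

For theta > 1/2 the Weibull exponent grows more slowly than t^2, so the sub-Weibull term always takes over for t large enough, which is why heavy-tailed sums need both regimes.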

15 pages, 409 KiB  
Article
Group Logistic Regression Models with ℓ_{p,q} Regularization
by Yanfang Zhang, Chuanhua Wei and Xiaolin Liu
Mathematics 2022, 10(13), 2227; https://doi.org/10.3390/math10132227 - 25 Jun 2022
Cited by 6 | Viewed by 1478
Abstract
In this paper, we propose a logistic regression model with ℓ_{p,q} regularization that can give a group-sparse solution. The model can be applied to variable-selection problems with sparse group structures. In the context of big data, the solutions to practical problems are often group-sparse, so it is necessary to study this kind of model. We study the model from three perspectives: theoretical, algorithmic, and numerical. From the theoretical perspective, by introducing the notion of the group restricted eigenvalue condition, we give the oracle inequality, an important property for variable-selection problems. The global recovery bound is also established for the logistic regression model with ℓ_{p,q} regularization. From the algorithmic perspective, we apply the well-known alternating direction method of multipliers (ADMM) algorithm to solve the model; the subproblems of the ADMM algorithm are solved effectively. From the numerical perspective, we perform experiments on simulated data and real data from factor stock selection, using the ADMM algorithm presented in the paper. We find that the model is effective in terms of variable selection and prediction.
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)
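One reason ADMM subproblems for group penalties are cheap is that the proximal operator of the ℓ_{2,1} norm (the p = 2, q = 1 member of the ℓ_{p,q} family) has a closed form: group-wise soft-thresholding. A sketch under illustrative groups and penalty level:

```python
import math

def group_soft_threshold(x, groups, lam):
    """Proximal operator of lam * sum_g ||x_g||_2 (the l_{2,1} norm):
    shrink each group's l2 norm by lam, zeroing out whole groups whose
    norm falls below lam. This closed form is what keeps the ADMM
    subproblem for a group penalty inexpensive."""
    out = list(x)
    for g in groups:
        norm = math.sqrt(sum(x[i] ** 2 for i in g))
        scale = max(1 - lam / norm, 0.0) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * x[i]
    return out

x = [3.0, 4.0, 0.1, -0.2]
# First group has l2 norm 5 and is shrunk; second has norm ~0.22 < lam
# and is zeroed out entirely, giving group sparsity.
print(group_soft_threshold(x, [[0, 1], [2, 3]], lam=1.0))
```

General q in (0, 1) loses this simple closed form, which is part of why the paper's subproblem solvers deserve their own treatment.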

25 pages, 414 KiB  
Article
Heterogeneous Overdispersed Count Data Regressions via Double-Penalized Estimations
by Shaomin Li, Haoyu Wei and Xiaoyu Lei
Mathematics 2022, 10(10), 1700; https://doi.org/10.3390/math10101700 - 16 May 2022
Cited by 2 | Viewed by 1721
Abstract
Recently, high-dimensional negative binomial regression (NBR) for count data has been widely used in many scientific fields. However, most studies assume the dispersion parameter to be a constant, which may not hold in practice. This paper studies variable selection and dispersion estimation for heterogeneous NBR models, which model the dispersion parameter as a function. Specifically, we propose a double regression and apply a double ℓ1-penalty to both regressions. Under restricted eigenvalue conditions, we prove oracle inequalities for the lasso estimators of the two partial regression coefficients for the first time, using concentration inequalities for empirical processes. Furthermore, the consistency and convergence rates derived from the oracle inequalities provide theoretical guarantees for further statistical inference. Finally, both simulations and a real data analysis demonstrate that the new methods are effective.
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)