Article

Bias-Corrected Inference of High-Dimensional Generalized Linear Models

School of Mathematics and Statistics, Qingdao University, Qingdao 266071, China
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(4), 932; https://doi.org/10.3390/math11040932
Submission received: 27 January 2023 / Revised: 8 February 2023 / Accepted: 10 February 2023 / Published: 12 February 2023

Abstract

In this paper, we propose a weighted link-specific (WLS) approach that establishes a unified statistical inference framework for high-dimensional Poisson and Gamma regression. We regress the parameter deviations on the initial estimation errors and use the resulting regression coefficients as correction weights to reduce the total mean squared error (MSE). We also establish the asymptotic normality of the corrected estimates under both sparse and non-sparse conditions and construct the associated confidence intervals (CIs) to verify the robustness of the new method. Finally, numerical simulations and an empirical analysis show that the WLS method is broadly applicable and effective.

1. Introduction

With the rapid development of modern computing, data collection and processing capabilities are also increasing rapidly. In such a high-dimensional environment, the questions of interest change: for example, what is the relationship between productivity and age in the context of big data [1], how does air pollution affect the mortality rate of urban residents [1], and how does the raindrop spectrum affect mobile dual-polarization radar [2]? Under such conditions, most traditional inference procedures, such as maximum likelihood, are no longer valid and yield markedly different results in sparse and non-sparse regimes. Our attention is thus turned to high-dimensional generalized linear models, such as Poisson regression and Gamma regression, in the expectation that they can address these problems.
For count data, the Poisson regression model is one of the most commonly used tools: it is designed specifically for dependent variables that are counts following a Poisson distribution. In contrast, among generalized linear models, the Gamma regression model is typically used to model continuous, non-negative response data. This paper establishes a unified inference framework for Poisson and Gamma regression through their different link functions, aiming to provide a suitable model for both continuous and discrete data.
Inference for high-dimensional generalized linear models (GLMs) has been studied by many scholars [3,4,5,6]. Deshpande [7] proposed a debiasing method for constructing CIs. Cai, Athey, and Zhu [8,9,10] proposed more general linear-comparison methods under special loading vectors. For high-dimensional logistic regression, Sur et al. [11] studied the likelihood ratio test in the regime $p/n \to \kappa$ with $\kappa < 1/2$, where $p$ is the parameter dimension and $n$ is the sample size. Ma, Cai, and Li [12] put forward testing procedures for the global null hypothesis and for large-scale simultaneous hypotheses when $p \gg n$. Recently, Shi et al. [13] imposed certain strict constraints, such as bounded individual probability conditions under the logistic link function, to perform hypothesis testing. In high-dimensional sparse settings, the Lasso [14] usually incurs a large bias, and existing bias-corrected estimation has mainly been developed for linear regression [15,16] and logistic regression models. Ma et al. [12] proposed a weighted low-dimensional projection method that achieves debiasing and constructs CIs by designing specific weights and a dimension-reduction step. Shi et al. [13] proposed reducing a high-dimensional regression to a moderate dimension by recursively performing model selection for debiasing. Cai and Guo [17] developed a weighted link-function method that likewise constructs specific debiasing weights and applied it to logistic regression. Generally speaking, current research focuses mainly on logistic regression [18,19,20], leaving other GLMs, such as Poisson and Gamma regression, largely unaddressed.
Motivated by [12,13,17], where different link functions lead to different weights, we propose in this paper a weighted link-specific (WLS) method to correct the bias of penalized likelihood estimates for the Poisson and Gamma models. Write $\beta$ for the parameter of a one-parameter exponential-family model and $\hat\beta$ for the corresponding penalized likelihood estimate. In high-dimensional spaces, the $\hat\beta$ obtained by maximum likelihood carries a large estimation error relative to the true value, so we propose a new bias-correction method. Specifically, to optimize the overall mean squared error, we regress the parameter deviations on the initial estimation errors and take the resulting regression coefficients as correction weights. Finally, we establish the asymptotic normality of the WLS estimates and construct CIs; this weight construction is of independent interest for other inference problems under high-dimensional GLMs.
Overall, we find that the WLS method proposed in this paper is superior to other methods in terms of interval coverage and power, and the method in this paper performs robustly and has good generalizability in both sparse and non-sparse scenarios.
The rest of this paper is organized as follows. In Section 2, with the design distribution unknown, we perform bias correction for individual regression coefficients under high-dimensional GLMs. In Section 3, we give the theoretical properties of the proposed confidence intervals and prove asymptotic normality. In Section 4, the superiority of our method is demonstrated by comparing its numerical performance with that of other methods in simulations. In Section 5, through the analysis of a real data set, we plot the false discovery rate (FDR) and power curves of the competing methods to illustrate the strong performance of our method. The final section discusses the overall advantages of our approach as well as some areas for improvement.
Notation and terminology. Throughout, for a vector $a = (a_1, \dots, a_n)^T \in \mathbb{R}^n$, we define the $\ell_p$ norm $\|a\|_p = \left( \sum_{i=1}^n |a_i|^p \right)^{1/p}$, the $\ell_0$ norm $\|a\|_0 = \sum_{i=1}^n 1\{a_i \neq 0\}$, and the $\ell_\infty$ norm $\|a\|_\infty = \max_{1 \le j \le n} |a_j|$, and we let $a_{-j} \in \mathbb{R}^{n-1}$ stand for the subvector of $a$ without the $j$-th component. For a matrix $A \in \mathbb{R}^{p \times q}$, $\lambda_i(A)$ stands for the $i$-th largest singular value of $A$, with $\lambda_{\max}(A) = \lambda_1(A)$ and $\lambda_{\min}(A) = \lambda_{\min\{p,q\}}(A)$. For a smooth function $f(x)$ defined on $\mathbb{R}$, we write $f'(x) = \mathrm{d}f(x)/\mathrm{d}x$ and $f''(x) = \mathrm{d}^2 f(x)/\mathrm{d}x^2$. For any positive integer $n$, we denote the set $\{1, 2, \dots, n\}$ by $[1:n]$. For any $a, b \in \mathbb{R}$, we write $I_x(a,b) = B(x; a, b)/B(a, b)$ for the regularized incomplete beta function, where $B(x; a, b) = \int_0^x t^{a-1}(1-t)^{b-1}\,\mathrm{d}t$ is the incomplete beta function. We write $\phi(x)$ and $\Phi(x)$ for the density function and the cumulative distribution function (CDF) of the standard Gaussian random variable, respectively, and $\to_d$ for convergence in distribution. For positive sequences $a_n$ and $b_n$, we write $a_n = o(b_n)$, $a_n \ll b_n$, or $b_n \gg a_n$ if $\lim_{n \to \infty} a_n/b_n = 0$; we write $a_n = O(b_n)$, $a_n \lesssim b_n$, or $b_n \gtrsim a_n$ if there exists a constant $C$ such that $a_n \le C b_n$ for all $n$; and we write $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $a_n \gtrsim b_n$.

2. Weighted Link-Specific Method of Generalized Linear Model

In a GLM, the standard form of the exponential family is
$$f(y; \theta, \phi) = \exp\left\{ \frac{\theta y - b(\theta)}{a(\phi)} + c(y, \phi) \right\},$$
where $a(\phi)$ is a known dispersion function, and $\theta$ and $\phi$ denote the natural and dispersion parameters, respectively; $\theta$ depends only on the mean $u$, i.e., $\theta = f(u)$. In addition, for simplicity of the subsequent sample splitting, we collect $2n$ samples $D = \{(X_i, y_i)\}_{i=1}^{2n}$. In this paper, we mainly focus on the Poisson and Gamma distributions and aim to establish a unified statistical inference framework for them in high-dimensional spaces.

2.1. Poisson Regression

For the Poisson distribution, we assume that the observations are generated independently from model (1),
$$y_i \mid X_i \sim \mathrm{Poisson}\left( f(X_i^T \beta) \right), \quad X_i \sim P_X, \tag{1}$$
where $f: \mathbb{R} \to (0,1)$ is a link function given in advance, $\beta \in \mathbb{R}^p$ is a highly sparse regression vector whose sparsity level is denoted by $k$, and $P_X$ is some probability distribution. In this subsection, we take the link function $f(x) = \exp(x)/(1+\exp(x))$ and define the negative log-likelihood $l_f(\beta)$ as
$$l_f(\beta) = -\frac{1}{n} \sum_{i=1}^n \left\{ y_i \ln f(X_i^T \beta) - f(X_i^T \beta) \right\},$$
and the $\ell_1$-penalized negative log-likelihood estimator under the GLM is
$$\hat\beta = \arg\min_\beta \left\{ l_f(\beta) + \lambda \|\beta\|_1 \right\}, \tag{2}$$
where $\lambda \asymp \sqrt{\log p / n}$. Although $\hat\beta$ achieves the optimal convergence rate [21,22], there remains a large, reducible gap between the penalized likelihood estimator and the true value. Specifically, to optimize the overall MSE, we regress the parameter deviations on the initial estimation errors and take the resulting regression coefficients as the correction weights.
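As an illustration of the penalized estimator in (2), the following is a minimal proximal-gradient (ISTA) sketch of $\ell_1$-penalized Poisson regression. It is a hedged simplification: for numerical convenience it uses the canonical log link rather than the link $f$ above, and the function names, step size, and iteration count are our illustrative choices, not part of the paper.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_poisson(X, y, lam, step=0.05, n_iter=3000):
    """ISTA for the l1-penalized Poisson negative log-likelihood
    with log link: E[y_i | X_i] = exp(X_i^T beta)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)          # fitted Poisson means
        grad = X.T @ (mu - y) / n      # gradient of the negative log-likelihood
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

With $\lambda$ of order $\sqrt{\log p / n}$, the nonzero coordinates are recovered up to shrinkage while most zero coordinates are set exactly to zero, which is the large but structured bias the correction step below removes.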
Since the correction method involves a penalized likelihood estimation step and a weighting step, we split the $2n$ samples into two independent data sets $D_1 = \{(X_i, y_i)\}_{i=1}^{n}$ and $D_2 = \{(X_i, y_i)\}_{i=n+1}^{2n}$, perform penalized likelihood estimation on $D_1$, and carry out the weighting step on $D_2$. Sample splitting facilitates the theoretical analysis [17], and the resulting numerical performance is comparable to that without splitting, so this is arguably not a practical limitation.
For a given $j \in [1:p]$, we consider the general form of the bias-corrected estimate [17,23]:
$$\tilde\beta_j = \hat\beta_j + u^T \left\{ \frac{1}{n} \sum_{i=1}^n W_i \cdot X_i \left( y_i - f(X_i^T \hat\beta) \right) \right\}, \tag{3}$$
where we write $\hat R = \frac{1}{n} \sum_{i=1}^n W_i \cdot X_i \left( y_i - f(X_i^T \hat\beta) \right)$, and $\hat\beta_j$ is the penalized likelihood estimate defined in (2). Here $W_i \in \mathbb{R}$, $1 \le i \le n$, and $u \in \mathbb{R}^p$ denote the link-specific weights and the projection direction to be constructed, respectively, such that $u^T \hat R$ is an accurate estimate of the deviation $\beta_j - \hat\beta_j$. We then decompose the error of Formula (3), which guides the construction of the weight vector and projection direction. Model (1) can be re-expressed as
$$y_i = f(X_i^T \beta) + \varepsilon_i, \quad E(\varepsilon_i \mid X_i) = 0, \tag{4}$$
and a Taylor expansion around $X_i^T \hat\beta$ gives
$$\frac{1}{n} \sum_{i=1}^n W_i X_i \left( y_i - f(X_i^T \hat\beta) \right) = \frac{1}{n} \sum_{i=1}^n W_i X_i \varepsilon_i + \frac{1}{n} \sum_{i=1}^n W_i f'(X_i^T \hat\beta) X_i X_i^T (\beta - \hat\beta) + \frac{1}{n} \sum_{i=1}^n W_i X_i \Delta_i, \tag{5}$$
where $\Delta_i = \frac{1}{2} f''\left( X_i^T \hat\beta + t\, X_i^T (\beta - \hat\beta) \right) \cdot \left( X_i^T (\hat\beta - \beta) \right)^2$ for some $t \in (0,1)$. Combining (3) and (5), we obtain the error decomposition of $\tilde\beta_j - \beta_j$:
$$\begin{aligned} \tilde\beta_j - \beta_j &= \hat\beta_j - \beta_j + u^T \left\{ \frac{1}{n} \sum_{i=1}^n W_i \cdot X_i \left( y_i - f(X_i^T \hat\beta) \right) \right\} \\ &= \hat\beta_j - \beta_j + u^T \frac{1}{n} \sum_{i=1}^n W_i X_i \varepsilon_i + u^T \frac{1}{n} \sum_{i=1}^n W_i f'(X_i^T \hat\beta) X_i X_i^T (\beta - \hat\beta) + u^T \frac{1}{n} \sum_{i=1}^n W_i X_i \Delta_i \\ &= u^T \frac{1}{n} \sum_{i=1}^n W_i X_i \varepsilon_i + \left( u^T \frac{1}{n} \sum_{i=1}^n W_i f'(X_i^T \hat\beta) X_i X_i^T - e_j^T \right)(\beta - \hat\beta) + u^T \frac{1}{n} \sum_{i=1}^n W_i X_i \Delta_i, \end{aligned} \tag{6}$$
where $\{e_j\}_{j=1}^p$ is the standard basis of the Euclidean space $\mathbb{R}^p$. In Formula (6), the bias splits into the sum of three errors: the first is the random error due to the model noise $\varepsilon_i$, the second is the residual bias due to the penalized likelihood estimator $\hat\beta$, and the third is the approximation error due to the nonlinearity of the link function $f$.
We impose two optimization criteria to determine the weight vector and the projection direction. First, the random error in (6) must satisfy asymptotic normality, and its standard error must be as small as possible. Second, the residual bias and the approximation error must be negligible relative to the random error.
To satisfy these two criteria, we first construct the weight vector and then the projection direction. The variance of the random error in Formula (6) is
$$\mathrm{Var}\left( u^T \frac{1}{n} \sum_{i=1}^n W_i \cdot X_i \varepsilon_i \,\middle|\, \{X_i\}_{i=1}^n \right) = u^T \left\{ \frac{1}{n^2} \sum_{i=1}^n W_i^2 \cdot \mathrm{Var}(\varepsilon_i \mid X_i)\, X_i X_i^T \right\} u, \tag{7}$$
and by Hölder's inequality, the residual bias in (6) is bounded above by
$$\left| \left( u^T \frac{1}{n} \sum_{i=1}^n W_i f'(X_i^T \hat\beta) X_i X_i^T - e_j^T \right)(\beta - \hat\beta) \right| \le \left\| \frac{1}{n} \sum_{i=1}^n W_i f'(X_i^T \hat\beta) X_i X_i^T u - e_j \right\|_\infty \|\beta - \hat\beta\|_1. \tag{8}$$
According to [8,17], to ensure that the first (random) error in Formula (6) dominates and the remaining two terms are negligible, it suffices to match the scale of the right-hand side of (7) with that of (8),
$$W_i^2 \cdot \mathrm{Var}(\varepsilon_i \mid X_i)\, X_i X_i^T u \asymp W_i f'(X_i^T \hat\beta)\, X_i X_i^T u, \tag{9}$$
that is,
$$W_i^2 \cdot \mathrm{Var}(\varepsilon_i \mid X_i) \asymp W_i f'(X_i^T \hat\beta),$$
which yields the weight construction $W_i = w(X_i^T \hat\beta)$ with
$$w(z) = \frac{f'(z)}{\mathrm{Var}(\varepsilon_i \mid X_i)}. \tag{10}$$
The weight vector constructed from Formula (10) ensures that the random error dominates in Formula (6). Next, we write $\hat\Sigma = \frac{1}{n} \sum_{i=1}^n w(X_i^T \hat\beta) \cdot f'(X_i^T \hat\beta)\, X_i X_i^T$ and estimate the projection direction by the optimization
$$\hat u = \arg\min_{u \in \mathbb{R}^p} u^T \hat\Sigma u$$
subject to
$$\left\| \hat\Sigma u - e_j \right\|_\infty \le \lambda_n, \quad \left| e_j^T \hat\Sigma u - 1 \right| \le \lambda_n, \quad \|X u\|_\infty \le \tau_n, \tag{11}$$
where $\lambda_n \asymp \sqrt{\log p / n}$ and $\tau_n \asymp \sqrt{\log n}$. The constraints in Formula (11) control the second and third error terms in Formula (6), so this projection ensures that the deviation $\tilde\beta_j - \beta_j$ remains small [23,24]. The bias-corrected estimator is then
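In low dimensions, where $\hat\Sigma$ is invertible, the program (11) is solved with zero slack in its first two constraints by the $j$-th column of $\hat\Sigma^{-1}$. The following numpy sanity check makes this concrete; it is an illustrative simplification with names of our choosing, not the constrained solver required in the high-dimensional case.

```python
import numpy as np

def projection_direction_lowdim(Sigma_hat, j, lam_n):
    """Low-dimensional surrogate for (11): solve Sigma_hat u = e_j exactly,
    so the constraint ||Sigma_hat u - e_j||_inf <= lam_n holds trivially."""
    p = Sigma_hat.shape[0]
    e_j = np.zeros(p)
    e_j[j] = 1.0
    u = np.linalg.solve(Sigma_hat, e_j)
    # feasibility check mirroring the first constraint in (11)
    assert np.max(np.abs(Sigma_hat @ u - e_j)) <= lam_n + 1e-9
    return u
```

In high dimensions $\hat\Sigma$ is singular, and (11) must instead be solved as a constrained quadratic program, with the slack $\lambda_n$ absorbing the estimation error.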
$$\tilde\beta_j = \hat\beta_j + \hat u^T \frac{1}{n} \sum_{i=1}^n w(X_i^T \hat\beta) \cdot \left( y_i - f(X_i^T \hat\beta) \right) X_i. \tag{12}$$
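The correction step (12) itself is only a few lines of numpy. The helper below is a hedged sketch: the link `f`, its derivative `fprime`, and the conditional-variance function `var_eps` are supplied by the user, and these argument names are ours, not the paper's.

```python
import numpy as np

def wls_debias(j, beta_hat, u_hat, X, y, f, fprime, var_eps):
    """Bias-corrected estimate of coordinate j, as in (12):
    beta_tilde_j = beta_hat_j
        + u_hat^T (1/n) sum_i w(X_i^T beta_hat) (y_i - f(X_i^T beta_hat)) X_i,
    with link-specific weights w(z) = f'(z) / Var(eps_i | X_i)."""
    z = X @ beta_hat                 # linear predictors X_i^T beta_hat
    w = fprime(z) / var_eps(z)       # weights from (10)
    resid = y - f(z)                 # raw residuals
    return beta_hat[j] + u_hat @ (X.T @ (w * resid)) / len(y)
```

With the identity link and unit weights, the update reduces to an ordinary least-squares-style correction, which makes the formula easy to unit-test.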

2.2. Gamma Regression

For the Gamma distribution, we assume that the observations are generated independently from model (13),
$$y_i \mid X_i \sim \mathrm{Gamma}(\alpha_0, \beta), \quad X_i \sim P_X, \tag{13}$$
where $\alpha_0$ is a known shape parameter. The corrected estimation under Gamma regression proceeds essentially as in the Poisson case. First, we determine the link function; assuming the samples come from (13), the density is
$$f(y; \beta) = \frac{y^{\alpha_0 - 1} \exp(-y/\beta)}{\beta^{\alpha_0} \Gamma(\alpha_0)}. \tag{14}$$
It is easy to see that the expectation is $u = \alpha_0 \cdot \beta$; writing $\phi = 1/\alpha_0$, we have $\alpha_0 = 1/\phi$ and $\beta = u \cdot \phi$. Then, (14) can be rewritten in the standard exponential-family form as follows,
$$f(y; \beta) = f(y; u, \phi) = \frac{y^{\alpha_0 - 1} \exp(-y/\beta)}{\beta^{\alpha_0} \Gamma(\alpha_0)} = \exp\left\{ -\frac{y}{\beta} + (\alpha_0 - 1) \ln y - \alpha_0 \ln \beta - \ln \Gamma(\alpha_0) \right\} = \exp\left\{ \frac{-y/u - \ln u}{\phi} + \left( \frac{1}{\phi} - 1 \right) \ln y - \frac{\ln \phi}{\phi} - \ln \Gamma\!\left( \frac{1}{\phi} \right) \right\}.$$
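The algebra above can be verified numerically: the direct Gamma log-density and the exponential-family form agree for any $y$. The stdlib check below confirms this (the function names are ours).

```python
import math

def log_gamma_pdf(y, a, b):
    """Direct Gamma(shape a, scale b) log-density."""
    return (a - 1) * math.log(y) - y / b - a * math.log(b) - math.lgamma(a)

def log_gamma_expfam(y, u, phi):
    """Exponential-family form with mean u = a*b and dispersion phi = 1/a."""
    return ((-y / u - math.log(u)) / phi
            + (1 / phi - 1) * math.log(y)
            - math.log(phi) / phi - math.lgamma(1 / phi))

# the two parameterizations coincide: u = a*b, phi = 1/a
a, b, y = 3.0, 2.0, 1.7
assert abs(log_gamma_pdf(y, a, b) - log_gamma_expfam(y, a * b, 1 / a)) < 1e-12
```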
The canonical link is $f^{-1}(u) = \theta = 1/u$, but to satisfy the regularity assumptions of this paper, we construct the link function as $f(x) = \frac{1}{1+x}$. Next, the negative log-likelihood is
$$l_f(\beta) = \frac{1}{n} \sum_{i=1}^n \left\{ y_i f(X_i^T \beta) - \ln f(X_i^T \beta) \right\}.$$
Finally, following the same steps as Equations (2) through (12), the bias-corrected estimate takes the same form as (12).

3. Theoretical Properties

3.1. Asymptotic Normality

In this subsection, we prove the asymptotic normality of the corrected estimates and construct the associated CIs. First, we state a set of mild regularity conditions [12,13,25] on the link function $f: \mathbb{R} \to (0,1)$:
  • The link function $f$ is monotone and twice differentiable on $\mathbb{R}$, and concave on $\mathbb{R}_+$;
  • There exist positive constants $C_1, C_2$ such that for all $x \ge 0$, $f(x) \le \Phi(C_1 x)$, where $\Phi(x)$ is the standard Gaussian CDF, and $\max\left\{ f'(x)\, x (1 - f(x)),\ x^2 f''(x) \right\} < C_2$;
  • There exist constants $C_1, C_2$ such that $\sup_{|w| < C_2} f'(x+w)/f'(x) \le C_1$ for all $x \in \mathbb{R}$;
  • For $l_f(\beta)$ defined by Formula (2), there is a constant $C$ such that the Hessian can be expressed as $\nabla^2 l_f(\beta) = \frac{1}{n} \sum_{i=1}^n h(\beta; y_i, X_i)\, X_i X_i^T$ with $h(\beta; y_i, X_i) > 0$ and
$$\max_i \left| \log h(\beta + b; y_i, X_i) - \log h(\beta; y_i, X_i) \right| \le C \left( |X_i^T \beta|^2 + |X_i^T b|^2 + |X_i^T b| \right).$$
    None of these four conditions is very strict, and a large class of link functions satisfies them; conditions 1–3 are relatively easy to verify, while condition 4 is from Huang and Zhang [26]. Second, for the random design variables and their distribution, we assume that
  • $\{X_i\}_{1 \le i \le 2n}$ are independent and identically distributed sub-Gaussian random vectors; that is, there is a constant $c > 0$ such that $E \exp(v^T X) \le \exp(\|v\|_2^2 c^2 / 2)$ for all $v \in \mathbb{R}^p$.
If $X_{i1} = 1$ for $1 \le i \le 2n$, $\Sigma = E(X_i X_i^T) \in \mathbb{R}^{p \times p}$, $\beta_1$ is the intercept, and $k$ is the sparsity level, then there exists a parameter space associated with $k$ as follows:
$$\Theta(k) = \left\{ \theta = (\beta, \Sigma) : \|\beta\|_0 \le k,\ \|\beta\|_2 \le C,\ M^{-1} \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le M \right\},$$
where $M > 1$ and $C > 0$.
In a high-dimensional space, the size of sparsity k can easily affect the theoretical distribution of the bias corrector  β ˜ , so we build a parameter space  Θ ( k )  associated with k. Under this parameter space, we present Theorem 1, which establishes the asymptotic properties of a bias-corrected estimator  β ˜ .
Theorem 1. 
Assume that conditions 1–5 hold and $(\beta, \Sigma) \in \Theta(k)$. Then, for any $j \in [1:p]$:
(1) 
if $k \ll \frac{n}{\log n \log p}$, then $\tilde\beta_j - \beta_j = A_n + o_p(1)$ and $\sqrt{n}\, A_n\, v_j^{-1/2} \to_d N(0,1)$;
(2) 
if $k \ll \frac{\sqrt{n}}{\log p \log n}$, then $\sqrt{n}\, (\tilde\beta_j - \beta_j)\, v_j^{-1/2} \to_d N(0,1)$.
Proof of Theorem 1. 
By the definition of $\tilde\beta_j$, we have
$$\tilde\beta_j - \beta_j = A_n + B_n, \tag{15}$$
where we denote
$$A_n = \frac{1}{n} \sum_{i=1}^n w(X_i^T \hat\beta)\, \hat u^T X_i\, \varepsilon_i,$$
and
$$B_n = \left( \hat u^T \frac{1}{n} \sum_{i=1}^n w(X_i^T \hat\beta) f'(X_i^T \hat\beta) X_i X_i^T - e_j^T \right)(\beta - \hat\beta) + \frac{1}{n} \sum_{i=1}^n \Delta_i\, w(X_i^T \hat\beta)\, \hat u^T X_i.$$
In what follows, on the one hand, we show that, under the conditions of Theorem 1, with probability at least $1 - p^{-c} - n^{-c}$,
$$\left\| \hat u^T \frac{1}{n} \sum_{i=1}^n w(X_i^T \hat\beta) f'(X_i^T \hat\beta) X_i X_i^T - e_j^T \right\|_\infty \lesssim \lambda_n, \quad \left| \frac{1}{n} \sum_{i=1}^n \Delta_i\, w(X_i^T \hat\beta)\, \hat u^T X_i \right| \lesssim \tau_n \frac{k \log p}{n}, \tag{16}$$
and, furthermore, with probability at least $1 - p^{-c} - n^{-c}$,
$$\left| \left( \hat u^T \frac{1}{n} \sum_{i=1}^n w(X_i^T \hat\beta) f'(X_i^T \hat\beta) X_i X_i^T - e_j^T \right)(\hat\beta - \beta) \right| \lesssim \lambda_n \|\hat\beta - \beta\|_1 \lesssim \frac{k \log p}{n},$$
which, together with (16), yields the upper bound on $B_n$. On the other hand, the asymptotic normality of the stochastic term $A_n$ in (15) follows from the statement below.
Conditional on $D_1$ and the designs $\{X_i\}_{i=1}^n$, we have
$$\sqrt{n}\, v_j^{-1/2}\, \frac{1}{n} \sum_{i=1}^n w(X_i^T \hat\beta)\, \hat u^T X_i\, \varepsilon_i \to_d N(0,1).$$
The major contribution of Theorem 1 is that it removes many strict conditions in high-dimensional GLM inference, such as the bounded individual probability condition and the bounded design condition, thereby greatly relaxing the prerequisites and covering a wider range of situations. □

3.2. Confidence Interval

Under the mild regularity conditions of Section 3.1, $\tilde\beta_j$ in (12) has the asymptotic variance
$$v_j = \hat u^T \left\{ \frac{1}{n} \sum_{i=1}^n w^2(X_i^T \hat\beta)\, \mathrm{Var}(\varepsilon_i \mid X_i)\, X_i X_i^T \right\} \hat u.$$
At this point, the variance of the error $\varepsilon_i$ can be estimated as
$$\widehat{\mathrm{Var}}(\varepsilon_i \mid X_i) = f(X_i^T \hat\beta) \left( 1 - f(X_i^T \hat\beta) \right),$$
and the variance of $\tilde\beta_j$ is then estimated as
$$\hat v_j = \hat u^T \left\{ \frac{1}{n} \sum_{i=1}^n \frac{\left( f'(X_i^T \hat\beta) \right)^2}{\widehat{\mathrm{Var}}(\varepsilon_i \mid X_i)}\, X_i X_i^T \right\} \hat u. \tag{17}$$
Based on this, we construct the CI for the regression coefficient $\beta_j$ at confidence level $1 - \alpha$ as
$$\mathrm{CI}(\beta_j, D) = \left( \tilde\beta_j - \tilde\rho_j,\ \tilde\beta_j + \tilde\rho_j \right), \tag{18}$$
where $\tilde\rho_j = z_{\alpha/2}\, \hat v_j^{1/2} / \sqrt{n}$ and $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$. Note that the confidence interval in Formula (18) requires the ultra-sparsity condition $k \ll \sqrt{n} / (\log p \log n)$: in this ultra-sparse regime (18) is valid, whereas in the non-ultra-sparse regime it will fail to hold with high probability.
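Given the debiased estimate, the weights, and the variance estimates, the interval (18) is immediate to compute. A short stdlib/numpy sketch follows; the argument names are illustrative choices of ours.

```python
import numpy as np
from statistics import NormalDist

def wls_confint(beta_tilde_j, u_hat, X, w, var_hat, alpha=0.05):
    """(1 - alpha) CI as in (18): beta_tilde_j +/- z_{alpha/2} sqrt(v_hat_j / n),
    where v_hat_j = (1/n) sum_i w_i^2 Var_hat(eps_i|X_i) (X_i^T u_hat)^2."""
    n = X.shape[0]
    proj = X @ u_hat                           # X_i^T u_hat
    v_hat = np.mean(w ** 2 * var_hat * proj ** 2)
    rho = NormalDist().inv_cdf(1 - alpha / 2) * np.sqrt(v_hat / n)
    return beta_tilde_j - rho, beta_tilde_j + rho
```

The interval is symmetric about the debiased estimate, with half-width shrinking at the usual $n^{-1/2}$ rate.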

4. Simulations

In this section, we evaluate the numerical performance of the proposed method and compare it with other inference methods under high-dimensional GLMs. For the evaluation of CIs, we mainly consider the coverage of regression coefficients and the length of CIs.

CIs for High-Dimensional Poisson and Gamma Regression

We build CIs for Poisson regression and Gamma regression. We set $n = 100$, let $p$ vary between 400 and 1300, and let the sparsity level $k$ vary between 25 and 35. For the true regression coefficient, given a support set $S$ with $|S| = k$, we set $\beta_j = \psi\, I\{j \in S\}$, $j = 1, \dots, p$, with the nonzero coefficients split evenly between $\psi$ and $-\psi$. The design covariates $X_i$ are generated from a multivariate Gaussian distribution whose covariance matrix $\Sigma = \Sigma_M$ is block-diagonal, consisting of 10 identical unit-diagonal Toeplitz blocks with off-diagonal entries decaying from 0.6 to 0.
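The block-Toeplitz design can be generated as follows. The text does not specify the decay profile of the off-diagonal entries, so a linear taper from 0.6 down to 0 is assumed here, and the function name and seed are ours.

```python
import numpy as np

def block_toeplitz_design(n, p, rho=0.6, blocks=10, seed=0):
    """Draw X ~ N(0, Sigma), where Sigma is block-diagonal with `blocks`
    identical unit-diagonal Toeplitz blocks whose off-diagonal entries
    decay linearly (assumed) from rho to 0."""
    q = p // blocks
    lags = np.abs(np.arange(q)[:, None] - np.arange(q)[None, :])
    block = rho * np.maximum(1.0 - lags / (q - 1), 0.0)
    np.fill_diagonal(block, 1.0)
    Sigma = np.kron(np.eye(blocks), block)   # block-diagonal covariance
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T
    return X, Sigma
```

The linear (Bartlett-type) taper keeps each block positive definite, so the Cholesky factorization always succeeds.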
In this subsection, we construct CIs for the nonzero regression coefficient $\beta_2 = \psi$ and the zero regression coefficient $\beta_{100} = 0$, with the desired confidence level set to 95%. We compare three methods to evaluate their numerical performance: (i) the WLS method proposed in this paper; (ii) the weighted low-dimensional projection (WLP) method proposed by [12]; and (iii) the original Lasso estimate as a baseline. The results for zero and nonzero coefficients under Poisson regression are summarized in Table 1 and Table 2, and those under Gamma regression in Table 3 and Table 4, where each entry represents the average over 100 simulation rounds.
In the non-sparse setting, the results in Table 1 and Table 3 show that, in both Poisson and Gamma regression, the coverage obtained by the WLS bias correction is much better than that of the WLP and Lasso methods. Although the WLS intervals are somewhat longer, sacrificing some precision to ensure coverage accuracy is worthwhile. Moreover, the interval coverage obtained by the WLP and Lasso methods is so low that they fail almost completely, suggesting further limitations in applying them to non-sparse scenarios.
In the sparse setting, the results in Table 2 and Table 4 show that both the WLS and WLP methods achieve the expected interval coverage. The interval lengths obtained by the WLP method are generally shorter than those of the WLS method, giving higher precision. However, in our setting the WLP method produces markedly higher Type II errors than the WLS method, so its accuracy decreases, while the Lasso method still performs poorly and does not meet expectations.
In summary, our method is adaptable and robust. Under sparse conditions it performs comparably to existing methods, but it is more applicable in settings where low Type II error rates must be maintained.

5. Real Data Analysis

We analyze the single-cell RNA-seq dataset of [27], which contains expression estimates (transcripts per million) for all 27,723 UCSC-annotated mouse genes in a total of 1861 primary mouse bone marrow-derived dendritic cells spanning several experimental conditions. The complete dataset was downloaded from the Gene Expression Omnibus under accession code GSE48968.

5.1. False Discovery Rate and Power Comparison

We apply the WLS method and four comparison methods to fit the high-dimensional Poisson regression model (1) and Gamma regression model (13), respectively, obtaining FDR and power plots for the five methods. Take model (1) as an example: WLS regression is the method proposed in this paper; weighted low-dimensional projection (WLP) regression was proposed in [12]; and the remaining "Lasso", "rproj", and "lproj" regressions are available in the corresponding R packages. Comparing Figure 1 and Figure 2 shows the advantages of the proposed method.
In Figure 1 and Figure 2, we can see that all the methods control the FDR, but the original Lasso method is conservative and has the weakest power, while methods such as "lproj" and "rproj" improve on it only modestly; the power obtained by WLS, which builds on WLP, is much higher than that of the other methods. The WLS results are the most outstanding, demonstrating the superiority of our approach, which is closely related to the fact that the proposed method places no strict limitations on the data; this trend becomes increasingly obvious under the Gamma distribution.
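The figures report empirical FDR, but the text does not state which multiple-testing rule underlies the plots. As one standard possibility, a Benjamini–Hochberg step-up procedure over per-coefficient p-values can be sketched as follows (function name and level are our illustrative choices):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """BH step-up procedure at level q; returns a boolean rejection mask."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    # compare sorted p-values to the BH thresholds q*i/m
    below = pvals[order] <= q * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```

Empirical FDR is then the fraction of rejections that are true nulls, and power is the fraction of non-nulls rejected.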

5.2. CIs for the Different Stimuli

In this subsection, we use the above data set to derive the CIs produced under different stimuli. This experiment is mainly used to analyze the effect of different stimuli on gene expression. Specifically, we focus on three pathogenic components, namely LPS (a component of Gram-negative bacteria), PIC (viral-like double-stranded RNA), and PAM (a synthetic mimic of bacterial lipopeptides), together with a set of unstimulated control cells. Sample screening is an important step: we plot the gene expression profiles of stimulated cells at 0, 1, 2, 4, and 6 h. For a cleaner experiment, we only consider the data at 6 h, because the variance in this state is the most pronounced. We also remove genes with more than 80% zero counts and those with significant variance changes, among other filters. Finally, we fit the high-dimensional Poisson regression model (1) and apply the proposed method to obtain the 95% CIs for each regression coefficient.
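The 80% zero-count screening rule can be sketched as follows for a cells-by-genes count matrix; the helper name and threshold argument are ours.

```python
import numpy as np

def filter_genes(counts, max_zero_frac=0.80):
    """Keep genes (columns) whose fraction of zero counts does not
    exceed max_zero_frac, mirroring the screening rule in the text."""
    zero_frac = (counts == 0).mean(axis=0)
    keep = zero_frac <= max_zero_frac
    return counts[:, keep], keep
```

Returning the boolean mask alongside the filtered matrix makes it easy to map fitted coefficients back to gene identifiers.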
Specifically, there are three stimuli in our experiments: LPS, PIC, and PAM. Different stimuli correspond to different sub-datasets: Sdata, Cdata, and Mdata. As an example, consider the PAM stimulus with $n = 160$ and $p = 768$: model (1) is fitted on the Mdata data set, and the parameter $\beta$ represents the PAM stimulus effect. We use the WLS method and the comparison methods (WLP, "rproj", and "lproj") to compute the confidence interval for each gene under this stimulus, allowing us to compare the behavior of different genes under different conditions, as shown in the figure below.
As can be seen in Figure 3, under the PAM stimulus, the WLS, WLP, and "lproj" methods each yield one or more genes whose regression-coefficient CIs do not cover 0 (the red intervals). This indicates a potential functional response to the stimulus; specifically, for the PAM-stimulated cells, it identifies specific protein-coding genes. For the "rproj" method, however, the regression coefficients of all genes under the PAM stimulus cover 0, which is probably related to this specific stimulus.
In the PAM stimulation experiment, we plot the interval lengths produced by the four methods. The average interval lengths over all genes produced by WLS, WLP, "lproj", and "rproj" are 1.5, 1.2, 3.5, and 3.5, respectively. These results and the corresponding figures show that the WLP method usually produces shorter CIs and identifies more genes whose intervals do not cover 0. The "lproj" method tends to highlight similar genes but with longer CIs. Finally, the "rproj" method also tends to produce long CIs that cover 0 for most genes.

6. Discussion

In this paper, we propose a unified framework to perform debiased estimation for distributions with different link functions, while constructing confidence intervals under sparse and non-sparse conditions to ensure the validity of interval coverage and interval length. For technical reasons, we use sample splitting to establish the relevant theoretical properties. In van de Geer et al. [28], the random errors in (6) are guaranteed to be asymptotically normal if conditions similar to theirs are imposed, and such conditions would allow us to establish the theoretical properties without sample splitting. However, these strict conditions limit the adaptability of the proposed method, so we use sample splitting to remove other strong assumptions and present our results. Moreover, we do not claim that the proposed method would necessarily perform worse without sample splitting. It is therefore of interest to develop new technical tools for performing inference without sample splitting.

Author Contributions

Conceptualization, Q.Z.; Methodology, Q.Z.; Software, S.T.; Validation, Y.S.; Formal analysis, Q.Z.; Data curation, S.T.; Writing—original draft, S.T.; Supervision, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is supported by the National Social Science Fund project of China (No. 21BTJ045).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fotuhi, H.; Amiri, A.; Maleki, M.R. Phase I monitoring of social networks based on Poisson regression profiles. Qual. Reliab. Eng. Int. 2018, 34, 572–588.
  2. Ortega, E.M.; Bolfarine, H.; Paula, G.A. Influence diagnostics in generalized log-gamma regression models. Comput. Stat. Data Anal. 2003, 42, 165–186.
  3. Sørensen, Ø.; Hellton, K.H.; Frigessi, A.; Thoresen, M. Covariate selection in high-dimensional generalized linear models with measurement error. J. Comput. Graph. Stat. 2018, 27, 739–749.
  4. Piironen, J.; Paasiniemi, M.; Vehtari, A. Projective inference in high-dimensional problems: Prediction and feature selection. Electron. J. Stat. 2020, 14, 2155–2197.
  5. Liang, F.; Xue, J.; Jia, B. Markov neighborhood regression for high-dimensional inference. J. Am. Stat. Assoc. 2022, 117, 1200–1214.
  6. Liu, C.; Zhao, X.; Huang, J. A Random Projection Approach to Hypothesis Tests in High-Dimensional Single-Index Models. J. Am. Stat. Assoc. 2022, 1–21.
  7. Deshpande, Y.; Javanmard, A.; Mehrabi, M. Online debiasing for adaptively collected high-dimensional data with applications to time series analysis. J. Am. Stat. Assoc. 2021, 1–14.
  8. Cai, T.T.; Guo, Z. Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity. Ann. Stat. 2017, 45, 615–646.
  9. Athey, S.; Imbens, G.W.; Wager, S. Approximate residual balancing: Debiased inference of average treatment effects in high dimensions. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2018, 80, 597–623.
  10. Zhu, Y.; Bradic, J. Linear hypothesis testing in dense high-dimensional linear models. J. Am. Stat. Assoc. 2018, 113, 1583–1600.
  11. Sur, P.; Chen, Y.; Candès, E.J. The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square. Probab. Theory Relat. Fields 2019, 175, 487–558.
  12. Ma, R.; Tony Cai, T.; Li, H. Global and simultaneous hypothesis testing for high-dimensional logistic regression models. J. Am. Stat. Assoc. 2021, 116, 984–998.
  13. Shi, C.; Song, R.; Lu, W.; Li, R. Statistical inference for high-dimensional models via recursive online-score estimation. J. Am. Stat. Assoc. 2021, 116, 1307–1318.
  14. Song, Y.; Liang, X.; Zhu, Y.; Lin, L. Robust variable selection with exponential squared loss for the spatial autoregressive model. Comput. Stat. Data Anal. 2021, 155, 107094.
  15. Oda, R.; Mima, Y.; Yanagihara, H.; Fujikoshi, Y. A high-dimensional bias-corrected AIC for selecting response variables in multivariate calibration. Commun. Stat.-Theory Methods 2021, 50, 3453–3476.
  16. Janková, J.; van de Geer, S. De-biased sparse PCA: Inference and testing for eigenstructure of large covariance matrices. arXiv 2018, arXiv:1801.10567.
  17. Cai, T.T.; Guo, Z.; Ma, R. Statistical inference for high-dimensional generalized linear models with binary outcomes. J. Am. Stat. Assoc. 2021, 1–14.
  18. Belloni, A.; Chernozhukov, V.; Wei, Y. Post-selection inference for generalized linear models with many controls. J. Bus. Econ. Stat. 2016, 34, 606–619.
  19. Li, X.; Chen, F.; Liang, H.; Ruppert, D. Model Checking for Logistic Models When the Number of Parameters Tends to Infinity. J. Comput. Graph. Stat. 2022, 1–30.
  20. Ning, Y.; Liu, H. A general theory of hypothesis tests and confidence regions for sparse high dimensional models. Ann. Stat. 2017, 45, 158–195.
  21. Buccini, A.; De la Cruz Cabrera, O.; Donatelli, M.; Martinelli, A.; Reichel, L. Large-scale regression with non-convex loss and penalty. Appl. Numer. Math. 2020, 157, 590–601.
  22. Jiang, Y.; Wang, Y.; Zhang, J.; Xie, B.; Liao, J.; Liao, W. Outlier detection and robust variable selection via the penalized weighted LAD-LASSO method. J. Appl. Stat. 2021, 48, 234–246.
  23. Cai, T.; Tony Cai, T.; Guo, Z. Optimal statistical inference for individualized treatment effects in high-dimensional models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2021, 83, 669–719.
  24. Javanmard, A.; Lee, J.D. A flexible framework for hypothesis testing in high dimensions. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2020, 82, 685–718.
  25. Guo, Z.; Rakshit, P.; Herman, D.S.; Chen, J. Inference for the case probability in high-dimensional logistic regression. J. Mach. Learn. Res. 2021, 22, 11480–11533.
  26. Huang, J.; Zhang, C.H. Estimation and selection via absolute penalized convex minimization and its multistage adaptive applications. J. Mach. Learn. Res. 2012, 13, 1839–1864.
  27. Shalek, A.K.; Satija, R.; Shuga, J.; Trombetta, J.J.; Gennert, D.; Lu, D.; Chen, P.; Gertner, R.S.; Gaublomme, J.T.; Yosef, N.; et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature 2014, 510, 363–369. [Google Scholar] [CrossRef] [PubMed]
  28. Van de Geer, S.; Bühlmann, P.; Ritov, Y.; Dezeure, R. On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat. 2014, 42, 1166–1202. [Google Scholar] [CrossRef]
Figure 1. The FDR versus power comparison under the Poisson mode.
Figure 2. The FDR versus Power comparison under the Gamma mode.
Figure 3. CIs under different methods.
Table 1. Empirical performances of CIs for β2 under Σ = ΣM, ψ = 0.5, α = 0.05, n = 100 in Poisson.

| p | WLS Cov. (%) | WLP Cov. (%) | Lasso Cov. (%) | WLS Length | WLP Length | Lasso Length |
|---|---|---|---|---|---|---|
| k = 20 | | | | | | |
| 400 | 93.0 | 9.2 | 10.2 | 1.12 | 0.41 | 0.08 |
| 700 | 91.0 | 8.5 | 6.5 | 1.11 | 0.41 | 0.07 |
| 1000 | 96.0 | 10.2 | 4.5 | 1.13 | 0.40 | 0.07 |
| 1300 | 93.0 | 10.0 | 4.05 | 0.97 | 0.41 | 0.06 |
| k = 25 | | | | | | |
| 400 | 95.0 | 3.0 | 10.3 | 1.15 | 0.40 | 0.07 |
| 700 | 90.0 | 1.9 | 3.04 | 1.14 | 0.41 | 0.07 |
| 1000 | 98.0 | 1.2 | 2.23 | 1.34 | 0.41 | 0.07 |
| 1300 | 91.0 | 1.0 | 1.05 | 0.98 | 0.40 | 0.06 |
| k = 35 | | | | | | |
| 400 | 95.0 | 6.8 | 9.38 | 1.15 | 0.41 | 0.08 |
| 700 | 95.3 | 4.1 | 6.46 | 0.95 | 0.41 | 0.08 |
| 1000 | 96.8 | 1.2 | 5.31 | 1.35 | 0.41 | 0.06 |
| 1300 | 93.0 | 1.1 | 2.24 | 0.97 | 0.40 | 0.06 |
Table 2. Empirical performances of CIs for β100 under Σ = ΣM, ψ = 0.5, α = 0.05, n = 100 in Poisson.

| p | WLS Cov. (%) | WLP Cov. (%) | Lasso Cov. (%) | WLS Length | WLP Length | Lasso Length |
|---|---|---|---|---|---|---|
| k = 20 | | | | | | |
| 400 | 95.0 | 99.0 | 16.3 | 1.11 | 0.41 | 0.08 |
| 700 | 99.3 | 99.2 | 13.2 | 1.23 | 0.40 | 0.07 |
| 1000 | 99.2 | 99.5 | 23.3 | 1.13 | 0.41 | 0.08 |
| 1300 | 99.6 | 99.8 | 24.2 | 1.02 | 0.40 | 0.06 |
| k = 25 | | | | | | |
| 400 | 97.0 | 94.1 | 11.3 | 1.15 | 0.41 | 0.07 |
| 700 | 96.0 | 95.2 | 27.1 | 1.23 | 0.40 | 0.08 |
| 1000 | 99.5 | 96.1 | 28.6 | 1.26 | 0.41 | 0.09 |
| 1300 | 96.0 | 98.2 | 37.6 | 1.03 | 0.41 | 0.10 |
| k = 35 | | | | | | |
| 400 | 95.0 | 94.1 | 21.56 | 1.16 | 0.41 | 0.09 |
| 700 | 96.9 | 96.4 | 34.8 | 1.03 | 0.41 | 0.08 |
| 1000 | 98.7 | 98.6 | 33.4 | 1.15 | 0.41 | 0.07 |
| 1300 | 99.5 | 98.9 | 35.3 | 1.03 | 0.40 | 0.06 |
Table 3. Empirical performances of CIs for β2 under Σ = ΣM, ψ = 0.5, α = 0.05, n = 100 in Gamma.

| p | WLS Cov. (%) | WLP Cov. (%) | Lasso Cov. (%) | WLS Length | WLP Length | Lasso Length |
|---|---|---|---|---|---|---|
| k = 20 | | | | | | |
| 400 | 93.4 | 6.1 | 11.12 | 0.99 | 0.40 | 0.08 |
| 700 | 94.2 | 14.2 | 8.01 | 0.84 | 0.41 | 0.07 |
| 1000 | 92.5 | 13.2 | 3.66 | 0.85 | 0.41 | 0.08 |
| 1300 | 93.2 | 13.0 | 1.03 | 0.86 | 0.40 | 0.07 |
| k = 25 | | | | | | |
| 400 | 95.1 | 3.2 | 4.23 | 0.98 | 0.41 | 0.07 |
| 700 | 89.3 | 2.1 | 2.04 | 0.84 | 0.41 | 0.07 |
| 1000 | 91.7 | 1.1 | 3.26 | 0.87 | 0.40 | 0.08 |
| 1300 | 92.3 | 1.0 | 2.06 | 0.89 | 0.40 | 0.07 |
| k = 35 | | | | | | |
| 400 | 86.2 | 7.1 | 5.07 | 0.87 | 0.41 | 0.08 |
| 700 | 86.3 | 6.2 | 1.09 | 0.85 | 0.41 | 0.07 |
| 1000 | 80.4 | 1.2 | 1.03 | 0.78 | 0.41 | 0.07 |
| 1300 | 82.3 | 1.1 | 1.01 | 0.80 | 0.40 | 0.07 |
Table 4. Empirical performances of CIs for β100 under Σ = ΣM, ψ = 0.5, α = 0.05, n = 100 in Gamma.

| p | WLS Cov. (%) | WLP Cov. (%) | Lasso Cov. (%) | WLS Length | WLP Length | Lasso Length |
|---|---|---|---|---|---|---|
| k = 20 | | | | | | |
| 400 | 96.2 | 99.1 | 10.02 | 0.98 | 0.41 | 0.08 |
| 700 | 98.3 | 98.2 | 24.02 | 0.97 | 0.41 | 0.07 |
| 1000 | 96.3 | 97.3 | 27.56 | 0.89 | 0.41 | 0.08 |
| 1300 | 98.7 | 98.0 | 26.04 | 0.87 | 0.40 | 0.07 |
| k = 25 | | | | | | |
| 400 | 95.2 | 95.0 | 11.25 | 0.98 | 0.41 | 0.08 |
| 700 | 99.2 | 96.8 | 17.06 | 0.91 | 0.41 | 0.07 |
| 1000 | 99.3 | 97.3 | 26.48 | 0.94 | 0.40 | 0.08 |
| 1300 | 99.6 | 98.1 | 35.76 | 0.95 | 0.41 | 0.07 |
| k = 35 | | | | | | |
| 400 | 98.6 | 96.2 | 20.0 | 0.88 | 0.40 | 0.07 |
| 700 | 99.2 | 97.3 | 26.3 | 0.90 | 0.40 | 0.08 |
| 1000 | 98.6 | 98.1 | 51.05 | 0.80 | 0.41 | 0.07 |
| 1300 | 99.3 | 98.3 | 35.32 | 0.85 | 0.41 | 0.07 |
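The "Coverage (%)" and "Length" columns in the tables above are standard Monte Carlo summaries: the fraction of simulated confidence intervals that contain the true coefficient, and the average interval width. The following is a minimal sketch of how such metrics can be computed; the data-generating step and the normal-based interval are purely illustrative assumptions, not the paper's WLS construction.

```python
import numpy as np

def ci_metrics(cis, beta_true):
    """Empirical coverage (%) and average length of a set of CIs.

    cis: array of shape (n_replications, 2) holding (lower, upper) bounds.
    beta_true: the true coefficient value the intervals target.
    """
    lower, upper = cis[:, 0], cis[:, 1]
    coverage = 100.0 * np.mean((lower <= beta_true) & (beta_true <= upper))
    length = np.mean(upper - lower)
    return coverage, length

# Toy illustration (hypothetical numbers): 100 replications of a nominal
# 95% normal-based CI for an estimator with standard error 0.5.
rng = np.random.default_rng(0)
beta_true = 1.0
est = beta_true + 0.5 * rng.standard_normal(100)   # simulated point estimates
half_width = 1.96 * 0.5                            # z_{0.975} * SE
cis = np.column_stack([est - half_width, est + half_width])
cov, length = ci_metrics(cis, beta_true)
```

With a correctly calibrated estimator, `cov` should fall near the nominal 95% level while `length` stays fixed at twice the half-width; the tables report exactly this trade-off, where the shorter WLP and Lasso intervals fail to cover.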

Tang, S.; Shi, Y.; Zhang, Q. Bias-Corrected Inference of High-Dimensional Generalized Linear Models. Mathematics 2023, 11, 932. https://doi.org/10.3390/math11040932