Article

Bayesian Variable Selection and Estimation in Semiparametric Simplex Mixed-Effects Models with Longitudinal Proportional Data

1 Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming 650091, China
2 Department of Mathematics and Statistics, Guizhou University of Finance and Economics, Guiyang 550025, China
3 College of Mathematics and Information Science, Guiyang University, Guiyang 550005, China
* Author to whom correspondence should be addressed.
Entropy 2022, 24(10), 1466; https://doi.org/10.3390/e24101466
Submission received: 6 September 2022 / Revised: 8 October 2022 / Accepted: 9 October 2022 / Published: 14 October 2022
(This article belongs to the Special Issue Statistical Methods for Modeling High-Dimensional and Complex Data)

Abstract

In the development of simplex mixed-effects models, the random effects are generally assumed to follow a normal distribution. This normality assumption may be violated when analyzing skewed and multimodal longitudinal data. In this paper, we adopt the centered Dirichlet process mixture model (CDPMM) to specify the random effects in simplex mixed-effects models. Combining the block Gibbs sampler and the Metropolis–Hastings algorithm, we extend the Bayesian Lasso (BLasso) to simultaneously estimate the unknown parameters of interest and select important covariates with nonzero effects in semiparametric simplex mixed-effects models. Several simulation studies and a real example are employed to illustrate the proposed methodologies.

1. Introduction

Various mixed-effects models based on the simplex distribution have become increasingly popular tools for analyzing longitudinal continuous proportional data in biological, medical and clinical studies. Under the framework of generalized linear mixed models, Qiu et al. [1] developed a simplex generalized linear mixed model on the basis of penalized quasi-likelihood (PQL) and restricted maximum likelihood (REML) inference; Zhang and Wei [2] used maximum likelihood estimation, combining the stochastic approximation (SA) algorithm with the MCMC method, to fit simplex nonlinear mixed models; Zhao et al. [3] implemented an MCMC algorithm to obtain joint Bayesian estimates of simplex nonlinear mixed models; Bonat et al. [4] investigated likelihood analysis for a class of simplex mixed models with logit, probit, complementary log–log and Cauchy link functions; and Quintero [5] presented a sensitivity analysis for the variance parameters of the random effects in Bayesian simplex mixed models. The random effects in the abovementioned mixed-effects models are assumed to follow a multivariate normal distribution. In practice, however, the normality assumption is questionable when analyzing skewed, bimodal or heavy-tailed longitudinal data. It is therefore worthwhile to incorporate a semiparametric hierarchical structure for the random effects, via a Dirichlet process prior, into simplex mixed-effects models for longitudinal proportional data.
The nonparametric Bayesian approach based on a Dirichlet process (DP) prior for the random effects in mixed-effects models has received a lot of attention in recent years. For example, Kleinman and Ibrahim [6] used a Dirichlet process prior for the general distribution of the random effects in a generalized linear mixed model. As a variant of the Dirichlet process prior, the truncated Dirichlet process with stick-breaking priors has been widely incorporated into various mixed-effects models to specify the general distribution of the random effects. For example, Tang and Duan [7] used this approach in a semiparametric Bayesian analysis of generalized partial linear mixed models; Tang and Zhao [8] applied it to nonlinear reproductive dispersion mixed models; and Zhao et al. [9] employed it in a semiparametric Bayesian analysis of logistic mixed-effects models for binomial data. In particular, Duan et al. [10] used a truncated and centered Dirichlet process prior to specify the random effects in a semiparametric reproductive dispersion mixed model. However, the abovementioned DP with stick-breaking priors for the random effects is inappropriate when the underlying density of the random effects is continuous. In addition, this type of variant of the Dirichlet process prior is rather time-consuming for complicated models. Therefore, to address these issues, the goal of this paper is to propose a new semiparametric simplex mixed-effects model in which the random-effects distribution is specified by the centered Dirichlet process mixture model (CDPMM).
Although various methodologies have been developed for statistical inference in the aforementioned simplex mixed-effects models, little work has been performed on variable selection for such models. Classical model-selection methods, such as the step-wise selection method [11], model comparison via Bayes factors [12], the Akaike information criterion [13] and the deviance information criterion [14], are often used to identify important covariates in regression analysis; however, these approaches are generally computationally intensive and unstable for complicated mixed models with many covariates. On the other hand, regularization (penalization) methods have become increasingly popular tools for variable selection in regression analysis. Commonly used regularization methods in the context of linear regression include the least absolute shrinkage and selection operator (Lasso) [15], the elastic net [16] and the adaptive Lasso [17]. In addition, Park and Casella [18] proposed the Bayesian version of the Lasso (BLasso) by assigning a conditional Laplace prior to the regression coefficients and a gamma prior to the shrinkage parameter under the Bayesian framework. The BLasso procedure has been extended to various complex models, including semiparametric structural equation models [19] and semiparametric joint models of multivariate longitudinal and survival data [20]. In particular, Erd et al. [21] pointed out that Bayesian penalization methods perform similarly to, and sometimes even better than, frequentist penalization methods, since Bayesian penalization methods can easily provide credible intervals (CIs) for the parameters of interest and obtain an estimate of the penalty parameter by assigning it an appropriate prior distribution. Therefore, the other main purpose of this paper is to extend the BLasso procedure to the considered semiparametric simplex mixed-effects models.
The paper is organized as follows: In Section 2, we propose a new semiparametric simplex mixed-effects model with random effects following the centered Dirichlet process mixture model (CDPMM) and incorporate a BLasso procedure into the proposed model. The required conditional distributions are derived in Section 3. Simulation studies and a real example are used to illustrate the proposed methodologies in Section 4. Some concluding remarks are given in Section 5.

2. Model and Notation

The simplex distribution was first proposed by Barndorff-Nielsen and Jørgensen [22]; its probability density function is specified as
$$p(y;\mu,\sigma^2)=\begin{cases}\left[2\pi\sigma^2\{y(1-y)\}^3\right]^{-1/2}\exp\left\{-\dfrac{d(y;\mu)}{2\sigma^2}\right\}, & 0<y<1,\\[4pt] 0, & \text{otherwise},\end{cases}\tag{1}$$
where $\mu\in(0,1)$ denotes the mean parameter, $\sigma^2>0$ represents the dispersion parameter, and $d(y;\mu)=\dfrac{(y-\mu)^2}{y(1-y)\mu^2(1-\mu)^2}$. For simplicity of notation, in the rest of this paper we write $y\sim S(\mu,\sigma^2)$ if a random variable $y$ follows a simplex distribution with mean parameter $\mu$ and dispersion parameter $\sigma^2$.
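As a quick numerical check on the simplex density above, the density and its unit deviance can be sketched in a few lines (a minimal illustration of ours, not code from the paper; the function names are our own):

```python
import numpy as np

def deviance(y, mu):
    """Unit deviance d(y; mu) of the simplex distribution."""
    return (y - mu) ** 2 / (y * (1.0 - y) * mu ** 2 * (1.0 - mu) ** 2)

def simplex_pdf(y, mu, sigma2):
    """Density of S(mu, sigma2): positive on (0, 1), zero elsewhere."""
    y = np.asarray(y, dtype=float)
    inside = (y > 0) & (y < 1)
    out = np.zeros_like(y)
    yy = y[inside]
    norm = np.sqrt(2.0 * np.pi * sigma2 * (yy * (1.0 - yy)) ** 3)
    out[inside] = np.exp(-deviance(yy, mu) / (2.0 * sigma2)) / norm
    return out
```

Integrating the density numerically over (0, 1) recovers 1, confirming that the boundary singularity of the normalizing factor is dominated by the deviance term.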
In the context of longitudinal data analysis, let $y_{ij}$ denote the longitudinal percentage outcome for the $i$th individual at the $j$th follow-up time $t_{ij}$, with $0<y_{ij}<1$, $i=1,\dots,n$, $j=1,\dots,n_i$. We assume that, given a $q\times 1$ vector of random effects $b_i$ for the $i$th individual, the responses $y_{ij}$ are conditionally independent and each $y_{ij}|b_i$ follows a simplex distribution with conditional mean $\mu_{ij}=E(y_{ij}|b_i)$ and constant dispersion parameter $\sigma^2$; that is, $y_{ij}|b_i\sim S(\mu_{ij},\sigma^2)$. Under the framework of GLMMs, the conditional mean is linked to the explanatory variables and random effects as follows:
$$f(\mu_{ij})\triangleq\eta_{ij}=x_{ij}^{T}\beta+z_{ij}^{T}b_i,\tag{2}$$
where the unknown, monotone link function $f(\cdot)$ is chosen as the logit link; $x_{ij}$ is a $(p+1)\times 1$ vector of covariates consisting of the constant 1 and time-dependent covariates observed at time point $t_{ij}$; $\beta$ is a $(p+1)\times 1$ vector of unknown regression parameters; and $z_{ij}$ is a $q\times 1$ vector of time-dependent variables, which may include some elements of $x_{ij}$, corresponding to the random effects $b_i$. In classical random-effects models, the random effects in (2) are generally assumed to follow a multivariate normal distribution, which may give rise to biased parameter estimates or even misleading conclusions. Thus, inspired by Ohlssen and Spiegelhalter [23], we use a DP mixture of normals to specify the random effects; that is, $b_i\overset{\text{i.i.d.}}{\sim}\sum_{g=1}^{\infty}\pi_g N_q(\mu_g,\Omega_g)$ with $(\mu_g,\Omega_g)\sim\mathcal{P}$, where $\mathcal{P}$ is an unknown random probability measure. Clearly, it is rather difficult and inefficient to obtain Bayesian estimates of the regression parameter $\beta$ and dispersion parameter $\sigma^2$ in Equation (2), since the unknown form of $\mathcal{P}$ is involved. To address this difficulty, a Dirichlet process (DP) prior is usually introduced to approximate $\mathcal{P}$, i.e., $\mathcal{P}\sim\mathrm{DP}(\tau F_0)$, in which $F_0$ is a given base distribution, such as a multivariate normal distribution, that serves as a starting point for constructing the nonparametric distribution, and $\tau$ is a weight indicating the researcher's certainty that $F_0$ is the distribution of $\mathcal{P}$. In particular, Sethuraman [24] showed that the DP prior $\mathrm{DP}(\tau F_0)$ has a stick-breaking representation; however, this approach induces a nonzero mean of the random effects [25] and a discrete probability distribution of the random effects [23]. Generally, the variants of the Dirichlet process proposed by Ishwaran and Zarepour [26] and Yang et al. [25] are regarded as discrete Dirichlet processes (discrete DPs). A discrete DP with a stick-breaking prior for the random effects is inappropriate when the underlying density of the random effects is continuous. Furthermore, violation of the zero-mean assumption on the random effects may lead to non-identifiability in the aforementioned random-effects model. In addition, discrete DP methods with stick-breaking priors for the random effects are generally computationally intensive for complicated models.
To overcome the above issues, inspired by Ohlssen and Spiegelhalter [23] and Yang et al. [25], we incorporate the following variant of the Dirichlet process into the model in (2) to specify the random effects. That is,
$$b_i\overset{\text{i.i.d.}}{\sim}\sum_{g=1}^{\infty}\pi_g N_q(\mu_g,\Omega_g)\quad\text{with}\quad\mu_g=\mu_g^*-\sum_{g=1}^{\infty}\pi_g\mu_g^*\quad\text{and}\quad(\mu_g^*,\Omega_g)\overset{\text{i.i.d.}}{\sim}F_0,\tag{3}$$
where $\pi_g$ is a random probability weight satisfying $0\le\pi_g\le 1$ and $\sum_{g=1}^{\infty}\pi_g=1$. In addition, $\pi_g$ is assumed to be independent of $(\mu_g^*,\Omega_g)$. This variant of the Dirichlet process is referred to as the centered Dirichlet process mixture model (CDPMM). As in Ishwaran and Zarepour [26], we adopt the following truncated approximation of the DP mixture for $\mathcal{P}$:
$$b_i\overset{\text{i.i.d.}}{\sim}\sum_{g=1}^{G}\pi_g N_q(\mu_g,\Omega_g)\quad\text{with}\quad\mu_g=\mu_g^*-\sum_{g=1}^{G}\pi_g\mu_g^*\quad\text{and}\quad(\mu_g^*,\Omega_g)\overset{\text{i.i.d.}}{\sim}F_0,\tag{4}$$
where $G$ is a finite integer satisfying $1\le G<\infty$. As for the selection of $G$, Ishwaran and Zarepour [26] pointed out that a moderate value such as $G=25$ may be enough to obtain a good approximation in practical applications. Thus, the value of $G$ is chosen to be 25 in the rest of this paper. Furthermore, the random probability weights $\pi_g$ are specified by the following stick-breaking procedure:
$$\pi_1=\vartheta_1\quad\text{and}\quad\pi_g=\vartheta_g\prod_{\iota=1}^{g-1}(1-\vartheta_\iota)\quad\text{for }g=2,\dots,G,\tag{5}$$
where $\vartheta_g\overset{\text{i.i.d.}}{\sim}\mathrm{Beta}(1,\tau)$ for $g=1,\dots,G-1$, and $\vartheta_G=1$ so that $\sum_{g=1}^{G}\pi_g=1$. The prior distribution for the unknown parameter $\tau$ is chosen as $\tau\sim\Gamma(a_1,a_2)$, so that the posterior distribution of $\tau$ is conjugate. Here, we set the hyperparameters $a_1$ and $a_2$ to 25 and 5, respectively, so that large values of $\tau$ are generated, which results in more unique $b_i$ values.
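The truncated stick-breaking construction above is easy to simulate; the sketch below (our own illustration, with $G$ and $\tau$ as arguments) draws one realization of the weights:

```python
import numpy as np

def stick_breaking_weights(tau, G, rng):
    """Draw (pi_1, ..., pi_G) from the truncated stick-breaking prior:
    theta_g ~ Beta(1, tau) for g < G and theta_G = 1."""
    theta = rng.beta(1.0, tau, size=G)
    theta[-1] = 1.0                      # theta_G = 1 forces sum(pi) = 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - theta[:-1])))
    return theta * remaining             # pi_g = theta_g * prod_{l<g}(1 - theta_l)
```

Setting the last break to 1 consumes whatever is left of the stick, so the $G$ weights sum to one exactly.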
It is rather difficult and inefficient to generate observations from the posterior distribution of $b_i$ under the above DP prior via the MCMC algorithm. We therefore introduce a latent variable $L_i\in\{1,\dots,G\}$ to solve this sampling issue, since this latent variable records each $b_i$'s cluster membership and conveys its parametric value to the distribution of $b_i$. Let $L=\{L_1,\dots,L_n\}$, $\pi=\{\pi_1,\dots,\pi_G\}$, $\mu^*=\{\mu_1^*,\dots,\mu_G^*\}$ and $\Omega=\{\Omega_1,\dots,\Omega_G\}$, in which $\Omega_g=\mathrm{diag}(\omega_{g1},\dots,\omega_{gq})$ for $g=1,\dots,G$. As in Ishwaran and Zarepour [26], the hierarchical structure defined in (4) can be written as
$$L_i|\pi\overset{\text{i.i.d.}}{\sim}\sum_{g=1}^{G}\pi_g\delta_g(\cdot)\quad\text{and}\quad(\pi,\mu^*,\Omega)\sim f_1(\pi)f_2(\mu^*)f_3(\Omega),\tag{6}$$
where $\delta_g(\cdot)$ denotes a discrete probability measure concentrated at $g$, and $f_1(\pi)$ is defined in Equation (5). The prior for $\mu_g^*$ associated with $f_2(\mu^*)=\prod_{g=1}^{G}f_2(\mu_g^*)$ is defined by
$$\mu_g^*|\xi,\Psi\overset{\text{i.i.d.}}{\sim}N_q(\xi,\Psi),\quad \xi|\xi_0,\Psi_0\sim N_q(\xi_0,\Psi_0),\quad \psi_j^{-1}|c_1,c_2\sim\Gamma(c_1,c_2)\ \text{for }j=1,\dots,q,$$
and the prior for $\omega_{gj}$ related to $f_3(\Omega)=\prod_{g=1}^{G}\prod_{j=1}^{q}f_3(\omega_{gj})$ is defined by
$$\omega_{gj}^{-1}|\omega_j^{a},\varpi_j\sim\Gamma(\omega_j^{a},\varpi_j)\quad\text{and}\quad\varpi_j|\varpi_j^{a},\varpi_j^{b}\sim\Gamma(\varpi_j^{a},\varpi_j^{b}),$$
where $\Psi=\mathrm{diag}(\psi_1,\dots,\psi_q)$, $\Gamma(c_1,c_2)$ denotes the Gamma distribution with parameters $c_1$ and $c_2$, and $\xi_0$, $\Psi_0$, $c_1$, $c_2$, $\omega_j^{a}$, $\varpi_j^{a}$ and $\varpi_j^{b}$ are pre-specified hyperparameters; that is, $\xi_0=0_{q\times 1}$, $\Psi_0=I_q$, $c_1=11$, $c_2=2.5$, $\omega_j^{a}=3$, $\varpi_j^{a}=n$ and $\varpi_j^{b}=10$. Thus, given the values of $L_i$, $\mu^*$ and $\Omega$, the prior for the random effect $b_i$ is $N_q(\mu_{L_i},\Omega_{L_i})$ with $\mu_{L_i}=\mu_{L_i}^*-\sum_{g=1}^{G}\pi_g\mu_g^*$.
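Putting the truncated mixture, the stick-breaking weights and the centering together, a prior draw of the random effects can be sketched as follows (an illustrative simplification of ours in which the base measure $F_0$ is taken as standard normal atoms with unit variances, rather than the full hierarchical prior above):

```python
import numpy as np

def draw_cdpmm_random_effects(n, G, tau, q, rng):
    """Draw b_1, ..., b_n from a truncated CDPMM prior.
    Illustrative assumption: atoms mu_g^* ~ N_q(0, I_q), Omega_g = I_q."""
    theta = rng.beta(1.0, tau, size=G)
    theta[-1] = 1.0
    pi = theta * np.concatenate(([1.0], np.cumprod(1.0 - theta[:-1])))
    mu_star = rng.normal(size=(G, q))          # atoms mu_g^* from F0
    omega = np.ones((G, q))                    # diagonal variances Omega_g
    mu = mu_star - pi @ mu_star                # centering: sum_g pi_g mu_g = 0
    labels = rng.choice(G, size=n, p=pi)       # latent cluster memberships L_i
    b = rng.normal(loc=mu[labels], scale=np.sqrt(omega[labels]))
    return b, pi, mu
```

The check that the weighted atom means sum to zero is exactly the centering in (4) that keeps the prior mean of $b_i$ at zero.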
To estimate the unknown parameters $\beta$ and $\sigma^2$ in Equation (2) from the Bayesian perspective, it is necessary to specify priors for $\beta$ and $\sigma^2$. To alleviate the computational burden, the conjugate prior distribution for the dispersion parameter is taken to be
$$\sigma^{-2}\sim\Gamma(\sigma_a^2,\sigma_b^2),$$
where the hyperparameters $\sigma_a^2$ and $\sigma_b^2$ are taken to be 1 and 0.01, respectively. The main goal of this paper is to incorporate the Bayesian version of the Lasso into our proposed model (2) to conduct parameter estimation and model selection simultaneously. Similar to Park and Casella [18] and Tang et al. [20], the following Laplace prior is placed on $\beta$:
$$\pi(\beta)=\prod_{k=0}^{p}\frac{\nu}{2}\exp\left(-\nu|\beta_k|\right),$$
where $\nu$ is the regularization parameter. Because the mass of the above Laplace prior is highly concentrated around zero, with a distinct peak at zero, the posterior means or modes of the $\beta_k$'s are shrunk towards zero, which is the key principle in using the BLasso method to select important covariates. Following Robert [15], the Laplace distribution with density $\frac{a}{2}\exp(-a|x|)$ can be represented as a scale mixture of normal distributions with an independent, exponentially distributed variance; that is,
$$\frac{a}{2}\exp(-a|x|)=\int_0^{\infty}\frac{1}{\sqrt{2\pi u}}\exp\left(-\frac{x^2}{2u}\right)\cdot\frac{a^2}{2}\exp\left(-\frac{a^2u}{2}\right)du,\quad\text{for }a>0.$$
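This scale-mixture identity can be verified numerically; the sketch below (our own check, using simple trapezoidal quadrature, not code from the paper) evaluates the right-hand side and compares it with the Laplace density:

```python
import numpy as np

def laplace_density(x, a):
    """Left-hand side: (a/2) * exp(-a|x|)."""
    return 0.5 * a * np.exp(-a * abs(x))

def laplace_via_mixture(x, a, n_grid=400_000, u_max=400.0):
    """Right-hand side: integrate the N(x; 0, u) density against the
    Exp(a^2/2) mixing density of the variance u, by trapezoidal quadrature."""
    u = np.linspace(1e-8, u_max, n_grid)
    integrand = (np.exp(-x**2 / (2.0 * u)) / np.sqrt(2.0 * np.pi * u)
                 * 0.5 * a**2 * np.exp(-0.5 * a**2 * u))
    return float(np.sum((integrand[:-1] + integrand[1:]) * np.diff(u) / 2.0))
```

For $x\neq 0$ the two sides agree to quadrature accuracy.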
Therefore, the aforementioned prior for β can be reformulated as the following hierarchical structure:
$$\beta|H_\beta\sim N_{p+1}(0,H_\beta)\ \text{with}\ H_\beta=\mathrm{diag}(h_{\beta_0}^2,\dots,h_{\beta_p}^2),\quad (h_{\beta_0}^2,\dots,h_{\beta_p}^2)\sim\prod_{k=0}^{p}\frac{\nu^2}{2}\exp\left(-\frac{\nu^2}{2}h_{\beta_k}^2\right),\quad \nu^2\sim\mathrm{Gamma}(\nu_a^2,\nu_b^2),\tag{10}$$
where the hyperparameters $\nu_a^2$ and $\nu_b^2$ are set to 1 and 0.1, respectively, which implies a diffuse prior. Similar to Park and Casella [18], the posterior distributions of $h_{\beta_k}^2$ and $\nu^2$ in the hierarchical structure (10) have closed-form expressions, so this hierarchical representation greatly simplifies the computation. It follows from Equation (10) that the posterior distribution of $\nu^2$ is the following Gamma distribution:
$$\nu^2|\beta,H_\beta\sim\mathrm{Gamma}\left(\nu_a^2+p+1,\ \nu_b^2+\frac{1}{2}\sum_{k=0}^{p}h_{\beta_k}^2\right).$$
In addition, the posterior distributions of $h_{\beta_0}^2,\dots,h_{\beta_p}^2$ are derived as
$$h_{\beta_k}^{-2}|\beta_k,\nu^2\sim\mathrm{IG}\left(\frac{\nu}{|\beta_k|},\ \nu^2\right)\quad\text{for }k=0,\dots,p,$$
where $\mathrm{IG}(a,b)$ denotes the inverse Gaussian distribution with mean parameter $a$ and shape parameter $b$. As for sampling from the inverse Gaussian distribution, Tang et al. [20] gave a detailed procedure.
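One standard way to draw from an inverse Gaussian distribution is the Michael–Schucany–Haas transformation method, sketched here (our own illustration; not necessarily the exact procedure given by Tang et al. [20]):

```python
import numpy as np

def sample_inverse_gaussian(mean, shape, rng):
    """One draw from IG(mean, shape) via the Michael-Schucany-Haas
    transformation: square a standard normal, solve the quadratic for a
    root x, then pick x or mean^2/x with the appropriate probability."""
    nu = rng.standard_normal()
    y = nu * nu
    x = (mean + mean**2 * y / (2.0 * shape)
         - mean / (2.0 * shape) * np.sqrt(4.0 * mean * shape * y
                                          + mean**2 * y**2))
    if rng.random() <= mean / (mean + x):
        return x
    return mean**2 / x
```

The sample mean of repeated draws converges to the mean parameter, as the IG mean equals its first parameter.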

3. Bayesian Analysis of Model

Let $Y=\{y_{ij}: i=1,\dots,n,\ j=1,\dots,n_i\}$, $X=\{x_{ij}: i=1,\dots,n,\ j=1,\dots,n_i\}$, $Z=\{z_{ij}: i=1,\dots,n,\ j=1,\dots,n_i\}$, and let $b=\{b_i: i=1,\dots,n\}$ denote the random effects. To obtain joint Bayesian estimates of the unknown parameters $\beta$ and $\sigma^2$ and the random effects, as well as to select important covariates in the considered models, a hybrid algorithm combining the block Gibbs sampler and the Metropolis–Hastings algorithm is employed to draw a sequence of random observations from the joint posterior distribution $p(\beta,\sigma^2,b|Y,X,Z)$, as follows. In this hybrid algorithm, observations $\{\beta,\sigma^2,b\}$ are iteratively drawn from the conditional distributions $p(\beta|\sigma^2,b,Y,X,Z)$, $p(\sigma^2|\beta,b,Y,X,Z)$ and $p(b|\beta,\sigma^2,Y,X,Z)$.
Block Gibbs Sampler (A): Conditional distribution related to $\beta$
It follows from Equations (2) and (10) that the conditional distribution $p(\beta|\sigma^2,b,Y,X,Z)$ is proportional to
$$\exp\left\{-\frac{1}{2}\left[\sum_{i=1}^{n}\sum_{j=1}^{n_i}\frac{1}{\sigma^2}d(y_{ij};\mu_{ij})+\beta^{T}H_\beta^{-1}\beta\right]\right\},$$
which is a nonstandard distribution. We therefore use the well-known Metropolis–Hastings (MH) algorithm to generate observations from this conditional distribution as follows. Given the current value $\beta^{(l)}$, a new candidate $\beta^{\,*}$ is generated from the proposal distribution $N(\beta^{(l)},\sigma_\beta^2\Sigma_\beta)$ and is accepted with probability
$$\min\left\{1,\ \frac{p(\beta^{\,*}|\sigma^2,b,Y,X,Z)}{p(\beta^{(l)}|\sigma^2,b,Y,X,Z)}\right\},$$
where
$$\Sigma_\beta=\left[\sum_{i=1}^{n}\sum_{j=1}^{n_i}\frac{\ddot d(y_{ij};\bar\mu_{ij})}{2\sigma^2\{\dot f(\mu_{ij})\}^2}\,x_{ij}x_{ij}^{T}+H_\beta^{-1}\right]^{-1}$$
with $\dot f(\mu_{ij})=\partial f/\partial\mu_{ij}$ and $\ddot d(y_{ij};\bar\mu_{ij})=E_{y_{ij}}\!\left(\partial^2 d(y_{ij};\mu_{ij})/\partial\mu_{ij}^2\right)\big|_{\beta=\beta^{(l)}}$. The variance coefficient $\sigma_\beta^2$ can be chosen such that the average acceptance rate is approximately 0.25 or more.
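The accept/reject step above is a standard random-walk Metropolis–Hastings update. The generic sketch below (our own illustration, computed on the log scale and exercised on a toy Gaussian target rather than the full simplex conditional) shows the mechanics:

```python
import numpy as np

def mh_step(beta, log_post, chol_sigma, sigma_beta, rng):
    """One random-walk MH update: propose from N(beta, sigma_beta^2 * Sigma),
    where chol_sigma is a Cholesky factor of Sigma, and accept with
    probability min(1, ratio), computed on the log scale for stability."""
    cand = beta + sigma_beta * chol_sigma @ rng.standard_normal(beta.size)
    if np.log(rng.random()) < log_post(cand) - log_post(beta):
        return cand, True      # candidate accepted
    return beta, False         # candidate rejected, chain stays put
```

In practice `sigma_beta` is tuned during burn-in until the observed acceptance rate settles near the 0.25 target mentioned above.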
Block Gibbs Sampler (B): Conditional distribution related to $\sigma^2$
The conditional distribution $p(\sigma^2|\beta,b,Y,X,Z)$ can be derived as
$$p(\sigma^2|\beta,b,Y,X,Z)\propto(\sigma^2)^{-\left(0.5\sum_{i=1}^{n}n_i+\sigma_a^2\right)-1}\exp\left\{-\frac{0.5\sum_{i=1}^{n}\sum_{j=1}^{n_i}d(y_{ij};\mu_{ij})+\sigma_b^2}{\sigma^2}\right\},$$
which can be simplified as
$$\sigma^{-2}|\beta,b,Y,X,Z\sim\Gamma\left(0.5\sum_{i=1}^{n}n_i+\sigma_a^2,\ 0.5\sum_{i=1}^{n}\sum_{j=1}^{n_i}d(y_{ij};\mu_{ij})+\sigma_b^2\right).$$
Clearly, it is straightforward and efficient to draw observations of $\sigma^2$ from this Gamma distribution via any statistical software.
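For instance, the Gamma draw for $\sigma^{-2}$ is a one-liner in most software; a minimal sketch of ours (note that numpy parameterizes the Gamma by shape and scale, i.e., scale = 1/rate):

```python
import numpy as np

def update_sigma2(deviances, n_obs, sigma_a2, sigma_b2, rng):
    """Gibbs update: draw sigma^{-2} from its Gamma full conditional
    and invert. `deviances` holds the current d(y_ij; mu_ij) values."""
    shape = 0.5 * n_obs + sigma_a2
    rate = 0.5 * float(np.sum(deviances)) + sigma_b2
    inv_sigma2 = rng.gamma(shape, 1.0 / rate)   # numpy uses scale = 1/rate
    return 1.0 / inv_sigma2
```

The long-run average of the drawn $\sigma^{-2}$ values matches the Gamma mean shape/rate.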
Block Gibbs Sampler (C): Conditional distribution related to $\theta_b$
Let $\theta_b$ denote all unknown parameters associated with the distribution of the random effects $b_i$, $i=1,\dots,n$. Then $\theta_b$ can be iteratively sampled using the following nine steps:
Step (a). The conditional distribution of $\xi$ given $(\mu^*,\Psi,b)$ is given by
$$\xi|\mu^*,\Psi,b\sim N_q(A,B),$$
where $B=\left(G\Psi^{-1}+\Psi_0^{-1}\right)^{-1}$ and $A=B\left(\Psi_0^{-1}\xi_0+\Psi^{-1}\sum_{g=1}^{G}\mu_g^*\right)$.
Step (b). For $j=1,\dots,q$, the diagonal elements of $\Psi$ are conditionally distributed as
$$\psi_j^{-1}|\mu^*,\xi\sim\Gamma\left(c_1+\frac{G}{2},\ c_2+\frac{1}{2}\sum_{g=1}^{G}(\mu_{gj}^*-\xi_j)^2\right),$$
where $\mu_{gj}^*$ is the $j$th element of $\mu_g^*$ and $\xi_j$ is the $j$th element of $\xi$.
Step (c). For $j=1,\dots,q$, $\varpi_j|\Omega$ is conditionally distributed as
$$\varpi_j|\Omega\sim\Gamma\left(\varpi_j^{a}+G\omega_j^{a},\ \varpi_j^{b}+\sum_{g=1}^{G}\omega_{gj}^{-1}\right),$$
where $\omega_{gj}$ is the $j$th diagonal element of $\Omega_g$.
Step (d). Following Ishwaran and Zarepour [26], the conditional distribution of $\tau|\pi$ can be expressed as
$$\tau|\pi\sim\Gamma\left(a_1+G-1,\ a_2-\sum_{g=1}^{G-1}\log(1-\nu_g^*)\right),$$
where $\nu_g^*$ is a random weight drawn from the Beta distribution in step (e).
Step (e). The conditional distribution of $\pi|L,\tau$ is the following generalized Dirichlet distribution:
$$\pi|L,\tau\sim\mathrm{Dir}\left(a_1^*,b_1^*,\dots,a_{G-1}^*,b_{G-1}^*\right),$$
where $a_g^*=1+d_g$, $b_g^*=\tau+\sum_{\iota=g+1}^{G}d_\iota$ for $g=1,\dots,G-1$, and $d_g$ is the number of $L_i$'s (and thus individuals) whose values equal $g$. Simulating an observation from the conditional distribution $\pi|L,\tau$ can be conducted as follows. First, $\nu_g^*$ is independently generated from the $\mathrm{Beta}(a_g^*,b_g^*)$ distribution. Then, $\pi_1,\dots,\pi_G$ are obtained from the following formulae:
$$\pi_1=\nu_1^*,\quad \pi_g=\nu_g^*\prod_{\iota=1}^{g-1}(1-\nu_\iota^*)\ \text{for }g\neq 1,G,\quad \pi_G=1-\sum_{g=1}^{G-1}\pi_g.$$
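Step (e) can be sketched directly from these formulae (our own illustration; `counts[g]` plays the role of $d_g$):

```python
import numpy as np

def sample_pi_given_labels(counts, tau, rng):
    """Draw pi from the generalized Dirichlet conditional pi | L, tau.
    counts[g] = d_g, the number of labels L_i equal to cluster g."""
    G = len(counts)
    a = 1.0 + counts[:-1]                           # a_g^* = 1 + d_g
    b = tau + np.cumsum(counts[::-1])[::-1][1:]     # b_g^* = tau + sum_{l>g} d_l
    v = rng.beta(a, b)                              # nu_g^*, g = 1, ..., G-1
    pi = np.empty(G)
    pi[:-1] = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    pi[-1] = 1.0 - pi[:-1].sum()                    # pi_G takes the remainder
    return pi
```

Clusters with larger counts $d_g$ receive stochastically larger weights, which is how the labels feed back into the mixing distribution.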
Step (f). The conditional distribution of $\mu^*|\xi,\Psi,\Omega,L,b$.
Let $L_1^*,\dots,L_d^*$ be the $d$ unique values of $\{L_1,\dots,L_n\}$ (i.e., the occupied "clusters"). For $g=1,\dots,G$, $\mu_g^*$ is conditionally distributed as follows:
$$\mu_g^*|\xi,\Psi\sim N_q(\xi,\Psi)\quad\text{for }g\notin\{L_1^*,\dots,L_d^*\},$$
$$\mu_g^*|\xi,\Psi,\Omega,L,b\sim N_q(E_g,F_g)\quad\text{for }g\in\{L_1^*,\dots,L_d^*\},$$
where $F_g=\left(\Psi^{-1}+\sum_{\{i:L_i=g\}}\Omega_g^{-1}\right)^{-1}$ and $E_g=F_g\left(\Psi^{-1}\xi+\sum_{\{i:L_i=g\}}\Omega_g^{-1}b_i\right)$ for $g\in\{L_1^*,\dots,L_d^*\}$. Given the $\mu_g^*$, we set $\mu_g=\mu_g^*-\sum_{g=1}^{G}\pi_g\mu_g^*$, $\mu^*=\{\mu_1^*,\dots,\mu_G^*\}$ and $\mu=\{\mu_1,\dots,\mu_G\}$.
Step (g). The conditional distribution of $\Omega|\mu,\varpi,L,b$.
Using the notation of step (f), for given $g$ and $j=1,\dots,q$, the $j$th diagonal element of $\Omega_g$ is conditionally distributed as
$$\omega_{gj}^{-1}\sim\Gamma(\omega_j^{a},\varpi_j)\quad\text{for }g\notin\{L_1^*,\dots,L_d^*\},$$
$$\omega_{gj}^{-1}\sim\Gamma\left(\frac{d_g}{2}+\omega_j^{a},\ \varpi_j+\frac{1}{2}\sum_{\{i:L_i=g\}}(b_{ij}-\mu_{gj})^2\right)\quad\text{for }g\in\{L_1^*,\dots,L_d^*\},$$
where $b_{ij}$ is the $j$th element of $b_i$ and $\mu_{gj}$ is the $j$th element of $\mu_g$. Given the $\omega_{gj}$, $\Omega_g=\mathrm{diag}(\omega_{g1},\dots,\omega_{gq})$ and $\Omega=\{\Omega_1,\dots,\Omega_G\}$.
Step (h). The conditional distribution of $L_i|\pi,\mu,\Omega,b$ is given by
$$L_i|\pi,\mu,\Omega,b\overset{\text{ind.}}{\sim}\mathrm{multinomial}\left(\pi_{i1}^*,\dots,\pi_{iG}^*\right),$$
where $\pi_{ig}^*\propto\pi_g\,p(b_i|\mu_g,\Omega_g)$ with $b_i|\mu_g,\Omega_g\sim N_q(\mu_g,\Omega_g)$, and the $\pi_g$ $(g=1,\dots,G)$ are sampled in step (e). Given $L_i$, $\mu$ and $\Omega$, the prior of $b_i$ is $N_q(\mu_{L_i},\Omega_{L_i})$, with $\mu_{L_i}$ and $\Omega_{L_i}$ being the $L_i$th elements of the sets $\mu$ and $\Omega$, respectively.
Step (i). The conditional distribution of $b=\{b_i: i=1,\dots,n\}$.
The conditional distribution $p(b_i|\beta,\sigma^2,Y,X,Z)$ is nonstandard, so observations cannot be drawn from it directly via Gibbs sampling for $i=1,\dots,n$. Specifically,
$$p(b_i|\beta,\sigma^2,Y,X,Z)\propto p(b_i|\mu_{L_i},\Omega_{L_i})\,p(Y_i|\beta,\sigma^2,b_i,X,Z),$$
where $Y_i=\{y_{ij}: j=1,\dots,n_i\}$ and $p(Y_i|\beta,\sigma^2,b_i,X,Z)=\prod_{j=1}^{n_i}p(y_{ij};\mu_{ij},\sigma^2)$, with $p(y_{ij};\mu_{ij},\sigma^2)$ specified by Equation (1) and $\mu_{ij}$ by Equation (2). The Metropolis–Hastings algorithm used to sample $b_i$ is implemented as follows. At the $\ell$th iteration with current value $b_i^{(\ell)}$, a new candidate $b_i^{\,*}$ is drawn from the normal distribution $N_q(b_i^{(\ell)},\sigma_b^2\Sigma_{b_i})$, where $\Sigma_{b_i}=(\Omega_{L_i}^{-1}+\Xi_i)^{-1}$ and $\Xi_i=-\partial^2\ln p(Y_i|\beta,\sigma^2,b_i,X,Z)/\partial b_i\partial b_i^{T}\big|_{b_i=b_i^{(\ell)}}$. The new candidate $b_i^{\,*}$ is accepted with probability
$$\min\left\{1,\ \frac{p(b_i^{\,*}|\mu_{L_i},\Omega_{L_i})\,p(Y_i|\beta,\sigma^2,b_i^{\,*},X,Z)}{p(b_i^{(\ell)}|\mu_{L_i},\Omega_{L_i})\,p(Y_i|\beta,\sigma^2,b_i^{(\ell)},X,Z)}\right\}.$$
The variance $\sigma_b^2$ can be chosen such that the average acceptance rate is approximately 0.25 or more.
Through the above iterative process, we obtain a series of sampled observations $\{(\beta^{(\ell)},\sigma^{2(\ell)},b^{(\ell)}): \ell=1,2,\dots,L\}$. Bayesian estimates of $\beta$, $\sigma^2$ and $b_i$ (for given $i$) can then be obtained as the sample means
$$\hat\beta=\frac{1}{L}\sum_{\ell=1}^{L}\beta^{(\ell)},\quad \hat\sigma^2=\frac{1}{L}\sum_{\ell=1}^{L}\sigma^{2(\ell)},\quad \hat b_i=\frac{1}{L}\sum_{\ell=1}^{L}b_i^{(\ell)}.$$
Similarly, consistent estimates of the posterior covariance matrices of $\beta$ and $\sigma^2$ can be obtained from the sample covariance matrices.

4. Numerical Examples

To investigate the behavior of our proposed model and the BLasso method under the Bayesian framework, we conducted four simulation studies and analyzed a real example from a prospective ophthalmology study.

4.1. Simulation Studies

In the first simulation study, we assume that, given the random effects $b_i=(b_{i1},b_{i2})^T$, the longitudinal percentage responses $y_{ij}$ are conditionally independent, and each $y_{ij}|b_i$ $(i=1,\dots,100,\ j=1,\dots,6)$ follows the simplex distribution $y_{ij}|b_i\sim S(\mu_{ij},\sigma^2)$. The conditional mean parameter $\mu_{ij}=E(y_{ij}|b_i)$ is specified as follows:
$$\mathrm{logit}(\mu_{ij})=x_{ij}^{T}\beta+z_{ij}^{T}b_i=\beta_0+x_{1ij}\beta_1+x_{2ij}\beta_2+x_{3ij}\beta_3+t_{ij}\beta_4+b_{i1}+t_{ij}b_{i2},$$
where $x_{1ij}$ randomly takes the value 1 or $-1$ with equal probability; $x_{2ij}$ and $x_{3ij}\overset{\text{i.i.d.}}{\sim}N(0,1)$; and $t_{ij}=0.2j$ for $j=0,\dots,5$. The true values of the parameters are specified as $\beta=(\beta_0,\dots,\beta_4)^T=(0.45, 0.00, 0.45, 0.00, 0.45)^T$, so the covariates with zero coefficients are unimportant, and $\sigma^2=1$. The true distribution of the random effects $b_i$ is assumed to be
$$b_{i1}\overset{\text{i.i.d.}}{\sim}N(0,0.8),\quad b_{i2}=b_{i2}^*-2\ \text{with}\ b_{i2}^*\overset{\text{i.i.d.}}{\sim}\Gamma(4,2),$$
so that the random effects cover both symmetric and skewed features with mean 0. A total of 500 Monte Carlo replications were conducted on the basis of this simulation setup.
In the second simulation study, 500 simulated datasets were generated using the same setup as in the first simulation study, except for the distribution of the random effects. That is, the random effects are distributed as
$$b_{i1}\overset{\text{i.i.d.}}{\sim}0.6\,N(-0.8,0.5)+0.4\,N(1.2,0.5)\quad\text{and}\quad b_{i2}\overset{\text{i.i.d.}}{\sim}0.6\,N(-0.8,0.5)+0.4\,N(1.2,0.5),$$
so that the random effects have bimodal features with mean 0.
For each dataset generated from the two abovementioned simulation studies, the hybrid algorithm combining the block Gibbs sampler and the Metropolis–Hastings algorithm, in conjunction with the BLasso method and the stick-breaking prior of the CDPMM, was used to produce Bayesian estimates of the parameters and random effects and to simultaneously select the important covariates. To investigate the convergence of these Bayesian algorithms, we computed the estimated potential scale reduction (EPSR, proposed by Gelman et al. [27]) of the parameters via three parallel sequences of observations based on three different initial values. It can be seen from Figure 1 that the EPSR values were less than 1.2 after about 7000 iterations in both simulations for all test runs. Therefore, $L=5000$ observations collected after 7000 iterations were used to compute the simulation results for all replications. The results obtained under simulations 1 and 2 are reported in Table 1, where 'Bias' is the difference between the true value and the mean of the estimates based on 500 replications, and 'RMS' is the root mean square of the differences between the true values and their corresponding estimates based on 500 replications. Compared with the Lasso from the frequentist view, the BLasso does not shrink the non-significant elements of $\beta$ exactly to 0, since a sampling-based method is involved. Thus, as suggested by Tang et al. [20], the criterion for variable selection is that a coefficient is viewed as 0 if its 95% credible interval includes zero. In Table 1, 'F0' denotes the proportion of the 500 replications in which the 95% credible interval for a regression parameter includes zero. The larger the F0 values for non-significant regression parameters, and the smaller the F0 values for significant parameters, the better the performance of the posited model.
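The zero-inclusion rule described above is straightforward to apply to the MCMC output; a minimal sketch (our own helper, not code from the paper):

```python
import numpy as np

def selected_covariates(beta_draws, level=0.95):
    """Variable-selection rule: a coefficient is set to zero when its
    credible interval (from the MCMC draws) contains zero.
    beta_draws: one row per posterior draw, one column per coefficient."""
    lo, hi = np.quantile(beta_draws, [(1 - level) / 2, (1 + level) / 2], axis=0)
    includes_zero = (lo <= 0.0) & (hi >= 0.0)
    return ~includes_zero      # True where the covariate is kept
```

Averaging `includes_zero` over replications is exactly the F0 proportion reported in Table 1.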
Examination of Table 1 indicates that (i) the Bayesian estimates of the unknown parameters $\beta$ and $\sigma^2$ were reasonably accurate in the two abovementioned simulation studies, since their absolute biases were less than 0.10 and their RMS values were less than 0.16; and (ii) the BLasso could correctly identify the zero and nonzero coefficients in most cases, because the F0 values corresponding to important covariates were less than 10%, whilst the F0 values corresponding to unimportant covariates were near 90%. On the other hand, to investigate the effectiveness of the CDPMM prior for the random effects, we introduced the following RMSE (root mean squared error) criterion in terms of the random effects:
$$\mathrm{RMSE}(\hat b_i)=\sqrt{\frac{1}{2L}\sum_{m=1}^{2}\sum_{l=1}^{L}\left(p_m(h_l)-\hat p_m(h_l)\right)^2},$$
where $p_m(\cdot)$ and $\hat p_m(\cdot)$ denote, respectively, the true density function of the random effect $b_{im}$ and the kernel density estimate based on the estimated values $\{\hat b_{im}: i=1,\dots,n\}$, and $h_l$ is chosen to be the $\frac{l}{L}$th quantile of the dataset $\{\hat b_{im}: i=1,\dots,n\}$. The sample quantiles of the estimated RMSE values are reported in Table 2. Furthermore, we chose a typical replication whose RMSE value equals the median over the 500 replications. On the basis of this selected replication, the estimated densities of $b_{i1}$ and $b_{i2}$ under the CDPMM prior are plotted against their corresponding true densities in Figure 2 and Figure 3, which indicate that the finite mixture of normal distributions can flexibly capture the symmetric, skewed and bimodal shapes of the random effects $b_i$. From Table 2, based on the results of the 500 replications in both simulations, the estimated means and standard deviations (SD) of the random effects $b_{i1}$ and $b_{i2}$ are close to their corresponding true values, and the 25%, 50% and 75% quantiles of the $\mathrm{RMSE}(\hat b_i)$ values are small and close to each other, which indicates that the CDPMM method is robust for estimating the random effects. All these findings indicate that (i) our proposed Bayesian procedure can capture the true behavior of $b_i$ well, regardless of its true distribution and form, and (ii) the BLasso can identify the true model with high probability.
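The RMSE criterion can be sketched with a plain Gaussian kernel density estimate (our own numpy-only implementation, shown for a single random-effect component; the paper does not specify its kernel or bandwidth, so Silverman's rule of thumb is an assumption here):

```python
import numpy as np

def gaussian_kde(samples, points):
    """Gaussian kernel density estimate with Silverman's rule-of-thumb
    bandwidth, evaluated at `points`."""
    n = samples.size
    h = 1.06 * samples.std(ddof=1) * n ** (-0.2)
    z = (points[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

def rmse_random_effect(b_hat, true_pdf, n_quantiles=100):
    """RMSE criterion for one component: compare the true density with
    the KDE of the estimated b_i at empirical quantiles h_l."""
    probs = np.arange(1, n_quantiles + 1) / n_quantiles
    h_l = np.quantile(b_hat, probs)
    return float(np.sqrt(np.mean((true_pdf(h_l) - gaussian_kde(b_hat, h_l)) ** 2)))
```

Averaging the squared errors over both components, as in the display above, recovers the paper's criterion.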
To compare the performance of the CDPMM prior for the random effects with the discrete DP given by Ishwaran and Zarepour [26] and Yang et al. [25], we conducted a third simulation study, in which 500 simulated datasets were generated using the same setup as in the first simulation study and fitted by the model with the discrete DP for the random effects. In the fourth simulation, we reanalyzed the 500 datasets generated in the second simulation using a parametric Bayesian approach with the random-effects distribution specified as a multivariate normal distribution. The aim of this simulation was to compare the semiparametric approach based on the CDPMM prior with the parametric approach based on the Gaussian prior from the Bayesian perspective. The results obtained under the third and fourth simulations are reported in Table 3. Our programs were written in Matlab. It took roughly 119.3 s and 186.9 s on an Intel(R) Xeon(R) Silver 4216 CPU @ 2.10 GHz (Intel, Santa Clara, CA, USA) server to run 12,000 iterations for our proposed CDPMM and the discrete DP, respectively, which indicates that the CDPMM method is much more efficient than the discrete DP in the considered simulations. It can be seen from Table 2 and Table 3 that (i) our proposed CDPMM and the discrete DP method have similar performance in terms of the 'Bias' and 'RMS' values, but the F0 values corresponding to the non-significant parameters under the proposed CDPMM prior are higher than those under the discrete DP prior; and (ii) the RMS values and the correct rates of variable selection based on the F0 values under the semiparametric CDPMM prior are better than those under the parametric Gaussian prior.

4.2. Real Example

In this section, the application of our proposed approach to skewed longitudinal proportional data is illustrated by the analysis of a prospective ophthalmology study [28] from the Bayesian perspective. The data were obtained from the Supplementary Materials of Song and Tan [29] and are available from https://biometrics.biometricsociety.org/home/archive/supplementary-materials, accessed on 5 September 2022. In this study, the eyes of 31 patients were injected before surgery with one of three concentration levels of the gas C3F8, and all patients were followed up three to eight times over a three-month period after surgery. The outcome variable was the percentage of gas volume remaining relative to the volume initially injected. These longitudinal proportional data were analyzed by Qiu et al. [1], Song and Tan [29] and Song et al. [30]; however, none of these authors conducted variable selection in the analysis of this dataset. Our scientific interest is to investigate the effects of the three initial C3F8 gas concentration levels and of time on the percentage of remaining gas volume, while selecting the important covariates via the BLasso method. Let the response y_ij denote the percentage of gas left in the eye for patient i at the jth follow-up day t_ij, and let y_ij | b_i ~ S(μ_ij, σ²). The conditional mean in our proposed semiparametric mixed-effects model is then given by
logit(μ_ij) = β_0 + β_1 log(t_ij) + β_2 log²(t_ij) + β_3 w_ij + b_i1 + b_i2 log(t_ij),
where t_ij is the time covariate (days since gas injection); w_ij is the gas concentration covariate, coded −1, 0 and 1 for the concentration levels 15%, 20% and 25%, respectively; and the random effects b_i1 and b_i2, which characterize the between-patient fluctuations of the intercept and the logarithmic-time effect, are specified by the CDPMM in (4).
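As an illustration of the fitted conditional mean, the sketch below plugs the point estimates from Table 4 into the logit link, with the random effects set to zero; the helper name and the evaluation point of 7 follow-up days are our illustrative choices.

```python
import numpy as np

def conditional_mean(t, w, beta, b=(0.0, 0.0)):
    """Conditional mean mu_ij of the fitted model:
    logit(mu) = beta0 + beta1*log(t) + beta2*log(t)**2 + beta3*w
                + b_i1 + b_i2*log(t)."""
    lt = np.log(t)
    eta = beta[0] + beta[1] * lt + beta[2] * lt ** 2 + beta[3] * w + b[0] + b[1] * lt
    return 1.0 / (1.0 + np.exp(-eta))  # inverse logit keeps mu in (0, 1)

# Point estimates from Table 4, evaluated at an illustrative
# follow-up time of 7 days and the middle concentration level (w = 0):
mu = conditional_mean(t=7.0, w=0.0, beta=(2.579, 0.093, -0.322, 0.368))
```

The negative β_2 estimate makes η decrease in log²(t_ij), so the expected remaining gas fraction decays over follow-up time, as the study design anticipates.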
The abovementioned MCMC algorithm was used to produce the joint Bayesian estimates of the parameters and random effects in this real example. In the implementation of the MCMC procedure, the hyperparameter values were taken to be the same as those used in the simulations. Similarly, the EPSR method described in the simulations was used to investigate the convergence of the algorithm. The EPSR values of all parameters against the iteration number are plotted in Figure 4, which indicates that the MCMC algorithm converged within 4000 iterations, since all EPSR values were less than 1.2 after about 4000 iterations. Hence, L = 4000 observations collected after 4000 iterations were used to calculate the Bayesian estimates of the parameters and random effects. The results, reported in Table 4 and Figure 5, indicate that (i) the estimated densities of the random effects b_i1 and b_i2 are bimodal and skewed, so the traditional normality assumption for random effects is inappropriate in this real example; and (ii) the squared logarithmic time (log²(t_ij)) was detected to be an important covariate with a significantly negative effect on the percentage of gas left in the eye, since its corresponding 95% credible interval did not include zero, whereas the gas concentration level (w_ij) and the logarithmic time (log(t_ij)) were insignificant at the 0.05 level because their 95% credible intervals included zero.
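The EPSR used above is the estimated potential scale reduction of Gelman [27]. A minimal sketch for one scalar parameter, assuming several parallel chains are stored as rows of an array (the simple non-split variant shown here is our choice of variant, not necessarily the exact formula in the paper's code):

```python
import numpy as np

def epsr(chains):
    """Estimated potential scale reduction (EPSR) for one scalar
    parameter; `chains` has shape (n_chains, n_draws).  Values below
    1.2 are read as evidence of convergence."""
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(2)
r = epsr(rng.normal(0.0, 1.0, size=(3, 2000)))  # well-mixed chains give EPSR near 1
```

For chains that explore the same distribution, the between-chain and within-chain variances agree and the ratio tends to 1; values well above 1 signal that the chains have not yet mixed.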

5. Conclusions

In this paper, we introduced a new semiparametric simplex mixed-effects model in which the random effects follow a centered Dirichlet process mixture model (CDPMM). The advantages of the proposed model are that (i) it can capture the features of skewed and bimodal longitudinal proportional data, and (ii) it can accommodate absolutely continuous distributions for the random effects. The novelty of our approach is that the BLasso procedure simultaneously estimates the parameters of interest, provides credible intervals (CIs) for them, and conducts both shrinkage and variable selection for the considered models. A hybrid algorithm combining the Gibbs sampler and the MH algorithm was used to obtain Bayesian estimates of the unknown parameters and random effects, together with their standard errors and credible intervals. Empirical results show that (i) the proposed semiparametric Bayesian method provides quite accurate parameter estimates (see Table 1); (ii) the average frequencies of correctly identifying unimportant predictors are near 90%; and (iii) the CDPMM can effectively capture the features of normal, gamma and normal-mixture distributions (see Table 2 and Figure 2, Figure 3 and Figure 5).

Author Contributions

Conceptualization, A.T. and X.D.; methodology, X.D. and A.T.; software, A.T. and Y.Z.; validation, A.T., X.D. and Y.Z.; formal analysis, A.T. and X.D.; investigation, A.T., X.D. and Y.Z.; original draft preparation, X.D. and A.T.; visualization, A.T. and Y.Z.; supervision and funding acquisition, A.T., X.D. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 11961079, No. 12161014, No. 11761016), the Guizhou Provincial Science and Technology Project ([2020]1Y009), the Natural Science Research Project of Education Department of Guizhou Province (KY[2021]134), the Project of High Level Creative Talents in Guizhou Province of China, and Guiyang University Multidisciplinary Team Construction Projects in 2021[2021-xk04].

Data Availability Statement

The research data are available on the website: https://biometrics.biometricsociety.org/home/archive/supplementary-materials, accessed on 5 September 2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Qiu, Z.G.; Song, P.X.K.; Tan, M. Simplex mixed-effects models for longitudinal proportional data. Scand. J. Stat. 2008, 35, 577–596.
2. Zhang, W.; Wei, H. Maximum likelihood estimation for simplex distribution nonlinear mixed models via the stochastic approximation algorithm. Rocky Mt. J. Math. 2008, 38, 1863–1875.
3. Zhao, Y.Y.; Xu, D.K.; Duan, X.D.; Dai, L. Bayesian estimation of simplex distribution nonlinear mixed models for longitudinal data. Int. J. Appl. Math. Stat. 2014, 52, 1–10.
4. Bonat, W.H.; Lopes, J.E.; Shimakura, S.E.; Ribeiro, P.J., Jr. Likelihood analysis for a class of simplex mixed models. Chil. J. Stat. 2018, 8, 3–7.
5. Quintero, F.O.L. Sensitivity analysis for variance parameters in Bayesian simplex mixed models for proportional data. Commun. Stat. Simul. Comput. 2017, 46, 5212–5228.
6. Kleinman, K.P.; Ibrahim, J.G. A semiparametric Bayesian approach to generalized linear mixed models. Stat. Med. 1998, 17, 2579–2596.
7. Tang, N.S.; Duan, X.D. A semiparametric Bayesian approach to generalized partial linear mixed models for longitudinal data. Comput. Stat. Data Anal. 2012, 56, 4348–4365.
8. Tang, N.S.; Zhao, Y.Y. Semi-parametric Bayesian analysis of nonlinear reproductive dispersion mixed models for longitudinal data. J. Multivar. Anal. 2013, 115, 68–83.
9. Zhao, Y.Y.; Xu, D.K.; Duan, X.D.; Du, J. A semiparametric Bayesian approach to binomial distribution logistic mixed-effects models for longitudinal data. J. Stat. Comput. Simul. 2022, 92, 1438–1456.
10. Duan, X.D.; Fung, W.K.; Tang, N.S. Bayesian semiparametric reproductive dispersion mixed models for non-normal longitudinal data: Estimation and case influence analysis. J. Stat. Comput. Simul. 2017, 87, 1925–1939.
11. Hocking, R.R. The analysis and selection of variables in linear regression. Biometrics 1976, 32, 1–51.
12. Kass, R.E.; Raftery, A.E. Bayes factors. J. Am. Stat. Assoc. 1995, 90, 773–795.
13. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
14. Spiegelhalter, D.J.; Best, N.; Carlin, B.P.; van der Linde, A. Bayesian measures of model complexity and fit (with discussion). J. R. Stat. Soc. Ser. B 2002, 64, 583–639.
15. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
16. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320.
17. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
18. Park, T.; Casella, G. The Bayesian lasso. J. Am. Stat. Assoc. 2008, 103, 681–686.
19. Guo, R.; Zhu, H.; Chow, S.M.; Ibrahim, J.G. Bayesian lasso for semiparametric structural equation models. Biometrics 2012, 68, 567–577.
20. Tang, A.M.; Zhao, X.; Tang, N.S. Bayesian variable selection and estimation in semiparametric joint models of multivariate longitudinal and survival data. Biom. J. 2017, 59, 57–78.
21. van Erp, S.; Oberski, D.L.; Mulder, J. Shrinkage priors for Bayesian penalized regression. J. Math. Psychol. 2019, 89, 31–50.
22. Barndorff-Nielsen, O.E.; Jørgensen, B. Some parametric models on the simplex. J. Multivar. Anal. 1991, 39, 106–116.
23. Ohlssen, D.I.; Sharples, L.D.; Spiegelhalter, D.J. Flexible random-effects models using Bayesian semiparametric models: Applications to institutional comparisons. Stat. Med. 2007, 26, 2088–2112.
24. Sethuraman, J. A constructive definition of Dirichlet priors. Stat. Sin. 1994, 4, 639–650.
25. Yang, M.G.; Dunson, D.B.; Baird, D. Semiparametric Bayes hierarchical models with mean and variance constraints. Comput. Stat. Data Anal. 2010, 54, 2172–2186.
26. Ishwaran, H.; Zarepour, M. Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika 2000, 87, 371–390.
27. Gelman, A. Inference and monitoring convergence. In Markov Chain Monte Carlo in Practice; Gilks, W.R., Richardson, S., Spiegelhalter, D.J., Eds.; Chapman and Hall: London, UK, 1996.
28. Meyers, S.M.; Ambler, J.S.; Tan, M.; Werner, J.C.; Huang, S.S. Variation of perfluoropropane disappearance after vitrectomy. Retina 1992, 4, 359–363.
29. Song, P.X.K.; Tan, M. Marginal models for longitudinal continuous proportional data. Biometrics 2000, 56, 496–502.
30. Song, P.X.K.; Qiu, Z.G.; Tan, M. Modelling heterogeneous dispersion in marginal models for longitudinal continuous proportional data. Biom. J. 2004, 46, 540–553.
Figure 1. EPSR values of all parameters against iteration numbers for a randomly selected replication in the first simulation (left panel) and second simulation (right panel).
Figure 2. Estimated densities versus true densities for random effects b i 1 and b i 2 in the first simulation.
Figure 3. Estimated densities versus true densities for random effects b i 1 and b i 2 in the second simulation.
Figure 4. EPSR values of all parameters against iteration numbers in the ophthalmology study.
Figure 5. Estimated densities for random effects b i 1 and b i 2 in the ophthalmology study.
Table 1. Bayesian estimates of parameters in the first and second simulation studies.

Par.    True     Simulation 1                  Simulation 2
                 Bias      RMS     F0 (%)      Bias      RMS     F0 (%)
β_0     −0.45    0.054     0.117   1.60        0.031     0.143   5.40
β_1     0.00     0.007     0.109   89.20       0.002     0.105   88.20
β_2     0.45     0.017     0.116   1.40        0.018     0.132   3.00
β_3     −0.00    −0.001    0.141   94.60       −0.003    0.153   93.40
β_4     0.45     −0.074    0.136   3.80        −0.051    0.146   8.20
σ²      1.00     0.006     0.070   —           0.009     0.073   —

Note: 'Bias' denotes the difference between the true value and the mean of the estimates over 500 replications; 'RMS' denotes the root mean square of the differences between the true values and their corresponding estimates over 500 replications; 'F0' denotes the proportion of the 500 replications in which the 95% credible interval for the regression parameter includes zero.
Table 2. Estimated means, standard deviations and RMSE quantiles of random effects in the first and second simulation studies.

Random          Simulation 1                       Simulation 2
effects         Mean    Est Mean  SD      Est SD   Mean    Est Mean  SD      Est SD
b_i1            0.000   0.040     0.894   0.831    0.000   −0.030    1.100   1.032
b_i2            0.000   0.063     1.000   0.903    0.000   0.031     1.100   0.995

                Quantiles of Simulation 1          Quantiles of Simulation 2
                5%      25%       75%              5%      25%       75%
RMSE(b̂_i)      0.031   0.039     0.050            0.079   0.084     0.095

Note: 'Mean' denotes the true empirical mean of the distribution; 'Est Mean' denotes the mean of the posterior samples; 'SD' denotes the true empirical standard deviation of the distribution; 'Est SD' denotes the standard deviation of the posterior samples.
Table 3. Bayesian estimates of parameters in the third and fourth simulation studies.

Par.    True     Simulation 3                  Simulation 4
                 Bias      RMS     F0 (%)      Bias      RMS     F0 (%)
β_0     −0.45    0.005     0.101   1.20        0.064     0.151   8.40
β_1     0.00     0.000     0.104   86.60       −0.009    0.130   84.40
β_2     0.45     −0.001    0.123   1.40        0.033     0.148   3.80
β_3     −0.00    0.005     0.166   89.20       0.007     0.185   90.00
β_4     0.45     −0.003    0.119   2.00        −0.068    0.143   11.40
σ²      1.00     0.093     0.134   —           0.006     0.078   —

Note: 'Bias' denotes the difference between the true value and the mean of the estimates over 500 replications; 'RMS' denotes the root mean square of the differences between the true values and their corresponding estimates over 500 replications; 'F0' denotes the proportion of the 500 replications in which the 95% credible interval for the regression parameter includes zero.
Table 4. Bayesian estimates (BEs), standard deviations (SDs) and 95% credible intervals (CIs) for parameters in the ophthalmology study.

Par.    BE       SD       CI
β_0     2.579    0.257    (2.200, 3.031)
β_1     0.093    0.172    (−0.232, 0.427)
β_2     −0.322   0.045    (−0.412, −0.241)
β_3     0.368    0.281    (−0.131, 0.697)
σ²      9.664    1.317    (7.406, 12.516)
Tang, A.; Duan, X.; Zhao, Y. Bayesian Variable Selection and Estimation in Semiparametric Simplex Mixed-Effects Models with Longitudinal Proportional Data. Entropy 2022, 24, 1466. https://doi.org/10.3390/e24101466
