Article

Longitudinal Data Analysis Based on Bayesian Semiparametric Method

1 School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, China
2 Department of Statistics and Data Science, BNU-HKBU United International College, Zhuhai 519087, China
3 Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai 519087, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Axioms 2023, 12(5), 431; https://doi.org/10.3390/axioms12050431
Submission received: 31 December 2022 / Revised: 23 April 2023 / Accepted: 24 April 2023 / Published: 27 April 2023
(This article belongs to the Special Issue Computational Statistics & Data Analysis)

Abstract:
A Bayesian semiparametric model framework is proposed to analyze multivariate longitudinal data. The new framework leads to simple explicit posterior distributions of model parameters. It results in easy implementation of the MCMC algorithm for estimation of model parameters and demonstrates fast convergence. The proposed model framework associated with the MCMC algorithm is validated by four covariance structures and a real-life dataset. A simple Monte Carlo study of the model under four covariance structures and an analysis of the real dataset show that the new model framework and its associated Bayesian posterior inferential method through the MCMC algorithm perform fairly well in the sense of easy implementation, fast convergence, and smaller root mean square errors compared with the same model without the specified autoregression structure.

1. Introduction

Longitudinal data arise from repeated observations from the same individual or group of individuals at different time points. The structure of longitudinal data is shown in the following Table 1. The basic tasks of analyzing longitudinal data can be summarized as (1) studying the trend in the covariance structure of the observed variables with respect to time; (2) discovering the influence of covariates on the observable variables; and (3) determining the within-group correlation structures [1].
Longitudinal data are often highly unbalanced, and it is usually difficult to apply traditional multiple regression techniques directly to such data. Statisticians have developed various Bayesian statistical inference models for longitudinal data analysis [2]. A parametric assumption may result in modeling bias, which can be reduced by relaxing the assumed parametric structure. Nonparametric methods are robust because they do not require parametric model assumptions. Semiparametric models integrate the characteristics of parametric and nonparametric models and are both flexible and easy to interpret. Nonparametric and semiparametric statistical methods are not only active topics in current statistical research but are also widely used in many practical applications. Xiang et al. [3] summarize some common outcomes of nonparametric regression analysis of longitudinal data. Bayesian methods for parametric linear mixed models have been widely used in different areas. Assuming a normal random effect and using standard Gibbs sampling to realize some simple posterior inference, Quintana et al. [4] extend the general class of mixed models for longitudinal data by generalizing the Gaussian process (GP) part to the nonparametric case. The GP is a probabilistic approach to learning nonparametric models. Cheng et al. [5] propose the so-called LonGP, a flexible and interpretable nonparametric modeling framework. It provides a versatile software implementation that can solve commonly faced challenges in longitudinal data analysis, and it develops a fully Bayesian, predictive inference for LonGP that can be employed to carry out model selection. Kleinman and Ibrahim [6] relax the normal assumption by assuming a Dirichlet process prior with a Gaussian measure of zero mean, semiparametrically modeling the random effects distribution [7]. Although a parametric model may have limitations in some applications, it is simpler than a nonparametric model, whose scope may be too wide to draw a concise conclusion. Nonparametric regression has a well-known weakness, the "curse of dimensionality": when the independent variable X is multivariate, the estimation accuracy of the regression function deteriorates rapidly as the dimension of X increases. The semiparametric model is a good compromise that retains desirable characteristics of both parametric and nonparametric models [8]. There is a large literature on the semiparametric modeling of longitudinal data, most of which employs random effects to model within-group correlations [9].
Semiparametric linear mixed models generalize traditional linear mixed models by modeling one covariate effect with a nonparametric function while modeling the other covariate effects parametrically. Existing semiparametric linear mixed models mainly use frequentist methods for statistical inference and assume normal random effects [10]. In this paper, we employ the Bayesian semiparametric framework for linear mixed models given by Quintana et al. [4] and impose stronger conditions on the random effect to obtain explicit posterior distributions of the model parameters and fast convergence of the MCMC algorithm in the posterior inference. The proposed framework generalizes the default priors of the variance components and adjusts the inference of fixed effects associated with nonparametric random effects. The latter includes the extrapolation of nonparametric mean functions over time [11,12]. The stochastic process approach in [4] is a good choice for characterizing the intragroup correlation. The Gaussian process (GP) covariance has an exponential form and is uniquely determined by two parameters, so a GP can specify autoregressive correlation (the AR covariance structure, [13]). The GP with this exponential covariance is an Ornstein–Uhlenbeck process (OUP); a nonparametric Dirichlet process (DP) prior is assigned to its covariance parameter, and a partial Dirichlet process mixture (DPM) is performed on the OUP. By imposing stronger conditions on the random effect and decomposing the OUP into single Gaussian variables, we can substantially simplify the MCMC sampling in the posterior inference. The framework of our proposed semiparametric autoregression model (SPAR) is shown in Figure 1.
This paper is organized as follows. Section 2 briefly introduces some basic theoretical background and principles. Section 3 reviews the Ornstein–Uhlenbeck (OU) process that specifies the autoregressive structure. In Section 4, the Dirichlet process and the Dirichlet process mixture (DPM) are briefly introduced. The formulation of the semiparametric autoregressive model, based on a partial Dirichlet process mixture of the OU process, is given in Section 5. Section 6 derives the marginal likelihood, prior determination, and posterior inference. Section 7 gives a simple Monte Carlo study and a real dataset application based on the proposed model. Some concluding remarks are given in the last section.

2. Theoretical Basis

2.1. A General Linear Hybrid Model Containing an AR Structure

For an observation object i ( i = 1, 2, …, n ), we denote the observation time points by t_i1, t_i2, …, t_in_i. Let y_i = ( y_i1, y_i2, …, y_in_i ) ( n_i × 1 ) be the observation vector from object i. At time point t_ij, we consider the possible time-dependent covariate vector x_ij = ( 1, x_i1(t_ij), x_i2(t_ij), …, x_ip(t_ij) ). Let E( y_ij ) = x_ij β. Define the n_i × ( p + 1 )-dimensional fixed-effect design matrix X_i = ( x_i1, x_i2, …, x_in_i ). Assume E( y_i ) = X_i β. Define the corresponding n_i × q ( q ≤ p )-dimensional random-effects design matrix Z_i = ( z_i1, z_i2, …, z_in_i ), where z_ij = ( 1, z_i1(t_ij), z_i2(t_ij), …, z_iq(t_ij) ). The general linear mixed model containing an AR covariance structure is
$$ y_i = X_i \beta + Z_i b_i + S_i + \epsilon_i, \qquad b_i \mid \tau \sim N(0, D(\tau)), \quad S_i \mid \theta \sim N(0, C_i(\theta)), \quad \epsilon_i \sim N(0, \sigma^2 I_{n_i}), $$
where X_i β stands for the fixed effect and Z_i b_i for the random effect, S_i is a stochastic process characterizing the correlation among the n_i observations from object i, and ϵ_i is the error term, i = 1, 2, …, n. C_i(θ) is an n_i × n_i structural covariance matrix, and the vectors τ and θ contain the variance–covariance parameters of the random vector b_i and the stochastic process S_i, respectively. I_{n_i} stands for the identity matrix of dimension n_i × n_i. The AR structure is usually specified by a GP with zero mean. The GP is uniquely determined by a covariance function containing the parameter θ = ( σ_s, ρ ). The vector S_i is the GP corresponding to the i-th object; S_i is generated sequentially by the GP { s_i(t) : t > 0 } at the observation time points { t_i1, t_i2, …, t_in_i }, i.e.,
$$ S_i = \big( s_i(t_{i1}),\ s_i(t_{i2}),\ \ldots,\ s_i(t_{in_i}) \big). $$
To specify the AR structure, it is assumed that the above GP possesses stationarity:
$$ \operatorname{Cov}\big( s_i(t_{il}),\ s_i(t_{ik}) \big) = \sigma_s^2\, \rho\big( |t_{il} - t_{ik}| \big). $$
When the model has a structured covariance function, the covariance matrix of y i is
$$ \operatorname{Cov}(y_i) = Z_i D(\tau) Z_i' + C_i(\theta) + \sigma^2 I_{n_i}. $$
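To make the structured covariance concrete, the following is a minimal sketch (not the authors' code) that assembles Cov(y_i) for one object with a random intercept and the exponential (OU-type) structure C_i(θ)[k, l] = σ_s² ρ^{|t_k − t_l|}; all numerical values are illustrative.

```python
# A minimal sketch of the marginal covariance Cov(y_i) = Z_i D(tau) Z_i' + C_i(theta) + sigma^2 I
# for one object, assuming a random intercept and the exponential (OU) structure.
import numpy as np

t = np.array([0.0, 0.5, 1.3, 2.0])          # observation times for one object (illustrative)
n_i = len(t)

Z = np.ones((n_i, 1))                        # random-intercept design matrix
D = np.array([[2.0]])                        # Var(b_i) = tau (illustrative value)
sigma2, sigma_s2, rho = 1.0, 3.0, 0.6        # error variance, GP variance, correlation parameter

# exponential (OU) structural covariance: sigma_s^2 * rho^{|t_k - t_l|}
C = sigma_s2 * rho ** np.abs(t[:, None] - t[None, :])

cov_y = Z @ D @ Z.T + C + sigma2 * np.eye(n_i)
print(np.round(cov_y, 3))
```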

2.2. The Principle for Bayesian Inference

The Bayesian method assumes a prior distribution on the unknown parameter θ and a joint distribution p(x, θ) between an observable variable X and the parameter θ. The Bayesian method bases inference on the posterior distribution π(θ | x) of the unknown parameter θ after observed data are available. The joint distribution, the prior distribution, and the posterior distribution are related to each other as follows:
$$ p(x, \theta) = \pi(\theta \mid x)\, p_X(x), \qquad p_X(x) = \int_{\Theta} p(x, \theta)\, d\theta = \int_{\Theta} p(x \mid \theta)\, \pi(\theta)\, d\theta, $$
where p X ( x ) stands for the sample marginal density of x . We use the posterior distribution π ( θ | x ) to carry out statistical inference for the parameter θ :
$$ \pi(\theta \mid x) = \frac{p(x, \theta)}{p_X(x)} = \frac{p(x \mid \theta)\, \pi(\theta)}{\int_{\Theta} p(x \mid \theta)\, \pi(\theta)\, d\theta}. $$
Equation (4) is the well-known Bayesian formula. Bayesian statistical inference assumes that the posterior distribution of the unknown parameter θ , π ( θ | x ) contains all the available information. As a result, the point estimation, interval estimation, hypothesis testing, and predicting inference of θ can be implemented as usual. Because
$$ \pi(\theta \mid x) = \frac{\pi(\theta)\, p(x \mid \theta)}{\int_{\Theta} \pi(\theta)\, p(x \mid \theta)\, d\theta} \propto \pi(\theta)\, p(x \mid \theta), $$
we construct the Bayesian estimate for θ after observing data x by their conditional expectation:
$$ \hat{\theta} = E(\theta \mid x) = \frac{\int_{\Theta} \pi(\theta)\, p(x \mid \theta)\, \theta\, d\theta}{\int_{\Theta} \pi(\theta)\, p(x \mid \theta)\, d\theta} \propto \int_{\Theta} \pi(\theta)\, p(x \mid \theta)\, \theta\, d\theta. $$
In general, the Bayesian estimate g ^ ( θ ) for a function of θ , g ( θ ) can be obtained as follows:
$$ \hat{g}(\theta) = E[g(\theta) \mid x] = \frac{\int_{\Theta} \pi(\theta)\, p(x \mid \theta)\, g(\theta)\, d\theta}{\int_{\Theta} \pi(\theta)\, p(x \mid \theta)\, d\theta} \propto \int_{\Theta} \pi(\theta)\, p(x \mid \theta)\, g(\theta)\, d\theta. $$
Obviously, the estimator g ^ ( θ ) is the usual expectation of g ( θ ) with respect to the posterior distribution π ( θ | x ) :
$$ \hat{g}(\theta) = \int_{\Theta} g(\theta)\, \pi(\theta \mid x)\, d\theta = E\big[ g(\theta) \mid x \big]. $$

2.3. The MCMC Sampling and Its Convergence

When the posterior distribution π(θ | x) in (6) is difficult to compute, estimation of parameter functions g(θ) can be realized by the Markov chain Monte Carlo (MCMC) method [14] following these steps: (1) construct a Markov chain whose stationary distribution is π(θ | x); (2) use this Markov chain to sample from the posterior distribution π(θ | x) and obtain an MCMC sample { θ_0, θ_1, …, θ_n }; and (3) obtain the MCMC estimator ĝ(θ) for g(θ) by
$$ \hat{g}(\theta) = \bar{g}(\theta) = \frac{1}{n} \sum_{i=1}^{n} g(\theta_i). $$
The MCMC sample estimates the parameter function g(θ) more and more accurately as the sample size n becomes larger. The MCMC sample { θ_0, θ_1, θ_2, … } possesses properties such as stationarity, positive recurrence, aperiodicity, and irreducibility, and the choice of the initial value θ_0 has little impact on the estimate of θ_t. Gibbs sampling is one of the MCMC algorithms; it is used to draw samples from a multivariate probability distribution. The defect of standard Gibbs sampling is that it cannot handle nonconjugate distributions. Because the prior distribution of the parameter ρ in the model in this paper is nonconjugate, an improved version of the Gibbs algorithm is adopted [15].
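The following is a minimal sketch of such an improved Gibbs sweep: conjugate blocks would be drawn directly, while a nonconjugate parameter (denoted rho here) is updated with a random-walk Metropolis step. The placeholder density log_post_rho and all settings are ours for illustration and are not the paper's actual full conditional.

```python
# A Metropolis-within-Gibbs sketch: conjugate parameters are drawn directly (omitted),
# while a nonconjugate parameter rho in (0, 1) gets a random-walk Metropolis update.
import numpy as np

rng = np.random.default_rng(0)

def log_post_rho(rho, state):
    """Unnormalized log full conditional of rho; a placeholder Beta(2, 2)-shaped kernel."""
    if not (0.0 < rho < 1.0):
        return -np.inf
    return np.log(rho) + np.log(1.0 - rho)   # replace with the model's p(rho | -)

def metropolis_step_rho(rho, state, step=0.1):
    prop = rho + step * rng.normal()                              # symmetric random-walk proposal
    log_ratio = log_post_rho(prop, state) - log_post_rho(rho, state)
    return prop if np.log(rng.uniform()) < log_ratio else rho    # accept/reject

state = {"rho": 0.5}
for _ in range(2000):
    # state["beta"]   = draw_beta(state)     # conjugate normal draw (omitted)
    # state["sigma2"] = draw_sigma2(state)   # conjugate inverse-gamma draw (omitted)
    state["rho"] = metropolis_step_rho(state["rho"], state)
print(round(state["rho"], 3))
```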
With regard to the convergence of the MCMC sampling, we consider three convergence criteria: (1) Combine the posterior sampling trend graph with the energy graph of the MCMC sampling process for convergence assessment. By observing the posterior sampling trend graph of a random variable, we can determine whether the sampled information stabilizes as the number of iterations increases. (2) Compare the marginal distribution of the energy levels with the distribution of the energy changes between successive samples. If the two distributions are similar to each other, we can conclude that the algorithm converges. (3) Draw the trajectory of the negative evidence lower bound (ELBO, [16]) obtained in the optimization process to verify the convergence of the algorithm. Minimizing the KL (Kullback–Leibler) divergence is equivalent to minimizing the negative ELBO [17,18].

3. The OU Process

The random process S_i in the general linear mixed model (1) is a Gaussian process (GP). A GP can be regarded as an infinite-dimensional extension of the multivariate Gaussian distribution, and its probability characteristics are uniquely determined by its mean function and covariance function. In particular, a GP with zero mean is completely determined by its covariance function [18]. To give a simple review of the GP while keeping the notation S_i reserved for model (1), we use the general notation X(t) for a GP in this section; X(t) can be considered a copy of S_i in model (1). If the random process { X(t), t ∈ T } is a GP, any finite observation vector x = ( X_{t_1}, …, X_{t_n} ) has a multivariate Gaussian distribution. Let E[X(t)] = m_X(t) and var[X(t)] = σ_X²(t). The joint density function of the multivariate Gaussian random vector is
$$ f\big( x_{t_1}, \ldots, x_{t_n} \big) = (2\pi)^{-\frac{n}{2}} \big| C_X \big|^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2} \big( x - m_X \big)' C_X^{-1} \big( x - m_X \big) \right\}, $$
where x = ( x(t_1), …, x(t_n) ), the mean vector is m_X = ( m_X(t_1), …, m_X(t_n) ), and C_X stands for the covariance matrix. Let c_ij = cov( X(t_i), X(t_j) ) = E[ ( X(t_i) − m_X(t_i) )( X(t_j) − m_X(t_j) ) ]. When the Gaussian process { X_t = X(t), t ∈ T } has zero mean and is stationary, we have c_ij = E[ X(t_i) X(t_j) ] and
$$ E[X(t)] = 0, \qquad \operatorname{var}[X(t)] = \sigma_X^2, \qquad R_{XX}(X_t, X_s) = E[X_t X_s] = R_X(t - s), \quad t > s, \qquad C_{XX}(X_t, X_s) = \sigma_X^2 R_X(t - s) = C_X(t - s), \quad t > s, $$
where R_{XX}(X_t, X_s) and C_{XX}(X_t, X_s) stand for the GP autocorrelation function and the autocovariance function, respectively, which only depend on the time interval t − s. Thus, R_X(t − s) = ρ(|t − s|), and c_ij = cov( X_{t_i}, X_{t_j} ) = σ_X² ρ(|t_i − t_j|). The correlation coefficient between X(t_i) and X(t_j) is denoted by ρ(|t_i − t_j|), and it is assumed that ρ(|t_i − t_j|) = ρ^{|t_i − t_j|} in this paper. A zero-mean stationary GP { X(t), t ∈ T } with this exponential correlation is an Ornstein–Uhlenbeck (OU) process [19]. An OU process can be regarded as a continuous-time analogue of the discrete-time first-order autoregressive AR(1) process.
To understand some properties of the AR(1) process, we express the AR(1) process as
$$ X_t = \phi_1 X_{t-1} + \epsilon_t, \quad t \geq 1, \qquad X_0 = \epsilon_0, $$
where | ϕ_1 | < 1 is a weight parameter ensuring stationarity, and ϵ_0, ϵ_1, ϵ_2, … are uncorrelated random variables satisfying
$$ E(\epsilon_t) = 0, \ t \geq 0, \qquad \operatorname{Var}(\epsilon_t) = \begin{cases} \dfrac{\sigma^2}{1 - \phi_1^2}, & t = 0, \\[4pt] \sigma^2, & t \geq 1. \end{cases} $$
It is assumed that the random errors satisfy cov(ϵ_t, ϵ_{t+k}) = 0 for k ≠ 0. Therefore,
$$ \operatorname{cov}(X_t, X_{t+k}) = \operatorname{cov}\left( \sum_{i=0}^{t} \phi_1^{t-i} \epsilon_i,\ \sum_{i=0}^{t+k} \phi_1^{t+k-i} \epsilon_i \right) = \sum_{i=0}^{t} \phi_1^{t-i} \phi_1^{t+k-i} \operatorname{cov}(\epsilon_i, \epsilon_i) = \sigma^2 \phi_1^{k} \left( \frac{\phi_1^{2t}}{1 - \phi_1^2} + \sum_{i=1}^{t} \phi_1^{2(t-i)} \right) = \frac{\sigma^2 \phi_1^{k}}{1 - \phi_1^2}. $$
The autocorrelation function ρ_k therefore satisfies the first-order difference equation
$$ \rho_k = \phi_1 \rho_{k-1}, \quad k \geq 1, \qquad \rho_0 = 1, $$
so that we have the following solution:
$$ \rho_k = \phi_1^{k}, \quad k \geq 0. $$
As shown in Figure 2 below, when | ϕ_1 | < 1, the absolute correlation between X(t_i) and X(t_j) at two different time points t_i and t_j approaches 0 as the time lag increases.
The above AR(1) process (7) is a special case of an OU process obtained by taking ϕ_1 = 1 − θ and μ = 0 in
$$ X_{t+1} = X_t + \theta(\mu - X_t) + \epsilon_{t+1}, \qquad X_0 = \epsilon_0. $$
Therefore, the exponential covariance matrix constructed from the OU process can be used to describe the correlation in longitudinal data. An OU process is shown in Figure 3. It is obvious that as X(t) moves toward its mean position with increasing time t, the correlation between X(t_i) and X(t_j) at two different time points becomes weaker and weaker.
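The exponential decay of the autocorrelation can be checked with a short simulation. The following sketch (ours, with illustrative values) simulates the AR(1) recursion (7) started from its stationary distribution and compares the empirical lag-k correlation with the theoretical value ϕ_1^k.

```python
# Simulate the AR(1)/discrete OU process and check rho_k = phi1^k empirically.
import numpy as np

rng = np.random.default_rng(1)
phi1, sigma, T = 0.7, 1.0, 50_000

x = np.empty(T)
x[0] = rng.normal(scale=sigma / np.sqrt(1 - phi1**2))   # stationary start: Var = sigma^2 / (1 - phi1^2)
for t in range(1, T):
    x[t] = phi1 * x[t - 1] + rng.normal(scale=sigma)

for k in (1, 2, 5):
    emp = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(f"lag {k}: empirical {emp:.3f} vs theoretical {phi1**k:.3f}")
```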

4. Dirichlet Process and Dirichlet Process Mixture

4.1. Dirichlet Process

A Dirichlet process (DP) is a class of stochastic processes whose realizations are probability distributions. The DP is often used in Bayesian inference to describe prior knowledge about the distribution of random variables.
Definition 1.
Given a measurable set C, a basis distribution G_0, and a positive real number α, a Dirichlet process DP(G_0, α) is a random process whose realization is a probability distribution on C. For any measurable finite partition { B_i }_{i=1}^{n} of C, if G ∼ DP(G_0, α), then
$$ \big( G(B_1), \ldots, G(B_n) \big) \sim \mathrm{Dir}\big( \alpha G_0(B_1), \ldots, \alpha G_0(B_n) \big), $$
where D i r stands for the Dirichlet distribution. A DP is specified by the basis distribution G 0 and the concentration parameter α.
A DP can also be regarded as an infinite-dimensional generalization of the n-dimensional Dirichlet distribution. A DP is the conjugate prior of an infinite, nonparametric discrete distribution. An important application of the DP is as a prior distribution for infinite mixture models. A statistically equivalent description of the DP is based on the stick-breaking process, described as follows. Given a discrete distribution
$$ G(\rho) = \sum_{k=1}^{\infty} w_k\, \delta_{\rho_k}(\rho), $$
where δ ρ k is an indicator function, namely
$$ \delta_{\rho_k}(\rho) = \begin{cases} 1, & \rho = \rho_k; \\ 0, & \rho \neq \rho_k, \end{cases} $$
where ρ_k ∼ iid G_0, w_k = v_k ∏_{i=1}^{k−1} (1 − v_i), and v_i ∼ iid Beta(1, α). The probability distribution G defined in this way is said to follow a Dirichlet process, denoted as G ∼ DP(G_0, α). A DP can thus be constructed by a stochastic process. It possesses advantages in applications to Bayesian model evaluation, density estimation, and mixture-model clustering [20].
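A truncated version of this stick-breaking construction is easy to simulate. The sketch below (ours; the truncation level K and the choice G_0 = N(0, 10) are illustrative) generates the weights w_k and atoms ρ_k and then draws one value from the resulting random measure.

```python
# Truncated stick-breaking sketch of G ~ DP(G0, alpha):
# w_k = v_k * prod_{i<k}(1 - v_i), v_i ~ Beta(1, alpha), rho_k ~ G0.
import numpy as np

rng = np.random.default_rng(2)
alpha, K = 0.75, 50                                          # concentration and truncation level

v = rng.beta(1.0, alpha, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))    # stick-breaking weights
atoms = rng.normal(0.0, np.sqrt(10.0), size=K)               # rho_k ~ G0 = N(0, 10) (illustrative)

# a draw from (a truncation of) the random measure G: pick an atom with probability w_k
draw = rng.choice(atoms, p=w / w.sum())
print(f"largest weight {w.max():.3f}, sampled atom {draw:.3f}")
```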

4.2. Dirichlet Process Mixture

We consider a set of i.i.d. sample data { y_1, …, y_n } as part of an infinitely exchangeable sequence. Each y_i follows a probability distribution F(y; θ) with parameter θ ∈ Θ, that is, y_i | θ ∼ F(y; θ). The prior distribution of θ may be taken as an unknown random probability measure G, and G can be constructed by a DP, that is, θ | G ∼ G, G ∼ DP(G_0, α). Thus, the Dirichlet process mixture (DPM) model is defined by
$$ y_i \mid \theta \sim F(y; \theta), \qquad \theta \mid G \sim G, \qquad G \sim \mathrm{DP}(G_0, \alpha), $$
where G_0 and α are the basis distribution and concentration parameter, respectively. In general, the distributions F and G_0 will depend on additional hyperparameters not mentioned above. Since a realization of a DP is discrete with probability 1, the above model can be regarded as a countably infinite mixture. We can integrate out the distribution G in the above DPM to obtain the prior distribution of θ:
$$ \theta_i \mid (\theta_1, \ldots, \theta_{i-1}) \sim \frac{1}{\alpha + i - 1} \sum_{j=1}^{i-1} \delta_{\theta_j} + \frac{\alpha}{\alpha + i - 1}\, G_0 = \frac{i-1}{\alpha + i - 1} \left( \frac{1}{i-1} \sum_{j=1}^{i-1} \delta_{\theta_j} \right) + \frac{\alpha}{\alpha + i - 1}\, G_0, $$
where δ_{θ_j} ( j = 1, …, i − 1; i = 2, 3, … ) represents the point mass at θ_j. This is a mixture of two distributions with weights p = (i − 1)/(α + i − 1) and 1 − p = α/(α + i − 1).
Let { f(· | θ) : θ ∈ Θ ⊂ R^d } be a family of density functions with parameter θ. We assume that the density function of the observed data y_i belongs to the probability distribution family F = { f(· | G) : G ∈ P } defined by
$$ f(y_i \mid G) = \int f(y_i \mid \theta)\, dG(\theta), $$
which is called the DPM density of y_i, where G(θ) is the distribution of θ, and F is the nonparametric family of mixture distributions. The distribution G can be made random by assuming that G comes from a DP. In practice, to obtain the DPM densities of y_1, …, y_n, it is necessary to introduce a latent variable θ_i related to each y_i. After introducing θ_i, the joint density function of { y_i : i = 1, …, n } is ∏_{i=1}^{n} f(y_i | θ_i), where { θ_i : i = 1, …, n } is a sample from the distribution G; marginally, the joint density function of { y_i : i = 1, …, n } can be expressed as ∏_{i=1}^{n} f(y_i | G). We assign the prior distribution DP(G_0, α) to the distribution G, i.e., G ∼ DP(G_0, α). The Bayesian model is set up as follows:
$$ \begin{gathered} y_i \mid (\beta, S_i, b_i, \theta, \sigma) \sim F(\theta, \beta, S_i, b_i, \sigma), \qquad \theta_k \mid G \sim G, \quad k = 1, 2, \ldots, \\ G(\cdot) = \sum_{k=1}^{\infty} w_k\, \delta_{\theta_k}(\cdot), \qquad w_k = \nu_k \prod_{i=1}^{k-1} (1 - \nu_i), \qquad \nu_i \overset{iid}{\sim} \mathrm{Beta}(1, \alpha), \qquad \theta_k \overset{iid}{\sim} G_0, \quad k = 1, 2, \ldots, \end{gathered} $$
where { y_1, …, y_n } is a set of observations and { θ_1, …, θ_n } is a set of latent variables. Based on the discretization of the DP, we can obtain the DPM density function of S_i by
$$ p(S_i \mid G) = \int p(S_i \mid \theta)\, G(d\theta) = \sum_{k=1}^{\infty} w_k\, p(S_i \mid \theta_k). $$
A semiparametric model can be set up by replacing the DP mixture on the full parameter θ with a DP mixture on a subset w = { w_k : k ≥ 1 } of the components of θ. Let θ_i = ( w_i, η ), where η has a prespecified prior distribution; the DP mixture is then performed only on the parameter w_i. The joint density function ∏_{i=1}^{n} f(y_i | η, G) of { y_i : i = 1, …, n } can then be obtained. The prior distributions of η and G are prespecified. As a result, the semiparametric model is determined [21].

5. Formulation of the Semiparametric Autoregressive Model

5.1. The Partial Dirichlet Process Mixture of Stochastic Process

The stochastic process in (1) is semiparameterized to generalize the general linear mixed model for longitudinal data. A nonparametric DP prior is assigned to the covariance parameter ( σ_s², ρ ) of S_i. To reduce the number of unknown parameters, a partial Dirichlet process mixture is performed on S_i, i.e., a nonparametric DP prior is assigned only to the parameter ρ. The semiparametric process is created as follows. First, we consider the OU process S_i associated with object i to have the covariance matrix C_i(θ) = σ_s² C̃_i(ρ), where θ = ( σ_s², ρ ). The (k, l)-element of the matrix C̃_i(ρ) is given by ρ^{|t_il − t_ik|}, so that C̃_i(ρ) has the following form:
$$ \tilde{C}_i(\rho) = \begin{pmatrix} 1 & \rho^{|t_{i2}-t_{i1}|} & \rho^{|t_{i3}-t_{i1}|} & \cdots & \rho^{|t_{in_i}-t_{i1}|} \\ \rho^{|t_{i1}-t_{i2}|} & 1 & \rho^{|t_{i3}-t_{i2}|} & \cdots & \rho^{|t_{in_i}-t_{i2}|} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{|t_{i1}-t_{in_i}|} & \rho^{|t_{i2}-t_{in_i}|} & \rho^{|t_{i3}-t_{in_i}|} & \cdots & 1 \end{pmatrix}. $$
Second, we give the parameter ρ a nonparametric DP prior:
$$ \rho \mid G \sim G, \qquad G \sim \mathrm{DP}(G_0, \alpha), $$
where G 0 is a known probability distribution, α is a constant, and the probability distribution G is generated by DP ( G 0 , α ) satisfying
$$ G(\cdot) = \sum_{k=1}^{\infty} w_k\, \delta_{\phi_k}(\cdot), \qquad w_k = \nu_k \prod_{i=1}^{k-1} (1 - \nu_i), \qquad \nu_i \overset{iid}{\sim} \mathrm{Beta}(1, \alpha), \qquad \phi_k \overset{iid}{\sim} G_0, $$
where δ_ϕ(·) is the point mass at ϕ, and G is a random probability distribution.
Using the discretization of the DP, for any parameter ϕ, we assume that f(· | ϕ) is a density function depending on the parameter ϕ. Let ϕ | G ∼ G, G ∼ DP(G_0, α). The DPM density function can be obtained as
$$ p(\cdot \mid G) = \int p(\cdot \mid \phi)\, G(d\phi) = \sum_{k=1}^{\infty} w_k\, p(\cdot \mid \phi_k), $$
where ϕ_k ∼ iid G_0. After embedding the above conditional prior into the distribution of the random process S_i, we formulate the DPM model for S_i as
$$ S_i \mid (\sigma_s^2, \rho) \sim N\big( 0, \sigma_s^2 \tilde{C}_i(\rho) \big), \qquad \rho \mid G \sim G, \qquad G \sim \mathrm{DP}(G_0, \alpha). $$
Using the discretization of the distribution function G, we obtain the DPM density of the random process S i as follows.
$$ p(S_i \mid \sigma_s^2, G) = \int N\big( S_i \mid 0, \sigma_s^2 \tilde{C}_i(\rho) \big)\, dG(\rho) = \sum_{k=1}^{\infty} w_k\, N\big( S_i \mid 0, \sigma_s^2 \tilde{C}_i(\tilde{\rho}_k) \big). $$
This is a countably infinite mixture of multivariate Gaussian density functions, where ρ̃_k ∼ iid G_0 and N( · | 0, σ_s² C̃_i(ρ) ) denotes a multivariate Gaussian density function with zero mean vector and covariance matrix σ_s² C̃_i(ρ). It can be seen that after the semiparametric treatment of the stochastic process S_i, its distribution is a mixture of stochastic processes, namely a more general mixture of OU processes.
Since G is discretized with probability 1, it provides an automatic clustering effect on the autocorrelation structure of the objects. After the OU process S i is semiparameterized, the covariance between any two time points t i l , t i k can be obtained by
$$ \operatorname{cov}\big( s_i(t_{il}), s_i(t_{ik}) \mid \sigma_s^2, G \big) = \int \operatorname{cov}\big( s_i(t_{il}), s_i(t_{ik}) \mid \sigma_s^2, \rho \big)\, dG(\rho) = \int \sigma_s^2\, \rho^{|t_{il}-t_{ik}|}\, dG(\rho) = \sum_{k=1}^{\infty} w_k\, \sigma_s^2\, \tilde{\rho}_k^{\,|t_{il}-t_{ik}|}. $$
After semiparameterization of the OU process S i , it not only contains an AR structure but also has an automatic clustering effect. If the observations { y i : i = 1 , , n } from model (1) are obtained at equal interval time points, the covariance matrix structure of y i becomes AR(1).
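The mixed covariance in (17) is straightforward to approximate numerically with a truncated stick-breaking representation. The sketch below (ours; the time points, G_0 = Uniform(0, 1), σ_s², and the truncation level are illustrative) builds the OU correlation matrix for each atom ρ̃_k and averages them with the weights w_k.

```python
# Sketch of the semiparameterized OU covariance: a weighted mixture of OU covariance
# matrices over atoms rho_k drawn from G0, using a truncated stick-breaking DP.
import numpy as np

rng = np.random.default_rng(3)
t = np.array([0.0, 1.0, 2.0, 4.0])                  # observation times (illustrative)
sigma_s2, alpha, K = 2.0, 0.75, 30

def ou_corr(rho, times):
    """OU/AR correlation matrix with (k, l) entry rho^{|t_l - t_k|}."""
    return rho ** np.abs(times[:, None] - times[None, :])

# truncated stick-breaking representation of G ~ DP(G0, alpha), with G0 = Uniform(0, 1)
v = rng.beta(1.0, alpha, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
rho_atoms = rng.uniform(0.0, 1.0, size=K)

# DPM covariance of S_i: a (truncated) mixture of OU covariance matrices
cov_mix = sigma_s2 * sum(w_k * ou_corr(r_k, t) for w_k, r_k in zip(w, rho_atoms))
print(np.round(cov_mix, 3))
```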

5.2. The Framework of a Hierarchical Model

We introduce latent parameters ρ_1, ρ_2, …, ρ_n (corresponding to the n objects) and semiparameterize the general linear mixed model (1), with observations y_1, y_2, …, y_n satisfying the following model:
$$ \begin{gathered} y_i = X_i \beta + Z_i b_i + S_i + \epsilon_i, \qquad b_i \mid \tau \sim N(0, D(\tau)), \qquad S_i \mid (\sigma_s^2, \rho_i) \sim N\big( 0, \sigma_s^2 \tilde{C}_i(\rho_i) \big), \\ \rho_i \mid G \sim G, \qquad G \sim \mathrm{DP}(G_0, \alpha), \qquad \epsilon_i \sim N(0, \sigma^2 I_{n_i}). \end{gathered} $$
This is converted to a hierarchical model as follows:
$$ \begin{gathered} y_i \mid (\beta, b_i, S_i, \sigma^2) \overset{ind}{\sim} N\big( X_i \beta + Z_i b_i + S_i,\ \sigma^2 I \big), \qquad S_i \mid (\sigma_s^2, \rho_i) \overset{ind}{\sim} N\big( 0, \sigma_s^2 \tilde{C}_i(\rho_i) \big), \\ \{ \rho_1, \rho_2, \ldots, \rho_n \} \mid G \overset{iid}{\sim} G, \qquad G \sim \mathrm{DP}(G_0, \alpha), \qquad b_i \mid \tau \overset{iid}{\sim} N(0, D(\tau)), \\ (\sigma^2, \beta, \tau, \sigma_s^2) \sim p(\sigma^2) \times N(\beta_0, B) \times p(\tau) \times p(\sigma_s^2), \end{gathered} $$
where "ind" means independently distributed; S_i and b_i ( 1 ≤ i ≤ n ) are independent of each other, and p(σ²), N(β_0, B), p(τ), and p(σ_s²) are the prior distributions of the parameters σ², β, τ, and σ_s², respectively. If b_i is a scalar corresponding to the random intercept of the model, we have τ = σ_b². This model generalizes the exponential covariance function of the OU process and realizes an automatic clustering effect between objects through the parameters ρ_i. We call model (18) the Bayesian semiparametric autoregressive model, or simply the SPAR (semiparametric autoregressive) model.
Note that the SPAR model (18) is a parallel version of that of Quintana et al. [4] with the additional conditions that the random effects { S_i : i = 1, …, n } have a common variance component σ_s² and that the autocorrelation parameters ( ρ_1, ρ_2, …, ρ_n ) are conditionally i.i.d. with DP(G_0, α). This is different from model (3) in Quintana et al. [4], which assumes that the random-effect parameters ( ϕ_1, …, ϕ_n ) are conditionally i.i.d. with DP(G_0, α), where ϕ_i = ( σ_i², ρ_i ) with var(S_i) = σ_i² C̃_i(ρ_i) in our notation. The simpler assumptions help simplify the posterior inference. Quintana et al.'s [4] model assumptions do not lead to explicit posterior distributions of the model parameters, and their posterior inference on the model parameters may have to be performed by nonparametric MCMC algorithms. Because no explicit expressions related to posterior inference on model parameters are given in [4], we are not able to conclude whether Quintana et al.'s [4] Bayesian posterior inference is a complete semiparametric MCMC method or a combination of semiparametric and nonparametric MCMC methods. By imposing simpler assumptions on the parameters in model (18), we are able to obtain the explicit posterior distributions of all model parameters in the subsequent context and conclude that our approach to handling the SPAR model (18) is a complete semiparametric MCMC method.
In the SPAR model (18), the random process part is an OU process mixture. The correlation matrix C ˜ i ( ρ i ) possesses the following form:
$$ \tilde{C}_i(\rho_i) = \begin{pmatrix} 1 & \rho_i^{|t_{i2}-t_{i1}|} & \rho_i^{|t_{i3}-t_{i1}|} & \cdots & \rho_i^{|t_{il}-t_{i1}|} & \cdots & \rho_i^{|t_{in_i}-t_{i1}|} \\ \rho_i^{|t_{i1}-t_{i2}|} & 1 & \rho_i^{|t_{i3}-t_{i2}|} & \cdots & \rho_i^{|t_{il}-t_{i2}|} & \cdots & \rho_i^{|t_{in_i}-t_{i2}|} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ \rho_i^{|t_{i1}-t_{ik}|} & \rho_i^{|t_{i2}-t_{ik}|} & \rho_i^{|t_{i3}-t_{ik}|} & \cdots & \rho_i^{|t_{il}-t_{ik}|} & \cdots & \rho_i^{|t_{in_i}-t_{ik}|} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ \rho_i^{|t_{i1}-t_{in_i}|} & \rho_i^{|t_{i2}-t_{in_i}|} & \rho_i^{|t_{i3}-t_{in_i}|} & \cdots & \rho_i^{|t_{il}-t_{in_i}|} & \cdots & 1 \end{pmatrix}. $$
Using the structure of the OU process, C̃_i(ρ_i) can be inverted analytically; the inverse matrix of C̃_i(ρ_i) is tridiagonal. Denote r_ik = ρ_i^{|t_{i,k+1} − t_{ik}|} ( k = 1, 2, …, n_i − 1 ). For k < l, the (k, l)-th element of C̃_i(ρ_i) is given by
$$ \big[ \tilde{C}_i(\rho_i) \big]_{kl} = \rho_i^{|t_{il}-t_{ik}|} = \rho_i^{|t_{i,k+1}-t_{ik}| + |t_{i,k+2}-t_{i,k+1}| + \cdots + |t_{il}-t_{i,l-1}|} = r_{ik}\, r_{i,k+1} \cdots r_{i,l-1}. $$
So, C ˜ i ( ρ i ) can be rewritten as
$$ \tilde{C}_i(\rho_i) = \begin{pmatrix} 1 & r_{i1} & r_{i1} r_{i2} & \cdots & r_{i1} \cdots r_{i,l-1} & \cdots & r_{i1} \cdots r_{i,n_i-1} \\ r_{i1} & 1 & r_{i2} & \cdots & r_{i2} \cdots r_{i,l-1} & \cdots & r_{i2} \cdots r_{i,n_i-1} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ r_{i1} \cdots r_{i,k-1} & r_{i2} \cdots r_{i,k-1} & r_{i3} \cdots r_{i,k-1} & \cdots & r_{ik} \cdots r_{i,l-1} & \cdots & r_{ik} \cdots r_{i,n_i-1} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ r_{i1} \cdots r_{i,n_i-1} & r_{i2} \cdots r_{i,n_i-1} & r_{i3} \cdots r_{i,n_i-1} & \cdots & r_{il} \cdots r_{i,n_i-1} & \cdots & 1 \end{pmatrix}. $$
Using the correlation theory of the anticorrelation random variables [22], we can compute the elements of the inverse matrix C ˜ i 1 ( ρ i ) of C ˜ i ( ρ i ) as follows:
$$ \begin{aligned} \big[ \tilde{C}_i^{-1}(\rho_i) \big]_{11} &= \frac{1}{1 - r_{i1}^2}, \\ \big[ \tilde{C}_i^{-1}(\rho_i) \big]_{kk} &= \frac{1 - r_{i,k-1}^2 r_{ik}^2}{\big( 1 - r_{i,k-1}^2 \big)\big( 1 - r_{ik}^2 \big)}, \quad k = 2, 3, \ldots, n_i - 1, \\ \big[ \tilde{C}_i^{-1}(\rho_i) \big]_{k,k+1} &= \frac{-r_{ik}}{1 - r_{ik}^2}, \quad k = 1, 2, \ldots, n_i - 1, \\ \big[ \tilde{C}_i^{-1}(\rho_i) \big]_{n_i n_i} &= \frac{1}{1 - r_{i,n_i-1}^2}. \end{aligned} $$
C ˜ i 1 ( ρ i ) turns out to be a tridiagonal matrix.
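This closed form is easy to check numerically. The sketch below (ours, with illustrative time points and ρ) builds C̃_i(ρ_i), fills in the closed-form tridiagonal entries of (21), and compares the result with a numerical inverse.

```python
# Numerical check of the tridiagonal inverse of the OU correlation matrix.
import numpy as np

rho, t = 0.6, np.array([0.0, 0.7, 1.5, 3.0])        # illustrative values
n = len(t)
r = rho ** np.abs(np.diff(t))                       # r_k = rho^{|t_{k+1} - t_k|}

C = rho ** np.abs(t[:, None] - t[None, :])          # OU correlation matrix
inv_num = np.linalg.inv(C)

inv_form = np.zeros((n, n))
inv_form[0, 0] = 1.0 / (1.0 - r[0] ** 2)
inv_form[-1, -1] = 1.0 / (1.0 - r[-1] ** 2)
for k in range(1, n - 1):                           # interior diagonal entries
    inv_form[k, k] = (1.0 - r[k - 1] ** 2 * r[k] ** 2) / ((1.0 - r[k - 1] ** 2) * (1.0 - r[k] ** 2))
for k in range(n - 1):                              # sub- and super-diagonal entries
    inv_form[k, k + 1] = inv_form[k + 1, k] = -r[k] / (1.0 - r[k] ** 2)

print(np.allclose(inv_num, inv_form))               # True: the inverse is tridiagonal
```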
The random process corresponding to the i-th object S i = ( s i 1 , s i 2 , , s i n i ) satisfies the conditional distribution:
$$ S_i \mid \sigma_s^2, \rho_i \sim N_{n_i}\big( 0, \sigma_s^2 \tilde{C}_i(\rho_i) \big). $$
So, the conditional density function is
$$ f(S_i \mid \sigma_s^2, \rho_i) = f(s_{i1}, \ldots, s_{in_i} \mid \sigma_s^2, \rho_i) = f(s_{i1} \mid \sigma_s^2, \rho_i) \cdots f(s_{in_i} \mid s_{i1}, \ldots, s_{i,n_i-1}, \sigma_s^2, \rho_i). $$
When the inverse matrix of the covariance matrix C ˜ i ( ρ i ) of the random process S i has the above tridiagonal form, based on the property of the multivariate normal distribution, the conditional density function f ( S i | σ s 2 , ρ i ) of S i can be decomposed into the product of n i univariate Gaussian density functions as follows:
$$ \begin{aligned} s_{i1} &\sim N(0, \sigma_s^2), \\ s_{i2} \mid (s_{i1} = \tilde{s}_1) &\sim N\big( \tilde{s}_1 r_{i1},\ \sigma_s^2 (1 - r_{i1}^2) \big), \\ &\ \vdots \\ s_{ik} \mid (s_{i1} = \tilde{s}_1, \ldots, s_{i,k-1} = \tilde{s}_{k-1}) &\sim N\big( \tilde{s}_{k-1} r_{i,k-1},\ \sigma_s^2 (1 - r_{i,k-1}^2) \big), \\ &\ \vdots \\ s_{in_i} \mid (s_{i1} = \tilde{s}_1, \ldots, s_{i,n_i-1} = \tilde{s}_{n_i-1}) &\sim N\big( \tilde{s}_{n_i-1} r_{i,n_i-1},\ \sigma_s^2 (1 - r_{i,n_i-1}^2) \big). \end{aligned} $$
It is known that r_ik = ρ_i^{|t_{i,k+1} − t_{ik}|}, k = 1, 2, …, n_i − 1. Hence,
$$ \begin{aligned} s_{i1} &\sim N(0, \sigma_s^2), \\ s_{i2} \mid (s_{i1} = \tilde{s}_1) &\sim N\big( \tilde{s}_1 \rho_i^{|t_{i2}-t_{i1}|},\ \sigma_s^2 \big( 1 - \rho_i^{2|t_{i2}-t_{i1}|} \big) \big), \\ &\ \vdots \\ s_{ik} \mid (s_{i,k-1} = \tilde{s}_{k-1}) &\sim N\big( \tilde{s}_{k-1} \rho_i^{|t_{ik}-t_{i,k-1}|},\ \sigma_s^2 \big( 1 - \rho_i^{2|t_{ik}-t_{i,k-1}|} \big) \big), \\ &\ \vdots \\ s_{in_i} \mid (s_{i,n_i-1} = \tilde{s}_{n_i-1}) &\sim N\big( \tilde{s}_{n_i-1} \rho_i^{|t_{in_i}-t_{i,n_i-1}|},\ \sigma_s^2 \big( 1 - \rho_i^{2|t_{in_i}-t_{i,n_i-1}|} \big) \big). \end{aligned} $$
As a result, the conditional density function of S i in the random process f ( S i | σ s 2 , ρ i ) can be decomposed as follows:
$$ f(S_i \mid \sigma_s^2, \rho_i) = N\big( s_{i1} \mid 0, \sigma_s^2 \big) \cdots N\big( s_{ik} \mid \tilde{s}_{k-1} \rho_i^{|t_{ik}-t_{i,k-1}|},\ \sigma_s^2 \big( 1 - \rho_i^{2|t_{ik}-t_{i,k-1}|} \big) \big) \cdots N\big( s_{in_i} \mid \tilde{s}_{n_i-1} \rho_i^{|t_{in_i}-t_{i,n_i-1}|},\ \sigma_s^2 \big( 1 - \rho_i^{2|t_{in_i}-t_{i,n_i-1}|} \big) \big). $$
The above decomposition greatly simplifies the computation of the distribution of the random process S i , which can be obtained by direct computation of the distribution of a single Gaussian variable.
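To illustrate how this decomposition is used, the following sketch (ours, with illustrative values) draws S_i sequentially from the one-dimensional conditionals in (24)-(25), avoiding any n_i × n_i matrix factorization.

```python
# Sequentially sample an OU path s_{i1}, ..., s_{i n_i} from the univariate conditionals.
import numpy as np

rng = np.random.default_rng(4)

def sample_S(times, rho, sigma_s2):
    """Draw S_i at the given time points using the Markov/OU decomposition."""
    s = np.empty(len(times))
    s[0] = rng.normal(0.0, np.sqrt(sigma_s2))
    for k in range(1, len(times)):
        r = rho ** abs(times[k] - times[k - 1])                  # r_{i,k-1}
        s[k] = rng.normal(r * s[k - 1], np.sqrt(sigma_s2 * (1.0 - r ** 2)))
    return s

t = np.array([0.0, 0.5, 1.2, 2.0, 3.5])
S_i = sample_S(t, rho=0.7, sigma_s2=2.0)
print(np.round(S_i, 3))
```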

6. The Marginal Likelihood, Prior Determination, and Posteriori Inference

Let y = ( y 1 , y 2 , , y n ) , b = ( b 1 , b 2 , , b n ) , S = ( S 1 , S 2 , , S n ) , and ρ = ( ρ 1 , ρ 2 , , ρ n ) . Denote by Ψ = ( β , σ 2 , σ s 2 , ρ , τ ) , which represents the parameter vector in model (18). A random process that specifies the AR structure in the SPAR model (18) is
$$ S_i \mid \sigma_s^2, \rho_i \sim N\big( 0, \sigma_s^2 \tilde{C}_i(\rho_i) \big), \qquad \rho_i \mid G \sim G, \qquad G \sim \mathrm{DP}(G_0, \alpha). $$
The marginal distribution of the covariance parameter ρ i can be obtained from the joint distribution of all terms in (26) as follows:
$$ \rho_1 \sim G_0, \qquad \rho_i \mid (\rho_1, \rho_2, \ldots, \rho_{i-1}) \sim \frac{\alpha}{\alpha + i - 1}\, G_0 + \frac{1}{\alpha + i - 1} \sum_{k=1}^{i-1} \delta_{\rho_k} $$
for i = 2, …, n. We assume that { S_1, S_2, …, S_n } is a sample from the multivariate Gaussian distribution and can be regarded as part of an exchangeable sequence. By exchangeability, any observation y_i, together with its corresponding S_i and ρ_i, can be treated as the last of the n observations from the n objects. The conditional prior distribution of ρ_i given all the other ρ's is therefore
$$ \rho_i \mid \rho_{(-i)} \sim \frac{\alpha}{\alpha + n - 1}\, G_0 + \frac{1}{\alpha + n - 1} \sum_{k \neq i} \delta_{\rho_k}, $$
where ρ_{(−i)} represents all the ρ's except ρ_i. According to the product formula, we have
$$ p(\rho) = p(\rho_1, \rho_2, \ldots, \rho_n) = p(\rho_1)\, p(\rho_2 \mid \rho_1) \cdots p(\rho_n \mid \rho_1, \rho_2, \ldots, \rho_{n-1}). $$
Based on the above conditional prior distributions, we can obtain the prior distribution p ( ρ ) of parameter ρ . Compute the likelihood function of the SPAR model (18) as follows:
$$ L(\Psi \mid y, b, S) = f(y, b, S \mid \Psi) = f(y \mid b, S, \Psi)\, f(b \mid \Psi)\, f(S \mid \Psi) = f(y \mid b, S, \beta, \sigma^2)\, f(b \mid \tau)\, f(S \mid \sigma_s^2, \rho). $$
Let
$$ p(\Psi) = p(\beta, \sigma^2, \sigma_s^2, \rho, \tau) = p(\beta)\, p(\sigma^2)\, p(\sigma_s^2)\, p(\rho)\, p(\tau), $$
which implies that the prior distributions of the parameters are mutually independent. The joint posterior of the SPAR model is computed as follows:
$$ p(\Psi, b, S \mid y) \propto p(y, b, S \mid \Psi)\, p(\Psi) = \left[ \prod_{i=1}^{n} f(y_i, b_i, S_i \mid \Psi) \right] p(\Psi) = \left[ \prod_{i=1}^{n} f\big( y_i \mid \beta, S_i, b_i, \sigma^2 \big)\, f(b_i \mid \tau)\, f\big( S_i \mid \sigma_s^2, \rho_i \big) \right] p(\Psi). $$
That is, the joint posterior distribution is proportional to the product of the likelihood function and the prior distribution. For each i, the first term of the likelihood function is the n_i-dimensional Gaussian density N( X_i β + Z_i b_i + S_i, σ² I ) evaluated at y_i, the second term is the q-dimensional Gaussian density N( 0, D(τ) ) evaluated at b_i, and the third term is the product of n_i univariate Gaussian densities.
To estimate the joint posterior distribution of the SPAR model (18), it is necessary to use the Bayesian theorem to obtain the conditional distribution of parameters in the model:
$$ \begin{gathered} p(\beta \mid y, b, S, \sigma^2, \sigma_s^2, \rho, \tau), \qquad p(b_i \mid y, \beta, b_{(-i)}, S, \sigma^2, \sigma_s^2, \rho, \tau), \qquad p(S_i \mid y, \beta, b, S_{(-i)}, \sigma^2, \sigma_s^2, \rho, \tau), \\ p(\sigma^2 \mid y, \beta, b, S, \sigma_s^2, \rho, \tau), \qquad p(\sigma_s^2 \mid y, \beta, b, S, \sigma^2, \rho, \tau), \qquad p(\rho_i \mid y, \beta, b, S, \sigma^2, \sigma_s^2, \rho_{(-i)}, \tau), \qquad p(\tau \mid y, \beta, b, S, \sigma^2, \sigma_s^2, \rho). \end{gathered} $$
The MCMC algorithm is employed to sample from these conditional distributions. The full conditional distribution of a parameter is denoted by p( · | − ) in the subsequent context.

7. A Monte Carlo Study

7.1. Simulation Design

To verify that the SPAR model (18) can effectively capture the correlation structure in longitudinal data, empirical sample data were generated with zero mean under four covariance structures: compound symmetric (CS), autoregressive (AR), a mixture of CS and AR, and unstructured. The MCMC method was employed to estimate the covariance matrix and the correlation matrix in the four cases, and the estimates were compared with those from the traditional Bayesian inverse-Wishart estimation method.
Consider a brief form of the SPAR model (18):
$$ y_i = (\beta + b_i)\, e + S_i + \epsilon_i, $$
where β represents a fixed intercept, e is an n i × 1 vector of ones, b i N ( 0 , σ b 2 ) is a random intercept, and S i is an OU process corresponding to y i . Convert the above model into a hierarchical model:
$$ \begin{gathered} y_i \mid (\beta, b_i, S_i, \sigma^2) \overset{ind}{\sim} N\big( (\beta + b_i)\, e + S_i,\ \sigma^2 I \big), \qquad S_i \mid (\sigma_s^2, \rho_i) \overset{ind}{\sim} N\big( 0, \sigma_s^2 \tilde{C}(\rho_i) \big), \\ \{ \rho_1, \rho_2, \ldots, \rho_n \} \mid G \overset{iid}{\sim} G, \qquad G \sim \mathrm{DP}(G_0, \alpha), \qquad b_i \overset{iid}{\sim} N(0, \sigma_b^2), \\ \beta \sim N(\mu_0, \sigma_0^2), \qquad \sigma^2 \sim IG(\alpha_0, \beta_0), \qquad \sigma_s^2 \sim IG(\alpha_1, \beta_1). \end{gathered} $$
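For readers who want to reproduce the general structure, the following PyMC3 sketch (ours, not the paper's program) encodes a simplified version of this hierarchy: S_i is marginalized into the covariance of y_i, and the DP prior on ρ_i is replaced by a single Uniform(0, 1) prior, so it corresponds to the parametric OU special case rather than the full SPAR mixture. All prior constants, the placeholder data, and the name spar_simplified are illustrative.

```python
# A simplified PyMC3 sketch of the hierarchical model: random intercept + OU covariance,
# with the DP mixture on rho replaced by a single Uniform prior (parametric special case).
import numpy as np
import pymc3 as pm

n, m = 100, 6
t = np.arange(m, dtype=float)
D = np.abs(t[:, None] - t[None, :])                  # |t_j - t_k| matrix
Y = np.random.default_rng(5).normal(size=(n, m))     # placeholder data with the Table 1 shape

with pm.Model() as spar_simplified:
    beta = pm.Normal("beta", mu=0.0, sigma=5.0)
    b = pm.Normal("b", mu=0.0, sigma=np.sqrt(7.0), shape=n)        # random intercepts
    sigma2 = pm.InverseGamma("sigma2", alpha=3.0, beta=2.0)        # error variance
    sigma_s2 = pm.InverseGamma("sigma_s2", alpha=6.0, beta=10.0)   # OU variance
    rho = pm.Uniform("rho", 0.0, 1.0)                              # stands in for the DP-mixed rho_i

    cov = sigma_s2 * rho ** D + sigma2 * np.eye(m)   # Cov(y_i | b_i) = sigma_s^2 C(rho) + sigma^2 I
    mu = (beta + b)[:, None] + np.zeros((1, m))      # (beta + b_i) e, one row per object

    pm.MvNormal("y", mu=mu, cov=cov, observed=Y)
    trace = pm.sample(1000, tune=1000, chains=2)

# convergence checks of the kind used in the paper, e.g. the energy plot:
# import arviz as az; az.plot_energy(trace)
```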
For the special case of balanced sample design with n i = m ( i = 1 , 2 , , n ) and t i j = t j for i = 1 , 2 , , n , the joint posterior distribution of model (18) can be easily computed by
$$ \begin{aligned} p(\beta, \sigma^2, \sigma_s^2, \rho, b, S \mid y) &= \frac{p(y \mid \beta, b, S, \sigma^2)\, p(b)\, p(S \mid \sigma_s^2, \rho)\, p(\beta, \sigma^2, \sigma_s^2, \rho)}{p(y)} \\ &\propto p(y \mid \beta, b, S, \sigma^2)\, p(b)\, p(S \mid \sigma_s^2, \rho)\, p(\beta, \sigma^2, \sigma_s^2, \rho) \\ &= \left[ \prod_{i=1}^{n} f\big( y_i \mid \beta, b_i, S_i, \sigma^2 \big)\, f(b_i)\, f\big( S_i \mid \sigma_s^2, \rho_i \big) \right] p(\beta)\, p(\sigma^2)\, p(\sigma_s^2)\, p(\rho). \end{aligned} $$
To realize the Bayesian inference of the model, it is necessary to use the Bayesian theorem to obtain the conditional distribution of each parameter. The lengthy derivations of all conditional probability distributions are given in the Appendix A at the end of the paper.
Performing posterior inference on the covariance matrix Σ = ( σ_ij )_{m×m} amounts to estimating the posterior mean of Σ:
$$ \hat{\Sigma} = E\big[ \Sigma \mid y \big] = E\big[ \sigma_b^2 A + \sigma_s^2 \tilde{C}(\rho) + \sigma^2 I \mid y \big]_{m \times m}, $$
where A = e e′. The posterior estimate R̂ of the correlation matrix R can then be obtained from Σ̂:
$$ \hat{R} = R_0\, \hat{\Sigma}\, R_0, \qquad R_0 = \operatorname{diag}\big( \hat{\sigma}_{11}^{-1/2}, \ldots, \hat{\sigma}_{mm}^{-1/2} \big), \qquad \hat{\Sigma} = \big( \hat{\sigma}_{ij} \big)_{m \times m}. $$
The mean square error loss function is used to evaluate the performance of the posterior mean:
$$ MSE(\Sigma) = \left[ \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \big( \hat{\sigma}_{ij} - \sigma_{ij} \big)^2 \right]^{1/2}. $$
Among the Bayesian estimates of the covariance matrix and correlation matrix under other loss functions [23], the most common loss is the entropy loss function defined by
$$ L_1(\hat{\Sigma}, \Sigma) = \operatorname{tr}\big( \hat{\Sigma} \Sigma^{-1} \big) - \log\big| \hat{\Sigma} \Sigma^{-1} \big| - m. $$
The quadratic loss function is given by
$$ L_2(\hat{\Sigma}, \Sigma) = \operatorname{tr}\big( \hat{\Sigma} \Sigma^{-1} - I \big)^2. $$
The Bayesian estimates of the covariance matrices based on the loss functions L 1 ( Σ ^ , Σ ) and L 2 ( Σ ^ , Σ ) are given by
$$ \hat{\Sigma} = \Big[ E\big( \Sigma^{-1} \mid y \big) \Big]^{-1} \quad \text{and} \quad \operatorname{vec}(\hat{\Sigma}) = \Big[ E\big( \Sigma^{-1} \otimes \Sigma^{-1} \mid y \big) \Big]^{-1} \operatorname{vec}\Big( E\big( \Sigma^{-1} \mid y \big) \Big), $$
respectively, where vec stands for the column vectorization of a matrix and "⊗" denotes the Kronecker product of matrices. Similarly, the Bayesian estimate of the correlation matrix R under the various loss functions can be obtained.
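As a concrete illustration, the sketch below (ours; the toy posterior draws are fabricated only to make the code self-contained) evaluates the entropy loss L_1 and the quadratic loss L_2 and computes the corresponding Bayes estimates in (39) from a set of posterior draws of Σ.

```python
# Entropy loss, quadratic loss, and the Bayes estimates under them from posterior draws.
import numpy as np

def entropy_loss(S_hat, S):
    M = S_hat @ np.linalg.inv(S)
    return np.trace(M) - np.log(np.linalg.det(M)) - S.shape[0]

def quadratic_loss(S_hat, S):
    M = S_hat @ np.linalg.inv(S) - np.eye(S.shape[0])
    return np.trace(M @ M)

def bayes_estimates(draws):
    """draws: array of shape (n_draws, m, m) of posterior samples of Sigma."""
    m = draws.shape[1]
    inv_draws = np.linalg.inv(draws)
    est_L1 = np.linalg.inv(inv_draws.mean(axis=0))                 # [E(Sigma^{-1} | y)]^{-1}
    kron_mean = np.mean([np.kron(P, P) for P in inv_draws], axis=0)
    vec_est = np.linalg.solve(kron_mean, inv_draws.mean(axis=0).reshape(-1, order="F"))
    est_L2 = vec_est.reshape(m, m, order="F")                      # from vec(Sigma_hat) in (39)
    return est_L1, est_L2

# toy posterior draws centered at a true AR-type covariance (illustrative only)
rng = np.random.default_rng(6)
true = 10 * 0.4 ** np.abs(np.subtract.outer(np.arange(6), np.arange(6)))
draws = np.array([true + rng.normal(scale=0.2, size=(6, 6)) for _ in range(500)])
draws = (draws + draws.transpose(0, 2, 1)) / 2                     # keep draws symmetric
e1, e2 = bayes_estimates(draws)
print(entropy_loss(e1, true), quadratic_loss(e2, true))
```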

7.2. Simulation Specification and Display of Empirical Results

In the simulation, we first set up the four different covariance structures in Equation (41) and the priors given in Equations (43)–(46). We then ran the MCMC training trial 2000 times. After observing that the convergence trend became relatively stable after 2000 training trials, we generated 20 datasets, each consisting of 100 objects observed at 6 time points. This is equivalent to the longitudinal data structure in Table 1, with n = 100, p = 6, and k = 1 for each generated longitudinal dataset.
The four covariance matrices are designed as follows:
$$ \begin{aligned} \Sigma_1(i, j) &= 10\, I(i = j) + 7\, I(i \neq j), \\ \Sigma_2(i, j) &= 10 \times 0.4^{|i-j|}, \\ \Sigma_3(i, j) &= 0.3\, \Sigma_1(i, j) + 0.7\, \Sigma_2(i, j), \\ \Sigma_4(i, j) &= \frac{10}{1 + |i-j|}. \end{aligned} $$
The root mean square error (RMSE) for estimating Σ is computed by
$$ RMSE = \frac{1}{20} \sum_{k=1}^{20} \sqrt{MSE_k}, \qquad MSE_k = \frac{1}{36} \sum_{i=1}^{6} \sum_{j=1}^{6} \big( \Sigma_{ij} - \hat{\Sigma}_{ij}^{(k)} \big)^2. $$
The RMSE for estimating the correlation matrix R is computed similarly.
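The following sketch (ours; the "estimates" are fabricated placeholders standing in for the 20 posterior-mean estimates of the Monte Carlo study) constructs the four target matrices in (41) and evaluates the RMSE criterion in (42).

```python
# The four target covariance structures of (41) and the RMSE criterion of (42).
import numpy as np

m = 6
absdiff = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))

Sigma1 = np.where(absdiff == 0, 10.0, 7.0)       # compound symmetric (CS)
Sigma2 = 10.0 * 0.4 ** absdiff                   # autoregressive (AR)
Sigma3 = 0.3 * Sigma1 + 0.7 * Sigma2             # CS/AR mixture
Sigma4 = 10.0 / (1.0 + absdiff)                  # unstructured

def rmse(estimates, truth):
    """estimates: the 20 estimated 6x6 matrices; truth: the target matrix."""
    per_rep = [np.sqrt(np.mean((est - truth) ** 2)) for est in estimates]
    return np.mean(per_rep)

# placeholder estimates used only to make the sketch runnable
rng = np.random.default_rng(7)
fake_estimates = [Sigma2 + rng.normal(scale=0.3, size=(m, m)) for _ in range(20)]
print(round(rmse(fake_estimates, Sigma2), 4))
```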
For each of the four covariance structures, we used our Python program (based on the pandas and PyMC3 packages; available upon request) to perform a preliminary analysis on the simulated data with the specified prior distribution. We then employed the MCMC algorithm to perform the sampling estimation of each parameter in the model and used the three methods mentioned before to assess the convergence of the sampling results. Finally, we obtained the estimates for the four types of covariance and correlation matrices. The estimation error was computed based on the three types of loss functions, and the inverse-Wishart estimation error was also obtained for comparison.
For the case of the CS covariance structure, the prior distribution of the parameter in the model is specified as follows:
$$ G_0 = N(0, 10), \quad \alpha = 0.75, \quad b_i \overset{iid}{\sim} N(0, 7), \quad \beta \sim N(0, 25), \quad \sigma^2 \sim IG(3, 2), \quad \sigma_s^2 \sim IG(6, 10). $$
Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend diagram, as shown in Figure 4.
The negative ELBO loss histogram and energy graph estimated by the model with the CS structure are shown in Figure 5 and Figure 6, respectively, where the energy graph in Figure 6 was generated by the Python package PyMC3 (https://www.pymc.io/projects/docs/en/v3/pymc-examples/examples/getting_started.html) (accessed on 26 April 2023), which displays two simulated density curves: the blue one stands for the energy value at each MCMC sample point subtracted by the average energy (like conducting data centerization); the green one stands for the difference of the energy function (like deriving the derivative of a differential function). A normal energy distribution from an MCMC sampling indicates the sampling process tends to a stable point. It implies convergence of the MCMC sampling. More details on energy computation in MCMC sampling by PyMC3 can be found on this website (https://www.pymc.io/projects/docs/en/v3/api/inference.html#module-pymc3.sampling). Figure 5 and Figure 6 incorporate both popular methods for evaluating the convergence of the MCMC sampling in our Monte Carlo study. All energy graphs in the subsequent context have the same interpretation as they do here. We skip the tedious interpretations for all other energy graphs to save space.
As can be seen from Figure 5 and Figure 6, after several iterations of the MCMC algorithm, the negative ELBO loss is stable between 0 and 25, and the sample energy conversion distribution is basically consistent with the true energy distribution. Based on the sampling distribution and the trend graph, we can conclude that the MCMC algorithm is convergent.
For the case of the AR covariance structure, the prior distribution of each parameter in the model is specified as follows:
$$ G_0 = N(0.4, 10), \quad \alpha = 0.75, \quad b_i \overset{iid}{\sim} N(0, 0.01), \quad \beta \sim N(0, 10), \quad \sigma^2 \sim IG(1.25, 0.01), \quad \sigma_s^2 \sim IG(3.76, 9.566). $$
Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend diagram, as shown in Figure 7. Note that the DP-estimated density curves show different central locations from those in Figure 4, because they are generated from different prior distributions with different covariance structures (see the difference between Equations (43) and (44)).
The histogram of negative ELBO loss and the energy graph for when the AR structure is estimated are shown in Figure 8 and Figure 9, respectively. They show that the histogram of the negative ELBO loss tends to be stable after several iterations, which is basically below 50, and the sample energy conversion distribution is basically consistent with the true energy distribution. Based on the posteriori sampling and the trend graph, it can be concluded that the algorithm converges quickly.
For the covariance of the mixed structure of CS and AR, the prior distribution of each parameter in the model is specified as follows:
$$ G_0 = N(0.4, 10), \quad \alpha = 0.75, \quad b_i \overset{iid}{\sim} N(0, 2.1), \quad \beta \sim N(0, 10), \quad \sigma^2 \sim IG(15.2, 14), \quad \sigma_s^2 \sim IG(3.82, 6.867). $$
Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend graph, as shown in Figure 10. The negative ELBO loss histogram and the model energy graph are shown in Figure 11 and Figure 12, respectively.
As can be seen from the histogram of the negative ELBO loss in Figure 11, the negative ELBO loss is basically under control between 0 and 30, after several iterations of the algorithm, and the sample energy conversion distribution is basically consistent with the true energy distribution. Based on the posteriori sampling and the trend graph, we can conclude that the algorithm is convergent.
For the case of independent structure covariance, the prior distributions of the parameters in the model are specified as follows:
$$ G_0 = N(0.672, 10), \quad \alpha = 0.75, \quad b_i \overset{iid}{\sim} N(0, 1.106), \quad \beta \sim N(0, 20), \quad \sigma^2 \sim IG(2.22, 7.48), \quad \sigma_s^2 \sim IG(3.20, 3.443). $$
The MCMC method was used to sample and estimate each parameter in the model. The last 1000 iterations were taken to draw the posterior sampling trend graph shown in Figure 13.
The negative ELBO loss histogram and the model energy graph are shown in Figure 14 and Figure 15, respectively. It can be seen that the negative ELBO loss approaches 0 quickly. Based on the posterior sampling and the trend diagram, we can conclude that the algorithm is convergent.
The above model is applied to the estimation of covariance matrices, correlation matrices, and model errors, and we compare it with the inverse-Wishart method. The outcomes of the estimation errors are shown in Table 2:
Similarly, the estimation results of the four corresponding correlation matrices are shown in Table 3:
In Table 2 and Table 3, the covariance models are compared with each other based on three types of loss functions. In estimating the four covariance structures, the SPAR model performs better than the traditional inverse-Wishart method for each covariance structure. The estimation error of the SPAR model is much smaller than that of the inverse-Wishart method. When estimating the correlation matrix, except for the relatively poor SPAR performance under the strict AR structure, all other models perform roughly the same based on the quadratic loss function L 2 ( Σ ^ , Σ ) . Based on the comparison of the estimation errors in Table 2 and Table 3, the SPAR model shows fairly good performance in estimating the covariance matrix and correlation matrix.

7.3. Analysis of a Real Wind Speed Dataset

To verify the effectiveness of the SPAR model built in this paper in practical application, we employ the Hongming data, which contain the ground meteorological data of Dingxin Station, Jinta County, Jiuquan City, Gansu Province, China. We use the SPAR model and the MCMC method to estimate the covariance matrix and correlation matrix under four different covariance structures (CS, AR, the mixture of CS and AR, and unstructured) and compare the estimates with those from the traditional Bayesian estimation by the inverse-Wishart method. The covariance structure of the four cases is shown in Equation (41), and the mean square error is shown in Equation (42). The real data are arranged into a longitudinal data structure, as shown in Table 1 with n = 100 , p = 6 , and k = 1 . The following graphs are plotted in the same way as in the corresponding graphs in the Monte Carlo study in Section 7.2 but with real data input in the same Python program.
For the case of CS covariance structure, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend diagram, as shown in Figure 16.
The negative ELBO loss histogram and the model energy graph are shown in Figure 17 and Figure 18, respectively. It can be seen that the negative ELBO loss approaches 0 quickly, and the sample energy conversion distribution is basically consistent with the true energy distribution. Based on the posterior sampling and the trend diagram, we can conclude that the algorithm is convergent.
For the case of the AR covariance structure, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend diagram, as shown in Figure 19:
The negative ELBO loss histogram and the energy graph estimated by the model under the AR structure are shown in Figure 20 and Figure 21, respectively. It shows that the histogram of the negative ELBO loss approaches 0 quickly. Based on the posteriori sampling and the trend graph, it can be concluded that the algorithm converges quickly.
For the covariance of the mixed structure of CS and AR, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend graph, as shown in Figure 22. The negative ELBO loss histogram and the model energy graph are shown in Figure 23 and Figure 24, respectively.
As can be seen from the histogram of the negative ELBO loss in Figure 23, the negative ELBO loss approaches 0 quickly after several iterations of the algorithm. Based on the posteriori sampling and the trend graph, we can conclude that the algorithm is convergent.
For the case of independent structure covariance, the MCMC method is used to sample and estimate each parameter in the model. The last 1000 iterations are taken to draw the posterior sampling trend graph shown in Figure 25.
The negative ELBO loss histogram and the model energy graph are shown in Figure 26 and Figure 27, respectively. It can be seen that the negative ELBO loss approaches 0 quickly. Based on the posterior sampling and trend diagram, we can conclude that the algorithm is convergent.
In Table 4 and Table 5, the covariance models are compared with each other based on three types of loss functions, RMSE, L 1 , and L 2 . When using the four covariance structures based on either the covariance matrix or the correlation matrix, the SPAR model always performs better than the traditional inverse-Wishart method—it always has a smaller value of RMSE, L 1 , or L 2 when comparing the SPAR model with the Inv-W model under the same covariance structure C1, C2, C3, or C4 in Table 4 and Table 5.

8. Concluding Remarks

The SPAR model (18) provides an explicit, complete semiparametric solution to the estimation of model parameters through the MCMC algorithm. Compared with the model formulation in Quintana et al. [4], the MCMC algorithm for posterior inference on model (18) is easier to implement and may converge faster because of the explicit, simple posterior distributions of the model parameters. An effective and fast-converging MCMC algorithm plays an important role in Bayesian statistical inference. The SPAR model (18) gains this easy implementation and fast convergence of the MCMC algorithm by imposing simpler assumptions on the model parameters.
With regard to choosing the initial values for estimating model parameters by the MCMC algorithm, we recommend using a numerical optimization method such as maximum a posteriori (MAP) estimation to obtain an estimate as the initial value of a parameter; this is likely to speed up the convergence of the sampling. We employ the Gibbs sampling algorithm when estimating the parameters in the model. The convergence of the Markov chains generated by the MCMC algorithm is assessed by the posterior sampling trend plot, the negative ELBO histogram, and the energy graph, which show fast convergence of the MCMC sampling process. By applying the SPAR model to four different covariance structures through both a Monte Carlo study and a real dataset, we illustrate its effectiveness in handling nonstationary forms of covariance structures and its domination over the traditional inverse-Wishart method.
It should be pointed out that the effectiveness and fast convergence of the MCMC algorithm depend on both the model assumptions and the priors of the model parameters. Our Monte Carlo study was carried out by choosing normal priors for the model parameters and inverse gamma priors for the variance components; this choice led to easy implementation of the MCMC algorithm. It will be an interesting future research direction to develop meaningful criteria for model and algorithm comparison in the area of Bayesian nonparametric longitudinal data analysis. The main purpose of our paper is to give an easily implementable approach to this area with a practical illustration. We conclude that the complete semiparametric approach to Bayesian longitudinal data analysis in this paper is a significant complement to the area studied by some influential peers, such as Mukhopadhyay and Gelfand (1997, [21]), Quintana et al. (2016, [4]), and others.

Author Contributions

G.J. developed the Bayesian semiparametric method for longitudinal data. J.L. helped validate the methodology and finalize the manuscript. F.W. conducted major data analysis. X.C. conducted initial research in the Bayesian semiparametric modeling study on longitudinal data analysis under the guidance of G.J. in her master’s thesis [24]. H.L. helped with data analysis. J.C. found data and reference resources. S.C. and F.Z. finished the simulation study and also helped with data analysis and edited all figures. G.J. and J.J. wrote the initial draft. G.J. and F.W. completed the final writing—review and editing. All authors have read and agreed to the final version of the manuscript.

Funding

The research was supported by the National Natural Science Foundation of China under Grant No. 41271038. Jiajuan Liang's research is supported by a UIC New Faculty Start-up Research Fund R72021106 and in part by the Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College (UIC), project code 2022B1212010006.

Data Availability Statement

The real data and Python code presented in the present study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Derivations of Conditional Probability Distributions in Section 7.1

$$ \begin{aligned} p(\beta \mid -) &\propto p(y \mid \beta, b, S, \sigma^2)\, p(\beta) = \left[ \prod_{i=1}^{n} \prod_{j=1}^{m} f(y_{ij} \mid \beta, b_i, s_{ij}, \sigma^2) \right] p(\beta) \\ &= (2\pi)^{-\frac{nm+1}{2}} \sigma^{-nm} \sigma_0^{-1} \exp\left\{ -\frac{1}{2} \left[ \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - \beta - b_i - s_{ij})^2}{\sigma^2} + \frac{(\beta - \mu_0)^2}{\sigma_0^2} \right] \right\} \\ &\propto \exp\left\{ -\frac{1}{2} \left[ \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - \beta - b_i - s_{ij})^2}{\sigma^2} + \frac{(\beta - \mu_0)^2}{\sigma_0^2} \right] \right\}, \end{aligned} $$
where
$$ \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - \beta - b_i - s_{ij})^2}{\sigma^2} + \frac{(\beta - \mu_0)^2}{\sigma_0^2} = \left( \frac{nm}{\sigma^2} + \frac{1}{\sigma_0^2} \right) \beta^2 - 2 \left[ \frac{1}{\sigma^2} \sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - b_i - s_{ij}) + \frac{\mu_0}{\sigma_0^2} \right] \beta + \frac{1}{\sigma^2} \sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - b_i - s_{ij})^2 + \frac{\mu_0^2}{\sigma_0^2}. $$
Combining (A1) with (A2), we have
$$ p(\beta \mid -) \propto \exp\left\{ -\frac{1}{2} \left[ \left( \frac{nm}{\sigma^2} + \frac{1}{\sigma_0^2} \right) \beta^2 - 2 \left( \frac{1}{\sigma^2} \sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - b_i - s_{ij}) + \frac{\mu_0}{\sigma_0^2} \right) \beta \right] \right\}. $$
Let
$$ \sigma_\beta^2 = \left( \frac{nm}{\sigma^2} + \frac{1}{\sigma_0^2} \right)^{-1}, \qquad \mu_\beta = \sigma_\beta^2 \left[ \frac{1}{\sigma^2} \sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - b_i - s_{ij}) + \frac{\mu_0}{\sigma_0^2} \right]. $$
The complete posterior distribution of β can be expressed as
$$ \beta \sim N(\mu_\beta, \sigma_\beta^2). $$
The conditional distribution of σ 2 is computed as follows:
$$ \begin{aligned} p(\sigma^2 \mid -) &\propto p(y \mid \beta, b, S, \sigma^2)\, p(\sigma^2) \\ &= (2\pi)^{-\frac{nm}{2}} \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)} (\sigma^2)^{-\frac{nm + 2\alpha_0 + 2}{2}} \exp\left\{ -\frac{1}{\sigma^2} \left[ \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - \beta - b_i - s_{ij})^2 + \beta_0 \right] \right\} \\ &\propto (\sigma^2)^{-\frac{nm + 2\alpha_0 + 2}{2}} \exp\left\{ -\frac{1}{\sigma^2} \left[ \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - \beta - b_i - s_{ij})^2 + \beta_0 \right] \right\}. \end{aligned} $$
This shows that the posterior distribution of σ 2 is an inverse gamma distribution. We use the following notation:
$$ \sigma^2 \mid - \ \sim\ IG\left( \frac{nm}{2} + \alpha_0,\ \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} (y_{ij} - \beta - b_i - s_{ij})^2 + \beta_0 \right). $$
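These two conjugate full conditionals translate directly into Gibbs draws. The sketch below (ours; the data, b, S, and hyperparameter values are placeholders only to make it runnable) samples β from its normal full conditional and σ² from its inverse-gamma full conditional.

```python
# Conjugate Gibbs updates for beta (normal) and sigma^2 (inverse gamma) as derived above.
import numpy as np

rng = np.random.default_rng(8)

def draw_beta(y, b, S, sigma2, mu0, sigma0_2):
    n, m = y.shape
    resid = y - b[:, None] - S                    # y_ij - b_i - s_ij
    prec = n * m / sigma2 + 1.0 / sigma0_2        # 1 / sigma_beta^2
    mean = (resid.sum() / sigma2 + mu0 / sigma0_2) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))

def draw_sigma2(y, beta, b, S, alpha0, beta0):
    n, m = y.shape
    sse = np.sum((y - beta - b[:, None] - S) ** 2)
    # IG(nm/2 + alpha0, sse/2 + beta0), sampled as 1 / Gamma(shape, scale = 1/rate)
    return 1.0 / rng.gamma(shape=n * m / 2 + alpha0, scale=1.0 / (sse / 2 + beta0))

# placeholder inputs standing in for the current state of the Gibbs sampler
y = rng.normal(size=(100, 6)); b = rng.normal(size=100); S = rng.normal(size=(100, 6))
beta = draw_beta(y, b, S, sigma2=1.0, mu0=0.0, sigma0_2=25.0)
sigma2 = draw_sigma2(y, beta, b, S, alpha0=3.0, beta0=2.0)
print(round(beta, 3), round(sigma2, 3))
```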
The conditional distribution of b i is computed as follows:
$$ \begin{aligned} p(b_i \mid -) &\propto f(y_i \mid \beta, b_i, S_i, \sigma^2)\, p(b_i) = (2\pi)^{-\frac{m+1}{2}} \sigma^{-m} \sigma_b^{-1} \exp\left\{ -\frac{1}{2} \left[ \frac{\sum_{j=1}^{m} (y_{ij} - \beta - b_i - s_{ij})^2}{\sigma^2} + \frac{b_i^2}{\sigma_b^2} \right] \right\} \\ &\propto \exp\left\{ -\frac{1}{2} \left[ \frac{\sum_{j=1}^{m} (y_{ij} - \beta - b_i - s_{ij})^2}{\sigma^2} + \frac{b_i^2}{\sigma_b^2} \right] \right\}, \end{aligned} $$
where
$$ \frac{\sum_{j=1}^{m} (y_{ij} - \beta - b_i - s_{ij})^2}{\sigma^2} + \frac{b_i^2}{\sigma_b^2} = \frac{1}{\sigma^2} \sum_{j=1}^{m} (y_{ij} - \beta - s_{ij})^2 + \frac{m b_i^2}{\sigma^2} - \frac{2 b_i}{\sigma^2} \sum_{j=1}^{m} (y_{ij} - \beta - s_{ij}) + \frac{b_i^2}{\sigma_b^2} = \left( \frac{m}{\sigma^2} + \frac{1}{\sigma_b^2} \right) b_i^2 - \frac{2}{\sigma^2} \left[ \sum_{j=1}^{m} (y_{ij} - \beta - s_{ij}) \right] b_i + \frac{1}{\sigma^2} \sum_{j=1}^{m} (y_{ij} - \beta - s_{ij})^2. $$
Combining (A8) with (A9), we have
$$ \exp\left\{ -\frac{1}{2} \left[ \frac{\sum_{j=1}^{m} (y_{ij} - \beta - b_i - s_{ij})^2}{\sigma^2} + \frac{b_i^2}{\sigma_b^2} \right] \right\} \propto \exp\left\{ -\frac{1}{2} \left[ \left( \frac{m}{\sigma^2} + \frac{1}{\sigma_b^2} \right) b_i^2 - \frac{2}{\sigma^2} \left( \sum_{j=1}^{m} (y_{ij} - \beta - s_{ij}) \right) b_i \right] \right\}. $$
Let
$$ \sigma_{b_i}^2 = \left( \frac{m}{\sigma^2} + \frac{1}{\sigma_b^2} \right)^{-1}, \qquad \mu_{b_i} = \frac{\sigma_{b_i}^2}{\sigma^2} \sum_{j=1}^{m} (y_{ij} - \beta - s_{ij}). $$
The posterior distribution of b i is given by
b i N ( μ b , σ b 2 ) .
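Because every $b_i$ shares the same posterior variance in (A11), the whole random-effect vector can be refreshed in one vectorized call. A minimal sketch under the same assumptions as above:

```python
import numpy as np

def draw_b(y, beta, S, sigma_sq, sigma_b_sq, rng):
    """Gibbs update of the random effects b_1, ..., b_n from (A11)-(A12)."""
    n, m = y.shape
    resid = y - beta - S                              # y_ij - beta - s_ij
    var_b = 1.0 / (m / sigma_sq + 1.0 / sigma_b_sq)   # sigma_{b_i}^2, identical for all i
    mu_b = var_b * resid.sum(axis=1) / sigma_sq       # one posterior mean per subject
    return rng.normal(mu_b, np.sqrt(var_b))           # vector of n draws
```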
The conditional distribution of $S_i = (s_{i1}, \ldots, s_{im})$ is obtained by

$$
p(S_i \mid \cdot) \propto p(y_i \mid \beta, b_i, S_i, \sigma^2)\, p(S_i \mid \sigma_s^2, \rho_i).
\tag{A13}
$$

It is necessary to compute the conditional distribution of each component of $S_i$ separately:

$$
\begin{aligned}
p(s_{i1} \mid y, \beta, b, \sigma^2, \sigma_s^2, \rho)
&= \frac{p(y \mid \beta, b, s_{i1}, \sigma^2)\, p(b)\, p(s_{i1} \mid \sigma_s^2, \rho)\, p(\beta, \sigma^2, \sigma_s^2, \rho)}
        {p(y \mid \beta, b, \sigma^2)\, p(b)\, p(\beta, \sigma^2, \sigma_s^2, \rho)}
 \propto \prod_{i=1}^{n} f(y_{i1} \mid \beta, b_i, s_{i1}, \sigma^2)\, p(s_{i1} \mid \sigma_s^2, \rho) \\
&= (2\pi)^{-\frac{n+1}{2}}\sigma^{-n}\sigma_s^{-1}
 \exp\!\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_{i1}-\beta-b_i-s_{i1})^2-\frac{s_{i1}^2}{2\sigma_s^2}\right\} \\
&\propto \exp\!\left\{-\frac{1}{2}\left[\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_s^2}\right)s_{i1}^2
 -\frac{2}{\sigma^2}\sum_{i=1}^{n}(y_{i1}-\beta-b_i)\,s_{i1}\right]\right\}.
\end{aligned}
\tag{A14}
$$

Let

$$
\sigma_{s_{i1}}^2=\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_s^2}\right)^{-1},\qquad
\mu_{s_{i1}}=\frac{\sigma_{s_{i1}}^2}{\sigma^2}\sum_{i=1}^{n}(y_{i1}-\beta-b_i).
\tag{A15}
$$

We obtain the posterior distribution of $s_{i1}$:

$$
s_{i1} \mid \cdot \sim N(\mu_{s_{i1}}, \sigma_{s_{i1}}^2).
\tag{A16}
$$
Similarly, the posterior distribution of $s_{i2}$ is obtained by

$$
\begin{aligned}
p(s_{i2} \mid \cdot) &\propto \prod_{i=1}^{n} p(y_{i2} \mid \beta, b, s_{i2}, \sigma^2)\,
 p(s_{i2} \mid \hat{s}_{i1}, \sigma_s^2, \rho) \\
&= (2\pi)^{-\frac{n+1}{2}}\sigma^{-n}\sigma_s^{-1}\left(1-\rho_i^{2|t_{i2}-t_{i1}|}\right)^{-\frac12}
 \exp\!\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_{i2}-\beta-b_i-s_{i2})^2\right\}
 \exp\!\left\{-\frac{\left(s_{i2}-\hat{s}_{i1}\rho_i^{|t_{i2}-t_{i1}|}\right)^2}
 {2\sigma_s^2\left(1-\rho_i^{2|t_{i2}-t_{i1}|}\right)}\right\} \\
&\propto \exp\!\left\{-\frac{1}{2}\left[\frac{1}{\sigma^2}\sum_{i=1}^{n}(y_{i2}-\beta-b_i)^2
 +\frac{n s_{i2}^2}{\sigma^2}-\frac{2 s_{i2}}{\sigma^2}\sum_{i=1}^{n}(y_{i2}-\beta-b_i)\right]\right\}
 \exp\!\left\{-\frac{s_{i2}^2+\hat{s}_{i1}^2\rho_i^{2|t_{i2}-t_{i1}|}
 -2 s_{i2}\hat{s}_{i1}\rho_i^{|t_{i2}-t_{i1}|}}{2\sigma_s^2\left(1-\rho_i^{2|t_{i2}-t_{i1}|}\right)}\right\}.
\end{aligned}
\tag{A17}
$$

After removing the constant term, the posterior distribution of $s_{i2}$ is proportional to

$$
\exp\!\left\{-\frac{1}{2}\left[\left(\frac{n}{\sigma^2}
+\frac{1}{\sigma_s^2\left(1-\rho_i^{2|t_{i2}-t_{i1}|}\right)}\right)s_{i2}^2
-2\left(\frac{1}{\sigma^2}\sum_{i=1}^{n}(y_{i2}-\beta-b_i)
+\frac{\hat{s}_{i1}\rho_i^{|t_{i2}-t_{i1}|}}{\sigma_s^2\left(1-\rho_i^{2|t_{i2}-t_{i1}|}\right)}\right)s_{i2}\right]\right\}.
\tag{A18}
$$

Let

$$
\sigma_{s_{i2}}^2=\left[\frac{n}{\sigma^2}
+\frac{1}{\sigma_s^2\left(1-\rho_i^{2|t_{i2}-t_{i1}|}\right)}\right]^{-1},\qquad
\mu_{s_{i2}}=\sigma_{s_{i2}}^2\left[\frac{1}{\sigma^2}\sum_{i=1}^{n}(y_{i2}-\beta-b_i)
+\frac{\hat{s}_{i1}\rho_i^{|t_{i2}-t_{i1}|}}{\sigma_s^2\left(1-\rho_i^{2|t_{i2}-t_{i1}|}\right)}\right].
\tag{A19}
$$

We obtain the posterior distribution of $s_{i2}$:

$$
s_{i2} \mid \cdot \sim N(\mu_{s_{i2}}, \sigma_{s_{i2}}^2).
\tag{A20}
$$
Similarly, the posterior distribution of $s_{ij}$ $(j = 2, \ldots, m)$ can be obtained:

$$
\mu_{s_{ij}}=\sigma_{s_{ij}}^2\left[\frac{1}{\sigma^2}\sum_{i=1}^{n}(y_{ij}-\beta-b_i)
+\frac{\hat{s}_{i,j-1}\rho_i^{|t_{ij}-t_{i,j-1}|}}{\sigma_s^2\left(1-\rho_i^{2|t_{ij}-t_{i,j-1}|}\right)}\right],\qquad
\sigma_{s_{ij}}^2=\left[\frac{n}{\sigma^2}
+\frac{1}{\sigma_s^2\left(1-\rho_i^{2|t_{ij}-t_{i,j-1}|}\right)}\right]^{-1},\qquad
s_{ij} \mid \cdot \sim N(\mu_{s_{ij}}, \sigma_{s_{ij}}^2).
\tag{A21}
$$
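The latent columns are then refreshed sequentially in $j$. The sketch below mirrors (A21) as stated, including the cross-subject sum; the arrays t (observation times), S_hat (current latent draws), and rho (subject-level correlations) are illustrative names, not the authors' code.

```python
import numpy as np

def draw_s_column(j, y, beta, b, S_hat, t, sigma_sq, sigma_s_sq, rho, rng):
    """Draw the j-th latent column (0-based j >= 1) from the normal in (A21),
    conditioning on column j-1 of S_hat and the time gaps t[:, j] - t[:, j-1]."""
    n = y.shape[0]
    pooled = np.sum(y[:, j] - beta - b) / sigma_sq        # cross-subject sum in (A21)
    s_new = np.empty(n)
    for i in range(n):
        gap = abs(t[i, j] - t[i, j - 1])
        shrink = 1.0 - rho[i] ** (2 * gap)                # 1 - rho_i^{2|t_ij - t_i,j-1|}
        var = 1.0 / (n / sigma_sq + 1.0 / (sigma_s_sq * shrink))
        mean = var * (pooled + S_hat[i, j - 1] * rho[i] ** gap / (sigma_s_sq * shrink))
        s_new[i] = rng.normal(mean, np.sqrt(var))
    return s_new
```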
Combining the above derivations, we can obtain the conditional distribution of $\sigma_s^2$ as follows:

$$
\begin{aligned}
p(\sigma_s^2 \mid \cdot) &\propto f(S \mid \sigma_s^2, \rho)\, p(\sigma_s^2)
 = \prod_{i=1}^{n} f(s_{i1})\, f(s_{i2} \mid s_{i1})\cdots f(s_{im} \mid s_{i1}, s_{i2},\ldots, s_{i,m-1})\, p(\sigma_s^2) \\
&= (2\pi)^{-\frac{m}{2}}\sigma_s^{-m}\prod_{j=2}^{m}\left(1-\rho_i^{2|t_{ij}-t_{i,j-1}|}\right)^{-\frac12}
 \frac{\beta_1^{\alpha_1}}{\Gamma(\alpha_1)}(\sigma_s^2)^{-\alpha_1-1}
 \exp\!\left(-\frac{\beta_1}{\sigma_s^2}\right)
 \exp\!\left\{-\frac{1}{2\sigma_s^2}\left[s_{i1}^2
 +\sum_{j=2}^{m}\frac{\left(s_{ij}-\hat{s}_{i,j-1}\rho_i^{|t_{ij}-t_{i,j-1}|}\right)^2}
 {1-\rho_i^{2|t_{ij}-t_{i,j-1}|}}\right]\right\} \\
&\propto (\sigma_s^2)^{-\frac{m}{2}-\alpha_1-1}
 \exp\!\left\{-\frac{1}{\sigma_s^2}\left[\frac{1}{2}\left(s_{i1}^2
 +\sum_{j=2}^{m}\frac{\left(s_{ij}-\hat{s}_{i,j-1}\rho_i^{|t_{ij}-t_{i,j-1}|}\right)^2}
 {1-\rho_i^{2|t_{ij}-t_{i,j-1}|}}\right)+\beta_1\right]\right\}.
\end{aligned}
\tag{A22}
$$

The posterior distribution of $\sigma_s^2$ is actually an inverse gamma distribution:

$$
\sigma_s^2 \mid \cdot \sim IG\!\left(\frac{m}{2}+\alpha_1,\;
\frac{1}{2}\left[s_{i1}^2+\sum_{j=2}^{m}\frac{\left(s_{ij}-\hat{s}_{i,j-1}\rho_i^{|t_{ij}-t_{i,j-1}|}\right)^2}
{1-\rho_i^{2|t_{ij}-t_{i,j-1}|}}\right]+\beta_1\right).
\tag{A23}
$$
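The $\sigma_s^2$ step in (A23) is again an inverse-gamma draw. The sketch below follows the appendix statement for a single subject $i$; alpha1 and beta1 are illustrative names for the prior shape and scale.

```python
import numpy as np

def draw_sigma_s_sq(i, S, S_hat, t, rho, alpha1, beta1, rng):
    """Inverse-gamma update of sigma_s^2 from (A23), written for subject i."""
    m = S.shape[1]
    ss = S[i, 0] ** 2
    for j in range(1, m):
        gap = abs(t[i, j] - t[i, j - 1])
        shrink = 1.0 - rho[i] ** (2 * gap)
        ss += (S[i, j] - S_hat[i, j - 1] * rho[i] ** gap) ** 2 / shrink
    # X ~ IG(shape, scale)  <=>  1/X ~ Gamma(shape, rate = scale)
    return 1.0 / rng.gamma(m / 2.0 + alpha1, 1.0 / (0.5 * ss + beta1))
```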
The posterior distribution of $\rho_i$ is obtained as follows:

$$
p(\rho_i \mid \cdot) = p(\rho_i \mid S, \sigma_s^2, \rho_{-i}) = p(\rho_i \mid S_i, \sigma_s^2, \rho_{-i})
= \frac{p(S_i \mid \sigma_s^2, \rho_i)\, p(\rho_i \mid \rho_{-i})\, p(\sigma_s^2, \rho_{-i})}
       {p(S_i \mid \sigma_s^2)\, p(\sigma_s^2, \rho_{-i})}
\propto f(S_i \mid \sigma_s^2, \rho_i)\, p(\rho_i \mid \rho_{-i}),
\tag{A24}
$$

which can be expressed as follows:

$$
p(\rho_i \mid S_i, \sigma_s^2, \rho_{-i}) \propto \sum_{j\neq i} q_{ij}\,\delta_{\rho_j}+r_i H_i,\qquad
q_{ij}=b\, f(S_i \mid \sigma_s^2, \rho_j),\qquad
r_i=b\,\alpha\!\int\! f(S_i \mid \sigma_s^2, \rho)\, g_0(\rho)\, d\rho,\qquad
H_i(\rho)=\frac{f(S_i \mid \sigma_s^2, \rho)\, g_0(\rho)}{\int f(S_i \mid \sigma_s^2, \rho)\, g_0(\rho)\, d\rho},
\tag{A25}
$$

where $g_0$ is the probability density function of the base distribution $G_0$, $b$ is the normalizing constant satisfying $\sum_{j \neq i} q_{ij} + r_i = 1$, and $H_i$ is the posterior distribution of the parameter $\rho$ based on the prior $G_0$ and the observed $S_i$.
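Equation (A25) is the standard Pólya-urn form of a Dirichlet-process update: $\rho_i$ is either set equal to one of the other $\rho_j$ (with weight $q_{ij}$) or drawn afresh from $H_i$ (with weight $r_i$). The sketch below is only an illustration of that mixture step: the Uniform(0, 1) base density $g_0$ and the grid approximation of the integral defining $r_i$ and $H_i$ are assumptions made here, not choices taken from the paper.

```python
import numpy as np

def path_likelihood(i, r, S, sigma_s_sq, t):
    """f(S_i | sigma_s^2, rho = r): density of subject i's latent path for a candidate rho."""
    m = S.shape[1]
    val = np.exp(-S[i, 0] ** 2 / (2 * sigma_s_sq)) / np.sqrt(2 * np.pi * sigma_s_sq)
    for j in range(1, m):
        gap = abs(t[i, j] - t[i, j - 1])
        v = sigma_s_sq * (1.0 - r ** (2 * gap))
        val *= np.exp(-(S[i, j] - S[i, j - 1] * r ** gap) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    return val

def draw_rho_i(i, rho, S, sigma_s_sq, t, alpha, rng, n_grid=50):
    """Resample rho_i from the mixture in (A25): point masses at the other rho_j
    (weights q_ij) or an approximate draw from H_i (weight r_i)."""
    others = [j for j in range(len(rho)) if j != i]
    grid = np.linspace(0.01, 0.99, n_grid)                  # assumed Uniform(0, 1) base g_0
    lik_grid = np.array([path_likelihood(i, r, S, sigma_s_sq, t) for r in grid])
    q = np.array([path_likelihood(i, rho[j], S, sigma_s_sq, t) for j in others])
    r_i = alpha * lik_grid.mean()                           # grid approximation of the integral
    w = np.append(q, r_i)
    w = w / w.sum()                                         # plays the role of the constant b
    k = rng.choice(len(w), p=w)
    if k < len(others):
        return rho[others[k]]                               # reuse an existing rho_j
    return rng.choice(grid, p=lik_grid / lik_grid.sum())    # approximate draw from H_i
```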

References

1. Pullenayegum, E.M.; Lim, L.S. Longitudinal data subject to irregular observation: A review of methods with a focus on visit processes, assumptions, and study design. Statist. Methods Med. Res. 2016, 25, 2992–3014.
2. Chakraborty, R.; Banerjee, M.; Vemuri, B.C. Statistics on the space of trajectories for longitudinal data analysis. In Proceedings of the 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, VIC, Australia, 18–21 April 2017; pp. 999–1002.
3. Xiang, D.; Qiu, P.; Pu, X. Nonparametric regression analysis of multivariate longitudinal data. Stat. Sin. 2013, 23, 769–789.
4. Quintana, F.A.; Johnson, W.O.; Waetjen, L.E.; Gold, E.B. Bayesian nonparametric longitudinal data analysis. J. Am. Statist. Assoc. 2016, 111, 1168–1181.
5. Cheng, L.; Ramchandran, S.; Vatanen, T.; Lietzén, N.; Lahesmaa, R.; Vehtari, A.; Lähdesmäki, H. An additive Gaussian process regression model for interpretable non-parametric analysis of longitudinal data. Nat. Commun. 2019, 10, 1798.
6. Kleinman, K.P.; Ibrahim, J.G. A semiparametric Bayesian approach to the random effects model. Biometrics 1998, 54, 921–938.
7. Gao, F.; Zeng, D.; Couper, D.; Lin, D.Y. Semiparametric regression analysis of multiple right- and interval-censored events. J. Am. Statist. Assoc. 2019, 114, 1232–1240.
8. Lee, Y. Semiparametric regression. J. Am. Statist. Assoc. 2006, 101, 1722–1723.
9. Sun, Y.; Sun, L.; Zhou, J. Profile local linear estimation of generalized semiparametric regression model for longitudinal data. Lifetime Data Anal. 2013, 19, 317–349.
10. Zeger, S.L.; Diggle, P.J. Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics 1994, 50, 689–699.
11. Li, Y.; Lin, X.; Müller, P. Bayesian inference in semiparametric mixed models for longitudinal data. Biometrics 2010, 66, 70–78.
12. Huang, Y.X. Quantile regression-based Bayesian semiparametric mixed-effects models for longitudinal data with non-normal, missing and mismeasured covariate. J. Statist. Comput. Simul. 2016, 86, 1183–1202.
13. Li, J.; Zhou, J.; Zhang, B.; Li, X.R. Estimation of high dimensional covariance matrices by shrinkage algorithms. In Proceedings of the 2017 20th International Conference on Information Fusion (Fusion), Xi'an, China, 10–13 July 2017; pp. 955–962.
14. Doss, H.; Park, Y. An MCMC approach to empirical Bayes inference and Bayesian sensitivity analysis via empirical processes. Ann. Statist. 2018, 46, 1630–1663.
15. Neal, R.M. Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Statist. 2000, 9, 249–265.
16. Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014.
17. Csiszar, I. I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 1975, 3, 146–158.
18. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; pp. 63–71.
19. Barndorff-Nielsen, O.E.; Shephard, N. Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. J. Roy. Statist. Soc. (Ser. B) 2001, 63, 167–241.
20. Griffin, J.E.; Steel, M.F.J. Order-based dependent Dirichlet processes. J. Am. Statist. Assoc. 2006, 101, 179–194.
21. Mukhopadhyay, S.; Gelfand, A.E. Dirichlet process mixed generalized linear models. J. Am. Statist. Assoc. 1997, 92, 633–639.
22. Zimmerman, D.L.; Núñez-Antón, V.A. Antedependence Models for Longitudinal Data; CRC Press: Boca Raton, FL, USA, 2009.
23. Hsu, C.W.; Sinay, M.S.; Hsu, J.S. Bayesian estimation of a covariance matrix with flexible prior specification. Ann. Inst. Statist. Math. 2012, 64, 319–342.
24. Chen, X. Longitudinal Data Analysis Based on Bayesian Semiparametric Method. Master's Thesis, Lanzhou University, Lanzhou, China, 2019. (In Chinese)
Figure 1. Bayesian semiparametric autoregression model (SPAR model).
Figure 2. Example of an AR(1) process.
Figure 3. Example of an OU process.
Figure 4. Sampling results under the CS structure: The left panel gives the sampling distributions (smoothed by kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the CS structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (43)). The orange line stands for the sampling distribution under the SPAR model (18), and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (a different color for each). Each graph in the right column shows each estimated parameter (the ordinate) versus the number of random samples (the abscissa), ranging from 1 to 1000. Each graph in the left column presents the kernel-estimated density function of the parameter from the last 1000 samples.
Figure 5. Negative ELBO loss histogram: in Figure 5, the horizontal axis stands for the number of iterations in the MCMC sampling with size n = 100, the vertical axis for the negative ELBO loss.
Figure 6. Energy graph: in Figure 6, the estimated distribution of energy is based on 1000 samples with size n = 100.
Figure 7. Sampling results under the AR structure: The left panel gives the sampling distributions (smoothed by kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the AR structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (44)). The orange line stands for the sampling distribution under the SPAR model (18), and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (a different color for each). Figure 4 and Figure 7 have the same structure in both axes for the two columns of graphs.
Figure 8. Negative ELBO loss histogram: in Figure 8, the horizontal axis stands for the number of iterations in the MCMC sampling with size n = 100, the vertical axis for the negative ELBO loss.
Figure 9. Energy graph: In Figure 9, the estimated distribution of energy is based on 1000 samples with size n = 100.
Figure 10. Sampling results under the mixed structure of CS and AR: The left panel gives the sampling distributions (smoothed by kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the mixed structure of CS and AR and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (45)). The orange line stands for the sampling distribution under the SPAR model (18), and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (a different color for each). Figure 4 and Figure 10 have the same structure in both axes for the two columns of graphs.
Figure 11. Negative ELBO loss histogram: in Figure 11, the horizontal axis stands for the number of iterations in the MCMC sampling with size n = 100, the vertical axis for the negative ELBO loss.
Figure 12. Energy graph: in Figure 12, the estimated distribution of energy is based on 1000 samples with size n = 100.
Figure 13. Sampling results under the independent structure: The left panel gives the sampling distributions (smoothed by kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the independent structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (46)). The orange line stands for the sampling distribution under the SPAR model (18), and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (a different color for each). Figure 4 and Figure 13 have the same structure in both axes for the two columns of graphs.
Figure 14. Negative ELBO loss histogram: in Figure 14, the horizontal axis stands for the number of iterations in the MCMC sampling with size n = 100, the vertical axis for the negative ELBO loss.
Figure 15. Energy graph: in Figure 15, the estimated distribution of energy is based on 1000 samples with size n = 100.
Figure 16. Sampling results under the CS structure: The left panel gives the sampling distributions (smoothed by kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the CS structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (43)). The solid blue line stands for the sampling distribution under the SPAR model (18), and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (a different color for each). Figure 4 and Figure 16 have the same structure in both axes for the two columns of graphs.
Figure 17. Negative ELBO loss histogram: in Figure 17, the horizontal axis stands for the number of iterations in the MCMC algorithm with size n = 100, the vertical axis for the negative ELBO loss.
Figure 18. Energy graph: In Figure 18, the estimated distribution of energy is based on 1000 samples with size n = 100.
Figure 19. Sampling results under the AR structure: The left panel gives the sampling distributions (smoothed by kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the AR structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (44)). The solid blue line stands for the sampling distribution under the SPAR model (18), and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (a different color for each). Figure 4 and Figure 19 have the same structure in both axes for the two columns of graphs.
Figure 20. Negative ELBO loss histogram: in Figure 20, the horizontal axis stands for the number of iterations in the MCMC algorithm with size n = 100, the vertical axis for the negative ELBO loss.
Figure 21. Energy graph: in Figure 21, the estimated distribution of energy is based on 1000 samples with size n = 100.
Figure 22. Sampling results under the mixed structure of CS and AR: The left panel gives the sampling distributions (smoothed by kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the mixed structure of CS and AR and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (45)). The solid blue line stands for the sampling distribution under the SPAR model (18), and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (a different color for each). Figure 4 and Figure 22 have the same structure in both axes for the two columns of graphs.
Figure 23. Negative ELBO loss histogram: in Figure 23, the horizontal axis stands for the number of iterations in the MCMC algorithm with size n = 100, the vertical axis for the negative ELBO loss.
Figure 24. Energy graph: in Figure 24, the estimated distribution of energy is based on 1000 samples with size n = 100.
Figure 25. Sampling results under an independent structure: The left panel gives the sampling distributions (smoothed by kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the independent structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (46)). The solid blue line stands for the sampling distribution under the SPAR model (18), and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (a different color for each). Figure 4 and Figure 25 have the same structure in both axes for the two columns of graphs.
Figure 26. Negative ELBO loss histogram: in Figure 26, the horizontal axis stands for the number of iterations in the MCMC algorithm with size n = 100, the vertical axis for the negative ELBO loss.
Figure 27. Energy graph: in Figure 27, the estimated distribution of energy is based on 1000 samples with size n = 100.
Table 1. Longitudinal data structures.

Observation Object | Number of Observations
                   | 1 | ⋯ | k
1 | $(t_{11}, x_{1,11}, \ldots, x_{p,11}, y_{11})$ | ⋯ | $(t_{1k}, x_{1,1k}, \ldots, x_{p,1k}, y_{1k})$
⋮ | ⋮ |   | ⋮
i | $(t_{i1}, x_{1,i1}, \ldots, x_{p,i1}, y_{i1})$ | ⋯ | $(t_{ik}, x_{1,ik}, \ldots, x_{p,ik}, y_{ik})$
⋮ | ⋮ |   | ⋮
n | $(t_{n1}, x_{1,n1}, \ldots, x_{p,n1}, y_{n1})$ | ⋯ | $(t_{nk}, x_{1,nk}, \ldots, x_{p,nk}, y_{nk})$

Note: In Table 1, $t_{ij}$ stands for the j-th observation time of individual i; $(x_{1,ij}, \ldots, x_{p,ij})$ for the p-dimensional covariate of individual i at $t_{ij}$; and $y_{ij}$ for the j-th observation of individual i.
Table 2. Estimated RMSE, $L_1$, and $L_2$ based on covariance matrix $\Sigma$.

Loss Function | Model | C1     | C2     | C3     | C4
RMSE          | SPAR  | 0.9960 | 0.1037 | 0.8995 | 1.5358
              | Inv-W | 2.9691 | 2.9352 | 3.4804 | 3.3285
$L_1$         | SPAR  | 0.1810 | 0.1201 | 1.3217 | 0.9558
              | Inv-W | 1.8389 | 1.4632 | 3.0516 | 1.2926
$L_2$         | SPAR  | 0.1289 | 0.1345 | 1.7575 | 0.6044
              | Inv-W | 0.2405 | 0.9576 | 2.8083 | 1.7565

C1 represents the covariance of the complex symmetry (CS) structure; C2 represents the AR structure covariance; C3 represents the mixed structure covariance of CS and AR; C4 represents independent structure covariance; and Inv-W = inverse-Wishart.
Table 3. Estimated RMSE, $L_1$, and $L_2$ based on correlation matrix $R$.

Loss Function | Model | C1     | C2     | C3     | C4
RMSE          | SPAR  | 0.1628 | 1.1417 | 0.3775 | 0.1542
              | Inv-W | 0.2308 | 2.4152 | 0.9115 | 0.2771
$L_1$         | SPAR  | 0.0956 | 0.5482 | 0.0631 | 1.2354
              | Inv-W | 0.1519 | 1.1424 | 0.1756 | 1.9137
$L_2$         | SPAR  | 0.1013 | 0.9809 | 0.4937 | 0.5854
              | Inv-W | 0.1238 | 1.1935 | 0.5205 | 0.6484

C1 represents the covariance of the complex symmetry (CS) structure; C2 represents the AR structure covariance; C3 represents the mixed structure covariance of CS and AR; and C4 represents independent structure covariance.
Table 4. Estimated RMSE, $L_1$, and $L_2$ based on covariance matrix $\Sigma$.

Loss Function | Model | C1       | C2       | C3       | C4
RMSE          | SPAR  | 0.0831   | 0.0962   | 0.08837  | 0.0981
              | Inv-W | 0.4120   | 0.6826   | 0.3181   | 0.4064
$L_1$         | SPAR  | 0.1360   | 0.1996   | 0.1465   | 0.2508
              | Inv-W | 46.6743  | 16.8967  | 19.0260  | 14.3543
$L_2$         | SPAR  | 0.0870   | 0.0826   | 0.2084   | 0.3517
              | Inv-W | 661.4643 | 142.3157 | 450.6686 | 340.6994

C1 represents the covariance of the complex symmetry (CS) structure; C2 represents the AR structure covariance; C3 represents the mixed structure covariance of CS and AR; C4 represents independent structure covariance; and Inv-W = inverse-Wishart.
Table 5. Estimated RMSE, $L_1$, and $L_2$ based on correlation matrix $R$.

Loss Function | Model | C1     | C2     | C3     | C4
RMSE          | SPAR  | 0.0187 | 0.5359 | 0.1976 | 0.1985
              | Inv-W | 0.6585 | 0.6486 | 0.6515 | 0.7580
$L_1$         | SPAR  | 0.0066 | 0.1670 | 0.0662 | 0.1846
              | Inv-W | 0.0968 | 4.8245 | 0.3174 | 0.8035
$L_2$         | SPAR  | 0.0036 | 0.0132 | 0.0080 | 0.0939
              | Inv-W | 0.0079 | 1.3683 | 0.1202 | 0.3488

C1 represents the covariance of the complex symmetry (CS) structure; C2 represents the AR structure covariance; C3 represents the mixed structure covariance of CS and AR; and C4 represents independent structure covariance.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
