Article

An MM Algorithm for the Frailty-Based Illness Death Model with Semi-Competing Risks Data

1
School of Mathematics, Yunnan Normal University, Kunming 650092, China
2
School of Mathematics, Minnan Normal University, Zhangzhou 363000, China
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(19), 3702; https://doi.org/10.3390/math10193702
Submission received: 16 August 2022 / Revised: 26 September 2022 / Accepted: 28 September 2022 / Published: 10 October 2022
(This article belongs to the Special Issue Recent Advances in Computational Statistics)

Abstract

For analyzing multiple-event data, the illness–death model is often used to investigate the covariate–response association because of its direct interpretation and its flexibility in accommodating within-subject dependence. The resulting estimation and inferential procedures often depend on the subjective specification of the parametric frailty distribution. For certain frailty distributions, the computation can be challenging because the estimation involves both a nonparametric and a parametric component. In this paper, we develop efficient computational methods for analyzing semi-competing risks data in the illness–death model with a general frailty, where the Minorization–Maximization (MM) principle is employed to yield accurate estimation and inferential procedures. Simulation studies are conducted to assess the finite-sample performance of the proposed method. An application to a real data set is also provided for illustration.

1. Introduction

In biomedical studies, a subject may experience multiple types of events. For example, a subject can experience a terminal event such as death and a non-terminal event such as disease recurrence, where the terminal event can censor the non-terminal event but not vice versa. The two event times are usually dependent. This gives rise to semi-competing risks data, in which the follow-up of a non-terminal event can be stopped by a terminal event [1]. Such dependency can be modeled by a copula; for instance, Ref. [1] investigated the degree of association between the two events under the Clayton copula model, and [2] extended this work to a more general class of copula models and established the asymptotic normality of the resulting estimator. In addition, the Archimedean copula was applied by [3] to model such dependency.
In addition to the copula approach, semi-competing risks data can be modeled by the illness–death model, using a shared frailty to describe the dependence between the two event times. Many researchers ([4,5,6,7,8]) have developed semi-parametric frailty models to analyze semi-competing risks data, and semi-parametric regression analyses ([9,10]) have also been discussed. In particular, Ref. [4] proposed a marginal-likelihood approach under the semi-parametric gamma frailty model. Ref. [11] proposed a joint frailty–copula model for the meta-analysis of individual patient data with semi-competing risks. Moreover, Refs. [5,6] proposed semi-parametric Bayesian approaches for semi-competing risks data. Unlike the classical likelihood, which involves only fixed parameters, the hierarchical likelihood constructed by [7] for the semi-parametric shared frailty model involves both the fixed parameters and the unobserved frailties. In addition, Ref. [12] proposed a broad class of semi-parametric transformation models with random effects for the joint analysis of recurrent events and a terminal event. Ref. [8] developed a novel semi-parametric transformation model for the analysis of semi-competing risks data. In general, the computation involved in semi-parametric frailty models is intensive, mainly because of the calculation of multiple non-parametric baseline cumulative hazard functions and the intractable integrals over the frailty distribution.
Inference for such semi-competing risks data generally focuses on the covariate (treatment) effects on the rates of terminal and non-terminal events and on the association between the two types of events ([9,13,14,15]). Some recent research has also focused on parameter estimation procedures for semi-competing risks models. Ref. [16] provided an R package with many algorithms for independent and cluster-correlated semi-competing risks data. Ref. [17] applied a Bayesian approach to conduct variable selection along with parameter estimation, but their method is computationally intensive. Similarly, a penalized approach was applied to this model by [18], who used a more efficient proximal gradient method that requires a proper warm start. In addition, Ref. [19] proposed an MCEM scheme to update the parameter estimates. The drawback of this method is that the quality of the estimates relies heavily on the MCMC sample approximations in the E-step, which also require an accurate starting value. In contrast to these methods, we propose applying the MM algorithm to the parameter estimation in order to achieve better computational efficiency.
The MM algorithm is an important and powerful tool for optimization problems because it can simplify a difficult optimization problem by decomposing a high-dimensional objective function into separable low-dimensional functions. The MM algorithm has a broad range of applications in statistics, such as the proportional odds model ([20]), the shared frailty model ([21]), and quantile regression ([22]). In this paper, we use the MM principle to propose a profile MM algorithm for the semi-competing risks shared frailty model, which facilitates its use in high-dimensional situations.
The rest of the paper is organized as follows. In Section 2, we introduce the semi-competing risks shared frailty model. Section 3 presents the estimation procedure based on the MM method. In Section 4, we provide two simulation studies to assess its practical performance. Section 5 illustrates the method with a real data analysis. Some concluding remarks and discussion are given in Section 6.

2. The Semi-Competing Risk Model with Gamma Frailty

Let $C_i$, $T_{i1}$, and $T_{i2}$ be the censoring, non-terminal, and terminal event times for the $i$-th subject, respectively, $i = 1, \ldots, n$. Denote by $x_i$ a $p$-dimensional vector of covariates of the $i$-th subject. Assume that the censoring time $C_i$ is independent of $T_{i1}$ and $T_{i2}$ given $x_i$. The observations can be summarized as $Y_{obs} = \{ Y_{i1} = T_{i1} \wedge Y_{i2},\ Y_{i2} = T_{i2} \wedge C_i,\ \delta_{i1} = I(T_{i1} \le Y_{i2}),\ \delta_{i2} = I(T_{i2} \le C_i),\ x_i;\ i = 1, \ldots, n \}$. If the subject fails before the non-terminal event occurs, then we define $T_{i1} = \infty$. The illness–death multi-state model for semi-competing risks data has three states, which are characterized by three intensity or hazard functions:
$$\lambda_1(t_1) = \lim_{\Delta \to 0} P[\, T_1 \in [t_1, t_1 + \Delta) \mid T_1 \ge t_1, T_2 \ge t_1 \,] / \Delta, \quad t_1 > 0, \tag{2.1}$$
$$\lambda_2(t_2) = \lim_{\Delta \to 0} P[\, T_2 \in [t_2, t_2 + \Delta) \mid T_1 \ge t_2, T_2 \ge t_2 \,] / \Delta, \quad t_2 > 0, \tag{2.2}$$
$$\lambda_{12}(t_2 \mid t_1) = \lim_{\Delta \to 0} P[\, T_2 \in [t_2, t_2 + \Delta) \mid T_1 = t_1, T_2 \ge t_2 \,] / \Delta, \quad 0 < t_1 < t_2. \tag{2.3}$$
Equations (2.1) and (2.2) are the usual cause-specific hazard functions for the competing risks portion of the model, in which either the non-terminal or the terminal event occurs first. Equation (2.3) defines the hazard rate of the terminal event following the occurrence of the non-terminal event. In general, $\lambda_{12}(t_2 \mid t_1)$ depends on both $t_1$ and $t_2$. Following [4], since the dependence between the non-terminal and terminal event times will be modeled by a shared frailty later, here we assume a Markov process in which the transition probability from the non-terminal event to the terminal event does not depend on the duration of the non-terminal event, i.e., $\lambda_{12}(t_2 \mid t_1) = \lambda_{12}(t_2)$ depends only on $t_2$.
In practice, Ref. [4] extended the classical semi-competing risk model by incorporating a common frailty (random effect) to model the dependency between non-terminal and terminal event times. That is, given the random effect ω and covariates x, the Cox-type multiplicative models can be expressed in the following form:
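$$\lambda_1(t_1 \mid x, \omega) = \omega \lambda_{01}(t_1) \exp\{\beta_1^{\top} x\}, \quad t_1 > 0, \tag{2.4}$$
$$\lambda_2(t_2 \mid x, \omega) = \omega \lambda_{02}(t_2) \exp\{\beta_2^{\top} x\}, \quad t_2 > 0, \tag{2.5}$$
$$\lambda_{12}(t_2 \mid t_1, x, \omega) = \omega \lambda_{03}(t_2) \exp\{\beta_3^{\top} x\}, \quad 0 < t_1 < t_2, \tag{2.6}$$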
where $\lambda_{01}(t_1)$, $\lambda_{02}(t_2)$, and $\lambda_{03}(t_2)$ are three baseline hazard functions and $\omega$ is a subject-specific random effect or frailty. Usually, the frailties $\omega_i$ ($i = 1, \ldots, n$) are assumed to be independent and identically distributed with a density function indexed by a frailty parameter $\theta$. Common distributions assumed for $\omega$ are Gamma$(1/\theta, 1/\theta)$, Inverse Gaussian$(\theta, \theta^2)$, and Log-normal$(0, \theta)$. Since the likelihood of the semi-competing risks model with gamma frailty has an explicit form, in the following sections we assume that the frailty $\omega$ has a Gamma distribution with mean 1 and variance $\theta$, and show the estimation procedure only for the semi-competing risks models (2.4)–(2.6) with $\omega \sim \mathrm{Gamma}(1/\theta, 1/\theta)$, i.e.,
$$g(\omega) = \frac{\omega^{1/\theta - 1} \exp(-\omega/\theta)}{\Gamma(1/\theta)\, \theta^{1/\theta}}, \quad \theta > 0.$$
Note that θ measures the dependence between non-terminal and terminal event times and a larger θ indicates a stronger dependence.
For simplicity of expression, we collect the regression parameters in $\beta = (\beta_1^{\top}, \beta_2^{\top}, \beta_3^{\top})^{\top}$ and the three cumulative baseline hazards in $\Lambda_0 = (\Lambda_{01}, \Lambda_{02}, \Lambda_{03})$. The parameters of the semi-competing risks model then consist of three parts, and we denote all parameters by $\alpha = (\theta, \beta, \Lambda_0)$.
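
A short worked derivation may help explain why the gamma choice leads to a closed-form likelihood. The sketch below uses only the standard Laplace transform of the Gamma$(1/\theta, 1/\theta)$ distribution; the symbol $H$, standing for a subject's total conditional cumulative hazard (the sum of the three terms $\Lambda_{0m} e^{\beta_m^{\top} x}$), is shorthand introduced here and is not notation from the paper.

```latex
\begin{aligned}
\mathcal{L}(s) &= E\!\left[e^{-\omega s}\right] = (1+\theta s)^{-1/\theta}, \\
E\!\left[\omega\, e^{-\omega H}\right] &= -\mathcal{L}'(H) = (1+\theta H)^{-1/\theta-1}, \\
E\!\left[\omega^{2} e^{-\omega H}\right] &= \mathcal{L}''(H) = (1+\theta)\,(1+\theta H)^{-1/\theta-2}.
\end{aligned}
```

Each observed event contributes one factor of $\omega$ to the conditional likelihood, so integrating out the frailty gives the exponent $-1/\theta - \delta_{i1} - \delta_{i2}$, and the extra factor $(1+\theta)$ arises only when both events are observed ($\delta_{i1}\delta_{i2} = 1$). This matches the structure of the observed likelihood used in Section 3.2.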

3. The Estimation via MM Method

3.1. Philosophy of the MM Principle

Suppose that $\arg\max_{\alpha \in \Theta} \ell(\alpha \mid Y_{obs})$ is our maximization problem, where $\ell(\alpha \mid Y_{obs})$ is the objective log-likelihood function, $\alpha = (\alpha_1, \ldots, \alpha_q)^{\top} \in \Theta$ is the vector of parameters to be estimated, and $\Theta$ is the parameter space.
For such maximization problems, the MM principle provides a general framework for constructing iterative algorithms with monotone convergence, which involves two M steps. The first M step (minorization) constructs a surrogate function $Q(\alpha \mid \alpha^{(k)})$ through a series of algebraic inequalities such that the following conditions hold:
$$Q(\alpha \mid \alpha^{(k)}) \le \ell(\alpha \mid Y_{obs}), \quad \forall\, \alpha, \alpha^{(k)} \in \Theta, \qquad Q(\alpha^{(k)} \mid \alpha^{(k)}) = \ell(\alpha^{(k)} \mid Y_{obs}),$$
where $\alpha^{(k)}$ denotes the current estimate of $\alpha$ in the $k$-th iteration. Once the surrogate function is constructed, the second M step (maximization) maximizes the surrogate function $Q(\cdot \mid \alpha^{(k)})$ instead of the objective log-likelihood function $\ell(\alpha \mid Y_{obs})$. We then update $\alpha^{(k)}$ to $\alpha^{(k+1)}$ as follows:
$$\alpha^{(k+1)} = \arg\max_{\alpha \in \Theta} Q(\alpha \mid \alpha^{(k)}).$$
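
The two M steps can be illustrated on a deliberately simplified problem. The following is a minimal sketch, not the paper's algorithm: it assumes a single event type with a constant baseline hazard `lam`, a known gamma frailty variance `theta`, and right censoring, so that the marginal log-likelihood is $\sum_i [\delta_i \log\lambda - (1/\theta + \delta_i)\log(1 + \theta\lambda y_i)]$. Minorizing each $-\log(\cdot)$ term by its tangent line at the current iterate gives a surrogate with a closed-form maximizer. All function names and the toy data settings are hypothetical.

```python
import math
import random

def loglik(lam, y, delta, theta):
    """Marginal log-likelihood of a constant hazard under Gamma(1/theta, 1/theta) frailty."""
    return sum(d * math.log(lam) - (1.0 / theta + d) * math.log(1.0 + theta * lam * t)
               for t, d in zip(y, delta))

def mm_update(lam_k, y, delta, theta):
    """One MM step: replace -log(1 + theta*lam*t) by its tangent line at lam_k
    (minorization), then maximize the surrogate in closed form (maximization)."""
    denom = sum((1.0 + theta * d) * t / (1.0 + theta * lam_k * t)
                for t, d in zip(y, delta))
    return sum(delta) / denom

def mm_fit(y, delta, theta, lam0=1.0, tol=1e-7, max_iter=10_000):
    """Iterate MM updates until the log-likelihood gain is at most tol."""
    lam, trace = lam0, [loglik(lam0, y, delta, theta)]
    for _ in range(max_iter):
        lam = mm_update(lam, y, delta, theta)
        trace.append(loglik(lam, y, delta, theta))
        if trace[-1] - trace[-2] <= tol:
            break
    return lam, trace

# Toy data (assumed settings): true hazard 2, frailty variance 0.5, uniform censoring.
rng = random.Random(0)
theta, lam_true = 0.5, 2.0
y, delta = [], []
for _ in range(500):
    w = rng.gammavariate(1.0 / theta, theta)   # shape 1/theta, scale theta: mean 1, variance theta
    t = rng.expovariate(w * lam_true)
    c = rng.uniform(0.0, 2.0)
    y.append(min(t, c))
    delta.append(1 if t <= c else 0)

lam_hat, trace = mm_fit(y, delta, theta)
print(f"lambda_hat = {lam_hat:.4f} after {len(trace) - 1} iterations")
```

The list `trace` records the log-likelihood at each iterate; by the minorization conditions it should never decrease, which is the monotone-convergence property referred to above.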

3.2. The Estimation Procedure

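From Equations (2.4)–(2.6), the observed likelihood function of the semi-competing risks gamma frailty model can be written as
$$
\begin{aligned}
L(\theta, \beta, \Lambda_0 \mid Y_{obs}) = \prod_{i=1}^{n} \Big( & \lambda_{01}(Y_{i1})^{\delta_{i1}}\, \lambda_{02}(Y_{i2})^{\delta_{i2}(1-\delta_{i1})}\, \lambda_{03}(Y_{i2})^{\delta_{i1}\delta_{i2}}\, (1+\theta)^{\delta_{i1}\delta_{i2}} \\
& \times \exp\big[\delta_{i1}\beta_1^{\top}x_i + \delta_{i2}(1-\delta_{i1})\beta_2^{\top}x_i + \delta_{i1}\delta_{i2}\beta_3^{\top}x_i\big] \\
& \times \big\{1 + \theta\big[\Lambda_{01}(Y_{i1}) e^{\beta_1^{\top}x_i} + \Lambda_{02}(Y_{i2}) e^{\beta_2^{\top}x_i} + \Lambda_{03}(Y_{i1}, Y_{i2}) e^{\beta_3^{\top}x_i}\big]\big\}^{-\frac{1}{\theta} - \delta_{i1} - \delta_{i2}} \Big),
\end{aligned}
$$
and the objective log-likelihood function is
$$
\begin{aligned}
\ell(\theta, \beta, \Lambda_0 \mid Y_{obs}) = \sum_{i=1}^{n} \Big( & \delta_{i1}\log\lambda_{01}(Y_{i1}) + \delta_{i2}(1-\delta_{i1})\log\lambda_{02}(Y_{i2}) + \delta_{i1}\delta_{i2}\log\lambda_{03}(Y_{i2}) \\
& + \delta_{i1}\beta_1^{\top}x_i + \delta_{i2}(1-\delta_{i1})\beta_2^{\top}x_i + \delta_{i1}\delta_{i2}\beta_3^{\top}x_i + \delta_{i1}\delta_{i2}\log(1+\theta) \\
& - \Big(\frac{1}{\theta} + \delta_{i1} + \delta_{i2}\Big) \log\big\{1 + \theta\big[\Lambda_{01}(Y_{i1}) e^{\beta_1^{\top}x_i} + \Lambda_{02}(Y_{i2}) e^{\beta_2^{\top}x_i} + \Lambda_{03}(Y_{i1}, Y_{i2}) e^{\beta_3^{\top}x_i}\big]\big\} \Big).
\end{aligned}
\tag{3.1}
$$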
Based on the MM principle, we need to find a surrogate function for the objective log-likelihood function $\ell(\theta, \beta, \Lambda_0 \mid Y_{obs})$ in (3.1). We first denote
$$A_i^{(k)} = 1 + \theta^{(k)} \big[ \Lambda_{01}^{(k)}(Y_{i1}) e^{\beta_1^{(k)\top} x_i} + \Lambda_{02}^{(k)}(Y_{i2}) e^{\beta_2^{(k)\top} x_i} + \Lambda_{03}^{(k)}(Y_{i1}, Y_{i2}) e^{\beta_3^{(k)\top} x_i} \big]$$
and use the supporting hyperplane inequality $-\log(x) \ge -\log(x_0) - \dfrac{x - x_0}{x_0}$ to handle the last term of (3.1). This gives the temporary surrogate function
$$
\begin{aligned}
Q_1(\theta, \beta, \Lambda_0 \mid \theta^{(k)}, \beta^{(k)}, \Lambda_0^{(k)}) = \sum_{i=1}^{n} \Big\{ & \delta_{i1}\log\lambda_{01}(Y_{i1}) + \delta_{i2}(1-\delta_{i1})\log\lambda_{02}(Y_{i2}) + \delta_{i1}\delta_{i2}\log\lambda_{03}(Y_{i2}) \\
& + \delta_{i1}\beta_1^{\top}x_i + \delta_{i2}(1-\delta_{i1})\beta_2^{\top}x_i + \delta_{i1}\delta_{i2}\beta_3^{\top}x_i + \delta_{i1}\delta_{i2}\log(1+\theta) \\
& - \frac{1}{\theta}\Big( \log A_i^{(k)} + \frac{1}{A_i^{(k)}} - 1 \Big) - \frac{\Lambda_{01}(Y_{i1}) e^{\beta_1^{\top}x_i} + \Lambda_{02}(Y_{i2}) e^{\beta_2^{\top}x_i} + \Lambda_{03}(Y_{i1}, Y_{i2}) e^{\beta_3^{\top}x_i}}{A_i^{(k)}} \\
& - (\delta_{i1} + \delta_{i2}) \frac{\theta \big[\Lambda_{01}(Y_{i1}) e^{\beta_1^{\top}x_i} + \Lambda_{02}(Y_{i2}) e^{\beta_2^{\top}x_i} + \Lambda_{03}(Y_{i1}, Y_{i2}) e^{\beta_3^{\top}x_i}\big]}{A_i^{(k)}} + c_1 \Big\},
\end{aligned}
$$
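where $c_1$ is a constant. Following [23,24], we use the profile estimation method and first profile out $\Lambda_{01}$, $\Lambda_{02}$, and $\Lambda_{03}$ from $Q_1(\theta, \beta, \Lambda_0 \mid \theta^{(k)}, \beta^{(k)}, \Lambda_0^{(k)})$ for any given $\theta$ and $\beta$. This provides the estimates of $\Lambda_{01}$, $\Lambda_{02}$, and $\Lambda_{03}$ given $\theta$ and $\beta$ as
$$d\hat{\Lambda}_{01}(Y_{i1}) = \frac{\delta_{i1}}{\sum_{j=1}^{n} I(Y_{j1} \ge Y_{i1})\, \theta \big(\frac{1}{\theta} + \delta_{j1} + \delta_{j2}\big) e^{\beta_1^{\top} x_j} / A_j^{(k)}}, \tag{3.2}$$
$$d\hat{\Lambda}_{02}(Y_{i2}) = \frac{\delta_{i2}(1 - \delta_{i1})}{\sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2})\, \theta \big(\frac{1}{\theta} + \delta_{j1} + \delta_{j2}\big) e^{\beta_2^{\top} x_j} / A_j^{(k)}}, \tag{3.3}$$
$$d\hat{\Lambda}_{03}(Y_{i2}) = \frac{\delta_{i2}\delta_{i1}}{\sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2} > Y_{j1})\, \theta \big(\frac{1}{\theta} + \delta_{j1} + \delta_{j2}\big) e^{\beta_3^{\top} x_j} / A_j^{(k)}}. \tag{3.4}$$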
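
The profile step has the familiar Breslow form: each jump is an event indicator divided by a weighted risk-set sum. The following is a minimal sketch for the jumps of $\Lambda_{01}$ only, assuming a single scalar covariate and assuming that the weight inside the sum uses subject $j$'s own event indicators, as the derivation from $Q_1$ suggests. The function and argument names are hypothetical; `A` holds the current values $A_j^{(k)}$.

```python
import math

def baseline_jumps_01(y1, d1, d2, x, theta, beta1, A):
    """Jump sizes of Lambda_01 at each Y_i1, following the form of (3.2).

    Assumption: scalar covariate x_j, so beta1 * x_j replaces the inner product.
    """
    n = len(y1)
    # weight_j = theta * (1/theta + d_j1 + d_j2) * exp(beta1 * x_j) / A_j
    w = [(1.0 + theta * (d1[j] + d2[j])) * math.exp(beta1 * x[j]) / A[j]
         for j in range(n)]
    jumps = []
    for i in range(n):
        if d1[i] == 0:
            jumps.append(0.0)          # no jump where the non-terminal event was not observed
            continue
        risk = sum(w[j] for j in range(n) if y1[j] >= y1[i])  # weighted risk set at Y_i1
        jumps.append(1.0 / risk)
    return jumps

def cumulative_hazard(jumps, times, t):
    """Step-function estimate: sum of the jumps located at or before time t."""
    return sum(dj for dj, s in zip(jumps, times) if s <= t)

# Tiny illustration with made-up numbers.
y1 = [1.0, 2.0, 3.0]
jumps = baseline_jumps_01(y1, d1=[1, 1, 0], d2=[0, 0, 0], x=[0.0, 0.0, 0.0],
                          theta=1.0, beta1=0.0, A=[1.0, 1.0, 1.0])
print(jumps, cumulative_hazard(jumps, y1, 2.5))
```

The jumps for $\Lambda_{02}$ and $\Lambda_{03}$ would differ only in the event indicator in the numerator and in the risk-set condition, namely $Y_{j2} \ge Y_{i2}$ and $Y_{j2} \ge Y_{i2} > Y_{j1}$, respectively.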
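Substituting (3.2)–(3.4) into $Q_1(\theta, \beta, \Lambda_0 \mid \theta^{(k)}, \beta^{(k)}, \Lambda_0^{(k)})$, we have
$$
\begin{aligned}
Q_2(\theta, \beta \mid \theta^{(k)}, \beta^{(k)}, \Lambda_0^{(k)}) = \sum_{i=1}^{n} \Big\{ & -\delta_{i1} \log \sum_{j=1}^{n} I(Y_{j1} \ge Y_{i1})\, \theta \Big(\frac{1}{\theta} + \delta_{j1} + \delta_{j2}\Big) e^{\beta_1^{\top} x_j} / A_j^{(k)} \\
& - \delta_{i2}(1-\delta_{i1}) \log \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2})\, \theta \Big(\frac{1}{\theta} + \delta_{j1} + \delta_{j2}\Big) e^{\beta_2^{\top} x_j} / A_j^{(k)} \\
& - \delta_{i1}\delta_{i2} \log \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2} > Y_{j1})\, \theta \Big(\frac{1}{\theta} + \delta_{j1} + \delta_{j2}\Big) e^{\beta_3^{\top} x_j} / A_j^{(k)} \\
& + \delta_{i1}\beta_1^{\top}x_i + \delta_{i2}(1-\delta_{i1})\beta_2^{\top}x_i + \delta_{i1}\delta_{i2}\beta_3^{\top}x_i + \delta_{i1}\delta_{i2}\log(1+\theta) \\
& - \frac{1}{\theta}\Big( \log A_i^{(k)} + \frac{1}{A_i^{(k)}} - 1 \Big) + c_2 \Big\},
\end{aligned}
\tag{3.5}
$$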
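where $c_2$ is a constant. For ease of expression, we further denote
$$
\begin{aligned}
B_{1i}^{(k)} &= \sum_{j=1}^{n} I(Y_{j1} \ge Y_{i1})\, \theta^{(k)} \Big(\frac{1}{\theta^{(k)}} + \delta_{j1} + \delta_{j2}\Big) e^{\beta_1^{(k)\top} x_j} / A_j^{(k)}, \\
B_{2i}^{(k)} &= \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2})\, \theta^{(k)} \Big(\frac{1}{\theta^{(k)}} + \delta_{j1} + \delta_{j2}\Big) e^{\beta_2^{(k)\top} x_j} / A_j^{(k)}, \\
B_{3i}^{(k)} &= \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2} > Y_{j1})\, \theta^{(k)} \Big(\frac{1}{\theta^{(k)}} + \delta_{j1} + \delta_{j2}\Big) e^{\beta_3^{(k)\top} x_j} / A_j^{(k)},
\end{aligned}
$$
and again use the supporting hyperplane inequality to handle the first three terms of $Q_2(\theta, \beta \mid \theta^{(k)}, \beta^{(k)}, \Lambda_0^{(k)})$ in Equation (3.5). We obtain the surrogate function
$$
\begin{aligned}
Q_3(\theta, \beta \mid \theta^{(k)}, \beta^{(k)}, \Lambda_0^{(k)}) = \sum_{i=1}^{n} \Big\{ & -\frac{\delta_{i1}}{B_{1i}^{(k)}} \sum_{j=1}^{n} I(Y_{j1} \ge Y_{i1}) \Big[ \frac{e^{\beta_1^{\top} x_j}}{A_j^{(k)}} + \frac{\delta_{j1} + \delta_{j2}}{A_j^{(k)}}\, \theta e^{\beta_1^{\top} x_j} \Big] \\
& - \frac{\delta_{i2}(1-\delta_{i1})}{B_{2i}^{(k)}} \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2}) \Big[ \frac{e^{\beta_2^{\top} x_j}}{A_j^{(k)}} + \frac{\delta_{j1} + \delta_{j2}}{A_j^{(k)}}\, \theta e^{\beta_2^{\top} x_j} \Big] \\
& - \frac{\delta_{i1}\delta_{i2}}{B_{3i}^{(k)}} \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2} > Y_{j1}) \Big[ \frac{e^{\beta_3^{\top} x_j}}{A_j^{(k)}} + \frac{\delta_{j1} + \delta_{j2}}{A_j^{(k)}}\, \theta e^{\beta_3^{\top} x_j} \Big] \\
& + \delta_{i1}\beta_1^{\top}x_i + \delta_{i2}(1-\delta_{i1})\beta_2^{\top}x_i + \delta_{i1}\delta_{i2}\beta_3^{\top}x_i + \delta_{i1}\delta_{i2}\log(1+\theta) \\
& - \frac{1}{\theta}\Big( \log A_i^{(k)} + \frac{1}{A_i^{(k)}} - 1 \Big) \Big\}.
\end{aligned}
$$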
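To separate the parameter $\theta$ from each of $\beta_1$, $\beta_2$, and $\beta_3$, we further minorize the terms $-\theta\exp(\beta_1^{\top} x_j)$, $-\theta\exp(\beta_2^{\top} x_j)$, and $-\theta\exp(\beta_3^{\top} x_j)$ in $Q_3(\theta, \beta \mid \theta^{(k)}, \beta^{(k)}, \Lambda_0^{(k)})$ by the inequality
$$-\theta \exp(\beta_1^{\top} x_j) \ge -\theta^{(k)} \exp(\beta_1^{(k)\top} x_j) \left[ \frac{1}{2} \Big( \frac{\theta}{\theta^{(k)}} \Big)^{2} + \frac{1}{2}\, \frac{\exp(2\beta_1^{\top} x_j)}{\exp(2\beta_1^{(k)\top} x_j)} \right],$$
and its analogues for $\beta_2$ and $\beta_3$. We then obtain the final surrogate function
$$
\begin{aligned}
Q_4(\theta, \beta \mid \theta^{(k)}, \beta^{(k)}, \Lambda_0^{(k)}) = \sum_{i=1}^{n} \Big\{ & -\frac{\delta_{i1}}{B_{1i}^{(k)}} \sum_{j=1}^{n} I(Y_{j1} \ge Y_{i1}) \frac{e^{\beta_1^{\top} x_j}}{A_j^{(k)}} - \frac{\delta_{i2}(1-\delta_{i1})}{B_{2i}^{(k)}} \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2}) \frac{e^{\beta_2^{\top} x_j}}{A_j^{(k)}} \\
& - \frac{\delta_{i1}\delta_{i2}}{B_{3i}^{(k)}} \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2} > Y_{j1}) \frac{e^{\beta_3^{\top} x_j}}{A_j^{(k)}} \\
& - \frac{\delta_{i1}}{B_{1i}^{(k)}} \sum_{j=1}^{n} I(Y_{j1} \ge Y_{i1}) \frac{\delta_{j1} + \delta_{j2}}{A_j^{(k)}} \Big[ \frac{\exp(\beta_1^{(k)\top} x_j)}{2\theta^{(k)}}\, \theta^{2} + \frac{\theta^{(k)}}{2\exp(\beta_1^{(k)\top} x_j)} \exp(2\beta_1^{\top} x_j) \Big] \\
& - \frac{\delta_{i2}(1-\delta_{i1})}{B_{2i}^{(k)}} \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2}) \frac{\delta_{j1} + \delta_{j2}}{A_j^{(k)}} \Big[ \frac{\exp(\beta_2^{(k)\top} x_j)}{2\theta^{(k)}}\, \theta^{2} + \frac{\theta^{(k)}}{2\exp(\beta_2^{(k)\top} x_j)} \exp(2\beta_2^{\top} x_j) \Big] \\
& - \frac{\delta_{i1}\delta_{i2}}{B_{3i}^{(k)}} \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2} > Y_{j1}) \frac{\delta_{j1} + \delta_{j2}}{A_j^{(k)}} \Big[ \frac{\exp(\beta_3^{(k)\top} x_j)}{2\theta^{(k)}}\, \theta^{2} + \frac{\theta^{(k)}}{2\exp(\beta_3^{(k)\top} x_j)} \exp(2\beta_3^{\top} x_j) \Big] \\
& + \delta_{i1}\beta_1^{\top}x_i + \delta_{i2}(1-\delta_{i1})\beta_2^{\top}x_i + \delta_{i1}\delta_{i2}\beta_3^{\top}x_i + \delta_{i1}\delta_{i2}\log(1+\theta) - \frac{1}{\theta}\Big( \log A_i^{(k)} + \frac{1}{A_i^{(k)}} - 1 \Big) \Big\} \\
\triangleq {} & Q_4(\theta \mid \alpha^{(k)}) + Q_4(\beta_1 \mid \alpha^{(k)}) + Q_4(\beta_2 \mid \alpha^{(k)}) + Q_4(\beta_3 \mid \alpha^{(k)}),
\end{aligned}
\tag{3.6}
$$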
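where
$$
\begin{aligned}
Q_4(\theta \mid \alpha^{(k)}) = \sum_{i=1}^{n} \Big\{ & \delta_{i1}\delta_{i2}\log(1+\theta) - \frac{1}{\theta}\Big( \log A_i^{(k)} + \frac{1}{A_i^{(k)}} - 1 \Big) \\
& - \frac{\theta^{2}}{2\theta^{(k)}} \Big[ \frac{\delta_{i1}}{B_{1i}^{(k)}} \sum_{j=1}^{n} I(Y_{j1} \ge Y_{i1}) \frac{(\delta_{j1} + \delta_{j2}) \exp(\beta_1^{(k)\top} x_j)}{A_j^{(k)}} \\
& \qquad + \frac{\delta_{i2}(1-\delta_{i1})}{B_{2i}^{(k)}} \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2}) \frac{(\delta_{j1} + \delta_{j2}) \exp(\beta_2^{(k)\top} x_j)}{A_j^{(k)}} \\
& \qquad + \frac{\delta_{i1}\delta_{i2}}{B_{3i}^{(k)}} \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2} > Y_{j1}) \frac{(\delta_{j1} + \delta_{j2}) \exp(\beta_3^{(k)\top} x_j)}{A_j^{(k)}} \Big] \Big\}, \\
Q_4(\beta_1 \mid \alpha^{(k)}) = \sum_{i=1}^{n} \Big\{ & \delta_{i1}\beta_1^{\top} x_i - \frac{\delta_{i1}}{B_{1i}^{(k)}} \sum_{j=1}^{n} I(Y_{j1} \ge Y_{i1}) \Big[ \frac{e^{\beta_1^{\top} x_j}}{A_j^{(k)}} + \frac{\theta^{(k)} (\delta_{j1} + \delta_{j2}) \exp(2\beta_1^{\top} x_j)}{2 A_j^{(k)} \exp(\beta_1^{(k)\top} x_j)} \Big] \Big\}, \\
Q_4(\beta_2 \mid \alpha^{(k)}) = \sum_{i=1}^{n} \Big\{ & \delta_{i2}(1-\delta_{i1})\beta_2^{\top} x_i - \frac{\delta_{i2}(1-\delta_{i1})}{B_{2i}^{(k)}} \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2}) \Big[ \frac{e^{\beta_2^{\top} x_j}}{A_j^{(k)}} + \frac{\theta^{(k)} (\delta_{j1} + \delta_{j2}) \exp(2\beta_2^{\top} x_j)}{2 A_j^{(k)} \exp(\beta_2^{(k)\top} x_j)} \Big] \Big\}, \\
Q_4(\beta_3 \mid \alpha^{(k)}) = \sum_{i=1}^{n} \Big\{ & \delta_{i1}\delta_{i2}\beta_3^{\top} x_i - \frac{\delta_{i1}\delta_{i2}}{B_{3i}^{(k)}} \sum_{j=1}^{n} I(Y_{j2} \ge Y_{i2} > Y_{j1}) \Big[ \frac{e^{\beta_3^{\top} x_j}}{A_j^{(k)}} + \frac{\theta^{(k)} (\delta_{j1} + \delta_{j2}) \exp(2\beta_3^{\top} x_j)}{2 A_j^{(k)} \exp(\beta_3^{(k)\top} x_j)} \Big] \Big\}.
\end{aligned}
$$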
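
The separation step used to pass from $Q_3$ to $Q_4$ rests on the arithmetic-geometric mean inequality $ab \le (a^2 + b^2)/2$. The following is a small numerical sketch, assuming a scalar covariate, that checks the two minorization conditions for this step: the separable expression bounds $\theta e^{\beta x}$ from above everywhere (so its negative minorizes $-\theta e^{\beta x}$) and the two agree at the current iterate. The function names are hypothetical.

```python
import math
import random

def product_term(theta, beta, x):
    """The coupled term theta * exp(beta * x) that appears in Q_3."""
    return theta * math.exp(beta * x)

def separable_bound(theta, beta, x, theta_k, beta_k):
    """Upper bound on theta * exp(beta * x), written as a function of theta alone
    plus a function of beta alone, tangent at (theta_k, beta_k)."""
    e_k = math.exp(beta_k * x)
    return (e_k / (2.0 * theta_k)) * theta ** 2 + (theta_k / (2.0 * e_k)) * math.exp(2.0 * beta * x)

rng = random.Random(42)
worst_gap = float("inf")
for _ in range(10_000):
    theta_k, theta = rng.uniform(0.1, 3.0), rng.uniform(0.1, 3.0)
    beta_k, beta = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
    x = rng.uniform(-3.0, 3.0)
    bound = separable_bound(theta, beta, x, theta_k, beta_k)
    gap = (bound - product_term(theta, beta, x)) / max(1.0, bound)
    worst_gap = min(worst_gap, gap)

# Equality at the current iterate (tangency condition).
at_iterate = separable_bound(0.7, 0.3, 1.5, 0.7, 0.3)
print(worst_gap >= -1e-9, math.isclose(at_iterate, product_term(0.7, 0.3, 1.5)))
```

Because the bound is a sum of a term in $\theta^2$ and a term in $\exp(2\beta x)$, the surrogate splits into one piece per parameter block, which is what allows the updates to be carried out separately.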
From (3.6), it can be seen that the frailty parameter $\theta$ and the regression parameters $\beta_1$, $\beta_2$, and $\beta_3$ are separated from each other. Accordingly, the resulting MM algorithm involves only a series of separate univariate optimizations in the maximization step, and no matrix inversion is needed. The procedure is stated in Algorithm 1.
Algorithm 1 The estimation procedure via the MM method.
Input: $(\theta^{(0)}, \beta^{(0)}, \Lambda_0^{(0)})$
$k \leftarrow 1$
while $\epsilon_k > 10^{-7}$ do
    S1. Update $\theta^{(k)}$ and $\beta^{(k)}$ via Equation (3.6) given $(\theta^{(k-1)}, \beta^{(k-1)}, \Lambda_0^{(k-1)})$
    S2. Estimate $\Lambda_0^{(k)}$ using (3.2)–(3.4) given $\theta^{(k)}$ and $\beta^{(k)}$
    S3. $\epsilon_k = \ell(\theta^{(k)}, \beta^{(k)}, \Lambda_0^{(k)} \mid Y_{obs}) - \ell(\theta^{(k-1)}, \beta^{(k-1)}, \Lambda_0^{(k-1)} \mid Y_{obs})$
    $k \leftarrow k + 1$
end while
Output: $(\theta^{(k)}, \beta^{(k)}, \Lambda_0^{(k)})$
The variance of the estimates in the model involves the three non-parametric baseline transition functions, the frailty variance, and the regression coefficients, as shown in [4]. Hence, there is no readily available plug-in formula for estimating it. However, a resampling method such as the bootstrap can be employed to estimate it and to construct confidence intervals and the associated inferential procedures.

4. Simulation Study

To evaluate the finite-sample performance of the proposed method, we conducted the following simulation studies. As emphasized in Section 2, the purpose of incorporating subject-specific frailty terms is to account for dependence that is not captured by the measured covariates.
Scenario 1: We first independently simulate $n$ observations from the semi-competing risks models (2.4)–(2.6) with $\omega \sim \mathrm{Gamma}(1/\theta, 1/\theta)$. We consider one covariate $X$ in the model, which follows a standard normal distribution. The censoring times are generated from independent uniform distributions to yield censoring proportions for both the terminal and the non-terminal event of around 30% or 50%. Let $\lambda_{01}(t_1) = \lambda_{02}(t_2) = 1$, $\lambda_{03}(t_2) = 2$, and $\beta_1 = \beta_2 = \beta_3 = 0.5$. We choose the true value of $\theta$ from $\Omega_\theta = \{0.5, 1, 2\}$ and the sample size $n$ from $\Omega_n = \{250, 500\}$.
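
The data-generating mechanism of Scenario 1 can be sketched as follows. This is an illustrative sketch rather than the authors' simulation code: with constant baseline hazards, the two transitions out of the initial state are competing exponential times, and under the Markov assumption the time from the non-terminal event to death is again exponential. The upper limit `c_max` of the uniform censoring distribution is a hypothetical value and has not been calibrated to the 30% or 50% censoring proportions mentioned above.

```python
import math
import random

def simulate_subject(rng, theta, beta=(0.5, 0.5, 0.5), base=(1.0, 1.0, 2.0), c_max=3.0):
    """Draw one subject from the gamma-frailty illness-death model with constant baselines."""
    x = rng.gauss(0.0, 1.0)                       # standard normal covariate
    omega = rng.gammavariate(1.0 / theta, theta)  # shape 1/theta, scale theta: mean 1, variance theta
    r1, r2, r3 = (omega * base[m] * math.exp(beta[m] * x) for m in range(3))

    t1_latent = rng.expovariate(r1)               # healthy -> non-terminal event
    t2_latent = rng.expovariate(r2)               # healthy -> terminal event
    if t1_latent < t2_latent:
        t1 = t1_latent
        t2 = t1 + rng.expovariate(r3)             # non-terminal -> terminal (memoryless)
    else:
        t1 = math.inf                             # terminal event occurs first
        t2 = t2_latent

    c = rng.uniform(0.0, c_max)                   # independent censoring (assumed range)
    y2 = min(t2, c)
    d2 = int(t2 <= c)
    y1 = min(t1, y2)
    d1 = int(t1 <= y2)
    return {"x": x, "y1": y1, "y2": y2, "d1": d1, "d2": d2}

rng = random.Random(1)
sample = [simulate_subject(rng, theta=0.5) for _ in range(500)]
n = len(sample)
print("non-terminal events observed:", sum(s["d1"] for s in sample) / n)
print("terminal events observed:", sum(s["d2"] for s in sample) / n)
```

In practice one would tune `c_max` until the printed proportions correspond to the target censoring levels.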
Based on 500 replications, the average biases (Bias) of the estimated frailty and regression parameters, their empirical standard deviations (SD), and the average computation times (T) are summarized in Table 1 and Table 2. The SD of each parameter is relatively small, which indicates the effectiveness of the MM algorithm for the illness–death model with gamma frailty. Most of the SDs of $\theta$ are larger than those of $\beta_1$, $\beta_2$, $\beta_3$ and $\Lambda_{01}$, $\Lambda_{02}$, $\Lambda_{03}$. The SD of $\theta$ increases with the true value of $\theta$ and decreases as the sample size increases. When the censoring proportion increases from 30% to 50%, the SD of $\theta$ increases accordingly, but the SDs of the other parameters, i.e., $\beta_1$, $\beta_2$, $\beta_3$ and $\Lambda_{01}$, $\Lambda_{02}$, $\Lambda_{03}$, do not show a clear trend. A larger value of $\theta$ requires a longer computation time.
Furthermore, Figure 1 and Figure 2 plot the cumulative baseline hazard functions $\Lambda_{01}(t)$ and $\Lambda_{03}(t)$ estimated by the proposed method in 20 replications (in red), together with the true cumulative baseline hazard functions $\Lambda_{01}(t) = t$ and $\Lambda_{03}(t) = 2t$ (in black), under the model with gamma frailty and sample size 500. Since the cumulative baseline hazard functions $\Lambda_{01}(t)$ and $\Lambda_{02}(t)$ have the same expression, we plot only the estimate of $\Lambda_{01}(t)$.
Scenario 2: We then independently simulate $n$ observations from the semi-competing risks models (2.4)–(2.6) with $\omega \sim \text{Log-normal}(0, \theta)$. As before, we consider one covariate $X$ with a standard normal distribution. The censoring times are generated from independent uniform distributions to yield censoring proportions of around 65% for the non-terminal event and around 30% for the terminal event. Let $\lambda_{01}(t_1) = 0.5$, $\lambda_{02}(t_2) = 1$, $\lambda_{03}(t_2) = 3$ and $\beta_1 = \beta_2 = 0.5$, $\beta_3 = 1$. We choose the true value of $\theta$ from $\Omega_\theta = \{0.1, 0.5, 1\}$ and the sample size $n$ from $\Omega_n = \{250, 500\}$.
Based on 500 replications, the average biases (Bias) of the estimated frailty and regression parameters, their empirical standard deviations (SD), and the average computation times (T) are summarized in Table 3. The SDs of $\beta_1$, $\beta_2$, $\beta_3$ and $\Lambda_{01}$, $\Lambda_{02}$, $\Lambda_{03}$ are relatively small, especially when the sample size is large. The SDs of $\theta$ are larger than those of the other parameters. The SD of $\theta$ increases with the true value of $\theta$ and decreases as the sample size increases. Compared with the model in Scenario 1, some parameters, such as the frailty parameter, are more difficult to estimate in the semi-competing risks model with log-normal frailty. As in Scenario 1, a larger value of $\theta$ requires a longer computation time.

5. Real Data Analysis

Colon cancer is a serious type of cancer with a high mortality rate. It is not easy to cure because, even if all the apparent diseased tissue can be surgically removed, residual tumor that is not observable may remain. Therefore, cancer recurrence is commonly observed in clinical trials, and patients who experience a recurrence are subject to a higher risk of mortality. Thus, it is crucial to find a therapy that can significantly reduce the cancer-recurrence rate. In the following illustration, our model is applied to the colon cancer data provided by [25]. A total of 929 patients with Stage C disease are included in the modeling. Among them, 304 patients received levamisole plus fluorouracil, which is the covariate whose effect will be tested. A total of 468 patients developed recurrence, and 414 of them died. Three states are recorded: State 1 means that the patient is alive and disease-free, State 2 means that the patient is alive but with a diagnosis of recurrent cancer, and State 3 denotes death. A general therapy for this type of cancer is the combination of levamisole and fluorouracil, denoted by Lev+5-FU [26]. Previous research has shown the effectiveness of this therapy in reducing the cancer-recurrence rate. We assess the significance of this covariate's effect on the state-transition hazard rates under our model framework.
As in the simulation study, we report the estimates of $\beta_1$, $\beta_2$, and $\beta_3$ along with the corresponding baseline hazards. The observations $(X_i, C_i, T_i)$ from the original dataset are resampled with replacement in a non-parametric way, and $\hat{\alpha}^{(k)} = (\hat{\theta}^{(k)}, \hat{\beta}_1^{(k)}, \hat{\beta}_2^{(k)}, \hat{\beta}_3^{(k)})$ denotes the parameter estimate from the $k$-th bootstrap sample. Let $\bar{\hat{\alpha}}$ be the mean of the estimated parameters over the $r$ resamples. The estimated standard error is
$$\mathrm{SE} = \Big( \frac{1}{r-1} \sum_{k=1}^{r} \big( \hat{\alpha}^{(k)} - \bar{\hat{\alpha}} \big)^{2} \Big)^{\frac{1}{2}}.$$
In addition, the 95% bootstrap CI in Table 4 is constructed from the 2.5% and 97.5% quantiles of $\{\hat{\alpha}^{(k)}\}_{k=1}^{r}$ with $r = 1000$. As shown in Table 4, treatment with Lev+5-FU significantly reduces the probability of recurrence. However, the therapy has little effect on mortality: both $\hat{\beta}_2$ and $\hat{\beta}_3$ have a p-value of around 0.4, which indicates that these two parameters are not statistically significant. The cumulative hazards for the different transitions are presented in Figure 3, which indicates that a patient who experiences a recurrence has a very high mortality rate in the following years. We also observe a high hazard rate for the recurrence of colon cancer, which shows the importance of therapies such as Lev+5-FU that can significantly reduce the occurrence of this event.
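
The bootstrap standard error and percentile interval described above can be sketched generically as follows. The sketch treats a single scalar component of the parameter vector and takes the estimator as an arbitrary callable; in the paper's setting this callable would be the full MM fit, which is not reproduced here, so the sample mean stands in as a hypothetical placeholder. The index rule used for the empirical quantiles is one simple convention and is an assumption.

```python
import math
import random
import statistics

def bootstrap_se_ci(data, estimator, r=1000, level=0.95, seed=0):
    """Non-parametric bootstrap: resample subjects with replacement r times,
    then return the SE (1/(r-1) divisor) and a percentile interval."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(r):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    se = statistics.stdev(estimates)              # sample standard deviation of the r estimates
    estimates.sort()
    lo_idx = math.floor((1.0 - level) / 2.0 * (r - 1))
    hi_idx = math.ceil((1.0 + level) / 2.0 * (r - 1))
    return se, (estimates[lo_idx], estimates[hi_idx])

# Placeholder example: the "estimator" is just the sample mean of made-up data.
rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
se, (lo, hi) = bootstrap_se_ci(data, statistics.fmean, r=1000)
print(f"SE = {se:.4f}, 95% percentile CI = ({lo:.4f}, {hi:.4f})")
```

For a vector-valued estimate such as $(\hat{\theta}, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3)$, the same computation would be applied to each component separately.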

6. Discussion

In this paper, we proposed an efficient MM algorithm for the non-parametric maximum likelihood estimation of the illness–death model with a gamma frailty. The work is motivated by features of the colon cancer data and by the high computational cost of parameter estimation in the existing method of [4]. The proposed MM method decomposes the high-dimensional optimization problem into a sum of univariate optimization problems through the construction of a simple surrogate function, and it gave accurate and efficient results in our simulations. The MM approach avoids matrix inversion and can serve as a toolkit for developing more efficient algorithms in a broad range of statistical optimization problems. The proposed MM algorithm can therefore help to estimate the parameters of the semi-competing risks model more efficiently.
For the gamma frailty considered in this paper, the marginal likelihood of the illness–death model can be derived in explicit form, so the iteration steps do not involve numerical integration. In general, however, an explicit form of the marginal likelihood of the illness–death model with shared frailty is not available. For example, if the illness–death model involves a log-normal frailty or correlated frailties, the intractable integral over the frailty leads to a more complicated marginal likelihood. Even in the case of gamma frailty, the complicated model setup with three covariates and three non-parametric baseline hazards causes difficulties in parameter estimation. The inclusion of intractable integrals under a general shared frailty leads to a much higher computational cost and lower estimation accuracy. Therefore, even though our method can be extended to handle the illness–death model with a general frailty, further modifications should be made to the algorithm to improve the estimation efficiency.

Author Contributions

Data curation, X.H., J.X., H.G. and W.Z.; Formal analysis, X.H., J.X. and H.G.; Investigation, X.H. and J.S.; Methodology, J.X. and J.S.; Project administration, J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fine, J.P.; Jiang, H.; Chappell, R. On semi-competing risks data. Biometrika 2001, 88, 907–919.
  2. Wang, W. Estimating the association parameter for copula models under dependent censoring. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2003, 65, 257–273.
  3. Lakhal, L.; Rivest, L.P.; Abdous, B. Estimating survival and association in a semicompeting risks model. Biometrics 2008, 64, 180–188.
  4. Xu, J.; Kalbfleisch, J.D.; Tai, B. Statistical analysis of illness–death processes and semicompeting risks data. Biometrics 2010, 66, 716–725.
  5. Zhang, Y.; Chen, M.H.; Ibrahim, J.G.; Zeng, D.; Chen, Q.; Pan, Z.; Xue, X. Bayesian gamma frailty models for survival data with semi-competing risks and treatment switching. Lifetime Data Anal. 2014, 20, 76–105.
  6. Lee, K.H.; Haneuse, S.; Schrag, D.; Dominici, F. Bayesian semiparametric analysis of semicompeting risks data: Investigating hospital readmission after a pancreatic cancer diagnosis. J. R. Stat. Soc. Ser. C (Appl. Stat.) 2015, 64, 253–273.
  7. Ha, I.D.; Xiang, L.; Peng, M.; Jeong, J.H.; Lee, Y. Frailty modelling approaches for semi-competing risks data. Lifetime Data Anal. 2020, 26, 109–133.
  8. Jiang, F.; Haneuse, S. A semi-parametric transformation frailty model for semi-competing risks survival data. Scand. J. Stat. 2017, 44, 112–129.
  9. Chen, Y.H. Maximum likelihood analysis of semicompeting risks data with semiparametric regression models. Lifetime Data Anal. 2012, 18, 36–57.
  10. Becker, M.P.; Yang, I.; Lange, K. EM algorithms without missing data. Stat. Methods Med Res. 1997, 6, 38–54.
  11. Wu, B.H.; Michimae, H.; Emura, T. Meta-analysis of individual patient data with semi-competing risks under the Weibull joint frailty–copula model. Comput. Stat. 2020, 35, 1525–1552.
  12. Zeng, D.; Lin, D. Semiparametric transformation models with random effects for joint analysis of recurrent and terminal events. Biometrics 2009, 65, 746–752.
  13. Zhou, R.; Zhu, H.; Bondy, M.; Ning, J. Semiparametric model for semi-competing risks data with application to breast cancer study. Lifetime Data Anal. 2016, 22, 456–471.
  14. Hsieh, J.J.; Huang, Y.T. Regression analysis based on conditional likelihood approach under semi-competing risks data. Lifetime Data Anal. 2012, 18, 302–320.
  15. Peng, L.; Fine, J.P. Regression modeling of semicompeting risks data. Biometrics 2007, 63, 96–108.
  16. Alvares, D.; Haneuse, S.; Lee, C.; Lee, K.H. SemiCompRisks: An R package for the analysis of independent and cluster-correlated semi-competing risks data. R J. 2019, 11, 376.
  17. Chapple, A.G.; Vannucci, M.; Thall, P.F.; Lin, S. Bayesian variable selection for a semi-competing risks model with three hazard functions. Comput. Stat. Data Anal. 2017, 112, 170–185.
  18. Reeder, H.T.; Lu, J.; Haneuse, S. Penalized estimation of frailty-based illness-death models for semi-competing risks. arXiv 2022, arXiv:2202.00618.
  19. Peng, M.; Xiang, L.; Wang, S. Semiparametric regression analysis of clustered survival data with semi-competing risks. Comput. Stat. Data Anal. 2018, 124, 53–70.
  20. Hunter, D.R.; Lange, K. Computing estimates in the proportional odds model. Ann. Inst. Stat. Math. 2002, 54, 155–168.
  21. Huang, X.; Xu, J.; Tian, G. On profile MM algorithms for gamma frailty survival models. Stat. Sin. 2019, 29, 895–916.
  22. Hunter, D.R.; Lange, K. Quantile regression via an MM algorithm. J. Comput. Graph. Stat. 2000, 9, 60–77.
  23. Johansen, S. An extension of Cox’s regression model. Int. Stat. Rev. Int. Stat. 1983, 51, 165–174.
  24. Klein, J.P. Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics 1992, 48, 795–806.
  25. Moertel, C.G.; Fleming, T.R.; Macdonald, J.S.; Haller, D.G.; Laurie, J.A.; Goodman, P.J.; Ungerleider, J.S.; Emerson, W.A.; Tormey, D.C.; Glick, J.H.; et al. Levamisole and fluorouracil for adjuvant therapy of resected colon carcinoma. N. Engl. J. Med. 1990, 322, 352–358.
  26. Laurie, J.A.; Moertel, C.G.; Fleming, T.R.; Wieand, H.S.; Leigh, J.E.; Rubin, J.; McCormack, G.W.; Gerstner, J.B.; Krook, J.E.; Malliard, J. Surgical adjuvant therapy of large-bowel carcinoma: An evaluation of levamisole and the combination of levamisole and fluorouracil. The North Central Cancer Treatment Group and the Mayo Clinic. J. Clin. Oncol. 1989, 7, 1447–1456.
Figure 1. Estimated cumulative baseline hazard function $\Lambda_{01}(t)$ from the proposed method over 20 replications (red), together with the true cumulative baseline hazard function $\Lambda_{01}(t) = t$ (black), under the model with gamma frailty in Scenario 1 with sample size 500.
Figure 2. Estimated cumulative baseline hazard function $\Lambda_{03}(t)$ from the proposed method over 20 replications (red), together with the true cumulative baseline hazard function $\Lambda_{03}(t) = 2t$ (black), under the model with gamma frailty in Scenario 1 with sample size 500.
Figure 3. Cumulative hazard for different transitions in terms of days.
Table 1. Simulation results for the illness–death model with gamma frailty at a 30% censoring proportion.

| True $\theta$ | Parameter | Bias (n = 250) | SD (n = 250) | Bias (n = 500) | SD (n = 500) |
|---|---|---|---|---|---|
| 0.5 | | T = 0.1957 | | T = 0.6736 | |
| | $\theta$ | −0.088 | 0.128 | −0.072 | 0.091 |
| | $\beta_1$ | 0.045 | 0.173 | −0.032 | 0.066 |
| | $\beta_2$ | 0.032 | 0.196 | −0.021 | 0.077 |
| | $\beta_3$ | 0.044 | 0.118 | −0.023 | 0.087 |
| | $\Lambda_{01}(1)$ | −0.036 | 0.080 | −0.024 | 0.063 |
| | $\Lambda_{02}(1)$ | 0.023 | 0.061 | −0.013 | 0.044 |
| | $\Lambda_{03}(1)$ | 0.045 | 0.302 | −0.038 | 0.227 |
| 1 | | T = 0.2576 | | T = 0.8667 | |
| | $\theta$ | 0.098 | 0.173 | 0.088 | 0.135 |
| | $\beta_1$ | −0.058 | 0.096 | −0.032 | 0.089 |
| | $\beta_2$ | 0.045 | 0.094 | 0.025 | 0.073 |
| | $\beta_3$ | −0.043 | 0.133 | −0.022 | 0.084 |
| | $\Lambda_{01}(1)$ | −0.031 | 0.067 | −0.022 | 0.045 |
| | $\Lambda_{02}(1)$ | 0.029 | 0.058 | −0.021 | 0.034 |
| | $\Lambda_{03}(1)$ | 0.051 | 0.336 | 0.031 | 0.284 |
| 2 | | T = 0.3157 | | T = 1.1273 | |
| | $\theta$ | 0.101 | 0.184 | 0.098 | 0.171 |
| | $\beta_1$ | 0.075 | 0.097 | −0.045 | 0.073 |
| | $\beta_2$ | 0.056 | 0.107 | −0.042 | 0.069 |
| | $\beta_3$ | 0.061 | 0.137 | 0.051 | 0.086 |
| | $\Lambda_{01}(1)$ | −0.033 | 0.062 | −0.029 | 0.05 |
| | $\Lambda_{02}(1)$ | −0.028 | 0.055 | −0.024 | 0.04 |
| | $\Lambda_{03}(1)$ | −0.054 | 0.341 | −0.045 | 0.246 |
Table 2. Simulation results for the illness–death model with gamma frailty at a 50% censoring proportion.

| True $\theta$ | Parameter | Bias (n = 250) | SD (n = 250) | Bias (n = 500) | SD (n = 500) |
|---|---|---|---|---|---|
| 0.5 | | T = 0.2250 | | T = 0.7070 | |
| | $\theta$ | −0.148 | 0.132 | −0.114 | 0.104 |
| | $\beta_1$ | −0.029 | 0.107 | −0.024 | 0.078 |
| | $\beta_2$ | −0.036 | 0.116 | −0.031 | 0.083 |
| | $\beta_3$ | −0.016 | 0.175 | −0.014 | 0.113 |
| | $\Lambda_{01}(1)$ | −0.031 | 0.08 | 0.029 | 0.055 |
| | $\Lambda_{02}(1)$ | 0.021 | 0.055 | 0.022 | 0.039 |
| | $\Lambda_{03}(1)$ | 0.053 | 0.232 | −0.048 | 0.173 |
| 1 | | T = 0.2966 | | T = 1.0153 | |
| | $\theta$ | −0.161 | 0.176 | −0.143 | 0.147 |
| | $\beta_1$ | −0.032 | 0.120 | −0.028 | 0.083 |
| | $\beta_2$ | 0.033 | 0.129 | −0.027 | 0.088 |
| | $\beta_3$ | 0.02 | 0.187 | 0.018 | 0.126 |
| | $\Lambda_{01}(1)$ | −0.037 | 0.066 | −0.034 | 0.048 |
| | $\Lambda_{02}(1)$ | 0.028 | 0.051 | −0.025 | 0.035 |
| | $\Lambda_{03}(1)$ | 0.061 | 0.297 | −0.055 | 0.201 |
| 2 | | T = 0.3883 | | T = 1.2907 | |
| | $\theta$ | 0.173 | 0.194 | −0.161 | 0.172 |
| | $\beta_1$ | 0.041 | 0.119 | −0.034 | 0.083 |
| | $\beta_2$ | −0.036 | 0.119 | −0.031 | 0.086 |
| | $\beta_3$ | 0.025 | 0.184 | 0.022 | 0.138 |
| | $\Lambda_{01}(1)$ | −0.041 | 0.063 | −0.038 | 0.044 |
| | $\Lambda_{02}(1)$ | −0.032 | 0.047 | −0.029 | 0.038 |
| | $\Lambda_{03}(1)$ | −0.068 | 0.293 | −0.060 | 0.232 |
Table 3. Simulation results for the illness–death model with log-normal frailty.

| True $\theta$ | Parameter | Bias (n = 250) | SD (n = 250) | Bias (n = 500) | SD (n = 500) |
|---|---|---|---|---|---|
| 0.1 | | T = 0.4923 | | T = 1.3003 | |
| | $\theta$ | −0.139 | 0.106 | −0.131 | 0.098 |
| | $\beta_1$ | 0.030 | 0.114 | 0.030 | 0.088 |
| | $\beta_2$ | 0.035 | 0.084 | 0.024 | 0.064 |
| | $\beta_3$ | 0.075 | 0.143 | 0.075 | 0.120 |
| | $\Lambda_{01}(1)$ | 0.004 | 0.067 | 0.015 | 0.054 |
| | $\Lambda_{02}(1)$ | −0.104 | 0.084 | −0.112 | 0.066 |
| | $\Lambda_{03}(1)$ | −0.050 | 0.489 | −0.043 | 0.350 |
| 0.5 | | T = 0.9377 | | T = 2.6653 | |
| | $\theta$ | −0.163 | 0.147 | −0.159 | 0.144 |
| | $\beta_1$ | −0.050 | 0.110 | −0.038 | 0.091 |
| | $\beta_2$ | −0.049 | 0.076 | −0.026 | 0.062 |
| | $\beta_3$ | −0.157 | 0.142 | −0.101 | 0.107 |
| | $\Lambda_{01}(1)$ | −0.028 | 0.058 | −0.024 | 0.050 |
| | $\Lambda_{02}(1)$ | −0.155 | 0.071 | −0.151 | 0.054 |
| | $\Lambda_{03}(1)$ | 0.262 | 0.569 | 0.262 | 0.382 |
| 1 | | T = 1.2817 | | T = 3.6046 | |
| | $\theta$ | −0.178 | 0.145 | −0.160 | 0.145 |
| | $\beta_1$ | −0.072 | 0.109 | −0.064 | 0.088 |
| | $\beta_2$ | −0.065 | 0.081 | −0.063 | 0.064 |
| | $\beta_3$ | −0.225 | 0.136 | −0.179 | 0.105 |
| | $\Lambda_{01}(1)$ | −0.072 | 0.060 | −0.050 | 0.046 |
| | $\Lambda_{02}(1)$ | −0.227 | 0.072 | −0.166 | 0.056 |
| | $\Lambda_{03}(1)$ | 0.619 | 0.694 | 0.484 | 0.514 |
Table 4. Fitting results for the colon cancer data.

| Parameter | Est. | SE | p-Value | 95% Bootstrap CI |
|---|---|---|---|---|
| $\theta$ | 71.56 | 3.50 | <0.001 | [66.97, 75.62] |
| $\beta_1$ | −0.735 | 0.180 | <0.001 | [−1.038, −0.473] |
| $\beta_2$ | −0.036 | 0.373 | 0.397 | [−0.696, 0.532] |
| $\beta_3$ | 0.054 | 0.167 | 0.378 | [−0.220, 0.313] |

Huang, X.; Xu, J.; Guo, H.; Shi, J.; Zhao, W. An MM Algorithm for the Frailty-Based Illness Death Model with Semi-Competing Risks Data. Mathematics 2022, 10, 3702. https://doi.org/10.3390/math10193702
