Article

Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data

1 School of Mathematics, Yunnan Normal University, Kunming 650092, China
2 Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(4), 538; https://doi.org/10.3390/math10040538
Submission received: 8 January 2022 / Revised: 30 January 2022 / Accepted: 31 January 2022 / Published: 9 February 2022
(This article belongs to the Special Issue Recent Advances in Computational Statistics)

Abstract

Motivated by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data, we aim to integrate important biomarkers for the early detection of the progression from Mild Cognitive Impairment (MCI) to Alzheimer’s disease (AD), since therapeutic intervention is most likely to be beneficial in the early stages of disease progression. Developing predictors of the MCI-to-AD transition involves genotype variables, so the dimension of the predictors increases as the sample becomes large. We therefore exploit sparsity of the coefficients in a high-dimensional regression model for clustered failure time data such as ADNI, which enhances predictive performance and facilitates the model’s interpretability. In this study, we first propose two MM algorithms (profile and non-profile) for the shared frailty survival model and then extend the two proposed MM algorithms to regularized estimation in a sparse high-dimensional regression model. The convergence properties of our proposed estimators are also established. Furthermore, simulation studies and an analysis of the ADNI data illustrate the proposed methods.

1. Introduction

In biomedical research, we often encounter clustered failure time data in which individuals from the same cluster (e.g., family) share common genetic and/or environmental factors. An illustrative example comes from the Alzheimer’s Disease Neuroimaging Initiative study where each participant may develop Mild Cognitive Impairment (MCI) and/or develop Alzheimer’s disease (AD). The times to these two events for each individual are expected to be highly associated.
In order to account for the correlation of associated failure times, shared frailty or random effect models are commonly used by researchers ([1,2,3,4]). In particular, the gamma frailty model ([5,6,7,8]) is numerically convenient for such analysis since the likelihood function exhibits a closed form. For example, ref. [2] proposed a frailty model where individuals within the same cluster share a common random effect. Refs. [9,10,11] provided a good and comprehensive review of the applications of different frailty models for clustered survival data. The authors of [12] compared several commonly used methods and demonstrated the advantages of the gamma frailty model. Refs. [13,14] also explored the nonparametric frailty model and the between-within frailty model for correlated survival data.
For high-dimension regression analysis, an important and useful strategy is to exploit sparsity and assume that the true parameters lie in a low-dimensional subspace. In the past years, sparsity-restricted estimation has attracted a great deal of attention in high-dimensional regression models. That is, only a few regression coefficients are assumed to be nonzero ([15,16]). In addition to the widely used LASSO penalty, there exists other types of regularization methods such as ridge regression ([17]), bridge regression ([18]), and elastic net ([19]). In order to improve the performance of the LASSO, many modifications such as the adaptive Lasso, the smoothly clipped absolute deviation (SCAD, [20]), and minimax concave penalty (MCP, [21]) have also been proposed.
In this paper, we develop fast and efficient algorithms for regularized estimation in the general frailty model for clustered failure time data. As the model parameters consist of both the regression coefficients and the unknown nonparametric baseline hazard function, computation in the frailty model with survival data is usually intensive, especially when the frailty does not lead to a closed-form likelihood function. Regularized estimation poses even more challenges and further increases the computational burden. Existing approaches for the general frailty model mostly rely on the EM algorithm, which adopts Newton’s method in its maximization step; this involves matrix inversion and may perform poorly in a high-dimensional environment. The minorization-maximization (MM) algorithm possesses an ascent property, which drives the target likelihood function to increase, and reliably converges to the maximum from well-chosen initial values ([22,23,24]). In particular, ref. [25] proposed three different MM algorithms for the gamma frailty model and further demonstrated their utility for estimation in high-dimensional situations by decomposing a high-dimensional objective function into separable low-dimensional functions. In this paper, we first develop a pair of MM algorithms, a profile and a non-profile version, for the general frailty model. As demonstrated numerically by [25], this decomposition meshes well with regularized estimation in sparse high-dimensional settings. For illustration, we chose concave penalties, namely the smoothly clipped absolute deviation penalty (SCAD, [20]) and the minimax concave penalty (MCP, [21]), to explore sparsity because both possess the desirable property of unbiasedness.
The rest of the paper is organized as follows. In Section 2, we provide the model description and an overview of the MM principle and then propose a pair of MM algorithms. In Section 3, we derive regularized estimation methods via the profile and non-profile MM algorithms in sparse high-dimensional regression settings. The convergence properties of the proposed algorithms are provided in Section 4. In Section 5, a series of simulation studies is conducted to assess the finite-sample performance of the proposed methods. Section 6 provides a real application to the ADNI data.

2. The Model and Estimation

2.1. The Model

Consider datasets from some population that contain $M_i \ge 1$ individuals in the $i$-th subgroup of the population, $i = 1, \ldots, B$. Individuals within the $i$-th subgroup have dependent event times due to some unobserved covariate information summarized in a frailty, $\omega_i$. Let $Y_{ij}$ be the event time, let $C_{ij}$ be the censoring time, and let $X_{ij} = (X_{ij1}, \ldots, X_{ijp})^{\top}$ denote the potential covariates for the $j$-th individual in the $i$-th subgroup. The censoring time $C_{ij}$ is assumed to be independent of $Y_{ij}$, given the covariates $X_{ij}$ and frailty $\omega_i$. Define $t_{ij} = \min(Y_{ij}, C_{ij})$ and $\delta_{ij} = I(Y_{ij} \le C_{ij})$, where $I(\cdot)$ denotes the indicator function. Suppose that censorship is noninformative. Conditional on the frailty $\omega_i$, the hazard rate function is of the following form:
$$\lambda(t_{ij} \mid X_{ij}, \omega_i) = \omega_i \lambda_0(t_{ij}) \exp\{X_{ij}^{\top}\beta\},$$
where $\lambda_0(\cdot)$ is an arbitrary baseline hazard rate, and $\beta$ is a vector of unknown parameters. Assume that $\omega_i$, $i = 1, \ldots, B$, are independent and identically distributed with density function $f(\omega_i \mid \theta)$ on the domain $\mathcal{W}$. Denote $\alpha = (\theta, \beta, \Lambda_0)$; we then propose profile MM and non-profile MM methods to estimate the parameters based on the minorization-maximization (MM) principle.

2.2. An Overview of MM Principle

Before introducing the two proposed methods, let us review the MM principle. Assume that $Y_{\mathrm{obs}}$ is the observed data, $\ell(\alpha \mid Y_{\mathrm{obs}})$ is the log-likelihood function with unknown parameter $\alpha = (\alpha_1, \ldots, \alpha_q)^{\top}$, and the maximum likelihood estimate of $\alpha$ is $\hat{\alpha} = \operatorname{argmax} \ell(\alpha \mid Y_{\mathrm{obs}})$. The MM principle involves two M steps: one is a minorization step, and the other is a maximization step. In a maximization problem, the first step is minorization, which aims to construct a minorizing/surrogate function for the objective log-likelihood function $\ell(\alpha \mid Y_{\mathrm{obs}})$ through a series of inequalities satisfying the following two conditions:
$$Q(\alpha \mid \alpha^{(k)}) \le \ell(\alpha \mid Y_{\mathrm{obs}}), \qquad Q(\alpha^{(k)} \mid \alpha^{(k)}) = \ell(\alpha^{(k)} \mid Y_{\mathrm{obs}}),$$
where $\alpha^{(k)}$ is the $k$-th approximation of $\hat{\alpha}$. Once the minorizing function $Q(\alpha \mid \alpha^{(k)})$ is successfully constructed for the objective function $\ell(\alpha \mid Y_{\mathrm{obs}})$, the maximization step maximizes the surrogate $Q(\alpha \mid \alpha^{(k)})$, rather than the objective function itself, to obtain the $(k+1)$-th approximation of $\hat{\alpha}$, i.e.,
$$\alpha^{(k+1)} = \operatorname{argmax}\; Q(\alpha \mid \alpha^{(k)}).$$
By the MM principle, we have
$$\ell(\alpha^{(k+1)} \mid Y_{\mathrm{obs}}) \ge Q(\alpha^{(k+1)} \mid \alpha^{(k)}) \ge Q(\alpha^{(k)} \mid \alpha^{(k)}) = \ell(\alpha^{(k)} \mid Y_{\mathrm{obs}}),$$
so the values of the objective function continue to increase until convergence.
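To make the ascent chain above concrete, here is a toy MM illustration outside the frailty setting (all names and data below are hypothetical, not from the paper): the Laplace log-likelihood $\ell(\mu) = -\sum_i |x_i - \mu|$ is minorized by a quadratic surrogate via the inequality $-|r| \ge -(r^2/|r^{(k)}| + |r^{(k)}|)/2$, which is tight at $r = r^{(k)}$; each surrogate maximization is a closed-form weighted mean.

```python
import numpy as np

def mm_median(x, mu0=0.0, n_iter=100):
    """Toy MM illustration (not the paper's model): maximize
    l(mu) = -sum_i |x_i - mu|, a Laplace log-likelihood up to constants,
    using the quadratic minorizer -|r| >= -(r**2 / |r_k| + |r_k|) / 2,
    which is tight at r = r_k."""
    mu, objs = mu0, []
    for _ in range(n_iter):
        # minorization step: weights of the quadratic surrogate at the current mu
        w = 1.0 / np.maximum(np.abs(x - mu), 1e-10)
        # maximization step: the quadratic surrogate is maximized by a weighted mean
        mu = np.sum(w * x) / np.sum(w)
        objs.append(-np.sum(np.abs(x - mu)))  # objective after the update
    return mu, objs

x = np.array([0.5, 1.2, 3.0, 4.1, 10.0])
mu_hat, objs = mm_median(x)
# MM ascent property: the objective never decreases across iterations
assert all(b >= a - 1e-9 for a, b in zip(objs, objs[1:]))
```

The iterates converge to the sample median (here 3.0), and the recorded objective values are nondecreasing, mirroring the displayed ascent chain.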

2.3. Profile MM Estimation Procedure

For general shared frailty models, we can write the observed log-likelihood function $\ell(\alpha \mid Y_{\mathrm{obs}})$ as follows:
$$\ell(\alpha \mid Y_{\mathrm{obs}}) = \sum_{i=1}^{B} \log \int_{\mathcal{W}} \tau_i(\omega_i \mid \alpha)\, d\omega_i, \tag{3}$$
where the following is the case.
$$\tau_i(\omega_i \mid \alpha) = f(\omega_i \mid \theta) \prod_{j=1}^{M_i} \left[ \lambda_0(t_{ij})\, \omega_i \exp(X_{ij}^{\top}\beta) \right]^{\delta_{ij}} \exp\left\{ -\Lambda_0(t_{ij})\, \omega_i \exp(X_{ij}^{\top}\beta) \right\}.$$
Generally, the Laplace transform of the frailty distribution is theoretically intractable, so an explicit form of the marginal hazard is not available. This, however, does not prevent us from developing a profile MM approach for estimation in the general shared frailty model. Define
$$v_i(\omega_i \mid \alpha^{(k)}) = \frac{\tau_i(\omega_i \mid \alpha^{(k)})}{\int_{\mathcal{W}} \tau_i(\omega_i \mid \alpha^{(k)})\, d\omega_i},$$
and rewrite the objective function as
$$\ell(\alpha \mid Y_{\mathrm{obs}}) = \sum_{i=1}^{B} \log \int_{\mathcal{W}} \frac{\tau_i(\omega_i \mid \alpha)}{v_i(\omega_i \mid \alpha^{(k)})} \cdot v_i(\omega_i \mid \alpha^{(k)})\, d\omega_i. \tag{4}$$
By Jensen’s inequality, we have
$$\varphi\!\left( \int_{\mathcal{X}} h(x)\, g(x)\, dx \right) \ge \int_{\mathcal{X}} \varphi(h(x))\, g(x)\, dx,$$
where $\mathcal{X}$ is a subset of the real line $\mathbb{R}$, $\varphi(\cdot)$ is a concave function, $h(\cdot)$ is an arbitrary real-valued function defined on $\mathcal{X}$, and $g(\cdot)$ is a density function defined on $\mathcal{X}$. In Equation (4), $v_i(\omega_i \mid \alpha^{(k)})$ is a density function; choosing $h$ as $\tau_i(\omega_i \mid \alpha)/v_i(\omega_i \mid \alpha^{(k)})$ and $\varphi = \log$, we can apply the above Jensen’s inequality and construct the following surrogate function for $\ell(\alpha \mid Y_{\mathrm{obs}})$:
$$Q_1(\alpha \mid \alpha^{(k)}) = Q_{11}(\theta \mid \alpha^{(k)}) + Q_{12}(\beta, \Lambda_0 \mid \alpha^{(k)}),$$
where
$$Q_{11}(\theta \mid \alpha^{(k)}) = \sum_{i=1}^{B} \int_{\mathcal{W}} \log[f(\omega_i \mid \theta)] \cdot v_i(\omega_i \mid \alpha^{(k)})\, d\omega_i, \tag{6}$$
which involves the parameter $\theta$ only. Moreover, we have
$$Q_{12}(\beta, \Lambda_0 \mid \alpha^{(k)}) = \sum_{i=1}^{B} \sum_{j=1}^{M_i} \left[ \delta_{ij} \log(\lambda_0(t_{ij})) + \delta_{ij} X_{ij}^{\top}\beta - A_i^{(k)} \Lambda_0(t_{ij}) \exp(X_{ij}^{\top}\beta) \right], \tag{7}$$
where $A_i^{(k)} = \int_{\mathcal{W}} \omega_i \cdot v_i(\omega_i \mid \alpha^{(k)})\, d\omega_i$, $i = 1, \ldots, B$. The minorizing function $Q_1(\alpha \mid \alpha^{(k)})$ separates the parameters $\theta$ and $(\beta, \Lambda_0)$ into (6) and (7), respectively. In the second M-step, updating $\theta$ involves maximizing (6) numerically. Due to the presence of the nonparametric component $\Lambda_0$, updating $(\beta, \Lambda_0)$ remains a big challenge. Following [7], we apply the profile estimation method to profile out $\Lambda_0$ in $Q_{12}(\beta, \Lambda_0 \mid \alpha^{(k)})$, which yields the estimate of $\Lambda_0$ given $\beta$:
$$d\hat{\Lambda}_0(t_{ij}) = \frac{\delta_{ij}}{\sum_{r=1}^{B} \sum_{s=1}^{M_r} I(t_{rs} \ge t_{ij})\, A_r^{(k)} \exp(X_{rs}^{\top}\beta)}. \tag{8}$$
Substituting (8) into $Q_{12}(\beta, \Lambda_0 \mid \alpha^{(k)})$, we obtain
$$Q_{13}(\beta \mid \alpha^{(k)}) = \sum_{i=1}^{B} \sum_{j=1}^{M_i} \left[ \delta_{ij} X_{ij}^{\top}\beta - \delta_{ij} \log\!\left( \sum_{r=1}^{B} \sum_{s=1}^{M_r} I(t_{rs} \ge t_{ij})\, A_r^{(k)} \exp(X_{rs}^{\top}\beta) \right) \right], \tag{9}$$
which involves $\beta$ only. To update $\beta$ by maximizing (9) directly, Newton’s method and large matrix inversion are required when there is a large number of covariates. Here, we further construct minorizing functions for $Q_{13}(\beta \mid \alpha^{(k)})$ to separate the regression parameters $\beta_1, \ldots, \beta_q$ from each other under the MM principle. We first use the supporting hyperplane inequality
$$-\log(x) \ge -\log(x_0) - \frac{x - x_0}{x_0}$$
to minorize $Q_{13}(\beta \mid \alpha^{(k)})$; we then have the following surrogate function:
$$Q_{14}(\beta \mid \alpha^{(k)}) = \sum_{i=1}^{B} \sum_{j=1}^{M_i} \left[ \delta_{ij} X_{ij}^{\top}\beta - \delta_{ij} \frac{\sum_{r=1}^{B} \sum_{s=1}^{M_r} I(t_{rs} \ge t_{ij})\, A_r^{(k)} \exp(X_{rs}^{\top}\beta)}{\sum_{r=1}^{B} \sum_{s=1}^{M_r} I(t_{rs} \ge t_{ij})\, A_r^{(k)} \exp(X_{rs}^{\top}\beta^{(k)})} \right] + c,$$
where $c$ is a constant not depending on $\beta$. We apply Jensen’s inequality to the concave function $-\exp(\cdot)$ in $Q_{14}(\beta \mid \alpha^{(k)})$ by rewriting
$$X_{rs}^{\top}\beta = \sum_{p=1}^{q} \pi_{prs} \left[ \pi_{prs}^{-1} X_{prs} (\beta_p - \beta_p^{(k)}) + X_{rs}^{\top}\beta^{(k)} \right],$$
where $\pi_{prs} = |X_{prs}| / \sum_{p=1}^{q} |X_{prs}|$. Finally, the minorizing function for $Q_{14}(\beta \mid \alpha^{(k)})$ is
$$Q_{15}(\beta_1, \ldots, \beta_q \mid \alpha^{(k)}) \mathrel{\hat{=}} \sum_{p=1}^{q} Q_{15p}(\beta_p \mid \alpha^{(k)}),$$
where the following is the case.
$$Q_{15p}(\beta_p \mid \alpha^{(k)}) = \sum_{i=1}^{B} \sum_{j=1}^{M_i} \left\{ \delta_{ij} X_{pij} \beta_p - \delta_{ij} \frac{\sum_{r=1}^{B} \sum_{s=1}^{M_r} I(t_{rs} \ge t_{ij})\, A_r^{(k)} \pi_{prs} \exp\!\left[ \pi_{prs}^{-1} X_{prs}(\beta_p - \beta_p^{(k)}) + X_{rs}^{\top}\beta^{(k)} \right]}{\sum_{r=1}^{B} \sum_{s=1}^{M_r} I(t_{rs} \ge t_{ij})\, A_r^{(k)} \exp(X_{rs}^{\top}\beta^{(k)})} \right\}, \quad p = 1, \ldots, q. \tag{11}$$
In summary, the minorizing function for the objective log-likelihood function $\ell(\alpha \mid Y_{\mathrm{obs}})$ in (3) has the structure
$$Q_{\mathrm{pro}}(\theta, \beta \mid \alpha^{(k)}) = Q_{11}(\theta \mid \alpha^{(k)}) + \sum_{p=1}^{q} Q_{15p}(\beta_p \mid \alpha^{(k)}), \tag{12}$$
with an explicit-form update of $d\Lambda_0$ by (8). Observe from (12) that the objective function to be maximized is decomposed into a sum of $q + 1$ univariate functions, since $Q_{11}(\theta \mid \alpha^{(k)})$ usually involves only one parameter. The next maximization step of this MM algorithm therefore involves only $q + 1$ separate univariate optimizations, and matrix inversion is unnecessary. Note that the success of the above profile MM algorithm requires the convergence of two improper integrals, namely $\int_{\mathcal{W}} \omega_i \cdot v_i(\omega_i \mid \alpha^{(k)})\, d\omega_i$ and $\int_{\mathcal{W}} \log[f(\omega_i \mid \theta)] \cdot v_i(\omega_i \mid \alpha^{(k)})\, d\omega_i$. Moreover, these integrals clearly converge when the distribution of the random effect belongs to an exponential family. The profile MM estimation procedure is summarized as follows:
Step 1.
Given initial values for θ , β , and Λ 0 ;
Step 2.
Update θ via (6). Update each β p via (11) for p = 1 , , q ;
Step 3.
Based on the update of β , compute the estimates of Λ 0 ( t i j ) via (8);
Step 4.
Iterate steps 2 and 3 until convergence.
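As a rough sketch of Step 3 (not the authors’ code), the explicit update (8) can be computed with arrays flattened over all (i, j) pairs; the function name and data layout below are hypothetical:

```python
import numpy as np

def update_breslow(t, delta, eta, A):
    """Sketch of the baseline-hazard update in Step 3 (Equation (8)):
    dLambda0(t_ij) = delta_ij / sum_{r,s} I(t_rs >= t_ij) * A_r * exp(X_rs' beta).
    All arrays are flattened over the (i, j) pairs; `eta` holds X_ij' beta and
    `A` holds the cluster weight A_i^(k) repeated for each cluster member
    (a hypothetical data layout, not prescribed by the paper)."""
    w = A * np.exp(eta)                              # per-subject risk weights
    # weighted risk-set denominator at every observed time t_ij
    denom = np.array([(w * (t >= s)).sum() for s in t])
    return delta / denom                             # hazard increments dLambda0(t_ij)

# toy check: no covariates (eta = 0) and unit weights reduce (8) to
# the Nelson-Aalen increments 1/3, 1/2, 1 for three distinct failure times
t = np.array([1.0, 2.0, 3.0])
delta = np.array([1.0, 1.0, 1.0])
d = update_breslow(t, delta, np.zeros(3), np.ones(3))
```

With frailty weights $A_r^{(k)} \equiv 1$ this is exactly the Breslow-type estimator, which is why the toy check recovers the Nelson–Aalen increments.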

2.4. Non-Profile MM Estimation Procedure

In this subsection, we bypass the profile estimation procedure of the previous subsection and develop a new MM procedure for Equation (7) that separates the parameter $\beta$ from the nuisance baseline hazard $\Lambda_0$. Separating $\beta$ and $\Lambda_0$ in (7) amounts to dealing with the last term, $\Lambda_0(t_{ij}) \exp(X_{ij}^{\top}\beta)$. As in [26], we use the following arithmetic-geometric mean inequality, where the $x_i$ and $a_i$ are non-negative:
$$\prod_{i=1}^{n} x_i^{a_i} \le \sum_{i=1}^{n} \frac{a_i}{\|a\|_1}\, x_i^{\|a\|_1}. \tag{13}$$
Choosing $x_1 = \Lambda_0(t_{ij}) / \Lambda_0^{(k)}(t_{ij})$ and $x_2 = \exp(X_{ij}^{\top}\beta) / \exp(X_{ij}^{\top}\beta^{(k)})$ with $a_1 = a_2 = 1$ in inequality (13), we obtain the following surrogate function for (7):
$$Q_2(\beta, \Lambda_0 \mid \alpha^{(k)}) = \sum_{i=1}^{B} \sum_{j=1}^{M_i} \left[ \delta_{ij} \log(\lambda_0(t_{ij})) + \delta_{ij} X_{ij}^{\top}\beta - \frac{A_i^{(k)} \exp(X_{ij}^{\top}\beta^{(k)})}{2 \Lambda_0^{(k)}(t_{ij})}\, \Lambda_0(t_{ij})^2 - \frac{A_i^{(k)} \Lambda_0^{(k)}(t_{ij})}{2 \exp(X_{ij}^{\top}\beta^{(k)})}\, \exp(2 X_{ij}^{\top}\beta) \right] \mathrel{\hat{=}} Q_{21}(\Lambda_0 \mid \alpha^{(k)}) + Q_{22}(\beta \mid \alpha^{(k)}), \tag{14}$$
where the following is the case.
$$Q_{21}(\Lambda_0 \mid \alpha^{(k)}) = \sum_{i=1}^{B} \sum_{j=1}^{M_i} \left[ \delta_{ij} \log(\lambda_0(t_{ij})) - \frac{A_i^{(k)} \exp(X_{ij}^{\top}\beta^{(k)})}{2 \Lambda_0^{(k)}(t_{ij})}\, \Lambda_0(t_{ij})^2 \right], \tag{15}$$
$$Q_{22}(\beta \mid \alpha^{(k)}) = \sum_{i=1}^{B} \sum_{j=1}^{M_i} \left[ \delta_{ij} X_{ij}^{\top}\beta - \frac{A_i^{(k)} \Lambda_0^{(k)}(t_{ij})}{2 \exp(X_{ij}^{\top}\beta^{(k)})}\, \exp(2 X_{ij}^{\top}\beta) \right]. \tag{16}$$
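The arithmetic-geometric mean inequality (13) that drives this separation is easy to check numerically; the snippet below is only a sanity check with arbitrary positive $x_i$ and weights $a_i$ (all names are illustrative):

```python
import numpy as np

def amgm_gap(x, a):
    """Gap (rhs - lhs) of inequality (13):
    prod_i x_i**a_i  <=  sum_i (a_i / ||a||_1) * x_i**||a||_1.
    The gap is non-negative for non-negative x and a, since this is the
    weighted AM-GM inequality applied to the values x_i**||a||_1."""
    norm_a = a.sum()  # ||a||_1 for non-negative a
    return np.sum((a / norm_a) * x ** norm_a) - np.prod(x ** a)

rng = np.random.default_rng(0)
gaps = [amgm_gap(rng.uniform(0.1, 5.0, 3), rng.uniform(0.1, 2.0, 3))
        for _ in range(1000)]
assert min(gaps) >= -1e-12  # the inequality holds on all random draws
```

Equality holds when all $x_i$ are equal, which is why each surrogate term touches (7) at the current iterate.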
In order to obtain the nonparametric estimate of $\Lambda_0$, the maximization of (15) is required. For ease of computation, a one-step-late technique is applied to the first-order derivative of Equation (15); we then obtain the estimate of $\lambda_0$ from
$$d\hat{\Lambda}_0(t_{ij}) = \frac{\delta_{ij}}{\sum_{r=1}^{B} \sum_{s=1}^{M_r} I(t_{rs} \ge t_{ij})\, A_r^{(k)} \exp(X_{rs}^{\top}\beta^{(k)})}, \tag{17}$$
which is the same as (8) in the profile estimation method, evaluated at $\beta^{(k)}$. To update $\beta$, a technique similar to that used for $Q_{14}(\beta \mid \alpha^{(k)})$ is applied. We use Jensen’s inequality on the concave function $-\exp(\cdot)$ in $Q_{22}(\beta \mid \alpha^{(k)})$ by rewriting
$$2 X_{ij}^{\top}\beta = \sum_{p=1}^{q} \pi_{pij} \left[ 2 \pi_{pij}^{-1} X_{pij} (\beta_p - \beta_p^{(k)}) + 2 X_{ij}^{\top}\beta^{(k)} \right], \tag{18}$$
where $\pi_{pij} = |X_{pij}| / \sum_{p=1}^{q} |X_{pij}|$.
In the end, the minorizing function for $Q_{22}(\beta \mid \alpha^{(k)})$ is
$$Q_{23}(\beta_1, \ldots, \beta_q \mid \alpha^{(k)}) \mathrel{\hat{=}} \sum_{p=1}^{q} Q_{23p}(\beta_p \mid \alpha^{(k)}),$$
where the following is obtained:
$$Q_{23p}(\beta_p \mid \alpha^{(k)}) = \sum_{i=1}^{B} \sum_{j=1}^{M_i} \left\{ \delta_{ij} X_{pij} \beta_p - \frac{\pi_{pij} A_i^{(k)} \Lambda_0^{(k)}(t_{ij}) \exp\!\left[ 2 \pi_{pij}^{-1} X_{pij} (\beta_p - \beta_p^{(k)}) + 2 X_{ij}^{\top}\beta^{(k)} \right]}{2 \exp(X_{ij}^{\top}\beta^{(k)})} \right\}, \tag{19}$$
for $p = 1, \ldots, q$. As a result, the surrogate function for the objective log-likelihood constructed via the non-profile MM principle is
$$Q_{\mathrm{nonpro}}(\theta, \beta \mid \alpha^{(k)}) = Q_{11}(\theta \mid \alpha^{(k)}) + \sum_{p=1}^{q} Q_{23p}(\beta_p \mid \alpha^{(k)}), \tag{20}$$
with an explicit-form update of $d\Lambda_0$ by (17). From (20), we find the same attractive features as in $Q_{\mathrm{pro}}(\theta, \beta \mid \alpha^{(k)})$: $Q_{\mathrm{nonpro}}(\theta, \beta \mid \alpha^{(k)})$ is a sum of $q + 1$ univariate functions, so the next maximization (second M) step involves only $q + 1$ simple univariate optimizations. It is worth noting that the parameter-separation feature of the proposed profile and non-profile MM algorithms allows them to incorporate existing simple off-the-shelf accelerators, which brings substantial savings in computation time, as discussed in [25]. The non-profile MM estimation procedure is summarized as follows:
Step 1.
Given initial values of θ , β , and Λ 0 ;
Step 2.
Update the estimate of θ via (6). Update the estimate of β p based on (19) for p = 1 , , q ;
Step 3.
Using the updated estimate of β , compute the estimates of Λ 0 ( t i j ) via (17);
Step 4.
Iterate steps 2 and 3 until convergence.

3. Regularized Estimation Methods via MM Methods

As shown in Section 2, both proposed estimation approaches create parameter-separated surrogate functions that can be combined naturally with regularization. In this section, we propose regularized estimation approaches based on the MM algorithms for regression analysis under the general shared frailty model. Many variable selection criteria arise as special cases of the general formulation discussed in [20], where the penalized likelihood function takes the form
$$\ell_P(\alpha \mid Y_{\mathrm{obs}}) = \ell(\theta, \beta, \Lambda_0 \mid Y_{\mathrm{obs}}) - N \sum_{p=1}^{q} P(|\beta_p|, \lambda), \tag{21}$$
where $\ell(\theta, \beta, \Lambda_0 \mid Y_{\mathrm{obs}})$ is the log-likelihood function for the shared frailty model, $q$ is the dimension of $\beta$, $P(\cdot, \lambda)$ is a given nonnegative penalty function, and $\lambda \ge 0$ is a tuning parameter, which may be replaced by coefficient-specific tuning parameters $\lambda_p$ in more general settings. The penalty shrinks some of the coefficients to zero. For general frailty models, the computation of MLEs is complicated and numerically delicate because the parameters of interest involve three blocks, $\theta$, $\beta$, and $\Lambda_0$, and it becomes even harder when there is a large number of coefficients. It is worth mentioning that the proposed profile and non-profile MM algorithms decompose the coefficient vector $\beta$ from the other two blocks $\theta$ and $\Lambda_0$, and the individual coefficients are separated from each other. This feature meshes well with the regularization problem in (21) and produces sparser and more accurate estimates. Under the profile MM algorithmic technique in (12), we obtain the corresponding minorization function for (21) as follows.
$$Q_{\mathrm{pro}}(\theta, \beta \mid \alpha^{(k)}) - N \sum_{p=1}^{q} P(|\beta_p|, \lambda). \tag{22}$$
Under the non-profile MM algorithmic technique in (20), the minorization function for (21) is as follows.
$$Q_{\mathrm{nonpro}}(\theta, \beta \mid \alpha^{(k)}) - N \sum_{p=1}^{q} P(|\beta_p|, \lambda). \tag{23}$$
When $P(\cdot, \lambda)$ is piecewise differentiable, nondecreasing, and concave on $(0, \infty)$, as are the $L_1$, MCP, and SCAD penalties, ref. [27] explored a connection between the local quadratic approximation and the MM algorithm. The penalty term $P(\cdot, \lambda)$ can be majorized by a local quadratic approximation as follows:
$$P(|\beta_p|, \lambda) \le P(|\beta_p^{(k)}|, \lambda) + \left[ \beta_p^2 - (\beta_p^{(k)})^2 \right] \frac{P'(|\beta_p^{(k)}|^{+}, \lambda)}{2 |\beta_p^{(k)}|} \mathrel{\hat{=}} \Phi(\beta_p \mid \beta_p^{(k)}), \tag{24}$$
which is a one-step majorization of the penalty (equivalently, $-\Phi$ minorizes $-P$). By combining function (24) with (12) or (20), respectively, we obtain the final surrogate functions for the penalized likelihood function (21):
$$Q_{\mathrm{pro}}^{P}(\theta, \beta \mid \alpha^{(k)}) = Q_{\mathrm{pro}}(\theta, \beta \mid \alpha^{(k)}) - N \sum_{p=1}^{q} \beta_p^2 \cdot \frac{P'(|\beta_p^{(k)}|^{+}, \lambda)}{2 |\beta_p^{(k)}|} + c_1, \tag{25}$$
and the following is obtained.
$$Q_{\mathrm{nonpro}}^{P}(\theta, \beta \mid \alpha^{(k)}) = Q_{\mathrm{nonpro}}(\theta, \beta \mid \alpha^{(k)}) - N \sum_{p=1}^{q} \beta_p^2 \cdot \frac{P'(|\beta_p^{(k)}|^{+}, \lambda)}{2 |\beta_p^{(k)}|} + c_2. \tag{26}$$
Both Equations (25) and (26) are written as sums of univariate functions, so maximizing (25) or (26) is easier than directly maximizing (21). Moreover, simple off-the-shelf accelerators may again be used to make the optimization simpler and more efficient. Regularized estimation via the (profile/non-profile) MM algorithms proceeds as follows:
Step 1.
Given initial values of θ , β and Λ 0 ;
Step 2.
Update the estimate of θ via (6);
Step 3.
For the profile MM method, update β by maximizing
$$\sum_{p=1}^{q} Q_{15p}(\beta_p \mid \alpha^{(k)}) - N \sum_{p=1}^{q} \beta_p^2 \cdot \frac{P'(|\beta_p^{(k)}|^{+}, \lambda)}{2 |\beta_p^{(k)}|}.$$
For the non-profile MM method, update β by maximizing
$$\sum_{p=1}^{q} Q_{23p}(\beta_p \mid \alpha^{(k)}) - N \sum_{p=1}^{q} \beta_p^2 \cdot \frac{P'(|\beta_p^{(k)}|^{+}, \lambda)}{2 |\beta_p^{(k)}|};$$
Step 4.
Using the updated estimate of β in Step 3, compute the estimates of Λ 0 ( t i j ) via (8) for profile MM method and via (17) for non-profile MM method, respectively;
Step 5.
Iterate steps 2 to 4 until convergence.
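To illustrate the penalized update in Step 3, the following sketch implements the SCAD penalty and its derivative (with Fan and Li’s suggested a = 3.7) and verifies numerically that the local quadratic bound (24) majorizes the penalty; all function names are hypothetical, not the authors’ code:

```python
import numpy as np

def scad_pen(b, lam, a=3.7):
    """SCAD penalty P(|b|, lam) of Fan and Li (a = 3.7 is their suggestion)."""
    b = np.abs(b)
    return np.where(
        b <= lam, lam * b,
        np.where(b <= a * lam,
                 -(b**2 - 2 * a * lam * b + lam**2) / (2 * (a - 1)),
                 (a + 1) * lam**2 / 2))

def scad_deriv(b, lam, a=3.7):
    """P'(|b|+, lam): the one-sided derivative used in the quadratic bound (24)."""
    b = np.abs(b)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1))

def lqa_bound(b, b_k, lam):
    """Quadratic majorizer Phi(b | b_k) of the penalty, tight at b = +/- b_k."""
    return scad_pen(b_k, lam) + (b**2 - b_k**2) * scad_deriv(b_k, lam) / (2 * abs(b_k))

# numerical check of the majorization P(|b|, lam) <= Phi(b | b_k) on a grid
lam, b_k = 1.0, 0.8
grid = np.linspace(-5, 5, 1001)
assert np.all(scad_pen(grid, lam) <= lqa_bound(grid, b_k, lam) + 1e-10)
```

The check succeeds because the SCAD penalty is concave as a function of $\beta_p^2$, so its tangent in $\beta_p^2$ (the quadratic $\Phi$) lies above it everywhere and touches it at $\pm \beta_p^{(k)}$.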

Model Selection

In the recent literature, the tuning parameter λ is typically selected by data-driven model selection criteria, such as the Bayesian information criterion (BIC, [28]) and generalized cross-validation (GCV, [29]). In this paper, we consider a widely used BIC-type criterion, defined by
$$\mathrm{BIC}_{\lambda} = -2 \ell(\hat{\alpha}) + C_n (\hat{S} + 1) \log(N),$$
to select the tuning parameter $\lambda$, where $C_n = \max\{1, \log[\log(q + 1)]\}$, $q$ is the dimension of $\beta$, and the degrees of freedom $\hat{S}$ are defined as the number of nonzero parameters in $\hat{\beta}$.
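A direct transcription of this criterion might look as follows (the function name and argument layout are hypothetical); λ would then be chosen as the grid value minimizing BIC_λ:

```python
import numpy as np

def bic_lambda(loglik, beta_hat, N, q):
    """BIC-type criterion for tuning-parameter selection:
    BIC = -2 * l(alpha_hat) + C_n * (S_hat + 1) * log(N),
    with C_n = max{1, log(log(q + 1))} and S_hat the number of nonzero
    coefficients in beta_hat (a direct transcription sketch)."""
    C_n = max(1.0, np.log(np.log(q + 1)))
    S_hat = np.count_nonzero(beta_hat)
    return -2.0 * loglik + C_n * (S_hat + 1) * np.log(N)

# e.g. a fitted model with two nonzero coefficients out of four
val = bic_lambda(-100.0, np.array([0.0, 1.2, 0.0, -0.5]), N=200, q=4)
```

Note that $C_n$ only exceeds 1 for very large $q$ (it needs $\log(q+1) > e$, i.e., $q \gtrsim 14$), so for small models the criterion reduces to the ordinary BIC with $\hat{S} + 1$ parameters.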

4. Theoretical Properties

We first consider the convergence properties of the profile and non-profile MM algorithms for maximizing $\ell(\alpha)$ and $\ell_P(\alpha)$. As discussed in Sections 2 and 3, $Q_{\mathrm{pro}}(\theta, \beta \mid \alpha^{(k)})$ and $Q_{\mathrm{nonpro}}(\theta, \beta \mid \alpha^{(k)})$ are the minorizing functions of $\ell(\alpha)$ constructed via the profile and non-profile MM approaches, $Q_{\mathrm{pro}}^{P}(\theta, \beta \mid \alpha^{(k)})$ and $Q_{\mathrm{nonpro}}^{P}(\theta, \beta \mid \alpha^{(k)})$ are the minorizing functions of the penalized likelihood function $\ell_P(\alpha)$ constructed via the profile and non-profile MM approaches, $\theta$ and $\beta$ are the parameter vectors, and $\alpha^{(k)}$ is the current estimate. In this paper, we assume that the common component $Q_{11}(\theta \mid \alpha^{(k)})$ of all minorizing functions is strictly concave with respect to $\theta$, and the second M step of the MM principle is based on the Newton-Raphson method. As strict concavity holds for all surrogate functions $Q_{\mathrm{pro}}(\cdot \mid \alpha^{(k)})$, $Q_{\mathrm{nonpro}}(\cdot \mid \alpha^{(k)})$, $Q_{\mathrm{pro}}^{P}(\cdot \mid \alpha^{(k)})$, and $Q_{\mathrm{nonpro}}^{P}(\cdot \mid \alpha^{(k)})$ under the concave $L_1$, MCP, and SCAD penalties, we denote their unique maximizers by $M_{\mathrm{pro}}(\alpha^{(k)})$, $M_{\mathrm{nonpro}}(\alpha^{(k)})$, $M_{\mathrm{pro}}^{P}(\alpha^{(k)})$, and $M_{\mathrm{nonpro}}^{P}(\alpha^{(k)})$, respectively. Following Proposition 15.4.3 of Lange [30], we provide the following convergence properties.
Proposition 1.
Assume that the differentiability and coerciveness of $\ell(\alpha)$ hold, all stationary points of $\ell(\alpha)$ are isolated, and the upper-level sets $\{\alpha \in \Omega : \ell(\alpha) \ge \ell(\alpha^{(k)})\}$ of the parameter domain $\Omega$ are compact. Then, the profile iteration sequence $(\theta^{(k+1)}, \beta^{(k+1)}) = M_{\mathrm{pro}}(\alpha^{(k)})$, together with $\Lambda_0^{(k+1)}$ in (8), and the non-profile sequence $(\theta^{(k+1)}, \beta^{(k+1)}) = M_{\mathrm{nonpro}}(\alpha^{(k)})$, together with $\Lambda_0^{(k+1)}$ in (17), converge to a stationary point of $\ell(\alpha)$. If strict concavity of $\ell(\alpha)$ also holds, then the profile MM sequence and the non-profile MM sequence converge to the same maximum point of $\ell(\alpha)$.
Proposition 2.
Suppose $P(\cdot, \lambda)$ is piecewise differentiable, nondecreasing, and concave on $(0, \infty)$, continuous at 0, with $P'(0^{+}, \lambda) < \infty$. Then, for all $\beta_q \ne 0$, $\Phi(\beta_q \mid \beta_q^{(k)})$ as defined in (24) majorizes $P(|\beta_q|, \lambda)$ and touches it at the points $\pm \beta_q^{(k)}$. In particular, the conditions for constructing a minorizing function hold for $-\Phi$; that is, $-P(|\beta_q|, \lambda) \ge -\Phi(\beta_q \mid \beta_q^{(k)})$ for all $\beta_q$, with equality at $\beta_q = \beta_q^{(k)}$, so the surrogates (25) and (26) minorize the penalized likelihood (21) and the ascent property $\ell_P(\alpha^{(k+1)}) \ge \ell_P(\alpha^{(k)})$ is preserved.
Based on Proposition 2, we can easily obtain the following convergence properties of profile MM and non-profile MM algorithms for maximizing penalized likelihood function P ( α ) defined in (21).
Proposition 3.
Assume that the differentiability and coerciveness of $\ell_P(\alpha)$ hold, all stationary points of $\ell_P(\alpha)$ are isolated, and the upper-level sets $\{\alpha \in \Omega : \ell_P(\alpha) \ge \ell_P(\alpha^{(k)})\}$ of the parameter domain $\Omega$ are compact. Then, the profile iteration sequence $(\theta^{(k+1)}, \beta^{(k+1)}) = M_{\mathrm{pro}}^{P}(\alpha^{(k)})$, together with $\Lambda_0^{(k+1)}$ in (8), and the non-profile sequence $(\theta^{(k+1)}, \beta^{(k+1)}) = M_{\mathrm{nonpro}}^{P}(\alpha^{(k)})$, together with $\Lambda_0^{(k+1)}$ in (17), converge to a stationary point of $\ell_P(\alpha)$. If strict concavity of $\ell_P(\alpha)$ also holds, then the profile MM sequence and the non-profile MM sequence converge to the same maximum point of $\ell_P(\alpha)$.

5. Numerical Examples

Example 1.
We independently simulate data from three different frailty models:
$$\lambda(t \mid X_{ij}, \omega_i) = \omega_i \lambda_0(t) \exp\{X_{ij}^{\top}\beta\}, \qquad \omega_i \sim \begin{cases} \mathrm{Log\text{-}normal}(0, \theta), & \theta = 0.25, \\ \mathrm{Inverse\ Gaussian}(\theta, \theta^2), & \theta = 1, \\ \mathrm{Gamma}(1/\theta, 1/\theta), & \theta = 2, \end{cases}$$
with three different sample sizes $(B, M) = \{(15, 20), (30, 13), (50, 10)\}$. The true value of the regression vector $\beta$ is set to $(2_{(6)}, 1_{(6)}, 1_{(6)}, 2_{(6)}, 3_{(6)})$, where $a_{(6)}$ denotes the value $a$ repeated six times, so the dimension is $q = 30$; all $X_i$’s are generated from independent uniform distributions on $(0, 0.5)$. The censoring times are generated from independent uniform distributions to yield censoring proportions of around 15% or 30%. In this example, we numerically illustrate the efficiency of the two proposed profile and non-profile MM algorithms under three different frailty models (Log-normal, Inverse Gaussian, and Gamma) with three different sample sizes and two censoring levels. Furthermore, we compare the performance of the two proposed algorithms with the existing estimation approach implemented in the coxph function of the survival R package, which supports only gamma and log-normal frailties. The computation time of an MM algorithm can be improved by using simple off-the-shelf accelerators ([31,32]); here, we implement the accelerated profile MM and non-profile MM algorithms with the squared iterative method (SqS1). We set the stopping criterion of the iteration as
$$\frac{\left| \ell(\alpha^{(k+1)} \mid Y_{\mathrm{obs}}) - \ell(\alpha^{(k)} \mid Y_{\mathrm{obs}}) \right|}{\left| \ell(\alpha^{(k)} \mid Y_{\mathrm{obs}}) \right| + 1} < 10^{-6}.$$
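This stopping rule can be coded as a small helper (a sketch; the absolute values follow the displayed criterion):

```python
def converged(l_new, l_old, tol=1e-6):
    """Relative-change stopping rule for the MM iterations:
    |l_new - l_old| / (|l_old| + 1) < tol,
    where l_new and l_old are successive log-likelihood values."""
    return abs(l_new - l_old) / (abs(l_old) + 1.0) < tol

# a change of 0.1 on a log-likelihood near -100 is far from converged,
# while a change of 1e-6 passes the 1e-6 relative threshold
assert not converged(-100.0, -100.1)
assert converged(-100.000001, -100.0)
```

The `+ 1` in the denominator guards against division by a log-likelihood value near zero.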
Based on 500 replications, the average values of the estimated frailty and regression parameters (MLE), their biases (Bias), their empirical standard deviations (SD), and run times (T) for the three estimation methods are summarized in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6. In general, as the sample size increases, the biases and empirical standard deviations of almost all parameters become smaller. The small sample size $(B, M) = (15, 20)$ always causes noticeable biases for the frailty parameters in the Log-normal and Inverse Gaussian frailty models because the number of clusters is too small. Moreover, a larger censoring proportion usually results in greater empirical standard deviations in almost all cases. From Table 1, Table 2, Table 3 and Table 4, we also observe that the existing estimation approach using the coxph function is the fastest among the three methods since the survival R package is implemented in optimized C code. In terms of estimation accuracy, the non-profile MM algorithm performs best for both frailty and regression parameters, almost always exhibiting the smallest biases and empirical standard deviations in all situations. Even with a small sample size such as $(B, M) = (15, 20)$, the non-profile MM estimates of the frailty parameter still perform well, especially for the Log-normal and Gamma frailty models.
Example 2.
In this example, we simulated 200 realizations consisting of $B = 50$ clusters with $M = 6$ subjects in each cluster from the frailty model
$$\lambda(t \mid X_{ij}, \omega_i) = \omega_i \lambda_0(t) \exp\{X_{ij}^{\top}\beta\},$$
where the frailty terms are $\omega_i \sim \mathrm{Gamma}(1/\theta, 1/\theta)$ with $\theta = 0.5$ for $i = 1, \ldots, B$, $\beta = (1, 3, \mathbf{0}_{46}, 2, 4)$, $q = 50$, and $\lambda_0(t) = 5$. The covariates $X_i$ were marginally standard normal, with the correlation between $X_i$ and $X_j$ given by $\varrho^{|i-j|}$ for $\varrho = 0.25$ and $0.75$, respectively. The censoring times were generated from a uniform distribution to yield a censoring proportion of around 30%. In this simulation experiment, the utility of our proposed profile MM and non-profile MM estimation methods for regularized estimation was illustrated in the sparse high-dimensional regression model (21) with three different penalties ($L_1$, MCP, and SCAD). The model error (ME) and the relative model error (RME), defined as the ratio of the model error of the regularized estimator to that of the ordinary maximum likelihood estimator, are calculated to evaluate estimation accuracy. Based on the 200 replications, we report the median of the relative model errors (MRME) and the average numbers of correctly and incorrectly identified zero coefficients in Table 7, in which the column labeled “Correct” presents the average restricted to the true zero coefficients, while the column labeled “Incorrect” depicts the average number of coefficients erroneously set to 0. From Table 7, we can observe that the proposed profile and non-profile MM algorithms mesh very well with the MCP and SCAD penalties and provide good results in both parameter estimation and variable selection, especially for the MCP penalty. The $L_1$ penalty meshes well with the profile and non-profile MM algorithms in the lower-correlation case $\varrho = 0.25$ but tends to yield biased estimates and inaccurate variable selection when the correlation between $X_i$ and $X_j$ becomes stronger. Furthermore, we report the average values of the estimated parameters (MLE), their biases (Bias), and their empirical standard deviations (SD) under the MCP and SCAD penalties based on the 200 repetitions in Table 8. It can be observed that both the profile and non-profile MM methods perform similarly well across values of $\varrho$ and different penalties, always showing small biases and empirical standard deviations.

6. Real Data Analysis

Alzheimer’s Disease (AD) is the most common form of dementia and causes progressive memory loss and the loss of other bodily functions. According to the Alzheimer’s Association, Alzheimer’s is the sixth leading cause of death in the United States. Many studies have examined the development of AD from Mild Cognitive Impairment (MCI). The authors of [33] analyzed the AUC score for whether MCI will convert to AD using LASSO-penalized logistic regression. The authors of [34] applied cognitive scores, ApoE genotype, and CSF biomarkers to predict the transition time from MCI to AD. Moreover, studies such as [35] have used imaging data to analyze the transition time to dementia. In this paper, we apply clinical data and selected SNP genotypes to conduct feature selection under three different frailty models (Gamma, Inverse Gaussian, and Log-normal) using the SCAD and MCP penalties. The dataset was obtained from the ADNI database (adni.loni.usc.edu). In this dataset, 267 people were recorded as cognitively normal (CN) during the first visit. Among these people, 78 were diagnosed with MCI before the last visit. Eventually, we observed 22 people who transferred from the MCI stage to dementia. To predict these two conversion times, 19 clinical predictors were applied, mainly based on age, marriage status, education level, and test scores such as the Alzheimer’s Disease Assessment Scale cognitive subscale (ADAS-Cog) and the Functional Activities Questionnaire (FAQ). In addition to these predictors, we also included genotypes of SNPs from GWAS, such as ApoE-ε4, which are related to early-onset or late-onset dementia; in total, 132 covariates were selected for model training.
Using the same notation as in the simulation, we have (B, M) = (276, 2). Individuals are independent, and the two events of the same individual are grouped into the same cluster and share the same frailty. The dataset contains the following information. For individual i (i = 1, 2, …, 276) and event j (j = 1, 2), t_ij is the minimum of the event time Y_ij and the censoring time C_ij. As shown in Table 9, the censoring time C_ij is the time difference between the first observation date in a given state (CN or MCI) and the last observation date of the patient. The event time is the state-transition time: Y_i1 is the time difference between the first observation date of the patient in state CN and the date of transition from CN to MCI. If no transition is observed, censoring occurs, with δ_i1 = 0 and t_i1 = C_i1; otherwise, δ_i1 = 1 and t_i1 = Y_i1. Similarly, Y_i2 is the time difference between the first observation date of the patient in state MCI and the date of transition from MCI to dementia. If no transition is observed, δ_i2 = 0 and t_i2 = C_i2; otherwise, δ_i2 = 1 and t_i2 = Y_i2. Let X_ij = (X_ij1, …, X_ijp)^⊤ denote the potential covariates (such as SNP genotypes and clinical predictors), where p = 132. According to the data description, it is reasonable to assume that the censoring time C_ij is independent of the event time Y_ij. Moreover, the two events of each individual are considered highly associated due to common genetic factors and/or shared habits. Thus, the frailty model is suitable for analyzing this dataset.
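The construction of (t_ij, δ_ij) from the observation dates can be illustrated directly. Here `observed_time` is a hypothetical helper, and the division by a 365-day year is our assumption, chosen because it reproduces the t_ij values shown in Table 9.

```python
from datetime import date
from typing import Optional, Tuple

def observed_time(first_obs: date, transition: Optional[date],
                  last_obs: date) -> Tuple[float, int]:
    """Build (t_ij, delta_ij) for one event of one subject.

    If the transition is observed, t is the event time Y_ij and delta = 1;
    otherwise t is the censoring time C_ij (first to last observation date)
    and delta = 0. The 365-day year is an assumption that matches the
    t_ij values in Table 9.
    """
    if transition is not None:
        return (transition - first_obs).days / 365.0, 1
    return (last_obs - first_obs).days / 365.0, 0

# Patient 011_S_0002, CN -> MCI (dates from Table 9): observed after ~7.05 years
t, delta = observed_time(date(2005, 9, 8), date(2012, 9, 26), date(2017, 10, 18))
```

Applying the same helper to patient 011_S_0021, whose transition to MCI was never observed, gives δ = 0 with t equal to the full follow-up gap, matching the 12.10-year censored entry in Table 9.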
Three frailty models are applied for variable selection using the SCAD and MCP penalties. According to the results presented in Table 10, Table 11 and Table 12, around 70 covariates were selected under the Gamma and Log-normal frailty models, while 27 covariates were selected under the Inverse Gaussian model. Similar covariates were selected by the SCAD and MCP penalties for both profile MM and non-profile MM under the same model, which corroborates the results of the simulation study. Among the selected covariates, ApoE-ε4 has a negative effect on survival time, as expected. In addition to ApoE-ε4, single SNPs such as rs2333227 and rs669 from the MPO and A2M genes are associated with the development of Alzheimer's Disease; similar results are reported in [36]. However, many of the selected SNPs have not been recognized in other studies, and whether these SNPs can help identify the risk of Alzheimer's requires further analysis.
To see whether the selected covariates can help identify individuals at low or high risk, we use the model with the lowest BIC score for prediction by constructing a prognostic index. Let X be the collection of the X_ij. The full dataset is divided into 10 parts X^(k) (k = 1, 2, …, 10). As in cross-validation, leaving out part X^(k), covariates are selected using the BIC criterion and the parameter vector β̂_k is estimated correspondingly. The risk factor r̂_k = X^(k) β̂_k (k = 1, 2, …, 10) is then calculated on X^(k) without considering the individuals' latent (frailty) variable. Individuals with estimated risk below the median are classified into the low-risk group, and individuals with estimated risk above the median into the high-risk group. The survival curves for the estimated high-risk and low-risk groups are constructed based on their true survival times. Figure 1 shows that the selected model gives a good prediction of the transition time from stage CN to stage MCI. Due to the lack of data for the second transition, the corresponding confidence intervals overlap. Overall, the prediction results, especially for the transition from stage CN to stage MCI, show that the two risk groups are well separated by the predicted risk levels.
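The cross-validated prognostic-index construction above can be sketched as follows. Here `fit_fold` is a hypothetical stand-in for the BIC-driven variable selection and estimation step (which in the paper also uses the survival times and censoring indicators); the sketch only shows the fold bookkeeping, the out-of-fold scoring r̂_k = X^(k) β̂_k, and the median split.

```python
import numpy as np

def risk_groups(X, fit_fold, n_folds=10, seed=1):
    """Cross-validated prognostic index (simplified sketch).

    For each held-out fold k, fit_fold returns beta_hat_k estimated without
    that fold; the held-out rows are scored by X_(k) @ beta_hat_k, and
    subjects are split into risk groups at the median of the scores.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    risk = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        beta_k = fit_fold(X[train_idx])        # fit without fold k
        risk[test_idx] = X[test_idx] @ beta_k  # prognostic index for fold k
    high = risk > np.median(risk)              # True = high-risk group
    return risk, high

# Toy usage with a dummy fit that returns a fixed coefficient vector
X = np.random.default_rng(2).standard_normal((40, 5))
risk, high = risk_groups(X, fit_fold=lambda X_train: np.ones(5))
```

Scoring each subject only with coefficients estimated without that subject's fold avoids the optimistic bias of in-sample risk scores, which is why the Kaplan–Meier separation in Figure 1 is a fair assessment of the selected model.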

7. Discussion

The profile and non-profile MM algorithms are proposed for high-dimensional regression analysis with clustered failure time data, where a general frailty is used to model within-cluster dependence and a penalty such as SCAD or MCP is used to induce sparsity. The proposed methods can separate the high-dimensional minorizing function into a sum of univariate functions after a sequence of minorization steps. These approaches avoid matrix inversion and provide a toolkit for developing more efficient algorithms for a broad range of statistical optimization problems. Meshing well with sparsity-inducing penalties such as SCAD and MCP, the two regularized MM algorithms are further shown to exhibit certain numerical advantages in sparse high-dimensional regression analyses. The shared frailty model is only a special and relatively simple member of the widely used frailty models for multivariate survival data. For example, the standard shared frailty model assumes that all subjects in the same cluster share a common frailty; this assumption can be relaxed by allowing correlated frailty terms among subjects in the same cluster. Correlated frailty models address the limitation that shared frailty models can only fit positively correlated event times. Furthermore, the frailty is assumed to be time-constant. However, unobserved heterogeneity may also be time-dependent, which can be described by an unobserved random process that unfolds over time. Based on this idea, several approaches have been proposed, such as modeling the frailty by diffusion processes or Lévy processes; approaches based on birth-death processes, or simpler piecewise-constant frailty models, have also been proposed recently. It would be worthwhile to extend the two proposed MM algorithms to these settings. Lastly, spatially correlated survival data present another important and useful setting that the proposed MM algorithms can be further extended to accommodate.

Author Contributions

Conceptualization, J.X.; Data curation, X.H., J.X. and Y.Z.; Formal analysis, X.H. and Y.Z.; Investigation, Y.Z.; Methodology, X.H. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors are grateful to the Editor and referees for many helpful comments. This work has been partially supported by the National Natural Science Foundation of China (11901515).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Clayton, D.G. A model for association in bivariate life tables and its application in epidemiologic studies of familial tendency in chronic disease incidence. Biometrika 1978, 65, 141–151. [Google Scholar] [CrossRef]
  2. Clayton, D.G.; Cuzick, J. Multivariate generalizations of the proportional hazards model. J. R. Stat. Soc. Ser. 1985, 148, 82–117. [Google Scholar] [CrossRef]
  3. Oakes, D. Bivariate survival models induced by frailties. J. Am. Stat. Assoc. 1989, 84, 487–493. [Google Scholar] [CrossRef]
  4. Zeng, D.; Chen, Q.; Ibrahim, J. Gamma frailty transformation models for multivariate survival times. Biometrika 2009, 96, 277–291. [Google Scholar] [CrossRef] [PubMed]
  5. Andersen, P.K.; Klein, J.P.; Knudsen, K.M.; Palacios, R.T. Estimation of variance in Cox’s regression model with shared gamma frailties. Biometrics 1997, 53, 1475–1484. [Google Scholar] [CrossRef] [PubMed]
  6. Cox, D.R. Regression models and life tables (with discussion). J. R. Stat. Soc. Ser. B 1972, 34, 187–220. [Google Scholar]
  7. Klein, J.P. Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics 1992, 48, 795–806. [Google Scholar] [CrossRef]
  8. Nielsen, G.G.; Gill, R.D.; Andersen, P.K.; Sorensen, T.I.A. A counting process approach to maximum likelihood estimation in frailty models. Scand. J. Stat. 1992, 19, 25–43. [Google Scholar]
  9. Balan, T.A.; Putter, H. A tutorial on frailty models. Stat. Methods Med. Res. 2020, 29, 3424–3454. [Google Scholar] [CrossRef]
  10. Do Ha, I.; Lee, Y. A review of h-likelihood for survival analysis. Jpn. J. Stat. Data Sci. 2021, 4, 1157–1178. [Google Scholar]
  11. Duchateau, L.; Janssen, P. The Frailty Model; Springer Science & Business Media: New York, NY, USA, 2007. [Google Scholar]
  12. Glidden, D.V.; Vittinghoff, E. Modelling clustered survival data from multicentre clinical trials. Stat. Med. 2004, 23, 369–388. [Google Scholar] [CrossRef] [PubMed]
  13. Dahlqwist, E.; Pawitan, Y.; Sjölander, A. Regression standardization and attributable fraction estimation with between-within frailty models for clustered survival data. Stat. Methods Med Res. 2019, 28, 462–485. [Google Scholar] [CrossRef] [PubMed]
  14. Manda, S.O. A nonparametric frailty model for clustered survival data. Commun. Stat. Methods 2011, 40, 863–875. [Google Scholar] [CrossRef]
  15. Chen, S.S.; Donoho, D.L.; Saunders, M.A. Atomic Decomposition by Basis Pursuit. Siam J. Sci. Comput. 1998, 20, 33–61. [Google Scholar] [CrossRef]
  16. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  17. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  18. Frank, L.E.; Friedman, J.H. A Statistical View of Some Chemometrics Regression Tools. Technometrics 1993, 35, 109–135. [Google Scholar] [CrossRef]
  19. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. 2005, 67, 301–320. [Google Scholar] [CrossRef] [Green Version]
  20. Fan, J.; Li, R. Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  21. Zhang, C.-H. Nearly Unbiased Variable Selection Under Minimax Concave Penalty. Ann. Stat. 2010, 38, 894–942. [Google Scholar] [CrossRef] [Green Version]
  22. Hunter, D.R.; Lange, K. A tutorial on MM algorithms. Am. Stat. 2004, 58, 30–37. [Google Scholar] [CrossRef]
  23. Lange, K.; Hunter, D.R.; Yang, I. Optimization transfer using surrogate objective functions (with discussions). J. Comput. Graph. Stat. 2000, 9, 1–20. [Google Scholar]
  24. Becker, M.P.; Yang, I.; Lange, K. EM algorithms without missing data. Stat. Methods Med. Res. 1997, 6, 38–54. [Google Scholar] [CrossRef] [PubMed]
  25. Huang, X.F.; Xu, J.F.; Tian, G.L. On profile MM algorithms for gamma frailty survival models. Stat. Sin. 2019, 29, 895–916. [Google Scholar] [CrossRef] [Green Version]
  26. Lange, K.; Zhou, H. MM algorithms for geometric and signomial programming. Math. Program. Ser. 2014, 143, 339–356. [Google Scholar] [CrossRef] [Green Version]
  27. Hunter, D.R.; Li, R. Variable selection using MM algorithms. Ann. Stat. 2005, 33, 1617–1642. [Google Scholar] [CrossRef] [Green Version]
  28. Schwarz, C. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  29. Craven, P.; Wahba, G. Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 1979, 31, 377–403. [Google Scholar] [CrossRef]
  30. Lange, K. Numerical Analysis for Statisticians, 2nd ed.; Statistics and Computing; Springer: New York, NY, USA, 2010. [Google Scholar]
  31. Varadhan, R.; Roland, C. Simple and globally convergent methods for accelerating the convergence of any EM algorithms. Scand. J. Stat. 2008, 35, 335–353. [Google Scholar] [CrossRef]
  32. Zhou, H.; Alexander, D.; Lange, K. A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat. Comput. 2011, 21, 261–273. [Google Scholar] [CrossRef] [Green Version]
  33. Ye, J.P.; Farnum, M.; Yang, E.; Verbeeck, R.; Lobanov, V.; Raghavan, V.; Novak, G.; DiaBernardo, A.; Narayan, V.A. Sparse learning and stability selection for predicting mci to ad conversion using baseline adni data. BMC Neurol. 2012, 12, 46. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  34. Da, X.; Toledo, J.; Toledo, J.; Zee, J.; Wolk, D.A.; Xie, S.X.; Ou, Y.; Shacklett, A.; Parmpi, P.; Shaw, L.; et al. Integration and relative value of biomarkers for prediction of mci to ad progression: Spatial patterns of brain atrophy, cognitive scores, apoe genotype, and csf markers. Neuroimage Clin. 2014, 4, 164–173. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Kong, D.H.; Ibrahim, J.G.; Lee, E.; Zhu, H.T. Flcrm: Functional linear cox regression model. Biometrics 2018, 74, 109–117. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  36. Zappia, M.; Manna, I.; Serra, P.; Cittadella, R.; Andreoli, V.; La Russa, A.; Annesi, F.; Spadafora, P.; Romeo, N.; Nicoletti, G.; et al. Increased risk for alzheimer disease with the interaction of mpo and a2m polymorphisms. Arch. Neurol. 2004, 61, 341–344. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Survival curves of the transition times from CN to MCI and from MCI to AD for high-risk and low-risk groups constructed using the prognostic index.
Table 1. Simulation results for Log-normal frailty model in Example 1 with different sample sizes at 15% censoring.

Scenario 1: (B, M) = (15, 20)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 8.5716 | 5.7928 | 0.0586
θ | 0.3438 / 0.0938 / 0.3359 | 0.2925 / 0.0425 / 0.3173 | 0.2955 / 0.0455 / 0.3458
β_1 | −2.2157 / −0.2157 / 0.5034 | −2.0969 / −0.0969 / 0.5175 | −2.1697 / −0.1697 / 0.5842
β_5 | −2.1316 / −0.1316 / 0.5725 | −2.1392 / −0.1392 / 0.5567 | −2.2005 / −0.2005 / 0.5747
β_10 | −1.0976 / −0.0976 / 0.5523 | −1.0863 / −0.0863 / 0.5232 | −1.0854 / −0.0854 / 0.5794
β_15 | 1.0720 / 0.0720 / 0.5160 | 1.0126 / 0.0126 / 0.5185 | 1.1232 / 0.1232 / 0.5392
β_20 | 2.1871 / 0.1871 / 0.5584 | 2.0966 / 0.0966 / 0.5286 | 2.0560 / 0.0560 / 0.5217
β_25 | 3.2619 / 0.2619 / 0.5576 | 3.1412 / 0.1412 / 0.5455 | 3.2396 / 0.2396 / 0.6022
β_30 | 3.2156 / 0.2156 / 0.5775 | 3.1507 / 0.1507 / 0.5468 | 3.2781 / 0.2781 / 0.5669

Scenario 2: (B, M) = (30, 13)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 5.7458 | 7.5817 | 0.0893
θ | 0.3222 / 0.0722 / 0.3006 | 0.2520 / 0.0020 / 0.1099 | 0.2813 / 0.0313 / 0.1065
β_1 | −2.1110 / −0.1110 / 0.4399 | −2.0930 / −0.0930 / 0.4531 | −2.0783 / −0.0783 / 0.4743
β_5 | −2.1236 / −0.1236 / 0.4634 | −2.1208 / −0.1208 / 0.4615 | −2.1232 / −0.1232 / 0.4384
β_10 | −1.0788 / −0.0788 / 0.4491 | −1.0528 / −0.0528 / 0.4517 | −1.0598 / −0.0598 / 0.4877
β_15 | 1.0507 / 0.0507 / 0.4467 | 0.9944 / −0.0056 / 0.4416 | 1.0457 / 0.0457 / 0.4539
β_20 | 2.0663 / 0.0663 / 0.4679 | 2.0421 / 0.0421 / 0.4665 | 2.1415 / 0.1415 / 0.4872
β_25 | 3.1734 / 0.1734 / 0.4824 | 3.0586 / 0.0586 / 0.4499 | 3.2054 / 0.2054 / 0.4726
β_30 | 3.1833 / 0.1833 / 0.4574 | 3.0666 / 0.0666 / 0.4371 | 3.1279 / 0.1279 / 0.4558

Scenario 3: (B, M) = (50, 10)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 6.2193 | 11.4968 | 0.1281
θ | 0.2683 / 0.0183 / 0.2427 | 0.2478 / −0.0022 / 0.0862 | 0.2866 / 0.0366 / 0.0884
β_1 | −2.0630 / −0.0630 / 0.3870 | −2.0491 / −0.0491 / 0.3831 | −2.0548 / −0.0548 / 0.4021
β_5 | −2.0668 / −0.0668 / 0.3975 | −2.0430 / −0.0430 / 0.3903 | −2.0340 / −0.0340 / 0.3716
β_10 | −1.0286 / −0.0286 / 0.3875 | −1.0327 / −0.0327 / 0.3799 | −1.0583 / −0.0583 / 0.4058
β_15 | 1.0613 / 0.0613 / 0.4001 | 1.0131 / 0.0131 / 0.3915 | 1.0136 / 0.0136 / 0.3867
β_20 | 2.0946 / 0.0946 / 0.3984 | 2.0315 / 0.0315 / 0.3853 | 2.0697 / 0.0697 / 0.4497
β_25 | 3.1535 / 0.1535 / 0.4104 | 3.0739 / 0.0739 / 0.3973 | 3.1178 / 0.1178 / 0.4147
β_30 | 3.1266 / 0.1266 / 0.4187 | 3.0378 / 0.0378 / 0.4088 | 3.1216 / 0.1216 / 0.3875
Table 2. Simulation results for Log-normal frailty model in Example 1 with different sample sizes at 30% censoring.

Scenario 1: (B, M) = (15, 20)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 4.8084 | 5.7248 | 0.0541
θ | 0.2712 / 0.0212 / 0.1446 | 0.2491 / −0.0009 / 0.1434 | 0.3022 / 0.0522 / 0.1507
β_1 | −2.2181 / −0.2181 / 0.6137 | −2.1211 / −0.1211 / 0.5712 | −2.1928 / −0.1928 / 0.6084
β_5 | −2.1717 / −0.1717 / 0.6270 | −2.1431 / −0.1431 / 0.6230 | −2.2133 / −0.2133 / 0.5920
β_10 | −1.1136 / −0.1136 / 0.6001 | −1.0786 / −0.0786 / 0.5881 | −1.0936 / −0.0936 / 0.6158
β_15 | 1.0926 / 0.0926 / 0.6117 | 1.0452 / 0.0452 / 0.5603 | 1.0499 / 0.0499 / 0.6049
β_20 | 2.2211 / 0.2211 / 0.6234 | 2.0423 / 0.0423 / 0.5667 | 2.2068 / 0.2068 / 0.6567
β_25 | 3.2667 / 0.2667 / 0.6156 | 3.1369 / 0.1369 / 0.6192 | 3.2481 / 0.2481 / 0.5558
β_30 | 3.2490 / 0.2490 / 0.6192 | 3.1452 / 0.1452 / 0.6355 | 3.2599 / 0.2599 / 0.5502

Scenario 2: (B, M) = (30, 13)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 5.56726 | 7.7258 | 0.0818
θ | 0.3189 / 0.0689 / 0.9271 | 0.2496 / −0.0004 / 0.1150 | 0.2749 / 0.0249 / 0.1077
β_1 | −2.0826 / −0.0826 / 0.4974 | −2.0987 / −0.0987 / 0.4898 | −2.0761 / −0.0761 / 0.5023
β_5 | −2.1135 / −0.1135 / 0.5391 | −2.0965 / −0.0965 / 0.4950 | −2.0974 / −0.0974 / 0.5200
β_10 | −1.0764 / −0.0764 / 0.5065 | −1.0743 / −0.0743 / 0.5180 | −1.0510 / −0.0510 / 0.5055
β_15 | 1.0654 / 0.0654 / 0.5137 | 1.0433 / 0.0433 / 0.5028 | 1.0739 / 0.0739 / 0.5766
β_20 | 2.1250 / 0.1250 / 0.5033 | 2.0525 / 0.0525 / 0.4933 | 2.1029 / 0.1029 / 0.5137
β_25 | 3.1731 / 0.1731 / 0.5236 | 3.0821 / 0.0821 / 0.5105 | 3.1316 / 0.1316 / 0.5783
β_30 | 3.1966 / 0.1966 / 0.5207 | 3.0716 / 0.0716 / 0.5034 | 3.2236 / 0.2236 / 0.5268

Scenario 3: (B, M) = (50, 10)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 4.7591 | 8.4607 | 0.1188
θ | 0.2729 / 0.0229 / 0.4236 | 0.2434 / −0.0066 / 0.0865 | 0.2792 / 0.0292 / 0.0966
β_1 | −2.0917 / −0.0917 / 0.3814 | −2.0714 / −0.0714 / 0.3741 | −2.1494 / −0.1494 / 0.4376
β_5 | −2.0692 / −0.0692 / 0.4076 | −2.0495 / −0.0495 / 0.3990 | −2.0617 / −0.0617 / 0.4617
β_10 | −1.0320 / −0.0320 / 0.3822 | −1.0341 / −0.0341 / 0.3725 | −1.0714 / −0.0714 / 0.4174
β_15 | 1.0606 / 0.0606 / 0.3928 | 1.0163 / 0.0163 / 0.3843 | 1.0991 / 0.0991 / 0.3947
β_20 | 2.0530 / 0.0530 / 0.3851 | 1.9865 / −0.0135 / 0.3773 | 2.0780 / 0.0780 / 0.4779
β_25 | 3.1058 / 0.1058 / 0.4000 | 3.0158 / 0.0158 / 0.3855 | 3.1531 / 0.1531 / 0.4395
β_30 | 3.1071 / 0.1071 / 0.3849 | 3.0182 / 0.0182 / 0.3752 | 3.1250 / 0.1250 / 0.4593
Table 3. Simulation results for Gamma frailty model in Example 1 with different sample sizes at 15% censoring.

Scenario 1: (B, M) = (15, 20)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 22.3576 | 18.1613 | 0.0846
θ | 2.0584 / 0.0584 / 0.6494 | 2.1028 / 0.1028 / 0.7066 | 2.1197 / 0.1197 / 0.7023
β_1 | −2.1343 / −0.1343 / 0.5452 | −2.1155 / −0.1155 / 0.4974 | −2.1027 / −0.1027 / 0.6415
β_5 | −2.1700 / −0.1700 / 0.5426 | −2.1303 / −0.1303 / 0.4923 | −2.1633 / −0.1633 / 0.5354
β_10 | −1.0561 / −0.0561 / 0.5395 | −1.0550 / −0.0550 / 0.5505 | −1.0768 / −0.0768 / 0.6086
β_15 | 1.0515 / 0.0515 / 0.5749 | 1.0073 / 0.0073 / 0.5046 | 1.0883 / 0.0883 / 0.6006
β_20 | 2.1385 / 0.1385 / 0.542 | 2.1311 / 0.1311 / 0.5639 | 2.1685 / 0.1685 / 0.5608
β_25 | 3.1702 / 0.1702 / 0.5815 | 3.1928 / 0.1928 / 0.5893 | 3.2140 / 0.2140 / 0.5916
β_30 | 3.2185 / 0.2185 / 0.5623 | 3.1268 / 0.1268 / 0.6005 | 3.2433 / 0.2433 / 0.6320

Scenario 2: (B, M) = (30, 13)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 23.2786 | 41.7191 | 0.1438
θ | 2.0934 / 0.0934 / 0.4866 | 2.0270 / 0.0270 / 0.4817 | 2.1510 / 0.1510 / 0.6232
β_1 | −2.1266 / −0.1266 / 0.4648 | −2.1027 / −0.1027 / 0.4600 | −2.1189 / −0.1189 / 0.5060
β_5 | −2.1197 / −0.1197 / 0.4763 | −2.0886 / −0.0886 / 0.4549 | −2.1842 / −0.1842 / 0.4881
β_10 | −1.0328 / −0.0328 / 0.4703 | −1.0685 / −0.0685 / 0.4789 | −1.0671 / −0.0671 / 0.4818
β_15 | 1.0604 / 0.0604 / 0.4871 | 1.0212 / 0.0212 / 0.4847 | 1.0956 / 0.0956 / 0.4446
β_20 | 2.1313 / 0.1313 / 0.4696 | 2.0185 / 0.0185 / 0.4745 | 2.0961 / 0.0961 / 0.4848
β_25 | 3.1788 / 0.1788 / 0.4695 | 3.0747 / 0.0747 / 0.4954 | 3.1749 / 0.1749 / 0.4888
β_30 | 3.1934 / 0.1934 / 0.5120 | 3.1287 / 0.1287 / 0.4953 | 3.1908 / 0.1908 / 0.5070

Scenario 3: (B, M) = (50, 10)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 27.4950 | 47.8408 | 0.2415
θ | 2.0792 / 0.0792 / 0.4091 | 2.0268 / 0.0268 / 0.3890 | 2.0905 / 0.0905 / 0.4287
β_1 | −2.0896 / −0.0896 / 0.4017 | −2.0466 / −0.0466 / 0.4406 | −2.1372 / −0.1372 / 0.4366
β_5 | −2.0619 / −0.0619 / 0.4060 | −2.0388 / −0.0388 / 0.4199 | −2.1541 / −0.1541 / 0.463
β_10 | −1.0431 / −0.0431 / 0.4186 | −1.0190 / −0.0190 / 0.4047 | −1.0672 / −0.0672 / 0.4016
β_15 | 1.0739 / 0.0739 / 0.4190 | 1.0363 / 0.0363 / 0.4025 | 1.0006 / 0.0006 / 0.4269
β_20 | 2.0655 / 0.0655 / 0.4141 | 2.0694 / 0.0694 / 0.4272 | 2.0781 / 0.0781 / 0.4213
β_25 | 3.1542 / 0.1542 / 0.3919 | 3.0680 / 0.0680 / 0.4461 | 3.1171 / 0.1171 / 0.4331
β_30 | 3.1085 / 0.1085 / 0.4201 | 3.0707 / 0.0707 / 0.4800 | 3.1198 / 0.1198 / 0.4200
Table 4. Simulation results for Gamma frailty model in Example 1 with different sample sizes at 30% censoring.

Scenario 1: (B, M) = (15, 20)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 18.1609 | 27.0720 | 0.0601
θ | 2.0325 / 0.0325 / 0.8620 | 2.0844 / 0.0844 / 0.7522 | 2.0332 / 0.0332 / 0.6807
β_1 | −1.9628 / 0.0372 / 0.8105 | −2.2108 / −0.2108 / 0.6094 | −2.1984 / −0.1984 / 0.5954
β_5 | −1.9076 / 0.0924 / 0.8051 | −2.1330 / −0.1330 / 0.5744 | −2.1072 / −0.1072 / 0.6045
β_10 | −1.0357 / −0.0357 / 0.5701 | −1.0732 / −0.0732 / 0.5423 | −1.1172 / −0.1172 / 0.5989
β_15 | 0.9478 / −0.0522 / 0.6464 | 1.0731 / 0.0731 / 0.5596 | 1.0585 / 0.0585 / 0.5717
β_20 | 1.9482 / −0.0518 / 0.7783 | 2.0924 / 0.0924 / 0.5450 | 2.1451 / 0.1451 / 0.6499
β_25 | 2.9937 / −0.0063 / 1.0379 | 3.1667 / 0.1667 / 0.6197 | 3.2788 / 0.2788 / 0.5763
β_30 | 2.9867 / −0.0133 / 1.0485 | 3.2437 / 0.2437 / 0.6203 | 3.2631 / 0.2631 / 0.5981

Scenario 2: (B, M) = (30, 13)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 16.7308 | 30.7202 | 0.0995
θ | 2.0880 / 0.0880 / 0.5207 | 2.0962 / 0.0962 / 0.4792 | 2.1913 / 0.1913 / 0.6610
β_1 | −2.1346 / −0.1346 / 0.5360 | −2.1102 / −0.1102 / 0.4817 | −2.1127 / −0.1127 / 0.5501
β_5 | −2.0743 / −0.0743 / 0.5299 | −2.1301 / −0.1301 / 0.5327 | −2.0796 / −0.0796 / 0.5362
β_10 | −1.0846 / −0.0846 / 0.5289 | −1.0862 / −0.0862 / 0.5136 | −1.0690 / −0.0690 / 0.4962
β_15 | 1.0980 / 0.0980 / 0.5242 | 1.0001 / 0.0001 / 0.5169 | 1.0932 / 0.0932 / 0.5306
β_20 | 2.1329 / 0.1329 / 0.5216 | 2.0354 / 0.0354 / 0.5278 | 2.1117 / 0.1117 / 0.5432
β_25 | 3.2089 / 0.2089 / 0.5183 | 3.1365 / 0.1365 / 0.5155 | 3.2383 / 0.2383 / 0.5587
β_30 | 3.1702 / 0.1702 / 0.5345 | 3.1695 / 0.1695 / 0.5495 | 3.2301 / 0.2301 / 0.5380

Scenario 3: (B, M) = (50, 10)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD) | Coxph (MLE / Bias / SD)
T | 16.3097 | 29.1778 | 0.1647
θ | 2.0999 / 0.0999 / 0.4480 | 2.0462 / 0.0462 / 0.3831 | 2.0948 / 0.0948 / 0.3700
β_1 | −2.1151 / −0.1151 / 0.4605 | −2.0508 / −0.0508 / 0.4187 | −2.1324 / −0.1324 / 0.4554
β_5 | −2.0767 / −0.0767 / 0.4197 | −2.0876 / −0.0876 / 0.4615 | −2.1001 / −0.1001 / 0.4394
β_10 | −1.0963 / −0.0963 / 0.4629 | −1.0712 / −0.0712 / 0.4393 | −1.0513 / −0.0513 / 0.4306
β_15 | 1.0443 / 0.0443 / 0.4313 | 0.9910 / −0.0090 / 0.4234 | 1.0347 / 0.0347 / 0.4476
β_20 | 2.0813 / 0.0813 / 0.4775 | 2.0330 / 0.0330 / 0.4630 | 2.1039 / 0.1039 / 0.4538
β_25 | 3.1089 / 0.1089 / 0.4848 | 3.0978 / 0.0978 / 0.4660 | 3.1829 / 0.1829 / 0.4484
β_30 | 3.1216 / 0.1216 / 0.4805 | 3.1462 / 0.1462 / 0.5005 | 3.1834 / 0.1834 / 0.5308
Table 5. Simulation results for Inverse Gaussian model in Example 1 with different sample sizes at 15% censoring.

Scenario 1: (B, M) = (15, 20)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD)
T | 3.0369 | 7.9534
θ | 0.7060 / −0.2940 / 0.6920 | 0.7130 / −0.2870 / 0.6511
β_1 | −2.1516 / −0.1516 / 0.5664 | −2.1002 / −0.1002 / 0.4646
β_5 | −2.1783 / −0.1783 / 0.5366 | −2.1062 / −0.1062 / 0.5173
β_10 | −1.1028 / −0.1028 / 0.5526 | −1.0881 / −0.0881 / 0.4719
β_15 | 1.1284 / 0.1284 / 0.5523 | 0.9661 / −0.0339 / 0.5069
β_20 | 2.2162 / 0.2162 / 0.5225 | 1.9742 / −0.0258 / 0.4936
β_25 | 3.2591 / 0.2591 / 0.5269 | 2.9877 / −0.0123 / 0.5332
β_30 | 3.2139 / 0.2139 / 0.5553 | 3.0247 / 0.0247 / 0.5250

Scenario 2: (B, M) = (30, 13)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD)
T | 5.2872 | 27.2494
θ | 0.7537 / −0.2463 / 0.5623 | 0.9178 / −0.0822 / 0.4438
β_1 | −2.1515 / −0.1515 / 0.4592 | −2.1441 / −0.1441 / 0.4455
β_5 | −2.1300 / −0.1300 / 0.4646 | −2.1146 / −0.1146 / 0.4907
β_10 | −1.0873 / −0.0873 / 0.4743 | −1.0830 / −0.0830 / 0.4316
β_15 | 1.0681 / 0.0681 / 0.4755 | 1.0223 / 0.0223 / 0.4445
β_20 | 2.1454 / 0.1454 / 0.4600 | 2.0591 / 0.0591 / 0.4625
β_25 | 3.2323 / 0.2323 / 0.4972 | 3.1031 / 0.1031 / 0.4807
β_30 | 3.2223 / 0.2223 / 0.4698 | 3.0855 / 0.0855 / 0.4907

Scenario 3: (B, M) = (50, 10)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD)
T | 54.9905 | 36.7796
θ | 0.8461 / −0.1539 / 0.4143 | 0.9724 / −0.0276 / 0.3678
β_1 | −2.0933 / −0.0933 / 0.4056 | −2.0688 / −0.0688 / 0.4135
β_5 | −2.1509 / −0.1509 / 0.4375 | −2.0664 / −0.0664 / 0.4318
β_10 | −1.0878 / −0.0878 / 0.3992 | −1.0790 / −0.0790 / 0.3824
β_15 | 1.0437 / 0.0437 / 0.4217 | 0.9728 / −0.0272 / 0.4111
β_20 | 2.1137 / 0.1137 / 0.4077 | 2.0257 / 0.0257 / 0.4005
β_25 | 3.1511 / 0.1511 / 0.4311 | 3.0709 / 0.0709 / 0.4129
β_30 | 3.1846 / 0.1846 / 0.4137 | 3.0540 / 0.0540 / 0.4219
Table 6. Simulation results for Inverse Gaussian model in Example 1 with different sample sizes at 30% censoring.

Scenario 1: (B, M) = (15, 20)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD)
T | 9.7444 | 16.1049
θ | 0.5343 / −0.4657 / 0.6665 | 0.7017 / −0.2983 / 0.6431
β_1 | −2.1693 / −0.1693 / 0.6153 | −2.1795 / −0.1795 / 0.6054
β_5 | −2.2288 / −0.2288 / 0.5688 | −2.1677 / −0.1677 / 0.6601
β_10 | −1.0228 / −0.0228 / 0.5690 | −1.1006 / −0.1006 / 0.5843
β_15 | 1.1635 / 0.1635 / 0.5542 | 1.0577 / 0.0577 / 0.6157
β_20 | 2.2779 / 0.2779 / 0.6479 | 2.0902 / 0.0902 / 0.6168
β_25 | 3.2790 / 0.2790 / 0.6760 | 3.1711 / 0.1711 / 0.6672
β_30 | 3.3969 / 0.3969 / 0.6867 | 3.1939 / 0.1939 / 0.6412

Scenario 2: (B, M) = (30, 13)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD)
T | 22.7787 | 22.0124
θ | 0.6564 / −0.3436 / 0.5263 | 1.0203 / 0.0203 / 0.5297
β_1 | −2.1801 / −0.1801 / 0.5395 | −2.1915 / −0.1915 / 0.4992
β_5 | −2.0213 / −0.0213 / 0.4727 | −2.1508 / −0.1508 / 0.5142
β_10 | −1.1390 / −0.1390 / 0.5202 | −1.0678 / −0.0678 / 0.5140
β_15 | 1.0824 / 0.0824 / 0.5276 | 0.9845 / −0.0155 / 0.5005
β_20 | 2.1354 / 0.1354 / 0.4639 | 2.0252 / 0.0252 / 0.5088
β_25 | 3.1809 / 0.1809 / 0.6421 | 3.1362 / 0.1362 / 0.5233
β_30 | 3.2375 / 0.2375 / 0.5542 | 3.0828 / 0.0828 / 0.5183

Scenario 3: (B, M) = (50, 10)
Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD)
T | 20.9400 | 32.6620
θ | 0.8502 / −0.1498 / 0.3856 | 0.9809 / −0.0191 / 0.3556
β_1 | −2.1046 / −0.1046 / 0.4252 | −2.0688 / −0.0688 / 0.4124
β_5 | −2.1536 / −0.1536 / 0.4224 | −2.1123 / −0.1123 / 0.4065
β_10 | −1.0597 / −0.0597 / 0.3893 | −1.0542 / −0.0542 / 0.3806
β_15 | 1.0692 / 0.0692 / 0.3973 | 1.0067 / 0.0067 / 0.3841
β_20 | 2.1129 / 0.1129 / 0.4162 | 2.0184 / 0.0184 / 0.4005
β_25 | 3.1770 / 0.1770 / 0.4159 | 3.0489 / 0.0489 / 0.4005
β_30 | 3.2102 / 0.2102 / 0.4277 | 3.0832 / 0.0832 / 0.4120
Table 7. The median of relative model errors for gamma frailty model by L_1, MCP, and SCAD penalties with sample size (B, M) = (50, 6) based on 200 realizations in Example 2.

Profile MM method
Penalty | MRME (ϱ = 0.25) | Correct | Incorrect | MRME (ϱ = 0.75) | Correct | Incorrect
L_1 | 0.159 | 45.775 | 0 | 0.163 | 44.39 | 0
MCP (γ = 3) | 0.091 | 46 | 0 | 0.066 | 46 | 0
SCAD (γ = 3.7) | 0.143 | 45.915 | 0 | 0.101 | 45.965 | 0

Non-profile MM method
Penalty | MRME (ϱ = 0.25) | Correct | Incorrect | MRME (ϱ = 0.75) | Correct | Incorrect
L_1 | 0.089 | 45.74 | 0 | 0.188 | 44.225 | 0
MCP (γ = 3) | 0.051 | 46 | 0 | 0.085 | 46 | 0
SCAD (γ = 3.7) | 0.077 | 45.885 | 0 | 0.106 | 45.88 | 0
Table 8. The average values of estimated parameters (MLE), their biases (Bias), and their empirical standard deviations (SD) under MCP and SCAD penalties in Example 2.

Penalty / Par. | Profile MM (MLE / Bias / SD) | Non-Profile MM (MLE / Bias / SD)

MCP (ϱ = 0.25)
θ | 0.4835 / −0.0165 / 0.1120 | 0.4820 / −0.0180 / 0.1398
β_1 | 0.9993 / −0.0007 / 0.0950 | 0.9988 / −0.0012 / 0.1028
β_2 | 3.0090 / 0.0090 / 0.1984 | 2.9856 / −0.0144 / 0.1971
β_49 | 1.9940 / −0.0060 / 0.1513 | 1.9791 / −0.0209 / 0.1512
β_50 | 3.9697 / −0.0303 / 0.2498 | 3.9868 / −0.0132 / 0.2566

SCAD (ϱ = 0.25)
θ | 0.4967 / −0.0033 / 0.1267 | 0.4800 / −0.0200 / 0.1335
β_1 | 1.0133 / 0.0133 / 0.0960 | 0.9864 / −0.0136 / 0.0975
β_2 | 3.0167 / 0.0167 / 0.2023 | 2.9467 / −0.0532 / 0.1644
β_49 | 2.0183 / 0.0183 / 0.1417 | 1.9816 / −0.0184 / 0.1255
β_50 | 4.0315 / 0.0315 / 0.2657 | 3.9300 / −0.0700 / 0.2334

MCP (ϱ = 0.75)
θ | 0.4886 / −0.0114 / 0.1258 | 0.4842 / −0.0158 / 0.1480
β_1 | 0.9774 / −0.0226 / 0.1331 | 1.0071 / 0.0071 / 0.1369
β_2 | 3.0252 / 0.0252 / 0.2020 | 2.9743 / −0.0257 / 0.2570
β_49 | 2.0083 / 0.0083 / 0.1831 | 1.9857 / −0.0143 / 0.2203
β_50 | 4.0134 / 0.0134 / 0.2606 | 4.0026 / 0.0026 / 0.3055

SCAD (ϱ = 0.75)
θ | 0.4869 / −0.0131 / 0.1277 | 0.4839 / −0.0161 / 0.1341
β_1 | 1.0119 / 0.0119 / 0.1364 | 1.0152 / 0.0152 / 0.1560
β_2 | 2.9815 / −0.0185 / 0.2063 | 3.0035 / 0.0035 / 0.2422
β_49 | 2.0012 / 0.0012 / 0.2238 | 1.9984 / −0.0016 / 0.2052
β_50 | 4.0119 / 0.0119 / 0.2633 | 4.0021 / 0.0021 / 0.2855
Table 9. Illustration of survival years and censoring date for some patients from the ADNI dataset.

From CN to MCI
Patient ID | First Observation Date of State CN | Date of Transition to MCI | Last Date of Observation | t_ij (yr) | δ_ij
011_S_0002 | 8 September 2005 | 26 September 2012 | 18 October 2017 | 7.05 | 1
011_S_0021 | 24 October 2005 | - | 27 November 2017 | 12.10 | 0
100_S_0035 | 8 November 2005 | 8 December 2010 | 8 December 2010 | 5.08 | 1
131_S_0123 | 7 February 2006 | 23 February 2012 | 10 February 2016 | 6.05 | 1
127_S_0259 | 28 March 2006 | 9 April 2014 | 22 September 2017 | 8.04 | 1

From MCI to Dementia
Patient ID | First Observation Date of State MCI | Date of Transition to Dementia | Last Date of Observation | t_ij (yr) | δ_ij
011_S_0002 | 26 September 2012 | - | 18 October 2017 | 5.06 | 0
011_S_0021 | - | - | - | - | -
100_S_0035 | 8 December 2010 | - | 8 December 2010 | 0.00 | 0
131_S_0123 | 23 February 2012 | 12 February 2014 | 10 February 2016 | 1.97 | 1
127_S_0259 | 9 April 2014 | 10 April 2015 | 22 September 2017 | 1.00 | 1
Table 10. Training results for Gamma Frailty Model.

Penalty | Profile MM: BIC | No. of Non-Zero β | Non-Profile MM: BIC | No. of Non-Zero β
MCP | 1154.70 | 70 | 1154.75 | 70
SCAD | 1150.59 | 69 | 1150.82 | 69
Table 11. Training results for Inverse Gaussian Model.

Penalty | Profile MM: BIC | No. of Non-Zero β | Non-Profile MM: BIC | No. of Non-Zero β
MCP | 1198.76 | 27 | 1198.82 | 27
SCAD | 1203.97 | 27 | 1204.50 | 27
Table 12. Training results for Log-normal Frailty Model.

Penalty | Profile MM: BIC | No. of Non-Zero β | Non-Profile MM: BIC | No. of Non-Zero β
MCP | 1149.10 | 71 | 1148.86 | 72
SCAD | 1149.10 | 71 | 1148.86 | 71