Article

Exploring Complex Survival Data through Frailty Modeling and Regularization

Xifen Huang, Jinfeng Xu and Yunpeng Zhou
1 School of Mathematics, Yunnan Normal University, Kunming 650092, China
2 Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(21), 4440; https://doi.org/10.3390/math11214440
Submission received: 22 September 2023 / Revised: 22 October 2023 / Accepted: 24 October 2023 / Published: 26 October 2023
(This article belongs to the Special Issue Mathematical and Computational Statistics and Applications)

Abstract

This study addresses the analysis of complex multivariate survival data, where each individual may experience multiple events and a wide range of relevant covariates are available. We propose an advanced modeling approach that extends the classical shared frailty framework to account for within-subject dependence. Our model incorporates a flexible frailty distribution, encompassing well-known distributions, such as gamma, log-normal, and inverse Gaussian. To ensure accurate estimation and effective model selection, we utilize innovative regularization techniques. The proposed methodology exhibits desirable theoretical properties and has been validated through comprehensive simulation studies. Additionally, we apply the approach to real-world data from the Medical Information Mart for Intensive Care (MIMIC-III) dataset, demonstrating its practical utility in analyzing complex survival data structures.

1. Introduction

In recent years, advances in biomedicine, genomics, epidemiology, image processing, and other fields have made high-dimensional data analysis a prominent theme in statistics. A notable example is the MIMIC-III (Medical Information Mart for Intensive Care) study, which aims to identify relevant variables for modeling and predicting two distinct periods: from ICU admission to discharge and from discharge to death. This study has generated high-dimensional, correlated multivariate failure time data, which are commonly analyzed with shared frailty models [1]. Specifically, the Cox model or relative risk model [2] with frailty is employed to incorporate dependence and assess the effects of covariates [3].
However, the observed likelihood of the Cox model with shared frailty often lacks an analytic form, except for the gamma frailty model. Consequently, parameter estimation for general frailty models is computationally challenging. For variable selection in frailty models, Ref. [4] first proposed a theory based on SCAD-penalized regression. Ref. [5] developed an algorithm that uses the Laplace transformation to handle frailty distributions in their general form. Ref. [6] applied a Laplace approximation of the full likelihood and developed an R package to implement their method. However, all of these methods employ the Newton–Raphson method in their algorithms, resulting in a significant computational burden when dealing with high-dimensional covariates. To avoid the high-dimensional matrix inversion required by the Newton–Raphson method, we use the MM (minorize-maximization) principle to obtain the nonparametric maximum likelihood estimates for general frailty models; the objective likelihood under the MM algorithm increases monotonically and has reliable convergence properties when initialized appropriately [7].
In this paper, we present an MM algorithm for the Cox model with general frailties, highlighting its applicability in high-dimensional scenarios. We introduce a regularized estimation method in which our algorithm decomposes the objective function with high-dimensional parameters into separable functions with low-dimensional parameters. This approach integrates seamlessly into the analysis of multivariate failure time data, effectively handling the challenges posed by its high-dimensional nature. LASSO is a commonly used penalty, but it can introduce notable bias into the resulting estimator. Therefore, we employ concave penalties such as SCAD [8] and MCP to conduct variable selection that gives consistent estimates.
The structure of the paper is as follows: In Section 2, we introduce the formulation of the Cox model with general frailties for high-dimensional multivariate failure time data. Section 3 presents our proposed MM algorithm for estimating model parameters. In Section 4, we introduce a regularized estimation method using the profile MM algorithm, leveraging sparse regression assumptions with high-dimensional parameters. We establish the convergence properties in Section 5. To assess the finite sample performances, we conducted simulation studies as described in Section 6. In Section 7, we illustrate the proposed methods, using the aforementioned MIMIC-III dataset. Finally, we provide further discussion in Section 8.

2. Data and Model Formulation

Assume there are $J$ different types of events and $n$ patients in the study. Denote the event time of the $i$-th subject ($i = 1, \ldots, n$) for its $j$-th event type ($j = 1, \ldots, J$) by $T_{ij}$, and the corresponding censoring time by $C_{ij}$. We use $t_{ij}$ to denote the observed time, let $I_{ij}$ be the censoring indicator ($I_{ij} = 0$ for a right-censored observation; $I_{ij} = 1$ otherwise), and let $X_{ij} = (X_{ij1}, \ldots, X_{ijp})$ be the high-dimensional covariates. The observed data consist of $Y_{obs} = \{t_{ij}, I_{ij}, X_{ij}\}$. Under a noninformative censoring assumption, the Cox model with general frailties assumes that, conditional on the subject-specific frailties $\xi_i$, $i = 1, \ldots, n$, the event times $T_{ij}$, $j = 1, \ldots, J$, are independent of one another, and the conditional hazard function of $T_{ij}$ given $\xi_i$ and $X_{ij}$ takes the form
$$\lambda_j(t \mid X_{ij}, \xi_i) = \xi_i\, \lambda_{0j}(t) \exp\{X_{ij}^\top \beta\}, \tag{1}$$
where $\xi_i$, $i = 1, \ldots, n$, are the individual-level frailty variables, independently and identically distributed with density function $f(\xi_i \mid \gamma)$. The baseline hazard of the $j$-th event is denoted by $\lambda_{0j}(\cdot)$, and $\beta$ is a parameter vector. We denote by $\Lambda_0 = (\Lambda_{01}, \ldots, \Lambda_{0J})$ the cumulative baseline hazard functions of all events. The model parameters thus consist of three components, namely $\gamma$, $\beta$, and the nonparametric components $\Lambda_0$; to simplify the wording, we define $\theta = (\gamma, \beta, \Lambda_0)$. The observed-data likelihood function of model (1) can be written as
$$L(\theta \mid Y_{obs}) = \prod_{i=1}^{n} \int_{\mathcal{W}} f(\xi_i \mid \gamma) \prod_{j=1}^{J} \left[\lambda_{0j}(t_{ij})\, \xi_i \exp(X_{ij}^\top \beta)\right]^{I_{ij}} \exp\left\{-\Lambda_{0j}(t_{ij})\, \xi_i \exp(X_{ij}^\top \beta)\right\} d\xi_i, \tag{2}$$
where $\mathcal{W}$ denotes the support of the frailty distribution.
Generally, the Laplace transform of the frailty distribution has no closed form, so the marginal hazard in (2) cannot be written explicitly. In the following, we apply the MM principle to deal with this intractable case and to separate the parameters when the number of model parameters is large, allowing fast and accurate estimation. The MM principle proceeds as follows: the minorization step first constructs a surrogate function $Q(\theta \mid \theta^{(k)})$ that satisfies
$$Q(\theta \mid \theta^{(k)}) \le L(\theta \mid Y_{obs}) \quad \text{for all } \theta, \theta^{(k)} \in \Theta, \qquad \text{and} \qquad Q(\theta^{(k)} \mid \theta^{(k)}) = L(\theta^{(k)} \mid Y_{obs}),$$
where $\theta^{(k)}$ denotes the current estimate of $\hat\theta$ at the $k$-th iteration. The function $Q(\cdot \mid \theta^{(k)})$ always lies below $L(\cdot)$ and is tangent to it at the point $\theta = \theta^{(k)}$. The maximization step then updates $\theta^{(k)}$ to $\theta^{(k+1)}$, the maximizer of the surrogate function $Q(\cdot \mid \theta^{(k)})$ rather than of $L(\cdot)$:
$$L(\theta^{(k+1)} \mid Y_{obs}) \ge Q(\theta^{(k+1)} \mid \theta^{(k)}) \ge Q(\theta^{(k)} \mid \theta^{(k)}) = L(\theta^{(k)} \mid Y_{obs}).$$
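To make the ascent property concrete, the following is a minimal, self-contained sketch of a minorize-maximize loop on a toy problem (finding a sample median by majorizing the absolute value, a standard textbook MM illustration unrelated to the frailty model). The monotone increase of the objective mirrors the inequality chain above; all values and names are illustrative.

```python
import numpy as np

def mm_median(x, theta0=0.0, n_iter=50, eps=1e-8):
    """Toy MM illustration: maximize L(theta) = -sum(|x_i - theta|).
    The surrogate -sum((x_i - theta)^2 / (2*max(|x_i - theta_k|, eps))) + const
    minorizes L (up to a constant) and is maximized in closed form by a weighted mean."""
    theta = theta0
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(x - theta), eps)   # weights built from the current iterate
        theta_new = np.sum(w * x) / np.sum(w)          # maximizer of the surrogate
        # ascent property: the objective never decreases (up to numerical tolerance)
        assert -np.abs(x - theta_new).sum() >= -np.abs(x - theta).sum() - 1e-6
        theta = theta_new
    return theta

x = np.array([1.0, 2.0, 2.5, 7.0, 9.0])
print(mm_median(x))   # converges to the sample median, 2.5
```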

3. Likelihood Function and the Proposed Methodologies

From (2), we then formulate the log-likelihood function as
$$\ell(\theta \mid Y_{obs}) = \sum_{i=1}^{n} \log \int_{\mathcal{W}} \tau_i(\xi_i \mid \theta)\, d\xi_i,$$
where
$$\tau_i(\xi_i \mid \theta) = f(\xi_i \mid \gamma) \prod_{j=1}^{J} \left[\lambda_{0j}(t_{ij})\, \xi_i \exp(X_{ij}^\top \beta)\right]^{I_{ij}} \exp\left\{-\Lambda_{0j}(t_{ij})\, \xi_i \exp(X_{ij}^\top \beta)\right\}. \tag{4}$$
Given the weight function
$$v_i(\xi_i \mid \theta^{(k)}) = \frac{\tau_i(\xi_i \mid \theta^{(k)})}{\int_{\mathcal{W}} \tau_i(\xi_i \mid \theta^{(k)})\, d\xi_i},$$
we can rewrite the objective function as
$$\ell(\theta \mid Y_{obs}) = \sum_{i=1}^{n} \log \int_{\mathcal{W}} \frac{\tau_i(\xi_i \mid \theta)}{v_i(\xi_i \mid \theta^{(k)})} \cdot v_i(\xi_i \mid \theta^{(k)})\, d\xi_i. \tag{5}$$
According to the measure-theoretic form of Jensen's inequality,
$$\Psi\!\left(\int_{\mathcal{X}} u(x)\, g(x)\, dx\right) \ge \int_{\mathcal{X}} \Psi\big(u(x)\big)\, g(x)\, dx, \tag{6}$$
where $\mathcal{X} \subseteq \mathbb{R}$, $\Psi(\cdot)$ is concave, $g(\cdot)$ is a density function on $\mathcal{X}$, and $u(\cdot)$ is an arbitrary function on $\mathcal{X}$. Applying inequality (6) to Equation (5), we let $v_i(\xi_i \mid \theta^{(k)})$ play the role of the density $g(\cdot)$, take $u(\cdot) = \tau_i(\xi_i \mid \theta)/v_i(\xi_i \mid \theta^{(k)})$, and let $\Psi(\cdot)$ be the concave log function. Thus, we have
$$\ell(\theta \mid Y_{obs}) \ge \sum_{i=1}^{n} \int_{\mathcal{W}} \log\!\left\{\frac{\tau_i(\xi_i \mid \theta)}{v_i(\xi_i \mid \theta^{(k)})}\right\} v_i(\xi_i \mid \theta^{(k)})\, d\xi_i = \sum_{i=1}^{n} \int_{\mathcal{W}} \log\!\left\{\tau_i(\xi_i \mid \theta)\right\} v_i(\xi_i \mid \theta^{(k)})\, d\xi_i + C_1,$$
where $C_1$ is a constant. Substituting the expression for $\tau_i(\xi_i \mid \theta)$ from Equation (4), we can construct the following surrogate function for $\ell(\theta \mid Y_{obs})$:
$$Q_1(\theta \mid \theta^{(k)}) = Q_{11}(\gamma \mid \theta^{(k)}) + Q_{12}(\beta, \Lambda_0 \mid \theta^{(k)}),$$
where
$$Q_{11}(\gamma \mid \theta^{(k)}) = \sum_{i=1}^{n} \int_{\mathcal{W}} \log\!\left[f(\xi_i \mid \gamma)\right] v_i(\xi_i \mid \theta^{(k)})\, d\xi_i, \tag{8}$$
which involves the parameter $\gamma$ only, and
$$Q_{12}(\beta, \Lambda_0 \mid \theta^{(k)}) = \sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ I_{ij} \log\big(\lambda_{0j}(t_{ij})\big) + I_{ij} X_{ij}^\top \beta - A_i^{(k)}\, \Lambda_{0j}(t_{ij}) \exp(X_{ij}^\top \beta) \right\}, \tag{9}$$
where $A_i^{(k)} = \int_{\mathcal{W}} \xi_i\, v_i(\xi_i \mid \theta^{(k)})\, d\xi_i$, $i = 1, \ldots, n$. Therefore, $Q_1(\theta \mid \theta^{(k)})$ successfully separates the parameters $\gamma$, $\beta$, and $\Lambda_0$ into (8) and (9), correspondingly. Then, in the corresponding M-step, $\gamma$ is updated by maximizing (8) numerically. However, updating $(\beta, \Lambda_0)$ by maximizing (9) is challenging, owing to the presence of the nonparametric components $\Lambda_0$. To tackle this issue, following [9], we use their profile estimation approach in (9) to profile out $\Lambda_0$ given $\beta$, which yields the estimate of $\Lambda_0$ as
$$d\hat{\Lambda}_{0j}(t_{ij}) = \frac{I_{ij}}{\sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta)}, \qquad j = 1, \ldots, J. \tag{10}$$
By substituting (10) into (9), we further have
$$\begin{aligned}
&\sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ I_{ij} \log\big(\lambda_{0j}(t_{ij})\big) + I_{ij} X_{ij}^\top \beta - A_i^{(k)}\, \Lambda_{0j}(t_{ij}) \exp(X_{ij}^\top \beta) \right\} \\
&\quad = \sum_{i=1}^{n} \sum_{j=1}^{J} \Bigg[ I_{ij} \log\!\left(\frac{I_{ij}}{\sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta)}\right) + I_{ij} X_{ij}^\top \beta \\
&\qquad\qquad - \sum_{s=1}^{n} \frac{I(t_{sj} \le t_{ij})}{\sum_{r=1}^{n} I(t_{rj} \ge t_{sj})\, A_r^{(k)} \exp(X_{rj}^\top \beta)}\, A_i^{(k)} \exp(X_{ij}^\top \beta) \Bigg] \\
&\quad = \sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ I_{ij} X_{ij}^\top \beta - I_{ij} \log \sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta) + I_{ij} \log I_{ij} \right\} \\
&\qquad - \sum_{j=1}^{J} \sum_{s=1}^{n} \frac{\sum_{i=1}^{n} I(t_{sj} \le t_{ij})\, A_i^{(k)} \exp(X_{ij}^\top \beta)}{\sum_{r=1}^{n} I(t_{rj} \ge t_{sj})\, A_r^{(k)} \exp(X_{rj}^\top \beta)} \\
&\quad = \sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ I_{ij} X_{ij}^\top \beta - I_{ij} \log \sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta) \right\} + C_2,
\end{aligned}$$
where $C_2$ is a constant. Therefore,
$$Q_{13}(\beta \mid \theta^{(k)}) = \sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ I_{ij} X_{ij}^\top \beta - I_{ij} \log \sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta) \right\}, \tag{11}$$
which involves the parameter $\beta$ only. Clearly, $Q_{13}(\beta \mid \theta^{(k)})$ has the same form as the log partial likelihood of the Cox model, so a modified Cox regression technique could be applied to obtain the updated estimate of $\beta$; however, that procedure relies on Newton's method, whose matrix inversion is computationally inefficient when the number of covariates is large. As the MM algorithm helps to separate the estimation of parameters, we instead treat (11) as a new objective function and construct a minorizing function that reformulates the high-dimensional parameter estimation problem into a collection of low-dimensional optimizations. Here, we use the hyperplane inequality, which is widely applied in MM algorithms [10]:
$$\log(x) \le \log(x_0) + \frac{x - x_0}{x_0}. \tag{12}$$
In inequality (12), we take
$$x = \sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta), \qquad x_0 = \sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta^{(k)}).$$
Then,
$$\begin{aligned}
Q_{13}(\beta \mid \theta^{(k)}) &\ge \sum_{i=1}^{n} \sum_{j=1}^{J} \Bigg\{ I_{ij} X_{ij}^\top \beta - I_{ij}\Bigg( \log \sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta^{(k)}) \\
&\qquad + \frac{\sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta) - \sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta^{(k)})}{\sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta^{(k)})} \Bigg) \Bigg\} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ I_{ij} X_{ij}^\top \beta - I_{ij}\, \frac{\sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta)}{\sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta^{(k)})} \right\} + C_3,
\end{aligned}$$
with constant $C_3$. Then we have the following surrogate function:
$$Q_{14}(\beta \mid \theta^{(k)}) = \sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ I_{ij} X_{ij}^\top \beta - I_{ij}\, \frac{\sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta)}{\sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta^{(k)})} \right\}.$$
For the finite form of Jensen’s inequality:
Ψ x i · ω i Ψ x i ω i ,
where ω i are positive weights with ω i = 1 and Ψ ( · ) is concave. As in [11], the concave function exp ( · ) can then adopt inequality (14), and a part of Q 14 ( β | θ ( k ) ) can be rewritten as
X r j β = p = 1 q δ p r j [ δ p r j 1 X p r j ( β p β p ( k ) ) + X r j β ( k ) ] ,
where δ p r j = | X p r j | / p = 1 q | X p r j | is the weight in Jensen’s inequality. Thus,
exp ( X r j β ) p = 1 q δ p r j exp [ δ p r j 1 X p r j ( β p β p ( k ) ) + X r j β ( k ) ] .
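As a quick sanity check on inequality (15), the short sketch below verifies numerically, for an arbitrary covariate vector and coefficient pair, that the separable upper bound holds and becomes an equality at $\beta = \beta^{(k)}$. The dimensions and values are arbitrary illustrations, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 8
x = rng.uniform(0.1, 0.5, size=q)          # nonzero covariates so the weights are well defined
beta_k = rng.normal(size=q)                # current iterate beta^(k)
beta = beta_k + rng.normal(scale=0.5, size=q)

delta = np.abs(x) / np.sum(np.abs(x))      # Jensen weights, summing to one
lhs = np.exp(x @ beta)
rhs = np.sum(delta * np.exp((x * (beta - beta_k)) / delta + x @ beta_k))
print(lhs <= rhs + 1e-12)                  # True: the bound in (15) holds
print(np.isclose(np.exp(x @ beta_k),       # equality when beta = beta^(k)
                 np.sum(delta * np.exp(x @ beta_k))))
```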
Then, substituting (15) into $Q_{14}(\beta \mid \theta^{(k)})$, the minorizing function can be obtained as
$$Q_{15}(\beta_1, \ldots, \beta_q \mid \theta^{(k)}) \triangleq \sum_{p=1}^{q} Q_{15p}(\beta_p \mid \theta^{(k)}),$$
where
$$Q_{15p}(\beta_p \mid \theta^{(k)}) = \sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ I_{ij} X_{pij}\, \beta_p - I_{ij}\, \frac{\sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)}\, \delta_{prj} \exp\!\left[\delta_{prj}^{-1} X_{prj}(\beta_p - \beta_p^{(k)}) + X_{rj}^\top \beta^{(k)}\right]}{\sum_{r=1}^{n} I(t_{rj} \ge t_{ij})\, A_r^{(k)} \exp(X_{rj}^\top \beta^{(k)})} \right\}, \tag{17}$$
for $p = 1, \ldots, q$. In the first M-step of the profile MM algorithm, the final minorizing function for the objective log-likelihood $\ell(\theta \mid Y_{obs})$, using the profiled method, is
$$Q_{pro}(\gamma, \beta \mid \theta^{(k)}) = Q_{11}(\gamma \mid \theta^{(k)}) + \sum_{p=1}^{q} Q_{15p}(\beta_p \mid \theta^{(k)}), \tag{18}$$
with $d\Lambda_0$ updated by (10). From (18), maximizing the original objective function is transferred into the maximization of a collection of $q + 1$ univariate functions, since the $\gamma$ in $Q_{11}(\gamma \mid \theta^{(k)})$ is one-dimensional in most cases. Thus, the next M-step is conducted by optimizing $q + 1$ univariate objective functions separately, without the inefficient matrix inversion. Note that two integrals are involved, namely $\int_{\mathcal{W}} \xi_i\, v_i(\xi_i \mid \theta^{(k)})\, d\xi_i$ and $\int_{\mathcal{W}} \log[f(\xi_i \mid \gamma)]\, v_i(\xi_i \mid \theta^{(k)})\, d\xi_i$. While our model is designed to address general frailty forms, these improper integrals may not be computationally tractable in extremely sparse and high-dimensional cases. However, as long as these integrals can be evaluated numerically, our MM algorithm will always converge. We then propose the following estimation procedure.
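When the frailty density has no convenient closed form, the two integrals above can be evaluated subject by subject with one-dimensional numerical quadrature. The sketch below illustrates this for a log-normal frailty; the function name, the argument layout, and the use of scipy.integrate.quad are illustrative assumptions rather than the authors' implementation, and factors that do not involve $\xi_i$ (such as $\lambda_{0j}(t_{ij})$) are omitted because they cancel in the weight-function ratio.

```python
import numpy as np
from scipy import integrate

def lognormal_logpdf(xi, sigma):
    # log-density of a log-normal(0, sigma^2) frailty at xi > 0 (sigma is an assumed scale)
    return -np.log(xi * sigma * np.sqrt(2.0 * np.pi)) - (np.log(xi)) ** 2 / (2.0 * sigma ** 2)

def frailty_moments(d_i, cumhaz_i, eta_i, sigma=0.5):
    """For one subject, approximate A_i = E[xi_i] and E[log f(xi_i | gamma)] under the
    weight function v_i by quadrature.  d_i, cumhaz_i, eta_i hold the event indicators
    I_{ij}, the current Lambda_{0j}(t_{ij}), and exp(X_{ij}' beta) across the J events."""
    def tau(xi):
        # tau_i(xi | theta), up to factors that do not depend on xi
        loglik = np.sum(d_i * np.log(xi * eta_i) - cumhaz_i * xi * eta_i)
        return np.exp(lognormal_logpdf(xi, sigma) + loglik)

    norm, _ = integrate.quad(tau, 0.0, np.inf)
    a_i, _ = integrate.quad(lambda xi: xi * tau(xi), 0.0, np.inf)
    e_logf, _ = integrate.quad(lambda xi: lognormal_logpdf(xi, sigma) * tau(xi), 0.0, np.inf)
    return a_i / norm, e_logf / norm

# toy call with J = 2 event types, both observed
print(frailty_moments(d_i=np.array([1, 1]),
                      cumhaz_i=np.array([0.8, 1.2]),
                      eta_i=np.array([1.5, 0.9])))
```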

4. The Algorithms

The parameter-separated surrogate functions proposed in Section 3 can cope well with sparsity-inducing penalties such as SCAD and MCP. Therefore, in this section, we propose a regularized estimation method built on the MM principle of Section 3. Many variable selection penalties are special cases of the general form given in [8], and the likelihood function incorporating the penalty term is written as
$$\ell_P(\theta \mid Y_{obs}) = \ell(\gamma, \beta, \Lambda_0 \mid Y_{obs}) - n \sum_{p=1}^{q} P(|\beta_p|, \lambda), \tag{19}$$
where $\ell(\gamma, \beta, \Lambda_0 \mid Y_{obs})$ is the log-likelihood discussed in the previous section, with $q$-dimensional $\beta$, and $\lambda$ is a non-negative tuning parameter (which can also be set as $\lambda_p$ in more general cases). Variable selection is realized by shrinking some of the coefficients to zero using a given penalty. With general frailties, the computation of the MLEs is rather complicated, as the parameters involve three parts, $\gamma$, $\beta$, and $\Lambda_0$, and it is even more challenging when the number of parameters is high-dimensional, which is indeed our case. As discussed in the previous section, our proposed profile MM algorithm separates all these parameters, which leads to efficient and accurate estimation. This property meshes well with the different kinds of regularization penalties in (19), producing efficient and accurate sparse estimation. Using the same profile MM strategy as in (18), since $Q_{pro}(\gamma, \beta \mid \theta^{(k)})$ is the surrogate function for $\ell(\gamma, \beta, \Lambda_0 \mid Y_{obs})$, the corresponding minorization function for $\ell_P(\theta \mid Y_{obs})$ can be expressed as
$$Q_{pro}(\gamma, \beta \mid \theta^{(k)}) - n \sum_{p=1}^{q} P(|\beta_p|, \lambda),$$
where $P(\cdot, \lambda)$ is a concave function that is nondecreasing and piecewise differentiable on $(0, \infty)$. Ref. [12] proposed an approximation approach for the penalty term, using quadratic functions:
$$P(|\beta_p|, \lambda) \le P(|\beta_p^{(k)}|, \lambda) + \frac{\big[\beta_p^2 - (\beta_p^{(k)})^2\big]\, P'(|\beta_p^{(k)}|+, \lambda)}{2\,|\beta_p^{(k)}|}. \tag{21}$$
Combining (21) with (18):
$$\begin{aligned}
Q_{pro}(\gamma, \beta \mid \theta^{(k)}) - n \sum_{p=1}^{q} P(|\beta_p|, \lambda)
&\ge Q_{pro}(\gamma, \beta \mid \theta^{(k)}) - n \sum_{p=1}^{q} \left[ P(|\beta_p^{(k)}|, \lambda) + \frac{\beta_p^2 \cdot P'(|\beta_p^{(k)}|+, \lambda)}{2\,|\beta_p^{(k)}|} \right] \\
&= Q_{11}(\gamma \mid \theta^{(k)}) + \sum_{p=1}^{q} \left[ Q_{15p}(\beta_p \mid \theta^{(k)}) - \frac{n\, \beta_p^2 \cdot P'(|\beta_p^{(k)}|+, \lambda)}{2\,|\beta_p^{(k)}|} \right] + C_4.
\end{aligned}$$
Therefore, the final surrogate function based on the MM algorithm for the penalized log-likelihood (19) is written as
$$Q_{pro}^{P}(\gamma, \beta \mid \theta^{(k)}, \lambda) = Q_{11}(\gamma \mid \theta^{(k)}) + \sum_{p=1}^{q} \left[ Q_{15p}(\beta_p \mid \theta^{(k)}) - \frac{n\, \beta_p^2 \cdot P'(|\beta_p^{(k)}|+, \lambda)}{2\,|\beta_p^{(k)}|} \right]. \tag{22}$$
Equation (22) decomposes the original objective (19) into a sum of univariate functions, which is more computationally efficient to maximize. Moreover, off-the-shelf accelerators can be applied to improve the efficiency of the optimization. We then propose an alternative estimation procedure.
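Each coordinate update implied by (22) is a one-dimensional concave maximization, so a generic scalar optimizer suffices. The sketch below is a minimal illustration of one such update, assuming a user-supplied callable for $Q_{15p}$; the helper names and the toy surrogate in the usage example are hypothetical, and a LASSO-type constant derivative stands in for $P'$ (the SCAD or MCP derivatives slot in the same way).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def update_beta_p(q15p, beta_p_old, pen_deriv, lam, n, eps=1e-8):
    """One penalised univariate M-step for coordinate p: maximise
    Q_{15p}(beta_p) - n * beta_p^2 * P'(|beta_p_old|+, lam) / (2 |beta_p_old|)."""
    w = n * pen_deriv(abs(beta_p_old), lam) / (2.0 * max(abs(beta_p_old), eps))
    objective = lambda b: -(q15p(b) - w * b ** 2)          # negate: we minimise
    res = minimize_scalar(objective, bracket=(beta_p_old - 1.0, beta_p_old + 1.0))
    return res.x

# toy usage: a concave stand-in for Q_{15p} and a LASSO-type derivative P' = lam
toy_q15p = lambda b: 2.0 * b - 3.0 * np.exp(0.5 * b)
lasso_deriv = lambda t, lam: lam
print(update_beta_p(toy_q15p, beta_p_old=0.3, pen_deriv=lasso_deriv, lam=0.1, n=300))
```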
In the literature, there are different criteria for the selection of the tuning parameter $\lambda$, such as the BIC (Bayesian information criterion [13]) and the GCV (generalized cross-validation [14]). In this paper, a BIC-type criterion is applied, which is defined by
$$\mathrm{BIC}_\lambda = -2\,\ell(\hat{\theta}) + G_n\, (\hat{S} + 1)\, \log(n),$$
where $G_n = \max\{1, \log[\log(q + 1)]\}$ with $q$-dimensional $\beta$, and $\hat{S}$ is the degrees of freedom, i.e., the number of nonzero estimates in $\hat\beta$. To determine the optimal $\lambda$, we use a method similar to the one commonly used in the R package "glmnet" [15]: we first determine an appropriate range of $\lambda$ by searching over $\mathbb{R}^{+}$ using the BIC criterion, and then select the optimal $\lambda$ by a grid search within this range.
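A minimal sketch of the BIC-type criterion and the grid search over $\lambda$ is given below; `fit_fn` is a hypothetical wrapper that runs the regularized profile MM algorithm for a fixed $\lambda$ and returns the maximized log-likelihood and the coefficient estimates.

```python
import numpy as np

def bic_lambda(loglik_hat, n_nonzero, n, q):
    """BIC-type score: -2 * loglik + G_n * (S_hat + 1) * log(n),
    with G_n = max(1, log(log(q + 1))) and S_hat the number of nonzero betas."""
    g_n = max(1.0, np.log(np.log(q + 1.0)))
    return -2.0 * loglik_hat + g_n * (n_nonzero + 1.0) * np.log(n)

def select_lambda(fit_fn, lambdas, n, q):
    """Fit the penalised model over a grid of lambda values and return the value
    with the smallest BIC together with the full score path."""
    scores = []
    for lam in lambdas:
        loglik, beta_hat = fit_fn(lam)                     # hypothetical model fit
        scores.append(bic_lambda(loglik, int(np.sum(beta_hat != 0)), n, q))
    best = int(np.argmin(scores))
    return lambdas[best], scores
```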

5. Theoretical Properties

The convergence properties of both the profile MM algorithm and the regularized MM algorithm are established in this section. Let $Q(\theta \mid \theta^{(k)})$ be the minorizing function of the original objective $\ell(\cdot \mid Y_{obs})$, where $\theta$ denotes the parameters and $\theta^{(k)}$ is the estimate at the $k$-th iteration. The general convergence of the MM algorithm is provided by [16], as stated in Lemma 1 below. We denote by $M(\theta)$ the maximizer of $Q(\cdot \mid \theta)$, and impose the following regularity conditions:
I. The parameter set $S$ of $\theta$ is an open subset of $\mathbb{R}^d$.
II. The objective function $\ell(\cdot \mid Y_{obs})$ is continuously differentiable.
III. The set $S_c = \{\theta \in S : \ell(\theta \mid Y_{obs}) \ge c\}$ is compact in $\mathbb{R}^d$.
IV. The surrogate function $Q(\theta \mid \theta^{(k)})$ is continuously differentiable in $\theta$ and continuous in $\theta^{(k)}$.
V. The stationary points of the objective function $\ell(\cdot \mid Y_{obs})$ are isolated.
VI. The surrogate function $Q(\cdot \mid \theta^{(k)})$ has a unique global maximum.
Lemma 1.
Let $\{\theta^{(k)}, k \in \mathbb{N}\}$ denote a sequence generated by the MM algorithm:
(i) $M(\theta)$ is continuous at $\theta^{(k)}$ if VI is satisfied.
(ii) If I–VI are satisfied, then for any initial value $\theta^{(0)}$, $\theta^{(k)}$ tends to a stationary point $\theta^*$ as $k \to \infty$. Moreover, $M(\theta^*) = \theta^*$, and the likelihood sequence $\ell(\theta^{(k)} \mid Y_{obs})$ strictly increases to $\ell(\theta^* \mid Y_{obs})$ if $\theta^{(k)} \ne \theta^*$ for all $k$.
Therefore, based on this Lemma, given Condition 1 below, we have the convergence of our MM algorithm.
Condition 1.
Conditions for the convergence of the profile MM algorithm:
(a) $\ell(\cdot \mid Y_{obs})$ is continuously differentiable.
(b) The parameter sets of $\gamma$ and $\beta$ satisfying $\ell(\theta \mid Y_{obs}) \ge c$ are compact.
(c) The stationary points of $\ell(\theta \mid Y_{obs})$ are isolated.
Theorem 1.
If Condition 1 holds, then for any initial value $\theta^{(0)} = \{\gamma^{(0)}, \beta^{(0)}, \Lambda_0^{(0)}\}$, the profile MM algorithm (Algorithm 1) converges to $\theta^*$.
Algorithm 1 Estimating Procedure.
S1. Provide initial values for the parameters $\gamma$, $\beta$, and $\Lambda_0$.
S2. Update the estimate of the parameter $\gamma$, using (8).
S3. Update the estimates of the covariate parameters $\beta_p$, using (17), for $p = 1, \ldots, q$.
S4. Compute the estimate of $\Lambda_{0j}(t_{ij})$, using (10) with the estimate of $\beta$ from S3.
S5. Repeat S2 to S4 until convergence.
Proof. 
$$\ell(\theta \mid Y_{obs}) = \sum_{i=1}^{n} \log \int_{\mathcal{W}} \frac{\tau_i(\xi_i \mid \theta)}{v_i(\xi_i \mid \theta^{(k)})} \cdot v_i(\xi_i \mid \theta^{(k)})\, d\xi_i \ge \sum_{i=1}^{n} \int_{\mathcal{W}} \log\!\left\{\frac{\tau_i(\xi_i \mid \theta)}{v_i(\xi_i \mid \theta^{(k)})}\right\} v_i(\xi_i \mid \theta^{(k)})\, d\xi_i = Q_1(\theta \mid \theta^{(k)}).$$
Therefore, the initial minorizing function $Q_1(\theta \mid \theta^{(k)})$ satisfies
$$\ell(\theta \mid Y_{obs}) \ge Q_1(\theta \mid \theta^{(k)}), \qquad \ell(\theta^{(k)} \mid Y_{obs}) = Q_1(\theta^{(k)} \mid \theta^{(k)}).$$
Then, after profiling out $\Lambda_0$,
$$Q_{pro}(\gamma, \beta \mid \theta^{(k)}) = Q_{11}(\gamma \mid \theta^{(k)}) + \sum_{p=1}^{q} Q_{15p}(\beta_p \mid \theta^{(k)})$$
from (18). $Q_{11}(\gamma \mid \theta^{(k)})$ has a unique global maximum, and $Q_{15p}(\beta_p \mid \theta^{(k)})$ is a unimodal function with a unique global maximum, which verifies Condition VI. Condition I follows easily from the form of $\ell(\cdot \mid Y_{obs})$. Condition II follows from Condition 1(a), and hence IV is also satisfied. Condition V follows from Condition 1(c). It remains to verify Condition III. It follows from the continuity of $\ell(\theta \mid Y_{obs})$ that $S_c$ is closed. If $d\Lambda_{0j}(t_{ij})$ is unbounded, then $\ell(\theta \mid Y_{obs}) \to -\infty$; conversely, $d\Lambda_{0j}(t_{ij})$ is bounded when $\ell(\theta \mid Y_{obs}) \ge c$. Combined with Condition 1(b), Condition III is satisfied. Thus, by Lemma 1, the profile MM algorithm is convergent.    □
Theorem 2.
If Condition 1 holds, then for any initial value $\theta^{(0)} = \{\gamma^{(0)}, \beta^{(0)}, \Lambda_0^{(0)}\}$, the regularized profile MM algorithm (Algorithm 2) converges to $\theta^*$.
Algorithm 2 An alternative method.
S1. Provide initial values for the parameters $\gamma$, $\beta$, and $\Lambda_0$.
S2. Update the estimate of the parameter $\gamma$, using (8).
S3. Under the profile MM method, update $\beta$ by maximizing
$$\sum_{p=1}^{q} \left[ Q_{15p}(\beta_p \mid \theta^{(k)}) - \frac{n\, \beta_p^2 \cdot P'(|\beta_p^{(k)}|+, \lambda)}{2\,|\beta_p^{(k)}|} \right].$$
S4. Compute the estimate of $\Lambda_{0j}(t_{ij})$, using (10) with the estimate of $\beta$ from S3.
S5. Repeat S2 to S4 until convergence.
Proof. 
Note that, after profiling out $\Lambda_0$, the surrogate function consists of $Q_{11}(\gamma \mid \theta^{(k)})$ and $Q_{15p}(\beta_p \mid \theta^{(k)}) - \frac{n\, \beta_p^2 \cdot P'(|\beta_p^{(k)}|+, \lambda)}{2\,|\beta_p^{(k)}|}$, each with a unique global maximum, which verifies Condition VI. The other conditions, I–V, are verified as in the proof of Theorem 1. Thus, by Lemma 1, the regularized profile MM algorithm is convergent. □

6. Simulation Study

Example 1.
We conducted simulations for the frailty models stated below:
$$\lambda_j(t \mid X_{ij}, \xi_i) = \xi_i\, \lambda_{0j}(t) \exp\{X_{ij}^\top \beta\}, \qquad \xi_i \sim \begin{cases} \text{log-normal}(0, \gamma), & \gamma = 1/4, \\ \text{inverse Gaussian}(\gamma, \gamma^2), & \gamma = 1, \\ \text{gamma}(1/\gamma, 1/\gamma), & \gamma = 2, \end{cases}$$
with two events, $J = 2$, a sample size $n = 300$, and $\lambda_{01}(t) = 3$, $\lambda_{02}(t) = 5/(1 + 5t)$. We set the true value of $\beta$ to $(2 \cdot \mathbf{1}_6, 1 \cdot \mathbf{1}_6, 1 \cdot \mathbf{1}_6, 2 \cdot \mathbf{1}_6, 3 \cdot \mathbf{1}_6)$, where $\mathbf{1}_6$ denotes a vector of six ones, giving dimension $q = 30$. The covariates $X_i$ were generated independently from a uniform distribution on $(0, 0.5)$. The generation process ensured that the censoring rate fell between $25\%$ and $35\%$.
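A compact sketch of this data-generating process for the gamma-frailty case is given below. Event times are drawn by inverting the cumulative baseline hazards $\Lambda_{01}(t) = 3t$ and $\Lambda_{02}(t) = \log(1 + 5t)$; the uniform censoring mechanism, its scale multiplier, and the sign pattern of the coefficient blocks are our own illustrative assumptions (the paper only states the target censoring range).

```python
import numpy as np

def simulate_example1(n=300, q=30, gam=2.0, rng=np.random.default_rng(1)):
    """Sketch of Example 1 with gamma(1/gam, 1/gam) frailties (mean 1, variance gam).
    The censoring scale (4.5 x the median event time) is an illustrative assumption;
    adjust it to hit the 25-35% censoring range reported in the paper."""
    beta = np.repeat([2.0, 1.0, 1.0, 2.0, 3.0], 6)       # five blocks of six entries, q = 30
    X = rng.uniform(0.0, 0.5, size=(n, q))
    xi = rng.gamma(shape=1.0 / gam, scale=gam, size=n)
    eta = xi * np.exp(X @ beta)                          # xi_i * exp(X_i' beta)
    e1 = rng.exponential(1.0, size=n) / eta              # solve Lambda_{0j}(T) = E / eta
    e2 = rng.exponential(1.0, size=n) / eta
    T = np.column_stack([e1 / 3.0,                       # Lambda_{01}(t) = 3 t
                         (np.exp(e2) - 1.0) / 5.0])      # Lambda_{02}(t) = log(1 + 5 t)
    C = rng.uniform(0.0, 4.5 * np.median(T, axis=0), size=(n, 2))
    t_obs = np.minimum(T, C)
    delta = (T <= C).astype(int)                         # I_{ij}: 1 = event observed
    return t_obs, delta, X

t_obs, delta, X = simulate_example1()
print("realized censoring proportion:", 1.0 - delta.mean())
```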
We tested the performance of our proposed profile MM algorithm on the three frailty models given above. Regarding computation, our algorithm separates the estimation of all parameters into univariate objectives, making it easily adaptable to off-the-shelf accelerators. Using simple off-the-shelf accelerators [17], we can speed up our MM algorithms; as described in [17], several types of accelerators based on quasi-Newton approximation are available, and we adopted the squared iterative (SqS1) approach. The simulation results are presented in Table 1, where (MLE) denotes the average estimates of the regression and frailty parameters, (Bias) denotes their biases, and (SD) denotes their estimated standard deviations, computed as the empirical standard deviation (for $\beta_i$, the estimated standard deviation was $\sqrt{\tfrac{1}{200}\sum_{k=1}^{200}(\hat\beta_i^{(k)} - \bar\beta_i)^2}$, where $\hat\beta_i^{(k)}$ is the estimate of $\beta_i$ in the $k$-th replication). (K) is the number of iterations until convergence, (T) is the total run time in seconds, and (L) is the final value of the objective function. The results indicate that our algorithms exhibited fast convergence and performed well across all three frailty models. Our algorithm also demonstrated good estimation accuracy, as evidenced by the nearly unbiased results and the high significance of the estimates of $\beta$, as observed from their estimated standard deviations in all cases.
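For completeness, the following is a minimal sketch of a squared-extrapolation acceleration step in the spirit of the SqS schemes discussed in [17]; `mm_update` stands for one full profile MM sweep and is assumed supplied, and the fallback to the plain double update is a common monotonicity safeguard rather than a detail taken from the paper.

```python
import numpy as np

def squared_step(theta, mm_update, objective):
    """One squared-extrapolation step: take two MM updates, extrapolate with an
    SqS1-type step length, and fall back to the plain double update whenever the
    extrapolated point does not improve the objective."""
    t1 = mm_update(theta)
    t2 = mm_update(t1)
    r = t1 - theta
    v = (t2 - t1) - r
    denom = np.dot(v, v)
    if denom < 1e-12:                      # (near) fixed point: no extrapolation needed
        return t2
    alpha = np.dot(r, v) / denom           # SqS1-type step length
    candidate = theta - 2.0 * alpha * r + alpha ** 2 * v
    if objective(candidate) >= objective(t2):
        return candidate
    return t2
```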
Example 2.
In this example, we conducted a simulation for multivariate frailty models:
$$\lambda_j(t \mid X_{ij}, \xi_i) = \xi_i\, \lambda_{0j}(t) \exp\{X_{ij}^\top \beta\}, \qquad \xi_i \sim \begin{cases} \text{gamma}(1/\gamma, 1/\gamma), & \gamma = 2, \\ \text{log-normal}(0, \gamma), & \gamma = 0.5, \\ \text{inverse Gaussian}(\gamma, \gamma^2), & \gamma = 1, \end{cases}$$
for $i = 1, \ldots, 400$, $j = 1, 2$, with $\lambda_{01}(t) = 3$ and $\lambda_{02}(t) = 5/(1 + 5t)$. Here, we set the true coefficient vector $\beta$ to $(2, 3, 4, \mathbf{0}_{27})$ with dimension $q = 30$, where $\mathbf{0}_{27}$ denotes 27 zero entries. We generated the covariates $X = (X_1, \ldots, X_q)$ from a multivariate normal distribution with mean zero and covariance $\Sigma = \{\varrho^{|r-s|}\}_{rs}$, $r, s = 1, \ldots, q$, of first-order autoregressive form, where $\varrho = 0.2$. As in Example 1, observations were censored, with the censoring proportion set between $5\%$ and $15\%$.
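The AR(1)-correlated covariates are straightforward to generate; a minimal sketch (the function name and seed are illustrative):

```python
import numpy as np

def ar1_covariates(n=400, q=30, rho=0.2, rng=np.random.default_rng(2)):
    """Multivariate normal covariates with mean zero and AR(1) covariance
    Sigma_{rs} = rho^{|r - s|}, as in Example 2."""
    idx = np.arange(q)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    return rng.multivariate_normal(np.zeros(q), sigma, size=n)

X = ar1_covariates()
print(X.shape)   # (400, 30)
```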
In this simulation study, we tested the correctness of the variable selection of our proposed regularized profile MM method in a high-dimensional regression model, using the MCP and SCAD penalties under the sparsity assumption in (19). The first derivative of the SCAD penalty is $P'(|\beta|, \lambda) = \lambda \left\{ I(|\beta| \le \lambda) + \frac{(a\lambda - |\beta|)_+}{(a - 1)\lambda} I(|\beta| > \lambda) \right\}$, and that of the MCP is $P'(|\beta|, \lambda) = \operatorname{sign}(\beta)\left(\lambda - \frac{|\beta|}{a}\right)_+$. Following [4,18], we took $a = 3.7$ for SCAD and $a = 3$ for MCP. To assess the performance of the model selection method, we conducted 200 replications. Based on these replications, we calculated the probability of obtaining the true model, i.e., the proportion of runs in which all covariates with nonzero parameters were estimated to have nonzero values while covariates with zero parameters were not selected. This measure evaluates how well the method selects the model that accurately represents the underlying data. We also counted, in each simulation run, the number of correctly identified zero coefficients and the number of nonzero parameters estimated to be zero; the averages are shown in Table 2. The column titled "Correct" denotes the average count of true zero coefficients with zero estimated values, and the column titled "Incorrect" denotes the average count of parameters whose true value is not zero but whose estimate is zero. We can observe that the method consistently provided correct variable selection results across the three frailty models. Furthermore, we report the average parameter estimates, their biases, and the corresponding standard deviations over the 200 replications in Table 3. We observe that our proposed profile MM algorithm effectively handled penalties such as MCP and SCAD, producing accurate estimation results for the various frailties. In particular, the inverse Gaussian frailty model with the SCAD penalty demonstrated excellent performance. Moreover, the estimates from the SCAD penalty were consistently more accurate than those from the MCP penalty. The results in Table 2 and Table 3 highlight the efficacy of our approach in dealing with different penalty functions and its ability to yield reliable estimation results for various types of frailty models.
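The two penalty derivatives quoted above translate directly into code. A small sketch follows (vectorized over |β|; the sign factor of the MCP derivative is dropped because the algorithm evaluates $P'$ at $|\beta_p^{(k)}|$):

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """SCAD derivative P'(t, lam) evaluated at t = |beta| >= 0."""
    t = np.abs(np.asarray(t, dtype=float))
    return lam * np.where(t <= lam, 1.0,
                          np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam))

def mcp_deriv(t, lam, a=3.0):
    """MCP derivative P'(t, lam) evaluated at t = |beta| >= 0."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.maximum(lam - t / a, 0.0)

print(scad_deriv([0.05, 0.2, 1.0], lam=0.1), mcp_deriv([0.05, 0.2, 1.0], lam=0.1))
```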

7. Real Data Application

MIMIC-III [19] is a dataset that includes information about patients who have been admitted to a critical care unit for various diseases. In this study, we specifically focused on a subset of 8516 patients who were admitted due to respiratory disease. The dataset included 82 covariates, which consisted of clinical variables such as mean blood pressure, heart rate, height, and other features, like whether the patient was transferred from another hospital. Two events were considered in our analysis. The first event was the time from admission to discharge from the ICU, and the second event was the time from discharge to death.
Three frailty models, namely gamma, log-normal, and inverse Gaussian, were fitted using the regularized profile MM estimation method with both the MCP and SCAD penalties. The results are given in Table 4. The results obtained using the MCP and SCAD penalties were similar, as both selected the same set of significant variables under the BIC criterion. For both the log-normal and the inverse Gaussian models, a total of 11 significant covariates were selected. These covariates were primarily clinical variables, as indicated by Table 5 and Table 6. Among the 11 clinical variables considered significant, more than half were numerical variables, such as height; these numerical variables were considered to be more relevant to the two events being studied, whereas other types of variables, such as categorical variables, may have had a less direct relationship with the events. In addition to the clinical variables, patients with booked admissions (ELECTIVE) showed a significant positive effect on the two events, suggesting that patients with planned admissions may have better outcomes than those admitted in emergency situations. On the other hand, variables related to patients' personal information, such as marital status, had smaller effects on the two events, indicating that these personal factors may have had less influence on the outcomes being studied. For the gamma frailty model, as presented in Table 7, more significant covariates were found; for example, ethnicity was considered significant. It is reasonable that different frailty models generate different estimates. It is worth noting that the gamma frailty model selected too many sparse features, including ethnicity, which may not have been preferable in this particular case. However, regardless of the choice of frailty, seven key features were consistently selected by all models. This suggests that our model can consistently select the features that play a vital role in reflecting the length of a patient's stay in the ICU.

8. Discussion

We have developed an innovative algorithm for parameter estimation in the analysis of complex multivariate failure time data with high-dimensional covariates. This type of data presents challenges due to its intricate nature and the presence of multiple correlated survival outcomes. Estimating frailty models, which incorporate random effects to capture unobserved heterogeneity, typically requires nonparametric maximum likelihood estimation because of the unknown baseline hazard function. Most existing research applies the gamma frailty model because it provides an explicit formula for parameter estimation in each iteration, reducing the computational burden. By contrast, our method has the advantage of being applicable to general frailty models and providing efficient estimation results, even in high-dimensional cases, through the incorporation of regularization methods.
Our proposed algorithm efficiently addresses these challenges by accurately estimating the high-dimensional parameters. Our method avoids the Laplace approximation and the traditional Newton–Raphson method, which can result in inaccurate and inefficient estimation when dealing with high-dimensional data. A significant contribution of our method is the utilization of a decomposition approach, which splits the minorizing function into a collection of univariate functions. This approach improves computational efficiency and ensures robust parameter estimation. In high-dimensional cases, the traditional EM algorithm for parameter estimation of frailty models can lead to significant computational costs. In practical applications involving various clinical and genetic covariates, our algorithm enables the estimation procedure to be completed within a reasonable timeframe, thus facilitating effective identification of key features.
Importantly, our algorithm can be applied to various settings and complexities of multivariate survival data without relying on specific regularization techniques. In addition to the multiple event data modeled in this article, our algorithm can be further extended, to handle data with different structures of correlation, such as recurrent event data and clustered survival data with general frailties. The package “frailtyMMpen” can be downloaded from CRAN, providing convenient support for setting up various types of survival data and regularization methods in the model. By leveraging this algorithm, researchers and practitioners can effectively analyze and gain insights from real-world applications of multivariate survival data. Understanding the relationships between covariates and survival outcomes is crucial for informed decision-making and predictive modeling in these scenarios.

Author Contributions

Data curation, X.H. and Y.Z.; Formal analysis, Y.Z. and X.H.; Investigation, X.H. and Y.Z.; Methodology, X.H. and Y.Z.; Project administration, J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Clayton, D.G. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 1978, 65, 141–151. [Google Scholar] [CrossRef]
  2. Cox, D.R. Regression models and life-tables (with discussion). J. R. Stat. Soc. B 1972, 34, 187–220. [Google Scholar]
  3. Andersen, P.K.; Klein, J.P.; Knudsen, K.M.; Tabanera y Palacios, R. Estimation of variance in Cox’s regression model with shared gamma frailties. Biometrics 1997, 53, 1475–1484. [Google Scholar] [CrossRef] [PubMed]
  4. Fan, J.; Li, R. Variable selection for Cox’s proportional hazards model and frailty model. Ann. Stat. 2002, 30, 74–99. [Google Scholar] [CrossRef]
  5. Androulakis, E.; Koukouvinos, C.; Vonta, F. Estimation and variable selection via frailty models with penalized likelihood. Stat. Med. 2012, 31, 2223–2239. [Google Scholar] [CrossRef] [PubMed]
  6. Groll, A.; Hastie, T.; Tutz, G. Selection of effects in Cox frailty models by regularization methods. Biometrics 2017, 73, 846–856. [Google Scholar] [CrossRef] [PubMed]
  7. Becker, M.P.; Yang, I.; Lange, K. EM algorithms without missing data. Stat. Methods Med. Res. 1997, 6, 38–54. [Google Scholar] [CrossRef] [PubMed]
  8. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  9. Johansen, S. An extension of Cox’s regression model. Int. Stat. Rev. 1983, 51, 165–174. [Google Scholar] [CrossRef]
  10. Hunter, D.R.; Lange, K. A tutorial on MM algorithms. Am. Stat. 2004, 58, 30–37. [Google Scholar] [CrossRef]
  11. Ding, J.; Tian, G.L.; Yuen, K.C. A new MM algorithm for constrained estimation in the proportional hazards model. Comput. Stat. Data Anal. 2015, 84, 135–151. [Google Scholar] [CrossRef]
  12. Hunter, D.R.; Li, R. Variable selection using MM algorithms. Ann. Stat. 2005, 33, 1617. [Google Scholar] [CrossRef] [PubMed]
  13. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  14. Craven, P.; Wahba, G. Smoothing noisy data with spline functions. Numer. Math. 1978, 31, 377–403. [Google Scholar] [CrossRef]
  15. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1–22. [Google Scholar] [CrossRef] [PubMed]
  16. Vaida, F. Parameter convergence for EM and MM algorithms. Stat. Sin. 2005, 15, 831–840. [Google Scholar]
  17. Zhou, H.; Alexander, D.; Lange, K. A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat. Comput. 2011, 21, 261–273. [Google Scholar] [CrossRef] [PubMed]
  18. Ma, S.; Huang, J. A concave pairwise fusion approach to subgroup analysis. J. Am. Stat. Assoc. 2017, 112, 410–423. [Google Scholar] [CrossRef]
  19. Johnson, A.E.; Pollard, T.J.; Shen, L.; Lehman, L.w.H.; Feng, M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Anthony Celi, L.; Mark, R.G. MIMIC-III, a freely accessible critical care database. Sci. Data 2016, 3, 160035. [Google Scholar] [CrossRef]
Table 1. Simulation results of Example 1.

             Log-Normal Frailty            Inverse Gaussian Frailty      Gamma Frailty
K            31.2780                       162.1980                      64.7648
T            4.7591                        12.6620                       5.1203
L            2702.3917                     2586.3026                     2696.2794
             MLE      Bias     SD          MLE      Bias     SD          MLE      Bias     SD
γ            0.2729   0.0229   0.4236      0.8502   0.1498   0.3856      2.1017   0.1017   0.4097
β1           2.0917   0.0917   0.3814      2.1046   0.1046   0.4252      2.0687   0.0687   0.3844
β5           2.0692   0.0692   0.4076      2.1536   0.1536   0.4224      2.0894   0.0894   0.3818
β10          1.0320   0.0320   0.3822      1.0597   0.0597   0.3893      1.0610   0.0610   0.4010
β15          1.0606   0.0606   0.3928      1.0692   0.0692   0.3973      1.0348   0.0348   0.4015
β20          2.0530   0.0530   0.3851      2.1129   0.1129   0.4162      2.0891   0.0891   0.3830
β25          3.1770   0.1770   0.4159      3.1770   0.1770   0.4159      3.1120   0.1120   0.3890
β30          3.1071   0.1071   0.3849      3.2102   0.2102   0.4277      3.1410   0.1410   0.4070
Table 2. Simulation results based on three frailty models using two types of penalties. The sample size was 400, and 200 replications were conducted in Example 2.

Frailty             Sparsity Penalty    P (Selecting the True Model)    Correct Zeros    Incorrect Zeros
Gamma               MCP (a = 3)         1                               27               0
Gamma               SCAD (a = 3.7)      1                               27               0
Log-Normal          MCP (a = 3)         1                               27               0
Log-Normal          SCAD (a = 3.7)      1                               27               0
Inverse Gaussian    MCP (a = 3)         1                               27               0
Inverse Gaussian    SCAD (a = 3.7)      1                               27               0
Table 3. The estimated parameters’ MLE, Bias, and SD (standard error). The sample size equaled 400 and the replications number was 200 in Example 2.

                                 MCP                            SCAD
Frailty             Par.         MLE      Bias     SD           MLE      Bias     SD
Gamma               γ            2.1012   0.1012   0.3106       2.0678   0.0687   0.3223
                    β1           2.0891   0.0891   0.2438       1.9984   0.0016   0.2081
                    β2           3.1120   0.1120   0.3517       2.9893   0.0107   0.2729
                    β3           4.1410   0.1410   0.3459       4.0175   0.0175   0.2920
Log-Normal          γ            0.4641   0.0359   0.1182       0.4696   0.0304   0.1101
                    β1           2.0563   0.0563   0.1831       2.0282   0.0282   0.1566
                    β2           2.9962   0.0038   0.3565       2.9705   0.0295   0.3080
                    β3           3.9756   0.0244   0.2931       4.0010   0.0010   0.2230
Inverse Gaussian    γ            0.8999   0.1001   0.2024       1.0193   0.0193   0.1717
                    β1           2.0562   0.0562   0.2240       2.0305   0.0305   0.1978
                    β2           3.0888   0.0888   0.2591       3.0052   0.0052   0.2548
                    β3           4.0915   0.0915   0.1816       4.0055   0.0055   0.1794
Table 4. The minimum BIC scores (BIC) and number of significant variables from the three frailty models.

Frailty             Penalty    BIC        No. of Significant Variables
Log-Normal          MCP        160,361    11
Log-Normal          SCAD       160,361    11
Inverse Gaussian    MCP        175,223    11
Inverse Gaussian    SCAD       175,127    11
Gamma               MCP        160,684    31
Gamma               SCAD       160,690    31
Table 5. Parameter estimates for covariates using the inverse Gaussian frailty model.

The Inverse Gaussian Frailty Model
Clinical
  Fraction of inspired oxygen                       0.221
  Glucose                                           0.128
  Heart Rate                                       −0.069
  Height                                           −0.173
  Oxygen saturation                                 0.109
  Respiratory rate                                 −0.147
  Elective                                          0.486
  Emergency room admit (Admission location)        −0.152
  Phys referral/normal deli (Admission location)    0.220
Personal Information
  Engl (Language)                                   0.156
  Married (Marriage status)                         0.079
Table 6. Parameter estimates for covariates, using the log-normal frailty model.

The Log-Normal Frailty Model
Clinical
  Fraction of inspired oxygen                       0.208
  Glucose                                          −0.120
  Heart Rate                                       −0.065
  Height                                           −0.165
  Oxygen saturation                                 0.098
  Respiratory rate                                 −0.139
  Elective                                          0.433
  Emergency room admit (Admission location)        −0.138
  Phys referral/normal deli (Admission location)    0.229
Personal Information
  Engl (Language)                                   0.134
  Married (Marriage status)                         0.070
Table 7. Parameter estimates for covariates, using the gamma frailty model.

The Gamma Frailty Model
Clinical
  Fraction of inspired oxygen                       0.246
  Glucose                                          −0.048
  Height                                           −0.162
  Respiratory rate                                 −0.141
  Elective                                          0.533
  Phys referral/normal deli (Admission location)    0.365
  TRANSFER FROM OTHER HEALT                         0.833
Personal Information
  Engl (Language)                                   0.220
  Separated (Marriage status)                       0.355
  Unitarian universalist                            0.338
Ethnicity
  Asian                                            −0.317
  Hispanic/Latino–Puerto Rican                      0.373
  Multi-race ethnicity                              0.262
  Asian–other                                      −0.265
  Hispanic/Latino–Colombian                         1.790
  Hispanic/Latino–Dominican                         0.465
  Middle eastern                                    0.306
  Hispanic/Latino–Cuban                             0.664
  Asian–Asian Indian                                0.664
  White–eastern European                            0.362
  White–Brazilian                                   0.888
  Portuguese                                       −0.315
  Hispanic/Latino–Mexican                          −0.345
  Asian–Japanese                                   −1.110
  Hispanic/Latino–Salvadoran                        0.808
  American Indian/Alaska native Federally recognized tribe    −0.841
  Asian–Filipino                                   −0.465
  Asian–Korean                                      0.263
  Hispanic/Latino–Guatemalan                       −0.221
  American Indian/Alaska native                    −0.426
  Asian–Cambodian                                   0.240