Article

A Modified Cure Rate Model Based on a Piecewise Distribution with Application to Lobular Carcinoma Data

by Yolanda M. Gómez 1,*, John L. Santibañez 2, Vinicius F. Calsavara 3, Héctor W. Gómez 4 and Diego I. Gallardo 1

1 Departamento de Estadística, Facultad de Ciencias, Universidad del Bío-Bío, Concepción 4081112, Chile
2 Departamento de Matemática, Universidad de Atacama, Copiapó 7500015, Chile
3 Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, CA 90048, USA
4 Departamento de Estadística y Ciencia de Datos, Universidad de Antofagasta, Antofagasta 1240000, Chile
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(6), 883; https://doi.org/10.3390/math12060883
Submission received: 31 January 2024 / Revised: 27 February 2024 / Accepted: 14 March 2024 / Published: 17 March 2024
(This article belongs to the Special Issue Advances in Biostatistics and Applications)

Abstract

A novel cure rate model is introduced by considering the modified power series distribution for the number of competing causes and the recently proposed power piecewise exponential distribution for the time to event. This model includes a wide variety of cure rate models, such as the binomial, Poisson, negative binomial, Haight, Borel, logarithmic, and restricted generalized Poisson models. Some characteristics of the model are examined, and the parameters are estimated using the Expectation–Maximization algorithm. A simulation study is presented to evaluate the performance of the estimators in finite samples. Finally, an application to a real medical dataset from a population-based study of incident cases of lobular carcinoma diagnosed in the state of São Paulo, Brazil, illustrates the advantages of the proposed model over other common cure rate models in the literature, particularly regarding the underestimation of the cure rate by the other proposals and the improved precision of our proposal in estimating the cure rate.

1. Motivation

Cure models have grown enormously in the medical area [1], especially in connection with cancer, because they allow two crucial components to be estimated together: on the one hand, the probability of cure and, on the other, the survival time of susceptible patients (that is, those who are not cured and will die from the cancer under study). The increasing use of this type of model is due, in large part, to advances in preventive medical techniques that allow many types of cancer to be detected at early stages and, therefore, allow a better prognosis for the patient.
In this context, we assume the existence of a latent random variable (r.v.), M, which represents the number of competing causes related to the occurrence of the event of interest. The pioneering model in this context was proposed by Berkson and Gage [2], who assumed a Bernoulli distribution for M. Almost fifty years later, in a cancer context, Chen et al. [3] represented the causes as carcinogenic cells and modeled their number with the Poisson distribution. Other important models follow this approach while modifying the discrete distribution, including the negative binomial (NB) [4,5,6,7,8,9], zero-modified geometric [10], power series family [11], Conway–Maxwell–Poisson [12], weighted Poisson [13], and modified power series (MPS) [14] distributions.
On the other hand, the piecewise exponential (PE) model proposed by Feigl and Zelen [15] and extended by Friedman [16] is widely used for modeling clinical data because it has a constant hazard function in each of its L predefined intervals. This can be very useful in certain situations, as it allows the survival function to decrease more (or less) quickly at specific times that have a clear practical explanation. Gómez et al. [17] extended the PE model by exponentiating its distribution function, obtaining the power piecewise exponential (PPE) distribution. The PPE generalizes the PE model by adding flexibility to the hazard function, allowing for both monotonic and non-monotonic patterns within each of the L intervals, in addition to the already known constant hazard pattern. De Castro and Gómez [9] employed the PPE model in a Bayesian cure rate model, assuming the negative binomial distribution for the number of competing causes, whereas Gómez et al. [8] discussed the classical counterpart.
In this article, we propose the use of the PPE model, extending the NB cure fraction model discussed in [8] through the modified power series family of distributions. Thus, the proposed model offers multiple options for modeling the time to event, because the exponential, PE, and exponentiated exponential (EE) [18] models are particular cases of the PPE model. In addition, particular cases of the MPS family include traditional models such as the Poisson (Po), binomial (Bin), NB, and logarithmic (Lo) distributions, as well as less commonly used discrete models such as the Borel (Bo), Haight (Ha), generalized binomial (GB), and restricted generalized Poisson (RGP) distributions, to name a few.
The manuscript is organized as follows. Section 2 introduces details of the PPE and MPS distributions. In Section 3, we introduce the MPS cure rate model with a PPE baseline. Section 4 discusses the estimation procedure for the proposed model, including an Expectation–Maximization (EM)-type algorithm [19] to obtain the maximum likelihood (ML) estimators. Section 5 presents a simulation study to assess the performance of the ML estimators in finite samples. In Section 6, we illustrate the model with real data from patients with lobular carcinoma. Finally, Section 7 presents the main conclusions of the work and possible directions for future research.

2. Background

In this Section, we provide some details of the PPE and MPS distributions that are needed to introduce our proposal.

2.1. PPE Distribution

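The PPE model was introduced by Gómez et al. [17]. For a fixed number L of intervals, let T be an r.v. with PPE distribution with parameters $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_L)^\top$ and $\alpha > 0$ and known partition $\mathbf{a} = (a_1, \ldots, a_{L-1})$, such that $0 = a_0 < a_1 < \cdots < a_{L-1} < a_L = \infty$. Note that each $\lambda_\ell$, $\ell = 1, \ldots, L$, is associated with the interval $[a_{\ell-1}, a_\ell)$ of the partition. We denote $T \sim \mathrm{PPE}(\boldsymbol{\lambda}, \alpha, \mathbf{a})$. The probability density function (PDF) and cumulative distribution function (CDF) are given by

$$
f(t; \boldsymbol{\lambda}, \alpha \mid \mathbf{a}) = \alpha\, \kappa_\ell\, \lambda_\ell \exp\{-\lambda_\ell (t - a_{\ell-1})\} \left[1 - \exp\left\{-\sum_{j=1}^{L} \lambda_j \Delta_j(t)\right\}\right]^{\alpha - 1}, \quad t \in [a_{\ell-1}, a_\ell),\ \ell = 1, \ldots, L,
$$

$$
F(t; \boldsymbol{\lambda}, \alpha \mid \mathbf{a}) = \left[1 - \exp\left\{-\sum_{\ell=1}^{L} \lambda_\ell \Delta_\ell(t)\right\}\right]^{\alpha}, \quad t > 0,
$$

where $\kappa_1 = 1$ and $\kappa_\ell = \exp\left\{-\sum_{i=1}^{\ell-1} \lambda_i (a_i - a_{i-1})\right\}$, $\ell = 2, \ldots, L$. In addition, $\Delta_\ell$ is defined as

$$
\Delta_\ell(t) = \begin{cases} 0, & \text{if } t < a_{\ell-1}, \\ t - a_{\ell-1}, & \text{if } a_{\ell-1} \le t < a_\ell, \\ a_\ell - a_{\ell-1}, & \text{if } t \ge a_\ell, \end{cases} \qquad \ell = 1, \ldots, L.
$$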
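Remark 1.
The PPE distribution includes the following particular cases:
  • For $\alpha = 1$, $T \sim \mathrm{PE}(\boldsymbol{\lambda}, \mathbf{a})$.
  • For $L = 1$, $T \sim \mathrm{EE}(\lambda, \alpha)$.
  • For $L = 1$ and $\alpha = 1$, $T \sim \mathrm{E}(\lambda)$ (the exponential model with rate $\lambda$).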
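The survival function of the PPE model is given by

$$
S(t; \boldsymbol{\lambda}, \alpha \mid \mathbf{a}) = 1 - \left[1 - \exp\left\{-\sum_{\ell=1}^{L} \lambda_\ell \Delta_\ell(t)\right\}\right]^{\alpha}, \quad t > 0,
$$

and its hazard function is given by

$$
h(t; \boldsymbol{\lambda}, \alpha \mid \mathbf{a}) = \frac{\alpha\, \kappa_\ell\, \lambda_\ell \exp\{-\lambda_\ell (t - a_{\ell-1})\} \left[1 - \exp\left\{-\sum_{j=1}^{L} \lambda_j \Delta_j(t)\right\}\right]^{\alpha-1}}{1 - \left[1 - \exp\left\{-\sum_{j=1}^{L} \lambda_j \Delta_j(t)\right\}\right]^{\alpha}},
$$

for $t \in [a_{\ell-1}, a_\ell)$ and $\ell = 1, \ldots, L$. Figure 1 shows different shapes of the PDF and hazard function of the PPE distribution with a single breakpoint at $a_1 = 2$ (that is, $L = 2$).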
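To make the piecewise definitions above concrete, here is a minimal NumPy sketch of the PPE density, distribution, survival, and hazard functions. It is an illustration, not the authors' R code, and the function names are hypothetical. It relies on the identity $\kappa_\ell \exp\{-\lambda_\ell (t - a_{\ell-1})\} = \exp\{-\sum_j \lambda_j \Delta_j(t)\}$ for $t \in [a_{\ell-1}, a_\ell)$, which follows directly from the definitions of $\kappa_\ell$ and $\Delta_\ell$.

```python
import numpy as np


def ppe_cum_hazard(t, lam, a):
    """PE cumulative hazard sum_l lam_l * Delta_l(t), with breakpoints a = (a_1, ..., a_{L-1})."""
    t = np.asarray(t, dtype=float)
    a = np.asarray(a, dtype=float)
    lower = np.concatenate(([0.0], a))      # a_0, ..., a_{L-1}
    upper = np.concatenate((a, [np.inf]))   # a_1, ..., a_L = inf
    # Delta_l(t): time spent in [a_{l-1}, a_l) up to t (0, t - a_{l-1}, or a_l - a_{l-1})
    delta = np.clip(t[..., None], lower, upper) - lower
    return delta @ np.asarray(lam, dtype=float)


def ppe_cdf(t, lam, alpha, a):
    return (1.0 - np.exp(-ppe_cum_hazard(t, lam, a))) ** alpha


def ppe_sf(t, lam, alpha, a):
    return 1.0 - ppe_cdf(t, lam, alpha, a)


def ppe_pdf(t, lam, alpha, a):
    H = ppe_cum_hazard(t, lam, a)
    # 0-based index of the interval [a_{l-1}, a_l) that contains t
    idx = np.searchsorted(np.asarray(a, dtype=float), t, side="right")
    lam_t = np.asarray(lam, dtype=float)[idx]
    # on that interval, kappa_l * exp(-lam_l (t - a_{l-1})) equals exp(-H(t))
    return alpha * lam_t * np.exp(-H) * (1.0 - np.exp(-H)) ** (alpha - 1.0)


def ppe_hazard(t, lam, alpha, a):
    return ppe_pdf(t, lam, alpha, a) / ppe_sf(t, lam, alpha, a)
```

For example, `ppe_hazard(np.array([0.5, 1.5, 3.0]), [0.3, 0.8], 1.4, [2.0])` evaluates the hazard on both sides of a single breakpoint at $a_1 = 2$; with `alpha=1.0` the result is piecewise constant, as expected for the PE special case.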

2.2. Modified Power Series Family of Distributions

The MPS distribution was introduced by Noack [20]. We say that $M \sim \mathrm{MPS}(\theta)$ if its probability mass function (PMF) is given by

$$
P(M = m; \theta) = \frac{a_m [\phi(\theta)]^m}{A(\phi(\theta))}, \quad m = 0, 1, 2, \ldots,
$$

where $a_m \ge 0$, $\theta \in \Theta \subseteq \mathbb{R}^+$, $\phi(\cdot)$ is a positive function, and $A(\phi(\theta)) = \sum_{m=0}^{\infty} a_m [\phi(\theta)]^m$. In a cure rate context, the probability generating function (PGF) plays a central role; for the MPS models, it is given by

$$
G_M(u; \theta) = E(u^M; \theta) = \frac{A(u\,\phi(\theta))}{A(\phi(\theta))}, \quad |u| \le 1.
$$
Table 1 details some particular cases of the MPS distribution.
Note that several well-known models in the literature are particular cases of this class of distributions.
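The PMF and PGF above can be checked numerically. The plain-Python sketch below (function names, and the choice to pass $\log a_m$ instead of $a_m$ to avoid overflow, are assumptions of this illustration) evaluates an MPS member from its $a_m$, $\phi$, and $A$, and compares the closed-form PGF with the truncated series $\sum_m u^m P(M = m)$ for the Poisson row and the NB row with q = 2 of Table 1.

```python
import math


def mps_pmf(m, theta, log_a, phi, A):
    """P(M = m; theta) = a_m phi(theta)^m / A(phi(theta)); log_a(m) returns log(a_m)."""
    x = phi(theta)
    return math.exp(log_a(m) + m * math.log(x)) / A(x)


def mps_pgf(u, theta, phi, A):
    """Closed-form PGF G_M(u; theta) = A(u phi(theta)) / A(phi(theta)), |u| <= 1."""
    return A(u * phi(theta)) / A(phi(theta))


def mps_pgf_series(u, theta, log_a, phi, A, m_max=300):
    """Truncated series sum_{m <= m_max} u^m P(M = m; theta)."""
    return sum(u**m * mps_pmf(m, theta, log_a, phi, A) for m in range(m_max + 1))


# Poisson row of Table 1: a_m = 1/m!, phi(theta) = theta, A(x) = exp(x)
POISSON = dict(log_a=lambda m: -math.lgamma(m + 1), phi=lambda th: th, A=math.exp)
# NB row with q = 2: a_m = C(m + 1, m) = m + 1, phi(theta) = theta, A(x) = (1 - x)^(-2)
NB2 = dict(log_a=lambda m: math.log(m + 1), phi=lambda th: th, A=lambda x: (1.0 - x) ** -2)
```

Evaluating `mps_pgf` at u = 0 returns $P(M = 0; \theta) = a_0 / A(\phi(\theta))$, which is exactly the quantity that becomes the cure rate in the next section.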

3. The Proposed Model

In this Section, we introduce the MPS cure rate (MPScr) model with a baseline PPE distribution. Henceforth, this model will be named the MPScr-PPE model.
Suppose that a patient diagnosed with some type of cancer has M carcinogenic cells. Evidently, M is not observable; thus, for the formulation of the model, we assume that M follows an MPS distribution. Further, we assume that $V_j$, $j = 1, \ldots, M$, represents the time associated with the j-th carcinogenic cell to produce the event. The time of death of the patient is given by the minimum of the $V_j$'s, as long as the patient has at least one carcinogenic cell; otherwise, the patient is considered cured. Under this scheme, the failure time of the patient is given by $T = \min(V_0, V_1, \ldots, V_M)$, where $V_0$ is a degenerate r.v. with $P(V_0 = \infty) = 1$. We assume that, conditional on $M = m$, $V_1, \ldots, V_m$ are independent and identically distributed with $V_j \sim \mathrm{PPE}(\boldsymbol{\lambda}, \alpha, \mathbf{a})$. Under these assumptions, the population survival function is given by
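$$
S_{\mathrm{pop}}(t; \theta, \boldsymbol{\lambda}, \alpha, \mathbf{a}) = G_M(S(t; \boldsymbol{\lambda}, \alpha, \mathbf{a}); \theta) = \frac{A\left(\phi(\theta)\left\{1 - \left[1 - \exp\left\{-\sum_{\ell=1}^{L} \lambda_\ell \Delta_\ell(t)\right\}\right]^{\alpha}\right\}\right)}{A(\phi(\theta))}.
$$

The population PDF of the model is given by

$$
f_{\mathrm{pop}}(t; \theta, \boldsymbol{\lambda}, \alpha, \mathbf{a}) = -\frac{d S_{\mathrm{pop}}(t; \theta, \boldsymbol{\lambda}, \alpha, \mathbf{a})}{dt} = \frac{A'(S(t; \boldsymbol{\lambda}, \alpha, \mathbf{a})\,\phi(\theta))}{A(\phi(\theta))}\, f(t; \boldsymbol{\lambda}, \alpha, \mathbf{a})\, \phi(\theta),
$$

and, substituting the PDF and survival function of the PPE model provided in Equations (1) and (2), the population PDF becomes

$$
f_{\mathrm{pop}}(t; \theta, \boldsymbol{\lambda}, \alpha, \mathbf{a}) = \frac{A'\left(\phi(\theta)\left\{1 - \left[1 - \exp\left\{-\sum_{j=1}^{L} \lambda_j \Delta_j(t)\right\}\right]^{\alpha}\right\}\right)}{A(\phi(\theta))} \times \alpha\, \kappa_\ell\, \lambda_\ell \exp\{-\lambda_\ell (t - a_{\ell-1})\} \left[1 - \exp\left\{-\sum_{j=1}^{L} \lambda_j \Delta_j(t)\right\}\right]^{\alpha-1} \phi(\theta),
$$

for $t \in [a_{\ell-1}, a_\ell)$, where $A'(u) = \sum_{m=1}^{\infty} m\, a_m u^{m-1}$. For some particular cases of the MPS model (Bin, Po, NB, and Lo, for which $\phi$ is the identity function), $A'(u)$ reduces to a simple closed form; for the other cases, it cannot be reduced. On the other hand, the cure rate of the model is given by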
$$
p = \lim_{t \to \infty} S_{\mathrm{pop}}(t; \theta, \boldsymbol{\lambda}, \alpha, \mathbf{a}) = P(M = 0; \theta) = \frac{a_0}{A(\phi(\theta))},
$$

which depends only on $\theta$. Therefore, the model can be reparametrized directly in terms of the cure rate in order to perform a regression on it. Let $C(\theta) = A(\phi(\theta))$; then, $\theta = C^{-1}(a_0/p)$.
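The link between the population survival function, the cure rate, and the reparametrization $\theta = C^{-1}(a_0/p)$ can be verified numerically. The plain-Python sketch below assumes the Poisson row of Table 1 ($\phi(\theta) = \theta$, $A(x) = \exp(x)$, $C^{-1}(u) = \log u$) and a scalar-time PPE survival helper; all names are hypothetical.

```python
import math


def ppe_sf(t, lam, alpha, a):
    """PPE survival 1 - [1 - exp(-sum_l lam_l Delta_l(t))]^alpha for a scalar t."""
    edges = [0.0, *a, math.inf]
    H = sum(rate * (min(max(t, lo), hi) - lo)
            for rate, lo, hi in zip(lam, edges[:-1], edges[1:]))
    return 1.0 - (1.0 - math.exp(-H)) ** alpha


def pop_sf(t, theta, lam, alpha, a, phi, A):
    """S_pop(t) = G_M(S(t); theta) = A(phi(theta) S(t)) / A(phi(theta))."""
    x = phi(theta)
    return A(x * ppe_sf(t, lam, alpha, a)) / A(x)


def cure_rate(theta, phi, A, a0=1.0):
    """p = P(M = 0; theta) = a_0 / A(phi(theta))."""
    return a0 / A(phi(theta))


def theta_from_cure(p, C_inv, a0=1.0):
    """Inverse reparametrization theta = C^{-1}(a_0 / p)."""
    return C_inv(a0 / p)
```

For large t, the baseline survival tends to 0, so `pop_sf` tends to $A(0)/A(\phi(\theta)) = a_0/A(\phi(\theta))$, which is the plateau given by `cure_rate`.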
To account for a non-homogeneous population, we assume that a set of r covariates is measured for each observation, say $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{ir})^\top$, $i = 1, \ldots, n$, where the first term is related to the intercept and n represents the sample size. The vector $\mathbf{x}_i$ can be introduced in the cure rate using, for instance, the logit link function, such that $p_i = \exp(\mathbf{x}_i^\top \boldsymbol{\beta}) / [1 + \exp(\mathbf{x}_i^\top \boldsymbol{\beta})]$, where $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_r)^\top$ is the vector of unknown coefficients. Note that $a_0 = 1$ for all the distributions in Table 1. Therefore,

$$
\theta_i = C^{-1}\left(\frac{1 + \exp(\mathbf{x}_i^\top \boldsymbol{\beta})}{\exp(\mathbf{x}_i^\top \boldsymbol{\beta})}\right).
$$
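Once $C^{-1}$ is known, mapping covariates to $\theta_i$ takes one line. Since $(1 + e^{\eta})/e^{\eta} = 1 + e^{-\eta}$, the NumPy sketch below (names are hypothetical) uses the latter form. It illustrates the Poisson row ($C^{-1}(u) = \log u$) and the NB row with q = 2 ($C^{-1}(u) = 1 - u^{-1/2}$); by the $a_m$ and $\phi$ entries of Table 1, the latter also coincides with the GB model with q = 1 and r = 2.

```python
import numpy as np


def cure_prob(X, beta):
    """Logit link: p_i = exp(x_i' beta) / (1 + exp(x_i' beta))."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


def theta_from_covariates(X, beta, C_inv):
    """theta_i = C^{-1}((1 + exp(eta_i)) / exp(eta_i)) = C^{-1}(1 + exp(-eta_i)), using a_0 = 1."""
    return C_inv(1.0 + np.exp(-(X @ beta)))


def c_inv_poisson(u):
    return np.log(u)


def c_inv_nb2(u):
    # NB with q = 2 (equivalently GB with q = 1, r = 2): C^{-1}(u) = 1 - u^(-1/2)
    return 1.0 - u ** -0.5
```

Because $a_0 = 1$, plugging $\theta_i$ back into $1/A(\phi(\theta_i))$ recovers $p_i$ exactly, which is a convenient sanity check when adding new members of the family.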

4. Estimation

In this Section, we discuss the estimation procedure for the model under a classical approach. As cure rate studies are typically prospective, it is natural to assume a right-censoring scheme. The failure indicator of the i-th observation is denoted by $\delta_i$, which takes the value 1 when the event of interest is observed and 0 when the time is censored, $i = 1, \ldots, n$. The observations are assumed to be independent. Under this setting, the log-likelihood function for the MPScr-PPE model is given by

$$
\ell(\boldsymbol{\Psi}) = \sum_{i=1}^{n} \left[\delta_i \log f_{\mathrm{pop}}(t_i; \boldsymbol{\lambda}, \alpha, \boldsymbol{\beta} \mid \mathbf{a}, \mathbf{x}_i) + (1 - \delta_i) \log S_{\mathrm{pop}}(t_i; \boldsymbol{\lambda}, \alpha, \boldsymbol{\beta} \mid \mathbf{a}, \mathbf{x}_i)\right],
$$

where $\boldsymbol{\Psi} = (\boldsymbol{\lambda}^\top, \alpha, \boldsymbol{\beta}^\top)^\top$ denotes the parameter vector, and $S_{\mathrm{pop}}$ and $f_{\mathrm{pop}}$ are given in Equations (6) and (7), respectively. However, the direct maximization of Equation (9) can be difficult, especially because, for some particular cases of the MPScr-PPE model, $A'(\cdot)$ cannot be reduced to a simpler form. For this reason, in the next subsection, we discuss a simpler and more efficient estimation procedure based on the EM algorithm.
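For a member in which $A'$ has a closed form, Equation (9) can be coded directly. The NumPy sketch below assumes the GBcr-PPE model with q = 1 and r = 2, so that $\phi(\theta) = \theta$, $A(x) = (1 - x)^{-2}$, $A'(x) = 2(1 - x)^{-3}$, and $\theta_i = 1 - \sqrt{p_i}$, together with a logit link. The log-scale parametrization of $\boldsymbol{\lambda}$ and $\alpha$ and all names are choices of this sketch, not the authors' implementation.

```python
import numpy as np


def ppe_pdf_sf(t, lam, alpha, a):
    """PDF and survival function of PPE(lam, alpha, a) at a 1-D array of times t."""
    t = np.asarray(t, dtype=float)
    a = np.asarray(a, dtype=float)
    lam = np.asarray(lam, dtype=float)
    lower = np.concatenate(([0.0], a))
    upper = np.concatenate((a, [np.inf]))
    H = (np.clip(t[:, None], lower, upper) - lower) @ lam   # sum_l lam_l Delta_l(t)
    lam_t = lam[np.searchsorted(a, t, side="right")]        # rate of the interval holding t
    G = 1.0 - np.exp(-H)                                    # PE CDF
    return alpha * lam_t * np.exp(-H) * G ** (alpha - 1.0), 1.0 - G**alpha


def pop_pdf_sf(t, theta, lam, alpha, a):
    """f_pop and S_pop for the GB(q = 1, r = 2) member, where A(x) = (1 - x)^-2."""
    f, S = ppe_pdf_sf(t, lam, alpha, a)
    S_pop = (1.0 - theta) ** 2 / (1.0 - theta * S) ** 2                    # A(theta S) / A(theta)
    f_pop = 2.0 * theta * f * (1.0 - theta) ** 2 / (1.0 - theta * S) ** 3  # theta f A'(theta S) / A(theta)
    return f_pop, S_pop


def loglik(params, t, delta, X, a):
    """Observed log-likelihood; params = (log lam_1, ..., log lam_L, log alpha, beta)."""
    L = len(a) + 1
    lam, alpha, beta = np.exp(params[:L]), np.exp(params[L]), params[L + 1:]
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # logit link for the cure rate
    theta = 1.0 - np.sqrt(p)                # theta_i = C^{-1}(1 / p_i)
    f_pop, S_pop = pop_pdf_sf(t, theta, lam, alpha, a)
    return np.sum(delta * np.log(f_pop) + (1 - delta) * np.log(S_pop))
```

Members without a closed-form $A'$ (for example, GB with q > 1) would need $A'$ evaluated numerically here, which is precisely the difficulty the EM algorithm below avoids.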

EM Algorithm

The EM algorithm is a very useful tool for models with latent variables, as it simplifies the estimation process. Let $\mathbf{M} = (M_1, \ldots, M_n)^\top$ be the vector containing the number of competing causes for all individuals (the unobserved data) and $D_{\mathrm{obs}} = \{\mathbf{t}, \boldsymbol{\delta}, \mathbf{X}\}$ the observed data, where $\mathbf{t} = (t_1, \ldots, t_n)^\top$, $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_n)^\top$, and $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^\top$. Thus, $D_{\mathrm{comp}} = \{D_{\mathrm{obs}}, \mathbf{M}\}$ represents the complete data. From Proposition 1 of Gallardo et al. [14], it follows directly that, for the MPScr-PPE model,

$$
E_{\phi(\theta_i) S_i}[M_i^r] = \sum_{m_i=\delta_i}^{\infty} m_i^r\, \frac{a_{m_i} [\phi(\theta_i) S_i]^{m_i}}{A(\phi(\theta_i) S_i)}, \quad i = 1, \ldots, n,
$$

where $S_i$ is the survival function of the PPE model evaluated at $t_i$, and $E_{x}[\cdot]$ denotes the expectation under the MPS distribution with $\phi(\theta_i)$ replaced by $x$. Therefore, the PMF of the number of competing causes $M_i$, conditional on $t_i$ and $\delta_i$, is given by

$$
P(M_i = m_i; \theta_i, S_i \mid t_i, \delta_i) = \frac{a_{m_i} [\phi(\theta_i) S_i]^{m_i}}{A(\phi(\theta_i) S_i)} \left(\frac{m_i}{E_{\phi(\theta_i) S_i}[M_i]}\right)^{\delta_i},
$$

with $m_i = 0, 1, 2, \ldots$ and $i = 1, \ldots, n$. In this way, the conditional expectation of $M_i$ given $t_i$ and $\delta_i$ becomes

$$
E[M_i; \theta_i, S_i \mid t_i, \delta_i] = (1 - \delta_i)\, E_{\phi(\theta_i) S_i}[M_i] + \delta_i\, \frac{E_{\phi(\theta_i) S_i}[M_i^2]}{E_{\phi(\theta_i) S_i}[M_i]}.
$$
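The conditional expectation above only needs moments of an MPS variable evaluated at $x_i = \phi(\theta_i) S_i$, and the normalizing constant $A(x_i)$ cancels in the ratio used when $\delta_i = 1$. The plain-Python sketch below (names and the truncation level `m_max` are assumptions) computes it by truncated series from $\log a_m$. As a check, it can be compared with closed forms that we derived for this illustration: for the Poisson member, $E_x[M] = x$ and $E_x[M^2]/E_x[M] = 1 + x$; for NB with q = 2, $E_x[M] = 2x/(1 - x)$ and $E_x[M^2]/E_x[M] = (1 + 2x)/(1 - x)$.

```python
import math


def conditional_mean_M(x, delta, log_a, m_max=400):
    """E[M_i | t_i, delta_i] for an MPS member evaluated at x = phi(theta_i) * S_i > 0.

    delta = 0 (censored): E_x[M];  delta = 1 (failure): E_x[M^2] / E_x[M].
    """
    log_x = math.log(x)
    w = [math.exp(log_a(m) + m * log_x) for m in range(m_max + 1)]   # a_m x^m
    s0 = sum(w)
    s1 = sum(m * wm for m, wm in enumerate(w))
    s2 = sum(m * m * wm for m, wm in enumerate(w))
    return s1 / s0 if delta == 0 else s2 / s1


def log_a_poisson(m):
    return -math.lgamma(m + 1)   # a_m = 1 / m!


def log_a_nb2(m):
    return math.log(m + 1)       # a_m = C(m + 1, m) = m + 1
```

The truncation level must grow as $x$ approaches the boundary of $\Theta$ (for example, $x \to 1$ in the NB case), because the series then decays slowly.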
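The complete log-likelihood function is given, up to an additive constant that does not depend on $\boldsymbol{\Psi}$, by

$$
\ell_c(\boldsymbol{\Psi} \mid D_{\mathrm{comp}}) = \sum_{i=1}^{n} \left[M_i \log S_i + \delta_i \log h_i + M_i \log \phi(\theta_i) - \log A(\phi(\theta_i))\right],
$$

where $h_i = f_i / S_i$, with $f_i$ denoting the PDF of the PPE distribution evaluated at $t_i$. Let $\boldsymbol{\Psi}^{(k)}$ be the estimate of $\boldsymbol{\Psi}$ at the k-th iteration of the EM algorithm, and let $Q(\boldsymbol{\Psi} \mid \boldsymbol{\Psi}^{(k)})$ be the conditional expectation of the complete log-likelihood function given the observed data and $\boldsymbol{\Psi}^{(k)}$. With this notation, $Q(\boldsymbol{\Psi} \mid \boldsymbol{\Psi}^{(k)})$ can be written as

$$
Q(\boldsymbol{\Psi} \mid \boldsymbol{\Psi}^{(k)}) = Q_1(\boldsymbol{\beta} \mid \boldsymbol{\Psi}^{(k)}) + Q_2(\boldsymbol{\lambda}, \alpha \mid \boldsymbol{\Psi}^{(k)}),
$$

where

$$
Q_1(\boldsymbol{\beta} \mid \boldsymbol{\Psi}^{(k)}) = \sum_{i=1}^{n} \left[M_i^{(k)} \log \phi(\theta_i) - \log A(\phi(\theta_i))\right],
$$

$$
Q_2(\boldsymbol{\lambda}, \alpha \mid \boldsymbol{\Psi}^{(k)}) = \sum_{i=1}^{n} \left[M_i^{(k)} \log S_i + \delta_i \log h_i\right],
$$

and $M_i^{(k)} = E[M_i \mid D_{\mathrm{obs}}, \boldsymbol{\Psi}^{(k)}]$, which can be computed using the result in Equation (10). In summary, the k-th step of the EM algorithm is as follows: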
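  • E-step: For $i = 1, \ldots, n$, compute
$$
M_i^{(k)} = \begin{cases} \dfrac{\sum_{m_i=0}^{\infty} m_i\, a_{m_i} \left[\phi(\theta_i^{(k-1)}) S_i^{(k-1)}\right]^{m_i}}{A\left(\phi(\theta_i^{(k-1)}) S_i^{(k-1)}\right)}, & \text{if } \delta_i = 0, \\[2ex] \dfrac{\sum_{m_i=0}^{\infty} m_i^2\, a_{m_i} \left[\phi(\theta_i^{(k-1)}) S_i^{(k-1)}\right]^{m_i}}{\sum_{m_i=0}^{\infty} m_i\, a_{m_i} \left[\phi(\theta_i^{(k-1)}) S_i^{(k-1)}\right]^{m_i}}, & \text{if } \delta_i = 1. \end{cases}
$$
  • M-step: Given $M_1^{(k)}, \ldots, M_n^{(k)}$, find $\boldsymbol{\beta}^{(k)}$, $\boldsymbol{\lambda}^{(k)}$, and $\alpha^{(k)}$ that maximize Equations (11) and (12) with respect to $\boldsymbol{\beta}$, $\boldsymbol{\lambda}$, and $\alpha$. Since $Q_1$ depends only on $\boldsymbol{\beta}$ and $Q_2$ only on $(\boldsymbol{\lambda}, \alpha)$, the two maximizations can be carried out separately.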
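The β-part of the M-step is a small smooth optimization problem. The sketch below assumes the GB model with q = 1 and r = 2, for which $M_i \log\phi(\theta_i) - \log A(\phi(\theta_i)) = M_i \log(1 - \sqrt{p_i}) + \log p_i$. It uses SciPy's `minimize(method="BFGS")` as a Python stand-in for R's `optim`, and also includes a stopping rule based on the maximum absolute difference between consecutive estimates. All names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize


def q1_gb12(beta, M, X):
    """Q_1(beta) for GB(q = 1, r = 2): sum_i [M_i log(theta_i) - log A(theta_i)]."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # logit link for the cure rate
    theta = 1.0 - np.sqrt(p)                # theta_i = C^{-1}(1 / p_i)
    # -log A(theta) = 2 log(1 - theta) = log p
    return np.sum(M * np.log(theta) + np.log(p))


def m_step_beta(M, X, beta_start):
    """Maximize Q_1 over beta by minimizing -Q_1 with BFGS."""
    res = minimize(lambda b: -q1_gb12(b, M, X),
                   np.asarray(beta_start, dtype=float), method="BFGS")
    return res.x


def converged(psi_new, psi_old, eps=1e-6):
    """Stopping rule: max_j |psi_new_j - psi_old_j| < eps."""
    diff = np.abs(np.asarray(psi_new, dtype=float) - np.asarray(psi_old, dtype=float))
    return bool(np.max(diff) < eps)
```

For an intercept-only model, setting the derivative of $Q_1$ with respect to $s = \sqrt{p}$ to zero gives $s = 2n/(2n + \sum_i M_i^{(k)})$, which is a quick way to check the numerical optimizer.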
The maximization of Equations (11) and (12) can be performed using numerical procedures; for instance, we use the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm implemented in the optim function of R [21]. The E- and M-steps are iterated until a convergence criterion is met: we stop when the distance $\|\boldsymbol{\Psi}^{(k+1)} - \boldsymbol{\Psi}^{(k)}\|$ between the estimates in two consecutive steps is less than a preset $\epsilon$, where this distance is defined as the maximum absolute difference between the components of $\boldsymbol{\Psi}^{(k+1)}$ and $\boldsymbol{\Psi}^{(k)}$. The asymptotic covariance matrix of the ML estimator of $\boldsymbol{\Psi}$, say $\widehat{\boldsymbol{\Psi}}$, can be estimated as

$$
\widehat{\mathrm{Var}}(\widehat{\boldsymbol{\Psi}}) = \left[-\frac{\partial^2 \ell_c(\boldsymbol{\Psi} \mid D_{\mathrm{comp}})}{\partial \boldsymbol{\Psi}\, \partial \boldsymbol{\Psi}^\top}\bigg|_{\boldsymbol{\Psi} = \widehat{\boldsymbol{\Psi}}}\right]^{-1}.
$$

This matrix can be computed numerically; for instance, we use the hessian function of the pracma package [22] in R [21] version 4.2.2.
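The covariance estimate only requires the Hessian of a log-likelihood at $\widehat{\boldsymbol{\Psi}}$. As a rough NumPy analogue of `pracma::hessian` (not a reimplementation of it), the sketch below uses central finite differences; the step size `h` is an assumption and should be adapted to the scale of the parameters.

```python
import numpy as np


def numerical_hessian(f, x, h=1e-4):
    """Central-difference Hessian of a scalar function f at the point x."""
    x = np.asarray(x, dtype=float)
    k = x.size
    E = np.eye(k) * h
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4.0 * h * h)
    return H


def covariance_from_loglik(loglik, psi_hat, h=1e-4):
    """Inverse of the negative Hessian of loglik evaluated at psi_hat."""
    return np.linalg.inv(-numerical_hessian(loglik, psi_hat, h))
```

Central differences are exact (up to rounding) for quadratic functions, so a quadratic log-likelihood with known curvature is a simple test case.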

5. Simulation Study

In this Section, we present a simulation study to assess the performance of the ML estimators for the MPScr-PPE model obtained via the EM algorithm in finite samples.

Recovery Parameters

This study assesses some properties of the ML estimators in finite samples, specifically for the GBcr-PPE model. We considered a scenario similar to the real data application, with two covariates: $(x_{i10}, x_{i11}, x_{i12}, x_{i13})$, simulated from the multinomial distribution with probability vector (0.28, 0.38, 0.25, 0.08) (representing the four stages of the cancer), and $x_{i2}$, simulated from the normal distribution with mean 59 and standard deviation 12.6 (representing the age of the patient). Therefore, the covariate vector for each individual is $\mathbf{x}_i = (1, x_{i11}, x_{i12}, x_{i13}, x_{i2})^\top$, $i = 1, \ldots, n$ (note that $x_{i10}$ is not included, to avoid identifiability problems). To draw samples, we use the stochastic representation of the model. For a fixed vector $\boldsymbol{\beta} = (\beta_0, \beta_{11}, \beta_{12}, \beta_{13}, \beta_2)^\top$, we compute $\theta_i$ as in Equation (8); then, $M_i$ is drawn from the corresponding GB distribution with q = 1 and r = 2. If $M_i = 0$, the failure time is defined as $Y_i = \infty$. For $M_i \ge 1$, we draw $V_{i1}, \ldots, V_{iM_i}$ from the PPE distribution (with a predefined $\boldsymbol{\lambda}$), and the failure time is defined as $Y_i = \min(V_{i1}, V_{i2}, \ldots, V_{iM_i})$. For simplicity, we also assume that all censoring times are equal to C. Therefore, the observed times are given by $T_i = \min(Y_i, C)$, with the corresponding failure indicators $\delta_i = I(Y_i \le C)$. We consider L = 3 partitions, with $\boldsymbol{\lambda} = (\lambda_1 = 0.03, \lambda_2 = 0.04, \lambda_3 = 0.06)$ and $\boldsymbol{\beta} = (\beta_0 = 4, \beta_{11} = 1.4, \beta_{12} = 2.4, \beta_{13} = 3.8, \beta_2 = 0.01)$. We also consider three sample sizes (500, 750, and 1000), three values for C (10, 14, and 18), and three values for $\alpha$ (0.8, 1.0, and 1.2), totaling 27 cases. Table 2 and Table 3 summarize the average bias (Bias), the estimated root mean square error (RMSE), and the mean of the standard errors (SE).
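The sampling scheme just described can be sketched in NumPy as follows. Several choices here are assumptions of the sketch, not details taken from the paper: PPE variates are drawn by inverting $F(v) = [1 - e^{-H(v)}]^{\alpha}$, where $H$ is the piecewise-linear PE cumulative hazard; GB(q = 1, r = 2) counts are drawn with NumPy's `negative_binomial(2, 1 - theta)`, whose PMF $(m + 1)\theta^m(1 - \theta)^2$ matches that member; and the printed stage probabilities, which sum to 0.99, are renormalized. The breakpoints and coefficients used in any call are hypothetical inputs.

```python
import numpy as np


def rppe(n, lam, alpha, a, rng):
    """Draw n PPE(lam, alpha, a) variates by inverting F(v) = (1 - exp(-H(v)))^alpha."""
    lam = np.asarray(lam, dtype=float)
    lower = np.concatenate(([0.0], np.asarray(a, dtype=float)))              # a_0, ..., a_{L-1}
    H_lower = np.concatenate(([0.0], np.cumsum(lam[:-1] * np.diff(lower))))  # H(a_{l-1})
    h = -np.log1p(-rng.uniform(size=n) ** (1.0 / alpha))   # value of H(v) solving F(v) = U
    idx = np.searchsorted(H_lower, h, side="right") - 1    # interval containing v
    return lower[idx] + (h - H_lower[idx]) / lam[idx]      # invert the linear piece of H


def simulate_gb12_ppe(X, beta, lam, alpha, a, C, rng):
    """Stochastic representation: M_i ~ GB(q=1, r=2), Y_i = min_j V_ij, T_i = min(Y_i, C)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))        # cure rate via the logit link
    theta = 1.0 - np.sqrt(p)                     # theta_i = C^{-1}(1 / p_i) for GB(1, 2)
    # numpy's negative_binomial(2, 1 - theta) has PMF (m + 1) theta^m (1 - theta)^2
    M = rng.negative_binomial(2, 1.0 - theta)
    Y = np.full(len(p), np.inf)                  # M_i = 0: cured, Y_i = infinity
    for i in np.flatnonzero(M > 0):
        Y[i] = rppe(M[i], lam, alpha, a, rng).min()
    T = np.minimum(Y, C)
    delta = (Y <= C).astype(int)
    return T, delta, M


def design_matrix(n, rng):
    """Intercept, stage dummies (stage I as reference), and age, as in the scenario above."""
    probs = np.array([0.28, 0.38, 0.25, 0.08])
    stage = rng.choice(4, size=n, p=probs / probs.sum())   # renormalized probabilities
    age = rng.normal(59.0, 12.6, size=n)
    return np.column_stack([np.ones(n), stage == 1, stage == 2, stage == 3, age]).astype(float)
```

Inverse-CDF sampling is convenient here because $H$ is linear within each interval, so each draw needs only one interval lookup and one division.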
The results suggest that the estimators are consistent, because as the sample size increases, the bias generally decreases and the precision of the estimates increases. In addition, the model adequately identifies when the survival time follows a PE rather than a PPE distribution, because the estimates for $\alpha = 1$ are more accurate than those in the other scenarios, where $\alpha$ takes other values.

6. Application

The dataset includes information on 2562 patients diagnosed with lobular carcinoma (a type of breast cancer) who were treated in the mastology area, with diagnosis dates between 2009 and 2016 and follow-up until 2018. The dataset was obtained from the Oncocenter Foundation of São Paulo, Brazil (Fundação Oncocentro de São Paulo (FOSP) in Portuguese), which is responsible for coordinating the Hospital Cancer Registry of the State of São Paulo (http://fosp.saude.sp.gov.br, accessed on 31 December 2023). Lobular carcinoma is a type of breast cancer that begins in the lobules, the glands that produce milk.
Death due to cancer was defined as the event of interest, and the time was measured from the date of diagnosis until the patient’s death (in years, mean: 5.01, standard deviation (SD): 3.05, median: 4.50, range: 0.0027–13.62). A total of 461 (18%) events occurred during the follow-up period. The median follow-up time was 12.7 years. The observed independent variables were as follows: age at diagnosis (mean: 58.98, SD: 12.64, median: 59, range: 20–94) and the clinical stage (I: 721 (28.14%), II: 976 (38.1%), III: 651 (25.41%), and IV: 214 (8.35%)), with clinical stage IV representing the most advanced stage. Figure 2 shows the estimated survival curves obtained by the Kaplan–Meier (KM) estimator for the breast cancer dataset. According to the estimated overall survival (Figure 2a), the survival function appears to trend towards a plateau close to 0.5, suggesting the presence of long-term survivors in the population. Additionally, younger patients (≤55 years old) and those in early clinical stages exhibit higher survival rates (Figure 2b,c).
We fitted 59 particular cases of the MPScr-PPE model, considering homogeneous partitions based on quantiles, with L ranging from 1 to 30. The models considered were Po, Lo, Bo, Ha, NB (q = 1 to q = 10), Bin (q = 1 to q = 10), RGP (q = 1 to q = 10), and GB (q = 1 to q = 5 and r = 1 to r = 5), which include the most popular cure rate models in the literature mentioned in Section 1. Figure 3 shows the Akaike information criterion (AIC) [23] for all combinations of the available covariates for the five best-performing models (Po, Lo, NB with q = 1, NB with q = 2, and GB with q = 1 and r = 2). The best choice was L = 25, with a similar trend in all the fitted models. For comparison, we also considered the same models with a Weibull (WEI) baseline distribution for the time to event; in this case, the lowest AIC was 23 points higher than that of our proposal.
Table 4 presents the maximized log-likelihood, AIC, and Bayesian information criterion (BIC) [24] for the members of the GBcr-PPE and GBcr-WEI models. Note that, in both cases, the GBcr model provides a better result.
The parameter estimates under the GBcr-PPE model with q = 1 and r = 2 give $\hat{\alpha} = 1.0549$ with a standard error of 0.2343. This suggests that the PE model may be preferable to the PPE model for this particular problem. To verify this, we also performed the likelihood ratio test for $H_0: \alpha = 1$ versus $H_1: \alpha \ne 1$, with observed statistic $\mathrm{TRV} = 2(\mathrm{loglike}_{\mathrm{PPE}} - \mathrm{loglike}_{\mathrm{PE}}) = 2(-1713.390 + 1713.414) = 0.048$. Under $H_0$, this statistic asymptotically follows a chi-square distribution with one degree of freedom, giving a p value of approximately 0.83. Therefore, at the 5% significance level, $H_0$ is not rejected: there is not enough evidence to establish a difference between the PE and PPE distributions. Table 5 shows the parameter estimates, with the respective standard errors, for the GBcr-PE model with q = 1 and r = 2. Note that the estimates of the regression coefficients are concordant for both models, in the sense that they have the same sign.
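The likelihood ratio statistic and its p value can be reproduced from the two reported maximized log-likelihoods. The standard-library sketch below assumes the usual $\chi^2_1$ reference distribution, for which $P(\chi^2_1 > s) = P(|Z| > \sqrt{s}) = \operatorname{erfc}(\sqrt{s/2})$; the function name is hypothetical.

```python
import math


def lr_test_1df(loglik_full, loglik_restricted):
    """Likelihood ratio statistic and p value for one restriction (e.g. H0: alpha = 1)."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    # P(chi^2_1 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


stat, p = lr_test_1df(-1713.390, -1713.414)   # PPE (full) vs. PE (alpha = 1)
print(f"TRV = {stat:.3f}, p = {p:.3f}")       # prints: TRV = 0.048, p = 0.827
```

The result agrees with a Wald check based on the reported estimate and standard error, $(1.0549 - 1)/0.2343 \approx 0.23$, which is also far from significant.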
Figure 4 shows the QQ plot for the quantile residuals (left panel) and the KM estimator for the Cox–Snell residuals [25] (right panel). We also applied some common normality tests to the quantile residuals, namely the Kolmogorov–Smirnov (KS, [26]), Shapiro–Wilk (SW, [27]), Anderson–Darling (AD, [28]), and Cramér–von Mises (CVM, [29]) tests. The p values of these tests do not indicate departures from a standard normal distribution for the quantile residuals. Moreover, the KM estimator for the Cox–Snell residuals suggests that it is reasonable to assume that these residuals follow a standard exponential distribution. Therefore, both types of residuals suggest that the GBcr model provides satisfactory results for this dataset.
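For reference, both residuals can be computed from the fitted population survival function at the observed times. The standard-library sketch below assumes the common definitions $r_i^{CS} = -\log \widehat{S}_{\mathrm{pop}}(t_i)$ for Cox–Snell residuals and $r_i^{Q} = \Phi^{-1}(1 - \widehat{S}_{\mathrm{pop}}(t_i))$ for quantile residuals. The latter is our assumption, since the exact definition used in the paper is not given in this section; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def cox_snell_residuals(s_pop):
    """r_i = -log S_pop(t_i); a censored standard exponential sample if the model fits."""
    return [-math.log(s) for s in s_pop]


def quantile_residuals(s_pop):
    """r_i = Phi^{-1}(1 - S_pop(t_i)); approximately standard normal if the model fits."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(1.0 - s) for s in s_pop]
```

Both functions expect fitted values $\widehat{S}_{\mathrm{pop}}(t_i)$ strictly between 0 and 1, which holds for a cure model because the population survival function is bounded below by the cure rate.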
Finally, to illustrate the advantage of using the GBcr-PE instead of the GBcr-WEI model, we computed the estimated cure rate and the corresponding 95% confidence interval (CI) for both models, which are presented in Figure 5. Note that the GBcr-WEI model underestimates the cure rate relative to the GBcr-PE model. Furthermore, in some cases (such as Figure 5a), the GBcr-PE model provides a narrower, and therefore more precise, 95% CI.
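One standard way to obtain such intervals, assumed here because the text does not state how the intervals in Figure 5 were computed, is to build a Wald interval for the linear predictor $\eta = \mathbf{x}^\top \widehat{\boldsymbol{\beta}}$, with variance $\mathbf{x}^\top \widehat{\mathrm{Var}}(\widehat{\boldsymbol{\beta}})\, \mathbf{x}$, and then map its endpoints through the logistic function so that the interval stays inside (0, 1). The NumPy sketch below uses hypothetical inputs and names.

```python
import numpy as np


def _logistic(v):
    return 1.0 / (1.0 + np.exp(-v))


def cure_rate_ci(x, beta_hat, cov_beta, z=1.959964):
    """Point estimate and 95% CI for p(x) = logistic(x' beta) via a Wald interval on the logit scale."""
    x = np.asarray(x, dtype=float)
    eta = x @ beta_hat                 # linear predictor
    se = np.sqrt(x @ cov_beta @ x)     # standard error of eta
    return _logistic(eta), _logistic(eta - z * se), _logistic(eta + z * se)
```

Building the interval on the logit scale is a design choice: it avoids limits outside (0, 1), at the cost of an interval that is generally asymmetric around the estimated cure rate.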

7. Conclusions and Future Work

A new cure rate model was introduced based on the power piecewise exponential distribution. The parameters were estimated using the EM algorithm, which considerably simplifies the estimation procedure. The properties of the ML estimators were assessed through a simulation study, which revealed that, as the sample size increases, the bias and standard error (SE) decrease. The estimators of the components of the vector λ (related to the PPE distribution) converged more slowly than those of the other parameters, indicating that a larger sample size is needed to reach acceptable properties. The model adequately identifies when survival times follow a PE distribution rather than a PPE distribution. Finally, in a real data application related to breast cancer, the GBcr-PPE model performed better than the common models in this context. Specifically, we found that, for this type of cancer, the point estimate of the cure rate under our proposal ranges from 99% for the most favorable case (younger patients in stage I) to 35% (older patients in stage IV), and that it was always underestimated by a competing model. Future research along these lines could consider a Bayesian approach to parameter estimation and the inclusion of random effects in the cure rate term of the model.

Author Contributions

Conceptualization, D.I.G. and Y.M.G.; methodology, D.I.G., Y.M.G. and H.W.G.; software, D.I.G., Y.M.G. and J.L.S.; validation, H.W.G.; formal analysis, D.I.G., Y.M.G., H.W.G. and J.L.S.; investigation, D.I.G., Y.M.G. and J.L.S.; resources, Y.M.G.; data curation, V.F.C. and J.L.S.; writing—original draft preparation, D.I.G., Y.M.G. and J.L.S.; writing—review and editing, D.I.G., Y.M.G. and J.L.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work of the first author was partially supported by a grant from the Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT) 11230397. This work was also supported in part by the NIH National Center for Advancing Translational Sciences UCLA CTSI UL1 TR001881.

Data Availability Statement

Data and computational codes are available upon request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yin, G.; Ibrahim, J.G. Cure rate models: A unified approach. Can. J. Stat. 2005, 33, 559–570.
  2. Berkson, J.; Gage, R.P. Survival curve for cancer patients following treatment. J. Am. Stat. Assoc. 1952, 47, 501–515.
  3. Chen, M.H.; Ibrahim, J.G.; Sinha, D. A new Bayesian model for survival data with a surviving fraction. J. Am. Stat. Assoc. 1999, 94, 909–919.
  4. Cancho, V.G.; Rodrigues, J.; de Castro, M. A flexible model for survival data with a cure rate: A Bayesian approach. J. Appl. Stat. 2011, 38, 57–70.
  5. Yiqi, B.; Maria Russo, C.; Cancho, V.G.; Louzada, F. Influence diagnostics for the Weibull-Negative-Binomial regression model with cure rate under latent failure causes. J. Appl. Stat. 2016, 43, 1027–1060.
  6. Ortega, E.M.; Cordeiro, G.M.; Kattan, M.W. The negative binomial–beta Weibull regression model to predict the cure of prostate cancer. J. Appl. Stat. 2012, 39, 1191–1210.
  7. D’Andrea, A.; Rocha, R.; Tomazella, V.; Louzada, F. Negative binomial Kumaraswamy-G cure rate regression model. J. Risk Financ. Manag. 2018, 11, 6.
  8. Gómez, Y.M.; Gallardo, D.I.; Leão, J.; Calsavara, V.F. On a new piecewise regression model with cure rate: Diagnostics and application to medical data. Stat. Med. 2021, 40, 6723–6742.
  9. de Castro, M.; Gómez, Y.M. A Bayesian cure rate model based on the power piecewise exponential distribution. Methodol. Comput. Appl. Probab. 2020, 22, 677–692.
  10. Leão, J.; Bourguignon, M.; Gallardo, D.I.; Rocha, R.; Tomazella, V. A new cure rate model with flexible competing causes with applications to melanoma and transplantation data. Stat. Med. 2020, 39, 3272–3284.
  11. Cancho, V.G.; Louzada, F.; Ortega, E.M. The power series cure rate model: An application to a cutaneous melanoma data. Commun. Stat.-Simul. Comput. 2013, 42, 586–602.
  12. Balakrishnan, N.; Pal, S. Expectation maximization-based likelihood inference for flexible cure rate models with Weibull lifetimes. Stat. Methods Med. Res. 2016, 25, 1535–1563.
  13. Balakrishnan, N.; Koutras, M.V.; Milienos, F.S. A weighted Poisson distribution and its application to cure rate models. Commun. Stat.-Theory Methods 2018, 47, 4297–4310.
  14. Gallardo, D.I.; Gomez, Y.M.; Gómez, H.W.; de Castro, M. On the use of the modified power series family of distributions in a cure rate model context. Stat. Methods Med. Res. 2020, 29, 1831–1845.
  15. Feigl, P.; Zelen, M. Estimation of exponential survival probabilities with concomitant information. Biometrics 1965, 21, 826–838.
  16. Friedman, M. Piecewise exponential models for survival data with covariates. Ann. Stat. 1982, 10, 101–113.
  17. Gómez, Y.M.; Gallardo, D.I.; Arnold, B.C. The power piecewise exponential model. J. Stat. Comput. Simul. 2018, 88, 825–840.
  18. Gupta, R.D.; Kundu, D. Generalized exponential distributions. Aust. N. Z. J. Stat. 1999, 41, 173–188.
  19. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum Likelihood from Incomplete Data Via the EM Algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22.
  20. Noack, A. A Class of Random Variables with Discrete Distributions. Ann. Math. Stat. 1950, 21, 127–132.
  21. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023.
  22. Borchers, H.W. pracma: Practical Numerical Math Functions; R Package Version 2.4.2; R Package Vignette: Madison, WI, USA, 2022.
  23. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control. 1974, 19, 716–723.
  24. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  25. Cox, D.R.; Snell, E.J. A general definition of residuals. J. R. Stat. Soc. Ser. B (Methodol.) 1968, 30, 248–275.
  26. Kolmogorov, A.N. Sulla determinazione empirica di una legge di distribuzione (On the empirical determination of a distribution law). G. Dell’Inst. Ital. Degli Attuari 1933, 4, 83–91.
  27. Shapiro, S.S.; Wilk, M.B. An analysis of variance test for normality (complete samples). Biometrika 1965, 52, 591–611.
  28. Anderson, T.W.; Darling, D.A. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann. Math. Stat. 1952, 23, 193–212.
  29. Cramér, H. On the composition of elementary errors. Skand. Aktuarietidskr. 1928, 11, 13–74.
Figure 1. PDF (left) and hazard function (right) of the PPE distribution for different values of λ and α , with a = 2 ( L = 2 partitions).
Figure 2. Estimated survival curve obtained by the Kaplan–Meier estimator.
Figure 3. AIC for L = 1 to L = 30 partitions. The right panel is a zoomed-in view of the outlined part of the graph.
Figure 4. Quantile–quantile (QQ) plot with envelope for quantile residuals (and the corresponding p value for different normality tests) and the KM estimator for the Cox–Snell residuals for the GBcr-PE model for the lobular carcinoma data.
Figure 5. Estimated cure rate and the corresponding 95% confidence intervals for the MPScr-PE and MPScr-WEI models.
Table 1. $a_m$, $\phi(\cdot)$, $A(\phi(\cdot))$, $\Theta$, and $C^{-1}(\cdot)$ (the inverse function of $A(\phi(\cdot))$) for some particular cases of the MPS model.

| Distribution | $a_m$ | $\phi(\theta)$ | $A(\phi(\theta))$ | $\Theta$ | $C^{-1}(u)$ |
|---|---|---|---|---|---|
| Bin$(q,\theta)$ | $\binom{q}{m}$ | $\theta$ | $(1+\theta)^{q}$ | $(0,\infty)$ | $u^{1/q}-1$ |
| Po$(\theta)$ | $(m!)^{-1}$ | $\theta$ | $\exp(\theta)$ | $(0,\infty)$ | $\log(u)$ |
| NB$(q,\theta)$ | $\binom{m+q-1}{m}$ | $\theta$ | $(1-\theta)^{-q}$ | $(0,1)$ | $1-u^{-1/q}$ |
| Lo$(\theta)$ | $(m+1)^{-1}$ | $\theta$ | $-\log(1-\theta)/\theta$ | $(0,1)$ | $1+W(-u\exp(-u))/u$ |
| Bo$(\theta)$ | $(m+1)^{m-1}/m!$ | $\theta\exp(-\theta)$ | $\exp(\theta)$ | $(0,1)$ | $\log(u)$ |
| Ha$(\theta)$ | $\binom{2m+1}{m+1}/(2m+1)$ | $\theta(1-\theta)$ | $(1-\theta)^{-1}$ | $(0,1)$ | $1-u^{-1}$ |
| GB$(q,r,\theta)$ | $r\binom{r+qm}{m}/(r+qm)$ | $\theta(1-\theta)^{q-1}$ | $(1-\theta)^{-(q+r-1)}$ | $(0,1)$ | $1-u^{-1/(q+r-1)}$ |
| RGP$(q,\theta)$ | $(qm+1)^{m-1}/m!$ | $\theta\exp(-q\theta)$ | $\exp(\theta)$ | $(0,\infty)$ | $\log(u)$ |

Note: $W(\cdot)$ denotes the principal branch of the Lambert W function.
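Every row of Table 1 has $a_0 = 1$, so under the MPS law the probability of no concurrent causes is $p_0 = 1/A(\phi(\theta))$, and the inverse $C^{-1}$ maps a target cure fraction to the parameter through $\theta = C^{-1}(1/p_0)$. The sketch below checks this numerically for four rows by comparing a truncated series $\sum_m a_m\,\phi(\theta)^m$ against $1/p_0$. It only illustrates the table's formulas and is not the article's estimation code; q = 2, the helper names, and the Newton-based Lambert W routine are assumptions for this sketch.

```python
import math

def lambert_w0(x, tol=1e-14, max_iter=200):
    """Principal branch of Lambert W for x >= -1/e, via Newton's method from w = 0."""
    if x < -1.0 / math.e:
        raise ValueError("x must be >= -1/e")
    w = 0.0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

Q = 2  # hypothetical value of q for the Bin and NB rows

# (a_m, phi(theta), C^{-1}(u)) for four rows of Table 1.
CASES = {
    "Bin": (lambda m: math.comb(Q, m), lambda th: th, lambda u: u ** (1.0 / Q) - 1.0),
    "Po": (lambda m: math.exp(-math.lgamma(m + 1)), lambda th: th, math.log),
    "NB": (lambda m: math.comb(m + Q - 1, m), lambda th: th, lambda u: 1.0 - u ** (-1.0 / Q)),
    "Lo": (lambda m: 1.0 / (m + 1), lambda th: th,
           lambda u: 1.0 + lambert_w0(-u * math.exp(-u)) / u),
}

def theta_from_cure_fraction(name, p0):
    """Solve P(M = 0) = a_0 / A(phi(theta)) = p0, using a_0 = 1 and C^{-1}."""
    return CASES[name][2](1.0 / p0)

def truncated_A(name, theta, n_terms=2000):
    """Truncated series A(phi(theta)) = sum_{m < n_terms} a_m * phi(theta)**m."""
    a_m, phi, _ = CASES[name]
    x = phi(theta)
    return sum(a_m(m) * x ** m for m in range(n_terms))
```

With $p_0 = 0.4$, each of the four rows returns a θ whose truncated series equals 2.5 to within $10^{-6}$; the Poisson coefficients use `lgamma` so that large m underflows to 0 instead of overflowing.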
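Table 2. Estimated bias, RMSE, and SE for the PPE-GB model with q = 1 and r = 2 under different scenarios (C = 10 and C = 14).

| C | α | Parameter | Bias (n = 500) | RMSE (n = 500) | SE (n = 500) | Bias (n = 750) | RMSE (n = 750) | SE (n = 750) | Bias (n = 1000) | RMSE (n = 1000) | SE (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 0.8 | $\beta_0$ | 1.526 | 2.103 | 3.269 | 1.404 | 1.607 | 0.803 | 1.375 | 1.466 | 0.523 |
| | | $\beta_{11}$ | −0.259 | 1.398 | 3.065 | −0.147 | 0.693 | 0.634 | −0.108 | 0.390 | 0.375 |
| | | $\beta_{12}$ | −0.410 | 1.429 | 3.048 | −0.282 | 0.729 | 0.621 | −0.253 | 0.450 | 0.364 |
| | | $\beta_{13}$ | 2.848 | 3.655 | 9.754 | 2.753 | 2.934 | 1.437 | 2.749 | 2.825 | 0.738 |
| | | $\beta_2$ | 0.002 | 0.009 | 0.009 | 0.001 | 0.007 | 0.007 | 0.001 | 0.006 | 0.006 |
| | | $\alpha$ | −0.136 | 0.178 | 0.115 | −0.149 | 0.174 | 0.091 | −0.152 | 0.171 | 0.078 |
| | | $\lambda_1$ | 0.081 | 0.091 | 0.045 | 0.076 | 0.083 | 0.036 | 0.076 | 0.082 | 0.031 |
| | | $\lambda_2$ | 0.097 | 0.105 | 0.042 | 0.097 | 0.103 | 0.034 | 0.094 | 0.098 | 0.029 |
| | | $\lambda_3$ | 0.264 | 0.269 | 0.080 | 0.260 | 0.264 | 0.064 | 0.258 | 0.261 | 0.055 |
| | 1 | $\beta_0$ | 1.709 | 2.186 | 2.537 | 1.634 | 1.874 | 1.203 | 1.573 | 1.660 | 0.562 |
| | | $\beta_{11}$ | −0.262 | 1.321 | 2.324 | −0.202 | 0.835 | 1.031 | −0.158 | 0.460 | 0.412 |
| | | $\beta_{12}$ | −0.468 | 1.375 | 2.305 | −0.403 | 0.891 | 1.015 | −0.351 | 0.545 | 0.399 |
| | | $\beta_{13}$ | 2.964 | 3.948 | 13.983 | 2.721 | 3.028 | 2.670 | 2.752 | 2.849 | 0.841 |
| | | $\beta_2$ | 0.001 | 0.009 | 0.010 | 0.001 | 0.007 | 0.008 | 0.001 | 0.006 | 0.007 |
| | | $\alpha$ | −0.114 | 0.227 | 0.186 | −0.130 | 0.192 | 0.148 | −0.133 | 0.184 | 0.127 |
| | | $\lambda_1$ | 0.091 | 0.104 | 0.053 | 0.086 | 0.095 | 0.042 | 0.084 | 0.090 | 0.036 |
| | | $\lambda_2$ | 0.100 | 0.108 | 0.043 | 0.100 | 0.106 | 0.035 | 0.099 | 0.104 | 0.030 |
| | | $\lambda_3$ | 0.268 | 0.273 | 0.076 | 0.266 | 0.269 | 0.061 | 0.266 | 0.268 | 0.053 |
| | 1.2 | $\beta_0$ | 2.226 | 3.217 | 7.454 | 1.945 | 2.431 | 2.326 | 1.797 | 1.920 | 0.733 |
| | | $\beta_{11}$ | −0.623 | 2.380 | 7.245 | −0.333 | 1.414 | 2.153 | −0.220 | 0.637 | 0.580 |
| | | $\beta_{12}$ | −0.894 | 2.451 | 7.223 | −0.607 | 1.487 | 2.134 | −0.471 | 0.744 | 0.565 |
| | | $\beta_{13}$ | 2.976 | 4.876 | 29.816 | 2.818 | 3.579 | 7.902 | 2.754 | 2.990 | 2.057 |
| | | $\beta_2$ | 0.001 | 0.009 | 0.010 | 0.001 | 0.008 | 0.008 | 0.001 | 0.007 | 0.007 |
| | | $\alpha$ | −0.039 | 0.353 | 0.299 | −0.082 | 0.255 | 0.229 | −0.095 | 0.209 | 0.195 |
| | | $\lambda_1$ | 0.104 | 0.123 | 0.063 | 0.099 | 0.110 | 0.051 | 0.095 | 0.103 | 0.043 |
| | | $\lambda_2$ | 0.105 | 0.114 | 0.045 | 0.105 | 0.111 | 0.037 | 0.106 | 0.110 | 0.032 |
| | | $\lambda_3$ | 0.277 | 0.282 | 0.075 | 0.272 | 0.275 | 0.060 | 0.271 | 0.273 | 0.052 |
| 14 | 0.8 | $\beta_0$ | 1.191 | 1.519 | 1.283 | 1.098 | 1.221 | 0.556 | 1.123 | 1.216 | 0.479 |
| | | $\beta_{11}$ | −0.128 | 0.842 | 1.083 | −0.074 | 0.387 | 0.391 | −0.067 | 0.344 | 0.336 |
| | | $\beta_{12}$ | −0.195 | 0.854 | 1.070 | −0.136 | 0.390 | 0.380 | −0.135 | 0.356 | 0.327 |
| | | $\beta_{13}$ | 2.859 | 3.314 | 4.944 | 2.796 | 2.878 | 0.921 | 2.753 | 2.793 | 0.474 |
| | | $\beta_2$ | 0.002 | 0.009 | 0.009 | 0.002 | 0.007 | 0.007 | 0.002 | 0.006 | 0.006 |
| | | $\alpha$ | −0.133 | 0.173 | 0.111 | −0.141 | 0.165 | 0.090 | −0.146 | 0.163 | 0.077 |
| | | $\lambda_1$ | 0.055 | 0.064 | 0.036 | 0.053 | 0.059 | 0.029 | 0.052 | 0.057 | 0.025 |
| | | $\lambda_2$ | 0.059 | 0.067 | 0.031 | 0.059 | 0.064 | 0.026 | 0.058 | 0.062 | 0.022 |
| | | $\lambda_3$ | 0.140 | 0.144 | 0.044 | 0.139 | 0.142 | 0.036 | 0.141 | 0.143 | 0.031 |
| | 1 | $\beta_0$ | 1.366 | 1.791 | 1.889 | 1.269 | 1.391 | 0.583 | 1.238 | 1.327 | 0.501 |
| | | $\beta_{11}$ | −0.194 | 1.110 | 1.688 | −0.098 | 0.439 | 0.417 | −0.076 | 0.360 | 0.355 |
| | | $\beta_{12}$ | −0.309 | 1.117 | 1.672 | −0.207 | 0.466 | 0.405 | −0.178 | 0.386 | 0.345 |
| | | $\beta_{13}$ | 2.888 | 3.605 | 8.757 | 2.792 | 2.888 | 0.959 | 2.776 | 2.824 | 0.505 |
| | | $\beta_2$ | 0.002 | 0.008 | 0.009 | 0.001 | 0.007 | 0.007 | 0.002 | 0.006 | 0.006 |
| | | $\alpha$ | −0.107 | 0.222 | 0.179 | −0.133 | 0.190 | 0.140 | −0.127 | 0.174 | 0.122 |
| | | $\lambda_1$ | 0.064 | 0.077 | 0.042 | 0.061 | 0.068 | 0.034 | 0.062 | 0.067 | 0.030 |
| | | $\lambda_2$ | 0.063 | 0.071 | 0.032 | 0.062 | 0.067 | 0.026 | 0.062 | 0.065 | 0.023 |
| | | $\lambda_3$ | 0.146 | 0.149 | 0.042 | 0.144 | 0.146 | 0.034 | 0.144 | 0.146 | 0.029 |
| | 1.2 | $\beta_0$ | 1.688 | 2.495 | 4.575 | 1.413 | 1.547 | 0.618 | 1.406 | 1.504 | 0.531 |
| | | $\beta_{11}$ | −0.420 | 1.847 | 4.375 | −0.146 | 0.498 | 0.449 | −0.131 | 0.438 | 0.384 |
| | | $\beta_{12}$ | −0.563 | 1.877 | 4.358 | −0.286 | 0.545 | 0.436 | −0.274 | 0.485 | 0.373 |
| | | $\beta_{13}$ | 2.731 | 3.766 | 12.322 | 2.767 | 2.954 | 1.973 | 2.700 | 2.790 | 0.745 |
| | | $\beta_2$ | 0.001 | 0.009 | 0.009 | 0.001 | 0.007 | 0.007 | 0.001 | 0.006 | 0.006 |
| | | $\alpha$ | −0.014 | 0.336 | 0.290 | −0.047 | 0.254 | 0.226 | −0.072 | 0.206 | 0.189 |
| | | $\lambda_1$ | 0.081 | 0.097 | 0.053 | 0.076 | 0.086 | 0.043 | 0.072 | 0.080 | 0.036 |
| | | $\lambda_2$ | 0.068 | 0.076 | 0.035 | 0.068 | 0.073 | 0.028 | 0.068 | 0.071 | 0.024 |
| | | $\lambda_3$ | 0.154 | 0.156 | 0.041 | 0.153 | 0.155 | 0.033 | 0.152 | 0.153 | 0.029 |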
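Tables 2 and 3 report bias, RMSE, and SE over Monte Carlo replicates. The sketch below uses the conventional definitions, which are assumptions here: bias is the mean estimate minus the true value, RMSE is the square root of the mean squared deviation from the true value, and SE is taken as the average of the per-replicate estimated standard errors (one common convention; the section does not state which convention the tables use). The name `mc_summary` is hypothetical.

```python
import math
from statistics import fmean

def mc_summary(estimates, std_errors, true_value):
    """Monte Carlo summaries for one parameter across replicates.

    Bias = mean(est) - true
    RMSE = sqrt(mean((est - true)^2))
    SE   = mean of the estimated standard errors (assumed convention)
    """
    bias = fmean(estimates) - true_value
    rmse = math.sqrt(fmean([(e - true_value) ** 2 for e in estimates]))
    se = fmean(std_errors)
    return bias, rmse, se
```

For example, estimates 1.0 and 3.0 of a parameter whose true value is 1.5 give bias 0.5 and RMSE √1.25 ≈ 1.118.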
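Table 3. Estimated bias, RMSE, and SE for the PPE-GB model with q = 1 and r = 2 under different scenarios (C = 18).

| C | α | Parameter | Bias (n = 500) | RMSE (n = 500) | SE (n = 500) | Bias (n = 750) | RMSE (n = 750) | SE (n = 750) | Bias (n = 1000) | RMSE (n = 1000) | SE (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 0.8 | $\beta_0$ | 1.024 | 1.313 | 1.168 | 0.968 | 1.083 | 0.525 | 0.971 | 1.067 | 0.454 |
| | | $\beta_{11}$ | −0.089 | 0.704 | 0.972 | −0.021 | 0.363 | 0.363 | −0.031 | 0.324 | 0.315 |
| | | $\beta_{12}$ | −0.118 | 0.701 | 0.960 | −0.046 | 0.341 | 0.354 | −0.056 | 0.324 | 0.307 |
| | | $\beta_{13}$ | 2.753 | 2.931 | 1.679 | 2.803 | 2.848 | 0.519 | 2.760 | 2.796 | 0.444 |
| | | $\beta_2$ | 0.002 | 0.008 | 0.008 | 0.002 | 0.006 | 0.007 | 0.002 | 0.006 | 0.006 |
| | | $\alpha$ | −0.136 | 0.176 | 0.109 | −0.143 | 0.166 | 0.088 | −0.146 | 0.162 | 0.076 |
| | | $\lambda_1$ | 0.042 | 0.052 | 0.031 | 0.040 | 0.047 | 0.025 | 0.040 | 0.045 | 0.022 |
| | | $\lambda_2$ | 0.042 | 0.049 | 0.027 | 0.041 | 0.046 | 0.021 | 0.041 | 0.044 | 0.019 |
| | | $\lambda_3$ | 0.092 | 0.095 | 0.031 | 0.092 | 0.094 | 0.025 | 0.091 | 0.092 | 0.022 |
| | 1 | $\beta_0$ | 1.108 | 1.366 | 0.975 | 1.086 | 1.312 | 0.727 | 1.042 | 1.132 | 0.468 |
| | | $\beta_{11}$ | −0.096 | 0.680 | 0.776 | −0.078 | 0.648 | 0.564 | −0.026 | 0.328 | 0.325 |
| | | $\beta_{12}$ | −0.147 | 0.674 | 0.763 | −0.131 | 0.650 | 0.554 | −0.080 | 0.327 | 0.316 |
| | | $\beta_{13}$ | 2.791 | 3.011 | 1.857 | 2.763 | 2.863 | 0.728 | 2.771 | 2.808 | 0.461 |
| | | $\beta_2$ | 0.002 | 0.008 | 0.008 | 0.002 | 0.007 | 0.007 | 0.002 | 0.006 | 0.006 |
| | | $\alpha$ | −0.111 | 0.206 | 0.174 | −0.125 | 0.186 | 0.138 | −0.127 | 0.173 | 0.119 |
| | | $\lambda_1$ | 0.051 | 0.062 | 0.037 | 0.049 | 0.056 | 0.030 | 0.048 | 0.054 | 0.026 |
| | | $\lambda_2$ | 0.045 | 0.052 | 0.027 | 0.046 | 0.050 | 0.022 | 0.046 | 0.049 | 0.019 |
| | | $\lambda_3$ | 0.097 | 0.099 | 0.030 | 0.095 | 0.096 | 0.024 | 0.095 | 0.097 | 0.021 |
| | 1.2 | $\beta_0$ | 1.278 | 1.760 | 1.980 | 1.157 | 1.276 | 0.564 | 1.151 | 1.244 | 0.485 |
| | | $\beta_{11}$ | −0.154 | 1.136 | 1.777 | −0.065 | 0.415 | 0.398 | −0.082 | 0.360 | 0.343 |
| | | $\beta_{12}$ | −0.231 | 1.146 | 1.763 | −0.139 | 0.423 | 0.386 | −0.147 | 0.376 | 0.333 |
| | | $\beta_{13}$ | 2.778 | 3.181 | 3.230 | 2.775 | 2.834 | 0.567 | 2.728 | 2.772 | 0.480 |
| | | $\beta_2$ | 0.001 | 0.008 | 0.009 | 0.002 | 0.007 | 0.007 | 0.002 | 0.006 | 0.006 |
| | | $\alpha$ | −0.021 | 0.330 | 0.279 | −0.054 | 0.220 | 0.215 | −0.068 | 0.200 | 0.183 |
| | | $\lambda_1$ | 0.066 | 0.081 | 0.047 | 0.062 | 0.070 | 0.037 | 0.060 | 0.067 | 0.032 |
| | | $\lambda_2$ | 0.051 | 0.058 | 0.030 | 0.049 | 0.054 | 0.024 | 0.048 | 0.052 | 0.020 |
| | | $\lambda_3$ | 0.100 | 0.103 | 0.029 | 0.099 | 0.100 | 0.023 | 0.100 | 0.101 | 0.020 |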
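Tables 2 and 3 summarize a simulation under censoring scenarios C = 10, 14, and 18, but the data-generating process is not restated in this section. The sketch below shows the standard competing-causes construction behind models of this type (a latent number of concurrent causes M, with the event time being the minimum of M latent times) under explicit simplifications: Poisson for M (one of the special cases listed in Table 1), exponential latent times instead of the PPE, and censoring times Uniform(0, C), which is only one possible reading of the scenario parameter C. All names are hypothetical.

```python
import math
import random

def poisson_draw(rng, mean):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_cure_data(n, theta, rate, cens_max, seed=1):
    """Right-censored data from a Poisson competing-causes cure model.

    M ~ Poisson(theta) latent causes; M == 0 means cured (event time = inf),
    otherwise the event time is the minimum of M Exp(rate) latent times.
    Censoring times are Uniform(0, cens_max), an assumed mechanism.
    """
    rng = random.Random(seed)
    times, events = [], []
    for _ in range(n):
        m = poisson_draw(rng, theta)
        t = min((rng.expovariate(rate) for _ in range(m)), default=math.inf)
        c = rng.uniform(0.0, cens_max)
        times.append(min(t, c))
        events.append(1 if t <= c else 0)
    return times, events
```

When cens_max is very large, the proportion of observed events approaches the susceptible fraction 1 − exp(−θ), which gives a quick check that the cure mechanism is wired correctly.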
Table 4. AIC and BIC criteria for PPE-MPScr and Wei-MPScr.

| Cure rate model | PPE Loglike | PPE AIC | PPE BIC | Weibull Loglike | Weibull AIC | Weibull BIC |
|---|---|---|---|---|---|---|
| Pocr | −1720.2 | 3502.4 | 3526.7 | −1752.3 | 3518.7 | 3591.0 |
| Locr | −1725.6 | 3513.2 | 3537.6 | −1766.1 | 3546.2 | 3618.6 |
| NBcr ($q = 1$) | −1713.7 | 3489.4 | 3513.7 | −1751.6 | 3517.3 | 3589.6 |
| NBcr ($q = 2$) | −1714.5 | 3490.9 | 3515.2 | −1749.5 | 3512.9 | 3585.3 |
| GBcr ($q = 1$, $r = 2$) | −1713.4 | **3488.8** | **3513.1** | −1748.5 | 3511.0 | 3583.3 |

Note: Values in bold indicate the lowest values for each criterion. Loglike: maximized log-likelihood; AIC: Akaike information criterion; BIC: Bayesian information criterion; Pocr: Poisson cure rate; Locr: logarithmic cure rate; NBcr: negative binomial cure rate; GBcr: generalized binomial cure rate.
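Table 4 ranks the fits by AIC and BIC. Assuming the standard definitions AIC = −2ℓ + 2k and BIC = −2ℓ + k log n (ℓ the maximized log-likelihood, k the number of parameters, n the sample size), the sketch below transcribes the log-likelihood and AIC columns, picks the lowest AIC, computes AIC differences, and backs out the k implied by each row as a rounding-tolerant consistency check. BIC is not recomputed because n is not restated in this section; the variable and function names are illustrative.

```python
# (log-likelihood, AIC) pairs transcribed from Table 4.
TABLE4 = {
    ("Pocr", "PPE"): (-1720.2, 3502.4),
    ("Locr", "PPE"): (-1725.6, 3513.2),
    ("NBcr q=1", "PPE"): (-1713.7, 3489.4),
    ("NBcr q=2", "PPE"): (-1714.5, 3490.9),
    ("GBcr q=1, r=2", "PPE"): (-1713.4, 3488.8),
    ("Pocr", "Weibull"): (-1752.3, 3518.7),
    ("Locr", "Weibull"): (-1766.1, 3546.2),
    ("NBcr q=1", "Weibull"): (-1751.6, 3517.3),
    ("NBcr q=2", "Weibull"): (-1749.5, 3512.9),
    ("GBcr q=1, r=2", "Weibull"): (-1748.5, 3511.0),
}

def implied_num_params(loglik, aic):
    """Back out k from AIC = -2*loglik + 2k; table values are rounded to 0.1."""
    return round((aic + 2.0 * loglik) / 2.0)

best_model = min(TABLE4, key=lambda key: TABLE4[key][1])
best_aic = TABLE4[best_model][1]
delta_aic = {key: aic - best_aic for key, (_, aic) in TABLE4.items()}
implied_k = {key: implied_num_params(ll, aic) for key, (ll, aic) in TABLE4.items()}
```

Under these definitions, every PPE row implies k = 31 and every Weibull row implies k = 7, and the GBcr model with PPE baseline has the lowest AIC (3488.8).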
Table 5. Estimates and standard errors (in parentheses) for the PE-GBcr and WEI-GBcr models with q = 1 and r = 2 for the lobular carcinoma data.

GBcr-PE (q = 1 and r = 2)

| Regression coefficients | Rates $\lambda_1$ to $\lambda_9$ | Rates $\lambda_{10}$ to $\lambda_{18}$ | Rates $\lambda_{19}$ to $\lambda_{25}$ |
|---|---|---|---|
| $\hat{\beta}_0$ = 4.2648 (0.4299) | $\hat{\lambda}_1$ = 0.0302 (0.0113) | $\hat{\lambda}_{10}$ = 0.0401 (0.0155) | $\hat{\lambda}_{19}$ = 0.0362 (0.0153) |
| $\hat{\beta}_{11}$ = 1.4223 (0.2481) | $\hat{\lambda}_2$ = 0.0185 (0.0070) | $\hat{\lambda}_{11}$ = 0.0515 (0.0196) | $\hat{\lambda}_{20}$ = 0.0957 (0.0400) |
| $\hat{\beta}_{12}$ = 2.4360 (0.2434) | $\hat{\lambda}_3$ = 0.0147 (0.0054) | $\hat{\lambda}_{12}$ = 0.0436 (0.0169) | $\hat{\lambda}_{21}$ = 0.0770 (0.0330) |
| $\hat{\beta}_{13}$ = 3.8478 (0.2545) | $\hat{\lambda}_4$ = 0.0294 (0.0110) | $\hat{\lambda}_{13}$ = 0.0396 (0.0156) | $\hat{\lambda}_{22}$ = 0.0446 (0.0197) |
| $\hat{\beta}_2$ = 0.0142 (0.0035) | $\hat{\lambda}_5$ = 0.0313 (0.0118) | $\hat{\lambda}_{14}$ = 0.0487 (0.0193) | $\hat{\lambda}_{23}$ = 0.0579 (0.0263) |
| | $\hat{\lambda}_6$ = 0.0269 (0.0102) | $\hat{\lambda}_{15}$ = 0.0489 (0.0192) | $\hat{\lambda}_{24}$ = 0.0793 (0.0379) |
| | $\hat{\lambda}_7$ = 0.0202 (0.0078) | $\hat{\lambda}_{16}$ = 0.0494 (0.0197) | $\hat{\lambda}_{25}$ = 0.2693 (0.1463) |
| | $\hat{\lambda}_8$ = 0.0392 (0.0147) | $\hat{\lambda}_{17}$ = 0.0776 (0.0313) | |
| | $\hat{\lambda}_9$ = 0.0206 (0.0079) | $\hat{\lambda}_{18}$ = 0.0890 (0.0363) | |

GBcr-WEI (q = 1 and r = 2)

| Regression coefficients | Weibull parameters |
|---|---|
| $\hat{\beta}_0$ = 3.6481 (0.5062) | $\hat{\alpha}$ = 4.3117 (0.3998) |
| $\hat{\beta}_{11}$ = 1.3520 (0.2417) | $\hat{\nu}$ = 1.3298 (0.0554) |
| $\hat{\beta}_{12}$ = 2.3024 (0.2405) | |
| $\hat{\beta}_{13}$ = 3.6081 (0.2621) | |
| $\hat{\beta}_2$ = 0.0129 (0.0034) | |
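Table 5 reports each estimate with its standard error in parentheses. Assuming approximate normality of the maximum likelihood estimators, a Wald z statistic, a two-sided p-value, and a 95% interval can be read off each entry. The sketch below is a generic helper (`wald_summary` is a hypothetical name), not the procedure behind the cure rate intervals in Figure 5, which this section does not describe.

```python
from statistics import NormalDist

def wald_summary(estimate, se, level=0.95):
    """Wald z statistic, two-sided p-value and confidence interval."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    z = estimate / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return z, p_value, (estimate - z_crit * se, estimate + z_crit * se)
```

For $\hat{\beta}_2$ = 0.0142 (0.0035) in the GBcr-PE fit, this gives z ≈ 4.06 and an interval of roughly (0.0073, 0.0211). For positive rate parameters such as the $\hat{\lambda}_j$, an interval built on the log scale and transformed back avoids negative lower limits; that is a design alternative, not something stated in the source.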

Gómez, Y.M.; Santibañez, J.L.; Calsavara, V.F.; Gómez, H.W.; Gallardo, D.I. A Modified Cure Rate Model Based on a Piecewise Distribution with Application to Lobular Carcinoma Data. Mathematics 2024, 12, 883. https://doi.org/10.3390/math12060883
