Article

Spline Analysis of Biomarker Data Pooled from Multiple Matched/Nested Case–Control Studies

1
Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02215, USA
2
Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20814, USA
3
Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02215, USA
4
Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA 02215, USA
5
Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02215, USA
*
Author to whom correspondence should be addressed.
Cancers 2022, 14(11), 2783; https://doi.org/10.3390/cancers14112783
Submission received: 5 April 2022 / Revised: 23 May 2022 / Accepted: 31 May 2022 / Published: 3 June 2022
(This article belongs to the Special Issue The Application of Biostatistics in Cancers)

Simple Summary

This paper presents methods for pooling continuous biomarker measurements from multiple studies to estimate dose–response curves that allow for a nonlinear association between biomarker values and disease risks in matched/nested case–control studies. The approach can be easily applied to pooling projects of cancer studies, and user-friendly software for implementing the method can be found on the corresponding author’s website.

Abstract

Pooling biomarker data across multiple studies enables researchers to obtain precise estimates of the association between biomarker measurements and disease risks due to increased sample sizes. However, biomarker measurements often vary significantly across different assays and laboratories; therefore, calibration of the local laboratory measurements to a reference laboratory is necessary before pooling data. We propose two methods for estimating the dose–response curves that allow for a nonlinear association between the continuous biomarker measurements and log relative risk in pooling projects of matched/nested case–control studies. Our methods are based on full calibration and internalized calibration methods. The full calibration method uses calibrated biomarker measurements for all subjects, even for people with reference laboratory measurements, while the internalized calibration method uses the reference laboratory measurements when available and otherwise uses the calibrated biomarker measurements. We conducted simulation studies to compare these methods, as well as a naive method, where data are pooled without calibration. Our simulation and theoretical results suggest that, in estimating the dose–response curves for biomarker-disease relationships, the internalized and full calibration methods perform substantially better than the naive method, and the full calibration approach is the preferred method for calibrating biomarker measurements. We apply our methods in a pooling project of nested case–control studies to estimate the association of circulating Vitamin D levels with risk of colorectal cancer.

1. Introduction

Pooling biomarker data from multiple studies can result in more precise estimates of the associations between the biomarker levels and disease risks due to increased sample sizes and can also facilitate subgroup analysis [1,2,3]. Examples of pooling projects examining biomarker–disease associations include the Circulating 25-Hydroxyvitamin D (25(OH)D) and Colorectal Cancer [4], the 25(OH)D and the Risk of Rarer Cancers [5], the Endogenous Hormones, Nutritional Biomarkers and Prostate Cancer Collaborative Group Studies [6,7] and the NCI Breast and Prostate Cancer Cohort Consortium [8].
A statistical challenge in pooled analysis is potential between-study variation in the biomarker measurements stemming from assay or laboratory differences. For instance, measurements of serum 25(OH)D concentration can vary by 20–40% across different assays, laboratories and even seasons of the year [9,10,11]. Hormones such as estradiol and testosterone also vary widely across assays and laboratories [1,2,3]. Sloan et al. [12] proposed methods for pooling biomarker data from matched/nested case–control studies based on the regression calibration approach [13,14], which is widely used for handling covariate measurement error problems. Specifically, a calibration procedure is performed by first selecting a reference laboratory, to which a subset of biospecimens from each study is sent for re-assaying. A calibration model is then fitted for each study based on the study-specific local laboratory and the reference laboratory measurements among the subset of biospecimens selected for re-assaying. The models are used to impute the reference laboratory measurements for the remaining observations with only local laboratory measurements available. Due to their rarity, cases are not usually used for assay calibration. Instead, controls are usually chosen for re-assaying in a reference laboratory [15].
However, the existing methods assume that biomarker measurements have a linear relationship with the log relative risk (RR) of diseases [12]. A nonlinear dose–response relationship between biomarker measurement and disease risk is often observed in practice [16,17]. Restricting the biomarker–disease relationship to be linear may lead to biased estimates when the true association is nonlinear. In this paper, we propose a method for pooling matched/nested case–control studies that allows for a nonlinear relationship between biomarker levels and log relative risk of diseases by using spline functions [12,18]. We first calibrate biomarker measurements across multiple studies. We then obtain estimates for the coefficients of the spline functions based on an approximate conditional likelihood. We also propose an analytic variance formula for the estimated parameters of interest, which takes account of the uncertainty due to estimating the calibration parameters.
In Section 2, we present the models and statistical methods. In Section 3, we evaluate the performance of our methods in simulation studies. In Section 4, we apply the methods to a pooling project investigating the relationship between circulating Vitamin D levels and colorectal cancer, and finally, we present a discussion in Section 5.

2. Methods

2.1. Notation and Model

The logistic regression model with spline functions can be written as:
$$\operatorname{logit}\big(P(Y_{sji} = 1 \mid X_{sji}, Z_{sji})\big) = \beta_{0sj} + \beta_X^T f(X_{sji}) + \beta_Z^T Z_{sji},$$
where $s \in \{1, \dots, S\}$ indexes the studies in the pooling project, of which the first $Q$ studies only use local laboratories for the measurement of biomarkers and thus require calibration; $n_{sj}$ and $m_{sj}$ denote the numbers of cases and controls, respectively, in the $j$-th stratum of the $s$-th study, and within each stratum, $i \in \{1, 2, \dots, m_{sj}\}$ denotes controls and $i \in \{m_{sj}+1, m_{sj}+2, \dots, m_{sj}+n_{sj}\}$ denotes cases. $Y_{sji}$ denotes the binary disease status; $X_{sji}$ is the biomarker measurement from the reference laboratory, which is included in the model through a nonlinear function $f(\cdot)$, and $Z_{sji}$ is a $p$-dimensional vector of potential confounders. $\beta_{0sj}$ is the study- and stratum-specific intercept, and $\beta_X$ and $\beta_Z$ are vectors of the corresponding regression coefficients. Unless otherwise specified, all vectors are column vectors in this paper.
Note that, in nested case–control studies with density sampling [19], $\beta_X^T \{f(X) - f(X')\}$ represents the log relative risk (RR) comparing two distinct biomarker levels $X$ and $X'$. We focus on methods for point and interval estimates of $\beta_X$ in this paper. To model the possible nonlinear relationship between the biomarker and disease risk, we propose to use the restricted cubic spline method [18]. Our approach works for other spline functions as well. The restricted cubic spline method has the advantage of being parsimonious while allowing for flexibility in characterizing nonlinear curves. Typically, the model fit is not heavily affected by the location of the knots, but depends more on the number of knots selected. The recommended number of knots is between 3 and 7 [20].
Let $W_{sji}$ be the biomarker measurements from local laboratories. For each study using a local laboratory, a subset of samples was selected and sent to the reference laboratory for re-assaying to obtain reference biomarker measurements; we refer to this subset as the calibration subset. Therefore, for studies that use local laboratories for biomarker measurements, $X_{sji}$ are available only in the calibration subsets, and $W_{sji}$ are available for all individuals. Since the local laboratory measurements can vary systematically across different studies, using $W_{sji}$ instead of $X_{sji}$ in data analysis can lead to biased estimates of the biomarker–disease relationship.

2.2. Approximate Conditional Likelihood and Calibration Methods

Let vectors $X_{sj}$ and $W_{sj}$ and matrix $Z_{sj}$ contain the corresponding measurements of individuals from the $j$-th stratum of the $s$-th study. The conditional likelihood contribution from the $j$-th stratum of the $s$-th study using the reference laboratory measurements is:
$$
\begin{aligned}
L_{sj} &= P\Big(Y_{sj1} = 0, \dots, Y_{sj,m_{sj}} = 0,\ Y_{sj,m_{sj}+1} = 1, \dots, Y_{sj,m_{sj}+n_{sj}} = 1 \,\Big|\, X_{sj}, Z_{sj}, \sum_{i=1}^{m_{sj}+n_{sj}} Y_{sji} = n_{sj}\Big) \\
&= \frac{\prod_{l=1}^{n_{sj}} \exp\big\{\beta_{0sj} + \beta_X^T f(X_{sj,m_{sj}+l}) + \beta_Z^T Z_{sj,m_{sj}+l}\big\}}{\sum_{(i_1, \dots, i_{n_{sj}}) \in A_{sj}} \prod_{l=1}^{n_{sj}} \exp\big\{\beta_{0sj} + \beta_X^T f(X_{sj,i_l}) + \beta_Z^T Z_{sj,i_l}\big\}} \\
&= \Bigg[1 + \sum_{(i_1, \dots, i_{n_{sj}}) \in A^{*}_{sj}} \exp\Big\{\beta_X^T \sum_{l=1}^{n_{sj}} \big[f(X_{sj,i_l}) - f(X_{sj,m_{sj}+l})\big] + \beta_Z^T \sum_{l=1}^{n_{sj}} \big[Z_{sj,i_l} - Z_{sj,m_{sj}+l}\big]\Big\}\Bigg]^{-1},
\end{aligned}
\tag{1}
$$
where $A_{sj}$ is the set of all subsets of indices of size $n_{sj}$ from $\{1, 2, \dots, m_{sj}, m_{sj}+1, \dots, m_{sj}+n_{sj}\}$; $\{i_1, i_2, \dots, i_{n_{sj}}\}$ corresponds to one specific such subset in $A_{sj}$, and $A^{*}_{sj}$ is $A_{sj}$ with the subset $\{i_1 = m_{sj}+1, i_2 = m_{sj}+2, \dots, i_{n_{sj}} = m_{sj}+n_{sj}\}$ excluded [19].
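To make the structure of Equation (1) concrete, the following minimal Python sketch evaluates the conditional likelihood of a single stratum by enumerating all index subsets of size $n_{sj}$. It is an illustration only, not the authors' software: the function name `conditional_likelihood` and its argument layout (controls first, cases last, linear predictors already computed) are assumptions made here. The stratum intercept $\beta_{0sj}$ cancels and is therefore omitted.

```python
from itertools import combinations
from math import exp


def conditional_likelihood(eta, n_cases):
    """Conditional likelihood of one matched stratum (hypothetical helper).

    eta     : linear predictors beta_X^T f(X) + beta_Z^T Z for every subject
              in the stratum, ordered with controls first and cases last.
    n_cases : number of cases in the stratum.
    """
    n_controls = len(eta) - n_cases
    observed = sum(eta[n_controls:])  # linear predictor total of the true cases
    denominator = 0.0
    # Enumerate every candidate case set; the observed case set contributes
    # exp(0) = 1, which is the leading "1 +" in the last line of Equation (1).
    for subset in combinations(range(len(eta)), n_cases):
        denominator += exp(sum(eta[i] for i in subset) - observed)
    return 1.0 / denominator


# 1:1 matching: one control (eta = 0.2) and one case (eta = 0.7)
pair_likelihood = conditional_likelihood([0.2, 0.7], n_cases=1)
```

For 1:1 matching the enumeration collapses to the familiar form $1/[1 + \exp(\eta_{\text{control}} - \eta_{\text{case}})]$. Full enumeration grows combinatorially with stratum size, so this direct form is only practical for small strata.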
However, the conditional likelihood function based on $L_{sj}$ cannot be calculated directly, since $X_{sj}$ is not available for all individuals in the studies that only use local laboratories to measure the biomarkers. To derive an approximate observed conditional likelihood under the matched/nested case–control study design, we make the following ‘surrogacy’ assumption [12], which takes into account the study design:
$$P\Big(Y_{sj} \,\Big|\, X_{sj}, W_{sj}, Z_{sj}, \sum_{i=1}^{n_{sj}+m_{sj}} Y_{sji} = n_{sj}\Big) = P\Big(Y_{sj} \,\Big|\, X_{sj}, Z_{sj}, \sum_{i=1}^{n_{sj}+m_{sj}} Y_{sji} = n_{sj}\Big).$$
This assumption implies that the local laboratory measurements $W_{sj}$ do not contain additional information about the outcome, given the reference laboratory measurements, other covariates of interest, and the matched/nested case–control study design scheme.
Under this surrogacy assumption, the observed likelihood contribution from a stratum based on local laboratory biomarker measurements,
$$L^{*}_{sj} = P\Big(Y_{sj1} = 0, \dots, Y_{sj,m_{sj}} = 0,\ Y_{sj,m_{sj}+1} = 1, \dots, Y_{sj,m_{sj}+n_{sj}} = 1 \,\Big|\, W_{sj}, Z_{sj}, \sum_{i=1}^{m_{sj}+n_{sj}} Y_{sji} = n_{sj}\Big),$$
can be written as:
$$L^{*}_{sj} = E_{X_{sj} \mid W_{sj}, Z_{sj}, \sum_{i=1}^{n_{sj}+m_{sj}} Y_{sji} = n_{sj}}\big(L_{sj}\big), \tag{2}$$
where $L_{sj}$ is defined in Equation (1). For Equation (2), we further expand $L_{sj}$ in a Taylor series around $\tilde{X}_{sj} = E\big(X_{sj} \mid W_{sj}, Z_{sj}, \sum_{i=1}^{m_{sj}+n_{sj}} Y_{sji} = n_{sj}\big)$, yielding the following approximate likelihood contribution from the $j$-th stratum of the $s$-th study, in which the sum runs over $A^{*}_{sj}$, the index subsets of size $n_{sj}$ other than the observed case set:
$$\tilde{L}_{sj} = \Bigg[1 + \sum_{(i_1, \dots, i_{n_{sj}}) \in A^{*}_{sj}} \exp\Big\{\beta_X^T \sum_{l=1}^{n_{sj}} \big[f(\tilde{X}_{sj,i_l}) - f(\tilde{X}_{sj,m_{sj}+l})\big] + \beta_Z^T \sum_{l=1}^{n_{sj}} \big[Z_{sj,i_l} - Z_{sj,m_{sj}+l}\big]\Big\}\Bigg]^{-1}.$$
This approximation performs best when the conditional variance and covariance of $X_{sj}$ are small or when the biomarker effect is not strong. Section S1 of the Supplementary Materials provides a detailed derivation of the approximate observed conditional likelihood and the conditions when the approximation works well.
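The role of the conditional variance can be seen from a generic second-order expansion of a smooth function around a conditional mean. The block below is only a sketch of that standard argument, not the derivation given in Section S1; the symbols $g$, $\mu$ and $H$ are notation introduced here for illustration, and the design conditioning is suppressed for brevity.

```latex
E\{g(X) \mid W\} \;\approx\; g(\mu)
  + \tfrac{1}{2}\,\operatorname{tr}\!\big\{H(\mu)\,\operatorname{Var}(X \mid W)\big\},
\qquad
\mu = E(X \mid W), \quad
H(\mu) = \left.\frac{\partial^2 g(x)}{\partial x\,\partial x^T}\right|_{x=\mu}
```

The first-order term vanishes because $E(X - \mu \mid W) = 0$. Taking $g$ to be the stratum likelihood and dropping the second-order term leaves the likelihood evaluated at the conditional mean, so the neglected term is small when the conditional variance is small, or when the likelihood is nearly flat in the biomarker, as happens when the biomarker effect is weak.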
To obtain an estimate of $\tilde{X}_{sji}$ for the studies that use local laboratories, the study-specific calibration models can be fitted in the calibration subsets, where subjects were selected for re-assaying in the reference laboratory; these subjects therefore have biomarker measurements from both the local and reference laboratories. The calibration models can thus be used to impute the reference biomarker measurements for the remaining subjects of each study that only have local laboratory measurements.
We make the calibration assumption [12] that, given the local laboratory measurements, the mean reference laboratory measurements are approximately independent from other covariates, that is:
$$E\Big(X_{sj} \,\Big|\, W_{sj}, Z_{sj}, \sum_{i=1}^{m_{sj}+n_{sj}} Y_{sji} = n_{sj}\Big) \approx E\Big(X_{sj} \,\Big|\, W_{sj}, \sum_{i=1}^{m_{sj}+n_{sj}} Y_{sji} = n_{sj}\Big).$$
We further assume a linear relationship between the reference laboratory measurements and local laboratory measurements, leading to the following calibration model:
$$E\Big(X_{sji} \,\Big|\, W_{sji}, \sum_{i=1}^{m_{sj}+n_{sj}} Y_{sji} = n_{sj}\Big) = a_s + b_s W_{sji}, \tag{3}$$
where $a_s$ and $b_s$ are study-specific model parameters. Note that these calibration parameters are the same across different strata in each study. However, we can relax this constraint by including matching factors in the calibration model; that is, assuming $E\big(X_{sji} \mid W_{sji}, M_{sji}, \sum_{i=1}^{m_{sj}+n_{sj}} Y_{sji} = n_{sj}\big) = a_s + b_s W_{sji} + c_s^T M_{sji}$, where $M_{sji}$ is the vector of matching factors. Sloan et al. [15] suggested that Model (3) is sufficient in most study settings. In (3), although a linear term of $W_{sji}$ is typically sufficient to model the $X_{sji}$–$W_{sji}$ relationship, nonlinear terms in $W_{sji}$ can also be included if appropriate.
The study-specific calibration models are usually fitted only among controls because case biospecimens are often not available in the calibration study subsets. Therefore, the calibration model in practice is typically:
$$E\big(X_{sji} \mid W_{sji}, Y_{sji} = 0\big) = a_{s,co} + b_{s,co} W_{sji},$$
where we use $a_{s,co}$ and $b_{s,co}$ to denote the parameters in the calibration model fitted among controls only. Note that $\hat{a}_{s,co}$ and $\hat{b}_{s,co}$ are generally not consistent estimates of $a_s$ and $b_s$ in (3). Sloan et al. [12] provide conditions for $\hat{a}_{s,co} \approx \hat{a}_s$ and $\hat{b}_{s,co} \approx \hat{b}_s$ under the bivariate normality of $X_{sj}$ and $W_{sj}$ in a 1:1 matched/nested case–control study. It is straightforward to generalize their results to the $n_{sj}:m_{sj}$ matching scenarios. Specifically, $\hat{b}_{s,co} \approx \hat{b}_s$ when $Var\big(X_{sj} \mid \sum_{i=1}^{m_{sj}+n_{sj}} Y_{sji} = n_{sj}\big) \approx Var(X_{sj} \mid Y_{sj} = 0)$, and $\hat{a}_{s,co} \approx \hat{a}_s$ when $Var\big(X_{sj} \mid \sum_{i=1}^{m_{sj}+n_{sj}} Y_{sji} = n_{sj}\big) \approx Var(X_{sj} \mid Y_{sj} = 0)$ and $E\big(X_{sj} \mid \sum_{i=1}^{m_{sj}+n_{sj}} Y_{sji} = n_{sj}\big) \approx E(X_{sj} \mid Y_{sj} = 0)$. In addition, if the biomarker effect is small (i.e., $\beta_X \approx 0$), $\hat{a}_{s,co}$ and $\hat{b}_{s,co}$ will also be close to $\hat{a}_s$ and $\hat{b}_s$.
In studies that used the local laboratory for measurement, for the internalized calibration method, the biomarker value in the approximate likelihood $\tilde{L}_{sj}$ is $\tilde{X}_{sji} = X_{sji}$ if the reference laboratory measurement $X_{sji}$ is available, and $\tilde{X}_{sji} = \hat{E}(X_{sji} \mid W_{sji}, Y_{sji} = 0)$ otherwise. For the full calibration method, $\tilde{X}_{sji} = \hat{E}(X_{sji} \mid W_{sji}, Y_{sji} = 0)$ regardless of whether the reference laboratory measurement $X_{sji}$ is available or not [12]. Therefore, for studies using local laboratories for biomarker measurement, all participants’ biomarker measurements are calibrated under the full calibration method while, under the internalized calibration method, the biomarker measurements are calibrated only for participants who have local laboratory measurements alone.
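The difference between the two calibration rules can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the helper names `fit_calibration` and `calibrated_values` are hypothetical, the calibration model is the simple linear one fitted by ordinary least squares, and a missing reference measurement is encoded as `None`.

```python
def fit_calibration(w_cal, x_cal):
    """Least-squares fit of reference values x on local values w.

    Returns (a_hat, b_hat) for the linear calibration model E(X | W) = a + b W.
    In practice the calibration subset consists of re-assayed controls.
    """
    n = len(w_cal)
    w_bar = sum(w_cal) / n
    x_bar = sum(x_cal) / n
    s_ww = sum((w - w_bar) ** 2 for w in w_cal)
    s_wx = sum((w - w_bar) * (x - x_bar) for w, x in zip(w_cal, x_cal))
    b_hat = s_wx / s_ww
    a_hat = x_bar - b_hat * w_bar
    return a_hat, b_hat


def calibrated_values(w, x_ref, a_hat, b_hat, method="full"):
    """Biomarker values entering the approximate likelihood.

    w      : local laboratory measurements for all subjects of one study.
    x_ref  : reference measurements, None where the subject was not re-assayed.
    method : "full" calibrates everyone; "internalized" keeps an available
             reference measurement and calibrates only the remaining subjects.
    """
    values = []
    for w_i, x_i in zip(w, x_ref):
        if method == "internalized" and x_i is not None:
            values.append(x_i)
        else:
            values.append(a_hat + b_hat * w_i)
    return values


# Toy data (made up for illustration): calibration subset lies on x = 1 + 2 w
a_hat, b_hat = fit_calibration([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
w_all = [1.0, 2.0, 3.0]
x_ref = [3.5, None, None]  # only the first subject was re-assayed
full = calibrated_values(w_all, x_ref, a_hat, b_hat, method="full")
internal = calibrated_values(w_all, x_ref, a_hat, b_hat, method="internalized")
```

In this toy example the two methods differ only for the re-assayed subject: full calibration replaces the observed reference value 3.5 with the fitted value, whereas internalized calibration keeps it.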

2.3. Parameter Estimation

Define $a = (a_1, a_2, \dots, a_Q)$, $b = (b_1, b_2, \dots, b_Q)$ and the dose–response parameters $\beta = (\beta_X, \beta_Z)$. The collective set of parameters to be estimated is therefore $\theta = (a, b, \beta)$. The joint estimating equations are $(\psi_a, \psi_b, \psi_{\beta_X}, \psi_{\beta_Z}) = 0$, where $\psi_a, \psi_b, \psi_{\beta_X}, \psi_{\beta_Z}$ are the estimating functions for their corresponding parameters. Section S2 in the Supplementary Materials contains the technical details.
We can obtain the point estimate of $\beta$ using a two-step pseudo-maximum likelihood method [21]. In the first step, the estimates of $a$ and $b$ of the calibration models are obtained by fitting linear regressions on the subset of controls chosen for re-assaying in the reference laboratory. In the second step, $\hat{\beta}$ is obtained using the pseudo-maximum conditional likelihood method by solving the estimating equations $\big(\psi_{\beta_X}(\hat{a}, \hat{b}), \psi_{\beta_Z}(\hat{a}, \hat{b})\big) = 0$, where $\hat{a}$ and $\hat{b}$ are the estimates obtained in the first step.
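The two-step procedure can be sketched for a deliberately simplified setting: a single study, 1:1 matching, full calibration, no confounders, and a linear $f(X) = X$ so that $\beta$ is a scalar. These simplifications, the function names and the toy data are all assumptions made for illustration; the spline model in this paper has a vector $\beta_X$ and requires a multivariate solver.

```python
from math import exp, log


def fit_calibration(w_cal, x_cal):
    """Step 1: least-squares fit of E(X | W) = a + b W on the calibration subset."""
    n = len(w_cal)
    w_bar, x_bar = sum(w_cal) / n, sum(x_cal) / n
    b_hat = (sum((w - w_bar) * (x - x_bar) for w, x in zip(w_cal, x_cal))
             / sum((w - w_bar) ** 2 for w in w_cal))
    return x_bar - b_hat * w_bar, b_hat


def fit_matched_pairs(diffs, beta=0.0, tol=1e-10, max_iter=50):
    """Step 2: Newton-Raphson for the 1:1 conditional likelihood.

    diffs holds (case value - control value) for each pair. The log-likelihood
    is sum_j -log(1 + exp(-beta * d_j)). Assumes the differences are neither
    all zero nor all of one sign, otherwise no finite maximum exists.
    """
    for _ in range(max_iter):
        score, information = 0.0, 0.0
        for d in diffs:
            p = 1.0 / (1.0 + exp(-beta * d))  # probability the true case is the case
            score += d * (1.0 - p)
            information += d * d * p * (1.0 - p)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta


def two_step_full_calibration(w_cal, x_cal, pairs_w):
    """pairs_w: (control local value, case local value) for each matched pair."""
    a_hat, b_hat = fit_calibration(w_cal, x_cal)
    # Full calibration: both members of a pair use a_hat + b_hat * W, so the
    # intercept a_hat drops out of every within-pair difference.
    diffs = [(a_hat + b_hat * w_case) - (a_hat + b_hat * w_ctrl)
             for w_ctrl, w_case in pairs_w]
    return a_hat, b_hat, fit_matched_pairs(diffs)


a_hat, b_hat, beta_hat = two_step_full_calibration(
    [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0],
    [(1.0, 1.5), (2.0, 2.5), (0.0, 0.5), (1.5, 1.0)],
)
```

The sketch also makes one property visible: under full calibration with a linear $f$, the calibration intercept cancels within each pair, so only the slope estimate affects $\hat{\beta}$.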
We can use the sandwich variance formula over the joint estimating equations $[\psi_a, \psi_b, \psi_{\beta_X}, \psi_{\beta_Z}] = 0$ to estimate $Var(\hat{\theta})$. See Section S3 of the Supplementary Materials for technical details.
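For orientation, the generic form of a sandwich variance estimator for stacked estimating equations is recalled below. This is the textbook form only, written under the assumption that the estimating functions decompose into contributions $\psi_k$ from independent units indexed by $k$; the matrices $\hat{A}$ and $\hat{B}$ are notation introduced here, and the exact expressions for this problem are those in Section S3.

```latex
\widehat{Var}(\hat{\theta}) = \hat{A}^{-1}\,\hat{B}\,\big(\hat{A}^{-1}\big)^{T},
\qquad
\hat{A} = \sum_{k} \left.\frac{\partial \psi_k(\theta)}{\partial \theta^{T}}\right|_{\theta = \hat{\theta}},
\qquad
\hat{B} = \sum_{k} \psi_k(\hat{\theta})\,\psi_k(\hat{\theta})^{T},
\qquad
\psi_k = \big(\psi_{a,k},\ \psi_{b,k},\ \psi_{\beta_X,k},\ \psi_{\beta_Z,k}\big)
```

Because the calibration equations do not involve $\beta$, the rows of $\hat{A}$ for $(a, b)$ have zero derivatives with respect to $\beta$, while the rows for $\beta$ depend on $(a, b)$ through the calibrated values. Those off-diagonal terms are what carry the calibration uncertainty into the variance of $\hat{\beta}$.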

3. Simulations

We performed a simulation study for a 1:1 matched case–control study design. Define $\epsilon_{sji}$ as the error term in the linear model of $X_{sji}$ on $W_{sji}$. We assumed a similar multivariate normal distribution of $X_{sji}$, $W_{sji}$ and $\epsilon_{sji}$, as in Sloan et al. [12], such that:
$$
\begin{pmatrix} X_{sji} \\ W_{sji} \\ \epsilon_{sji} \end{pmatrix}
\sim \mathrm{MVN}\left(
\begin{pmatrix} \mu_x \\ (\mu_x - a_s)/b_s \\ 0 \end{pmatrix},
\begin{pmatrix}
\sigma_x^2 & b_s \sigma_{ws}^2 & \sigma_x^2 - b_s^2 \sigma_{ws}^2 \\
b_s \sigma_{ws}^2 & \sigma_{ws}^2 & 0 \\
\sigma_x^2 - b_s^2 \sigma_{ws}^2 & 0 & \sigma_x^2 - b_s^2 \sigma_{ws}^2
\end{pmatrix}
\right).
$$
This distribution yields the calibration model $E(X_{sji} \mid W_{sji}) = a_s + b_s W_{sji}$ and $Cov(W_{sji}, \epsilon_{sji}) = 0$. The data were generated for each stratum of each study first, and then a case and a control were randomly chosen in each stratum. In the simulation, we set $\mu_x = 0$ and $\sigma_x^2 = 1$. We assumed four studies in the pooled analysis with 500 case–control pairs (i.e., 1000 total subjects) in each study, and the calibration parameters were set to be $a = (3, 1, 1, 3)$ and $b = (0.5, 0.75, 1.25, 1.5)$. We set $Var(W_{sji}) = \sigma_{ws}^2 = (3.8, 1.7, 0.6, 0.4)$ to ensure a wide range of variation in local laboratory measurements. The stratum-specific intercept $\beta_{0sj}$ was assumed to follow a normal distribution with a mean of 0 and a variance of 0.01, which was chosen to save computer time in the data generation process.
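The covariate part of this data-generating model can be sketched by drawing the local measurement and the calibration error independently and then forming the reference measurement from the linear calibration relationship. The sketch below is an illustration, not the authors' simulation code: the function names are hypothetical, the parameter values are taken as printed in the text above, and the disease model and the selection of one case and one control per stratum are not included.

```python
import random


def residual_variance(b, var_w, var_x=1.0):
    """Variance of the calibration error: sigma_x^2 - b^2 * sigma_w^2."""
    return var_x - b * b * var_w


def draw_subject(a, b, var_w, mu_x=0.0, var_x=1.0, rng=random):
    """Draw one (X, W) pair consistent with E(X | W) = a + b W.

    W and the error term are drawn independently, which gives Cov(W, eps) = 0,
    mean(X) = mu_x and Var(X) = var_x.
    """
    mu_w = (mu_x - a) / b
    w = rng.gauss(mu_w, var_w ** 0.5)
    eps = rng.gauss(0.0, residual_variance(b, var_w, var_x) ** 0.5)
    return a + b * w + eps, w


# Study-specific settings as printed in the text
a_values = (3, 1, 1, 3)
b_values = (0.5, 0.75, 1.25, 1.5)
var_w_values = (3.8, 1.7, 0.6, 0.4)
residuals = [residual_variance(b, v) for b, v in zip(b_values, var_w_values)]

rng = random.Random(1)  # fixed seed so the example is reproducible
sample = [draw_subject(a_values[0], b_values[0], var_w_values[0], rng=rng)
          for _ in range(20000)]
```

With $\sigma_x^2 = 1$, each of the four settings leaves a positive residual variance (between 0.04 and 0.10), which is what makes the covariance matrix valid.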
The spline functions were chosen to be restricted cubic splines and, for presentational simplicity, we chose three knots, fixed at the (25th, 50th, 75th) quantiles of $N(0, 1)$. We assumed the following risk model without additional covariates:
$$\operatorname{logit} P(Y_{sji} = 1 \mid X_{sji}) = \beta_{0sj} + \beta_{X1} f_1(X_{sji}) + \beta_{X2} f_2(X_{sji}),$$
where $f_1(X_{sji}) = X_{sji}$, $f_2(X_{sji}) = (X_{sji} - t_1)_+^3 - (X_{sji} - t_2)_+^3 \dfrac{t_3 - t_1}{t_3 - t_2} + (X_{sji} - t_3)_+^3 \dfrac{t_2 - t_1}{t_3 - t_2}$, and $t_1, t_2, t_3$ are the three knots mentioned above [18]. Note that $\beta_{X2} = 0$ implies a linear relationship between the biomarker level and the log relative risk of disease.
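The three-knot restricted cubic spline basis above is short enough to write out directly. The following Python sketch evaluates $f_1$ and $f_2$ and the implied log RR relative to a reference level; the function names `rcs_basis` and `log_rr` are hypothetical, and the coefficient values in the usage lines are arbitrary illustrations rather than estimates from this paper.

```python
from statistics import NormalDist


def rcs_basis(x, knots):
    """Three-knot restricted cubic spline basis (f1, f2) evaluated at x."""
    t1, t2, t3 = knots

    def pos_cube(u):
        # Truncated power function (u)_+^3
        return max(u, 0.0) ** 3

    f1 = x
    f2 = (pos_cube(x - t1)
          - pos_cube(x - t2) * (t3 - t1) / (t3 - t2)
          + pos_cube(x - t3) * (t2 - t1) / (t3 - t2))
    return f1, f2


def log_rr(x, x_ref, beta1, beta2, knots):
    """Log relative risk at level x versus reference level x_ref."""
    f1, f2 = rcs_basis(x, knots)
    r1, r2 = rcs_basis(x_ref, knots)
    return beta1 * (f1 - r1) + beta2 * (f2 - r2)


# Knots at the 25th, 50th and 75th percentiles of N(0, 1), as in the simulation
knots = tuple(NormalDist().inv_cdf(q) for q in (0.25, 0.50, 0.75))

# Arbitrary illustrative coefficients
curve = [log_rr(x, 0.0, beta1=-0.4, beta2=0.1, knots=knots)
         for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

Two properties of the basis are worth checking numerically: $f_2$ is zero below the first knot, and it is linear above the last knot because the cubic and quadratic terms cancel. Together these make the fitted curve linear in both tails.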
The simulations were performed 1000 times for different combinations of $(\beta_{X1}, \beta_{X2})$ and calibration proportions, defined as the proportion of controls re-assayed in the reference laboratory. We set $|\beta_{X1}|$ at relatively large values to evaluate the performance of our method even when the effect of the biomarker is relatively strong, and $\beta_{X2}$ was determined so that the nonlinear relationships across the ranges of $\beta_{X1}$ considered in the tables are moderate. The calibration proportions were 5%, 10%, or 30% in each contributing study, which represents a reasonable range for the size of the calibration subsets in practice.
We compared the performance of both the internalized (IN) and full (FC) calibration methods in terms of percent bias, $\dfrac{\hat{\beta} - \beta}{\beta} \times 100\%$, over the simulation replicates and coverage rate, which is defined as the proportion of the estimated 95% confidence intervals containing the true value. We also included the naive (N) method for comparison, where no calibration was performed, and the conditional logistic regression was fitted using the local laboratory measurements directly.
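The two performance measures can be computed from the replicate-level output as in the sketch below. The function names and the small input lists are hypothetical and chosen only to make the arithmetic easy to follow; the sketch also assumes Wald-type intervals of the form estimate ± 1.96 × standard error.

```python
def percent_bias(estimates, truth):
    """Percent bias of the mean estimate over simulation replicates."""
    mean_estimate = sum(estimates) / len(estimates)
    return (mean_estimate - truth) / truth * 100.0


def coverage_rate(estimates, std_errors, truth, z=1.96):
    """Proportion of Wald intervals estimate +/- z * SE that contain the truth."""
    hits = sum(1 for est, se in zip(estimates, std_errors)
               if est - z * se <= truth <= est + z * se)
    return hits / len(estimates)


# Made-up replicate results for a true coefficient of 1.0
estimates = [0.9, 1.1, 1.3]
std_errors = [0.1, 0.1, 0.1]
bias = percent_bias(estimates, truth=1.0)
coverage = coverage_rate(estimates, std_errors, truth=1.0)
```

Averaging the estimates first and then taking the relative difference gives the same value as averaging the replicate-level percent biases, since the true value is a constant.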
The simulation results in Table 1 and Table 2 are for a biomarker that had an inverse effect on the disease risk, and Supplementary Tables S1 and S2 are for a biomarker that had a positive association with the disease risk. The naive method performed poorly in all scenarios regardless of the calibration proportions. The percent biases of the naive estimates were typically larger than 30%, and the coverage rates were typically below 70%. The internalized and full calibration estimates had consistently better performance than the naive method. Full calibration estimates were robust over many combinations of coefficients and calibration proportions, where the percent biases were typically below 10% when the calibration proportion was 5% and below 5% when calibration proportions were 15% and 30%. The coverage rates ranged from 93% to 97%, which were close to the 95% nominal level. The internalized calibration estimates were less robust than the full calibration estimates, and they tended to be more biased when the calibration proportion was large. Section S4 of the Supplementary Materials contains a mathematical justification for the relative performance of the internalized and full calibration methods.
We also plotted the curves reflecting the biomarker–disease association. The x-axis represents the biomarker values, and the y-axis is the log RR. Figure 1 is for the scenario where $\beta_{X1} = \log(1.5) \approx 0.41$ and $\beta_{X2} = 0.14$, and the calibration proportions were set to be 5% and 30%, respectively. We can see that the curve estimated using the naive method deviated from the true curve substantially, while the curves estimated using the internalized and full calibration methods were closer to the true curve. As the calibration proportion increased, the curve estimated using the full calibration method was closer to the true curve than that from the internalized calibration method. Supplementary Figure S1 in the Supplementary Materials is for the scenario where $\beta_{X1} = \log(1.75) \approx 0.56$ and $\beta_{X2} = 0.16$, and the calibration proportions were also set to be 5% and 30%, respectively. We can see similar behaviors of the three estimated curves, where the full calibration method led to the estimated curve that was closest to the true curve and was robust over all calibration proportions.
In addition, we changed $\sigma_{ws}^2$ in the simulation setup to vary the ratio $\sigma_{ws}^2 / \sigma_x^2$. This ratio for each study was chosen among 0.75, 0.85, 0.90, and 0.95. The simulation results in Table 3 show that the performance of the calibration models improved when this variance ratio increased; that is, when the error term in the calibration model $X \mid W$ became smaller. The full calibration method was more robust than the internalized calibration method, with smaller percent biases and coverage rates closer to the 95% nominal level for all the variance ratios considered.
Lastly, we set the coefficient of the nonlinear term to 0 (i.e., $\beta_{X2} = 0$) and evaluated the performance of our methods for testing the null hypothesis of no nonlinear effect. Simulation results are reported in Supplementary Tables S4 and S5. Both the full and internalized calibration methods have coverage rates of 95% confidence intervals close to 95% for $\beta_{X2} = 0$, suggesting that when there is no nonlinear effect, the Wald test based on the point and variance estimates in the full and internalized calibration methods will have a type-I error rate close to 0.05. However, the naive method has coverage rates ranging from 8% to 70%, indicating that directly pooling biomarker measurements together without calibration may lead to false positive evidence for a nonlinear effect.

4. Applied Example

We applied our methods to evaluate the association of 25(OH)D with colorectal cancer incidence. This example was based on two large cohort studies in the United States: Nurses’ Health Study (NHS) [22] and Health Professionals Follow-up Study (HPFS) [23]. In the NHS, 121,701 female nurses aged 30 to 55 were enrolled in 1976, while in the HPFS, 51,529 male health professionals aged 40 to 75 were enrolled in 1986. To account for assay differences or laboratory drift over time, within each study, all measurements were calibrated to a common assay prior to the analyses. In all, our pooling analysis consisted of 1876 subjects with a nested case–control study design. The matching factors mainly included age at blood collection and date of blood collection. For the calibration subsets, controls in each study were divided into 10 deciles based on the 25(OH)D levels, and three subjects were randomly sampled in each decile, except for one decile, where only two subjects were selected [4]. A total of twenty-nine controls in each nested case–control study were selected to have their blood samples re-assayed at Heartland Assays, LLC (Ames, IA, USA), the reference laboratory, from 2011 to 2013. We refer readers to the paper by McCullough et al. [4] for detailed patient selection criteria.
Table 4 presents sample sizes and parameter estimates along with standard errors of the study-specific calibration models. The potential confounders that were adjusted for in the conditional logistic regression model included smoking (yes/no), BMI (greater or less than 25), physical activity (continuous), and family history of myocardial infarction (yes/no). We used the restricted cubic spline with three knots at the 25%, 50%, and 75% quantiles of the reference 25(OH)D measurements to estimate how the log RR changes with the Vitamin D levels. Table 5 presents the coefficient estimates along with corresponding 95% confidence intervals.
As shown in Table 5, we obtained similar point and confidence interval estimates of coefficients from the internalized and full calibration methods. We observed a significant linear relationship between 25(OH)D measurements and log RR of colorectal cancer (p-value = 0.0211 and 0.0217, for the internalized and full calibration methods, respectively), while the nonlinear term was not significant (p-value = 0.2162 and 0.2219, for the internalized and full calibration methods, respectively). Therefore, we concluded that circulating Vitamin D level had a significant linear association with the log relative risk of colorectal cancer.
After dropping the nonlinear term from the conditional logistic regression model and refitting the model using the linear method proposed by Sloan et al. [12] (Table 5), the point estimate of the biomarker effect on the log RR of colorectal cancer based on the full calibration method was $-0.0059$ (RR = 0.9941) for a 1 nmol/L increase in 25(OH)D, with a p-value of 0.0177, and the 95% confidence interval was $(-0.0108, -0.0010)$, suggesting a significant negative linear relationship between levels of circulating 25(OH)D measurements and the log relative risk of colorectal cancer.
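The reported quantities can be cross-checked with a few lines of arithmetic. The sketch below takes the log RR as $-0.0059$ per 1 nmol/L, consistent with the reported RR of 0.9941, and back-calculates a Wald standard error from the confidence limits. It assumes a symmetric Wald interval with critical value 1.96, and because the inputs are rounded, the resulting p-value (close to 0.018) only approximates the reported 0.0177. The 10 nmol/L rescaling is an added illustration and is not reported in the text.

```python
from math import erfc, exp, sqrt


def wald_summary(log_rr, ci_low, ci_high, z_crit=1.96):
    """Back-calculate SE, z statistic and two-sided p-value from a Wald interval."""
    se = (ci_high - ci_low) / (2.0 * z_crit)
    z = log_rr / se
    p_value = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return se, z, p_value


log_rr_per_unit = -0.0059  # per 1 nmol/L increase in 25(OH)D
se, z, p_value = wald_summary(log_rr_per_unit, -0.0108, -0.0010)

rr_per_unit = exp(log_rr_per_unit)       # relative risk per 1 nmol/L
rr_per_10 = exp(10.0 * log_rr_per_unit)  # relative risk per 10 nmol/L
```

Under a linear model the log RR scales with the size of the increment, so the estimate for a 10 nmol/L increase is simply ten times the per-unit log RR before exponentiating.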
In Figure 2, we plotted the log RR of colorectal cancer on circulating 25(OH)D measurements under both models with and without the nonlinear spline term. We set the reference level to be individuals with the minimum 25(OH)D measurement 9.734 nmol/L in the aggregated study.

5. Discussion

In this paper, we propose statistical methods for analyzing pooled matched/nested case–control studies. To apply our method, an assumption is that, for each study, the relationship between the measurements from the local laboratory and those from the reference laboratory estimated in the calibration subset represents that in the entire study. This assumption is likely to be violated for laboratories that do not follow good laboratory practices, even if the calibration subset is selected randomly. Our methods can estimate a possibly nonlinear dose–response curve between biomarker measurements and the diseases and can evaluate whether the relationship is linear or not. We focus on the common situation in which only controls are selected for re-assaying in the reference laboratory. We derived an analytic expression for the variance–covariance matrix of the estimated coefficients in the conditional logistic regression model that takes account of the uncertainty from fitting the study-specific calibration models.
Several remarks and recommendations can be drawn from our work. The full calibration method led to estimates with smaller percent biases in all simulation scenarios, and coverage rates were closer to the 95% nominal level. As the calibration proportion increased, the internalized calibration method became more biased than the full calibration method. Since the calibration model was fitted among controls only, estimates of the model parameters were slightly biased. The bias in the intercept was canceled out in the approximate likelihood function in the full calibration method but not in the internalized calibration method. Therefore, we recommend using the full calibration method for analyses that require the calibration of local laboratory measurements to a reference laboratory. When designing a new study, as discussed in Sloan et al. [15], the calibration set sample size should be sufficiently large so that the estimates in the calibration models are stable. Based on our simulation study, the full calibration method can perform reasonably well when the calibration proportion is at least 5% in studies with 500 case–control pairs and the laboratory errors are moderate.
Our method can be used to estimate dose–response curves between biomarkers and disease risks. Based on the proposed analytic variance estimators that account for the calibration process, the method can also be used to perform statistical tests to evaluate the existence of nonlinear trends. When, in fact, $\beta_{X2} = 0$, the estimated coefficient of the linear term from our method including both linear and nonlinear terms should be similar to that from the linear model method by Sloan et al. [12], but the variance of the estimated coefficient of the linear term from our model may be slightly larger than that from the model including the linear term alone, due to estimating additional coefficients for the nonlinear terms in our method. If there is not enough evidence to reject the null hypothesis of no nonlinear effects, we recommend re-fitting the model using the linear method proposed by Sloan et al. [12].
The R code for pooling matched/nested case–control study using restricted cubic spline functions is available at https://www.hsph.harvard.edu/molin-wang/section-3-pooling-biomarker-data/ (accessed on 1 April 2022).

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14010000/s1. Supplementary Table S1: $\beta_{X2} = 0.16$; Supplementary Table S2: $\beta_{X1} = \log(1.5)$; Supplementary Table S3: $\beta_{X1} = 0.48$, $\beta_{X2} = 0.16$; Supplementary Tables S4 and S5: We set the nonlinear effect to be 0 (i.e., $\beta_{X2} = 0$). Supplementary Figure S1: Curves reflecting the association of biomarker measurements with disease risk when $\beta_{X1} = \log(1.75)$, $\beta_{X2} = 0.16$.

Author Contributions

Conceptualization, M.W.; methodology, M.W., Y.W. and M.G.; software, Y.W.; formal analysis, Y.W.; investigation, Y.W.; resources, R.Z. and S.S.-W.; data curation, R.Z. and S.S.-W.; writing—original draft preparation, Y.W.; writing—review and editing, M.W., M.G., R.Z. and S.S.-W.; visualization, Y.W.; supervision, M.W.; project administration, M.W.; funding acquisition, M.W. and S.S.-W. All authors have read and agreed to the published version of the manuscript.

Funding

M.W. and Y.W. were supported in part by NIH/NCI grant R03CA212799. This work was supported in part by U01CA167552, UM1CA167552, UM1CA186107, R35CA197735, and the Intramural Program of the National Cancer Institute, Division of Cancer Epidemiology and Genetics.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Brigham and Women’s Hospital and Harvard T.H. Chan School of Public Health (2001P001945) on 20 October 2020.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets used in the real data analysis are not publicly available. Further information including the procedures to obtain and access data from the Nurses’ Health Studies and the Health Professionals Follow-up Study is described at https://www.nurseshealthstudy.org/researchers/ and https://sites.sph.harvard.edu/hpfs/for-collaborators/ (accessed on 1 April 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Key, T.J.; Appleby, P.N.; Allen, N.E.; Reeves, G.K. Pooling biomarker data from different studies of disease risk, with a focus on endogenous hormones. Cancer Epidemiol. Biomarkers Prev. 2010, 19, 960–965.
  2. Smith-Warner, S.A.; Spiegelman, D.; Ritz, J.; Albanes, D.; Beeson, W.L.; Bernstein, L.; Berrino, F.; Van Den Brandt, P.A.; Buring, J.E.; Cho, E.; et al. Methods for pooling results of epidemiologic studies: The Pooling Project of Prospective Studies of Diet and Cancer. Am. J. Epidemiol. 2006, 163, 1053–1064.
  3. Tworoger, S.S.; Hankinson, S.E. Use of biomarkers in epidemiologic studies: Minimizing the influence of measurement error in the study design and analysis. Cancer Causes Control 2006, 17, 889–899.
  4. McCullough, M.L.; Zoltick, E.S.; Weinstein, S.J.; Fedirko, V.; Wang, M.; Cook, N.R.; Eliassen, A.H.; Zeleniuch-Jacquotte, A.; Agnoli, C.; Albanes, D.; et al. Circulating vitamin D and colorectal cancer risk: An international pooling project of 17 cohorts. J. Natl. Cancer Inst. 2019, 111, 158–169.
  5. Gallicchio, L.; Helzlsouer, K.J.; Chow, W.H.; Freedman, D.M.; Hankinson, S.E.; Hartge, P.; Hartmuller, V.; Harvey, C.; Hayes, R.B.; Horst, R.L.; et al. Circulating 25-hydroxyvitamin D and the risk of rarer cancers: Design and methods of the Cohort Consortium Vitamin D Pooling Project of Rarer Cancers. Am. J. Epidemiol. 2010, 172, 10–20.
  6. Crowe, F.L.; Appleby, P.N.; Travis, R.C.; Barnett, M.; Brasky, T.M.; Bueno-de Mesquita, H.B.; Chajes, V.; Chavarro, J.E.; Chirlaque, M.D.; English, D.R.; et al. Circulating fatty acids and prostate cancer risk: Individual participant meta-analysis of prospective studies. J. Natl. Cancer Inst. 2014, 106, dju240.
  7. Key, T.J.; Appleby, P.N.; Travis, R.C.; Albanes, D.; Alberg, A.J.; Barricarte, A.; Black, A.; Boeing, H.; Bueno-de Mesquita, H.B.; Chan, J.M.; et al. Carotenoids, retinol, tocopherols, and prostate cancer risk: Pooled analysis of 15 studies. Am. J. Clin. Nutr. 2015, 102, 1142–1157.
  8. Tsilidis, K.K.; Travis, R.C.; Appleby, P.N.; Allen, N.E.; Lindström, S.; Albanes, D.; Ziegler, R.G.; McCullough, M.L.; Siddiq, A.; Barricarte, A.; et al. Insulin-like growth factor pathway genes and blood concentrations, dietary protein and risk of prostate cancer in the NCI Breast and Prostate Cancer Cohort Consortium (BPC3). Int. J. Cancer 2013, 133, 495–504.
  9. Barake, M.; Daher, R.T.; Salti, I.; Cortas, N.K.; Al-Shaar, L.; Habib, R.H.; Fuleihan, G.E.H. 25-hydroxyvitamin D assay variations and impact on clinical decision making. J. Clin. Endocrinol. Metab. 2012, 97, 835–843.
  10. Lai, J.K.; Lucas, R.M.; Banks, E.; Ponsonby, A.L.; Ausimmune Investigator Group. Variability in vitamin D assays impairs clinical assessment of vitamin D status. Intern. Med. J. 2012, 42, 43–50.
  11. Snellman, G.; Melhus, H.; Gedeborg, R.; Byberg, L.; Berglund, L.; Wernroth, L.; Michaelsson, K. Determining vitamin D status: A comparison between commercially available assays. PLoS ONE 2010, 5, e11555.
  12. Sloan, A.; Smith-Warner, S.A.; Ziegler, R.G.; Wang, M. Statistical methods for biomarker data pooled from multiple nested case–control studies. Biostatistics 2021, 22, 541–557.
  13. Carroll, R.J.; Ruppert, D.; Stefanski, L.A.; Crainiceanu, C.M. Measurement Error in Nonlinear Models: A Modern Perspective; CRC Press: Boca Raton, FL, USA, 2006.
  14. Rosner, B.; Spiegelman, D.; Willett, W. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: The case of multiple covariates measured with error. Am. J. Epidemiol. 1990, 132, 734–745.
  15. Sloan, A.; Song, Y.; Gail, M.H.; Betensky, R.; Rosner, B.; Ziegler, R.G.; Smith-Warner, S.A.; Wang, M. Design and analysis considerations for combining data from multiple biomarker studies. Stat. Med. 2019, 38, 1303–1320.
  16. Abbas, S.; Chang-Claude, J.; Linseisen, J. Plasma 25-hydroxyvitamin D and premenopausal breast cancer risk in a German case-control study. Int. J. Cancer 2009, 124, 250–255.
  17. Bauer, S.R.; Hankinson, S.E.; Bertone-Johnson, E.R.; Ding, E.L. Plasma vitamin D levels, menopause, and risk of breast cancer: Dose-response meta-analysis of prospective studies. Medicine 2013, 92, 123.
  18. Durrleman, S.; Simon, R. Flexible regression models with cubic splines. Stat. Med. 1989, 8, 551–561.
  19. Breslow, N.; Day, N.; Halvorsen, K.; Prentice, R.; Sabai, C. Estimation of multiple relative risk functions in matched case-control studies. Am. J. Epidemiol. 1978, 108, 299–307.
  20. Harrell, F.E. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis; Springer: Berlin/Heidelberg, Germany, 2015; Volume 3.
  21. Gong, G.; Samaniego, F.J. Pseudo maximum likelihood estimation: Theory and applications. Ann. Stat. 1981, 9, 861–869.
  22. Colditz, G.A.; Manson, J.E.; Hankinson, S.E. The Nurses’ Health Study: 20-year contribution to the understanding of health among women. J. Women’s Health 1997, 6, 49–62.
  23. Choi, H.K.; Atkinson, K.; Karlson, E.W.; Curhan, G. Obesity, weight change, hypertension, diuretic use, and risk of gout in men: The health professionals follow-up study. Arch. Intern. Med. 2005, 165, 742–748.
Figure 1. The average of the dose–response curves over 1000 simulations. The x-axis is the biomarker measurement, and the y-axis is the log RR of the disease. The solid line is the true curve, the dotted and dashed lines were estimated using the internalized (IN) and full calibration (FC) methods, respectively, and the dashed-dotted line was estimated using the naive method (N). The calibration proportion is 5% (left) and 30% (right), and the coefficients of the spline functions are set to $\log(1.5) \approx 0.41$ and 0.14, respectively.
Figure 2. Log colorectal cancer RR for levels of circulating 25(OH)D compared to the reference level, 9.734 nmol/L, based on the full calibration method.
Table 1. Comparison of operating characteristics for $\beta_X$ under the model $P(Y_{sji} = 1 \mid X_{sji}) = \beta_{0sj} + \beta_{X1} f_1(X_{sji}) + \beta_{X2} f_2(X_{sji})$ for the internalized calibration (IN), full calibration (FC), and naive (N) methods. Relative bias is computed as $(\hat{\beta} - \beta)/\beta$, and the reported value is the average over the 1000 simulation replicates. Coverage rate is the proportion of simulations that yield a 95% confidence interval covering the true parameter. Standard deviation (SD) is the square root of the empirical variance of the parameter estimates over all replicates; we report $10^3$ times the standard deviation. The calibration proportions (Calib. size) were set to 5%, 15%, and 30%. $\beta_{X2}$ is fixed at 0.08.

| Calib. size | $\beta_{X1}$ | Rel. bias of $\beta_{X1}$ (SD), IN | Rel. bias of $\beta_{X1}$ (SD), FC | Rel. bias of $\beta_{X1}$ (SD), N | Coverage of $\beta_{X1}$, IN | Coverage of $\beta_{X1}$, FC | Coverage of $\beta_{X1}$, N | Rel. bias of $\beta_{X2}$ (SD), IN | Rel. bias of $\beta_{X2}$ (SD), FC | Rel. bias of $\beta_{X2}$ (SD), N | Coverage of $\beta_{X2}$, IN | Coverage of $\beta_{X2}$, FC | Coverage of $\beta_{X2}$, N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5% | $\log(1.25)$ | −1.6% (2.621) | −0.1% (2.959) | −44.4% (0.110) | 0.970 | 0.972 | 0.458 | −3.6% (0.192) | −0.6% (0.196) | −72.2% (0.018) | 0.968 | 0.971 | 0.222 |
| 5% | $\log(1.5)$ | −1.2% (2.601) | −0.3% (2.972) | −23.7% (0.160) | 0.964 | 0.966 | 0.518 | −5.2% (0.220) | −2.1% (0.228) | −36% (0.025) | 0.962 | 0.962 | 0.736 |
| 5% | $\log(1.75)$ | −1.1% (2.846) | −0.4% (3.288) | −16.6% (0.207) | 0.964 | 0.966 | 0.569 | −8.5% (0.258) | −5.3% (0.257) | −7.6% (0.033) | 0.957 | 0.959 | 0.941 |
| 5% | $\log(2)$ | −1.4% (2.828) | −0.8% (3.212) | −12.8% (0.261) | 0.968 | 0.970 | 0.642 | −11.9% (0.337) | −8.7% (0.339) | 18.2% (0.040) | 0.955 | 0.958 | 0.888 |
| 5% | $\log(2.25)$ | −1% (3.707) | −0.5% (4.481) | −10% (0.324) | 0.975 | 0.980 | 0.707 | −10.8% (0.357) | −7.5% (0.371) | 41.1% (0.051) | 0.954 | 0.955 | 0.766 |
| 5% | $\log(2.5)$ | −1.1% (3.785) | −0.6% (4.486) | −8.2% (0.405) | 0.961 | 0.963 | 0.754 | −12% (0.385) | −8.5% (0.393) | 62% (0.066) | 0.944 | 0.944 | 0.609 |
| 5% | $\log(2.75)$ | −1% (4.444) | −0.6% (5.136) | −7.2% (0.452) | 0.965 | 0.963 | 0.788 | −12% (0.517) | −8.7% (0.535) | 78% (0.075) | 0.954 | 0.954 | 0.497 |
| 15% | $\log(1.25)$ | −7.1% (0.694) | −2.3% (0.874) | −44.1% (0.120) | 0.966 | 0.969 | 0.476 | −13.6% (0.095) | −4.3% (0.096) | −70.8% (0.019) | 0.944 | 0.953 | 0.267 |
| 15% | $\log(1.5)$ | −3.9% (0.854) | −1.1% (1.134) | −24.2% (0.154) | 0.957 | 0.956 | 0.511 | −13.4% (0.115) | −3.8% (0.115) | −36.4% (0.024) | 0.947 | 0.952 | 0.739 |
| 15% | $\log(1.75)$ | −2.4% (0.836) | −0.3% (1.123) | −17.1% (0.197) | 0.955 | 0.962 | 0.564 | −12.8% (0.145) | −2.9% (0.147) | −8.8% (0.031) | 0.941 | 0.950 | 0.937 |
| 15% | $\log(2)$ | −1.8% (0.911) | 0% (1.243) | −12.7% (0.265) | 0.961 | 0.966 | 0.638 | −11.9% (0.174) | −1.9% (0.175) | 19.4% (0.043) | 0.951 | 0.949 | 0.897 |
| 15% | $\log(2.25)$ | −1.4% (1.218) | 0.3% (1.797) | −9.7% (0.326) | 0.961 | 0.972 | 0.722 | −14.9% (0.215) | −4.6% (0.217) | 42.1% (0.051) | 0.944 | 0.947 | 0.765 |
| 15% | $\log(2.5)$ | −2.1% (1.108) | −0.6% (1.595) | −8.3% (0.395) | 0.941 | 0.948 | 0.761 | −18.4% (0.247) | −7.6% (0.249) | 62.0% (0.064) | 0.938 | 0.950 | 0.620 |
| 15% | $\log(2.75)$ | −1.4% (1.273) | 0.1% (1.926) | −6.7% (0.461) | 0.948 | 0.960 | 0.805 | −16.9% (0.295) | −6.1% (0.299) | 79.7% (0.076) | 0.941 | 0.950 | 0.481 |
| 30% | $\log(1.25)$ | −9.9% (0.519) | −0.3% (0.455) | −43.3% (0.113) | 0.943 | 0.955 | 0.477 | −21.1% (0.083) | −2.3% (0.084) | −70.8% (0.019) | 0.937 | 0.954 | 0.265 |
| 30% | $\log(1.5)$ | −5.1% (0.514) | 0.6% (0.561) | −23.2% (0.164) | 0.944 | 0.951 | 0.533 | −19.7% (0.096) | −0.5% (0.096) | −35.9% (0.026) | 0.948 | 0.957 | 0.751 |
| 30% | $\log(1.75)$ | −4.5% (0.564) | −0.2% (0.659) | −16.7% (0.211) | 0.940 | 0.960 | 0.576 | −23.6% (0.129) | −4.1% (0.129) | −7.7% (0.034) | 0.941 | 0.962 | 0.926 |
| 30% | $\log(2)$ | −4.0% (0.581) | −0.3% (0.755) | −12.8% (0.258) | 0.937 | 0.961 | 0.631 | −25.5% (0.156) | −5.4% (0.155) | 18.7% (0.042) | 0.936 | 0.956 | 0.900 |
| 30% | $\log(2.25)$ | −3.6% (0.661) | −0.3% (0.885) | −10.1% (0.328) | 0.926 | 0.931 | 0.698 | −28.1% (0.193) | −7.4% (0.193) | 41.1% (0.052) | 0.922 | 0.933 | 0.769 |
| 30% | $\log(2.5)$ | −3.1% (0.719) | 0.1% (0.988) | −8.3% (0.409) | 0.939 | 0.954 | 0.749 | −24.2% (0.233) | −3% (0.234) | 62.1% (0.065) | 0.938 | 0.948 | 0.613 |
| 30% | $\log(2.75)$ | −3.5% (0.835) | −0.6% (1.109) | −7.2% (0.459) | 0.926 | 0.954 | 0.794 | 30.1% (0.274) | −8.9% (0.272) | 78.6% (0.076) | 0.910 | 0.943 | 0.506 |
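The three summary measures defined in the caption of Table 1 can be computed from simulation output as sketched below. This is an illustrative reconstruction, not the authors' simulation code: it assumes Wald-type 95% confidence intervals and the sample (n − 1) variance, neither of which is specified in the caption, and the replicate values are made up.

```python
import math
from statistics import mean, stdev


def operating_characteristics(estimates, std_errors, true_beta, z_crit=1.959964):
    """Summarize simulation replicates for one coefficient.

    estimates:  point estimates, one per replicate
    std_errors: estimated standard errors, one per replicate
    true_beta:  the value used to generate the data
    """
    # Average relative bias, (beta_hat - beta) / beta, over replicates.
    rel_bias = mean([(b - true_beta) / true_beta for b in estimates])
    # Empirical SD of the estimates (n - 1 denominator, an assumption).
    emp_sd = stdev(estimates)
    # Share of Wald-type 95% intervals that contain the true value.
    covered = [abs(b - true_beta) <= z_crit * se
               for b, se in zip(estimates, std_errors)]
    return {
        "relative_bias": rel_bias,
        "sd_times_1000": 1000.0 * emp_sd,
        "coverage": sum(covered) / len(covered),
    }


# Five made-up replicates with true beta = log(1.5).
true_beta = math.log(1.5)
summary = operating_characteristics(
    estimates=[0.40, 0.41, 0.39, 0.42, 0.30],
    std_errors=[0.02, 0.02, 0.02, 0.02, 0.02],
    true_beta=true_beta,
)
print(summary)
```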
Table 2. Comparison of operating characteristics for $\beta_X$ under the model $P(Y_{sji} = 1 \mid X_{sji}) = \beta_{0sj} + \beta_{X1} f_1(X_{sji}) + \beta_{X2} f_2(X_{sji})$ for the internalized calibration (IN), full calibration (FC), and naive (N) methods. Relative bias is computed as $(\hat{\beta} - \beta)/\beta$, and the reported value is the average over the 1000 simulation replicates. Coverage rate is the proportion of simulations that yield a 95% confidence interval covering the true parameter. Standard deviation (SD) is the square root of the empirical variance of the parameter estimates over all replicates; we report $10^3$ times the standard deviation. The calibration proportions (Calib. size) were set to 5%, 15%, and 30%. $\beta_{X1}$ is fixed at $\log(1.5)$.

| Calib. size | $\beta_{X2}$ | Rel. bias of $\beta_{X1}$ (SD), IN | Rel. bias of $\beta_{X1}$ (SD), FC | Rel. bias of $\beta_{X1}$ (SD), N | Coverage of $\beta_{X1}$, IN | Coverage of $\beta_{X1}$, FC | Coverage of $\beta_{X1}$, N | Rel. bias of $\beta_{X2}$ (SD), IN | Rel. bias of $\beta_{X2}$ (SD), FC | Rel. bias of $\beta_{X2}$ (SD), N | Coverage of $\beta_{X2}$, IN | Coverage of $\beta_{X2}$, FC | Coverage of $\beta_{X2}$, N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5% | 0.02 | −1.3% (2.838) | −0.4% (3.140) | −6.5% (0.178) | 0.968 | 0.970 | 0.898 | −21.6% (0.215) | −9.3% (0.218) | 188.8% (0.029) | 0.962 | 0.958 | 0.624 |
| 5% | 0.06 | −2.2% (3.199) | −1.2% (3.528) | −18.1% (0.159) | 0.960 | 0.967 | 0.686 | −12.9% (0.234) | −8.8% (0.247) | −12.4% (0.025) | 0.949 | 0.954 | 0.938 |
| 5% | 0.10 | −2.5% (2.515) | −1.6% (2.956) | −29.5% (0.144) | 0.968 | 0.969 | 0.339 | −8.1% (0.197) | −5.6% (0.206) | 50.8% (0.023) | 0.959 | 0.957 | 0.399 |
| 5% | 0.14 | −1.5% (2.815) | −0.6% (3.179) | −41.7% (0.131) | 0.967 | 0.970 | 0.078 | −4.1% (0.194) | −2.3% (0.198) | −69.3% (0.021) | 0.953 | 0.951 | 0.010 |
| 5% | 0.18 | −2.1% (2.636) | −1.2% (3.149) | −52.8% (0.127) | 0.968 | 0.969 | 0.012 | −5.0% (0.219) | −3.6% (0.225) | −78.0% (0.020) | 0.950 | 0.954 | 0.000 |
| 15% | 0.02 | −2.7% (0.889) | 0.1% (1.072) | −5.8% (0.176) | 0.951 | 0.953 | 0.922 | −49.7% (0.126) | −11.7% (0.126) | 191.9% (0.027) | 0.928 | 0.938 | 0.630 |
| 15% | 0.06 | −3.3% (0.771) | −0.5% (0.994) | −17.7% (0.161) | 0.954 | 0.970 | 0.699 | −19.0% (0.122) | −6.2% (0.123) | −11.6% (0.026) | 0.938 | 0.948 | 0.933 |
| 15% | 0.10 | −3.3% (0.867) | −0.5% (0.982) | −29.8% (0.143) | 0.942 | 0.944 | 0.335 | −10.3% (0.114) | −2.6% (0.113) | −52.1% (0.023) | 0.933 | 0.948 | 0.376 |
| 15% | 0.14 | −3.7% (0.790) | −0.9% (1.059) | −41.8% (0.136) | 0.952 | 0.961 | 0.083 | −7.9% (0.113) | −2.3% (0.114) | −69.2% (0.022) | 0.954 | 0.962 | 0.008 |
| 15% | 0.18 | −4.3% (0.851) | −1.4% (1.163) | −53.6% (0.118) | 0.959 | 0.964 | 0.003 | −7.2% (0.108) | −2.7% (0.109) | −78.5% (0.019) | 0.956 | 0.958 | 0.000 |
| 30% | 0.02 | −6.0% (0.546) | −0.5% (0.569) | −6.5% (0.175) | 0.945 | 0.963 | 0.916 | −85.6% (0.110) | −10.4% (0.111) | 189.7% (0.028) | 0.932 | 0.957 | 0.640 |
| 30% | 0.06 | −5.6% (0.556) | 0.0% (0.583) | −18.0% (0.162) | 0.939 | 0.955 | 0.682 | −28.5% (0.101) | −2.9% (0.101) | −12.3% (0.026) | 0.926 | 0.952 | 0.928 |
| 30% | 0.10 | −6.1% (0.536) | −0.4% (0.619) | −29.4% (0.151) | 0.927 | 0.943 | 0.342 | −19.2% (0.097) | −3.6% (0.097) | −51.7% (0.024) | 0.929 | 0.942 | 0.369 |
| 30% | 0.14 | −6.4% (0.458) | −0.8% (0.528) | −41.1% (0.135) | 0.951 | 0.954 | 0.080 | −13.9% (0.092) | −2.7% (0.091) | −68.2% (0.022) | 0.927 | 0.953 | 0.011 |
| 30% | 0.18 | −6.9% (0.493) | −1.2% (0.618) | −53.6% (0.125) | 0.935 | 0.957 | 0.006 | −11.8% (0.090) | −2.8% (0.091) | −78.6% (0.020) | 0.920 | 0.950 | 0.000 |
Table 3. Comparison of operating characteristics for $\beta_X$ under the model $P(Y_{sji} = 1 \mid X_{sji}) = \beta_{0sj} + \beta_{X1} f_1(X_{sji}) + \beta_{X2} f_2(X_{sji})$ for the internalized calibration (IN), full calibration (FC), and naive (N) methods with different values of $\sigma_{ws}^2/\sigma_X^2$. Relative bias is computed as $(\hat{\beta} - \beta)/\beta$, and the reported value is the average over the 1000 simulation replicates. Coverage rate is the proportion of simulations that yield a 95% confidence interval covering the true parameter. Standard deviation (SD) is the square root of the empirical variance of the parameter estimates over all replicates; we report $10^3$ times the standard deviation. The calibration proportions (Calib. size) were set to 5%, 15%, and 30%. $\beta_{X1} = 0.25$, $\beta_{X2} = 0.08$.

| Calib. size | $\sigma_{ws}^2/\sigma_X^2$ | Rel. bias of $\beta_{X1}$ (SD), IN | Rel. bias of $\beta_{X1}$ (SD), FC | Rel. bias of $\beta_{X1}$ (SD), N | Coverage of $\beta_{X1}$, IN | Coverage of $\beta_{X1}$, FC | Coverage of $\beta_{X1}$, N | Rel. bias of $\beta_{X2}$ (SD), IN | Rel. bias of $\beta_{X2}$ (SD), FC | Rel. bias of $\beta_{X2}$ (SD), N | Coverage of $\beta_{X2}$, IN | Coverage of $\beta_{X2}$, FC | Coverage of $\beta_{X2}$, N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5% | 0.75 | −14.1% (22.094) | −5.1% (27.250) | −40.6% (0.145) | 0.970 | 0.983 | 0.509 | −35.0% (1.966) | −15.2% (2.444) | −69.5% (0.023) | 0.942 | 0.968 | 0.375 |
| 5% | 0.85 | −9.0% (8.468) | −4.4% (9.954) | −40.9% (0.128) | 0.966 | 0.976 | 0.479 | −20.5% (0.700) | −10.4% (0.800) | −68.2% (0.020) | 0.944 | 0.958 | 0.364 |
| 5% | 0.90 | −5.3% (4.876) | −2.5% (5.373) | −40.5% (0.125) | 0.967 | 0.976 | 0.460 | −10.6% (0.316) | −4.4% (0.336) | −67.0% (0.020) | 0.943 | 0.954 | 0.338 |
| 5% | 0.95 | −1.5% (2.398) | −0.2% (2.626) | −38.0% (0.123) | 0.950 | 0.953 | 0.504 | −4.7% (0.184) | −1.7% (0.187) | −64.7% (0.019) | 0.947 | 0.953 | 0.352 |
| 15% | 0.75 | −31.1% (5.298) | −3.9% (6.133) | −40.7% (0.151) | 0.908 | 0.976 | 0.528 | −71.3% (0.392) | −11.7% (0.420) | −68.9% (0.024) | 0.808 | 0.957 | 0.383 |
| 15% | 0.85 | −15.6% (2.250) | −1.8% (2.826) | −39.8% (0.135) | 0.954 | 0.975 | 0.500 | −35.4% (0.200) | −5.0% (0.213) | −67.0% (0.021) | 0.927 | 0.966 | 0.344 |
| 15% | 0.90 | −10.4% (1.456) | −1.7% (1.794) | −38.7% (0.124) | 0.951 | 0.960 | 0.504 | −24.6% (0.145) | −5.4% (0.152) | −65.5% (0.020) | 0.933 | 0.960 | 0.351 |
| 15% | 0.95 | −4.5% (0.732) | −0.5% (0.838) | −38.6% (0.130) | 0.961 | 0.961 | 0.476 | −11.4% (0.097) | −2.5% (0.098) | −65.6% (0.020) | 0.949 | 0.960 | 0.318 |
| 30% | 0.75 | −58.0% (3.127) | −4.4% (3.189) | −40.3% (0.144) | 0.639 | 0.977 | 0.547 | −130.8% (0.209) | −13.4% (0.224) | −68.3% (0.023) | 0.390 | 0.951 | 0.379 |
| 30% | 0.85 | −31.6% (1.467) | −4.0% (1.350) | −40.1% (0.132) | 0.837 | 0.967 | 0.494 | −70.5% (0.128) | −9.8% (0.131) | −67.3% (0.021) | 0.780 | 0.951 | 0.361 |
| 30% | 0.90 | −18.9% (0.913) | −1.7% (1.008) | −39.1% (0.127) | 0.91 | 0.961 | 0.478 | −42.8% (0.105) | −4.9% (0.108) | −66.4% (0.020) | 0.891 | 0.957 | 0.336 |
| 30% | 0.95 | −9.5% (0.475) | −1.4% (0.456) | −37.7% (0.122) | 0.924 | 0.935 | 0.492 | −22.9% (0.083) | −5.1% (0.084) | −64.0% (0.019) | 0.920 | 0.935 | 0.336 |
Table 4. Number of cases and controls, size of the calibration study ($n_{cal}$), and the estimated intercept and slope of the calibration model for each study in the pooled analysis.

| Study | Cases/Controls | $n_{cal}$ | $\hat{a}$ (SE) | $\hat{b}$ (SE) |
|---|---|---|---|---|
| NHS | 348/694 | 29 | −3.56 (2.72) | 1.13 (0.97) |
| HPFS | 267/519 | 29 | 3.38 (2.95) | 0.05 (0.04) |
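How a study-specific intercept and slope feed into the three methods compared in this paper can be sketched as follows. The sketch assumes a linear calibration model that maps a local laboratory measurement to the reference scale as $\hat{a} + \hat{b} \times$ local value, which is an assumption about the form and direction of the model; the function names and the numeric values are hypothetical and are not taken from Table 4.

```python
from typing import Optional


def calibrate(local_value: float, intercept: float, slope: float) -> float:
    """Map a local laboratory measurement to the reference scale.

    Assumes a linear calibration model: intercept + slope * local_value.
    """
    return intercept + slope * local_value


def analysis_value(local_value: float,
                   reference_value: Optional[float],
                   intercept: float,
                   slope: float,
                   method: str) -> float:
    """Return the biomarker value entered in the risk model.

    method = "full":         calibrated value for every subject
    method = "internalized": reference measurement if available,
                             otherwise the calibrated value
    method = "naive":        local measurement, no calibration
    """
    calibrated = calibrate(local_value, intercept, slope)
    if method == "full":
        return calibrated
    if method == "internalized":
        return reference_value if reference_value is not None else calibrated
    if method == "naive":
        return local_value
    raise ValueError(f"unknown method: {method}")


# Hypothetical subject: local value 50.0, reference value 48.0,
# with made-up calibration coefficients a = 2.0 and b = 0.9.
for method in ("full", "internalized", "naive"):
    print(method, analysis_value(50.0, 48.0, 2.0, 0.9, method))
```

A subject outside the calibration subset has no reference measurement (`reference_value=None`), in which case the full and internalized methods use the same calibrated value.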
Table 5. Point estimates and 95% confidence intervals for the nonlinear (and linear) association of circulating 25(OH)D (nmol/L) with colorectal cancer after adjusting for BMI (overweight or not), physical activity (continuous), smoking (never/ever), and family history of colorectal cancer (yes/no).

| Method | $\beta_{X1}$ | $\beta_{X2}$ |
|---|---|---|
| Internalized calibration | −0.0116 (−0.0214, −0.0017) | $7.9307 \times 10^{-6}$ ($-4.6375 \times 10^{-6}$, $2.0499 \times 10^{-5}$) |
| Full calibration | −0.0115 (−0.0213, −0.0017) | $7.9885 \times 10^{-6}$ ($-4.8290 \times 10^{-6}$, $2.0806 \times 10^{-5}$) |
| Linear model (FC) | −0.0059 (−0.0108, −0.0010) | - |
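To read the linear-model row of Table 5 on the relative-risk scale, the coefficient and its confidence limits can be exponentiated after scaling by an increment of interest. The sketch below assumes the coefficient is a log RR per 1 nmol/L of 25(OH)D and uses a 25 nmol/L increment; both the unit interpretation and the increment are illustrative assumptions rather than quantities stated in the table.

```python
import math


def rr_per_increment(beta, ci_lower, ci_upper, increment):
    """Convert a log-RR slope and its CI to an RR per `increment` units."""
    return (
        math.exp(beta * increment),
        math.exp(ci_lower * increment),
        math.exp(ci_upper * increment),
    )


# Linear model (FC) row of Table 5; the 25 nmol/L increment is an assumption.
rr, rr_lower, rr_upper = rr_per_increment(
    beta=-0.0059, ci_lower=-0.0108, ci_upper=-0.0010, increment=25.0
)
print(f"RR per 25 nmol/L: {rr:.3f} ({rr_lower:.3f}, {rr_upper:.3f})")
```

Because the exponential is monotone, the transformed limits keep their order, and an interval that excludes 0 on the log scale excludes 1 on the RR scale.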
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Wu, Y.; Gail, M.; Smith-Warner, S.; Ziegler, R.; Wang, M. Spline Analysis of Biomarker Data Pooled from Multiple Matched/Nested Case–Control Studies. Cancers 2022, 14, 2783. https://doi.org/10.3390/cancers14112783
