Article

An Algorithm for Nonparametric Estimation of a Multivariate Mixing Distribution with Applications to Population Pharmacokinetics

by Walter M. Yamada 1,†, Michael N. Neely 1,2,†, Jay Bartroff 3,†, David S. Bayard 1,4,†, James V. Burke 5,†, Mike van Guilder 1,†, Roger W. Jelliffe 1,†, Alona Kryshchenko 1,6,†, Robert Leary 7,†, Tatiana Tatarinova 8,† and Alan Schumitzky 1,3,*
1 Laboratory of Applied Pharmacokinetics and Bioinformatics, Children’s Hospital of Los Angeles, Los Angeles, CA 90027, USA
2 Pediatric Infectious Diseases, Children’s Hospital of Los Angeles, Keck School of Medicine, University of Southern California, Los Angeles, CA 90027, USA
3 Department of Mathematics, University of Southern California, Los Angeles, CA 90089, USA
4 Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, USA
5 Department of Mathematics, University of Washington, Seattle, WA 98195, USA
6 Department of Mathematics, California State University Channel Islands, Camarillo, CA 93012, USA
7 Certara, Raleigh, NC 27606, USA
8 Department of Biology, University of La Verne, La Verne, CA 91750, USA
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Pharmaceutics 2021, 13(1), 42; https://doi.org/10.3390/pharmaceutics13010042
Submission received: 5 November 2020 / Revised: 11 December 2020 / Accepted: 23 December 2020 / Published: 30 December 2020
(This article belongs to the Section Pharmacokinetics and Pharmacodynamics)

Abstract:
Population pharmacokinetic (PK) modeling has become a cornerstone of drug development and optimal patient dosing. This approach offers great benefits for datasets with sparse sampling, such as in pediatric patients, and can describe between-patient variability. While most current algorithms assume normal or log-normal distributions for PK parameters, we present a mathematically consistent nonparametric maximum likelihood (NPML) method for estimating multivariate mixing distributions without any assumption about the shape of the distribution. This approach can handle distributions with any shape for all PK parameters. Convexity theory shows that the NPML estimator is discrete, meaning that it has a finite number of points with nonzero probability; in fact, there are at most N points, where N is the number of observed subjects. The original infinite-dimensional NPML problem then becomes the finite-dimensional problem of finding the locations and probabilities of the support points. In the simplest case, each point essentially represents the set of PK parameters for one patient. The probabilities of the points are found by a primal-dual interior-point method; the locations of the support points are found by an adaptive grid method. Our method is able to handle high-dimensional and complex multivariate mixture models. An important application to the problem of population pharmacokinetics is discussed, and a nontrivial example is treated. Our algorithm has been successfully applied in hundreds of published pharmacometric studies. In addition to population pharmacokinetics, this research also applies to empirical Bayes estimation and many other areas of applied mathematics. Thereby, this approach presents an important addition to the pharmacometric toolbox for drug development and optimal patient dosing.


1. Introduction

Pharmacokinetic studies in healthy volunteers commonly collect multiple observations in each subject. These datasets often arise from Phase I clinical trials and have traditionally been analyzed by noncompartmental methods or by modeling via the standard two-stage approach. Patient datasets often contain complex dosage regimens and only a few (or one) observation(s) per patient. To analyze such sparse datasets, e.g., in pediatric or critically ill patients, population modeling is required, since noncompartmental analysis and the standard two-stage approach are not applicable [1,2]. Population modeling has been shown to estimate PK parameters without bias and with good precision for such sparse datasets [3]. Parametric population modeling algorithms are commonly used and typically assume either normal or log-normal distributions for the between-subject variability of PK parameters. While this assumption is made for virtually every parametric population PK model, it is difficult to prove, especially for datasets with a small number of subjects. Nonparametric population modeling can describe a multivariate distribution of PK parameters without assuming any shape of the PK parameter distribution. This is a key advantage of the nonparametric approach, which is based on the exact log-likelihood. The present work comprehensively describes the foundation of this nonparametric estimation algorithm for the first time. This algorithm has been used in hundreds of peer-reviewed papers.
Pharmacometric observations can be described statistically by a mixture model. In this case, the distribution of the random-variable arguments (the PK population model) of the pharmacokinetic compartmental model is described by a mixing distribution. The problem of estimating the mixing distribution from a set of pharmacometric observations can be stated as follows. Let $Y_1, \dots, Y_N$ be a sequence of independent but not necessarily identically distributed random vectors, constructed from one or more observations from each of $N$ subjects in the population. Let $\theta_1, \dots, \theta_N$ be a sequence of independent and identically distributed random vectors that represent the unknown parameter values of the $N$ subjects. The $\theta_i$ belong to a compact subset $\Theta$ of Euclidean space with common but unknown distribution $F$. In other words, $F$ is the distribution of the parameters in the population model; $\Theta$ is the parameter space, whose dimension equals the number of parameters in the population model. The $\{\theta_i\}$ are not observed. It is assumed that the conditional densities $p(Y_i \mid \theta_i)$ are known for $i = 1, \dots, N$; $p(Y_i \mid \theta_i)$ represents the model of the observations $Y_i$ given parameter values $\theta_i$, including uncertainties of the measurement protocol. The mixing density of $Y_i$ with respect to $F$ is given by $p(Y_i \mid F) = \int p(Y_i \mid \theta_i)\, dF(\theta_i)$. Because the $\{Y_i\}$ are independent, their joint mixing likelihood with respect to $F$ is given by
$$L(F) = p(Y_1, \dots, Y_N \mid F) = \prod_{i=1}^{N} \int p(Y_i \mid \theta_i)\, dF(\theta_i) \tag{1}$$
The mixing distribution problem is to maximize the likelihood function L ( F ) with respect to all probability distributions F on Θ .
Remark 1.
The distribution $F_{ML}$ that maximizes $L(F)$ is a consistent estimator of the true mixing distribution, which means that $F_{ML}$ will converge to the true distribution as the number of subjects grows large. This was proved originally by Kiefer and Wolfowitz in 1956 [4]. The consistency of $F_{ML}$ is especially important for our application to population pharmacokinetics, where $F_{ML}$ is used as a prior distribution for Bayesian dosage regimen design [5].
The algorithm described in this paper differs from most other published methods in a number of ways. Our algorithm allows for a high-dimensional parameter space $\Theta$. Most published methods require the dimension of $\Theta$ to be small, and many require it to be 1; i.e., these methods require the number of parameters in the population model to be small. We have treated examples where the dimension of $\Theta$ is as high as 29 (see Section 3).
Most published algorithms require the $\{Y_i\}$ to be identically distributed and assume that the population model $\{p(Y_i \mid \theta_i)\}$ is rather simple, for example that $p(Y_i \mid \theta_i)$ is a multivariate normal density with mean vector $\theta_i$ and covariance matrix $\Sigma$. Even if $\Sigma$ is unknown and has to be estimated, the structure of this model is straightforward. However, the estimation of $\Sigma$ has to be done carefully to avoid singularities; see Wang and Wang [6]. As will be described in Section 3, we allow $p(Y_i \mid \theta_i)$ to be calculated from a system of nonlinear ordinary differential–algebraic equations.
We now describe the details of our algorithm. It was proved by Lindsay [7] and Mallet [8], under simple hypotheses on the population model $\{p(Y_i \mid \theta_i)\}$, that the global maximizer $F_{ML}$ of $L(F)$ can be represented by a discrete distribution with support on at most $N$ points, i.e., a distribution with nonzero probability located on at most $N$ points.
Remark 2.
One way to motivate this result by Lindsay and Mallet is as follows. Suppose, by some lucky chance, we knew the exact parameters for each subject. How can we package this into a distribution? Answer: the “empirical distribution” of the exact parameters, that is, the discrete distribution supported at the $N$ exact parameter values with equal weights. It turns out that in this case the empirical distribution is also the nonparametric maximum likelihood (NPML) distribution of the parameters. What is remarkable is that if we have only noisy measurements $(Y_1, \dots, Y_N)$ of the $N$ subjects, related only indirectly to the subject parameters, the structure of the NPML estimator is the same: a discrete distribution supported at $N$ points. Of course, in this real case, the positions and weights of the $N$ support points are unknown. Finding these positions and weights is the subject of this paper.
This result leads immediately to a finite dimensional optimization problem for F M L , namely to maximize the likelihood function
$$L(\lambda, \phi) = \prod_{i=1}^{N} \sum_{k=1}^{K} \lambda_k \, p(Y_i \mid \phi_k) \tag{2}$$
with respect to the support points $\phi = (\phi_1, \dots, \phi_K)$ and weights $\lambda = (\lambda_1, \dots, \lambda_K)$ such that $\phi_k \in \Theta$, $\lambda_k \ge 0$ for $k = 1, \dots, K$, $K \le N$, and $\sum_{k=1}^{K} \lambda_k = 1$.
In our algorithm, $l(\lambda, \phi) = \log L(\lambda, \phi)$ is maximized, so that
$$l(\lambda, \phi) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \lambda_k \, p(Y_i \mid \phi_k) \tag{3}$$
and the maximization problem becomes
$$\text{maximize } l(\lambda, \phi) \tag{4}$$
such that $\phi \in \Theta^K$, $\lambda = (\lambda_1, \dots, \lambda_K) \in \mathbb{R}_+^K$, $K \le N$, and $\sum_{k=1}^{K} \lambda_k = 1$.
Although the maximization problem in Equation (4) is finite dimensional, it is still high-dimensional: its dimension is $N(\dim \Theta) + (N-1)$.
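To make Equation (3) concrete, the following sketch evaluates $l(\lambda, \phi)$ for a toy one-dimensional model in which $p(Y_i \mid \phi_k)$ is a normal density with mean $\phi_k$ and known unit noise. The data, support points, and weights below are hypothetical, chosen only for illustration:

```python
import math

def normal_pdf(y, mu, sigma=1.0):
    # hypothetical 1-D conditional density p(y | theta), with theta the mean
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(Y, phi, lam):
    # l(lambda, phi) = sum_i log( sum_k lambda_k * p(Y_i | phi_k) )  -- Equation (3)
    return sum(math.log(sum(l * normal_pdf(y, s) for l, s in zip(lam, phi)))
               for y in Y)

Y = [0.1, -0.2, 5.0, 4.8]   # observations clustering near 0 and 5
phi = [0.0, 5.0]            # K = 2 candidate support points
lam = [0.5, 0.5]            # weights on the probability simplex
```

For these made-up data, a two-point support at the cluster centers yields a higher $l(\lambda, \phi)$ than any single support point, which is the behavior the NPML estimator exploits.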
The optimization problem in Equation (4) is naturally divided into two problems:
Problem 1. Given a set of support points { ϕ k } , find the optimal weights { λ k } .
Problem 2. Given the solution to Problem 1, find a better set of support points.
Problems 1 and 2 are solved cyclically until convergence, i.e., no significant improvement in l ( λ , ϕ ) .
Problem 1 is a convex programming problem. In our algorithm, we solve it by the primal-dual interior-point (PDIP) method. This type of method is standard in convex optimization theory (see Boyd and Vandenberghe [9]); however, the exact implementation varies from problem to problem. The details of our implementation are described in Appendix A. See also Bell [10], Baek [11] and Yamada et al. [12]. Our PDIP implementation is fast and can easily handle thousands of variables.
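As a stand-in for PDIP, Problem 1 can also be illustrated with the classical EM-style multiplicative update for mixture weights on a fixed grid; because the problem is convex in $\lambda$, the update climbs toward the same optimal weights, only more slowly than PDIP. The matrix `P` in the usage note is a hypothetical example:

```python
def optimal_weights(P, iters=500):
    # P[i][k] = p(Y_i | phi_k); returns weights maximizing Equation (4) for
    # this fixed grid.  Simple EM-style multiplicative update, shown only for
    # illustration -- NPAG itself solves this convex problem with PDIP.
    N, K = len(P), len(P[0])
    lam = [1.0 / K] * K
    for _ in range(iters):
        # mixture density of each subject under the current weights
        mix = [sum(lam[k] * P[i][k] for k in range(K)) for i in range(N)]
        # lambda_k <- lambda_k * (1/N) * sum_i P[i][k] / mix_i
        lam = [lam[k] * sum(P[i][k] / mix[i] for i in range(N)) / N
               for k in range(K)]
    return lam
```

For instance, with `P = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]` the returned weights stay on the simplex and place more mass on the first column, which explains two of the three subjects better.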
Finding a better set of support points in Problem 2 is a more difficult problem. This location problem is a nonconvex global optimization problem with many local extrema and whose dimension is potentially N × dim Θ . The details of our algorithm, called the adaptive grid (AG) method, will be described in Section 2.3 and in Algorithm 1.
Algorithm 1 Nonparametric adaptive grid (NPAG) algorithm. Input: $(Y, \phi^0, a, b, \Delta_D, \Delta_L, \Delta_F, \Delta_e, \Delta_\lambda)$; $a$ and $b$ are the lists of lower and upper bounds, respectively, of $\Theta$; $\Delta_D$ is the minimum distance allowable between points in the estimated $F_{ML}$. For the $\Delta$ bounds, see Section 2.7. Output: $(\phi, \lambda, l(\lambda, \phi))$
1: procedure NPAG(Y, φ⁰, a, b, Δ_D)  ▹ Estimate F_ML given Y
2:   Initialization: φ ← φ⁰, LogLike ← −10³⁰, F0 ← −10³⁰, F1 ← 2·F0, eps ← 0.2, Δ_e ← 10⁻⁴, Δ_F ← 10⁻², Δ_L ← 10⁻⁴, Δ_λ ← 10⁻³, n ← 0
3:   while eps ≥ Δ_e or |F1 − F0| ≥ Δ_F do
4:     Calculate Ψ(φ)  ▹ N × K matrix {p(Y_i | φ_k)}
5:     [λ̂(φ), l(λ̂(φ), φ)] ← PDIP(Ψ(φ))  ▹ Appendix A
6:     if MAXCYCLES == 0 then
7:       FestML ← l(λ̂(φ), φ)
8:       λ ← λ̂(φ)
9:       return [φ, λ, FestML]
10:    end if
11:    n ← n + 1
12:    φ_c ← CONDENSE(φ, λ̂(φ), Δ_λ)  ▹ Algorithm 3
13:    [λ̂(φ_c), l(λ̂(φ_c), φ_c)] ← PDIP(Ψ(φ_c))  ▹ PDIP returns Gⁿ
14:    NewLogLike ← l(λ̂(φ_c), φ_c)
15:    if n > MAXCYCLES then
16:      FestML ← l(λ̂(φ_c), φ_c)
17:      λ ← λ̂(φ_c)
18:      return [φ, λ, FestML]
19:    end if
20:    if |NewLogLike − LogLike| ≤ Δ_L and eps > Δ_e then
21:      eps ← eps/2  ▹ Adjust precision
22:    end if
23:    if eps ≤ Δ_e then  ▹ Check EXIT conditions
24:      F1 ← NewLogLike
25:      if |F1 − F0| ≤ Δ_F then
26:        FestML ← F1
27:        φ ← φ_c; λ ← λ̂(φ_c)
28:        return [φ, λ, FestML]
29:      else
30:        F0 ← F1; eps ← 0.2  ▹ Reset algorithm
31:      end if
32:    end if
33:    φ ← EXPAND(φ_c, eps, a, b, Δ_D)  ▹ Algorithm 2
34:    LogLike ← NewLogLike
35:  end while
36: end procedure
Roughly speaking, Problems 1 and 2 are solved as follows. An initial large grid of possible support points is defined in the hypercube Θ . Problem 1 is solved on this large grid. After PDIP, most of the original grid points are removed due to near-zero weights, leaving a smaller high-probability grid. Problem 1 is then solved on this smaller grid. Then, the adaptive grid method for Problem 2 takes place. For each remaining grid point, up to 2 × dim Θ new (daughter) support points are added. A daughter point outside the search space Θ or too close to a parent point is discarded. The new grid contains the current high-probability points plus the added daughter points. The algorithm is then ready for Problem 1 again. By construction, each iteration increases the value of l ( λ , ϕ ) . This process continues until the function l ( λ , ϕ ) does not significantly change.
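The cycle just described can be imitated in a few dozen lines for a one-dimensional toy problem. The sketch below substitutes an EM fixed point for PDIP, a uniform (rather than Faure) initial grid, and made-up data from two clusters; it illustrates the expand/condense idea only and is not the NPAG implementation:

```python
import math, random

def npml_toy(Y, a=-2.0, b=8.0, cycles=6, sigma=1.0):
    # Toy 1-D imitation of the NPAG cycle.  Problem 1 is solved by an EM-style
    # fixed point (standing in for PDIP); CONDENSE drops low-weight points;
    # EXPAND adds +/- eps*(b-a) daughters that lie inside [a, b] and are not
    # too close to an existing point.  Illustration only, not the real NPAG.
    def pdf(y, mu):
        return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    phi = [a + (b - a) * j / 20.0 for j in range(21)]   # coarse initial grid
    eps, d_min = 0.2, 0.005                             # relative step / min spacing
    keep = phi
    for _ in range(cycles):
        P = [[pdf(y, p) for p in phi] for y in Y]       # N x K likelihood matrix
        K, N = len(phi), len(Y)
        lam = [1.0 / K] * K
        for _ in range(100):                            # Problem 1 (EM update)
            mix = [sum(l * pik for l, pik in zip(lam, row)) for row in P]
            lam = [lam[k] * sum(P[i][k] / mix[i] for i in range(N)) / N
                   for k in range(K)]
        # CONDENSE: keep points with weight above max(lam) * 1e-3
        keep = [p for l, p in zip(lam, phi) if l > max(lam) * 1e-3]
        # EXPAND: propose daughters around each survivor
        step = eps * (b - a)
        new = list(keep)
        for p in keep:
            for cand in (p - step, p + step):
                if a <= cand <= b and all(abs(cand - q) / (b - a) > d_min for q in new):
                    new.append(cand)
        phi = sorted(new)
        eps /= 2.0                                      # refine the search
    return keep

random.seed(1)
Y = [random.gauss(0.0, 1.0) for _ in range(15)] + [random.gauss(5.0, 1.0) for _ in range(15)]
support = npml_toy(Y)
```

On these simulated data, the surviving support points concentrate near the two cluster centers, mimicking how the adaptive grid homes in on the discrete $F_{ML}$.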

1.1. Comparable Methods

Because of space limitations, in this section we only discuss NPML methods that optimize Equation (4), treat multivariate distributions, and allow general conditional probabilities $\{p(Y_i \mid \theta_i)\}$. As explained in this paper, any such NPML algorithm has to address two problems: the locations of the support points and their weights. NPAG handles locations by an adaptive grid method and weights by the primal-dual interior-point (PDIP) method. The algorithms discussed in this section are summarized in Table 1.
The original methods of Lindsay [7] and Mallet [8] were based on algorithms of optimal design in the style of Fedorov [13]. In Schumitzky [14], an algorithm was proposed which handled both locations and weights by the expectation–maximization (EM) algorithm. It was stable but slow.
In Lesperance and Kalbfleisch [15], a new method was introduced which found weights by the dual method described in Section 5 of Lindsay [7] and locations by what they called the intra-simplex direction method (ISDM). Even though the Lesperance and Kalbfleisch paper was restricted to univariate distributions, the ISDM method has been generalized to the multivariate case. To briefly describe ISDM, let $D(\theta, F)$ be the directional derivative of $\log L(F)$ in the direction of the Dirac distribution $\delta_\theta$ supported at $\theta \in \Theta$ (this function is defined in Section 4 below). ISDM is an iterative algorithm. At stage $k$, let $F_k$ be the current estimate of $F_{ML}$. Find all the local maxima of $D(\theta, F_k)$; these local maxima are added to the current set of support points, and a new $F_{k+1}$ is calculated. If there are no new local maxima, the algorithm is done.
In Pilla, Bartolucci, and Lindsay [16], another new method was developed where the locations were found by an initial fine grid. However, the weights were found by a dual version of the PDIP method.
In Savic, Kjellsson, and Karlsson [17], a nonparametric method (NONMEM-NP) was added to the popular NONMEM® program. NONMEM-NP is a hybrid parametric–nonparametric approach. The locations of the support points were found by a parametric maximum likelihood algorithm; the weights were then found by maximizing Equation (4) relative to the newly found support points. NONMEM-NP can handle high-dimensional and complex multivariate distributions. An extension to NONMEM-NP was developed in Savic and Karlsson [18], where additional support points are added to the original set. A comparison between NONMEM-NP and NPAG is discussed in Leary [19].
In Wang and Wang [6], a new algorithm was developed for multivariate distributions. The locations were found by a combination of EM and a variant of ISDM; the weights were found by a family of quadratic programs. In [6], examples are given for 8- and 13-dimensional multivariate mixing distributions.
Table 1. Table of NPML methods.

Author(s)/Reference/Date | Problem 1 Method | Problem 2 Method
Lindsay [7] (1983) | Convex Geometry | VDM
Mallet [8] (1986) | Optimal Design | VDM
Lesperance and Kalbfleisch [15] (1992) | Semi-Infinite Programming | ISDM
Savic, Kjellsson and Karlsson [17] (2009) | Parametric NONMEM | None
Pilla, Bartolucci and Lindsay [16] (2006) | Dual of PDIP | Adaptive Grid
Schumitzky [14] (1991) | EM | EM
X. Wang and Y. Wang [6] (2015) | Quadratic Programming | ISDM
NPAG [20] (2001) | PDIP | Adaptive Grid

Legend: VDM, vertex direction method; EM, expectation–maximization; ISDM, intra-simplex direction method; NPML, nonparametric maximum likelihood; PDIP, primal-dual interior-point method.
Note: The quadratic programming (QP) algorithm of Wang and Wang [6] has an attractive feature: for a prescribed set of support points, QP finds the zero probabilities exactly. Thus, QP avoids the grid condensation step, in which support points from PDIP with sufficiently low probabilities are deleted. However, QP and PDIP are based on different numerical methods, and a comparison of the efficiency of the two algorithms has not been carried out.
We finally mention that the NPML problem is a special case of a finite mixture model problem with unknown supports and weights. For a discussion of this approach, see Tatarinova and Schumitzky [21].
The algorithms which have been shown by published examples to handle the highest-dimensional multivariate problems are NONMEM-NP, that of Wang and Wang [6], and NPAG.

1.2. Benders Decomposition

For any set of grid points $\phi = (\phi_1, \dots, \phi_m) \in \Theta^m$, let $\lambda = \hat{\lambda}(\phi)$ be the corresponding set of optimal weights given by the PDIP method. Then the function $F(\phi) = l(\hat{\lambda}(\phi), \phi)$ depends only on $\phi$ and can be maximized directly. In optimization, this technique is called Benders decomposition. The NPAG algorithm maximizes $F(\phi)$ by an adaptive search method. In a method proposed by James Burke, $F(\phi)$ is maximized by a Newton-type method. Since $F(\phi)$ is not necessarily differentiable, a relaxed Newton method must be used, similar to the one described in Appendix A for the primal-dual algorithm. For details of Benders decomposition as applied to our problem, see Bell [10], Baek [11] and Jordan-Squire [22].
Founded on this prior work, the present study aimed to comprehensively describe, for the first time, the nonparametric adaptive grid (NPAG) algorithm. This approach uses the exact log-likelihood to solve population modeling problems and does not make any assumptions about the shape of the PK parameter distributions. We illustrate the features and capabilities of this algorithm using a population PK modeling example. This algorithm presents the computational foundation of several hundred peer-reviewed papers to date and is ideally suited to optimize individual patient dosage regimens. The output of the NPAG algorithm becomes the input of the BestDose™ patient dosing software, which is used at the bedside in real time [23].

2. Materials and Methods

2.1. Pmetrics

The simulations and NPAG optimizations in this paper can be duplicated in R, using programs in the Pmetrics package [24]. R and Pmetrics are free software. R is available from many download sites. Pmetrics is available from lapk.org. NPAG is run using the NPrun() command in Pmetrics. Sample datasets and compartmental models are also available at lapk.org.

2.2. NPAG Subprograms

NPAG is a Fortran program consisting of a number of subroutines as described below. The main program performs the adaptive grid (AG) method (consisting of expansion and compression algorithms) and calls the primal-dual interior-point (PDIP) subprogram. The PDIP algorithm solves the maximization problem of Equation (4) for a fixed grid and is described precisely in the Appendix.

2.3. NPAG Implementation (NPAG—Algorithm 1)

For the purpose of this discussion, we can think of PDIP as a function $\hat{\lambda}$ from $\Theta^m$ into the simplex $S_m = \{\lambda \in \mathbb{R}_+^m : \sum_{k=1}^{m} \lambda_k = 1\}$, defined as follows: if $\phi = (\phi_1, \dots, \phi_m)$, then $\hat{\lambda}(\phi) = (\hat{\lambda}_1, \dots, \hat{\lambda}_m)$ maximizes Equation (4) relative to the fixed set of grid points $(\phi_1, \dots, \phi_m)$. In this case, we write $G = (\phi, \hat{\lambda}(\phi))$ and $l(G) = l(\phi, \hat{\lambda}(\phi))$.
In NPAG, there are two types of grids: expanded and condensed. The expanded grids are the initial grid and the grids after grid expansion (Algorithm 2). The condensed grids are generated by grid condensation (Algorithm 3). Each cycle of NPAG begins with an expanded grid. The likelihood calculation is done on the condensed grids.
Now for the adaptive grid method. Assume that $\Theta$ is a bounded $Q$-dimensional hyper-rectangle. Initially, we let $\phi^0_{expanded} = (\phi^0_1, \dots, \phi^0_M)$ be the set of $M$ Faure grid points in $\Theta$ (see [25,26,27]). Alternatively, we could initially let $\phi^0_{expanded}$ be generated by a uniform distribution on $\Theta$ or by a prior run of the program.
Remark. The Faure grid points for a hyper-rectangle $\Theta$ form a low-discrepancy set which, in some sense, optimally and uniformly covers $\Theta$. In our implementation of NPAG, the Faure point sets come in discrete sizes which nest with each other (the allowable numbers of points are 2129, 5003, 10,007, 20,011, 40,009, 80,021, and multiples of 80,021). This nesting property is useful for checking the optimality of $F_{ML}$ (see Section 4). We have found that replacing the initial Faure set with a set generated by a uniform distribution on $\Theta$ increases the time to convergence but results in the same optimal distribution.
Now set $G^0_{expanded} = (\phi^0, \hat{\lambda}(\phi^0))$. Our approach is to generate a sequence of solutions $G^n$ to Equation (4) of increasingly greater likelihood, where, unless otherwise specified, $G^n$ refers to the condensed grid at the $n$th cycle of the algorithm. If $G^n$ has a log-likelihood negligibly different from that of $G^{n-1}$, then $G^n$ is considered the optimal solution to Equation (4) and is relabeled $F_{ML}$. If not, the process continues using $\phi^n$ as the new seed. This loop is repeated until $F_{ML}$ is found.
The stopping conditions for NPAG are defined precisely in Algorithm 1. If the stopping conditions are not met prior to a set maximum number of iterations, the program will exit after writing the last calculated G n into a file.
Algorithm 2 EXPAND. Input: $\phi = (\phi_1, \dots, \phi_K)$, $\Delta_G$, $\Theta = [a_1, b_1] \times [a_2, b_2] \times \dots \times [a_Q, b_Q]$, $a = [a_1, \dots, a_Q]$, $b = [b_1, \dots, b_Q]$, $\Delta_D$. Output: $\phi = (\phi_1, \dots, \phi_M)$, where $M \le K(1 + 2Q)$. Note: In this algorithm, $\phi = (\phi_1, \dots, \phi_K)$ is a $Q \times K$ matrix, with $Q = \dim \Theta$.
1: function EXPAND(φ, Δ_G, a, b, Δ_D)
2:   Initialize: [Q, K] ← size(φ), I ← Q × Q identity matrix, newφ ← φ
3:   for k = 1, …, K do  ▹ K = number of input support points
4:     for d = 1, …, Q do  ▹ Q = dim Θ
5:       T(d) ← Δ_G (b(d) − a(d))
6:       if φ(d, k) + T(d) ≤ b(d) then  ▹ Check upper boundary
7:         φ₊ ← φ(:, k) + T(d) I(:, d)
8:         dist ← 10³⁰
9:         for k_in = 1 : length(newφ(1, :)) do
10:          newdist ← max(abs(φ₊ − newφ(:, k_in)) ./ (b − a))  ▹ ./ done component-wise; max over the Q components
11:          dist ← min(dist, newdist)
12:        end for
13:        if dist ≥ Δ_D then  ▹ Check distance to existing support points
14:          newφ ← [newφ, φ₊]
15:        end if
16:      end if
17:      if φ(d, k) − T(d) ≥ a(d) then  ▹ Check lower boundary
18:        φ₋ ← φ(:, k) − T(d) I(:, d)
19:        dist ← 10³⁰
20:        for k_in = 1 : length(newφ(1, :)) do
21:          newdist ← max(abs(φ₋ − newφ(:, k_in)) ./ (b − a))
22:          dist ← min(dist, newdist)
23:        end for
24:        if dist ≥ Δ_D then  ▹ Check distance to existing support points
25:          newφ ← [newφ, φ₋]
26:        end if
27:      end if
28:    end for
29:  end for
30:  φ ← newφ
31: end function
Algorithm 3 Condense algorithm. Input: $(\phi, \lambda, \Delta_\lambda)$. Output: $\phi_c$. Note: $\phi_c$ is a subset of $\phi$.
1: function CONDENSE(φ, λ, Δ_λ)
2:   ind ← find(λ > max(λ) Δ_λ)  ▹ Inequality and max performed component-wise
3:   φ_c ← φ(:, ind)
4:   return φ_c
5: end function

2.4. Grid Expansion (EXPAND—Algorithm 2)

The crux of the adaptive grid method is how to go from G 0 to G 1 or, more generally, from G n to G n + 1 . The details of doing this are now explained roughly below and precisely in Algorithm 1.
Let $Q$ be the dimension of $\Theta$. Suppose at stage $n$ we have a grid of high-probability support points $\phi^n$. We then add up to $2Q$ daughter points for each support point $\phi_k \in \phi^n$. The daughter points are the vertices of a small hyper-rectangle centered at each $\phi_k$, with size proportional to the original size of the hyper-rectangle defining $\Theta$. The size of this small hyper-rectangle decreases as the accuracy of the estimates increases (see Algorithm 2).
Let $\phi^{n+1}_{expanded} = \phi^n \cup \{\text{daughter points}\}$. The PDIP subprogram is then applied to $\phi^{n+1}_{expanded}$, resulting in the new solution set $G^{n+1}_{expanded} = (\phi^{n+1}_{expanded}, \hat{\lambda}(\phi^{n+1}_{expanded}))$ (see Algorithm 1). The solution set $G^{n+1}_{expanded}$ is now ready for grid condensation.
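A minimal sketch of the daughter-point generation of Algorithm 2 follows, for points stored as $Q$-dimensional lists. The scaled component-wise maximum used for the closeness test is our reading of the component-wise distance in Algorithm 2; everything else (names, arguments) is illustrative:

```python
def expand(phi, eps, a, b, d_min):
    # For each support point, propose up to 2*Q daughters by stepping
    # +/- eps*(b[d]-a[d]) along each coordinate d; discard a daughter that
    # falls outside Theta or lies within relative distance d_min of a point
    # already in the grid.  phi is a list of Q-dimensional points (lists).
    Q = len(a)
    new_phi = [list(p) for p in phi]
    for p in phi:
        for d in range(Q):
            for sign in (1.0, -1.0):
                cand = list(p)
                cand[d] += sign * eps * (b[d] - a[d])
                if not (a[d] <= cand[d] <= b[d]):
                    continue                     # outside the search space Theta
                # scaled distance to the nearest point already in the grid
                dist = min(max(abs(c - x) / (bb - aa)
                               for c, x, aa, bb in zip(cand, q, a, b))
                           for q in new_phi)
                if dist > d_min:
                    new_phi.append(cand)
    return new_phi
```

For an interior point in the unit square, this yields the parent plus $2Q = 4$ daughters; a point on the boundary gets fewer, since out-of-bounds daughters are discarded.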

2.5. Grid Condensation (CONDENSE—Algorithm 3)

The above solution set $G^{n+1}_{expanded}$ may have many support points with low probability. We remove all support points whose probability is less than $(\max \lambda)\Delta_\lambda$, where $\lambda$ is the vector of current probabilities and the default for $\Delta_\lambda$ is $10^{-3}$. (Note that the remaining probabilities are not normalized at this point.) The probabilities of the remaining support points are normalized by a second call to the PDIP subprogram; this second call is fast. The likelihood associated with the remaining support points and normalized probabilities is then used to update the program control parameters and to check for convergence (Algorithm 1 and Section 2.7). If convergence is attained, the output of this second call to PDIP provides the support points and probabilities of the final solution. If not, the remaining support points are sent to the grid expansion subprogram (Algorithm 2), initializing the next cycle.
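The condensation step itself is essentially a one-line filter. The sketch below re-normalizes the surviving weights directly for illustration, whereas NPAG obtains them from the second PDIP call:

```python
def condense(phi, lam, d_lam=1e-3):
    # Keep support points whose weight exceeds max(lam) * d_lam, as in
    # Algorithm 3.  Re-normalizing here is a simplification; NPAG instead
    # re-optimizes the surviving weights with a second PDIP call.
    cutoff = max(lam) * d_lam
    kept = [(p, l) for p, l in zip(phi, lam) if l > cutoff]
    total = sum(l for _, l in kept)
    return [p for p, _ in kept], [l / total for _, l in kept]
```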

2.6. PDIP Subprogram—See Appendix A

The PDIP subprogram finds the optimal solution to Equation (4) with respect to λ for fixed ϕ . PDIP employs a primal-dual interior-point method that uses a relaxed Newton method to solve the corresponding Karush–Kuhn–Tucker equations. (See Equations (14)–(17) of Appendix A).
For any $Y = (Y_1, \dots, Y_N)$ and any $\phi = (\phi_1, \dots, \phi_K) \in \Theta^K$, the input to the PDIP subprogram is the $N \times K$ matrix $\{p(Y_i \mid \phi_k)\}$. The output consists of the optimal weights $\hat{\lambda}(\phi)$ and the corresponding log-likelihood $l(\hat{\lambda}(\phi), \phi)$. An in-depth description of the PDIP algorithm and its implementation is presented in Appendix A. See also [10,11,12].

2.7. NPAG Stopping Conditions

As explained above, a potential solution $F_{ML}$ is not accepted as a global optimum until successive sequences of $G^n$ produce final distributions evaluating to sufficiently close log-likelihoods. The various upper and lower bounds $\Delta$ for the NPAG control and stopping conditions are defined below and are used in Algorithms 1–3.
$\Delta_L$: Primary upper bound on the allowable difference between two successive estimated log-likelihoods; the default initialization is $10^{-4}$.
$\Delta_F$: Secondary upper bound on the allowable difference between two successive estimated log-likelihoods of potential $F_{ML}$; the default initialization is $10^{-2}$.
$\Delta_e$: Upper bound on the accuracy variable $eps$ of Algorithm 1. The default initialization for $\Delta_e$ is $10^{-4}$. The default initialization for $eps$ is 0.2, and $eps$ is stepped down until $eps \le \Delta_e$. $\Delta_F$ and $\Delta_e$ define the two stopping conditions for Algorithm 1.
$\Delta_D$: Lower bound on how close two support points can get; the default initialization is $10^{-4}$.
$\Delta_\lambda$: Lower-bound factor on the probabilities of the weights $\lambda$; the default initialization is $10^{-3}$.

2.8. Calculation of p(Y_i | φ_k)

Given observations $Y_i$, $i = 1, \dots, N$ and grid points $\phi_k$, $k = 1, \dots, K$, the PDIP subprogram depends only on the $N \times K$ matrix $\{p(Y_i \mid \phi_k)\}$. NPAG can be used for any problem once this matrix is defined. However, the default setting of NPAG is for the problem of population pharmacokinetics. For a good background on population pharmacokinetics, see Davidian and Giltinan [28,29].
In population pharmacokinetics, generally $Y_i = (y_{i,1}, \dots, y_{i,M})$ is a matrix of vector observations for the $i$th subject. Since NPAG allows multiple outputs, each $y_{i,m}$ is itself a $q$-dimensional vector $y_{i,m} = (y_{i,m,1}, \dots, y_{i,m,q})$. The observations $y_{i,m,j}$ are then typically given by a regression equation of the form:
$$y_{i,m,j} = f_{i,m,j}(\theta_i) + \nu_{i,m,j}, \quad j = 1, \dots, q, \qquad \nu_{i,m,j} \sim N\!\left(0, (\sigma_{i,m,j}(\theta_i))^2\right) \tag{5}$$
where the $\theta_i$ are unobserved parameters specific to $Y_i$.
In Equation (5), $f_{i,m,j}$ is a known nonlinear function depending on the model structure, the dosage regimen, the sampling schedule, all covariates and, of course, the subject-specific parameter vector $\theta_i$. Except for simple models, evaluating $f_{i,m,j}$ requires the solution of (possibly nonlinear) ordinary differential equations.
In the current implementation of NPAG, it is assumed that the $(y_{i,1}, \dots, y_{i,M})$ are independent. Then
$$p(Y_i \mid \phi_k) = \frac{\exp\left(-\tfrac{1}{2}\sum_{m=1}^{M} (y_{i,m} - f_{i,m}(\phi_k))\, \Sigma_{i,m}^{-1}(\phi_k)\, (y_{i,m} - f_{i,m}(\phi_k))^T\right)}{\prod_{m=1}^{M} \sqrt{(2\pi)^q \det \Sigma_{i,m}(\phi_k)}} \tag{6}$$
where $f_{i,m} = (f_{i,m,1}, \dots, f_{i,m,q})$ and $\Sigma_{i,m} = \mathrm{diag}(\sigma^2_{i,m,1}, \dots, \sigma^2_{i,m,q})$. For the purposes of matrix multiplication in Equation (6), we think of $y_{i,m}$ and $f_{i,m}$ as $q$-dimensional row vectors.
To complete the description of Equation (6), we need to model the standard deviation terms $\sigma_{i,m,j}$ of the assay noise. In our implementation of NPAG, four different models are allowed. Let
$$\alpha_{i,m,j}(\phi_k) = c_0 + c_1 f_{i,m,j}(\phi_k) + c_2 f^2_{i,m,j}(\phi_k) + c_3 f^3_{i,m,j}(\phi_k) \tag{7}$$
and set
$$\sigma_{i,m,j} = \begin{cases} \alpha_{i,m,j} & \text{assay error polynomial only} \\ \gamma\, \alpha_{i,m,j} & \text{multiplicative error} \\ \sqrt{\alpha^2_{i,m,j} + \gamma^2} & \text{additive error} \\ \gamma & \text{constant level of error} \end{cases} \tag{8}$$
The parameter γ in Equation (8) is a variance factor. Artificially increasing the variance during the first several cycles of NPAG increases the likelihood for each ϕ , allowing the algorithm to use these cycles to find a better initial state from which to begin optimization. NPAG also has an option to “optimize” γ . This changes NPAG from a nonparametric method to a “semiparametric” method and will not be discussed here. The interested reader can consult [12].
Next, if $c_0 = 0$ in Equation (7), then $\alpha_{i,m,j}$ can become small for certain values of $\phi$ that, in early iterations, can be far from optimal. This, in turn, causes numerical problems, since the likelihood becomes unbounded as $\sigma_{i,m,j} \to 0$. One way to avoid this problem is to take $\sigma_{i,m,j} = \text{constant}$. Another is to assume that $\alpha_{i,m,j}$ is known and given by
$$\alpha_{i,m,j} = c_0 + c_1 y_{i,m,j} + c_2 y_{i,m,j}^2 + c_3 y_{i,m,j}^3$$
That is, $\sigma$ is approximated by a polynomial in the observed values rather than in the model-predicted values. In our experience with NPAG, the approximation of Equation (9) is useful for ensuring computational stability (especially during the early cycles of the algorithm). However, from a theoretical perspective, this change violates the conditions of maximum likelihood and will not be discussed here. Again, the interested reader can consult [12].

2.9. Convergence

For a given initial grid $\phi^0$, the NPAG algorithm is only guaranteed to find a local maximum of $L(F)$. More precisely, if $\phi^*$ is the final grid of NPAG starting from $\phi^0$, then $\hat{\lambda}(\phi^*)$ is a global maximum on $\phi^*$, but the support points $\phi^*$ may represent only a local maximum.
Proving global convergence of a nonparametric maximum likelihood method for estimation of a multivariate mixing distribution is difficult; for one-dimensional distributions, the problem is straightforward. The idea of the proof goes back at least to Fedorov [13] in 1972 and involves the use of directional derivatives.
Let $F$ be any distribution on $\Theta$. Then, the directional derivative of $\log L(F)$ in the direction of the Dirac distribution $\delta_\theta$ supported at $\theta$ is defined by
$$D(\theta, F) = \sum_{i=1}^{N} \frac{p(Y_i \mid \theta)}{p(Y_i \mid F)} - N, \qquad \theta \in \Theta, \qquad \text{where} \quad p(Y_i \mid F) = \int p(Y_i \mid \theta)\, dF(\theta).$$
Let $F^k$ be the current NPML estimate at iteration $k$. The Fedorov method maximizes $D(\theta, F^k)$ over $\theta \in \Theta$ at every iteration; the maximizing point is then added in an optimal way to $F^k$ to give $F^{k+1}$. Under regularity assumptions, Fedorov shows that $L(F^k)$ converges to $L(F^{ML})$; see Fedorov [13] (Theorem 2.5.3). Many improvements to this method have been made. In Lesperance and Kalbfleisch [15] and Wang and Wang [6], instead of adding only the point where the global maximum of $D(\theta, F^k)$ occurs, all points where local maxima occur are added in an optimal way. Again, under regularity assumptions, the same convergence is proved. In one dimension, these methods are efficient; in higher dimensions, they are not computationally practical.
We now suggest a method to check whether the final distribution of NPAG is globally optimal and, if not, how close to optimal it is. It also involves the directional derivative $D(\theta, F)$, but only at the last iteration of NPAG. Now define
$$D(F) = \max_{\theta \in \Theta} D(\theta, F)$$
Note that the max in the above expression is over $\Theta$ only, not over $\Theta^N$. It is proved in Lindsay [7] that $F^*$ is a global maximum of $L(F)$, i.e., $F^* = F^{ML}$, if and only if $D(F^*) = 0$. Even if $D(F^*) \neq 0$, it is useful to make this computation, as it is also proved in Lindsay [7] that $L(F^{ML}) - L(F^*) \le D(F^*)$, so this last expression gives an estimate of the accuracy of the final NPAG result.
Although, as noted above, it is not practical to calculate $D(F)$ at every iteration of an algorithm, we suggest making this calculation once, at the end of the algorithm. It can be performed by a deterministic or stochastic optimization algorithm.
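For a discrete candidate $F$ with support points $\phi_k$ and weights $\lambda_k$, $D(\theta, F)$ is inexpensive to evaluate, since $p(Y_i \mid F) = \sum_k \lambda_k\, p(Y_i \mid \phi_k)$. A sketch of this optimality check (the helper names are ours):

```python
import numpy as np

def directional_derivative(p_theta, P, lam):
    """D(theta, F) = sum_i p(Y_i | theta) / p(Y_i | F) - N for a discrete
    mixing distribution F. P[i, k] = p(Y_i | phi_k); lam holds the weights
    lambda_k of F; p_theta[i] = p(Y_i | theta) at the candidate theta."""
    p_F = P @ lam   # p(Y_i | F) = sum_k lam_k * p(Y_i | phi_k)
    return float(np.sum(p_theta / p_F) - len(p_F))
```

At a global maximum, $D(\theta, F) \le 0$ for every $\theta$, with equality on the support of $F$; a positive value at some $\theta$ both certifies non-optimality and bounds the likelihood shortfall, per Lindsay [7].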

3. Examples

First of all, the NPAG program has been used successfully in high-dimensional and very complex pharmacokinetic–pharmacodynamic models. In Ramos-Martin et al. [30], the NPAG program was used for a population model of the pharmacodynamics of vancomycin for coagulase-negative staphylococci (CoNS) infection in neonates. Vancomycin is an antibiotic used to treat a number of serious bacterial infections. CoNS are the most commonly isolated pathogens in the neonatal intensive care unit. This model had 7 nonlinear differential equations and 11 random parameters. The population was a combination of 300 experimental and animal subjects. In Drusano et al. [31], the NPAG program was used for a population model of two drugs for the treatment of tuberculosis. This model had 5 nonlinear differential equations, 3 nonlinear algebraic equations, 1671 observations from 6 outputs and 29 random parameters. In the algebraic equations, the state variables were only defined implicitly and had to be solved for by an iterative method.
The above two examples are too complex to use for simulation purposes. Consequently, we present here a simpler model which has an analytic solution and which can be checked by other algorithms. Nevertheless, the estimation of parameters in this model is not trivial. We consider a three-compartment PK model with a continuous IV infusion into the central compartment and a bolus input into the absorption compartment. The individual subject model is described by the following differential equations:
$$\begin{aligned} \frac{dx_1}{dt} &= -K_a x_1, & \quad x_1(t) &= \begin{cases} 0 & \text{for } 0 \le t < 5 \\ b & \text{if } t = 5 \end{cases} \\ \frac{dx_2}{dt} &= K_a x_1 - (K_e + K_{cp})\, x_2 + K_{pc}\, x_3 + r(t), & \quad x_2(0) &= 0 \\ \frac{dx_3}{dt} &= K_{cp}\, x_2 - K_{pc}\, x_3, & \quad x_3(0) &= 0 \end{aligned}$$
and output equation
$$y_1(t) = x_2(t)/V_c + w(t), \qquad w(t) \sim N(0, \sigma^2), \quad \sigma = 5.5$$
The inputs are a bolus $b = 2000$ mg at $t = 5$ Hr and a continuous infusion $r(t) = 500$ Hr$^{-1}$ for $0 < t < 16$ Hr. This model has 5 random parameters ($V$, $K_a$, $K_e$, $K_{cp}$, $K_{pc}$). A diagram of this model is given in Figure A1. It is known that this model is structurally identifiable; see Godfrey [32]. However, we have found that for a continuous IV infusion, the parameters $K_{cp}$ and $K_{pc}$ can be difficult to estimate in a noisy environment.
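The individual subject model can be simulated with a general-purpose ODE integrator. The sketch below uses SciPy rather than the solver inside NPAG; the bolus is handled by stopping the integration at $t = 5$ and depositing $b$ into the absorption compartment, and the parameter values passed in are illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate_profile(Ka, Vc, Ke, Kcp, Kpc, t_obs, b=2000.0, rate=500.0):
    """Noise-free output y1(t) = x2(t)/Vc for the three-compartment model
    above: a constant infusion into the central compartment for 0 < t < 16 Hr
    plus a bolus b into the absorption compartment at t = 5 Hr. Sketch only."""
    def rhs(t, x):
        x1, x2, x3 = x
        r = rate if t < 16.0 else 0.0
        return [-Ka * x1,
                Ka * x1 - (Ke + Kcp) * x2 + Kpc * x3 + r,
                Kcp * x2 - Kpc * x3]

    x, t_prev, y = np.zeros(3), 0.0, []
    for t in sorted(t_obs):
        if t_prev < 5.0 <= t:   # deposit the bolus at t = 5
            x = solve_ivp(rhs, (t_prev, 5.0), x, rtol=1e-8, atol=1e-8).y[:, -1]
            x[0] += b
            t_prev = 5.0
        if t > t_prev:
            x = solve_ivp(rhs, (t_prev, t), x, rtol=1e-8, atol=1e-8).y[:, -1]
            t_prev = t
        y.append(x[1] / Vc)
    return np.array(y)
```

Adding $N(0, \sigma^2)$ noise to this output reproduces the observation model of Equation (11).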
The details of the simulation are as follows. There were 300 simulated subjects. The random variables ($V$, $K_a$, $K_{cp}$, $K_{pc}$) were independently simulated from normal distributions with means respectively equal to (1.2, 0.8, 2.0, 0.2) and standard deviations corresponding to a 25% coefficient of variation.
The random variable $K_e$ was independently simulated from a bimodal mixture of two normal distributions with means respectively equal to 0.5 and 1.5, standard deviations corresponding to a 10% coefficient of variation, and weights equal to 0.2 and 0.8. This distribution would apply to an elimination rate constant with a bimodal distribution where 80% of the subjects have a mean of 1.5 and only 20% have a mean of 0.5. The power of the nonparametric method allows the detection of the 20% group.
Eleven observations were taken at times t = 0.25 , 1.0 , 4.98 , 5.25 , 5.5 , 6.0 , 7.0 , 8.5 , 10.0 , 13.0 , 16.0 .
These sampling times were chosen in an ad hoc fashion and are not to be considered optimal. In Figure A2, we show the profiles of the 300 noisy model outputs y 1 . These profiles are plotted as piecewise linear functions with nodes at the observation times.
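A simulation of the population parameters along the lines just described might look like the following (the seed and variable names are our own):

```python
import numpy as np

rng = np.random.default_rng(2021)   # seed is arbitrary
n = 300

# (V, Ka, Kcp, Kpc): independent normals with 25% coefficient of variation
means = {"V": 1.2, "Ka": 0.8, "Kcp": 2.0, "Kpc": 0.2}
pop = {name: rng.normal(mu, 0.25 * mu, n) for name, mu in means.items()}

# Ke: bimodal mixture, 20% of subjects around 0.5 and 80% around 1.5,
# each component with a 10% coefficient of variation
in_low_group = rng.random(n) < 0.2
pop["Ke"] = np.where(in_low_group,
                     rng.normal(0.5, 0.10 * 0.5, n),
                     rng.normal(1.5, 0.10 * 1.5, n))
```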
The initial Faure set had 80,321 support points dispersed in the volume
$$K_a \in (0.01, 2.0), \quad V \in (0.01, 2.5), \quad K_e \in (0.0001, 2.0), \quad K_{cp} \in (0.0, 4.0), \quad K_{pc} \in (0.0001, 2.0)$$
The assay error (Equation (8)) is not always known. An approximate assay error polynomial can be inferred from the literature, and Pmetrics includes a routine to estimate the assay error polynomial from the data. Another approach to analyzing data with unknown measurement error is to run successive NPAG optimizations, decreasing the error magnitude for each new run. An advantage of this approach is faster model development: the first cycle of NPAG, which begins with a relatively large measurement error, can be initialized with a relatively small number of support points, and each NPAG solution is used as a prior to skip the first (and most computationally burdensome) step of the next run. We demonstrate that this approach can converge to the correct solution on these simulated data.
Convergence for this problem was accomplished after applying NPAG four times. For the first application, $\sigma = 0.025\,Y_{simulated}$. The output distribution of this first application was used as a prior to start NPAG again, this time with $\sigma = 7.96 + 0.0125\,Y_{simulated}$. The output of this second application was used as a prior for a third NPAG run with $\sigma = 7.96 + 0.0065\,Y_{simulated}$. Finally, the output density of the third run was used as a prior to run NPAG a fourth time with $\sigma = 5.5$, the same as in the simulation. The step down in assumed observation error occurred at convergence for each previous error level: at cycles 4513, 5972, and 6791. Final convergence occurred at cycle 8012. There are 284 support points in the final density.
The simulated and estimated marginal distributions are shown in Figure A3 and Figure A4. It is seen that the estimated marginal distributions are similar to the simulated histograms; in particular, the bimodal shape of $K_e$ was uncovered. Similarity is tested using the R routine mtsknn.eq(..., k = 3), which returns a p-value of 0.5809. mtsknn.eq applies a k-nearest-neighbor approach to test whether two samples arise from the same distribution.
NPAG is designed to estimate the whole joint distribution of the parameters. As mentioned earlier, the estimate $F^{ML}$ is especially important for our application to population pharmacokinetics, where $F^{ML}$ is used as a prior distribution for Bayesian dosage regimen design. Moreover, $F^{ML}$ is a consistent estimator of the true mixing distribution, and consequently the moments of $F^{ML}$ should be consistent estimators of the true moments. Means and variances of the parameter estimates for $F^{ML}$ can be easily obtained by integrating the corresponding marginal distributions. As a check of this fact, Tables A1 and A2 show comparisons of estimated versus simulated means and covariances. Again, the results are quite accurate.
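Because $F^{ML}$ is discrete, its moments are finite weighted sums over the support points. A sketch of this computation (our own helper, not the Pmetrics API):

```python
import numpy as np

def mixture_moments(support, lam):
    """Mean vector and covariance matrix of a discrete distribution F with
    support points `support` (K x d array) and probabilities `lam` (length K,
    summing to 1), as would be read off the final NPAG density."""
    support = np.asarray(support, float)
    lam = np.asarray(lam, float)
    mean = lam @ support                       # weighted mean of support points
    centered = support - mean
    cov = centered.T @ (lam[:, None] * centered)   # weighted covariance
    return mean, cov
```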
Finally, in Figure A5 we include a graph of predicted versus observed values, which shows the overall good fit to the data. The predicted values in the right panel (Predicted Bayesian) are obtained as follows: for each subject, the Bayesian mean estimate of the parameters is found using the final NPAG distribution as a prior together with that subject's observations; the subject's concentration profile is then calculated from these parameter means. The predicted values in the left panel (Predicted Population) are the weighted average, over all support points, of the subject model evaluated at each support point.

4. Conclusions

We have provided the first comprehensive description of the NPAG algorithm for estimating multivariate mixing distributions. This algorithm can describe between-subject variability without assuming any shape for the distribution of PK parameters and is well suited for optimizing patient dosing [33,34,35]. NPAG is an iterative algorithm employing the primal-dual interior-point method and an adaptive grid method. It can handle pharmacokinetic, pharmacodynamic and other models with a large number of estimated parameters, and it is based on the exact log-likelihood. A detailed description of NPAG is provided, along with an application to a nontrivial multicompartment PK model. Finally, the NPAG algorithm is arguably the most efficiently parallelizable population modeling algorithm, since it can parallelize over both subjects and support points. This allows the algorithm to be readily implemented on supercomputers, as our laboratory has done in research projects. In addition to population pharmacokinetics, this research also applies to empirical Bayes estimation, see Koenker and Mizera [36], and to many other areas of applied mathematics, see Banks et al. [37]. Overall, the NPAG algorithm provides an important addition to the pharmacometric toolbox for drug development and optimal patient dosing, at the interface of applied mathematics, biomedical science, and clinical practice.

5. Supplementary Material

5.1. Recent References Using NPAG

As a means for readers to more quickly assess the applicability of NPAG in a wide variety of pharmacometric studies, we refer here to the first 10 references on a recent PubMed search for papers that use NPAG. Note that all population PK studies using Pmetrics employ NPAG.
  • Pharmacodynamics of Posaconazole in Experimental Invasive Pulmonary Aspergillosis: Utility of Serum Galactomannan as a Dynamic Endpoint of Antifungal Efficacy. Gastine S, Hope W, Hempel G, Petraitiene R, Petraitis V, Mickiene D, Bacher J, Walsh TJ, Groll AH. Antimicrob Agents Chemother. 2020 Nov 9:AAC.01574-20. doi: 10.1128/AAC.01574-20. Online ahead of print. PMID: 33168606
  • Cerebrospinal fluid penetration of ceftolozane/tazobactam in critically ill patients with an indwelling external ventricular drain. Sime FB, Lassig-Smith M, Starr T, Stuart J, Pandey S, Parker SL, Wallis SC, Lipman J, Roberts JA. Antimicrob Agents Chemother. 2020 Oct 19:AAC.01698-20. doi: 10.1128/AAC.01698-20. Online ahead of print. PMID: 33077655
  • Population Pharmacokinetics of Continuous-Infusion Meropenem in Febrile Neutropenic Patients with Hematologic Malignancies: Dosing Strategies for Optimizing Empirical Treatment against Enterobacterales and P. aeruginosa. Cojutti PG, Candoni A, Lazzarotto D, Filì C, Zannier M, Fanin R, Pea F. Pharmaceutics. 2020 Aug 19;12(9):785. doi: 10.3390/pharmaceutics12090785. PMID: 32825109 Free PMC article.
  • Caspofungin Weight-Based Dosing Supported by a Population Pharmacokinetic Model in Critically Ill Patients. Märtson AG, van der Elst KCM, Veringa A, Zijlstra JG, Beishuizen A, van der Werf TS, Kosterink JGW, Neely M, Alffenaar JW. Antimicrob Agents Chemother. 2020 Aug 20;64(9):e00905-20. doi: 10.1128/AAC.00905-20. Print 2020 Aug 20. PMID: 32660990 Free PMC article.
  • Ethionamide Population Pharmacokinetic Model and Target Attainment in Multidrug-Resistant Tuberculosis. Al-Shaer MH, Märtson AG, Alghamdi WA, Alsultan A, An G, Ahmed S, Alkabab Y, Banu S, Houpt ER, Ashkin D, Griffith DE, Cegielski JP, Heysell SK, Peloquin CA. Antimicrob Agents Chemother. 2020 Aug 20;64(9):e00713-20. doi: 10.1128/AAC.00713-20. Print 2020 Aug 20. PMID: 32631828
  • Development and validation of a dosing nomogram for amoxicillin in infective endocarditis. Rambaud A, Gaborit BJ, Deschanvres C, Le Turnier P, Lecomte R, Asseray-Madani N, Leroy AG, Deslandes G, Dailly É, Jolliet P, Boutoille D, Bellouard R, Gregoire M; Nantes Anti-Microbial Agents PK/PD (NAMAP) study group. J Antimicrob Chemother. 2020 Oct 1;75(10):2941-2950. doi: 10.1093/jac/dkaa232. PMID: 32601687
  • Population Pharmacokinetics and Target Attainment of Cefepime in Critically Ill Patients and Guidance for Initial Dosing. Al-Shaer MH, Neely MN, Liu J, Cherabuddi K, Venugopalan V, Rhodes NJ, Klinker K, Scheetz MH, Peloquin CA. Antimicrob Agents Chemother. 2020 Aug 20;64(9):e00745-20. doi: 10.1128/AAC.00745-20. Print 2020 Aug 20. PMID: 32601155
  • Baclofen self-poisoning: Is renal replacement therapy efficient in patient with normal kidney function? Brunet M, Léger M, Billat PA, Lelièvre B, Lerolle N, Boels D, Le Roux G. Anaesth Crit Care Pain Med. 2020 Oct 14:S2352-5568(20)30230-7. doi: 10.1016/j.accpm.2020.07.021. Online ahead of print. PMID: 33068797
  • Ceftriaxone dosing in patients admitted from the emergency department with sepsis. Heffernan AJ, Curran RA, Denny KJ, Sime FB, Stanford CL, McWhinney B, Ungerer J, Roberts JA, Lipman J. Eur J Clin Pharmacol. 2020 Sep 24. doi: 10.1007/s00228-020-03001-z. Online ahead of print. PMID: 32974748
  • Population Pharmacokinetic Models of Anti-Tuberculosis Drugs in Patients: a Systematic Critical Review. Otalvaro JD, Hernandez B E AM, Rodriguez CA, Zuluaga AF. Ther Drug Monit. 2020 Sep 18. doi: 10.1097/FTD.0000000000000803. Online ahead of print. PMID: 32956238

5.2. Recent Studies to Which NPAG Can Be Applied

The references below use excellent parametric methods. All of these methods have the same mathematical structure as our nonparametric NPAG; the main difference is that the parametric methods assume the population distribution is multivariate normal with unknown mean vector and unknown covariance matrix, whereas NPAG makes no parametric assumption about the population distribution, as discussed in our paper. Otherwise, the population analysis problem is the same. Bulitta et al. [38] and Shah et al. [39] use the method S-ADAPT, which is based on the ADAPT package developed by D'Argenio, Wang and Schumitzky. Ishihara et al. [40] and Soraluce et al. [41] use the industry-standard parametric version of NONMEM. Allard et al. [42] use the stochastic approximation expectation maximization (SAEM) method. NPAG can run any problem that is run by any of these programs.

5.3. Comparison of NPAG to Classical Population Analysis Programs

Several studies have compared NPAG to a parametric algorithm. Bustad et al. [43] compared the statistical consistency and efficiency of ITS and two older NONMEM® routines, FO and FOCE, to those of NPEM and NPAG on simulated datasets; the nonparametric methods were more consistent and efficient. Prémaud et al. [44] also compared NPAG to NONMEM® FOCE; the two algorithms converged to distinctly different results, with NPAG converging to an estimator with better predictive performance, allowing its use in therapeutic drug monitoring. More recently, de Velde et al. [45] compared NONMEM® FOCE-I to NPAG, with both methods converging to similar parameter estimates. In each of the above studies, NPAG converged to a distribution with greater variance.

Author Contributions

Conceptualization, J.V.B., R.L. and A.S.; data curation, M.v.G., M.N.N. and W.M.Y.; formal analysis, J.V.B. and A.S.; funding acquisition, J.V.B., R.W.J. and M.N.N.; investigation, M.N.N. and R.W.J.; methodology, R.L., T.T. and A.S.; project administration, A.S.; resources, M.N.N. and R.W.J.; software, M.v.G., A.K. and W.M.Y.; supervision, A.S.; validation, J.B., D.S.B., J.V.B., A.K., R.L., T.T., A.S. and W.M.Y.; visualization, W.M.Y.; writing—original draft preparation, W.M.Y.; writing—review and editing, J.B., D.S.B., J.V.B., A.K., R.L., T.T., A.S. and W.M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by grants from NIH: RR11526, GM65619, GM068968, EB005803, EB001978, HD070886. JB was supported in part by NSF/DMS-0505712.

Data Availability Statement

Data is available on request from contact@lapk.org.

Acknowledgments

The NPAG program was developed at the USC Laboratory of Applied Pharmacokinetics. James Burke (University of Washington) developed the Primal-Dual Interior-Point method discussed in the Appendix. Robert Leary (Pharsight Corporation) developed the Adaptive Grid method and wrote the original Fortran program for NPAG. Michael Neely, MD (USC Children’s Hospital of Los Angeles) developed the program package Pmetrics which contains NPAG as a subprogram. Pmetrics is an R package for nonparametric and parametric population modeling and simulation and is available at www.lapk.org, see Neely et al. [24].

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AG      Adaptive grid
CoNS    Coagulase-negative staphylococci
EM      Expectation–maximization algorithm
ISDM    Intra-simplex direction method
NPAG    Nonparametric adaptive grid algorithm
NPML    Nonparametric maximum likelihood
PDIP    Primal-dual interior-point method
QP      Quadratic programming

Appendix A. A Primal-Dual Interior-Point Algorithm (PDIP)

To make this paper self-contained, we outline here the PDIP algorithm, which was written by James Burke. This algorithm is a FORTRAN subroutine of NPAG. The description below is based on the Matlab and C++ codes found on Bradley Bell's website; see [10]. Definitions of general terms and theorems can be found in Boyd and Vandenberghe [9].

Appendix A.1. Duality Theory and the Basic Problem

Given a set of support points { ϕ k } , the problem of finding the optimal weights { λ k } in Equation (4) can be posed as the following optimization problem
$$\mathcal{P}: \qquad \min \Phi(\Psi\lambda) \quad \text{s.t.} \quad 0 \le \lambda, \; e^T\lambda = 1,$$
where $\Psi \in \mathbb{R}^{n \times m}$ is the matrix whose $(i,j)$ entry is $p(y_i \mid \phi_j)$ and where, in general, the function $\Phi: \mathbb{R}^k \to \mathbb{R} \cup \{+\infty\}$ is given by
$$\Phi(z) = \begin{cases} -\sum_{i=1}^{k} \log z_i, & 0 < z, \\ +\infty, & \text{otherwise.} \end{cases}$$
The symbol $e$ is always to be interpreted as the vector of all ones of the appropriate dimension.
The problem $\mathcal{P}$ is a convex programming problem, since the objective function $\Phi$ is convex and the constraint region is a convex set. The Fenchel–Rockafellar dual of the convex program $\mathcal{P}$ is the problem
$$\mathcal{D}: \qquad \min \Phi(\omega) \quad \text{s.t.} \quad \Psi^T\omega \le m e.$$
From Boyd and Vandenberghe [9], we obtain the following Karush–Kuhn–Tucker (KKT) equations relating the solutions of the problems $\mathcal{P}$ and $\mathcal{D}$:
$$m e = \Psi^T w + y$$
$$e = W \Psi \lambda$$
$$0 = \Lambda Y e$$
where, for any vector $x$, we define $X$ to be the diagonal matrix having $x$ along the diagonal.

Appendix A.2. An Interior-Point Path-Following Algorithm

The relaxed KKT system is given by
$$m e = \Psi^T w + y$$
$$e = W \Psi \lambda$$
$$\mu e = \Lambda Y e$$
$$0 \le \lambda, \quad 0 \le w, \quad 0 \le y,$$
for $\mu > 0$ ($\mu$ is the relaxation parameter). A damped Newton's method is used to solve the above system.
Consider the function $F: \mathbb{R}^{2m+n} \to \mathbb{R}^{2m+n}$ given by
$$F(\lambda, w, y) = \begin{pmatrix} \Psi^T w + y \\ W \Psi \lambda \\ \Lambda Y e \end{pmatrix}.$$
A triple $(\lambda, w, y)$ solves the relaxed KKT system if and only if
$$F(\lambda, w, y) = \begin{pmatrix} m e \\ e \\ \mu e \end{pmatrix}$$
and $0 \le \lambda$, $0 \le \omega$, and $0 \le y$. Path-following algorithms attempt to solve this system by applying Newton's method for progressively smaller values of the relaxation parameter $\mu$. We first need the derivative of $F$. It follows that
$$F'(\lambda, \omega, y) = \begin{bmatrix} 0 & \Psi^T & I \\ W\Psi & Z & 0 \\ Y & 0 & \Lambda \end{bmatrix}$$
where $z = \Psi\lambda$.
At the $k$th iteration of the algorithm, the Newton step is given by the solution of the nonsingular linear system
$$F(\lambda^k, w^k, y^k) + F'(\lambda^k, w^k, y^k) \begin{pmatrix} \Delta\lambda^k \\ \Delta w^k \\ \Delta y^k \end{pmatrix} = \begin{pmatrix} m e_m \\ e_n \\ \mu^k e_m \end{pmatrix}$$
where $y$ is constrained to satisfy the first KKT condition $y^k = m e_m - \Psi^T w^k$.
The above set of equations can be reduced by standard techniques. It follows that
$$\Delta w = H^{-1} r_2, \qquad \Delta y = -\Psi^T \Delta\omega, \qquad \Delta\lambda = r_1 - \lambda - D_1 \Delta y,$$
where $H = D_2 + \Psi D_1 \Psi^T$, $D_2 = Z W^{-1}$, $D_1 = \Lambda Y^{-1}$, $r_1 = \mu Y^{-1} e$, and $r_2 = W^{-1} e - \Psi r_1$, and where the superscript $k$ is suppressed for simplicity.
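The reduced Newton step can be sketched in a few lines. Note one assumption on our part: we take the constant on the right-hand side of the first KKT condition to be the number of rows of $\Psi$ (the number of subjects), which is what makes $e^T\lambda = 1$ hold at the optimum. Diagonal matrices are stored as vectors:

```python
import numpy as np

def pdip_newton_step(Psi, lam, w, mu):
    """One reduced Newton step of the PDIP iteration. Psi is the n x m
    likelihood matrix; lam, w are the current primal and dual iterates;
    mu is the relaxation parameter. Sketch only: the dual feasibility
    constant is taken as n, the number of rows of Psi (our assumption)."""
    n, m = Psi.shape
    y = n * np.ones(m) - Psi.T @ w      # slack from the first KKT condition
    z = Psi @ lam                       # z = Psi * lambda
    D1 = lam / y                        # Lambda Y^{-1}
    D2 = z / w                          # Z W^{-1}
    r1 = mu / y                         # mu Y^{-1} e
    r2 = 1.0 / w - Psi @ r1             # W^{-1} e - Psi r1
    H = np.diag(D2) + Psi @ (D1[:, None] * Psi.T)   # D2 + Psi D1 Psi^T
    dw = np.linalg.solve(H, r2)
    dy = -Psi.T @ dw
    dlam = r1 - lam - D1 * dy
    return dlam, dw, dy
```

By construction, the step returned satisfies the full linearized relaxed-KKT system; it is then damped as described in Appendix A.3.2.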

Appendix A.3. The Algorithm

To describe the algorithm, we need to define the quantities
$$q = \frac{1}{m} \sum_{i=1}^{m} \lambda_i y_i, \qquad \rho = \left\| e - W Z e \right\|, \qquad \gamma = \frac{\left| \Phi(\omega) + \Phi(\Psi\lambda) \right|}{1 + \left| \Phi(\Psi\lambda) \right|},$$
where $\gamma$ is the scaled duality gap.

Appendix A.3.1. Initialization

Initially choose $\lambda^0 = e_m / m$, $w^0 = e_n / (\Psi \lambda^0)$, and $y^0 = m e_m - \Psi^T w^0$ (division of two vectors is performed component-wise). Set $\varepsilon = 10^{-8}$.

Appendix A.3.2. Iteration

At iteration k + 1 , set
$$\mu^{k+1} = \sigma^k q^k$$
where the reduction factor σ is defined by
$$\sigma = \begin{cases} 1, & \text{if } \mu \le \varepsilon \text{ and } \rho > \varepsilon, \\ \min\!\left( 0.3, \; (1 - \delta_1)^2, \; (1 - \delta_2)^2, \; \dfrac{|\rho - \mu|}{\rho + 100\,\tau} \right), & \text{otherwise.} \end{cases}$$
The next iterates are given by λ k + 1 = λ k + δ 1 [ Δ λ k ] , ω k + 1 = ω k + δ 2 [ Δ ω k ] and y k + 1 = y k + δ 2 [ Δ y k ] , where the “damping” factors δ 1 and δ 2 are defined by
$$\delta_{1,0} = -\left[ \min\!\left( \min(\Lambda^{-1}\Delta\lambda), \; -\tfrac{1}{2} \right) \right]^{-1}, \qquad \delta_{2,0} = -\left[ \min\!\left( \min(Y^{-1}\Delta y), \; \min(W^{-1}\Delta w), \; -\tfrac{1}{2} \right) \right]^{-1},$$
$$\delta_1 = \min(1, \; 0.99995\,\delta_{1,0}), \qquad \delta_2 = \min(1, \; 0.99995\,\delta_{2,0}).$$

Appendix A.3.3. Exit Conditions

Iterate Equations (A11)–(A13) until $\mu \le \varepsilon$, $\rho \le \varepsilon$, and $\gamma \le \varepsilon$. If these conditions are not satisfied after a set number of iterations, then report "PDIP did not converge in the given number of iterations".
Figure A1. Model.
Figure A2. The 300 simulated observed profiles, frame a, are generated in Pmetrics using the function SIMrun(). The 0.05, 0.25, 0.5, 0.75, and 0.95 quantiles of the profiles are plotted in frame b, with the 95% CI in grey. The observation times are marked by vertical dotted lines in frame b. Other details are in the text. Of note: inspection of these profiles does not suggest a bimodal elimination parameter.
Figure A3. Kernel densities of simulated PK parameters. All K are in Hr$^{-1}$; volume is in dL. The kernel density (dashed line) of the distribution is over-plotted on the histogram (grey), which is normalized to the total observations. Simulated support points are generated in Pmetrics using the function SIMrun(). Details are in the text. Kernel densities are calculated in R using the (S3) generic function density().
Figure A4. Kernel densities of NPAG-estimated PK parameters (black line) vs. simulated r.v.s (dashed with grey fill). K are in Hr$^{-1}$; volume is in dL. NPAG estimation is calculated using the NPrun() function in Pmetrics. Similarity of these two distributions is verified in R using the function mtsknn.eq(). See text for further detail.
Figure A5. Predicted vs. Observed. Population fit: $r^2 = 0.569$, intercept = 3.8 (95% CI −5.85, 13.5), slope = 1.12 (95% CI 1.08, 1.15), bias = 6.45, imprecision = 366. Individual Bayesian posterior fit: $r^2 = 0.999$, intercept = 0.168 (95% CI −0.267, 0.603), slope = 0.998 (95% CI 0.997, 1), bias = 0.0612, imprecision = 1.15.
Table A1. Simulation versus optimization. Row 1: True simulated means for each parameter. Row 2: NPAG estimates of corresponding means.
           Ka          Vc          Ke          Kcp         Kpc
μ_SIM      0.7948080   1.1982732   1.3162403   1.9700861   0.2028168
μ_NPAG     0.8003521   1.2054767   1.3108326   1.9780514   0.2059419
Table A2. Parameter Covariance Matrices.
Simulation   Ka             V              Ke             Kcp            Kpc
Ka           0.0215919795
V            0.0011916465   0.0468421011
Ke          −0.0015398993  −0.0005839048   0.1563336061
Kcp          0.0009717487  −0.0032613149  −0.0019679155   0.1391046100
Kpc         −0.0000572936   0.0004080801   0.0007402169   0.0004695783   0.0013055173

NPAG         Ka             V              Ke             Kcp            Kpc
Ka           0.0241232043
V            0.0038827461   0.0537290654
Ke          −0.0009243608  −0.0075706783   0.1686441632
Kcp         −0.0017343788  −0.0083046624  −0.0022665464   0.1617627081
Kpc          0.0006769795   0.0005601072   0.0027358931   0.0007782649   0.0021142178

References

  1. Sheiner, L.; Beal, S. Evaluation of Methods for Estimating Population Pharmacokinetic Parameters. I. Biexponential Model and Experimental Pharmacokinetic Data. J. Pharmacokinet. Biopharm. 1980, 8, 553–571. [Google Scholar] [CrossRef]
  2. Sheiner, L.; Beal, S. Evaluation of Methods for Estimating Population Pharmacokinetic Parameters. II. Michaelis–Menten Model: Routine Clinical Pharmacokinetic Data. J. Pharmacokinet. Biopharm. 1981, 9, 635–651. [Google Scholar] [CrossRef]
  3. Bauer, R.; Guzy, S.; Ng, C. A Survey of Population Analysis Methods and Software for Complex Pharmacokinetic and Pharmacodynamic Models with Examples. AAPS J. 2007, 9, E60–E83. [Google Scholar] [CrossRef] [Green Version]
  4. Kiefer, J.; Wolfowitz, J. Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely many Incidental Parameters. Ann. Math. Statist. 1956, 27, 887–906. [Google Scholar] [CrossRef]
  5. Jelliffe, R.; Bayard, D.; Milman, M.; Guilder, M.V.; Schumitzky, A. Achieving Target Goals most Precisely using Nonparametric Compartmental Models and ‘Multiple Model’ Design of Dosage Regimens. Ther. Drug Monit. 2000, 22, 346–353. [Google Scholar] [CrossRef]
  6. Wang, X.; Wang, Y. Nonparametric multivariate density estimation using mixtures. Stat. Comput. 2015, 25, 33–43. [Google Scholar] [CrossRef]
  7. Lindsay, B.G. The Geometry of Mixture Likelihoods: A general theory. Ann. Statist. 1983, 11, 86–94. [Google Scholar] [CrossRef]
  8. Mallet, A. A Maximum Likelihood Estimation Method for Random Coefficient Regression Models. Biometrika 1986, 73, 645–656. [Google Scholar] [CrossRef]
  9. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  10. Bell, B.; Non-Parametric Population Analysis, Seattle, WA, USA. Personal communication, 2012.
  11. Baek, Y. An Interior Point Approach to Constrained Nonparametric Mixture Models. Ph.D. Thesis, Department of Mathematics, University of Washington, Seattle, WA, USA, 2006. [Google Scholar]
  12. Yamada, W.; Bartroff, J.; Bayard, D.; Burke, J.; van Guilder, M.; Jelliffe, R.; Leary, R.; Neely, M.; Kryshchenko, A.; Schumitzky, A. The Nonparametric Adaptive Grid Algorithm for Population Pharmacokinetic Modeling; Technical Report TR-2014-1; Children’s Hospital Los Angeles: Los Angeles, CA, USA, 2014. [Google Scholar]
  13. Fedorov, V.V. Theory of Optimal Experiments; Studden, W.J., Klimko, E.M., Eds.; Academic Press: New York, NY, USA, 1972. [Google Scholar]
  14. Schumitzky, A. Nonparametric EM Algorithms For Estimating Prior Distributions. Appl. Math. Comput. 1991, 45, 143–157. [Google Scholar] [CrossRef]
  15. Lesperance, M.L.; Kalbfleisch, J.D. An algorithm for computing the nonparametric MLE of a mixing distribution. J. Am. Stat. Assoc. 1992, 87, 120–126. [Google Scholar] [CrossRef]
  16. Pilla, R.S.; Bartolucci, F.; Lindsay, B.G. Model building for semiparametric mixtures. arXiv 2006, arXiv:math/0606077. [Google Scholar]
  17. Savic, R.M.; Kjellsson, M.C.; Karlsson, M.O. Evaluation of the nonparametric estimation method in NONMEM VI. Eur. J. Pharm. Sci. 2009, 37, 27–35. [Google Scholar] [CrossRef]
  18. Savic, R.M.; Karlsson, M.O. Evaluation of an extended grid method for estimation using nonparametric distributions. AAPS J. 2009, 11, 615–627. [Google Scholar] [CrossRef] [Green Version]
  19. Leary, R. An overview of nonparametric estimation methods used in population analysis. In Abstracts of the Annual Meeting of the Population Approach Group in Europe 2017; Number Abstract 7383; Population Analysis Group Europe (PAGE); p. 26. Available online: https://www.page-meeting.org (accessed on 12 September 2020).
  20. Leary, R.; Jelliffe, R.; Schumitzky, A.; Guilder, M.V. An Adaptive Grid Non-Parametric Approach to Pharmacokinetic and Dynamic (PK/PD) Population Models. In Proceedings of the 14th IEEE Symposium on Computer-Based Medical Systems (CBMS’01), Bethesda, MD, USA, 26–27 March 2001; p. 0389. [Google Scholar] [CrossRef]
  21. Tatarinova, T.; Schumitzky, A. Nonlinear Mixture Models: A Bayesian Approach; Imperial College Press: London, UK, 2015. [Google Scholar]
  22. Jordan-Squire, C. Convex Optimization over Probability Measures. Ph.D. Thesis, Department of Mathematics, University of Washington, Washington, DC, USA, 2015. [Google Scholar]
  23. Neely, M.; Philippe, M.; Rushing, T.; Fu, X.; van Guilder, M.; Bayard, D.; Schumitzky, A.; Bleyzac, N.; Goutelle, S. Accurately Achieving Target Busulfan Exposure in Children and Adolescents With Very Limited Sampling and the BestDose Software. Ther. Drug Monit. 2016, 38, 332–342. [Google Scholar] [CrossRef] [Green Version]
  24. Neely, M.; van Guilder, M.; Yamada, W.; Schumitzky, A.; Jelliffe, R. Accurate Detection of Outliers and Subpopulations with Pmetrics: A non-parametric and parametric pharmacometric package for R. Ther. Drug Monit. 2012, 34, 467–476. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Faure, H. Discrépance de suites associées á un système de numération (en dimension s). Acta Arith. 1982, 41, 337–351. [Google Scholar] [CrossRef]
  26. Bratley, P.; Fox, B.L. Algorithm 659: Implementing Sobol’s Quasirandom Sequence Generator. ACM Trans. Math. Softw. 1988, 14, 88–100. [Google Scholar] [CrossRef]
  27. Fox, B.L. Algorithm 647: Implementation and Relative Efficiency of Quasirandom Sequence Generators. ACM Trans. Math. Softw. 1986, 12, 362–376. [Google Scholar] [CrossRef]
  28. Davidian, M.; Giltinan, D.M. Nonlinear Models for Repeated Measurement Data; Chapman and Hall/CRC Press: Boca Raton, FL, USA, 1995. [Google Scholar]
  29. Davidian, M.; Giltinan, D.M. Nonlinear Models for Repeated Measurement Data: An overview and update. J. Agric. Biol. Environ. Stat. 2003, 8, 387–419. [Google Scholar] [CrossRef]
  30. Ramos-Martin, V.; Johnson, A.; Livermore, J.; McEntee, L.; Goodwin, J.; Whalley, F.; Docobo-Perez, F.; Felton, T.W.; Zhao, W.; Jacqz-Aigrain, E.; et al. Pharmacodynamics of vancomycin for CoNS infection: Experimental basis for optimal use of vancomycin in neonates. J. Antimicrob. Chemother. 2016, 71, 992–1002. [Google Scholar] [CrossRef] [Green Version]
  31. Drusano, G.; Neely, M.; van Guilder, M.; Schumitzky, A.; Brown, D.; Fikes, S.; Peloquin, C.; Louie, A. Analysis of combination drug therapy to develop regimens with shortened duration treatment for tuberculosis. PLoS ONE 2014, 9, e101311. [Google Scholar] [CrossRef]
  32. Godfrey, K.R. The identifiability of parametric models used in biomedicine. Math. Model. 1986, 7, 1195–1214. [Google Scholar] [CrossRef] [Green Version]
  33. Åsberg, A.; Midtvedt, K.; van Guilder, M.; Størset, E.; Bremer, S.; Bergan, S.; Jelliffe, R.; Hartmann, A.; Neely, M. Inclusion of CYP3A5 genotyping in a nonparametric population model improves dosing of tacrolimus early after transplantation. Transpl. Int. Off. J. Eur. Soc. Organ Transplant. 2013, 26, 1198–1207. [Google Scholar] [CrossRef]
  34. Størset, E.; Åsberg, A.; Skauby, M.; Neely, M.; Bergan, S.; Bremer, S.; Midtvedt, K. Improved Tacrolimus Target Concentration Achievement Using Computerized Dosing in Renal Transplant Recipients–A Prospective, Randomized Study. Transplantation 2015, 99, 2158–2166. [Google Scholar] [CrossRef] [Green Version]
  35. Bayard, D.S.; Neely, M. Experiment design for nonparametric models based on minimizing Bayes Risk: Application to voriconazole. J. Pharmacokinet. Pharmacodyn. 2017, 44, 95–111. [Google Scholar] [CrossRef]
  36. Koenker, R.; Mizera, I. Convex optimization, shape constraints, compound decisions, and empirical Bayes rules. J. Am. Stat. Assoc. 2014, 109, 674–685. [Google Scholar] [CrossRef]
  37. Banks, H.T.; Kenz, Z.R.; Thompson, W.C. A Review of Selected Techniques in Inverse Problem Nonparametric Probability Distribution Estimation. J. Inverse Ill-Posed Probl. 2012, 20, 429–460. [Google Scholar] [CrossRef]
  38. Bulitta, J.B.; Jiao, Y.; Landersdorfer, C.B.; Sutaria, D.S.; Tao, X.; Shin, E.; Höhl, R.; Holzgrabe, U.; Stephan, U.; Sorgel, F. Comparable Bioavailability and Disposition of Pefloxacin in Patients with Cystic Fibrosis and Healthy Volunteers Assessed via Population Pharmacokinetics. Pharmaceutics 2019, 11, 323. [Google Scholar] [CrossRef] [Green Version]
  39. Shah, N.R.; Bulitta, J.B.; Kinzig, M.; Landersdorfer, C.B.; Jiao, Y.; Sutaria, D.S.; Tao, X.; Höhl, R.; Holzgrabe, U.; Kees, F.; et al. Novel population pharmacokinetic approach to explain the differences between cystic fibrosis patients and healthy volunteers via protein binding. Pharmaceutics 2019, 11, 286. [Google Scholar] [CrossRef] [Green Version]
  40. Ishihara, N.; Nishimura, N.; Ikawa, K.; Karino, F.; Miura, K.; Tamaki, H.; Yano, T.; Isobe, T.; Morikawa, N.; Naora, K. Population Pharmacokinetic Modeling and Pharmacodynamic Target Attainment Simulation of PiperacillinTazobactam for Dosing Optimization in Late Elderly Patients with Pneumoni. Antibiotics 2020, 9, 113. [Google Scholar] [CrossRef] [Green Version]
  41. Soraluce, A.; Barrasa, H.; Asín-Prieto, E.; Sánchez-Izquierdo, J.Á.; Maynar, J.; Isla, A.; Rodríguez-Gascón, A. Novel Population Pharmacokinetic Model for Linezolid in Critically Ill Patients and Evaluation of the Adequacy of the Current Dosing Recommendation. Pharmaceutics 2020, 12, 54. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  42. Allard, Q.; Djerada, Z.; Pouplard, C.; Repessé, Y.; Desprez, D.; Galinat, H.; Frotscher, B.; Berger, C.; Harroche, A.; Ryman, A.; et al. Real Life Population Pharmacokinetics Modelling of Eight Factors VIII in Patients with Severe Haemophilia A: Is It Always Relevant to Switch to an Extended Half-Life? Pharmaceutics 2020, 12, 380. [Google Scholar] [CrossRef] [PubMed]
  43. Bustad, A.; Terziivanov, D.; Leary, R.; Port, R.; Schumitzky, A.; Jelliffe, R. Parametric and Nonparametric Population Methods: Their comparative performance in analysing a clinical dataset and two Monte Carlo simulation studies. Clin. Pharmacokinet. 2006, 45, 365–383. [Google Scholar] [CrossRef] [PubMed]
  44. Prémaud, A.; Weber, L.T.; Tönshoff, B.; Armstrong, V.; Oellerich, M.; Urien, S.; Marquet, P.; Rousseau, A. Population pharmacokinetics of mycophenolic acid in pediatric renal transplant patients using parametric and nonparametric approaches. Pharmacol. Res. 2011, 63, 216–224. [Google Scholar] [CrossRef] [Green Version]
  45. de Velde, F.; de Winter, B.C.M.; Neely, M.N.; Yamada, W.M.; Koch, B.C.P.; Harbarth, S.; von Dach, E.; van Gelder, T.; Huttner, A.; Mouton, J.W. Population Pharmacokinetics of Imipenem in Critically Ill Patients: A Parametric and Nonparametric Model Converge on CKD-EPI Estimated Glomerular Filtration Rate as an Impactful Covariate. Clin. Pharmacokinet. 2020, 59, 885–898. [Google Scholar] [CrossRef] [Green Version]