# A Maximum Entropy Procedure to Solve Likelihood Equations


## Abstract


## 1. Introduction

## 2. A Maximum Entropy Solution to Score Equations

#### 2.1. Example 1: The Normal Case

#### 2.2. Example 2: The Poisson Case

The ME-score problem was solved via the `constrOptim.nl` routine of the `R` package `alabama` [32]. The algorithm converged successfully in a few iterations. The recovered probabilities are as follows:
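To make the procedure concrete, here is a minimal Python sketch of the ME-score idea for the Poisson case: the parameter is reparameterized as $\lambda = \mathbf{z}^{\top}\mathbf{p}$ over a support grid $\mathbf{z}$, and the Shannon entropy of $\mathbf{p}$ is maximized subject to the score equation and the adding-up constraint. The sample, the support grid, and SciPy's SLSQP solver are illustrative stand-ins for the `constrOptim.nl` setup used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
y = rng.poisson(lam=4.0, size=50)     # hypothetical Poisson sample
z = np.linspace(0.1, 10.0, 20)        # support points z for lambda = z'p

def neg_entropy(p):
    # negative Shannon entropy (minimized, so entropy is maximized)
    return np.sum(p * np.log(p + 1e-12))

def score(p):
    # Poisson score equation U(lambda) = sum_i (y_i / lambda - 1) at lambda = z'p
    lam = z @ p
    return np.sum(y / lam - 1.0)

cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0},  # adding-up constraint
        {"type": "eq", "fun": score}]                    # score constraint
p0 = np.full(z.size, 1.0 / z.size)                       # uniform start
res = minimize(neg_entropy, p0, bounds=[(0.0, 1.0)] * z.size,
               constraints=cons, method="SLSQP")
lam_hat = z @ res.x
# solving the score equation this way recovers the ML estimate (the sample mean)
print(lam_hat, y.mean())
```

The estimated probability vector `res.x` plays the role of $\widehat{\mathbf{p}}$ in the paper's notation.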

#### 2.3. Example 3: The Gamma Case

#### 2.4. Example 4: Logistic Regression

The NRF solution was computed with the `R` package `logistf` [37]. As expected, the NR algorithm fails to converge, reporting divergent estimates. By contrast, the NRF procedure converges to finite solutions. Interestingly, the maximum entropy solutions are closer to the NRF estimates, although they differ in magnitude.
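Why plain NR diverges under separation can be reproduced with a small self-contained sketch: on completely separated data the log-likelihood is monotone in the slope, so each Newton step pushes the estimate further out. The toy data below are hypothetical (the example in the text uses real data in R).

```python
import numpy as np

# toy completely separated data: y = 1 exactly when x > 0
X = np.column_stack([np.ones(6), np.array([-3., -2., -1., 1., 2., 3.])])
y = np.array([0., 0., 0., 1., 1., 1.])

beta = np.zeros(2)
norms = []
for _ in range(25):                                   # plain Newton-Raphson (IRLS)
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))            # fitted probabilities
    W = mu * (1.0 - mu)                               # IRLS weights
    # Newton step: beta += (X'WX)^{-1} X'(y - mu)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
    norms.append(np.linalg.norm(beta))

# the coefficient norm keeps growing instead of settling: divergent estimates
print(norms[0], norms[-1])
```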

## 3. Simulation Study

- (i) the sample size n at three levels: 15, 50, 200;
- (ii) the number of predictors p (excluding the intercept) at three levels: 1, 5, 10.

- Generate the matrix of predictors ${\mathbf{X}}_{{n}_{k}\times (1+{p}_{k})}=[{\mathbf{1}}_{{n}_{k}}|{\tilde{\mathbf{X}}}_{{n}_{k}\times {p}_{k}}]$, where ${\tilde{\mathbf{X}}}_{{n}_{k}\times {p}_{k}}$ is drawn from the multivariate standard normal distribution $\mathrm{N}({\mathbf{0}}_{{p}_{k}},{\mathbf{I}}_{{p}_{k}})$, whereas the column vector of all ones $\mathbf{1}$ stands for the intercept term;
- Generate the vector of parameters ${\mathit{\beta}}_{1+{p}_{k}}$ from the multivariate centered normal distribution $\mathrm{N}({\mathbf{0}}_{1+{p}_{k}},\sigma {\mathbf{I}}_{1+{p}_{k}})$, where $\sigma =2.5$ was chosen to cover the natural range of variability allowed by the logistic equation;
- Compute the vector ${\mathit{\pi}}_{{n}_{k}}$ via Equation (10) using ${\mathbf{X}}_{{n}_{k}\times (1+{p}_{k})}$ and ${\mathit{\beta}}_{1+{p}_{k}}$;
- For $q=1,\dots ,Q$, generate the vectors of response variables ${\mathbf{y}}_{{n}_{k}}^{\left(q\right)}$ from the binomial distribution $\mathrm{Bin}\left({\mathit{\pi}}_{{n}_{k}}\right)$, with ${\mathit{\pi}}_{{n}_{k}}$ being fixed;
- For $q=1,\dots ,Q$, estimate the vectors of parameters ${\widehat{\mathit{\beta}}}_{1+{p}_{k}}^{\left(q\right)}$ by means of Newton–Raphson (NR), bias-corrected Newton–Raphson (NRF), and ME-score (ME) algorithms.
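The data-generation steps above can be sketched as follows; the estimation step is omitted, `Q` is kept small for illustration, and $\sigma = 2.5$ is taken as the standard deviation of the coefficient distribution (an assumption about the parameterization in the text).

```python
import numpy as np

rng = np.random.default_rng(2019)

def simulate_condition(n, p, Q=5, sigma=2.5):
    """Generate Q logistic-regression datasets for one (n, p) design cell."""
    X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])  # [1 | X~]
    beta = rng.normal(0.0, sigma, size=1 + p)       # coefficients ~ N(0, sigma I)
    pi = 1.0 / (1.0 + np.exp(-X @ beta))            # logistic equation
    ys = [rng.binomial(1, pi) for _ in range(Q)]    # Q binary response vectors
    return X, beta, ys

X, beta, ys = simulate_condition(n=50, p=5)
print(X.shape, len(ys))
```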

The NR and NRF solutions were computed with the `glm` and `logistf` routines of the `R` packages `stats` [38] and `logistf` [37]. By contrast, the ME-score problem was solved via the augmented Lagrangian adaptive barrier algorithm implemented in the `constrOptim.nl` routine of the `R` package `alabama` [32]. Convergence of the algorithms was checked using the built-in criteria of `glm`, `logistf`, and `constrOptim.nl`. For each of the generated datasets ${\{\mathbf{y},\mathbf{X}\}}_{q=1,\dots ,Q}$, the occurrence of separation was checked using a linear programming-based routine that detects infinite estimates in the maximum likelihood solution [39,40]. The whole simulation procedure was performed on a (remote) HPC machine with 16 Intel Xeon E5-2630L v3 cores (1.80 GHz) and 16 × 4 GB of RAM.
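The separation check cited above [39,40] rests on a linear program: the data admit infinite ML estimates exactly when some direction $\mathbf{b}$ satisfies $(2y_i-1)\,\mathbf{x}_i^{\top}\mathbf{b} \ge 0$ for all $i$ with at least one strict inequality. A hypothetical re-implementation of that feasibility test, with SciPy's `linprog` standing in for the R routine:

```python
import numpy as np
from scipy.optimize import linprog

def is_separated(X, y, tol=1e-8):
    """LP check for (quasi-)separation in binary logistic regression."""
    A = (2 * y - 1)[:, None] * X
    # maximize 1'Ab subject to Ab >= 0; the box -1 <= b <= 1 keeps the LP bounded
    res = linprog(c=-A.sum(axis=0), A_ub=-A, b_ub=np.zeros(len(y)),
                  bounds=[(-1, 1)] * X.shape[1], method="highs")
    # a strictly positive optimum means a separating direction exists
    return res.fun < -tol

X = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])
y_sep = np.array([0, 0, 1, 1])    # completely separated at x = 0
y_over = np.array([0, 1, 0, 1])   # overlapping responses: finite MLE
print(is_separated(X, y_sep), is_separated(X, y_over))
```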

- In the simplest cases with no separation (i.e., $N=50\wedge P=1$, $N=200\wedge P=1$, $N=200\wedge P=5$), the ME solutions to the maximum likelihood equations coincided with those provided by standard Newton–Raphson (NR) and the bias-corrected version (NRF). In all these cases, the bias of the estimates approximated zero (see Table 5);
- In the cases of separation, ME performed comparably to NRF, which is known to provide the most efficient estimates for the logistic model under separation: bias and MSE decreased as functions of sample size and number of predictors, with MSE lower for ME than for NRF in the $N=200\wedge P=5$ and $N=200\wedge P=10$ conditions;
- In the most complex scenarios, with a large sample and higher model complexity ($N=200\wedge P=5$, $N=200\wedge P=10$), the ME and NRF algorithms showed similar relative bias, with ME estimates being less variable than NRF in the $N=200\wedge P=10$ condition. The ME algorithm tended to over-estimate the population parameters, whereas NRF tended to under-estimate them.

## 4. Discussion and Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Cox, D.R. Principles of Statistical Inference; Cambridge University Press: Cambridge, UK, 2006. [Google Scholar]
- Stigler, S.M. The epic story of maximum likelihood. Stat. Sci.
**2007**, 22, 598–620. [Google Scholar] [CrossRef] - Tanner, M.A. Tools for Statistical Inference; Springer: Berlin, Germany, 2012. [Google Scholar]
- Commenges, D.; Jacqmin-Gadda, H.; Proust, C.; Guedj, J. A newton-like algorithm for likelihood maximization: The robust-variance scoring algorithm. arXiv
**2006**, arXiv:math/0610402. [Google Scholar] - Albert, A.; Anderson, J.A. On the existence of maximum likelihood estimates in logistic regression models. Biometrika
**1984**, 71, 1–10. [Google Scholar] [CrossRef] - Shen, J.; Gao, S. A solution to separation and multicollinearity in multiple logistic regression. J. Data Sci.
**2008**, 6, 515. [Google Scholar] [PubMed] - Firth, D. Bias reduction of maximum likelihood estimates. Biometrika
**1993**, 80, 27–38. [Google Scholar] [CrossRef] - Kenne Pagui, E.; Salvan, A.; Sartori, N. Median bias reduction of maximum likelihood estimates. Biometrika
**2017**, 104, 923–938. [Google Scholar] [CrossRef] - Gao, S.; Shen, J. Asymptotic properties of a double penalized maximum likelihood estimator in logistic regression. Stat. Probabil. Lett.
**2007**, 77, 925–930. [Google Scholar] [CrossRef] - Abbasbandy, S.; Tan, Y.; Liao, S. Newton-homotopy analysis method for nonlinear equations. Appl. Math. Comput.
**2007**, 188, 1794–1800. [Google Scholar] [CrossRef] - Cordeiro, G.M.; McCullagh, P. Bias correction in generalized linear models. J. R. Stat. Soc. Ser. B Methodol.
**1991**, 53, 629–643. [Google Scholar] [CrossRef] - Wu, T.M. A study of convergence on the Newton-homotopy continuation method. Appl. Math. Comput.
**2005**, 168, 1169–1174. [Google Scholar] [CrossRef] - Golan, A. Foundations of Info-Metrics: Modeling and Inference with Imperfect Information; Oxford University Press: Oxford, UK, 2017. [Google Scholar]
- Golan, A.; Judge, G.; Robinson, S. Recovering information from incomplete or partial multisectoral economic data. Rev. Econ. Stat.
**1994**, 76, 541–549. [Google Scholar] [CrossRef] - Golan, A.; Judge, G.; Karp, L. A maximum entropy approach to estimation and inference in dynamic models or counting fish in the sea using maximum entropy. J. Econ. Dyn. Control
**1996**, 20, 559–582. [Google Scholar] [CrossRef] - Golan, A.; Judge, G.; Perloff, J.M. A maximum entropy approach to recovering information from multinomial response data. J. Am. Stat. Assoc.
**1996**, 91, 841–853. [Google Scholar] [CrossRef] - Marsh, T.L.; Mittelhammer, R.C. Generalized maximum entropy estimation of a first order spatial autoregressive model. In Spatial and Spatiotemporal Econometrics; Emerald Group Publishing Limited: Bingley, UK, 2004; pp. 199–234. [Google Scholar]
- Ciavolino, E.; Al-Nasser, A.D. Comparing generalised maximum entropy and partial least squares methods for structural equation models. J. Nonparametric Stat.
**2009**, 21, 1017–1036. [Google Scholar] [CrossRef] - Banerjee, A.; Dhillon, I.; Ghosh, J.; Merugu, S.; Modha, D.S. A generalized maximum entropy approach to bregman co-clustering and matrix approximation. J. Mach. Learn. Res.
**2007**, 8, 1919–1986. [Google Scholar] - Ciavolino, E.; Calcagnì, A. A Generalized Maximum Entropy (GME) estimation approach to fuzzy regression model. Appl. Soft Comput.
**2016**, 38, 51–63. [Google Scholar] [CrossRef] - Kapur, J.N. Maximum-Entropy Models in Science and Engineering; John Wiley & Sons: Hoboken, NJ, USA, 1989. [Google Scholar]
- Fang, S.C.; Rajasekera, J.R.; Tsao, H.S.J. Entropy Optimization and Mathematical Programming; Springer Science & Business Media: Berlin, Germany, 2012; Volume 8. [Google Scholar]
- El-Wakil, S.; Elhanbaly, A.; Abdou, M. Maximum entropy method for solving the collisional Vlasov equation. Phys. A Stat. Mech. Appl.
**2003**, 323, 213–228. [Google Scholar] [CrossRef] - Bryan, R. Maximum entropy analysis of oversampled data problems. Eur. Biophys. J.
**1990**, 18, 165–174. [Google Scholar] [CrossRef] - Calcagnì, A.; Lombardi, L.; Sulpizio, S. Analyzing spatial data from mouse tracker methodology: An entropic approach. Behav. Res. Methods
**2017**, 49, 2012–2030. [Google Scholar] [CrossRef] - Sukumar, N. Construction of polygonal interpolants: a maximum entropy approach. Int. J. Numer. Methods Eng.
**2004**, 61, 2159–2181. [Google Scholar] [CrossRef] - Golan, A.; Judge, G.G.; Miller, D. Maximum Entropy Econometrics: Robust Estimation with Limited Data; Wiley: New York, NY, USA, 1996. [Google Scholar]
- Golan, A.; Judge, G.; Miller, D. The maximum entropy approach to estimation and inference. In Applying Maximum Entropy to Econometric Problems; Emerald Group Publishing Limited: Bingley, UK, 1997; pp. 3–24. [Google Scholar]
- Papalia, R.B. A composite generalized cross-entropy formulation in small samples estimation. Econom. Rev.
**2008**, 27, 596–609. [Google Scholar] [CrossRef] - Ciavolino, E.; Calcagnì, A. A generalized maximum entropy (GME) approach for crisp-input/fuzzy-output regression model. Qual. Quant.
**2014**, 48, 3401–3414. [Google Scholar] [CrossRef] - Golan, A. Maximum entropy, likelihood and uncertainty: A comparison. In Maximum Entropy and Bayesian Methods; Springer: Berlin, Germany, 1998; pp. 35–56. [Google Scholar]
- Varadhan, R. Alabama: Constrained Nonlinear Optimization, R package version 2015.3-1; R Core Team: Vienna, Austria, 2015. [Google Scholar]
- Choi, S.C.; Wette, R. Maximum likelihood estimation of the parameters of the gamma distribution and their bias. Technometrics
**1969**, 11, 683–690. [Google Scholar] [CrossRef] - Pregibon, D. Logistic regression diagnostics. Ann. Stat.
**1981**, 9, 705–724. [Google Scholar] [CrossRef] - Lesaffre, E.; Albert, A. Partial separation in logistic discrimination. J. R. Stat. Soc. Ser. B
**1989**, 51, 109–116. [Google Scholar] [CrossRef] - Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen.
**1936**, 7, 179–188. [Google Scholar] [CrossRef] - Heinze, G.; Ploner, M. Logistf: Firth’s Bias-Reduced Logistic Regression, R package version 1.23; R Core Team: Vienna, Austria, 2018. [Google Scholar]
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019. [Google Scholar]
- Konis, K. Linear Programming Algorithms for Detecting Separated Data in Binary Logistic Regression Models. Ph.D. Thesis, Department of Statistics, University of Oxford, Oxford, UK, 2007. Available online: https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a (accessed on 14 June 2019).
- Konis, K. SafeBinaryRegression: Safe Binary Regression, R package version 0.1-3; R Core Team: Vienna, Austria, 2013. [Google Scholar]
- Brown, L.D. Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory; Lecture Notes-Monograph Series; Institute of Mathematical Statistics: Hayworth, CA, USA, 1986; Volume 9. [Google Scholar]

**Figure 1.** Simulation study: averaged bias, squared averaged bias, and mean squared error (MSE) for the Newton–Raphson (NR), bias-corrected Newton–Raphson (NRF), and maximum entropy (ME) algorithms. Note that the number of predictors p is represented column-wise (outside), whereas the sample size n is reported on the x-axis (inside). The measures are plotted on a logarithmic scale.

**Figure 2.** Simulation study: relative bias for the NRF and ME algorithms in the conditions $N=200\wedge P=5$ (A) and $N=200\wedge P=10$ (B). Note that plots are paired vertically by predictor. Rate of over-estimation (under-estimation): (A) ME = 0.54 (0.46), NRF = 0.49 (0.51); (B) ME = 0.53 (0.47), NRF = 0.47 (0.53).

**Table 1.** Notation and abbreviations.

| Symbol | Meaning |
|---|---|
| ME | maximum entropy |
| NR | Newton–Raphson algorithm |
| NRF | bias-corrected Newton–Raphson algorithm |
| y | sample of observations |
| $\mathcal{Y}$ | sample space |
| $\mathit{\theta}$ | $J\times 1$ vector of parameters |
| $\widehat{\mathit{\theta}}$ | estimated vector of parameters |
| $\tilde{\mathit{\theta}}$ | reparameterized vector of parameters under ME |
| $f(y;\mathit{\theta})$ | density function |
| $l\left(\mathit{\theta}\right)$ | likelihood function |
| $\mathcal{U}\left(\mathit{\theta}\right)$, $\mathcal{U}(\tilde{\mathit{\theta}})$ | score function |
| $\mathbf{z}$ | $K\times 1$ vector of finite elements for $\tilde{\mathit{\theta}}$ |
| $\mathbf{p}$ | $K\times 1$ vector of unknown probabilities for $\tilde{\mathit{\theta}}$ |
| $\widehat{\mathbf{p}}$ | vector of estimated probabilities for $\tilde{\mathit{\theta}}$ |

**Table 2.** Finney’s data on vasoconstriction in the skin of the digits. The response Y indicates the occurrence ($Y=1$) or non-occurrence ($Y=0$) of vasoconstriction.

| Volume | Rate | Y |
|---|---|---|
| 3.70 | 0.825 | 1 |
| 3.50 | 1.090 | 1 |
| 1.25 | 2.500 | 1 |
| 0.75 | 1.500 | 1 |
| 0.80 | 3.200 | 1 |
| 0.70 | 3.500 | 1 |
| 0.60 | 0.750 | 0 |
| 1.10 | 1.700 | 0 |
| 0.90 | 0.750 | 0 |
| 0.90 | 0.450 | 0 |
| 0.80 | 0.570 | 0 |
| 0.55 | 2.750 | 0 |
| 0.60 | 3.000 | 0 |
| 1.40 | 2.330 | 1 |
| 0.75 | 3.750 | 1 |
| 2.30 | 1.640 | 1 |
| 3.20 | 1.600 | 1 |
| 0.85 | 1.415 | 1 |
| 1.70 | 1.060 | 0 |

**Table 3.** Estimates for the iris logistic regression: ME (maximum entropy), NRF (bias-corrected Newton–Raphson), NR (Newton–Raphson). Note that the NRF algorithm implements Firth’s bias correction [7].

| | ME | NRF | NR |
|---|---|---|---|
| ${\beta}_{0}$ | 17.892 | 12.539 | 445.917 |
| ${\beta}_{1}$ | −10.091 | −6.151 | −166.637 |
| ${\beta}_{2}$ | 12.229 | 6.890 | 140.570 |

**Table 4.** Simulation study: proportion of datasets in which separation occurred and non-convergence (nc) rates for the NR, NRF, and ME algorithms.

| n | p | Separation | ${\mathbf{nc}}_{\mathbf{NR}}$ | ${\mathbf{nc}}_{\mathbf{NRF}}$ | ${\mathbf{nc}}_{\mathbf{ME}}$ |
|---|---|---|---|---|---|
| 15 | 1 | 0.333 | 0.085 | 0.000 | 0.000 |
| 50 | 1 | 0.002 | 0.002 | 0.000 | 0.000 |
| 200 | 1 | 0.000 | 0.000 | 0.000 | 0.000 |
| 15 | 5 | 0.976 | 0.237 | 0.000 | 0.000 |
| 50 | 5 | 0.771 | 0.771 | 0.000 | 0.000 |
| 200 | 5 | 0.000 | 0.000 | 0.000 | 0.000 |
| 15 | 10 | 1.000 | 0.002 | 0.000 | 0.000 |
| 50 | 10 | 0.949 | 0.950 | 0.000 | 0.000 |
| 200 | 10 | 0.013 | 0.013 | 0.000 | 0.000 |

**Table 5.** Simulation study: averaged bias ($\widehat{B}$), variance ($\widehat{V}$), squared averaged bias ($\widehat{B}^{2}$), and MSE for the NR, NRF, and ME algorithms. The four measures are reported once per algorithm, in the order NR, NRF, ME.

| n | p | $\widehat{B}$ | $\widehat{V}$ | $\widehat{B}^{2}$ | MSE | $\widehat{B}$ | $\widehat{V}$ | $\widehat{B}^{2}$ | MSE | $\widehat{B}$ | $\widehat{V}$ | $\widehat{B}^{2}$ | MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 1 | −5.54 | 236.70 | 30.67 | 267.36 | 0.22 | 0.35 | 0.05 | 0.40 | −1.17 | 6.28 | 1.37 | 7.64 |
| 50 | 1 | −0.13 | 3.42 | 0.02 | 3.44 | −0.00 | 1.41 | 0.00 | 1.41 | −0.12 | 1.99 | 0.01 | 2.00 |
| 200 | 1 | 0.03 | 0.11 | 0.00 | 0.11 | 0.00 | 0.10 | 0.00 | 0.10 | 0.03 | 0.11 | 0.00 | 0.11 |
| 15 | 5 | 10.68 | 1553.37 | 113.98 | 1667.33 | −1.22 | 3.00 | 1.50 | 4.49 | 0.20 | 5.32 | 0.04 | 5.36 |
| 50 | 5 | 7.46 | 1918.18 | 55.65 | 1973.78 | −0.44 | 2.20 | 0.20 | 2.39 | −0.11 | 1.45 | 0.01 | 1.46 |
| 200 | 5 | 0.24 | 1.58 | 0.06 | 1.64 | 0.01 | 0.50 | 0.00 | 0.50 | 0.12 | 0.42 | 0.02 | 0.44 |
| 15 | 10 | −0.97 | 177.40 | 0.95 | 178.35 | −0.13 | 4.82 | 0.02 | 4.84 | −0.38 | 8.10 | 0.14 | 8.24 |
| 50 | 10 | 2.80 | 1490.39 | 7.83 | 1498.20 | −0.07 | 1.23 | 0.00 | 1.23 | −0.02 | 1.53 | 0.00 | 1.53 |
| 200 | 10 | 0.66 | 15.29 | 0.43 | 15.72 | 0.02 | 0.86 | 0.00 | 0.86 | 0.10 | 0.48 | 0.01 | 0.50 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Calcagnì, A.; Finos, L.; Altoé, G.; Pastore, M. A Maximum Entropy Procedure to Solve Likelihood Equations. *Entropy* **2019**, *21*, 596. https://doi.org/10.3390/e21060596