# Applying Benford’s Law to Monitor Death Registration Data: A Management Tool for the COVID-19 Pandemic

## Abstract

## 1. Introduction

## 2. Methods and Empirical Procedure

#### 2.1. Description of Benford’s Law

$P_{(n)}$ is the probability that a number has $n$ as its first non-zero digit.
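The first-digit law is commonly stated as $P_{(n)} = \log_{10}(1 + 1/n)$ for $n = 1, \dots, 9$. The sketch below assumes that standard form and tabulates the nine probabilities; the function name is illustrative only.

```python
import math

def benford_first_digit_probs():
    """Return {n: P(n)} for n = 1..9, assuming P(n) = log10(1 + 1/n)."""
    return {n: math.log10(1 + 1 / n) for n in range(1, 10)}

probs = benford_first_digit_probs()
for digit, p in probs.items():
    print(f"P({digit}) = {p:.4f}")
```

The probabilities decrease with the digit, so a series that complies with BL should show roughly 30% of its entries starting with 1.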

#### 2.2. Chi-Square Test

Through the $\chi^{2}$ (Chi-square) test, we tested whether the $n$ entries in a data set were compatible with BL (Equation (2)). That is to say, we tested the null hypothesis for the first-digit probabilities, $p_{i}=P_{r}(D_{1}=i)$. Thus, we tested the hypothesis specified below [38].

The null hypothesis ($H_{0}$) is that the first digit is distributed as expected under BL. Hence, the chi-square test points to the data sets for which we can reject $H_{0}$, which are those where the possible causes of noncompliance with BL must be investigated.
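A minimal sketch of such a test follows. It assumes the usual Pearson statistic over the nine first digits with 8 degrees of freedom (not stated explicitly in the text), and uses the closed-form tail probability that holds for an even number of degrees of freedom. The function names and both example series are hypothetical.

```python
import math
from collections import Counter

def first_digit(x):
    """First non-zero digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])

def chi2_sf_8df(stat):
    """P(X > stat) for a chi-square variable with 8 degrees of freedom."""
    h = stat / 2
    return math.exp(-h) * (1 + h + h**2 / 2 + h**3 / 6)

def benford_chi2(values):
    """Return (chi-square statistic, p-value) of first digits against BL."""
    digits = [first_digit(v) for v in values if v > 0]  # zeros have no first digit
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat, chi2_sf_8df(stat)

# Powers of two are a classic series that follows BL closely.
stat_bl, p_bl = benford_chi2([2**k for k in range(1, 101)])
# Three-digit integers have uniform first digits and should be rejected.
stat_uni, p_uni = benford_chi2(range(100, 1000))
print(f"powers of two: chi2 = {stat_bl:.3f}, p = {p_bl:.4f}")
print(f"100..999:      chi2 = {stat_uni:.3f}, p = {p_uni:.3g}")
```

A small p-value leads to rejecting $H_{0}$, which flags the series for a closer look rather than proving that the data are wrong.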

#### 2.3. Sensitivity Analysis Steps

To assess the robustness of the $\chi^{2}$ test, we designed a sensitivity analysis following the steps detailed below. Because the observed figures are subject to random variation, we ran this sensitivity analysis to validate the results.

- Step 1. First, the series of observed values was modified by random perturbations, assuming that:
  - (i) the disturbance was unintentional;
  - (ii) the applied perturbations were independent of each other;
  - (iii) the perturbation size varied over a 20% range, and within that range any possible outcome was equally likely. This assumption corresponds to the uniform probability distribution on the interval [−0.1, +0.1], denoted U[−0.1; +0.1].
- Step 2. From the observed mortality rate of a specific AC, an arbitrarily large set of alternative perturbed series was obtained through a Monte Carlo simulation. Specifically, we generated 1000 replications for each series. Therefore, given the observed series $\left\{x_{1}^{obs},x_{2}^{obs},x_{3}^{obs},\dots,x_{n}^{obs}\right\}$, we obtained the $i$th modified series as $x_{k}^{i}=x_{k}^{obs}\cdot\left(1+u_{k}^{i}\right)$, $k=1,\dots,n$, where $i=1,\dots,1000$ and $\{u_{k}^{i}\}_{k=1}^{n}$ are $n$ values obtained by simulation from the distribution U[−0.1; +0.1].
- Step 3. The BL test was applied to each synthetically generated series $i_{0}$, $\{u_{k}^{i_{0}}\}_{k=1}^{n}$, by calculating the $\chi^{2}$ distance statistic and the p-value of the test for that series. We thus obtained 1000 synthetic series, with their 1000 p-values $\{p^{i}\}_{i=1}^{1000}$ and their 1000 $\chi^{2}$ distances. In addition, we calculated quantiles of order $\alpha$ for those p-values, $q_{\alpha}$.
- Step 4. From $q_{\alpha}$ it was possible to obtain the equivalent of a confidence interval that allowed validation of the decision on BL fulfillment made with the observed data. That is to say, our goal was to check whether the decision for the observed data could be kept for data with perturbations. We then set a $q_{1-\alpha}$ value and took a decision according to the scheme displayed in Table 1.
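The steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the test uses 8 degrees of freedom, perturbed values are not rounded back to integers, the quantile is the nearest-rank empirical quantile, and the `observed` series is invented for the example.

```python
import math
import random
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """First non-zero digit of a positive number."""
    return int(f"{x:.15e}"[0])

def benford_pvalue(values):
    """p-value of the chi-square test of first digits against BL (8 d.o.f.)."""
    digits = [first_digit(v) for v in values if v > 0]
    n = len(digits)
    counts = Counter(digits)
    stat = sum((counts.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in enumerate(BENFORD, start=1))
    h = stat / 2
    return math.exp(-h) * (1 + h + h**2 / 2 + h**3 / 6)

def perturbed_pvalues(observed, n_rep=1000, half_width=0.1, seed=0):
    """Steps 1-3: perturb each value by U[-0.1, +0.1] and test every replicate."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pvals = []
    for _ in range(n_rep):
        series = [x * (1 + rng.uniform(-half_width, half_width)) for x in observed]
        pvals.append(benford_pvalue(series))
    return pvals

def empirical_quantile(sample, q):
    """Nearest-rank empirical quantile of order q (an assumed definition)."""
    ordered = sorted(sample)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# Hypothetical daily death counts, for illustration only.
observed = [3, 7, 12, 18, 25, 41, 60, 88, 130, 190,
            260, 310, 280, 240, 170, 120, 95, 70, 44, 31]
pvals = perturbed_pvalues(observed)
q95 = empirical_quantile(pvals, 0.95)
print(f"q_0.95 of the perturbed p-values: {q95:.4f}")
```

The value `q95` is then compared with the significance level to decide whether the decision taken on the observed data survives the perturbations.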

## 3. Data and Source

## 4. Results

The results cover the BL test ($\chi^{2}$ test) and the COVID mortality rate by ACs. Among those regions for which we reject the hypothesis that BL is fulfilled, two kinds of interpretation can be offered. The majority of the ACs for which we reject $H_{0}$ are ranked at the top of the mortality rate ranking (above the Spanish rate). In these cases, the explanation for the deviation from BL relates to mistakes in the registration of the daily number of deaths, or to an uncontrolled pandemic crisis producing skyrocketing figures. The region with the largest $\chi^{2}$ value, that is to say, the one with the largest deviation from what was expected according to BL, is Catalonia. In fact, Catalonia rectified up to approximately 20% of the data initially supplied to the Ministry, confirming that there had been errors in the registry or in the counting of cases.

## 5. Discussion

## 6. Conclusions

The majority of the ACs for which we reject $H_{0}$ are ranked at the top of the mortality rate ranking (above the Spanish rate). In these cases, the explanation for the deviation from BL relates to mistakes in the registration of the daily number of deaths (this is the case with Catalonia, Navarra and Madrid, among others). These mistakes in data registration may be due to delays in reporting (events on a specific day may be recorded afterwards), human error, and differences in counting or recording criteria, among others. In fact, anomalous figures, such as those of Catalonia, have often been reported in the press. In this AC, one of the main recording errors was the delay in reporting and recording information (sometimes attributing deaths from previous days to a single day) [44].

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Anirudh, A. Mathematical modeling and the transmission dynamics in predicting the COVID-19—What next in combating the pandemic. Infect. Dis. Model. **2020**, 5, 366–374.
2. Mohamadou, Y.; Halidou, A.; Kapen, P.T. A review of mathematical modeling, artificial intelligence and datasets used in the study, prediction and management of COVID-19. Appl. Intell. **2020**, 50, 3913–3925.
3. Ahmad, F.; Almuayqil, S.N.; Humayun, M.; Naseem, S.; Khan, W.A.; Junaid, K. Prediction of COVID-19 Cases Using Machine Learning for Effective Public Health Management. Comput. Mater. Contin. **2021**, 66, 2265–2282.
4. Yadav, D.; Maheshwari, H.; Chandra, U. Outbreak prediction of covid-19 in most susceptible countries. Glob. J. Environ. Sci. Manag. **2020**, 6, 11–20.
5. Li, S.; Lin, Y.; Zhu, T.; Fan, M.; Xu, S.; Qiu, W.; Chen, C.; Li, L.; Wang, Y.; Yan, J.; et al. Development and external evaluation of predictions models for mortality of COVID-19 patients using machine learning method. Neural Comput. Appl. **2021**, 1–10, Epub ahead of print.
6. Giuliani, D.; Dickson, M.M.; Espa, G.; Santi, F. Modelling and predicting the spatio-temporal spread of COVID-19 in Italy. BMC Infect. Dis. **2020**, 20, 700.
7. Alboaneen, D.; Pranggono, B.; Alshammari, D.; Alqahtani, N.; Alyaffer, R. Predicting the Epidemiological Outbreak of the Coronavirus Disease 2019 (COVID-19) in Saudi Arabia. Int. J. Environ. Res. Public Health **2020**, 17, 4568.
8. Alsayed, A.; Sadir, H.; Kamil, R.; Sari, H. Prediction of Epidemic Peak and Infected Cases for COVID-19 Disease in Malaysia, 2020. Int. J. Environ. Res. Public Health **2020**, 17, 4076.
9. Jia, L.; Li, K.; Jiang, Y.; Guo, X.; Zhao, T. Prediction and Analysis of Coronavirus Disease 2019. arXiv **2020**, arXiv:2003.05447.
10. Qin, L.; Sun, Q.; Wang, Y.; Wu, K.-F.; Chen, M.; Shia, B.-C.; Wu, S.-Y. Prediction of Number of Cases of 2019 Novel Coronavirus (COVID-19) Using Social Media Search Index. Int. J. Environ. Res. Public Health **2020**, 17, 2365.
11. Ayyoubzadeh, S.M.; Ayyoubzadeh, S.M.; Zahedi, H.; Ahmadi, M.; Niakan Kalhori, S.R. Predicting COVID-19 Incidence Through Analysis of Google Trends Data in Iran: Data Mining and Deep Learning Pilot Study. JMIR Public Health Surveill. **2020**, 6, e18828.
12. Ganasegeran, K.; Ch’ng, A.S.H.; Looi, I. What Is the Estimated COVID-19 Reproduction Number and the Proportion of the Population That Needs to Be Immunized to Achieve Herd Immunity in Malaysia? A Mathematical Epidemiology Synthesis. COVID **2021**, 1, 13–19.
13. Park, S.W.; Champredon, D.; Weitz, J.S.; Dushoff, J. A practical generation-interval-based approach to inferring the strength of epidemics from their speed. Epidemics **2019**, 27, 12–18.
14. Caballer-Tarazona, M.; Moya-Clemente, I.; Vivas-Consuelo, D.; Barrachina-Martínez, I. A model to measure the efficiency of hospital performance. Math. Comput. Model. **2010**, 52, 1095–1102.
15. Benford, F. The Law of Anomalous Numbers. Proc. Am. Philos. Soc. **1938**, 78, 551–572. Available online: http://www.jstor.org/stable/984802 (accessed on 20 January 2021).
16. Lee, K.-B.; Han, S.; Jeong, Y. COVID-19, flattening the curve, and Benford’s law. Phys. A **2020**, 559, 125090.
17. Maher, M.; Akers, M. Using Benfords Law to Detect Fraud in the Insurance Industry. Account. Fac. Res. Publ. **2011**, 1, 1–4.
18. Cabeza García, P.M. Aplicación de la ley de Benford en la detección de fraudes. Rev. Univ. Soc. **2019**, 11, 421–427.
19. Cerioli, A.; Barabesi, L.; Cerasa, A.; Menegatti, M.; Perrotta, D. Newcomb–Benford law and the detection of frauds in international trade. Proc. Natl. Acad. Sci. USA **2019**, 116, 106–115.
20. Barabesi, L.; Cerasa, A.; Cerioli, A.; Perrotta, D. Goodness-of-Fit Testing for the Newcomb-Benford Law with Application to the Detection of Customs Fraud. J. Bus. Econ. Stat. **2018**, 36, 346–358.
21. Diekmann, A. Not the First Digit! Using Benford’s Law to Detect Fraudulent Scientific Data. J. Appl. Stat. **2007**, 34, 321–329.
22. Stoerk, T. Statistical corruption in Beijing’s air quality data has likely ended in 2012. Atmos. Environ. **2016**, 127, 365–371.
23. Gómez-Camponovo, M.; Moreno, J.; Idrovo, Á.J.; Páez, M.; Achkar, M. Monitoring the Paraguayan epidemiological dengue surveillance system (2009–2011) using Benford’s law. Biomedica **2016**, 36, 583–592.
24. Burlac, L.; Giannakis, N. Benford’s Law: Analysis of the Trustworthiness of COVID-19 Reporting in the Context of Different Political Regimes; School of Education, Culture and Communication, Mälardalen University: Västerås, Sweden, 2021.
25. Tasri, Y.D.; Tasri, E.S. Improving clinical records: Their role in decision-making and healthcare management—COVID-19 perspectives. Int. J. Healthc. Manag. **2020**, 13, 325–336.
26. Koch, C.; Okamura, K. Benford’s Law and COVID-19 reporting. Econ. Lett. **2020**, 196, 109573.
27. Iosa, M.; Paolucci, S.; Morone, G. Covid-19: A Dynamic Analysis of Fatality Risk in Italy. Front. Med. **2020**, 7, 185.
28. Armocida, B.; Formenti, B.; Ussai, S.; Palestra, F.; Missoni, E. The Italian health system and the COVID-19 challenge. Lancet Public Health **2020**, 5, e253.
29. Caballer-Tarazona, M.; Clemente-Collado, A.; Vivas-Consuelo, D. A cost and performance comparison of Public Private Partnership and public hospitals in Spain. Health Econ. Rev. **2016**, 6, 17.
30. De Sanidad, M. BOE-A-2020-3953. «BOE» Núm. 78, de 21 de Marzo de 2020, Páginas 26505 a 26510 (6 Págs.). 2020. Available online: https://www.boe.es/diario_boe/txt.php?id=BOE-A-2020-3953 (accessed on 14 January 2021).
31. Moreno Küstner, B. La información sanitaria se enreda en la informática. Gac. Sanit. **2011**, 25, 343–344.
32. Gil-Suay, V. Los Sistemas de información sanitaria en el marco de un Sistema Nacional de Salud descentralizado. Arbor **2005**, 180, 327–342.
33. Kazemitabar, J.; Kazemitabar, J. Measuring the conformity of distributions to Benford’s law. Commun. Stat.-Theory Methods **2020**, 49, 3530–3536.
34. Sambridge, M.; Tkalčić, H.; Jackson, A. Benford’s law in the natural sciences. Geophys. Res. Lett. **2010**, 37.
35. Giles, D.E. Benford’s law and naturally occurring prices in certain ebaY auctions. Appl. Econ. Lett. **2007**, 14, 157–161.
36. Nigrini, M.J. A taxpayer compliance application of Benford’s Law. J. Am. Tax. Assoc. **1996**, 18, 72. Available online: https://www.proquest.com/scholarly-journals/taxpayer-compliance-application-benfords-law/docview/211023799/se-2?accountid=14777 (accessed on 22 January 2021).
37. Goh, C. Applying visual analytics to fraud detection using Benford’s law. J. Corp. Account. Financ. **2020**, 31, 202–208.
38. Lesperance, M.; Reed, W.J.; Stephens, M.A.; Tsao, C.; Wilton, B. Assessing Conformance with Benford’s Law: Goodness-of-Fit Tests and Simultaneous Confidence Intervals. PLoS ONE **2016**, 11, e0151235.
39. Whyman, G.; Shulzinger, E.; Bormashenko, E. Intuitive considerations clarifying the origin and applicability of the Benford law. Results Phys. **2016**, 6, 3–6.
40. DATADISTA. Coronavirus Disease 2019 (COVID-19) in Spain. 2020. Available online: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GPFFAQ (accessed on 23 January 2021).
41. Silva, L.; Figueiredo Filho, D. Using Benford’s law to assess the quality of COVID-19 register data in Brazil. J. Public Health **2021**, 43, 107–110.
42. Roy, D.; Mukherjee, S.P. A Note on Characterisations of the Weibull Distribution. Sankhyā Indian J. Stat. Ser. A **1986**, 48, 250–253. Available online: http://www.jstor.org/stable/25050594 (accessed on 27 January 2021).
43. Vazquez, A. Exact solution of infection dynamics with gamma distribution of generation intervals. Phys. Rev. E **2021**, 103, 42306.
44. Sevillano, E.; Linde, P. El Desbarajuste de las cifras del Coronavirus: Sanidad Rebaja en casi 2.000 las Muertes desde que Empezó la Pandemia; El País: Madrid, Spain, 2020.

**Figure 2.** Frequency distribution of the first digit of the number of deaths per day by COVID in Spain.

**Figure 3.** Frequency distribution of the first digit of the number of COVID deaths per day by ACs. (As in Figure 2, the Y axes represent the frequency and the X axes represent the first digit of the number of COVID deaths.)

Decision for Observed Data | If $q_{0.95} > \alpha$ for Data with Perturbations | If $q_{0.95} < \alpha$ for Data with Perturbations |
---|---|---|
$H_{0}$ Fail to Reject | We can keep the decision | We cannot keep the decision |
$H_{0}$ Reject | We cannot keep the decision | We can keep the decision |
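The decision scheme in this table can be written as a small function. The sketch assumes a significance level of 0.05, which the table does not fix, and treats the boundary case $q_{0.95} = \alpha$ like $q_{0.95} > \alpha$; the function name is hypothetical.

```python
def final_decision(initial_reject: bool, q95: float, alpha: float = 0.05) -> str:
    """Apply the table: keep the initial decision only if the perturbed data agree.

    initial_reject -- True if H0 was rejected for the observed data
    q95            -- 0.95 quantile of the p-values from the perturbed series
    alpha          -- significance level (0.05 is an assumption of this sketch)
    """
    perturbed_reject = q95 < alpha
    if initial_reject == perturbed_reject:
        return "keep the decision"
    return "cannot keep the decision"

# Two (initial decision, q_0.95) pairs of the kind reported per AC.
print(final_decision(True, 0.00013))   # rejection confirmed by the perturbed data
print(final_decision(False, 0.67138))  # non-rejection confirmed as well
```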

ACs Code | Autonomous Communities (ACs) | $\chi^{2}$ Value Estimator | $\chi^{2}$ Test p-Value | Mortality Rate (×10^{5}) | Mortality Rate Ranking |
---|---|---|---|---|---|
9 | Cataluña | 291.947 | 0.000293 *** | 74.5 | 7 |
15 | Navarra | 217.510 | 0.005398 *** | 81.5 | 6 |
12 | Galicia | 214.966 | 0.005938 *** | 23.4 | 14 |
13 | Madrid | 195.582 | 0.012143 ** | 127.4 | 2 |
17 | La Rioja | 178.412 | 0.022448 ** | 116.2 | 4 |
7 | Castilla y León | 177.992 | 0.022782 ** | 117.2 | 3 |
11 | Extremadura | 169.880 | 0.030233 * | 49.2 | 10 |
8 | Castilla La Mancha | 164.449 | 0.036437 * | 143.4 | 1 |
2 | Aragón | 139.270 | 0.083687 * | 82.2 | 5 |
0 | Spain | 128.710 | 0.116364 | 60.9 | 9 |
1 | Andalucía | 118.706 | 0.157069 | 17.3 | 16 |
6 | Cantabria | 114.593 | 0.177005 | 36.0 | 11 |
10 | C. Valenciana | 98.378 | 0.276588 | 29.0 | 13 |
16 | País Vasco | 93.121 | 0.316654 | 70.9 | 8 |
4 | Baleares | 55.347 | 0.699181 | 19.8 | 15 |
3 | Asturias | 54.368 | 0.710025 | 32.8 | 12 |
14 | Murcia | 34.197 | 0.905324 | 10.0 | 17 |

Rejection of $H_{0}$ at levels * 5%, ** 3%, *** 1%.
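As an illustrative check, not part of the authors' procedure, the tail probability of a $\chi^{2}$ variable has a closed form when the degrees of freedom are even. The sketch assumes 8 degrees of freedom (nine first digits minus one), which the text does not state explicitly; the function name is hypothetical.

```python
import math

def chi2_sf_even_df(stat, df=8):
    """P(X > stat) for a chi-square variable with an even number of d.o.f.

    Uses exp(-h) * sum_{j < df/2} h**j / j!  with h = stat / 2.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    h = stat / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= h / j          # builds h**j / j! incrementally
        total += term
    return math.exp(-h) * total

for label, stat in [("Cataluña", 29.1947), ("Spain", 12.8710), ("Murcia", 3.4197)]:
    print(f"{label}: p = {chi2_sf_even_df(stat):.6f}")
```

Under that assumption, the tabulated p-values are recovered when the statistic column is read with the decimal point one place to the left, for example 29.1947 rather than 291.947 for Cataluña. That reading is an inference from the arithmetic, not something stated in the source.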

AC | Initial Decision (for Observed Data) | $q_{95\%}$ | Final Decision (for Data with Perturbations) |
---|---|---|---|
Cataluña | Rejection | 0.00013 | Rejection |
Navarra | Rejection | 0.00610 | Rejection |
Madrid | Rejection | 0.00058 | Rejection |
La Rioja | Rejection | 0.00633 | Rejection |
Galicia | Rejection | 0.00583 | Rejection |
Castilla y León | Rejection | 0.04743 | Rejection |
Spain | Fail to reject | 0.67138 | Fail to reject |
C. Valenciana | Fail to reject | 0.34756 | Fail to reject |
Andalucía | Fail to reject | 0.47747 | Fail to reject |
Cantabria | Fail to reject | 0.28642 | Fail to reject |
Baleares | Fail to reject | 0.33385 | Fail to reject |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Morillas-Jurado, F.G.; Caballer-Tarazona, M.; Caballer-Tarazona, V.
Applying Benford’s Law to Monitor Death Registration Data: A Management Tool for the COVID-19 Pandemic. *Mathematics* **2022**, *10*, 46.
https://doi.org/10.3390/math10010046
