# A Software Tool for Estimating Uncertainty of Bayesian Posterior Probability for Disease


## Abstract


## 1. Introduction

#### 1.1. Diagnosis in Medicine

#### 1.1.1. Bayes’ Theorem in Medical Diagnostics

#### 1.1.2. Challenges in Applying Bayesian Inference

#### Computational Complexity

#### Statistical Distributions in Diagnostics

#### Uncertainty of Bayesian Posterior Probabilities

#### 1.1.3. Quantifying Uncertainty in Diagnostics

#### Combined Uncertainty

#### Measurement Uncertainty

#### Sampling Uncertainty

## 2. Methods

#### 2.1. Computational Methods

#### 2.1.1. Bayes’ Theorem

**θ**, as follows:

#### 2.1.2. Parametric Distributions

- Normal distribution
- Lognormal distribution
- Gamma distribution.
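As an illustration of how Bayes' theorem combines with these parametric distributions, the following minimal sketch computes the posterior probability for disease from a single measurement, assuming lognormal distributions for both populations. The function names are hypothetical, and the numeric inputs are the means, standard deviations, and prior probability listed in the case-study settings table; the sketch ignores measurement uncertainty and is not the program's implementation.

```python
import math

def lognormal_pdf(x, mean, sd):
    """Lognormal PDF parameterized by the arithmetic mean and SD of X."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)  # variance of ln X
    mu = math.log(mean) - 0.5 * sigma2         # mean of ln X
    z2 = (math.log(x) - mu) ** 2 / sigma2
    return math.exp(-0.5 * z2) / (x * math.sqrt(2.0 * math.pi * sigma2))

def posterior_probability(x, prior, pdf_diseased, pdf_nondiseased):
    """Bayes' theorem for one measurement x and two populations."""
    weighted_d = prior * pdf_diseased(x)
    weighted_nd = (1.0 - prior) * pdf_nondiseased(x)
    return weighted_d / (weighted_d + weighted_nd)

# Illustrative inputs from the case-study settings (FPG in mg/dL).
def pdf_d(x):
    return lognormal_pdf(x, 120.7, 17.7)

def pdf_nd(x):
    return lognormal_pdf(x, 102.7, 10.7)

p_126 = posterior_probability(126.0, 0.158, pdf_d, pdf_nd)
print(f"posterior probability at FPG 126 mg/dL: {p_126:.3f}")
```

Passing the two densities as callables keeps the Bayes step independent of the distribution family, so a normal or gamma density could be substituted without changing `posterior_probability`.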

#### 2.1.3. Uncertainty Quantification

#### Measurement Uncertainty
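The settings of the case study describe the measurement uncertainty through a constant contribution $b_0$ and a proportionality constant $b_1$. One common way to combine two such terms, assumed here rather than taken from the text, is in quadrature: $u_m(x)=\sqrt{b_0^2+b_1^2x^2}$. The sketch below evaluates that assumed model; the function name is hypothetical.

```python
import math

def measurement_uncertainty(x, b0, b1):
    """Assumed model: constant and proportional terms added in quadrature."""
    return math.sqrt(b0 ** 2 + (b1 * x) ** 2)

# Illustrative values from the case-study settings (FPG in mg/dL).
u_m_126 = measurement_uncertainty(126.0, b0=0.866, b1=0.0109)
print(f"u_m(126 mg/dL) = {u_m_126:.3f} mg/dL")
```

Under this model the uncertainty equals $b_0$ at $x=0$ and approaches $b_1x$ for large $x$.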

#### Sampling Uncertainties of Means and Standard Deviations
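For orientation, the textbook large-sample expressions for these sampling uncertainties are $u(\bar{x})=s/\sqrt{n}$ and $u(s)\approx s/\sqrt{2(n-1)}$, the latter assuming approximately normal data. The sketch below applies them to the diabetic sample of the case study; the function names are hypothetical, and the program may use different estimators.

```python
import math

def sampling_uncertainty_of_mean(sd, n):
    """Standard error of the sample mean."""
    return sd / math.sqrt(n)

def sampling_uncertainty_of_sd(sd, n):
    """Large-sample approximation, assuming roughly normal data."""
    return sd / math.sqrt(2.0 * (n - 1))

# Illustrative values: diabetic participants, sd = 17.7 mg/dL, n = 154.
u_mean_d = sampling_uncertainty_of_mean(17.7, 154)
u_sd_d = sampling_uncertainty_of_sd(17.7, 154)
print(f"u(mean) = {u_mean_d:.3f} mg/dL, u(sd) = {u_sd_d:.3f} mg/dL")
```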

#### Sampling Uncertainty of Prior Probability for Disease
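The Agresti–Coull adjustment adds $z^2/2$ successes and $z^2/2$ failures to the sample before the Wald formula is applied. The sketch below shows one way to derive a standard uncertainty of the prior probability from it; how the program itself converts the interval into a standard uncertainty is an assumption here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def agresti_coull(successes, n, level=0.95):
    """Adjusted proportion and its Wald-type standard uncertainty."""
    z = NormalDist().inv_cdf(0.5 * (1.0 + level))
    n_adj = n + z * z                          # adjusted sample size
    p_adj = (successes + 0.5 * z * z) / n_adj  # adjusted proportion
    u = math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return p_adj, u

# Illustrative values: 154 diabetic participants out of 976.
r_adj, u_r = agresti_coull(154, 976)
print(f"adjusted prior = {r_adj:.4f}, standard uncertainty = {u_r:.4f}")
```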

#### Combined Uncertainty of Posterior Probability for Disease
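First-order Taylor series propagation, which the program uses according to its stated limitations, can be sketched as follows, assuming uncorrelated inputs and numerical (central-difference) sensitivity coefficients. All function names are hypothetical, and the input uncertainties are rounded illustrative values, so the result is not expected to reproduce the program's figures.

```python
import math

def propagate_first_order(f, values, uncertainties, rel_step=1e-6):
    """Combined standard uncertainty of f(*values), uncorrelated inputs."""
    variance = 0.0
    for i, (v, u) in enumerate(zip(values, uncertainties)):
        h = rel_step * max(1.0, abs(v))
        up, down = list(values), list(values)
        up[i], down[i] = v + h, v - h
        sensitivity = (f(*up) - f(*down)) / (2.0 * h)  # df/dv_i
        variance += (sensitivity * u) ** 2
    return math.sqrt(variance)

def lognormal_pdf(x, mean, sd):
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    z2 = (math.log(x) - mu) ** 2 / sigma2
    return math.exp(-0.5 * z2) / (x * math.sqrt(2.0 * math.pi * sigma2))

def posterior(x, mean_d, sd_d, mean_nd, sd_nd, prior):
    num = prior * lognormal_pdf(x, mean_d, sd_d)
    return num / (num + (1.0 - prior) * lognormal_pdf(x, mean_nd, sd_nd))

# Order: x, mean_d, sd_d, mean_nd, sd_nd, prior (illustrative values).
values = [126.0, 120.7, 17.7, 102.7, 10.7, 0.158]
u_meas = propagate_first_order(posterior, values, [1.62, 0, 0, 0, 0, 0])
u_samp = propagate_first_order(
    posterior, values, [0, 1.43, 1.01, 0.37, 0.26, 0.012])
u_combined = math.hypot(u_meas, u_samp)
print(f"measurement {u_meas:.3f}, sampling {u_samp:.3f}, "
      f"combined {u_combined:.3f}")
```

Splitting the inputs into a measurement group and a sampling group mirrors the three curves the program plots; with uncorrelated inputs, the combined value is the root sum of squares of the two components.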

#### 2.1.4. Expanded Uncertainty of Posterior Probability for Disease
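An expanded uncertainty is conventionally obtained by multiplying the combined standard uncertainty by a coverage factor $k$ chosen for the desired confidence level. The sketch below uses a normal quantile for $k$ and truncates the interval to $[0, 1]$; both choices are assumptions, and a Student's t quantile with effective degrees of freedom would be the usual alternative for small samples. The function name is hypothetical.

```python
from statistics import NormalDist

def confidence_interval(posterior, combined_u, level=0.95):
    """Posterior +/- k * u, truncated to the probability range [0, 1]."""
    k = NormalDist().inv_cdf(0.5 * (1.0 + level))  # coverage factor
    lower = max(0.0, posterior - k * combined_u)
    upper = min(1.0, posterior + k * combined_u)
    return lower, upper

# Illustrative values: posterior 0.561 with combined uncertainty 0.156.
low, high = confidence_interval(0.561, 0.156, level=0.95)
print(f"95% interval: [{low:.3f}, {high:.3f}]")
```

Truncation matters when the uncertainty is large relative to the posterior probability: the interval then spans most or all of the admissible range.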

#### 2.2. The Software

#### 2.2.1. Program Overview

Ver. 13.3 (Wolfram Research, Inc., Champaign, IL, USA). Bayesian Diagnostic Uncertainty was designed to estimate and plot the standard sampling, measurement, and combined uncertainty and the confidence intervals of the Bayesian posterior probability for disease of a screening or diagnostic test (see Figure 1).

(Wolfram Research, Inc., Champaign, IL, USA (2023)) or Wolfram Mathematica (see Appendix A.2). Due to the complexity of the calculations required, it is computationally intensive.

#### 2.2.2. Input Parameters

- Normal distribution
- Lognormal distribution
- Gamma distribution.

#### Measurement Uncertainty

#### 2.2.3. Output Specifications

#### Visualizations

- Uncertainty of posterior probability for disease: Plots are generated to show the standard sampling, measurement, and combined uncertainty of the posterior probability for disease.
- Relative uncertainty of posterior probability for disease: Plots are generated to show the relative standard sampling, measurement, and combined uncertainty of the posterior probability for disease.
- Confidence intervals of posterior probability for disease: Plots are generated to show the confidence intervals of the posterior probability for disease, for a user defined confidence level.

#### Tables

- The standard sampling, measurement, and combined uncertainty of the posterior probability for disease.
- The relative standard sampling, measurement, and combined uncertainty of the posterior probability for disease.
- The confidence intervals of the posterior probability for disease for a user defined confidence level.

## 3. Illustrative Case Study

- Valid FPG and OGTT results (n = 13,836).
- A negative response to NHANES question DIQ010 regarding a diabetes diagnosis [34] (n = 13,465).
- Age 70–80 years (n = 976).

## 4. Results

- 1. It is substantially affected by the measurement uncertainty of FPG.
- 2. Two local maxima are observed, corresponding to the regions near the steepest segments of the posterior probability curve, which exhibits an approximately double sigmoidal configuration. These maxima are as follows:
  - 2.1. At an FPG value of 58.7 mg/dL, the posterior probability for disease is equal to 0.585, while the combined standard uncertainty is equal to 0.893.
  - 2.2. At an FPG value of 133.2 mg/dL, the posterior probability for disease is equal to 0.725, while the combined standard uncertainty is equal to 0.182.

- At an FPG value of 64.1 mg/dL, the posterior probability for disease is equal to 0.257, while the relative combined standard uncertainty is equal to 2.044.
- At an FPG value of 128.1 mg/dL, the posterior probability for disease is equal to 0.561, while the relative combined standard uncertainty is equal to 0.278.

## 5. Discussion

#### 5.1. Reevaluation of Traditional Diagnostic Methods

- Cardiac troponin for diagnosing myocardial injury and infarction [40];
- Natriuretic peptides for the diagnosis of heart failure [41];
- D-dimer for diagnosing thromboembolic events [42];
- FPG, OGTT, and glycated hemoglobin (HbA1c) for diagnosing diabetes [31];
- OGTT for the diagnosis of gestational diabetes [43];
- Thyroid stimulating hormone (TSH), free serum triiodothyronine (T$_{3}$), and free serum thyroxine (T$_{4}$) for diagnosing thyroid dysfunction [44];
- Protein-to-creatinine ratio for the diagnosis of preeclampsia [45];
- Creatinine or cystatin C derived glomerular filtration rate (GFR), and albuminuria for diagnosing chronic kidney disease [46].

#### 5.2. Limitations of the Program

- 1. Underlying assumptions:
  - 1.1.
  - 1.2. The assumption that measurements, or their transformations, follow a parametric distribution, although the existing literature underlines the robustness of nonparametric techniques in capturing complex data distributions [55].
  - 1.3.
- 2. The use of first-order Taylor series approximations in uncertainty propagation calculations, where higher-order approximations may provide more accurate estimations [15].
- 3. The approximation of the uncertainty of the prior probability for disease using the Agresti–Coull-adjusted Wald interval, despite more accurate methods being available [58].
- 4.
- 5.

#### 5.3. Limitations of the Case Study

#### 5.4. Challenges in Bayesian Analysis for Disease Diagnosis

#### 5.5. Implications of Incomplete Data

- Over-reliance on prior probabilities: Limited empirical data may cause an overdependence on prior probabilities, leading to distorted posterior probabilities and potentially flawed clinical decisions [73].
- Increased uncertainty: Insufficient data amplifies the uncertainty of computed posterior probabilities, which in turn could exacerbate clinical indecision [74].
- Bias risks: Unrepresentative datasets could introduce systemic bias, increasing the uncertainty in Bayesian computations [5].

#### 5.6. Analysis of the Double Sigmoidal Curve in Posterior Probability Estimation and Its Impact on Uncertainty

#### 5.7. Software Comparison

ver. 0.20.0, Mathematica ver. 14.0, Matlab ver. R2023b, MedCalc ver. 20.2.1, metRology ver. 2023, NCSS ver. 24.0.0, NIST Uncertainty Machine ver. 2.0.0, OpenBUGS ver. 3.3.0, R ver. 4.3.1, SAS ver. 9.5, SPSS ver. 29, Stan ver. 2.33.0, Stata ver. 19, and UQLab ver. 2.0.0) provides this range of plots and tables without requiring advanced programming.

## 6. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Appendix A

#### Appendix A.1. Formalisms and Notation

- Acronyms

- Notation

- Bayes’ Theorem

**θ** is calculated as:

#### Appendix A.1.1. Parametric Distributions

- The domain of a random variable X following a normal distribution is the set of all real numbers, that is, $-\infty <X<\infty $.
- The domain of a random variable X following a lognormal distribution is the set of all positive real numbers, that is, $0<X<\infty $.
- The domain of a random variable X following a gamma distribution is the set of all positive real numbers, that is, $0<X<\infty $.
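Because the case-study settings specify each distribution by a mean and a standard deviation, the lognormal and gamma distributions have to be re-parameterized before their densities can be evaluated. The sketch below uses standard moment matching; the function names are hypothetical, and whether the program performs the conversion in exactly this way is an assumption.

```python
import math

def lognormal_params(mean, sd):
    """Return (mu, sigma) of ln X for a lognormal X with given mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)

def gamma_params(mean, sd):
    """Return (shape, scale) for a gamma X with given mean and SD."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

# Illustrative values: diabetic participants (FPG in mg/dL).
mu_d, sigma_d = lognormal_params(120.7, 17.7)
median_d = math.exp(mu_d)  # the median of a lognormal is exp(mu)
shape_d, scale_d = gamma_params(120.7, 17.7)
print(f"lognormal median = {median_d:.1f} mg/dL")
print(f"gamma shape = {shape_d:.2f}, scale = {scale_d:.3f}")
```

The lognormal median obtained this way can be compared with the median reported for the estimated diabetic distribution in Table 2.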

#### Appendix A.1.2. Calculations of the Posterior Probability for Disease and Its Uncertainty

#### Appendix A.2. Software Availability and Requirements

ver. 12.0+ is required, freely available at: https://www.wolfram.com/player/ (accessed 18 December 2023) or Wolfram Mathematica ver. 12.0+

i9™ or equivalent CPU and 32 GB of RAM

#### Appendix A.3. A Note about the Program

- About the Program Controls

- Range of input parameters

**Figure 1.** A simplified flowchart of the program Bayesian Diagnostic Uncertainty with the number of input parameters and of output types for each submodule.

**Figure 3.** The estimated PDF of the FPG (mg/dL) in diabetic participants, assuming a lognormal distribution and negligible measurement uncertainty, and the histogram of the respective NHANES dataset, with the parameters of the distribution in Table 2.

**Figure 4.** The estimated PDF of the FPG (mg/dL) in nondiabetic participants, assuming a lognormal distribution and negligible measurement uncertainty, and the histogram of the respective NHANES dataset, with the parameters of the distribution in Table 2.

**Figure 5.** Standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes versus FPG curve plot, with the settings of the program in Table 2.

**Figure 6.** Relative standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes versus FPG curve plot, with the settings of the program in Table 2.

**Figure 7.** Confidence intervals of the posterior probability for diabetes versus FPG curves plot, with the settings of the program in Table 2.

**Figure 8.** Standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes versus measurement uncertainty constant contribution ${b}_{0}$ curve plot, with the settings of the program in Table 2.

**Figure 9.** Relative standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes versus measurement uncertainty constant contribution ${b}_{0}$ curve plot, with the settings of the program in Table 2.

**Figure 10.** Confidence intervals of the posterior probability for diabetes versus measurement uncertainty constant contribution ${b}_{0}$ curves plot, with the settings of the program in Table 2.

**Figure 11.** Standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes versus measurement uncertainty proportionality constant ${b}_{1}$ curve plot, with the settings of the program in Table 2.

**Figure 12.** Relative standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes versus measurement uncertainty proportionality constant ${b}_{1}$ curve plot, with the settings of the program in Table 2.

**Figure 13.** Confidence intervals of the posterior probability for diabetes versus measurement uncertainty proportionality constant ${b}_{1}$ curves plot, with the settings of the program in Table 2.

**Figure 14.** Standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes versus total population sample size n curve plot, with the settings of the program in Table 2.

**Figure 15.** Relative standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes versus total population sample size n curve plot, with the settings of the program in Table 2.

**Figure 16.** Confidence intervals of the posterior probability for diabetes versus total population sample size n curves plot, with the settings of the program in Table 2.

**Figure 17.** Table of the standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes, with the settings of the program in Table 2.

**Figure 18.** Table of the relative standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes, with the settings of the program in Table 2.

**Figure 19.** Confidence intervals of the posterior probability for diabetes, with the settings of the program in Table 2.

| | Diabetic Participants | Nondiabetic Participants |
|---|---|---|
| n | 154 | 822 |
| Mean | 120.7 | 102.6 |
| Median | 117.0 | 102.0 |
| Standard Deviation | 19.1 | 10.9 |
| Skewness | 1.448 | 0.523 |
| Kurtosis | 6.354 | 3.445 |

**Table 2.** Descriptive statistics of the estimated lognormal distributions of the diabetic and nondiabetic populations.

| | Diabetic Participants | | Nondiabetic Participants | |
|---|---|---|---|---|
| Estimated Distribution | ${L}_{D}$ | ${l}_{D}$ | ${L}_{\overline{D}}$ | ${l}_{\overline{D}}$ |
| Mean Uncertainty | 1.586 | 0 | 1.028 | 0 |
| Mean | 120.7 | 120.7 | 102.6 | 102.6 |
| Median | 119.4 | 119.4 | 102.1 | 102.1 |
| Standard Deviation | 17.8 | 17.7 | 10.9 | 10.7 |
| Skewness | 0.446 | 0.444 | 0.315 | 0.312 |
| Kurtosis | 3.355 | 3.352 | 3.177 | 3.174 |
| p-value (Cramér–von Mises test) | 0.294 | 0.295 | 0.281 | 0.299 |

| Settings | Figure 5 and Figure 6 | Figure 7 | Figure 8 and Figure 9 | Figure 10 | Figure 11 and Figure 12 | Figure 13 | Figure 14 and Figure 15 | Figure 16 | Figure 17 and Figure 18 | Figure 19 |
|---|---|---|---|---|---|---|---|---|---|---|
| p | - | 0.95 | - | 0.95 | - | 0.95 | - | 0.95 | - | 0.95 |
| x | 31.0–192.0 | 31.0–192.0 | 126.0 | 126.0 | 126.0 | 126.0 | 126.0 | 126.0 | 126.0 | 126.0 |
| ${\mu}_{D}$ | 120.7 | 120.7 | 120.7 | 120.7 | 120.7 | 120.7 | 120.7 | 120.7 | 120.7 | 120.7 |
| ${\sigma}_{D}$ | 17.7 | 17.7 | 17.7 | 17.7 | 17.7 | 17.7 | 17.7 | 17.7 | 17.7 | 17.7 |
| ${n}_{D}$ | 154 | 154 | 154 | 154 | 154 | 154 | - | - | 154 | 154 |
| ${\mu}_{\overline{D}}$ | 102.7 | 102.7 | 102.7 | 102.7 | 102.7 | 102.7 | 102.7 | 102.7 | 102.7 | 102.7 |
| ${\sigma}_{\overline{D}}$ | 10.7 | 10.7 | 10.7 | 10.7 | 10.7 | 10.7 | 10.7 | 10.7 | 10.7 | 10.7 |
| ${n}_{\overline{D}}$ | 822 | 822 | 822 | 822 | 822 | 822 | - | - | 822 | 822 |
| n | - | - | - | - | - | - | 65–5000 | 65–5000 | - | - |
| r | - | - | - | - | - | - | 0.158 | 0.158 | - | - |
| ${b}_{0}$ | 0.866 | 0.866 | 0.0–0.161 | 0.0–0.161 | 0.866 | 0.866 | 0.866 | 0.866 | 0.866 | 0.866 |
| ${b}_{1}$ | 0.0109 | 0.0109 | 0.0109 | 0.0109 | 0.0–0.1 | 0.0–0.1 | 0.0109 | 0.0109 | 0.0109 | 0.0109 |
| ${n}_{U}$ | - | 1350 | - | 1350 | - | 1350 | - | 1350 | - | 1350 |
| ${l}_{D}$ | lognormal | lognormal | lognormal | lognormal | lognormal | lognormal | lognormal | lognormal | normal, lognormal, gamma | normal, lognormal, gamma |
| ${l}_{\overline{D}}$ | lognormal | lognormal | lognormal | lognormal | lognormal | lognormal | lognormal | lognormal | normal, lognormal, gamma | normal, lognormal, gamma |

