# Gout Staging Diagnosis Method Based on Deep Reinforcement Learning


## Abstract


## 1. Introduction

## 2. Related Work

#### 2.1. Applications of Machine Learning in the Medical Field

#### 2.2. Reinforcement Learning

## 3. Gout Staging Diagnosis Method Based on Deep Reinforcement Learning

#### 3.1. Overall Process Design

#### 3.2. Data Preprocessing

#### 3.3. Gout Diagnosis

The feature value set ${F}_{1}$ of the gout quantitative assignment table has a decisive impact on whether gout is diagnosed; ${F}_{1}$ is the set of feature values of the detection fields corresponding to that table. The medical records in the “healthy” data set that meet the diagnosis conditions in ${F}_{1}$ are identified, and this part of the data is added to the “diagnosed” data set as gout cases, forming the original data set ${S}_{3}$ used for further staging.

#### 3.4. Deep Reinforcement Learning Model for Gout Staging Diagnosis

- According to the characteristics of the staging diagnosis model currently in use, the agent forms a hyperparameter set over the whole hyperparameter configuration space; this set is the current state ${S}_{t}$ at time t. Assuming the set contains n hyperparameters, the state is ${S}_{t}=\{{s}_{1},{s}_{2},{s}_{3},\dots,{s}_{n}\}$.
- One hyperparameter of the set ${S}_{t}$ is adjusted: the ε-greedy algorithm selects an action ${A}_{t}$, after which the F1-Score of the staging diagnosis model under the resulting hyperparameter combination is obtained and recorded as ${F}_{t}$. With the F1-Score of the model at the previous step denoted ${F}_{t-1}$, the reward produced by the environment at time t is ${R}_{t}={F}_{t}-{F}_{t-1}$. State ${S}_{t}$ transitions to ${S}_{t+1}$ after action ${A}_{t}$ is complete.
- The above steps are repeated until the cumulative discounted reward is maximized.
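The loop above can be sketched in miniature. The snippet below is a simplified, hypothetical version of the search: it perturbs one hyperparameter per step and uses the F1-Score improvement ${F}_{t}-{F}_{t-1}$ as the reward, but it greedily keeps non-worsening moves instead of training the full Q-network the paper uses; `evaluate`, `space`, and the parameter names are illustrative assumptions, not the paper's implementation.

```python
import random

def tune_hyperparameters(evaluate, space, steps=500, seed=0):
    """Reward-driven hyperparameter search: each action A_t changes one
    hyperparameter of the current state S_t, and the reward is
    R_t = F_t - F_{t-1}, the change in the model's F1-Score."""
    rng = random.Random(seed)
    state = {name: values[0] for name, values in space.items()}  # S_0
    prev_f1 = evaluate(state)                                    # F_0
    best_state, best_f1 = dict(state), prev_f1
    for _ in range(steps):
        name = rng.choice(list(space))             # pick a hyperparameter
        candidate = dict(state)
        candidate[name] = rng.choice(space[name])  # action A_t
        f1 = evaluate(candidate)                   # F_t
        reward = f1 - prev_f1                      # R_t = F_t - F_{t-1}
        if reward >= 0:                            # keep non-worsening moves
            state, prev_f1 = candidate, f1
        if f1 > best_f1:                           # track the best H_max
            best_state, best_f1 = dict(candidate), f1
    return best_state, best_f1
```

In the paper's full method, the accept/reject step is replaced by a learned Q-network over state–action values; this sketch only shows the state, action, and reward structure.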

${r}_{t}$ represents the immediate reward obtained by the agent at time step t. The cumulative discounted reward is the sum of rewards obtained by the agent from the initial state, through a series of actions, until termination: $G=\sum_{t=0}^{T}{\gamma}^{t}{r}_{t}$, where each immediate reward ${r}_{t}$ is multiplied by the discount factor $\gamma$. The factor $\gamma$ attenuates future rewards, because the reward following each new action cannot be determined in advance. Without discounting, the total reward would grow without bound as time increases, and learning would never terminate.
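Concretely, the cumulative discounted return described above can be computed as follows; the helper name is ours, and γ = 0.9 is just an illustrative default:

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted reward: G = sum over t of gamma^t * r_t.
    With gamma < 1, distant rewards are attenuated, so G stays bounded
    even for long episodes."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

For a constant reward of 1, the undiscounted sum grows linearly with episode length, while with γ = 0.5 it converges toward 2; this is exactly why the discount factor is needed.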

## 4. Experiment and Result Analysis

#### 4.1. Evaluation Index

#### 4.2. Experimental Results and Analysis

## 5. Conclusions and Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 3.** Comparison before and after hyperparameter adjustment: (**a**) precision; (**b**) recall; (**c**) F1-Score.

**Figure 4.** Effect comparison of deep reinforcement learning models for disease staging diagnosis: (**a**) precision; (**b**) recall; (**c**) F1-Score.

**Gout Diagnostic Rules Correct the Binary Classification Results**

Input: the “disease” data set ${\mathrm{S}}_{1}$ and the “healthy” data set ${\mathrm{S}}_{2}$ from the binary classification results, and the gout diagnosis rules provided by doctors
Output: the multi-classification data set ${\mathrm{S}}_{3}$
1: According to the gout diagnostic rules provided by the doctors, screen out the feature value set F′ that has a decisive impact on gout.
2: In the “healthy” data set ${\mathrm{S}}_{2}$, select the data samples ${\mathrm{S}}_{2}^{\prime}$ that satisfy the gout conditions on F′.
3: Merge the data samples ${\mathrm{S}}_{2}^{\prime}$ into the “disease” data set ${\mathrm{S}}_{1}$ to form the multi-classification data set ${\mathrm{S}}_{3}$.
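A minimal sketch of this correction step, assuming records are dicts and the doctor-provided rules are predicates over the decisive feature values (the example rule, serum uric acid above 420, is purely illustrative and not from the paper):

```python
def apply_diagnostic_rules(healthy, diseased, rules):
    """Steps 1-3 above: select the 'healthy' records S2' that satisfy
    every gout rule on the decisive feature set F', and merge them into
    the 'disease' set S1 to form the multi-class data set S3."""
    s2_prime = [r for r in healthy
                if all(pred(r.get(feat)) for feat, pred in rules.items())]
    still_healthy = [r for r in healthy if r not in s2_prime]
    s3 = diseased + s2_prime
    return s3, still_healthy
```

The design point is that the rules only move records across the label boundary; no record is created or dropped, so the staging step downstream sees the same population with corrected labels.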

**Missing-Value Filling of the Medical Record Data Set**

Input: medical record data set
Output: medical record data set with missing values filled
1: Calculate the missing ratio of every feature in the medical record data set: P = {p_{1}, p_{2}, p_{3}, …, p_{n}}, where p_{i} is the missing ratio of feature i and n is the number of features.
2: if (p_{i} > 70%)
3: the feature is too sparse; delete the feature
4: else if (feature i is missing completely at random or missing at random)
5: fill by estimation, specifically mean filling (numeric features) or mode filling (categorical features)
6: else fill with the random forest filling method
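A condensed sketch of this filling procedure, treating all kept features as numeric and using mean filling as a stand-in for both the estimation branch and the random-forest branch (the 70% threshold follows the algorithm above; everything else is simplified):

```python
import statistics

def fill_missing(records, threshold=0.7):
    """Drop features missing in more than `threshold` of records;
    impute the rest with the feature mean."""
    n = len(records)
    features = {f for r in records for f in r}
    filled = [dict(r) for r in records]
    for f in features:
        present = [r[f] for r in records if r.get(f) is not None]
        missing_ratio = 1 - len(present) / n
        if missing_ratio > threshold:
            for r in filled:
                r.pop(f, None)          # too sparse: delete the feature
        else:
            mean = statistics.fmean(present)
            for r in filled:
                if r.get(f) is None:
                    r[f] = mean          # estimation: mean filling
    return filled
```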

| Type | Clinical Stage | Quantity |
|---|---|---|
| Healthy | | 4043 |
| Diagnosed with gout | chronic arthritis | 8115 |
| | acute arthritis | 1800 |
| | intermission | 10,914 |

**Staging Diagnostic Method for Multi-Clinical-Stage Disease**

Input: discount factor γ, initial state ${S}_{t}$, total training rounds E, exploration limit M, number of tasks T
Output: the best-performing hyperparameter combination ${H}_{max}$ for the classification model
1: Initialize the optimal reward value ${R}_{max}=0$; initialize the memory pool D; ${H}_{max}={S}_{t}$
2: for e = 1 to E do
3: initialize the cumulative reward of this round, R = 0
4: for t = 1 to T do
5: choose ${A}_{t}$ with the ε-greedy policy
6: ${S}_{t+1}\leftarrow change({S}_{t},{A}_{t})$ // the environment transitions to ${S}_{t+1}$ after action ${A}_{t}$
7: train the staging model
8: obtain the accuracy ${F}_{t}$ after k-fold cross-validation
9: ${R}_{t}={F}_{t}-{F}_{t-1}$
10: save $({S}_{t},{A}_{t},{R}_{t},{S}_{t+1})$ to D
11: $R=R+{R}_{t}$
12: if (t > M) then
13: randomly select k samples from the memory pool D
14: update the Q network // according to Formula (14)
15: end if
16: end for
17: if (R > ${R}_{max}$) then
18: ${R}_{max}=R$
19: ${H}_{max}={S}_{t}$
20: end if
21: end for
22: return ${H}_{max}$
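The memory pool D used in steps 1, 10, and 13 is a standard experience-replay buffer; a minimal version (class name and default capacity are our assumptions) looks like this:

```python
import collections
import random

class ReplayBuffer:
    """Memory pool D: stores transitions (S_t, A_t, R_t, S_{t+1}) and
    serves random mini-batches for the Q-network update in step 14."""
    def __init__(self, capacity=10000, seed=0):
        self.buffer = collections.deque(maxlen=capacity)  # oldest items fall out
        self.rng = random.Random(seed)

    def save(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, k):
        """Randomly select k stored transitions (step 13)."""
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

Sampling mini-batches uniformly from D breaks the temporal correlation between consecutive transitions, which is what makes the Q-network update stable.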

**Table 5.** Performance of the “candidate binary classification model library” on binary classification tasks.

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| KNN | 80.25% | 82.13% | 81.57% | 81.57% |
| SVM | 80.71% | 81.21% | 81.85% | 81.85% |
| XGBoost | 90.03% | 91.21% | 89.71% | 89.71% |

| Parameter Name | Meaning |
|---|---|
| learning_rate | The learning rate (step-size shrinkage applied at each boosting step) |
| gamma | The minimum loss reduction required to make a further split; determines whether a node is pruned |
| max_depth | The maximum depth of a decision tree |
| min_child_weight | The minimum sum of instance weights required in a leaf node |
| subsample | The proportion of samples randomly drawn for each tree |
| colsample_bytree | The proportion of features randomly sampled for each tree |
| scale_pos_weight | Class weighting that helps the model converge quickly when the samples are imbalanced |
| lambda | L2 regularization penalty coefficient |
| alpha | L1 regularization penalty coefficient |
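For illustration, the hyperparameters in the table can be organized into the configuration space the agent searches over; the candidate values below are assumptions for the sketch, not the paper's actual grid:

```python
# Illustrative search space over the XGBoost hyperparameters listed above.
xgb_search_space = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],  # step-size shrinkage
    "gamma": [0, 0.1, 1.0],                   # min loss reduction to split
    "max_depth": [3, 5, 7],                   # tree depth limit
    "min_child_weight": [1, 3, 5],            # min instance-weight sum per leaf
    "subsample": [0.6, 0.8, 1.0],             # row sampling ratio per tree
    "colsample_bytree": [0.6, 0.8, 1.0],      # feature sampling ratio per tree
    "scale_pos_weight": [1, 2, 4],            # class-imbalance weighting
    "lambda": [0.1, 1.0, 10.0],               # L2 penalty
    "alpha": [0.0, 0.1, 1.0],                 # L1 penalty
}

def initial_state(space):
    """Initial hyperparameter state S_0: the first value of each parameter."""
    return {name: values[0] for name, values in space.items()}
```

Each action in the staging algorithm then amounts to replacing one entry of this state with another candidate value from its list.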

| Model | Time | Accuracy |
|---|---|---|
| XGBoost–DRL | Before k-fold | 87.82% |
| | After k-fold | 86.85% |
| MLP-DRL | Before k-fold | 77.26% |
| | After k-fold | 76.15% |
| DNN-DRL | Before k-fold | 74.33% |
| | After k-fold | 73.96% |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ma, C.; Pan, C.; Ye, Z.; Ren, H.; Huang, H.; Qu, J.
Gout Staging Diagnosis Method Based on Deep Reinforcement Learning. *Processes* **2023**, *11*, 2450.
https://doi.org/10.3390/pr11082450
