Machine-Learning-Based Digital Twin System for Predicting the Progression of Prostate Cancer

Kim, Jae-Kwon; Lee, Sun-Jung; Hong, Sung-Hoo; Choi, In-Young

doi:10.3390/app12168156


by Jae-Kwon Kim ¹, Sun-Jung Lee ¹,², Sung-Hoo Hong ³ and In-Young Choi ¹,²,*

¹ Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea

² Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea

³ Department of Urology, Seoul St. Mary’s Hospital, The Catholic University of Korea College of Medicine, Seoul 06591, Korea

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(16), 8156; https://doi.org/10.3390/app12168156

Submission received: 17 June 2022 / Revised: 1 August 2022 / Accepted: 12 August 2022 / Published: 15 August 2022

(This article belongs to the Special Issue Advanced Machine Learning in Medical Informatics)



Featured Application

This study presents a machine-learning method for predicting pathological stage and biochemical recurrence, used to model the prostate cancer care process as a digital twin.

Abstract

Clinical decision support systems (CDSSs) enable users to make decisions based on clinical data from electronic medical records, facilitating personalized precision medicine treatments. A digital twin (DT) approach enables the interoperability between physical and virtual environments through data analysis using machine learning (ML). By combining DT with the prostate cancer (PCa) process, it is possible to predict cancer prognosis. In this study, we propose a DT-based prediction model for clinical decision-making in the PCa process. Pathology and biochemical recurrence (BCR) were predicted with ML using data from a clinical data warehouse and the PCa process. The DT model was developed using data from 404 patients. The BCR prediction accuracy increased according to the amount of data used, and reached as high as 96.25% when all data were used. The proposed DT-based predictive model can help provide a clinical decision support system for PCa. Further, it can be used to improve medical processes, promote health, and reduce medical costs and problems.

Keywords:

digital twin; machine learning; prostate cancer; pathology stage; biochemical recurrence

1. Introduction

Clinical decision-making involves weighing treatment options against the patient’s condition. Clinical decision support systems (CDSSs), which aim to provide personalized services to clinicians, staff, and patients, have been the subject of continuous research for several years [1]. Some CDSS studies have reported promising results in terms of technical sophistication, and some systems are already used in clinical practice [2,3]. With a CDSS, users can base decisions on electronic medical record (EMR) data from patients similar to the target patient, ultimately reducing costs and improving the quality of care [4]. Thus, CDSSs have been implemented to enable personalized precision medicine treatments [5,6].

Machine learning (ML) reduces the influence of individual subjectivity and has the advantage of grounding methods and decisions in the data of a specific patient [7]. The reasons behind a particular decision made by such a system should be transparent to experts [8]. Combining a digital twin (DT) with healthcare enables monitoring, understanding, and optimization of all patient functions and can provide insight for continuous improvement in health and quality of life [9]. Moreover, it enables interoperability between physical and virtual environments through data analysis using artificial intelligence [10].

Therefore, a DT approach, as a precision medicine technique, is considered able to support the best treatment by supplying decisive evidence learned from the data [11]. This approach is also advantageous for effective disease prevention and a lower disease burden, because the patient and the attending physician can select the optimal decision [12,13].

Prostate cancer (PCa) is a common cancer among men worldwide [14], and research on applying ML to it is ongoing. Guidelines for diagnosis, staging, and treatment change constantly, and the interval between guideline revisions is shortening [15,16]. Consequently, clinicians find it difficult to keep up with and apply the latest guidelines [17]. A DT is therefore useful for providing personalized treatment and addressing this limitation. A DT can use ML to predict the survival status of new patients from similar patterns in EMR data [18].

The decision to perform surgery for PCa depends on the biopsy and clinical stage, the progression of the cancer, and whether biochemical recurrence (BCR) can be determined [19,20,21]. A system that supports pathology and BCR prediction from biopsy data is therefore required. In a CDSS using a DT, biopsy information is entered into the virtual model, and the pathology and BCR can be predicted [22]. Because the progression of the cancer can be predicted, doctors can make optimal decisions, thereby minimizing costs and providing precision medical services [23].

In this study, we propose a DT system that provides a treatment recommendation method for clinical decision-making in the PCa process. Section 2 presents the framework and structure of the proposed DT system and describes the ML method used to recommend treatment. A virtual model was constructed according to the PCa process, and the patients’ treatment results were learned with ML using data from a clinical data warehouse (CDW). The DT-based model combines this ML framework with the PCa process. Section 3 reports the performance of the proposed ML system and DT-based predictive model. Section 4 presents the discussion, and Section 5 provides the conclusion.

2. Materials and Methods

2.1. DT Framework

The DT framework is a configuration diagram of a treatment recommendation decision support system for PCa patients [24,25,26]. The DT framework can be used to determine the progression of PCa. The proposed DT framework consists of four parts, as illustrated in Figure 1.

The first component is the physical model, which holds data obtained from patients with PCa. The data consist of demography, radical, biopsy, and pathology information taken from hospital EMRs. Physicians’ readings of CT and MRI images are stored in the EMRs, as are blood test data. The second component is the DT dataset; it contains the preprocessed CDW data obtained from the physical model, together with the PCa data model and the prediction results. The third component is the virtual model; it replicates the PCa treatment process and contains the DT-based prediction model, in which the ML models receive input data at each step of the PCa process. The final component is the DT service; it provides information on the progress of the PCa process when specific data are input.

The concept of the virtual model is illustrated in Figure 2. The virtual model consists of the DT process and ML model. The DT process is configured using a physical process, and consists of a specific flow of data regarding the operation of the physical process. ML is responsible for processing the detailed components of the DT process, and consists of equations necessary for calculations [27,28,29].

2.2. DT Process of PCa

The DT process for PCa is illustrated in Figure 3. The business process follows the progression of PCa, and the DT process follows the flow of data [30,31]. The real and virtual models were configured similarly. Prediction in the DT process was performed in two stages. First, the pathology was predicted using demographic and biopsy information. Pathology data are generated after cancer surgery and can be used to assess current cancer progression. The input variables were age, body mass index (BMI), initial prostate-specific antigen (PSA) level, clinical T stage, and Gleason score (primary and secondary). The predicted output variables were the pathology stage, pathology Gleason score (SUM), extracapsular extension (ECE), seminal vesicle invasion (SVI), perineural invasion (PNI), lymph node metastasis (LVI), and surgical margin (SM). Second, BCR was predicted using demographic and predicted pathology information. The input variables were age, BMI, initial PSA level, pathology stage, pathology Gleason score (SUM), ECE, SVI, PNI, LVI, and SM. The output variable was BCR. The prediction was thus performed step by step, with the predicted value from one step used as an input to the next.
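The two-stage data flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ implementation: the function `predict_progression`, the snake_case variable names, and the stub predictors are all hypothetical, and the stub rules carry no clinical meaning. They only show how stage-1 outputs become stage-2 inputs.

```python
from typing import Callable, Dict, Mapping

# Variable lists follow the text: six stage-1 inputs, seven pathology outputs,
# and ten stage-2 inputs (three demographic/blood values plus the seven outputs).
PATHOLOGY_INPUTS = ["age", "bmi", "initial_psa", "clinical_t", "gs_primary", "gs_secondary"]
PATHOLOGY_TARGETS = ["pathology_t", "pgs_sum", "ece", "svi", "pni", "lvi", "sm"]
BCR_INPUTS = ["age", "bmi", "initial_psa"] + PATHOLOGY_TARGETS

Predictor = Callable[[Mapping[str, float]], int]


def predict_progression(
    patient: Mapping[str, float],
    pathology_models: Mapping[str, Predictor],
    bcr_model: Predictor,
) -> Dict[str, int]:
    """Stage 1 predicts each pathology variable; stage 2 feeds them to the BCR model."""
    stage1_input = {name: patient[name] for name in PATHOLOGY_INPUTS}
    pathology = {target: pathology_models[target](stage1_input) for target in PATHOLOGY_TARGETS}

    # Predicted pathology values replace the (not yet available) surgical findings.
    stage2_input = {name: patient[name] for name in ("age", "bmi", "initial_psa")}
    stage2_input.update(pathology)
    return {**pathology, "bcr": bcr_model(stage2_input)}


# Hypothetical stand-ins for trained models (placeholder rules only).
stub_pathology_models = {t: (lambda x: int(x["initial_psa"] > 10.0)) for t in PATHOLOGY_TARGETS}
stub_bcr_model = lambda x: int(x["ece"] == 1 or x["svi"] == 1)

example_patient = {
    "age": 67, "bmi": 23.0, "initial_psa": 8.5,
    "clinical_t": 2, "gs_primary": 3, "gs_secondary": 4,
}
result = predict_progression(example_patient, stub_pathology_models, stub_bcr_model)
```

In a real system each stub would be replaced by a trained classifier exposing the same call signature.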

2.3. Dataset

In this study, data from a CDW from tertiary medical institutions were used. A DT model was constructed using CDW data from a cohort of patients with PCa. Of 3024 patients, 404 were retained after excluding records with missing values or noise. The data from the CDW included age, BMI, initial PSA, clinical T stage, Gleason score (primary and secondary), pathological T stage, pathology Gleason score (SUM), ECE, SVI, perineural invasion (PNI), lymph node metastasis (LVI), surgical margin (SM), and BCR data. BCR was defined as a case in which the plasma PSA concentration increased in a patient with PCa who had completed RT treatment and the PSA value exceeded 0.2 ng/mL [32].
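As an illustration of the BCR definition, the following Python sketch labels a patient from a series of post-treatment PSA measurements. Only the 0.2 ng/mL threshold comes from the text. The function name `has_bcr` and the reading of "increased" as a rise between two consecutive measurements are assumptions, since the measurement schedule is not described.

```python
from typing import Sequence

BCR_THRESHOLD_NG_ML = 0.2  # threshold stated in the text


def has_bcr(post_treatment_psa: Sequence[float],
            threshold: float = BCR_THRESHOLD_NG_ML) -> bool:
    """Return True if PSA rises between consecutive measurements to a value above the threshold.

    Assumption: measurements are in chronological order and in ng/mL.
    """
    for previous, current in zip(post_treatment_psa, post_treatment_psa[1:]):
        if current > previous and current > threshold:
            return True
    return False
```

For example, `has_bcr([0.01, 0.05, 0.25])` flags a recurrence, whereas a series that stays below the threshold or only falls does not.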

Table 1 shows the data of the 404 patients with PCa. To build a learning model and DT process, pathology data were used simultaneously as both the input and output. The output data of a specific learning model were the input data for the next model.

2.4. ML Model

To implement the DT model, ML was performed in two stages (pathology and BCR), and a total of eight models were created: seven to predict the pathology variables and one to predict BCR. The structure of the models is illustrated in Figure 4. In the pathology phase, the models were trained using six input variables. In the BCR phase, one model was trained using ten input variables. The trained models could then be used as the prediction models of the DT framework. The software used for ML was Python, TensorFlow, and Keras.

2.5. DT-Based Prediction Model

Combining the PCa process with ML created a DT-based predictive model. Figure 5 shows the method used to construct the DT-based prediction model. Using the backward chaining method, the ML model was designed in reverse order from the last to the first process. The proposed model was completed by removing unnecessary attributes to increase the accuracy of the last inferred BCR.

Step 1. The priority of the data to be predicted was determined based on the process configuration. The priorities were arranged sequentially in the waiting pool.

Step 2. The input variables for the learning model were determined using a feature selection algorithm and information gain (IG) [33]. The software used for the IG algorithm was Java SE 17 and Weka 3.6.3. The feature selection method is illustrated in Figure 6.

The attributes were ranked using the IG method. The ML model was then trained repeatedly, removing the lowest-ranked attribute each time. A random forest (RF) was used as the ML algorithm [34]. Training was terminated once the model with the highest accuracy had been obtained, and the completed model was placed in the complete pool. The highest-ranked attribute then became the priority for the next learning model.

Step 3. The next model was trained according to the data priority in the waiting pool. The model creation proceeded as in Step 2.

Step 4. If the priorities overlapped, the model with the highest rank among the attributes of the last completed model was determined to be the next priority. The model creation was the same as in Step 2.

Step 5. Steps 2, 3, and 4 were repeated until all models were complete.

Step 6. The process ended when all of the models were completed and constituted a DT-based prediction model. The configuration started with the input data, and was then connected to the last predictive model.
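The attribute ranking and elimination loop of Step 2 can be sketched as follows. This is an illustrative Python version under stated assumptions, not the Weka procedure used in the study: attributes are assumed to be categorical (or already discretized), the `evaluate` callback is a hypothetical stand-in for training an RF and measuring its accuracy, and the loop tries every prefix of the ranking and keeps the best one, which is one possible reading of the stopping rule.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG = H(labels) - H(labels | attribute) for a categorical attribute."""
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(columns: Dict[str, Sequence[Hashable]],
                             labels: Sequence[Hashable]) -> List[str]:
    """Attribute names sorted from highest to lowest information gain."""
    return sorted(columns, key=lambda name: information_gain(columns[name], labels), reverse=True)


def backward_eliminate(ranked: List[str],
                       evaluate: Callable[[List[str]], float]) -> Tuple[List[str], float]:
    """Remove the lowest-ranked attribute one at a time; keep the best-scoring subset."""
    best_subset, best_score = list(ranked), evaluate(list(ranked))
    current = list(ranked)
    while len(current) > 1:
        current = current[:-1]  # drop the attribute with the lowest rank
        score = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score
```

A usage example would pass the ranked attribute list for one target (for instance BCR) together with a function that returns cross-validated accuracy for a given attribute subset.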

The DT-based prediction model takes the patient’s biopsy data as input. The input data are first passed to the last generated ML model, and the outputs are then passed sequentially to the connected predictive models. Finally, pathology and BCR are output according to the biopsy data.

3. Results

The performances of the proposed ML system and DT-based predictive model were measured. The software used for the DT prediction model was Java SE 17, Weka 3.6.3, Python, TensorFlow, and Keras. The algorithms used were as follows: logistic regression, back propagation, Bayesian network (BN), support vector machine, RF, recurrent neural network (RNN), and long short-term memory (LSTM). The parameters and settings of each ML method were chosen as those that gave the highest accuracy.

3.1. DT-Based Prediction Model Implementation

Figure 7 and Table 2 show the DT-based prediction model developed by applying the RF algorithm to the 404 patient records. The implementation of the learning models started with BCR and proceeded through all of the pathological variables. Each model was evaluated by its accuracy under 10-fold cross-validation. The proposed model predicts pathological T stage and BCR from the input demographic and biopsy data. In other words, multiple steps of the process are predicted from basic data. Accordingly, the user of the proposed model can examine the future of the PCa process through a simulation.

3.2. Performance Measure

The experiment was assessed in three ways. First, we evaluated the eight models configured in the DT-based prediction model. Second, we compared the proposed method with a traditional method. Finally, we measured the predictive performance for pathological T stage and BCR. The traditional method uses one machine learning model with multiple inputs and one output. For instance, in the pathology model, demography and biopsy data are the inputs, and the output is a binary classification (pathology or not).

The performance of the eight ML models configured in the DT-based prediction model was measured. Because the dataset was small, 10-fold cross-validation was used for the experiment. The performance indicator was accuracy [35]. The detailed parameters of the ML models of the comparison group were configured as shown in Table 3. Table 4 lists the experimental results for each ML method.
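For readers unfamiliar with the evaluation protocol, the sketch below shows 10-fold cross-validation with accuracy as the metric in plain Python. The helper names, the fixed shuffling seed, and the majority-class `majority_baseline` are illustrative assumptions. The baseline is only a placeholder for a real classifier such as RF, and the feature matrix `X` is a dummy. The label vector reuses the BCR class counts from Table 1 (116 BCR, 288 non-BCR).

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and split them into k (train, test) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_accuracy(X: Sequence, y: Sequence[int],
                       fit_predict: Callable[[list, list, list], list],
                       k: int = 10) -> float:
    """Mean test accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        predictions = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        scores.append(accuracy([y[i] for i in test], predictions))
    return sum(scores) / len(scores)


def majority_baseline(X_train: list, y_train: list, X_test: list) -> list:
    """Placeholder classifier: always predict the most frequent training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * len(X_test)


y = [1] * 116 + [0] * 288   # BCR vs. non-BCR counts from Table 1
X = [[0.0]] * len(y)        # dummy features; the baseline ignores them
baseline_score = cross_val_accuracy(X, y, majority_baseline)
```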

Among all of the algorithms, RF had the highest performance, with an average of approximately 79.89%. RNN and LSTM, which are deep learning techniques, exhibited low performance, with averages of 47.4% and 77.0%, respectively. RF was the most suitable for the dataset used in this study. Among the models, the average accuracies of SVI and BCR were the highest at 83.91% and 82.67%, respectively.

The performances of the proposed and traditional methods were compared. The traditional method was trained using all of the attributes and was also evaluated by accuracy under 10-fold cross-validation. Figure 8 compares the average accuracies of the DT-based prediction model and the traditional method. Averaged by algorithm, the accuracies of the proposed and traditional methods were 72.24% and 72.23%, respectively, with no significant difference. For both methods, RF gave the highest accuracy (79.80% and 79.89%, respectively). In other words, the choice of method made no significant difference in the algorithm-level results. Averaged by model, the overall accuracy of the proposed method was 78.89%, and that of the traditional method was 75.83%. Most of the models had higher accuracy with the proposed method, whereas SM, LVI, and PNI had the same accuracy under both methods because these models used the same initial data for analysis. The proposed method performed well because it used a feature selection technique that considered the PCa process.
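The size of the model-level difference can be checked with simple arithmetic. The snippet below uses only the two overall averages quoted above; the variable names are illustrative. Reading the "approximately 4%" improvement stated in the Conclusions as a relative gain is an interpretation, not something the text states explicitly.

```python
proposed_avg = 78.89     # overall model average of the proposed method, in %
traditional_avg = 75.83  # overall model average of the traditional method, in %

# Absolute difference, in percentage points (about 3.06).
absolute_gain = proposed_avg - traditional_avg

# Relative difference, as a percentage of the traditional score (about 4.04).
relative_gain = absolute_gain / traditional_avg * 100
```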

Finally, the performance of the proposed model was evaluated by measuring whether the predictions of pathological T stage and BCR were significant using all of the algorithms. The experimental method involved adjusting the ratio of the training and testing sets, and the results are shown in Figure 9. The pathological T stage had a very high accuracy of 98.5% when all data were used, and the BCR accuracy was high as well (96.25%). The proposed model became more accurate as more data were used, because the performance increased gradually with the amount of learned data. Because cancer progression can be predicted with the proposed model, physicians can make optimal decisions, minimizing costs and providing precise medical services.

4. Discussion

We proposed a DT framework and built a system to predict cancer progression in patients with PCa. The prediction system was constructed from eight predictive models using data from 404 patients. The DT-based prediction model can predict the pathological status as well as BCR. Previous ML-based predictive studies related to PCa are limited because their models were designed only for a specific point in time. In contrast, this study provides linked predictions of pathology and BCR. The DT-based model achieved good performance (pathology: 98.50%; BCR: 96.25%). Georgina et al. (2016) predicted the pathology stage using a neuro-fuzzy-based model and reported an area under the curve of 81.2% [36]. Wong et al. (2019) predicted BCR after robot-assisted prostatectomy and reported accuracies of 0.976, 0.953, and 0.976 for K-NN, RF, and LR, respectively [37]. Zhang et al. (2016) provided an imaging-based approach using SVM that was superior at predicting BCR (sensitivity: 93.3%; specificity: 91.7%; accuracy: 92.2%) [38]. The performance of our DT-based prediction model was comparable to or better than the performances reported in these previous studies.

Our study suggests a method that sequentially generates two predictions, pathological T stage and BCR, from the patient’s biopsy information, and the results are excellent. Thus, physicians can prepare the treatment in advance according to the prognosis, and a prognostic result can also be provided to the patient. Existing studies predict pathology or BCR alone and are therefore applicable only to a specific moment. Each ML model in the DT-based prediction model provides a future prediction from earlier data, and because the models operate in a forward chain, clinical decisions can be made about the treatment process of PCa.

Therefore, we conclude that the proposed method will help provide a CDSS related to PCa. Further, the proposed DT framework can be expected to integrate DT with the medical field to improve medical processes, promote health, and reduce medical costs and problems.

5. Conclusions

In this study, the PCa process was considered, and ML was implemented using clinical data. The PCa process and ML were combined using the backward chaining method. Feature selection was performed using the information gain method, and the models were built using RF. The performance was improved by approximately 4% compared with the existing traditional method. This is very important for clinical decision-making. In the case of BCR prediction, the performance was significantly high at 96.25% when all data were used. The proposed method is considered suitable for application to CDSSs using ML. In addition, the proposed method can be used in other data-based medical applications. Future research will be conducted to achieve the purpose of the DT by combining various processes of PCa, and developing and connecting many models.

Author Contributions

Conceptualization, J.-K.K. and I.-Y.C.; methodology, J.-K.K.; software, J.-K.K.; validation, J.-K.K.; formal analysis, J.-K.K.; investigation, J.-K.K., S.-J.L., S.-H.H. and I.-Y.C.; resources, I.-Y.C.; data curation, S.-J.L. and S.-H.H.; writing—original draft preparation, J.-K.K.; writing—review and editing, J.-K.K.; visualization, J.-K.K.; supervision, I.-Y.C.; project administration, I.-Y.C.; funding acquisition, I.-Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a National Research Foundation of Korea (NRF) grant (NRF-2020R1A2C2012284). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement

Our study protocol was approved by the institutional review board of the Catholic University of Korea (IRB No. KC21WNSE0887).

Informed Consent Statement

This study protocol was approved by the institutional review board of the Catholic Medical Centre. This research was a retrospective study of CDW data; all data were de-identified, and the study involved no more than minimal risk to subjects. The requirement for written informed consent was waived by the Research Ethics Committee of the Catholic Medical Centre. All methods were performed in accordance with the relevant guidelines and regulations.

Data Availability Statement

The datasets of the current study are not publicly available owing to Catholic Medical Centre policies, and reasonable privacy and security concerns; the underlying CDW data are not easily redistributable to researchers from other centers. However, they are available upon reasonable request from corresponding authors and with the permission of the Catholic Medical Centre.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Cricelli, I.; Marconi, E.; Lapi, F. Clinical Decision Support System (CDSS) in Primary Care: From Pragmatic Use to the Best Approach to Assess Their Benefit/Risk Profile in Clinical Practice. Curr. Med. Res. Opin. 2022, 38, 827–829.
2. Saddik, A.E.; Badawi, H.F.; Velazquez, R.; Laamarti, F.; Diaz, R.G.; Bagaria, N.; Arteaga-Falconi, J.S. Dtwins: A Digital Twins Ecosystem for Health and Well-Being. IEEE COMSOC MMTC Commun.-Front. 2019, 14, 39–43.
3. Anna, M.A.; Yuhan, D.; Yasmine, G.; Lan, W.; Claudia, M.; Brett, A.B.; Catherine, M. Current Challenges and Future Opportunities for XAI in Machine Learning-Based Clinical Decision Support Systems: A Systematic Review. Appl. Sci. 2021, 11, 5088.
4. Gaebel, J.; Keller, J.; Schneider, D.; Lindenmeyer, A.; Neumuth, T.; Franke, S. The Digital Twin: Modular Model-Based Approach to Personalized Medicine. Curr. Dir. Biomed. Eng. 2021, 7, 223–226.
5. Nussinov, R.; Jang, H.; Tsai, C.J.; Cheng, F. Review: Precision Medicine and Driver Mutations: Computational Methods, Functional Assays and Conformational Principles for Interpreting Cancer Drivers. PLoS Comput. Biol. 2019, 15, e1006658.
6. Castaneda, C.; Nalley, K.; Mannion, C.; Bhattacharyya, P.; Blake, P.; Pecora, A.; Goy, A.; Suh, K.S. Clinical Decision Support Systems for Improving Diagnostic Accuracy and Achieving Precision Medicine. J. Clin. Bioinform. 2015, 5, 4.
7. Peiffer-Smadja, N.; Rawson, T.M.; Ahmad, R.; Buchard, A.; Georgiou, P.; Lescure, F.X.; Birgand, G.; Holmes, A.H. Machine Learning for Clinical Decision Support in Infectious Diseases: A Narrative Review of Current Applications. Clin. Microbiol. Infect. 2020, 26, 584–595.
8. Shengli, W. Is Human Digital Twin Possible? Comput. Methods Programs Biomed. Update 2021, 1, 100014.
9. El Saddik, A.E. Digital Twins: The Convergence of Multimedia Technologies. IEEE MultiMedia 2018, 25, 87–92.
10. Barricelli, B.R.; Casiraghi, E.; Fogli, D. A Survey on Digital Twin: Definitions, Characteristics, Applications, and Design Implications. IEEE Access 2019, 7, 167653–167671.
11. Rao, D.J.; Mane, S. Digital Twin Approach to Clinical DSS with Explainable AI. arXiv 2019, arXiv:1910.13520.
12. Rockne, R.C.; Hawkins-Daarud, A.; Swanson, K.R.; Sluka, J.P.; Glazier, J.A.; Macklin, P.; Hormuth, D.A.; Jarrett, A.M.; Lima, E.A.B.F.; Tinsley Oden, J.; et al. The 2019 Mathematical Oncology Roadmap. Phys. Biol. 2019, 16, 041005.
13. Yadav, P.; Steinbach, M.; Kumar, V.; Simon, G. Mining Electronic Health Records (EHRs): A Survey. ACM Comput. Surv. 2018, 50, 1–40.
14. Hassanipour, S.; Fathalipour, M.; Salehiniya, H. The Incidence of Prostate Cancer in Iran: A Systematic Review and Meta-analysis. Prostate Int. 2018, 6, 41–45.
15. Mottet, N.; Bellmunt, J.; Bolla, M.; Briers, E.; Cumberbatch, M.G.; De Santis, M.; Fossati, N.; Gross, T.; Henry, A.M.; Joniau, S.; et al. EAU-ESTRO-SIOG Guidelines on Prostate Cancer. Part I: Screening, diagnosis, and local treatment with curative intent. Eur. Urol. 2017, 71, 618–629.
16. Cornford, P.; Bellmunt, J.; Bolla, M.; Briers, E.; De Santis, M.; Gross, T.; Henry, A.M.; Joniau, S.; Lam, T.B.; Mason, M.D.; et al. EAU-ESTRO-SIOG Guidelines on Prostate Cancer. Part II: Treatment of relapsing, metastatic, and castration-resistant prostate cancer. Eur. Urol. 2017, 71, 630–642.
17. Shanafelt, T.D.; Gradishar, W.J.; Kosty, M.; Satele, D.; Chew, H.; Horn, L.; Clark, B.; Hanley, A.E.; Chu, Q.; Pippen, J.; et al. Burnout and Career Satisfaction Among US Oncologists. J. Clin. Oncol. 2014, 32, 678–686.
18. Liu, Y.; Zhang, L.; Yang, Y.; Zhou, L.; Ren, L.; Wang, F.; Liu, R.; Pang, Z.; Deen, M.J. A Novel Cloud-Based Framework for the Elderly Healthcare Services Using Digital Twin. IEEE Access 2019, 7, 49088–49101.
19. Gandaglia, G.; Ploussard, G.; Valerio, M.; Marra, G.; Moschini, M.; Martini, A.; Roumiguié, M.; Fossati, N.; Stabile, A.; Beauval, J.B.; et al. Prognostic Implications of Multiparametric Magnetic Resonance Imaging and Concomitant Systematic Biopsy in Predicting Biochemical Recurrence After Radical Prostatectomy in Prostate Cancer Patients Diagnosed with Magnetic Resonance Imaging-Targeted Biopsy. Eur. Urol. Oncol. 2020, 3, 739–747.
20. Athanasiou, A.; Tennstedt, P.; Wittig, A.; Huber, R.; Straub, O.; Schiess, R.; Steuber, T. A Novel Serum Biomarker Quintet Reveals Added Prognostic Value When Combined with Standard Clinical Parameters in Prostate Cancer Patients by Predicting Biochemical Recurrence and Adverse Pathology. PLoS ONE 2021, 16, e0259093.
21. Negri, E.; Pandhare, V.; Cattaneo, L.; Singh, J.; Macchi, M.; Lee, J. Field-Synchronized Digital Twin Framework for Production Scheduling with Uncertainty. J. Intell. Manuf. 2021, 32, 1207–1228.
22. Chao, C.M.; Yu, Y.W.; Cheng, B.W.; Kuo, Y.L. Construction the Model on the Breast Cancer Survival Analysis Use Support Vector Machine, Logistic Regression and Decision Tree. J. Med. Syst. 2014, 38, 106.
23. Elayan, H.; Aloqaily, M.; Guizani, M. Digital Twin for Intelligent Context-Aware IoT Healthcare Systems. IEEE Internet Things J. 2021, 8, 16749–16757.
24. Zahid, A.; Poulsen, J.K.; Sharma, R.; Wingreen, S.C. A Systematic Review of Emerging Information Technologies for Sustainable Data-Centric Health-Care. Int. J. Med. Inform. 2021, 149, 104420.
25. Liu, M.; Fang, S.; Dong, H.; Xu, C. Review of Digital Twin About Concepts, Technologies, and Industrial Applications. J. Manuf. Syst. 2021, 58, 346–361.
26. Pan, Y.; Zhang, L. A BIM-Data Mining Integrated Digital Twin Framework for Advanced Project Management. Autom. Constr. 2021, 124, 103564.
27. He, R.; Chen, G.; Dong, C.; Sun, S.; Shen, X. Data-Driven Digital Twin Technology for Optimized Control in Process Systems. ISA Trans. 2019, 95, 221–234.
28. Wang, B.; Zhang, G.; Wang, H.; Xuan, J.; Jiao, K. Multi-physics-Resolved Digital Twin of Proton Exchange Membrane Fuel Cells with a Data-Driven Surrogate Model. Energy AI 2020, 1, 100004.
29. Wang, X.; Wang, Y.; Tao, F.; Liu, A. New Paradigm of Data-Driven Smart Customisation Through Digital Twin. J. Manuf. Syst. 2021, 58, 270–280.
30. Heijnsdijk, E.A.M.; De Carvalho, T.M.D.; Auvinen, A.; Zappa, M.; Nelen, V.; Kwiatkowski, M.; Villers, A.; Páez, A.; Moss, S.M.; Tammela, T.L.J.; et al. Cost-Effectiveness of Prostate Cancer Screening: A Simulation Study Based on ERSPC Data. J. Natl. Cancer Inst. 2015, 107, 366.
31. Lorenzo, G.; Scott, M.A.; Tew, K.; Hughes, T.J.R.; Zhang, Y.J.; Liu, L.; Vilanova, G.; Gomez, H. Tissue-Scale, Personalized Modeling and Simulation of Prostate Cancer Growth. Proc. Natl. Acad. Sci. USA 2016, 113, E7663–E7671.
32. Freedland, S.J.; Sutter, M.E.; Dorey, F.; Aronson, W.J. Defining the Ideal Cutpoint for Determining PSA Recurrence after Radical Prostatectomy. Urology 2003, 61, 365–369.
33. Jadhav, S.; He, H.; Jenkins, K. Information Gain Directed Genetic Algorithm Wrapper Feature Selection for Credit Rating. Appl. Soft Comput. J. 2018, 69, 541–553.
34. Speiser, J.L.; Miller, M.E.; Tooze, J.; Ip, E. A Comparison of Random Forest Variable Selection Methods for Classification Prediction Modeling. Expert Syst. Appl. 2019, 134, 93–101.
35. Normawati, D.; Ismi, D.P. K-Fold Cross Validation for Selection of Cardiovascular Disease Diagnosis Features by Applying Rule-Based Datamining. Signal Image Process. Lett. 2019, 1, 23–35.
36. Georgina, C.; Giovanni, A.; David, B.; Robert, C.R.; Masood, K.; Graham, P.A. Prediction of Pathological Stage in Patients with Prostate Cancer: A Neuro-Fuzzy Model. PLoS ONE 2016, 11, e0155856.
37. Nathan, C.W.; Cameron, L.; Lisa, P.; Shayegan, B. Use of machine learning to predict early biochemical recurrence after robot assisted prostatectomy. BJU Int. 2019, 123, 51–57.
38. Zhang, Y.D.; Wang, J.; Wu, C.J.; Bao, M.L.; Li, H.; Wang, X.N.; Tao, J.; Shi, H.B. An imaging-based approach predicts clinical outcomes in prostate cancer through a novel support vector machine classification. Oncotarget 2016, 7, 78140–78151.

Figure 1. DT framework.

Figure 2. Virtual model concept.

Figure 3. PCa and DT process (digital twin: DT; extracapsular extension: ECE; seminal vesicle: SVI; perineural invasion: PNI; biochemical recurrence: BCR).

Figure 4. Structure of the ML models (machine learning: ML; body mass index: BMI; initial prostate-specific antigen: PSA; Gleason score (primary): GS (primary); Gleason score (secondary): GS (secondary); extracapsular extension: ECE; seminal vesicle: SVI; perineural invasion: PNI; lymph node metastasis: LVI; surgical margin: SM; pathology T stage: PT; pathology Gleason score (SUM): PGS (SUM); biochemical recurrence: BCR).

Figure 5. Method for the DT-based prediction model.

Figure 6. Feature selection algorithm with information gain.

Figure 7. Digital-twin-based prediction model.

Figure 8. Comparison of the proposed and traditional methods (left: learning algorithm; right: learning model).

Figure 9. Results of the predictions of pathological T stage and BCR (left: pathological T stage; right: biochemical recurrence).

Table 1. Collected data of 404 PCa patients from a CDW.

Source	Attribute	Type	Value
Demography	Age	Integer	Mean: 67.37, Max: 83, Min: 46, Std. D.: 6.607
Demography	Body mass index (BMI)	Numeric	Mean: 22.962, Max: 33.65, Min: 14.17, Std. D.: 2.903
Blood test	Initial prostate-specific antigen (PSA)	Numeric	Mean: 22.408, Max: 1794.997, Min: 0.003, Std. D.: 154.597
Biopsy	Gleason score (Primary)	Categorical	3: 217, 4: 162, 5: 25
Biopsy	Gleason score (Secondary)	Categorical	2: 1, 3: 213, 4: 187, 5: 3
Biopsy	Clinical T stage	Categorical	T1~T2a: 186, T2b~T2c: 164, T3~: 92
Pathology	Extracapsular extension (ECE)	Categorical	Present: 258, Absent: 146
Pathology	Seminal vesicle (SVI)	Categorical	Present: 330, Absent: 74
Pathology	Perineural invasion (PNI)	Categorical	Present: 109, Absent: 295
Pathology	Lymph node metastasis (LVI)	Categorical	Present: 310, Absent: 94
Pathology	Surgical margin (SM)	Categorical	Present: 249, Absent: 155
Pathology	Pathology T stage	Categorical	T1~T2a: 86, T2b~T2c: 164, T3~: 154
Pathology	Pathology Gleason score (SUM)	Categorical	6: 42, 7: 184, 8: 74, 9: 66, 10: 38
Blood test	Biochemical recurrence (BCR)	Categorical	BCR: 116, Non-BCR: 288

Table 2. Learning models and their input attributes.

Learning Model	Attributes
ECE	Age, BMI, PSA, SVI, PNI, LVI, GS (Sec)
SVI	Age, BMI, PSA, PNI, LVI, GS (Pri), GS (Sec), Clinical T
PNI	Age, BMI, PSA, GS (Pri), GS (Sec), Clinical T
LVI	Age, BMI, PSA, GS (Pri), GS (Sec), Clinical T
SM	Age, BMI, PSA, GS (Pri), GS (Sec), Clinical T
Pathology T stage	Age, BMI, PSA, ECE, SVI, GS (Pri), GS (Sec)
PGS (SUM)	Age, BMI, PSA, SVI, PNI, GS (Pri), GS (Sec), Clinical T
BCR	Age, BMI, PSA, ECE, LVI, GS (Pri), GS (Sec), Pathology T, PGS (SUM)
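
Several learning models in Table 2 take the outputs of other learning models as inputs (for example, BCR uses Pathology T and PGS (SUM)). The sketch below encodes the table as a dictionary, using the abbreviations PNI, LVI, and SM from Table 1, and derives one order in which the models could be evaluated so that every upstream output is available first. The names `MODEL_INPUTS` and `evaluation_order` are illustrative, and the idea of feeding outputs forward in this order is an assumption made for the example, not a procedure stated by the authors.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

BASE = ["Age", "BMI", "PSA"]

# Input attributes per learning model, transcribed from Table 2.
MODEL_INPUTS = {
    "ECE": BASE + ["SVI", "PNI", "LVI", "GS (Sec)"],
    "SVI": BASE + ["PNI", "LVI", "GS (Pri)", "GS (Sec)", "Clinical T"],
    "PNI": BASE + ["GS (Pri)", "GS (Sec)", "Clinical T"],
    "LVI": BASE + ["GS (Pri)", "GS (Sec)", "Clinical T"],
    "SM": BASE + ["GS (Pri)", "GS (Sec)", "Clinical T"],
    "Pathology T": BASE + ["ECE", "SVI", "GS (Pri)", "GS (Sec)"],
    "PGS (SUM)": BASE + ["SVI", "PNI", "GS (Pri)", "GS (Sec)", "Clinical T"],
    "BCR": BASE + ["ECE", "LVI", "GS (Pri)", "GS (Sec)", "Pathology T", "PGS (SUM)"],
}


def evaluation_order(model_inputs):
    """Return the models in an order where each model follows the models it depends on."""
    # Keep only those inputs that are themselves outputs of another learning model.
    graph = {
        model: {attr for attr in attrs if attr in model_inputs}
        for model, attrs in model_inputs.items()
    }
    # TopologicalSorter raises CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


ORDER = evaluation_order(MODEL_INPUTS)
```

Under this reading, PNI, LVI, and SM depend only on clinical attributes, while BCR comes last because it consumes both pathology predictions.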

Table 3. Parameters of the ML models.

Model	Parameters	Structure
Logistic Regression (LR)	Batch size = 100	Sigmoid function
Bayesian Network (BN)	Batch size = 100	Search algorithm = Hill Climber
Support Vector Machine (SVM)	Batch size = 100; degree = 3; eps = 0.001; gamma = 0.0; loss = 0.1	Kernel = radial basis function
Random Forest (RF)	Batch size = 100; Iterations = 100	Bootstrap = 10
Neural Network (NN)	Batch size = 100; Learning rate = 0.3; Momentum = 0.2; Epochs = 500; Activation function = sigmoid	Input layer = input variable count; Hidden layer = 1 layer (nodes: 2); Output layer = 1 node
Recurrent Neural Network (RNN)	Learning rate = 0.1; Epochs = 10; Batch size = 100; Activation function = softmax	Input layer = input variable count; Hidden layer = 1 layer (nodes: input variable count); Output layer = 1 node
Long Short-Term Memory (LSTM)	Learning rate = 0.1; Epochs = 10; Clipping = 5; Activation function = softmax; Gate activation function = sigmoid	Input layer = input variable count; Hidden layer = 1 layer (nodes: input variable count); Output layer = 1 node

Table 4. Accuracy (%) of the seven ML algorithms for each of the eight learning models.

Algorithm	ECE	SVI	PNI	LVI	SM	Pathology T	PGS (SUM)	BCR	Average
LR	82.2	83.4	72.8	80.0	66.6	78.0	78.7	75.0	77.1
NN	77.5	84.2	68.8	78.0	62.4	75.5	79.5	76.7	75.3
BN	81.4	82.4	71.5	80.0	62.1	77.7	78.5	77.7	76.4
SVM	75.0	84.4	73.3	80.0	61.1	58.4	71.5	77.0	72.6
RF	82.2	85.4	72.8	83.2	69.1	80.2	81.7	84.7	79.9
RNN	37.4	38.4	56.7	59.7	55.4	50.7	31.4	49.3	47.4
LSTM	80.0	84.2	72.0	79.5	65.8	76.7	80.0	78.2	77.0
Average	73.7	77.5	69.7	77.2	63.2	71.0	71.6	74.1
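
The Average column and the best-performing algorithm for a given target can be recomputed directly from the rows of Table 4. The sketch below is only an arithmetic check on the transcribed values; the names `TARGETS`, `ACCURACY`, `row_average`, and `best_algorithm` are chosen for illustration.

```python
from statistics import mean

# Column order of Table 4; PGS (SUM) is the pathology Gleason score (sum) column.
TARGETS = ["ECE", "SVI", "PNI", "LVI", "SM", "Pathology T", "PGS (SUM)", "BCR"]

# Accuracy (%) per algorithm, transcribed row by row from Table 4.
ACCURACY = {
    "LR":   [82.2, 83.4, 72.8, 80.0, 66.6, 78.0, 78.7, 75.0],
    "NN":   [77.5, 84.2, 68.8, 78.0, 62.4, 75.5, 79.5, 76.7],
    "BN":   [81.4, 82.4, 71.5, 80.0, 62.1, 77.7, 78.5, 77.7],
    "SVM":  [75.0, 84.4, 73.3, 80.0, 61.1, 58.4, 71.5, 77.0],
    "RF":   [82.2, 85.4, 72.8, 83.2, 69.1, 80.2, 81.7, 84.7],
    "RNN":  [37.4, 38.4, 56.7, 59.7, 55.4, 50.7, 31.4, 49.3],
    "LSTM": [80.0, 84.2, 72.0, 79.5, 65.8, 76.7, 80.0, 78.2],
}

# Mean accuracy of each algorithm over the eight targets (the Average column).
row_average = {algo: mean(values) for algo, values in ACCURACY.items()}

# Algorithm with the highest accuracy for each target.
# On a tie (e.g. LR and RF for ECE), max() keeps the first algorithm listed.
best_algorithm = {
    target: max(ACCURACY, key=lambda algo: ACCURACY[algo][i])
    for i, target in enumerate(TARGETS)
}

top_overall = max(row_average, key=row_average.get)
```

With these values, RF has the highest mean accuracy and is also the best algorithm for BCR, which is consistent with the Average column of the table.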

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
