Article

Model Establishment of Cross-Disease Course Prediction Using Transfer Learning

by Josh Jia-Ching Ying 1,*, Yen-Ting Chang 1, Hsin-Hua Chen 2,3,4,5,6,7,8 and Wen-Cheng Chao 9,10
1 Department of Management Information Systems, National Chung Hsing University, Taichung 402, Taiwan
2 Department of Medical Research, Taichung Veterans General Hospital, Taichung 402, Taiwan
3 Division of Allergy, Immunology and Rheumatology, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung 402, Taiwan
4 Institute of Biomedical Science and Rong Hsing Research Center for Translational Medicine, Chung Hsing University, Taichung 402, Taiwan
5 Institute of Public Health and Community Medicine Research Center, National Yang Ming University, Taipei 112, Taiwan
6 Department of Industrial Engineering and Enterprise Information, Tunghai University, Taichung 402, Taiwan
7 Institute of Medicine, Chung Shan Medical University, Taichung 402, Taiwan
8 School of Medicine, National Yang-Ming University, Taipei 112, Taiwan
9 Department of Critical Care Medicine, Taichung Veterans General Hospital, Taichung 402, Taiwan
10 Department of Computer Science, Tunghai University, Taichung 402, Taiwan
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(10), 4907; https://doi.org/10.3390/app12104907
Submission received: 10 April 2022 / Revised: 2 May 2022 / Accepted: 10 May 2022 / Published: 12 May 2022
(This article belongs to the Special Issue Application of Artificial Intelligence, Deep Neural Networks)

Abstract:
In recent years, the development and application of artificial intelligence have both been topics of concern. In the medical field, an important direction of medical technology development is the extraction and use of applicable information from existing medical records to provide more accurate and helpful diagnosis suggestions. Therefore, this paper proposes using the development of diseases with easily discernible symptoms to predict the development of other medically related but distinct diseases that lack similar data. The aim of this study is to improve the ease of assessing the development of diseases in which symptoms are difficult to detect, and to improve the utilization of medical data. First, a time series model was used to capture the continuous manifestations of diseases with symptoms that could be easily found at different time intervals. Then, through transfer learning and attention mechanism, the general features captured were applied to the predictive model of the development of diseases with insufficient data and symptoms that are difficult to detect. Finally, we conducted a comprehensive experimental study based on a dataset collected from the National Health Insurance Research Database in Taiwan. The results demonstrate that the effectiveness of our transfer learning approach outperforms state-of-the-art deep learning prediction models for disease course prediction.

1. Introduction

According to the 2019 Global Health Estimates published by the World Health Organization (WHO) in 2020 [1], the average life expectancy of people in 2019 increased by 6 years compared with that in 2000, and the global average life expectancy increased from 67 years old in 2000 to over 73 years old in 2019. However, life extension does not represent healthy life years, as chronic diseases such as heart disease, diabetes, cardiovascular disease, and chronic respiratory disease significantly shorten people’s healthy life years, and non-communicable diseases account for seven of the top ten causes of death in the world [1].
Chronic disease refers to a persistent physical condition or illness that lasts for a long period of time. Common chronic diseases include arthritis, asthma, and cancer in addition to the above-mentioned ones [2]. When patients receive drug-based treatment, they usually experience periodic short-term remission followed by relapse, a cycle of repeated symptom recurrence and control [3] that increases the suffering caused by the disease. Statistical reports and the symptom trajectories of some chronic diseases suggest that the prevention and treatment of chronic diseases is a medical field that needs to be strengthened. Many approaches to the prevention and treatment of chronic diseases [2,3,4,5,6,7,8] begin treatment at an early stage of the disease, before the emergence of obvious symptoms, instead of trying to control the disease's progression once symptoms are severe enough to affect an individual's daily life.
With the rapid development of medical technology today, various applications of the medical industry have expanded and evolved, including registration, consultation, cases, operations, outpatient procedures, medical equipment, and nursing care. Through the advancement of storage and synchronization technology, the patient data in these applications can systematically preserve the historical medical experience that represents the patient’s physical state, which are known as Electronic Health Records (EHRs). Through the analysis, classification, and integration of EHRs data, we can develop disease-related prediction mechanisms and provide accurate predictions for disease courses, including diagnosis, treatment, and recommended drugs [3,4]. However, in the process of EHRs data analysis, some problems relating to insufficient disease data still persist, and one example is Rheumatoid Arthritis (RA) [9]. This disease does not directly cause patient death, but instead progressively damages the joints, making it less likely to receive public attention. As a result, patients fail to seek medical treatment early, miss the golden period of treatment, and fail to record the development of their arthritis symptoms, which eventually leads to a small number of data samples that cannot provide a complete understanding of arthritis development [10].
In recent years, deep learning methods have been proposed and applied to many real-world applications, including healthcare [2,3,4,5,6,7,8,9,10,11], speech signal processing [12,13], and cybersecurity [14,15,16,17]. In particular, they achieve end-to-end learning and perform well in medical prediction tasks [2,5]. Yang et al. [6] extracted clinical data from hospitalization information to form a clinical sequence and combined a bidirectional long short-term memory (Bi-LSTM) model with an attention mechanism to predict future disease probability. He et al. [7] established an automatic arrhythmia classification model by combining a deep residual network with a Bi-LSTM model. Choi et al. [8] used the inherent hierarchical knowledge of medical ontologies to supplement the information in EHRs, established a graph-based attention model (GRAM) to analyze EHRs, and predicted the diagnosis of the next visit and the probability of heart failure. From this review of medical prediction tasks, it is evident that most studies have applied deep learning to prediction tasks for a single disease, while few have applied it to cross-disease analysis and prediction. Transfer learning [10] can be used for data with different but similar distributions, transferring features and parameters between models built for different tasks [10]. The feasibility and effects of transfer learning in deep learning were demonstrated by Yosinski et al. [11], and the characteristics of this method fit those of medical information well. Therefore, the concept of transfer learning can be applied to medical information, so that even medical tasks with poor predictive performance due to a lack of data can train predictive models with a minimum of effectively labeled data.
Although transfer learning has been addressed in many studies [18,19,20,21], these works mostly focus on image analysis. Unlike image analysis, transfer learning cannot be casually performed on EHRs data for multiple disease course prediction. The reason is that only few diseases share similar courses. Therefore, performing transfer learning on EHRs data should consider whether there is sufficient medical evidence that reveals a correlation between source (disease) domain and target (disease) domain. For example, in related studies on chronic diseases, the treatment of periodontal disease has been found to effectively alleviate the symptoms of RA [22,23]. Thus, it can be seen that there is a similar distribution of pathological characteristics between these two diseases. Therefore, we can use transfer learning (TL) in deep learning to analyze the data on periodontal disease (PD) and RA in EHRs, extract features through the time series model, and establish a predictive model for the development of a source disease based on PD medical records and transfer the predictive model for the development of the target disease based on RA medical records.
Furthermore, many prediction and classification models incorporate a self-attention mechanism to improve prediction accuracy. However, such a self-attention mechanism usually does not work well when a time series prediction model is transferred from a source domain with a high sampling rate to a target domain with a low sampling rate. As health status is represented by the patient's medical records, the parameters of the time series prediction model trained on the source disease course are transferred to the target prediction model for predicting the target disease course. For example, the target prediction model, built from the RA visit records in the EHRs, is transferred from a model built on PD records, but the self-attention mechanism must be fine-tuned with a small dataset of the target disease to accurately predict its course. As a result, a new breed of transfer learning in deep learning is needed to build a disease course prediction model that can transfer a model trained with a self-attention mechanism from the source disease domain to the target disease domain.
To this end, the main task of this study is to use a transfer learning framework with a self-attention mechanism to build a cross-disease course development prediction model and to summarize the disease course development patterns in existing chronic disease patient data. The purpose is to assist doctors in estimating the course of target diseases and recommending accurate treatment plans when the training data are not sufficient for building a deep learning model. In view of the lack of training data due to the characteristics of the chronic disease course, this study provides a solution that extracts the development characteristics of each PD patient and predicts the course development by applying deep learning models to the National Health Insurance Research Database in Taiwan [24]. The results are then used to train the feature extraction of the disease course model for RA patients. Ultimately, this study aims to construct a cross-disease course development prediction model based on the medical data of chronic disease patients, so that diseases with a small amount of data can still obtain good progression predictions, patients can detect disease earlier and avoid the discomfort it causes, and people's healthy life years can be increased. Using the feature extraction technology of deep learning as the basis, this study constructs and trains a prediction model suitable for the analysis of disease course development. Based on our proposed transfer learning framework, a time series predictive model can be built from a small number of samples and a pretrained model.
The contributions of our research are four-fold:
  • We propose a timeseries prediction model for disease course across related but distinct diseases.
  • We use transfer learning to address inaccurate predictions in the presence of small amounts of disease-specific data.
  • An attention mechanism is designed for a time series prediction model of a source domain with a high sampling rate, such that it can be transferred to a target domain with a low sampling rate. The time series prediction model is therefore not only effective but also explanatory.
  • We use a dataset from the National Health Insurance Research Database in Taiwan [24] in a series of experiments to evaluate the performance of our proposal. The results show superior performance over other classifier-based prediction techniques in terms of mean squared error (MSE).
In this study, the construction of a prediction model for the development of cross-disease course using transfer learning of deep learning is divided into five sections. Section 2 is a comparative discussion of the research literature on the background materials and application of the research methods. Section 3 introduces in detail the National Health Insurance Research Database in Taiwan [24] used in this study, the data preprocessing procedure, and the time series model architecture of transfer learning and research method design. Section 4 presents the results of model experiment evaluation and discusses the interpretation of the results of the attention mechanism. Section 5 summarizes the conclusions drawn from this study and points out future research directions.

2. Related Works

In previous studies, transfer learning has been mostly used in image analysis and it is no exception in the medical field, with various studies [18,19,20,21] all achieving significant improvements in the accuracy of medical image recognition applications. In the application of EHRs data, there is also parameter transfer of the same prediction task model using cross-hospital data [25]. Lee et al. [26] used Artificial Neural Networks (ANNs) as a model basis for transfer learning and applied it to train de-identified models on small EHRs datasets. Desautels et al. [27] combined transfer learning with decision tree, and, under the sharing of pre-training model parameters, helped hospitals with a small amount of data to improve the accuracy of the mortality prediction model. Priyanka et al. [28] extracted features from physiological signal data using time series feature extraction models based on Recurrent Neural Network (RNN)—TimeNet [29] and HealthNet [28]—and then transferred the parameters of the pretrained model to the target model for binary mortality prediction.
Medical information is roughly divided into medical images and medical data [18]. Medical images include ultrasound images, computed tomography scans, and MRI scans [19]. A complete medical system accumulates a huge number of medical images, but labeling and diagnosis of those images are lacking: labeling requires a great deal of time and manpower, and its accuracy degrades over long-term operation. It is therefore somewhat challenging to use medical images in deep learning applications. Medical data comprise body measurement data, diagnoses, and medications in the patient's medical records. These data also have limitations: doctors in different hospitals follow different practices when compiling medical treatment records; certain diseases are hard to notice in specific medical prediction tasks because their symptoms are difficult to detect, so patients do not seek medical treatment within a certain period of time; data are lacking for a complete training process; and the resulting prediction models perform poorly.
The methods of applying shared parameters to the target task prediction model in transfer learning can be divided into freezing and fine-tuning [10]. Figure 1 is a schematic diagram of the training process of the two methods. Freezing transfers the trained parameters of the source task prediction model to the first n layers of the target task prediction model, freezes those n layers so they receive no further parameter training, initializes the rest of the target task prediction model, and trains for the target task. Fine-tuning backpropagates the error during training of the target task prediction model into the n layers copied from the source task prediction model to update their weights, so that the copied parameters are adjusted to match the target task. The choice between the two methods depends on the size of the target task dataset and the number of parameters migrated from the source task prediction model. If the target dataset is small and the source task prediction model has a large number of pre-trained parameters, freezing is usually chosen to avoid overfitting. If the target task dataset is large or the source task prediction model has a small number of pre-trained parameters, the transferred parameters can be fine-tuned to match the target task and improve performance.
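The two strategies above can be sketched schematically. The following is a minimal pure-Python illustration (not the paper's code): each layer is a dict with a `trainable` flag, freezing copies the first n source layers and blocks their updates, and fine-tuning copies them but leaves them trainable. All names here are illustrative.

```python
def transfer_parameters(source_layers, target_layers, n, mode="freeze"):
    """Copy the first n source layers into the target model.

    Each layer is a dict {"weights": list, "trainable": bool}.
    mode="freeze" blocks further training of the copied layers;
    mode="fine-tune" keeps them trainable so backpropagation can adjust them.
    """
    assert mode in ("freeze", "fine-tune")
    for i in range(n):
        target_layers[i]["weights"] = list(source_layers[i]["weights"])
        target_layers[i]["trainable"] = (mode == "fine-tune")
    # Remaining target layers stay randomly initialized and trainable.
    return target_layers

source = [{"weights": [0.1, 0.2], "trainable": True},
          {"weights": [0.3], "trainable": True}]
target = [{"weights": [0.0, 0.0], "trainable": True},
          {"weights": [0.0], "trainable": True}]

frozen = transfer_parameters(source, [dict(l) for l in target], 1, "freeze")
tuned = transfer_parameters(source, [dict(l) for l in target], 1, "fine-tune")
```

In a deep learning framework the same idea is usually expressed by disabling gradient computation on the copied layers (freezing) or leaving it enabled with a small learning rate (fine-tuning).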
Although many deep learning mechanisms have been proposed to deal with the problem of model building for a single disease prediction, all of them rely on a huge amount of training data [20,21]. Unfortunately, some diseases are not as common so data regarding these diseases cannot be retrieved from EHRs. Therefore, it is evident that the application of transfer learning in EHRs data is mostly aimed at the problem of model building for a single disease prediction or a specific task with insufficient data. Transferring the parameters of the same task model from hospitals with a large amount of data, as well as fine-tuning to conform to the feature distribution of the target dataset, can significantly improve the accuracy of the target task. However, in the target task of transfer learning, the relationship between different diseases is rarely used for cross-disease analysis. Thus, this study will use transfer learning to carry out a cross-disease course prediction analysis for related diseases.

3. Materials and Methods

In the analysis of target disease course prediction, it is necessary to address the problems of long time series, high-dimensional features, systematic errors, and a small sample size. Therefore, in this study, patients' historical medical records were converted into a data format suitable for the analysis model through an equivalent conversion. Then, transfer learning and a time series model were used to effectively extract features and address the drawback of a small sample size. Lastly, the target disease course development prediction was obtained. This constitutes the framework for cross-disease analysis using transfer learning. This section introduces the system architecture and describes the design and implementation of each step.

3.1. Datasets

The National Health Insurance program in Taiwan, the source of the National Health Insurance Research Database [24], was fully implemented on 1 March 1995 under the supervision of the Ministry of Health and Welfare. It is a compulsory social insurance with a national flat rate, whose main goal is to provide the whole society with access to fair medical resources. Under this system, the insurance coverage rate is as high as 99%, making the dataset a very representative sample for medical research. Thus, the health insurance research database is used as the data source for this research. This study uses transfer learning as the main research method; therefore, this section is divided into source domain data and target domain data.

3.1.1. Source Domain Data

The source domain data used the medical records of PD patients from 1999 to 2010. This study used the outpatient prescription and treatment details file (ambulatory care expenditures by visits, CD) and the outpatient prescription medical order detail file (details of ambulatory care orders, OO), shown in Figure 2. The data record format is shown in the table in Figure 2. The CD1999 file used in the source domain data of this study contains the CD records of all PD patients who had outpatient visits in 1999. The figure lists several fields used in this study. ID is the transcoded national identity of each patient. SEQ_NO is the serial number of the case; it is consistent with the serial number in the OO file, so that the two files can be linked to obtain the code for the medical treatment or drug (DRUG_NO) and the drug dosage (DRUG_USE). DRUG_DAY is the number of days of administration. The International Classification of Diseases fields, ACODE_ICD9_1 through ACODE_ICD9_3, record the diagnosed injuries and diseases; there are three such columns.
In this study, we trace the patients' disease courses through the collated outpatient diagnosis data. The screening criterion for PD patients was a diagnosis code of 5233–5235 in any of the International Classification of Diseases columns (ACODE_ICD9 1–3) of the CD file in 2000, occurring twice or more; this included a total of 864,478 individuals, of whom 728,353 had complete outpatient records for 12 years. In order to accurately assess the development of the patients' disease, this study deleted 1868 individuals with abnormal records related to periodontal medication, including 129 individuals whose administration days were recorded as a frequency of use, such as TIW (i.e., 3 times a week), and 1739 individuals whose drug code was a periodontal medication but whose administration days were null. This study also excluded 2330 individuals whose medication values were outliers, i.e., whose number of medication days exceeded 100 within 3 months. Eventually, a total of 724,155 people were included in the source domain of this study.
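The screening rule above can be expressed as a short sketch. This is a hypothetical illustration, not the study's code: a patient qualifies when a code in 5233–5235 appears in any of the three ACODE_ICD9 columns on two or more visits. The field names follow the CD-file fields quoted in the text; the sample records are invented.

```python
from collections import Counter

PD_CODES = {"5233", "5234", "5235"}  # periodontal disease diagnosis codes

def screen_pd_patients(visits):
    """Return the set of patient IDs with >= 2 qualifying visits."""
    hits = Counter()
    for v in visits:
        codes = {v.get("ACODE_ICD9_1"), v.get("ACODE_ICD9_2"),
                 v.get("ACODE_ICD9_3")}
        if codes & PD_CODES:          # any of the 3 columns matches
            hits[v["ID"]] += 1
    return {pid for pid, n in hits.items() if n >= 2}

visits = [
    {"ID": "8f14", "ACODE_ICD9_1": "5233"},
    {"ID": "8f14", "ACODE_ICD9_1": "5235"},
    {"ID": "791f", "ACODE_ICD9_1": "5233"},  # only one qualifying visit
]
selected = screen_pd_patients(visits)
```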

3.1.2. Target Domain Data

The target domain data were collected from 1999 to 2010 from the 724,155 patients described above, selecting those with a diagnosis code of 7140 in any of the International Classification of Diseases columns 1–3, which included a total of 31,160 individuals. The sample then excluded two patients whose RA medication values were outliers and 2855 patients whose drug types could not be compared for disease course evaluation, such as those with Anatomical Therapeutic Chemical (ATC) codes L01BA01, L01AA01, etc. The absence of a Defined Daily Dose (DDD) for these drugs prevented the subsequent equivalent conversion of dosage, making it impossible to assess the patient's disease course. In the end, 28,303 individuals were included in the target domain of this study.

3.2. Data Preprocessing

The data feature fields of the National Health Insurance Research Database in Taiwan [24] are high-dimensional, and the intervals between a patient's historical medical visits are of inconsistent length, so it is not easy to understand the development of each patient's disease at each time point. This problem affects the training of the prediction model and leads to inaccurate predictions. Therefore, it was necessary to first use the data preprocessing methods in this section to convert the data into features suitable for representing the disease course before entering them into the analysis model.
Using the process shown in Figure 3, this study uses SEQ_NO to concatenate CD and OO files to sort out a total of 12 CD and OO files from 1999 to 2010, and then uses ID to sort out all outpatient records of each patient from 1999 to 2010. As the development manifestations of the source and target diseases are different, the required data preprocessing methods are also different. Therefore, this section will be divided into two parts: the source domain data preprocessing method and the target domain data preprocessing method.

3.2.1. Source Domain Data Preprocessing Method

In this study, the source domain disease course is characterized by the number of days of antibiotic use recommended by professional physicians for PD treatment (PD_DAY), the number of drug types (PD_DRUG), the number of PD-related operations (PD_DISPOSAL), and the number of dental cleanings (PD_SCALING), which together represent the progression of PD. The data preprocessing method uses all the outpatient records of the 724,155 PD patients from 1999 to 2010 and checks whether DRUG_NO is a PD treatment code through the PD course matrix representation calculation method in Table 1.
In this study, a duration of one month was used to accumulate the values of the PD development characteristics. Four medical records of a patient whose ID was 8f14 are used in Table 2 for illustration. First, the FUNC_DATE field is used to identify the outpatient records of January 1999; when DRUG_NO is A000080277, the antibiotic code for PD treatment, the value 7 in the DRUG_DAY column is recorded, and one is added to the number of drug types.
As shown in Table 3, these values appear in the PD_DAY and PD_DRUG columns where TIME is 1. The record with DRUG_NO equal to the periodontal surgery code 91001 makes the number of periodontal surgeries 1; because the surgery date is January 18, the number 1 is recorded in the PD_DISPOSAL column (the column representing the number of operations) where TIME is 1. There are no codes related to periodontal treatment in February, so all the values are 0 where TIME is 2. In March, the DRUG_NO column of the medical record contains the dental cleaning code 91003C, so the PD_SCALING column, which represents the number of cleanings, is 1 where TIME is 3.
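The monthly aggregation walked through above can be sketched as follows. This is a simplified illustration under the codes quoted in the running example (A000080277 for the PD antibiotic, 91001 for periodontal surgery, 91003C for dental cleaning); field names follow the CD/OO files, and the sample records are invented.

```python
ANTIBIOTIC, SURGERY, SCALING = "A000080277", "91001", "91003C"

def pd_month_row(records, year, month):
    """Aggregate one patient's visit records into one monthly feature row."""
    row = {"PD_DAY": 0, "PD_DRUG": 0, "PD_DISPOSAL": 0, "PD_SCALING": 0}
    for r in records:
        y, m = int(r["FUNC_DATE"][:4]), int(r["FUNC_DATE"][4:6])
        if (y, m) != (year, month):
            continue
        if r["DRUG_NO"] == ANTIBIOTIC:
            row["PD_DAY"] += r.get("DRUG_DAY", 0)   # days of medication
            row["PD_DRUG"] += 1                      # one more drug type
        elif r["DRUG_NO"] == SURGERY:
            row["PD_DISPOSAL"] += 1                  # one more operation
        elif r["DRUG_NO"] == SCALING:
            row["PD_SCALING"] += 1                   # one more cleaning
    return row

records = [
    {"FUNC_DATE": "19990104", "DRUG_NO": ANTIBIOTIC, "DRUG_DAY": 7},
    {"FUNC_DATE": "19990118", "DRUG_NO": SURGERY},
    {"FUNC_DATE": "19990305", "DRUG_NO": SCALING},
]
jan = pd_month_row(records, 1999, 1)
feb = pd_month_row(records, 1999, 2)
mar = pd_month_row(records, 1999, 3)
```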

3.2.2. Target Domain Data Preprocessing Method

In this study, the target domain data preprocessing method used the disease course representation features of the target domain. In addition to the four fields representing PD development, the dosages and types of oral and injectable drugs for RA recommended by professional physicians were added, namely the amount of RA oral drugs used (RA_USE_ORAL), the number of oral drug types (RA_DRUG_ORAL), the amount of injectable drugs used (RA_USE_INJ), and the number of injectable drug types (RA_DRUG_INJ). The data preprocessing method uses all the previously compiled outpatient records of the 28,303 RA patients from 1999 to 2010, checks whether DRUG_NO is a medication code for RA treatment, and then performs the equivalent conversion of the drug usage.
In terms of drug dosage records, this study used the DDD for the unified measurement of drugs. According to the WHO definition, the DDD is the assumed average daily maintenance dose of a drug used for its main indication in a 70 kg adult. Taking the DMARDs group of RA drugs as an example, if the drug in hospital A is leflunomide, its ATC code is L04AA13, the DDD prescribed by the WHO is 20 mg, the minimum package in the hospital is 1 capsule with a content of 15 mg/capsule, and the prescription is 2 capsules/day, then the drug usage is 1.5 DDD. Meanwhile, if the drug in hospital B is ciclosporin, its ATC code is L04AD01, the DDD specified by the WHO is 250 mg, the minimum package is 1 capsule with a content of 125 mg/capsule, and the doctor's prescription is 3 capsules/day, then the drug usage is also 1.5 DDD. Thus, different drugs of the same class in different hospitals can be equivalently converted through the DDD, which standardizes drug dosages and makes the severity of patients' conditions comparable.
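The equivalent conversion above reduces to a single ratio. The following minimal sketch reproduces the text's worked example; the DDD table holds only the two ATC codes quoted there and is not a complete drug dictionary.

```python
# WHO-defined daily doses in mg, for the two drugs in the worked example.
DDD_MG = {"L04AA13": 20.0,   # leflunomide
          "L04AD01": 250.0}  # ciclosporin

def dose_in_ddd(atc_code, mg_per_unit, units_per_day):
    """Express the prescribed daily dose as a multiple of the WHO DDD."""
    return mg_per_unit * units_per_day / DDD_MG[atc_code]

# Hospital A: leflunomide, 15 mg/capsule, 2 capsules/day
hospital_a = dose_in_ddd("L04AA13", 15, 2)   # 1.5 DDD
# Hospital B: ciclosporin, 125 mg/capsule, 3 capsules/day
hospital_b = dose_in_ddd("L04AD01", 125, 3)  # 1.5 DDD
```

Because both prescriptions map to the same DDD value, the severity of the two patients' treatment can be compared even though the drugs and hospitals differ.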
Four medical records of a patient whose ID was 791f are used in Table 4 for illustration. First, the FUNC_DATE column is used to identify the outpatient records of January 1999. When DRUG_NO is the RA treatment drug code B023615100, its DRUG_USE value is divided by 20 mg, the DDD of B023615100, and the result, 1.5 DDD, is recorded together with its medication code. For B018142100 on the visit date of 16 January, its DRUG_USE value is divided by 250 mg, the DDD of B018142100, and the result, 0.2 DDD, is recorded with its medication code.
Finally, all the DDDs in January are added up, the number of medication codes is counted, and the RA_USE_ORAL and RA_DRUG_ORAL columns with TIME equal to 1 are output, as presented in Table 5. For AC45781100 in February, its DRUG_USE is divided by 150 mg, the DDD of AC45781100. Since the result is 0.667 DDD, the values of the RA_USE_ORAL and RA_DRUG_ORAL columns with TIME equal to 2 are 0.667 and 1, respectively. When the DRUG_NO column of the medical record in March is the dental cleaning code 91003C, 1 is added to the PD_SCALING column, which represents the number of dental cleanings, where TIME is 3.

3.2.3. Matrix Representation of Disease Course

The original data were normalized and aggregated to represent patients' disease course status through comparative calculation and equivalent conversion. Using these new data to represent the current health status of each patient is still difficult because of the varying lengths of the intervals between medical visits: a time series model cannot effectively capture the temporal characteristics of the disease course, which in turn affects the prediction accuracy of the task. Therefore, in this study, the number of days of medication, the number of medication types, and the related operations in the patients' medical records were aggregated according to the characteristics determined by the study. By accumulating the medication over time, the patients' medication records can be presented in a unified way that is suitable for the subsequent capture of temporal relationship features. The matrix representation of the source disease course has four columns, and that of the target disease course has eight columns. The values within each month were recorded, yielding a total of 144 time points (12 years of monthly data).
As shown in Table 5, in this study, sequences of different lengths were summed up from the monthly data in order to understand the influence of sequence length on the prediction of the disease course. The aggregation length was set as k. When k = 3, a patient characteristics matrix is formed from three months of sequence data at a time, representing the progression of the patient's disease over one season. For example, summing up the columns in Table 5 where TIME equals 1, 2, and 3 yields the column in Table 6 where TIME equals 1. Here, only the disease course matrix representation of the target disease is used as an example.
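The window aggregation above can be sketched in a few lines. This is an illustrative implementation under one plausible convention (trailing months that do not fill a whole window are dropped); each row is a feature vector for one month, and k = 3 turns the monthly series into a quarterly one.

```python
def aggregate_windows(monthly_rows, k):
    """Sum consecutive length-k windows of monthly feature vectors."""
    usable = len(monthly_rows) - len(monthly_rows) % k  # drop partial tail
    out = []
    for start in range(0, usable, k):
        window = monthly_rows[start:start + k]
        out.append([sum(col) for col in zip(*window)])  # column-wise sum
    return out

# Six months of 2-feature rows -> two quarterly rows when k = 3.
monthly = [[1, 0], [2, 1], [0, 1], [3, 0], [0, 0], [1, 1]]
quarterly = aggregate_windows(monthly, 3)
```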

3.3. TL-CDC

This study uses transfer learning as the framework for the cross-disease course prediction method. In the source disease course prediction model, this study used Recurrent Neural Networks (RNNs) as the deep learning models for extracting features, including LSTM, Bi-LSTM, and GRU. These three time series models, specialized in processing time series data, add a memory mechanism for extracting time series relationship features, mainly composed of input gate, forget gate, output gate, and memory cell candidate components.
The target disease course prediction model is built by transferring the time series parameters trained by the source disease course prediction model to the time series model of the target disease course prediction model, fine-tuning it with the addition of the attention mechanism calculation, and finally outputting the disease course prediction at the next time point. The architectures of the source disease course prediction model and the target disease course prediction model are explained as follows.
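The attention calculation added during fine-tuning can be illustrated with a small numpy sketch. This is not the paper's exact formulation: it shows a generic dot-product attention over RNN hidden states, where each time step is scored against a learned query vector, the scores are softmax-normalized, and the weighted sum becomes the sequence summary fed to the prediction layer.

```python
import numpy as np

def attention_summary(hidden_states, query):
    """hidden_states: (T, d) RNN outputs; query: (d,) learned vector.

    Returns the attention-weighted summary (d,) and the weights (T,).
    """
    scores = hidden_states @ query                   # one score per time step
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time
    return weights @ hidden_states, weights

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))   # 4 time points, 8 hidden units (toy sizes)
q = rng.normal(size=8)
summary, w = attention_summary(H, q)
```

Because the weights sum to one over the time steps, they can also be inspected to see which time points the model attends to, which is what makes the transferred model explanatory as well as predictive.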

3.3.1. Prediction Model of Course Development of Source Disease

The structure of the source disease course prediction model used in this study is shown in Figure 4. In order to effectively capture temporal relationships, the RNN-based LSTM, Bi-LSTM, and GRU were selected, and the attention mechanism calculation was added to extract the features of the source disease course. This study also compared the effect of adding an attention mechanism to the target disease prediction task after parameter transfer. This section explains in detail the time series model and attention mechanism of the source disease course prediction model.

Time Series Model

This study uses a many-to-one architecture in the time series model. First, the PD course representations of the last four time points are input into the time series model. If k is 3, inputting four time points of the disease course matrix corresponds to one year of the patient's disease course data, and the output representation at the next time point is the prediction of disease course development over the following three months. For example, in the upper part of Figure 4, the source disease course representation tensor composed of multiple patients' data is the input; each two-dimensional matrix represents the development of one patient's disease course, and the green column represents the disease course at one time point.
First, 1 is added to the input data before taking log10 (i.e., log10(x + 1)), to avoid poor model convergence caused by large dosage differences between patients. Next, the processed data of four time points are input into the time series model, which is trained to predict the disease course at the fifth time point (in red). The window then slides down to the blue box as the next four time points entered into the model, predicting the light red disease representation at the following time point. The data are thus input to the time series model as fixed-length sequences, and temporal features are captured through the input gate, forget gate, output gate, and memory cell candidate elements of the network.
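The log10(x + 1) transform and the sliding four-time-point window can be sketched as follows; this is a minimal numpy illustration, and the sequence length and feature dimension are invented for the example:

```python
import numpy as np

def preprocess(courses):
    """log10(x + 1): damps the large dosage differences between patients."""
    return np.log10(courses + 1.0)

def sliding_windows(seq, width=4):
    """Many-to-one samples: every window of `width` consecutive time
    points is an input; the time point right after it is the target."""
    X = np.stack([seq[i:i + width] for i in range(len(seq) - width)])
    y = np.stack([seq[i + width] for i in range(len(seq) - width)])
    return X, y

course = np.arange(8, dtype=float).reshape(8, 1)   # 8 time points, 1 feature
X, y = sliding_windows(preprocess(course))         # 4 windows, 4 targets
```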
The recurrent network computes weights through these components during training, and too many recurrent layers cause slow training or underfitting. As the sample size of the source disease was large, in view of time and hardware costs this study used a single recurrent layer with 128 hidden neurons to build the time series model, and the last output of the network enters a fully connected layer to predict the development of the disease course at the next time point.
The dotted line in Figure 4 represents the operation flow after the output of the time series model; LSTM is used here for illustration. As shown in Figure 4, the time series model in our framework can be one of three deep learning neural networks, LSTM, GRU, or Bi-LSTM, explained as follows:
  • LSTM. Long short-term memory (LSTM) is a type of recurrent neural network composed of a memory cell ct, an input gate for the input vector xt, an output gate ot, and a forget gate for the hidden vector ht. Figure 5 shows the structure of an LSTM neuron. The notation ot indicates the output vector at time t. To realize long-term memory, ct is a state that remembers the past input.
  • GRU. The gated recurrent unit (GRU) is a gating mechanism in recurrent neural networks composed of a memory cell ct, an input gate for the input vector xt, and a forget gate for the hidden vector ht, but without an output gate; the memory cell is therefore used as the output. Figure 6 shows the structure of a GRU neuron. The notation ht indicates the hidden vector at time t. Unlike in LSTM, ct remembers both the past input and the current output.
  • Bi-LSTM. Bidirectional long short-term memory (Bi-LSTM) is a recurrent neural network that processes sequence information in both directions, backward (future to past) and forward (past to future), so the input flows in both directions to represent both future and past information. Figure 7 shows the structure of a Bi-LSTM neuron. As in LSTM, ot indicates the output vector at time t, and ct is a state that remembers the past input.
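As a rough illustration of the gating components named above, the forward step of a single GRU cell can be written out in numpy. The dimensions and the dict-based parameter layout are assumptions of this sketch, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W, U, b):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde.
    W, U, b each hold the three parameter blocks keyed 'z', 'r', 'h'."""
    z = sigmoid(W['z'] @ x + U['z'] @ h + b['z'])   # how much new state to admit
    r = sigmoid(W['r'] @ x + U['r'] @ h + b['r'])   # how much old state feeds the candidate
    h_tilde = np.tanh(W['h'] @ x + U['h'] @ (r * h) + b['h'])
    return (1.0 - z) * h + z * h_tilde              # convex mix of old and candidate

rng = np.random.default_rng(0)
d_in, d_hid = 3, 5
W = {k: rng.standard_normal((d_hid, d_in)) for k in 'zrh'}
U = {k: rng.standard_normal((d_hid, d_hid)) for k in 'zrh'}
b = {k: np.zeros(d_hid) for k in 'zrh'}
h = np.zeros(d_hid)
for x in rng.standard_normal((4, d_in)):   # run four time points through the cell
    h = gru_cell(x, h, W, U, b)
```

Because the new state is a convex combination of the old state and a tanh-bounded candidate, the hidden vector stays bounded, which is part of what makes gated cells stable over long sequences.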

Attention Mechanism

The attention mechanism in this study uses global attention to track how important each of the four input time points is when predicting the disease course at the next time point, as shown in the attention mechanism part of Figure 4. The outputs of the four time points of the time series model (where the dotted line connects to the solid line in the figure) are scored against the last hidden state, which indirectly represents the prediction time point, to calculate the importance of each time point's data to the output. Equation (1) formally defines the attention score, denoted $\mathrm{Att}_w$ in Figure 4:

$$ e_{ij} = a(\mathrm{Output}_j, h_4) \qquad (1) $$

Equation (2) normalizes the attention weights at each time point so that they sum to 1:

$$ \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_z} \exp(e_{ik})} \qquad (2) $$

Equation (3) then multiplies the outputs of the time series model by these weights to obtain the weighted output vector $c_i$, shown as $\mathrm{Att}_w\,\mathrm{Output}$ in the figure:

$$ c_i = \sum_{j=1}^{T_x} \alpha_{ij}\,\mathrm{Output}_j \qquad (3) $$
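Equations (1)-(3) can be sketched in numpy. The alignment function a is assumed here to be a simple dot product, which is one common choice for global attention; the paper does not specify its exact form:

```python
import numpy as np

def global_attention(outputs, query):
    """Score each of the T time-step outputs against the query (the
    last hidden state), softmax the scores, then take the weighted sum."""
    scores = outputs @ query                        # e_i for each time point
    scores = scores - scores.max()                  # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # Equation (2): weights sum to 1
    context = alpha @ outputs                       # Equation (3): weighted output
    return context, alpha

T, d = 4, 6
rng = np.random.default_rng(1)
outputs = rng.standard_normal((T, d))               # time series model outputs
context, alpha = global_attention(outputs, outputs[-1])  # h_4 as the query
```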
The right half of the attention mechanism in the figure is a schematic of the Bi-LSTM calculation with attention. The forward and backward hidden states computed by the Bi-LSTM are spliced together as $[\overrightarrow{h_{t-1}}; \overleftarrow{h_{t-1}}]$. This spliced hidden state, which indirectly represents the prediction time point, is scored against the Bi-LSTM outputs of the four time points, so the attention score can be formally defined as Equation (4):

$$ e_{ij} = a(\mathrm{Output}_j, [\overrightarrow{h_{t-1}}; \overleftarrow{h_{t-1}}]) \qquad (4) $$

Equation (5) normalizes the attention weights obtained from the forward and backward Bi-LSTM computation at each time point so that they sum to 1:

$$ \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_z} \exp(e_{ik})} \qquad (5) $$

Equation (6) then multiplies the outputs of the time series model by these weights to obtain the weighted output vector $c_i$:

$$ c_i = \sum_{j=1}^{T_x} \alpha_{ij}\,\mathrm{Output}_j \qquad (6) $$

3.3.2. Prediction Model of Target Disease Course Development

The framework of the target disease course prediction model is shown in Figure 8. To enable the target prediction task to train a disease course prediction model with a minimal amount of effective training data, the model is constructed from the course prediction model trained on the medical records of a source disease with a similar distribution. First, a linear transformation converts the target disease course representation tensor into data with the same feature dimension as the source disease course representation tensor. Next, the pre-trained parameters of the source disease prediction model are transferred to the time series model of the target model. Lastly, the model is trained on the target task.
Two types of parameters are transferred. The first type comprises the parameters with which the time series model captures the temporal relationship and directly predicts the source disease course at the next time point, shown in the blue parameter-transfer box of the source model in Figure 8. The second type comprises the parameters of the time series model trained jointly with the attention mechanism calculation. As mentioned in the previous section, the attention mechanism affects the prediction of the source disease course at the next time point by assigning higher weights to informative past time points. Parameter transfer from the time series model trained with the attention mechanism is shown in the red box in the figure.
The output of the time series model feeds the target task in one of two ways. In the first, the output enters the fully connected layer directly to learn, train, and predict the target task. In the second, the output is weighted by an attention mechanism for the target task before entering the fully connected layer. The attention calculation of the target model, shown in Figure 8, is the same as that of the source model: it assigns higher weights to the past time points that most affect the prediction of the target disease course at the next time point, so that information helpful to the task has greater influence, thereby improving the outcome of the target disease prediction.
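The three-step construction (linear projection into the source feature space, transfer of the pre-trained recurrent parameters, fresh output head for the target task) might be sketched as follows. All names, shapes, and the dict-based parameter representation are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def build_target_model(source_params, d_target, d_source, rng):
    """Sketch of the transfer step in Figure 8:
    1) a linear map projects target features into the source feature space,
    2) the recurrent parameters are copied from the pre-trained source model,
    3) a fresh fully connected head is trained for the target task."""
    return {
        'project': rng.standard_normal((d_source, d_target)) * 0.01,   # trained from scratch
        'recurrent': {k: v.copy() for k, v in source_params.items()},  # transferred
        'head': rng.standard_normal((d_target, source_params['U'].shape[0])) * 0.01,
    }

rng = np.random.default_rng(2)
# hypothetical source model: hidden size 8, source feature dimension 16
source = {'W': rng.standard_normal((8, 16)), 'U': rng.standard_normal((8, 8))}
target = build_target_model(source, d_target=10, d_source=16, rng=rng)
```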

4. Experimental Evaluation

This section discusses the efficacy of applying transfer learning to predict course development across diseases. It first describes the training data setup and the definition and evaluation of experimental results, and then explains the efficacy evaluation process.

4.1. Experimental Settings

This study uses transfer learning as the framework of the research method, so the training data setups for the source and target disease course prediction models are explained separately. The source disease course prediction model was trained on the disease course representation tensors created from the medical records of 724,155 PD patients extracted from the National Health Insurance Research Database in Taiwan [24]. The tensors of 30% of the patients were used as test data, and on the remaining 70% of patients (the training dataset), 5-fold cross-validation was performed to train the prediction model. In other words, 20% of the patients in the training dataset were randomly extracted in each epoch to serve as validation data, used to evaluate the model's performance and determine early stopping. The batch size was set to 1000, the number of epochs to 50, the learning rate to $10^{-4}$, and Adam was used as the optimizer.
The target disease course prediction model was trained on the disease course representation tensors made from the medical records of 28,303 RA patients. The tensors of 30% of the patients were used as test data, and from the remaining 70%, 20% of the patients were randomly extracted in each epoch as validation data to evaluate the model's performance. Since the target disease sample was small, the batch size was set to 300 to avoid unstable training caused by an overly large batch size. Because transfer learning effectively reduces the training required for the target prediction task, the number of epochs was reduced to 20. The learning rate was $10^{-4}$, and Adam was used as the optimizer.
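The patient-level split used for both models (30% held-out test; 20% of the remaining patients drawn as validation data) can be sketched as follows. In the paper the validation subset is redrawn each epoch; this sketch shows a single draw, and all counts are illustrative:

```python
import numpy as np

def split_patients(n_patients, rng, test_frac=0.30, val_frac=0.20):
    """30% of patients held out for test; 20% of the remaining pool
    drawn at random as validation (redrawn per epoch in the paper)."""
    idx = rng.permutation(n_patients)
    n_test = int(n_patients * test_frac)
    test, train_pool = idx[:n_test], idx[n_test:]
    n_val = int(len(train_pool) * val_frac)
    val = rng.choice(train_pool, size=n_val, replace=False)
    train = np.setdiff1d(train_pool, val)
    return train, val, test

rng = np.random.default_rng(3)
train, val, test = split_patients(1000, rng)   # hypothetical 1000 patients
```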

4.2. Evaluation Metrics

The loss function of deep learning, also called the cost function, measures the difference between the model's predicted values and the true values; it is therefore the objective function optimized during neural network training. Training a neural network is the process of minimizing the loss function: the smaller the loss, the smaller the difference between predicted and actual values, and the better the prediction model.
Loss functions can be roughly divided into classification and regression losses. Since the main task of the model is to predict medication-related values, it is a regression task. In this study, the mean squared error (MSE), the loss most commonly used in regression prediction, is used as the loss function during training, formally defined as follows:

$$ \mathrm{MSE} = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} \left( Y_{j,k} - \hat{Y}_{j,k} \right)^2 \qquad (7) $$

The calculation takes the mean of the squared differences between the actual values $Y_{j,k}$ and the predicted values $\hat{Y}_{j,k}$. Squaring prevents negative differences from cancelling out in the accumulated loss. In the equation, $m$ is the number of patients and $n$ is the number of targets to be predicted; the mean is taken because training involves more than one sample and prediction target. From Equation (7), the closer the calculated MSE is to 0, the better the prediction outcome. In this study, the evaluation of results directly uses this training loss value to assess prediction performance.
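Equation (7) is straightforward to sketch in numpy; the toy matrices below are invented for illustration:

```python
import numpy as np

def mse(Y, Y_hat):
    """Equation (7): mean squared error over m patients and n targets."""
    m, n = Y.shape
    return np.sum((Y - Y_hat) ** 2) / (m * n)

Y = np.array([[1.0, 2.0], [3.0, 4.0]])       # actual values, m = n = 2
Y_hat = np.array([[1.0, 2.5], [2.0, 4.0]])   # predicted values
loss = mse(Y, Y_hat)                          # (0 + 0.25 + 1 + 0) / 4
```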

4.3. Experimental Results

This section presents and discusses the prediction outcome of the proposed TL-CDC method on the course development of the target disease. As described in the data preprocessing section, EHRs within a period are aggregated into a single vector so that the data sequence length is consistent, and a parameter k sets the length of the aggregation period. For example, if we set k = 2, EHRs within two months are aggregated, so one year of EHRs yields a sequence of length 6 (12/2 = 6). Since a period that is too long or too short introduces bias, k is set to 2, 3, and 4 to compare predictions of the target disease course and understand the effect of sequence length. The outcomes are divided into internal comparisons of the TL-CDC method and external comparisons with other models for multivariate time series data.

4.3.1. Internal Experimental Results

This section discusses the prediction outcome of the proposed method. It first discusses the effectiveness of transfer learning, then the impact of the parameter freezing and fine-tuning methods on disease course prediction, and finally compares the effect of adding the attention mechanism calculation.
Figure 9, Figure 10 and Figure 11 show the time series models of the source disease course prediction model under various k. Here, 'Frozen' means freezing the parameters trained by the time series model of the source model and transferring them to the target model; these parameters do not participate in the back-propagation of the target disease course prediction task during training.
Meanwhile, 'Retraining' indicates training from weight initialization, meaning only the disease course representation matrix of the target disease is used for predictive analysis. For transferred parameters, we focus on the difference between freezing and fine-tuning in the prediction outcome of the target model. Regarding fine-tuning, this study examines the influence of letting 10% or 50% of the parameters participate in back-propagation; '10% Fine-tuning' and '50% Fine-tuning' denote these two settings.
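The frozen and partially fine-tuned settings can be sketched as a selection over named parameter tensors. Which tensors are released for back-propagation first is an assumption of this sketch (here, the last ones), as the paper does not specify the ordering:

```python
def mark_trainable(param_names, frac):
    """frac = 0.0 freezes everything ('Frozen'); '10% Fine-tuning'
    releases the last 10% of parameter tensors for back-propagation."""
    n_train = int(round(len(param_names) * frac))
    frozen = param_names[:len(param_names) - n_train]
    trainable = param_names[len(param_names) - n_train:]
    return frozen, trainable

names = [f'layer{i}.weight' for i in range(10)]   # hypothetical tensor names
frozen, tuned = mark_trainable(names, frac=0.10)  # '10% Fine-tuning'
```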
As shown in Figure 9, Figure 10 and Figure 11, using the target disease data to partially fine-tune the transferred parameters effectively improves the prediction outcome, especially for parameter transfer on LSTM and Bi-LSTM. Irrespective of whether an attention mechanism was added to the source model, fine-tuning the transferred parameters improves the outcome. The outcome of GRU-based parameter transfer with k = 3 and 4 is not significantly improved. We believe that as the data content of the sequence increases, GRU, a time series model with a simpler memory unit, cannot efficiently retain the transferred parameters that are helpful for the target prediction task during fine-tuning, so fine-tuning does not improve GRU's outcome.
The effect of adding the attention mechanism is compared in the following figures: Figure 12, Figure 13 and Figure 14 show the attention mechanism transferred from the source task to the target task prediction, while Figure 15, Figure 16 and Figure 17 show the target task without the attention mechanism. As Figure 12, Figure 13 and Figure 14 show, combining parameter transfer with the attention mechanism for the target task can significantly improve its prediction outcome. Across the three time series models, the lowest MSE with k = 2 is obtained by adding the attention mechanism to the target task prediction, with LSTM performing best. For Bi-LSTM and GRU, the best outcome was achieved by the weight initialization model; nevertheless, the proposed method achieves cross-disease prediction of disease course development with a similar prediction outcome.
As shown in Figure 12, Figure 13, Figure 14, Figure 15, Figure 16 and Figure 17, the lowest MSE at k = 3 was obtained when the source model was trained without the attention mechanism, its parameters were transferred without fine-tuning, and the attention mechanism was added for the target task; the loss was lower than with the weight initialization model, and LSTM performed best. The lowest MSE at k = 4 was obtained when the transferred parameters were used without the attention mechanism for the target task and without fine-tuning; the loss was again lower than with weight initialization, and LSTM performed best. From these results, it can be concluded that data with different sequence lengths require different transfer methods, and that data with a sequence length of 3 show significant differences between prediction outcomes with and without parameter transfer. Across the different time series models, the relative performance of the transfer methods was consistent.

4.3.2. Comparison of TL-CDC with Other Time Series Models

Since no previous study provides a disease course prediction model suitable for multivariate, long-term disease data, this study applies models that can handle multi-feature, long-term data to the disease course matrix of the source disease, transfers their trained parameters, and compares the prediction outcome and model training time of the proposed method against the same architectures trained directly on the target disease course matrix.
This section compares the prediction outcome of the best performing LSTM with traditional regression, TPA-LSTM [30], and DSANet [31]. As shown in Figure 18, Figure 19 and Figure 20, for traditional linear regression, TPA-LSTM, and DSANet, weight initialization tended to outperform parameter transfer, which shows that neither a simple linear regression nor a complex model specifically developed to capture temporal relationships is suitable for analysis in a transfer learning framework. For all three sequence lengths, the MSE of the proposed method was lower than that of the other methods, meaning it can effectively use the disease course data of two related but different diseases for cross-disease course prediction under the transfer learning framework.
This study trained the model with k = 4 using an NVIDIA GeForce RTX 3090 Ti graphics card with 24 GB G6X memory and the acceleration software CUDA 11.2 and cuDNN 8.1.1. Table 7 compares model training times. For the source disease course prediction, the reported time is the time spent training the source model with LSTM; for the target disease course prediction, it is the time spent transferring the source model parameters to the target model plus the time spent training the target model. As shown in Table 7, TL-CDC gives the best prediction outcome, and its training time is much shorter than that of most models. The comparison uses k = 4; when k is 2 or 3, the number of sequence samples increases sharply, causing training time to grow accordingly. Considering hardware, time cost, and prediction accuracy, TL-CDC is the best option for the prediction of course development across diseases.

5. Conclusions

The main purpose of this study is to use the National Health Insurance Research Database in Taiwan [24] to build a cross-disease model for predicting disease course from the medical records of patients with chronic diseases, with the aim of assisting doctors in estimating the disease course. The experimental results show that our proposed transfer learning framework, TL-CDC, has the advantage of short model training time and achieves good prediction results for medical data of different sequence lengths. We conducted a comprehensive experimental study based on a dataset collected from the National Health Insurance Research Database in Taiwan; the results demonstrate that our transfer learning approach outperforms state-of-the-art deep learning models for disease course prediction. We find that LSTM achieves the best performance in most cases, and that a sequence length of 3, i.e., one season of data per time point, yields the most complete representation of the patients' disease course. Furthermore, the attention mechanism added in this method improves the prediction tasks and effectively increases the explanatory power of model training and results. This study successfully used medical records to establish a cross-disease course prediction model, so that diseases with little data can build predictive models from the parameters of prediction models for related but different diseases. Using the correlation between diseases and applying it to cross-disease prediction not only achieves a good prediction outcome for the target disease, but also increases the utilization of medical data.

Author Contributions

Writing—review and editing, J.J.-C.Y.; writing—original draft preparation, Y.-T.C.; supervision, W.-C.C.; Investigation, H.-H.C.; project administration, H.-H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Taichung Veterans General Hospital grant number TCVGH-NCHU1107612.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the corresponding author on request. The data are not publicly available due to personal information protection used in the study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. WHO Reveals Leading Causes of Death and Disability Worldwide: 2000–2019. 9 December 2020. Available online: https://www.who.int/news/item/09-12-2020-who-reveals-leading-causes-of-death-and-disability-worldwide-2000-2019 (accessed on 11 May 2022).
  2. Kwak, G.H.; Hui, P. DeepHealth: Review and challenges of artificial intelligence in health informatics. arXiv 2019, arXiv:1909.00384.
  3. Jamshidi, M.B.; Lalbakhsh, A.; Talla, J.; Peroutka, Z.; Roshani, S.; Matousek, V.; Roshani, S.; Mirmozafari, M.; Malek, Z.; Spada, L.L.; et al. Deep learning techniques and COVID-19 drug discovery: Fundamentals, state-of-the-art and future directions. In Emerging Technologies during the Era of COVID-19 Pandemic; Springer: Cham, Switzerland, 2021; pp. 9–31.
  4. Jamshidi, M.B.; Lalbakhsh, A.; Talla, J.; Peroutka, Z.; Hadjilooei, F.; Lalbakhsh, P.; Jamshidi, M.; La Spada, L.; Mirmozafari, M.; Dehghani, M.; et al. Artificial Intelligence and COVID-19: Deep Learning Approaches for Diagnosis and Treatment. IEEE Access 2020, 8, 109581–109595.
  5. Ravì, D.; Wong, C.; Deligianni, F.; Berthelot, M.; Andreu-Perez, J.; Lo, B.; Yang, G.Z. Deep learning for health informatics. IEEE J. Biomed. Health Inform. 2016, 21, 4–21.
  6. Yang, Y.; Zheng, X.; Ji, C. Disease prediction model based on Bi-LSTM and attention mechanism. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, USA, 18–21 November 2019; pp. 1141–1148.
  7. He, R.; Liu, Y.; Wang, K.; Zhao, N.; Yuan, Y.; Li, Q.; Zhang, H. Automatic cardiac arrhythmia classification using combination of deep residual network and bidirectional LSTM. IEEE Access 2019, 7, 102119–102135.
  8. Choi, E.; Bahadori, M.T.; Song, L.; Stewart, W.F.; Sun, J. GRAM: Graph-based attention model for healthcare representation learning. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 787–795.
  9. Koziel, J.; Potempa, J. Pros and cons of causative association between periodontitis and rheumatoid arthritis. Periodontol. 2000 2022, 89, 83–98.
  10. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359.
  11. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems; Curran: Red Hook, NY, USA, 2015; pp. 3320–3328.
  12. Eyben, F.; Weninger, F.; Squartini, S.; Schuller, B. Real-life voice activity detection with LSTM recurrent neural networks and an application to hollywood movies. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013.
  13. Pascual, S.; Bonafonte, A. Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation. In Proceedings of the IEEE 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, 29 August–2 September 2016.
  14. Kwon, H.; Kim, Y. BlindNet backdoor: Attack on deep neural network using blind watermark. Multimed. Tools Appl. 2022, 81, 6217–6234.
  15. Kwon, H. Multi-Model Selective Backdoor Attack with Different Trigger Positions. IEICE Trans. Inf. Syst. 2022, 105, 170–174.
  16. Kwon, H.; Yoon, H.; Choi, D. Data Correction For Enhancing Classification Accuracy By Unknown Deep Neural Network Classifiers. KSII Trans. Internet Inf. Syst. (TIIS) 2021, 15, 3243–3257.
  17. Kwon, H. Defending Deep Neural Networks against Backdoor Attack by Using De-trigger Autoencoder. IEEE Access 2021.
  18. Shie, C.K.; Chuang, C.H.; Chou, C.N.; Wu, M.H.; Chang, E.Y. Transfer representation learning for medical image analysis. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 711–714.
  19. Shin, H.C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Summers, R.M. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298.
  20. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 2018, 172, 1122–1131.
  21. Chouhan, V.; Singh, S.K.; Khamparia, A.; Gupta, D.; Tiwari, P.; Moreira, C.; Damaševičius, R.; de Albuquerque, V.H.C. A novel transfer learning based approach for pneumonia detection in chest X-ray images. Appl. Sci. 2020, 10, 559.
  22. Okada, M.; Kobayashi, T.; Ito, S.; Yokoyama, T.; Abe, A.; Murasawa, A.; Yoshie, H. Periodontal Treatment Decreases Levels of Antibodies to Porphyromonas Gingivalis and Citrulline in Patients with Rheumatoid Arthritis and Periodontitis. J. Periodontol. 2013, 84, 74–84.
  23. Erciyas, K.; Sezer, U.; Üstün, K.; Pehlivan, Y.; Kısacık, B.; Şenyurt, S.; Tarakçıoğlu, M.; Onat, A. Effects of periodontal therapy on disease activity and systemic inflammation in rheumatoid arthritis patients. Oral Dis. 2012, 19, 394–400.
  24. National Health Insurance Research Database in Taiwan. Available online: https://nhird.nhri.org.tw/en/ (accessed on 11 May 2022).
  25. Choi, E.; Bahadori, M.T.; Schuetz, A.; Stewart, W.F.; Sun, J. Doctor AI: Predicting clinical events via recurrent neural networks. JMLR Workshop Conf. Proc. 2016, 56, 301–318.
  26. Lee, J.Y.; Dernoncourt, F.; Szolovits, P. Transfer learning for named-entity recognition with neural networks. arXiv 2017, arXiv:1705.06273.
  27. Desautels, T.; Calvert, J.; Hoffman, J.; Mao, Q.; Jay, M.; Fletcher, G.; Barton, C.; Chettipally, U.; Kerem, Y.; Das, R. Using transfer learning for improved mortality prediction in a data-scarce hospital setting. Biomed. Inform. Insights 2017, 9, 1178222617712994.
  28. Gupta, P.; Malhotra, P.; Narwariya, J.; Vig, L.; Shroff, G. Transfer learning for clinical time series analysis using deep neural networks. J. Healthc. Inform. Res. 2020, 4, 112–137.
  29. Malhotra, P.; Vishnu, T.V.; Vig, L.; Agarwal, P.; Shroff, G. TimeNet: Pre-trained deep recurrent neural network for time series classification. In Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 26–28 April 2017; pp. 607–612.
  30. Shih, S.Y.; Sun, F.K.; Lee, H.Y. Temporal pattern attention for multivariate time series forecasting. Mach. Learn. 2019, 108, 1421–1441.
  31. Huang, S.; Wang, D.; Wu, X.; Tang, A. Dsanet: Dual self-attention network for multivariate time series forecasting. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2129–2132.
Figure 1. Schematic diagram of the training process of freezing and fine-tuning of the transfer learning method for image analysis.
Figure 2. Schematic diagram of medical records.
Figure 3. Data preprocessing flow chart.
Figure 4. Prediction Model of Source Disease Course Development.
Figure 5. Illustration of LSTM in our proposed framework.
Figure 6. Illustration of GRU in our proposed framework.
Figure 7. Illustration of Bi-LSTM in our proposed framework.
Figure 8. Prediction Model of Target Disease Course Development.
Figure 9. MSE of the source disease course prediction of TL-CDC under k = 2.
Figure 10. MSE of the source disease course prediction of TL-CDC under k = 3.
Figure 11. MSE of the source disease course prediction of TL-CDC under k = 4.
Figure 12. MSE of the target disease course prediction of TL-CDC with attention mechanism under k = 2.
Figure 13. MSE of the target disease course prediction of TL-CDC with attention mechanism under k = 3.
Figure 14. MSE of the target disease course prediction of TL-CDC with attention mechanism under k = 4.
Figure 15. MSE of the target disease course prediction of TL-CDC without attention mechanism under k = 2.
Figure 16. MSE of the target disease course prediction of TL-CDC without attention mechanism under k = 3.
Figure 17. MSE of the target disease course prediction of TL-CDC without attention mechanism under k = 4.
Figure 18. Comparison of TL-CDC and other time series models for predicting the course development of target disease under k = 2.
Figure 19. Comparison of TL-CDC and other time series models for predicting the course development of target disease under k = 3.
Figure 20. Comparison of TL-CDC and other time series models for predicting the course development of target disease under k = 4.
Table 1. Calculation method of PD development matrix representation.

TIME | PD_DAY | PD_DRUG | PD_DISPOSAL | PD_SCALING
1    | 0      | 0       | 0           | 1
2    | 7      | 1       | 0           | 0
3    | 3      | 1       | 0           | 0
144  | 3      | 1       | 0           | 0
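To make the construction in Table 1 concrete, the sketch below bins visit records into monthly rows and sums the per-feature counts. The record values, the feature mapping, the study start date, and the `month_index` helper are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit records: (visit date, feature, value). Feature names
# mirror the Table 1 columns, but the data below are made up for illustration.
records = [
    (date(1999, 1, 5), "PD_SCALING", 1),
    (date(1999, 2, 7), "PD_DAY", 7),
    (date(1999, 2, 7), "PD_DRUG", 1),
]

FEATURES = ["PD_DAY", "PD_DRUG", "PD_DISPOSAL", "PD_SCALING"]
START = date(1999, 1, 1)  # assumed start of the observation window

def month_index(d):
    """1-based month offset from the study start (the TIME column)."""
    return (d.year - START.year) * 12 + (d.month - START.month) + 1

# One row per month; each feature accumulates the values observed that month.
matrix = defaultdict(lambda: {f: 0 for f in FEATURES})
for d, feature, value in records:
    matrix[month_index(d)][feature] += value

print(matrix[1]["PD_SCALING"])  # 1
print(matrix[2]["PD_DAY"])      # 7
```

With this binning, an empty month simply stays at all zeros, matching the zero rows in the matrices shown later.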
Table 2. 1999–2010 medical records of the patient whose ID was 8f14.

ID   | SEQ_NO | FUNC_DATE | DRUG_DAY | DRUG_NO    | DRUG_USE
8f14 | 15669  | 19990111  | 7        | A000080277 | 5
8f14 | 7      | 19990118  |          | 91001C     |
8f14 | 56     | 19990202  | 7        | A0138703   |
8f14 | 996    | 19990321  |          | 91003C     |
Table 3. The disease course matrix representation of the patient whose ID was 8f14 (one month is the unit of the time-series data, k = 1).

TIME | PD_DAY | PD_DRUG | PD_DISPOSAL | PD_SCALING
1    | 7      | 1       | 1           | 0
2    | 0      | 0       | 0           | 0
3    | 0      | 0       | 0           | 1
144  | 14     | 2       | 1           | 0
Table 4. 1999–2010 medical records of the patient whose ID was 791f.

ID   | SEQ_NO | FUNC_DATE | DRUG_DAY | DRUG_NO    | DRUG_USE
791f | 159    | 19990105  | 5        | B023615100 | 30
791f | 66     | 19990116  | 2        | B018142100 | 50
791f | 355    | 19990220  | 7        | AC45781100 | 100
791f | 458    | 19990330  |          | 91003C     |
Table 5. The disease course matrix representation of the patient whose ID was 791f (one month is the unit of the time-series data, k = 1).

TIME         | 1   | 2     | 3 | 144
PD_DAY       | 0   | 0     | 0 | 3
PD_DRUG      | 0   | 0     | 0 | 1
PD_DISPOSAL  | 0   | 0     | 0 | 0
PD_SCALING   | 0   | 0     | 1 | 0
RA_USE_ORAL  | 1.7 | 0.667 | 0 | 12.5867
RA_DRUG_ORAL | 2   | 1     | 0 | 4
RA_USE_INJ   | 0   | 0     | 0 | 0
RA_DRUG_INJ  | 0   | 0     | 0 | 0
Table 6. The disease course matrix representation of the patient whose ID was 791f (k = 3).

TIME         | 1     | 2     | 3     | 48
PD_DAY       | 0     | 0     | 0     | 5
PD_DRUG      | 0     | 0     | 0     | 1
PD_DISPOSAL  | 0     | 0     | 0     | 0
PD_SCALING   | 1     | 0     | 0     | 2
RA_USE_ORAL  | 2.367 | 5.304 | 9.173 | 4
RA_DRUG_ORAL | 3     | 1     | 0     | 4
RA_USE_INJ   | 0     | 0     | 0     | 0
RA_DRUG_INJ  | 0     | 0     | 0     | 1
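Comparing Tables 5 and 6 suggests that the coarser k = 3 representation is obtained by summing each block of k consecutive months (e.g., 1.7 + 0.667 + 0 = 2.367 for RA_USE_ORAL at TIME = 1, and 2 + 1 + 0 = 3 for RA_DRUG_ORAL). A minimal sketch of that windowed aggregation, assuming sum pooling and that an incomplete trailing window is dropped:

```python
def aggregate(series, k):
    """Sum every block of k consecutive time steps into one coarser step;
    a trailing window with fewer than k values is dropped."""
    n = len(series) // k * k
    return [sum(series[i:i + k]) for i in range(0, n, k)]

# Monthly (k = 1) RA_USE_ORAL values for months 1-3 of patient 791f (Table 5).
monthly = [1.7, 0.667, 0.0]
print(aggregate(monthly, 3))  # one window, summing to 2.367 (Table 6, TIME = 1)
```

With k = 2, 3, or 4, the same routine would produce the time axes used in the experiments (72, 48, or 36 steps from 144 months).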
Table 7. Training time of the prediction model for disease course.

Model             | Source Disease | Target Disease
TL-CDC (Ours)     | 19.158 min     | 1 min
Linear Regression | 10.826 min     | 0.593 min
TPA-LSTM          | 99.607 min     | 3.82 min
DSANet            | 6320.214 min   | 118.655 min
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
