# Medical QA Oriented Multi-Task Learning Model for Question Intent Classification and Named Entity Recognition


## Abstract

The proposed multi-task model achieves a higher F1 value, and compared with the single-task model, the generalization ability of the model has been improved.

## 1. Introduction

- A multi-task learning model based on ALBERT-BILSTM is proposed for intent classification and named entity recognition of Chinese online medical questions.
- The experimental results demonstrate that the proposed method in this paper outperforms the benchmark methods and improves the model generalization ability compared to the single-task model.

## 2. Related Work

#### 2.1. Medical Named Entity Recognition

#### 2.2. Intent Classification

#### 2.3. Multi-Task Learning

## 3. Methodology and Model

#### 3.1. ALBERT Pre-Trained Language Model

The input characters are first mapped to the embedding vectors ${E}_{1},{E}_{2},\dots ,{E}_{N}$, which represent each character in the sequence; after passing through the multi-layer bidirectional Transformer encoder, we finally obtain the text feature vectors ${T}_{1},{T}_{2},\dots ,{T}_{N}$.

${\mathbf{W}}^{O}$ is the matrix of additional weights, which compresses the dimension of the spliced (concatenated) multi-head matrix to the length of the sequence. **Q**, **K**, **V** denote the query, key and value vectors of each word in the input sequence, and ${W}_{i}^{Q},{W}_{i}^{K},{W}_{i}^{V}$ are the corresponding weight matrices of **Q**, **K**, **V**. ${d}_{k}$ denotes the dimension of the query and key vectors of each word.
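The multi-head attention described above can be sketched in NumPy as follows. This is an illustrative implementation of the standard formulation, not the paper's code; the function names and the toy dimensions are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    # Wq/Wk/Wv: per-head projection matrices (the W_i^Q, W_i^K, W_i^V above);
    # Wo: the additional weight matrix W^O applied to the concatenated heads.
    heads = [scaled_dot_product_attention(X @ wq, X @ wk, X @ wv)
             for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
N, d_model, h = 5, 8, 2           # sequence length, model dim, number of heads
d_k = d_model // h
X = rng.normal(size=(N, d_model))
Wq = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
Wk = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
Wv = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
Wo = rng.normal(size=(h * d_k, d_model))
out = multi_head_attention(X, Wq, Wk, Wv, Wo)   # shape (N, d_model)
```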

#### 3.2. BILSTM Module

Each LSTM unit contains the cell state ${\mathbf{c}}_{t}$, the input gate ${\mathbf{i}}_{t}$, the output gate ${\mathbf{o}}_{t}$, and the forget gate ${\mathbf{f}}_{t}$. In the LSTM, a gate is a method of selectively passing information through, using specially designed “gates” to introduce information into or remove information from the cell state ${\mathbf{c}}_{t}$. The LSTM computes an output vector based on the current input and the output of the previous cell, which is then used as the input to the next cell. The calculation formula is as follows.

${\mathbf{x}}_{t}$ is the cell input at time t, and $\mathbf{w}$, $\mathbf{b}$ denote the weight matrices and bias vectors of the input gate, forget gate, and output gate. ${\tilde{c}}_{t}$ is the intermediate state obtained from the current input, and tanh is the hyperbolic tangent function. ${\mathbf{c}}_{t}$ represents the cell state at time t, and ${\mathbf{h}}_{t}$ is the output at time t.
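A single LSTM step can be sketched as follows; the gate equations in the comments are the standard LSTM formulation the section describes, and the dictionary-of-weights layout is our own illustrative choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    # Standard LSTM gate equations:
    #   i_t = sigmoid(W_i [h_{t-1}; x_t] + b_i)    input gate
    #   f_t = sigmoid(W_f [h_{t-1}; x_t] + b_f)    forget gate
    #   o_t = sigmoid(W_o [h_{t-1}; x_t] + b_o)    output gate
    #   c~_t = tanh(W_c [h_{t-1}; x_t] + b_c)      intermediate state
    #   c_t = f_t * c_{t-1} + i_t * c~_t           cell state
    #   h_t = o_t * tanh(c_t)                      output
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W["i"] @ z + b["i"])
    f = sigmoid(W["f"] @ z + b["f"])
    o = sigmoid(W["o"] @ z + b["o"])
    c_tilde = np.tanh(W["c"] @ z + b["c"])
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(1)
d_in, d_hid = 4, 3
W = {k: rng.normal(size=(d_hid, d_hid + d_in)) for k in "ifoc"}
b = {k: np.zeros(d_hid) for k in "ifoc"}
h, c = lstm_cell(rng.normal(size=d_in), np.zeros(d_hid), np.zeros(d_hid), W, b)
```

A BILSTM runs one such cell left-to-right and another right-to-left and concatenates the two outputs at each position.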

#### 3.3. Decoding Unit

${y}_{i,j}$ denotes the true probability that the $i$th word belongs to the $j$th label, taking the value 0 or 1. ${\widehat{y}}_{i,j}$ is the model prediction value derived from Equation (11).
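The cross-entropy loss over these quantities can be computed as below; this is a generic sketch of the standard loss, with variable names of our choosing rather than the paper's notation.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # L = -(1/N) * sum_i sum_j y_{i,j} * log(y_hat_{i,j})
    # y_true[i, j] is 1 if item i carries label j, else 0;
    # y_pred[i, j] is the model's softmax probability.
    return -np.sum(y_true * np.log(y_pred + eps)) / y_true.shape[0]

# Two items, two labels, a maximally uncertain prediction: loss = ln 2.
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.5, 0.5], [0.5, 0.5]])
loss = cross_entropy(y_true, y_pred)
```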

$\mathbf{y}$ is the label sequence corresponding to sentence $\mathbf{s}$, and $\mathbf{p}$ is the conditional probability of $\mathbf{y}$ given $\mathbf{s}$ and $\mathbf{\theta}$. Assuming that ${S}_{\theta}(x,y)$ is the score of the sentence's label sequence $y$, the conditional probability $p$ can be calculated by normalizing ${S}_{\theta}(x,y)$. To take advantage of the dependencies between adjacent labels, the model combines the transition probability matrix $\mathbf{T}$ and the emission probability matrix $\mathbf{E}$ to calculate the score of the label sequence, ${S}_{\theta}(x,y)$, as shown in Equation (14). ${E}_{{x}_{t},{y}_{t}}$ is the probability of word ${x}_{t}$ having label ${y}_{t}$, and ${T}_{{y}_{t-1},{y}_{t}}$ is the probability of word ${x}_{t-1}$ with label ${y}_{t-1}$ being followed by word ${x}_{t}$ with label ${y}_{t}$. We can find the best label sequence for an input sentence by maximizing the log-likelihood over the training set $\mathbf{D}$ with dynamic programming, using the Viterbi algorithm to maximize the score.
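The CRF sequence score and its Viterbi maximization can be sketched as follows. This is a generic implementation of the Equation (14)-style score (emission plus transition terms) and standard Viterbi dynamic programming, not the paper's code.

```python
import numpy as np

def sequence_score(E, T, y):
    # S_theta(x, y) = sum_t E[t, y_t] + sum_t T[y_{t-1}, y_t]
    # E[t, j]: emission score of label j at position t; T[i, j]: transition i -> j.
    emit = sum(E[t, y[t]] for t in range(len(y)))
    trans = sum(T[y[t - 1], y[t]] for t in range(1, len(y)))
    return float(emit + trans)

def viterbi(E, T):
    # Dynamic programming over label sequences: best-scoring path and its score.
    n, k = E.shape
    score = E[0].copy()                    # best score ending in each label
    back = np.zeros((n, k), dtype=int)     # backpointers
    for t in range(1, n):
        cand = score[:, None] + T + E[t]   # cand[i, j]: prev label i -> label j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):          # follow backpointers in reverse
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score.max())
```

For small label sets the result can be checked against brute-force enumeration of all label sequences, which is what makes the dynamic program's correctness easy to verify.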

#### 3.4. Multi-Task Learning Step

The intent classification and named entity recognition tasks use two datasets, ${\mathbf{D}}_{C}$ and ${\mathbf{D}}_{N}$. In order to allow both tasks to learn simultaneously, the model alternates between the intent classification and named entity recognition tasks during training to “approximate” simultaneous learning. The loss and optimization functions of the two tasks are independent during alternate learning. The intent classification task uses the standard cross-entropy loss as its loss function, and multi-task learning is performed by alternately calling each task's optimizer, with both tasks using the Adam optimizer to learn the parameters of the multi-task model. This means that some information is continuously transferred from each task to the other through the shared layers. At each iteration, a task is randomly selected and some random training samples are chosen from that task to compute the gradient and update the parameters. The exact procedure of the alternating training phase in multi-task learning is shown in Algorithm 1.

```
Algorithm 1  Multi-task learning training process
Input:  two task datasets D_C and D_N
        batch size K for each task
        maximum number of iterations T, learning rates α and β
        randomly initialized parameters θ_0
for t = 1 … T do
    /* Prepare the data for both tasks */
    Randomly divide D_C and D_N into mini-batch sets
        B_C = {J_C,1, …, J_C,n}
        B_N = {J_N,1, …, J_N,m}
    Merge all mini-batches:  B′ = B_C ∪ B_N
    Randomly shuffle B′
    for each J ∈ B′ do
        Compute the loss L(θ) on the mini-batch J
        /* calculate only the loss of J on the corresponding task */
        Update the parameters:  θ_t ← θ_{t−1} − α · ∇_θ L(θ)
    end
end
```
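Algorithm 1 can be sketched as runnable Python on a toy problem. The scalar parameter and the quadratic per-task losses below are illustrative stand-ins for the shared network and the two task losses; only the alternating-batch structure mirrors the algorithm.

```python
import random

def train_multitask(D_C, D_N, K, T, alpha, loss_grads, theta):
    """Alternating multi-task training (Algorithm 1 sketch, toy scalar parameter).

    loss_grads maps a task name to a function returning the gradient of that
    task's loss on a mini-batch -- only the selected task's loss is computed.
    """
    rng = random.Random(0)
    for _ in range(T):
        # Randomly divide D_C and D_N into mini-batch sets B_C and B_N.
        batches = [("C", D_C[i:i + K]) for i in range(0, len(D_C), K)]
        batches += [("N", D_N[i:i + K]) for i in range(0, len(D_N), K)]
        rng.shuffle(batches)                     # B' = B_C ∪ B_N, shuffled
        for task, J in batches:
            grad = loss_grads[task](theta, J)    # loss of J on its own task only
            theta = theta - alpha * grad         # θ_t ← θ_{t-1} − α·∇_θ L(θ)
    return theta

# Toy losses: each "task" pulls the shared parameter toward its own target,
# so the final value settles between the two.
loss_grads = {"C": lambda th, J: sum(2 * (th - x) for x in J) / len(J),
              "N": lambda th, J: sum(2 * (th - x) for x in J) / len(J)}
theta = train_multitask([1.0] * 8, [3.0] * 8, K=4, T=200, alpha=0.1,
                        loss_grads=loss_grads, theta=0.0)
```

The shared parameter ends up near the compromise value 2.0, illustrating how alternating updates let both tasks shape the shared layers.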

## 4. Experiments and Results Analysis

#### 4.1. Dataset

#### 4.2. Evaluation Metrics

The precision rate P, recall rate R, and F1 value are used to evaluate the effectiveness of the model. The precision rate P is the proportion of correctly predicted positive samples among all samples predicted as positive; the recall rate R is the proportion of correctly predicted positive samples among all truly positive samples; and F1 is the harmonic mean of the precision rate and recall rate. The calculation formulas are as follows.
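These standard definitions translate directly into code; the function name below is ours.

```python
def precision_recall_f1(tp, fp, fn):
    # P  = TP / (TP + FP): correct positives among predicted positives.
    # R  = TP / (TP + FN): correct positives among true positives.
    # F1 = 2·P·R / (P + R): harmonic mean of precision and recall.
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Example: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```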

#### 4.3. Experimental Environment and Parameter Settings

#### 4.4. Benchmark

#### 4.5. Experimental Results and Analysis

For the named entity recognition task, we evaluate the models in terms of F1 value, recall and precision. We chose BILSTM [4] and BILSTM-CRF [5] as benchmark methods, and we also compared with the ALBERT model. In addition, to evaluate whether the multi-task learning strategy improves performance, we tested the single-task model ALBERT-BILSTM-CRF. The experimental results are shown in Table 2.

Our model outperforms the benchmark methods on the F1 value, precision and recall metrics. The BILSTM-CRF model achieves a higher F1 value than the BILSTM model: the BILSTM's memory function gives it a strong advantage on the sequence annotation task, and the CRF layer makes full use of the relationships between adjacent tags to find the optimal labeling of the whole sequence. The F1 value of the ALBERT-BILSTM-CRF model is 3.63% and 0.59% higher than those of the BILSTM-CRF and ALBERT-CRF models, respectively, because the word vectors generated by the ALBERT pre-trained language model are contextually relevant and can better express the semantic information of words. In this paper, the multi-task learning model also trains the ALBERT embedding layer and BILSTM encoding layer during intent classification training, and our method effectively improves named entity recognition compared with the ALBERT-BILSTM-CRF model without the multi-task learning strategy.

We also compared the F1 value of each model on each entity type, as shown in Figure 4. The F1 values of our model are higher than those of the other models on all three entity types, which validates the effectiveness of our method.

For the intent classification task, we likewise use F1 value, precision and recall to evaluate our model. We used CNN [13], BILSTM and ALBERT [21] as benchmark methods, and we also compared with the single-task intent recognition model ALBERT-BILSTM. The experimental results are shown in Table 3.

Our model achieves the best F1 value, precision and recall on the intent classification task. In addition, the multi-task learning model shows a significant improvement over the ALBERT-BILSTM single-task model, which means that the multi-task learning approach substantially enhances intent classification capability. The performance improvement on intent classification is more pronounced than on named entity recognition: the F1 value on the intent classification task is about 2% higher than that of the ALBERT-BILSTM model trained with a single-task strategy. Intent classification is the less complex task, since it only generates a label for the entire sentence, unlike named entity recognition, which generates a label for each word. Using the entity recognition ability trained by the named entity recognition task, the model can learn more semantic information from medical named entity labeling, which explains to some extent the significant improvement on the intent classification task. In multi-task learning, each task can “selectively” use the hidden features learned by the other task to improve its own capability.

## 5. Conclusions and Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Gerner, M.; Nenadic, G.; Bergman, C.M. LINNAEUS: A species name identification system for biomedical literature. BMC Bioinform.
**2010**, 11, 85. [Google Scholar] [CrossRef] [PubMed] [Green Version] - Fukuda, K.I.; Tsunoda, T.; Tamura, A.; Takagi, T. Toward information extraction: Identifying protein names from biological papers. Pac. Symp. Biocomput.
**1998**, 707, 707–718. [Google Scholar] - He, L.; Yang, Z.; Lin, H.; Li, Y. Drug name recognition in biomedical texts: A machine-learning-based method. Drug Discov. Today
**2014**, 19, 610–617. [Google Scholar] [CrossRef] [PubMed] - Chen, Y.; Zhou, C.; Li, T.; Wu, H.; Zhao, X.; Ye, K.; Liao, J. Named entity recognition from Chinese adverse drug event reports with lexical feature based BiLSTM-CRF and tri-training. J. Biomed. Inform.
**2019**, 96, 103252. [Google Scholar] [CrossRef] [PubMed] - Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv
**2015**, arXiv:1508.01991. [Google Scholar] - Yang, P.; Yang, Z.; Luo, L.; Lin, H.; Wang, J. An attention-based approach for chemical compound and drug named entity recognition. J. Comput. Res. Dev.
**2018**, 55, 1548–1556. [Google Scholar] - Li, L.; Guo, Y. Biomedical named entity recognition with CNN-BILSTM-CRF. J. Chin. Inf. Process.
**2018**, 32, 116–122. [Google Scholar] - Su, Y.; Liu, J.; Huang, Y. Entity Recognition Research in Online Medical Texts. Acta Sci. Nat. Univ. Pekin.
**2016**, 52, 1–9. [Google Scholar] - Qin, Q.; Zhao, S.; Liu, C. A BERT-BiGRU-CRF Model for Entity Recognition of Chinese Electronic Medical Records. Complexity
**2021**, 2021, 6631837. [Google Scholar] [CrossRef] - Ji, B.; Li, S.; Yu, J.; Ma, J.; Tang, J.; Wu, Q.; Tan, Y.; Liu, H.; Ji, Y. Research on Chinese medical named entity recognition based on collaborative cooperation of multiple neural network models. J. Biomed. Inform.
**2020**, 104, 103395. [Google Scholar] [CrossRef] - Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep learning-based text classification: A comprehensive review. ACM Comput. Surv. (CSUR)
**2021**, 54, 1–40. [Google Scholar] [CrossRef] - Ravuri, S.; Stolcke, A. A comparative study of recurrent neural network models for lexical domain classification. In Proceedings of the 41st IEEE International Conference on Acoustics, Speech, and Signal Processing, Shanghai, China, 20–25 March 2016; pp. 6075–6079. [Google Scholar]
- Zhang, S.; Grave, E.; Sklar, E.; Elhadad, N. Longitudinal analysis of discussion topics in an online breast cancer community using convolutional neural networks. J. Biomed. Inform.
**2017**, 69, 1–9. [Google Scholar] [CrossRef] [Green Version] - Yao, L.; Mao, C.; Luo, Y. Clinical text classification with rule-based features and knowledge-guided convolutional neural networks. BMC Med. Inform. Decis. Mak.
**2019**, 19, 31–39. [Google Scholar] [CrossRef] [Green Version] - Jang, B.; Kim, M.; Harerimana, G.; Kang, S.U.; Kim, J.W. Bi-LSTM model to increase accuracy in text classification: Combining Word2vec CNN and attention mechanism. Appl. Sci.
**2020**, 10, 5841. [Google Scholar] [CrossRef] - Zhang, Q.; Yuan, Q.; Lv, P.; Zhang, M.; Lv, L. Research on Medical Text Classification Based on Improved Capsule Network. Electronics
**2022**, 11, 2229. [Google Scholar] [CrossRef] - Zaib, M.; Sheng, Q.Z.; Emma Zhang, W. A short survey of pre-trained language models for conversational ai-a new age in nlp. In Proceedings of the Australasian Computer Science Week Multiconference, Melbourne, VIC, Australia, 4–6 February 2020; pp. 1–4. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186. [Google Scholar]
- Song, Z.; Xie, Y.; Huang, W.; Wang, H. Classification of traditional chinese medicine cases based on character-level bert and deep learning. In Proceedings of the 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 24–26 May 2019; pp. 1383–1387. [Google Scholar]
- Yao, L.; Jin, Z.; Mao, C.; Zhang, Y.; Luo, Y. Traditional Chinese medicine clinical records classification with BERT and domain specific corpora. J. Am. Med. Inform. Assoc.
**2019**, 26, 1632–1636. [Google Scholar] [CrossRef] - Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. Albert: A lite bert for self-supervised learning of language representations. arXiv
**2019**, arXiv:1909.11942. [Google Scholar] - Zhang, Z.; Jin, L. Clinical short text classification method based on ALBERT and GAT. In Proceedings of the 2022 7th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 15–17 April 2022; pp. 401–404. [Google Scholar]
- Yang, Q.; Shang, L. Multi-task learning with bidirectional language models for text classification. In Proceedings of the International Joint Conference on Neural Network (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8. [Google Scholar]
- Ruder, S. An overview of multi-task learning in deep neural networks. arXiv
**2017**, arXiv:1706.05098. [Google Scholar] - Liu, P.; Qiu, X.; Huang, X. Recurrent neural network for text classification with multi-task learning. arXiv
**2016**, arXiv:1605.05101. [Google Scholar] - Wu, Q.; Peng, D. MTL-BERT: A Multi-task Learning Model Utilizing Bert for Chinese Text. J. Chin. Comput. Syst.
**2021**, 42, 291–296. [Google Scholar] - Chowdhury, S.; Dong, X.; Qian, L.; Li, X.; Guan, Y.; Yang, J.; Yu, Q. A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records. BMC Bioinform.
**2018**, 19, 75–84. [Google Scholar] [CrossRef] [PubMed] [Green Version] - Zhao, S.; Liu, T.; Zhao, S.; Wang, F. A neural multi-task learning framework to jointly model medical named entity recognition and normalization. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 817–824. [Google Scholar] [CrossRef] [Green Version]
- Peng, Y.; Chen, Q.; Lu, Z. An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining. BioNLP
**2020**, 2020, 205. [Google Scholar] - Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language understanding. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; pp. 5753–5763. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv
**2014**, arXiv:1409.0473. [Google Scholar] - Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv
**2016**, arXiv:1607.06450. [Google Scholar] - He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput.
**1997**, 9, 1735–1780. [Google Scholar] [CrossRef] - Zhang, S.; Zhang, X.; Wang, H.; Cheng, J.; Li, P.; Ding, Z. Chinese medical question answer matching using end-to-end character-level multi-scale CNNs. Appl. Sci.
**2017**, 7, 767. [Google Scholar] [CrossRef] - Chen, N.; Su, X.; Liu, T.; Hao, Q.; Wei, M. A benchmark dataset and case study for Chinese medical question intent classification. BMC Med. Inform. Decis. Mak.
**2020**, 20, 125. [Google Scholar] [CrossRef]

Category | Quantity
---|---
Definition | 963
Symptom | 1215
Causes | 2177
Prevention | 1205
Treatment | 2506
Indications | 1572
Complications | 879

Method | P | R | F_{1}
---|---|---|---
BILSTM | 0.7515 | 0.7432 | 0.7473
BILSTM-CRF | 0.7578 | 0.7641 | 0.7606
ALBERT-CRF | 0.7869 | 0.7954 | 0.7911
ALBERT-BILSTM-CRF | 0.7926 | 0.8014 | 0.7970
MTL-ALBERT-BILSTM | 0.8103 | 0.8036 | 0.8069

Method | P | R | F_{1}
---|---|---|---
CNN | 0.8315 | 0.7921 | 0.8113
BILSTM | 0.8377 | 0.8086 | 0.8229
ALBERT | 0.8582 | 0.8391 | 0.8485
ALBERT-BILSTM | 0.8634 | 0.8473 | 0.8553
MTL-ALBERT-BILSTM | 0.8842 | 0.8654 | 0.8747


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Tohti, T.; Abdurxit, M.; Hamdulla, A.
Medical QA Oriented Multi-Task Learning Model for Question Intent Classification and Named Entity Recognition. *Information* **2022**, *13*, 581.
https://doi.org/10.3390/info13120581
