# Modeling Extractive Question Answering Using Encoder-Decoder Models with Constrained Decoding and Evaluation-Based Reinforcement Learning


## Abstract


## 1. Introduction

1. We solve the extractive QA task with an encoder-decoder model that generates all answer words jointly, enabling the model to use more information from the answers during training and to naturally output entire answers at inference time.
2. The proposed encoder-decoder extractive QA model uses evaluation-based reinforcement learning to enhance its performance. The experimental results show that the proposed model achieves better results than the baselines.

## 2. Background and Related Work

#### 2.1. Extractive Question Answering

#### 2.1.1. Independent Assumption for the Start and End Positions

#### 2.1.2. Greedy Search in the Multistep Decomposition

#### 2.1.3. Neural Network Models Used for Prediction

#### 2.2. Encoder-Decoder Models

#### 2.3. Reinforcement Learning for Encoder-Decoder Models

## 3. Methods

#### 3.1. Modeling the Whole Answer Span Using the Encoder-Decoder Model

#### 3.2. Constrained Decoding

**Algorithm 1: Constrained Decoding**

**Input:** question $Q$, document $D$, vocabulary $\mathcal{V}$; the decoder; trie tree $T$ and its functions $\mathrm{add}(T,\dots)$, $\mathrm{search}(T,\dots)$
**Output:** answer $\tilde{A}$

1: $\tilde{A}\leftarrow \{\}$, $T\leftarrow \Phi$, $i\leftarrow 0$, ${\tilde{y}}_{0}\leftarrow <start>$
2: **for** $k\leftarrow 1$ **to** $|D|$ **do** ▷ Initialize the trie tree $T$
3: $\quad\mathrm{add}(T,\{d_{k},d_{k+1},\dots,d_{|D|}\})$ ▷ Add the substring of $D$ starting with $d_{k}$ into $T$
4: **end for**
5: **while** ${\tilde{y}}_{i}\ne <end>$ **do**
6: $\quad\mathcal{V}_{\mathrm{c}}\leftarrow \{<end>\}$ ▷ Initialize the constrained vocabulary
7: $\quad\mathcal{P}\leftarrow \mathrm{search}(T,\tilde{A})$ ▷ Obtain the substrings starting with $\tilde{A}$
8: $\quad$**foreach** $\{p_{1},p_{2},\dots\}\in \mathcal{P}$ **do** ▷ Loop over each substring of $D$ starting with $\tilde{A}$
9: $\quad\quad P\leftarrow \{p_{1},p_{2},\dots\}-\tilde{A}$ ▷ Remove the prefix $\tilde{A}$ from the substring
10: $\quad\quad\mathcal{V}_{\mathrm{c}}\leftarrow \mathcal{V}_{\mathrm{c}}+P_{[1]}$ ▷ Add the first token $P_{[1]}$ of $P$ into $\mathcal{V}_{\mathrm{c}}$
11: $\quad$**end foreach**
12: $\quad i\leftarrow i+1$; ${\tilde{y}}_{i}\leftarrow \underset{w\in \mathcal{V}_{\mathrm{c}}}{\mathrm{argmax}}\ \mathrm{decoder}\left(w\mid {\tilde{y}}_{1},{\tilde{y}}_{2},\dots,{\tilde{y}}_{i-1}\right)$
13: $\quad\tilde{A}\leftarrow \tilde{A}+{\tilde{y}}_{i}$ ▷ Save the predicted word
14: **end while**
15: **return** $\tilde{A}$
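For concreteness, Algorithm 1 can be sketched in plain Python. The trie is a nested dict, and `decoder_scores` stands in for the real encoder-decoder's next-token scoring; its name and signature are assumptions for illustration, not the paper's API.

```python
# A minimal sketch of Algorithm 1 (greedy variant, no beam search).
# `decoder_scores(prefix, token)` is a stand-in for the trained decoder.

START, END = "<start>", "<end>"

def build_trie(document_tokens):
    """Insert every suffix of the document into a trie (lines 2-4)."""
    trie = {}
    for k in range(len(document_tokens)):
        node = trie
        for tok in document_tokens[k:]:
            node = node.setdefault(tok, {})
    return trie

def search(trie, prefix):
    """Return the trie node reached by `prefix`, or None (line 7)."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return None
        node = node[tok]
    return node

def constrained_decode(document_tokens, decoder_scores, max_len=32):
    trie = build_trie(document_tokens)
    answer = []
    while len(answer) < max_len:
        # Lines 6-11: the constrained vocabulary is <end> plus every token
        # that can legally extend the current answer prefix inside D.
        node = search(trie, answer)
        allowed = {END} | set(node.keys() if node else [])
        # Line 12: greedy argmax over the constrained vocabulary only.
        best = max(allowed, key=lambda w: decoder_scores(answer, w))
        if best == END:
            break
        answer.append(best)
    return answer
```

Because every candidate token is drawn from the trie of document suffixes, any generated answer is guaranteed to be a span of the document, which is what drives the #Out of Document column to zero for the constrained variants.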

#### 3.3. Evaluation-Based Reinforcement Learning
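The idea named in this section can be illustrated with a minimal sketch: sample $k$ answers from the decoder, score each with an evaluation metric (token-level F1 here), and weight each sequence's log-probability by its advantage over a baseline. The self-critical mean-reward baseline and the exact loss form below are assumptions for illustration; the paper's Equation (14) may differ in detail.

```python
# A rough sketch of evaluation-based reinforcement learning
# (REINFORCE-style, with the mean sample reward as baseline).

def token_f1(prediction, reference):
    """Token-overlap F1, the reward used by the RL\\w F1 variant."""
    common = 0
    remaining = list(reference)
    for tok in prediction:
        if tok in remaining:
            remaining.remove(tok)
            common += 1
    if common == 0:
        return 0.0
    p = common / len(prediction)
    r = common / len(reference)
    return 2 * p * r / (p + r)

def rl_loss(samples, log_probs, reference):
    """samples: k sampled answers; log_probs: their sequence log-probabilities.
    Sequences rewarded above the baseline are pushed up, and vice versa."""
    rewards = [token_f1(s, reference) for s in samples]
    baseline = sum(rewards) / len(rewards)
    return -sum((r - baseline) * lp for r, lp in zip(rewards, log_probs))
```

In practice the log-probabilities come from the decoder and the loss is backpropagated through them; swapping `token_f1` for EM&F1 or ROUGE-L gives the other reward variants reported in Section 4.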

## 4. Results and Discussion

#### 4.1. Experiment Settings

#### 4.2. Main Results

1. **BiDAF [12]:** a classical extractive QA model that uses bidirectional attention flow (question-to-document and document-to-question attention) to enrich the representation of words. BiDAF predicts the answer's start and end positions independently from these representations.
2. **BiDAF\w compound (best) [14]:** a BiDAF variant trained with the compound objective, which jointly predicts the start and end positions.
3. **DCN [63]:** locates answer spans by iteratively predicting the start and end positions to escape initial local maxima, which may lead to wrong answers.
4. **DCN+ [50]:** introduces reinforcement learning techniques to directly optimize the F1 metric for extractive QA.
5. **R.M-Reader [51]:** a memory-based model that uses reinforcement learning with a reward function refined for better coverage.
6. **BERT-base [8]:** an extractive QA model based on a powerful pretrained language model. We downloaded the model from https://huggingface.co/csarron/bert-base-uncased-squad-v1/tree/main (accessed on 20 February 2023) and evaluated it locally.
7. **BERT-base\w compound (best) [14]:** jointly predicts the start and end positions. It is the counterpart of Model 2, **BiDAF\w compound (best)**, built on BERT-base.
8. **BART-base:** directly trains a BART-base model to generate the whole answer based on the question and document.

No. | Model | EM | F1 | #Out of Document
---|---|---|---|---
1 | BiDAF [12] | 66.16 | 76.19 | 0
2 | BiDAF\w compound (best) [14] | 66.96 | 75.90 | 0
3 | DCN [63] | 65.4 | 75.6 | 0
4 | DCN+ [50] | 74.5 | 83.1 | 0
5 | R.M-Reader [51] | 78.9 | 86.3 | 0
6 | BERT-base [8] | 80.92 | 88.24 | 0
7 | BERT-base\w compound (best) [14] | 81.83 | 88.52 | 0
8 | BART-base | 78.10 | 87.17 | 410
9 | BART-base\w Constrained | 79.80 | 88.05 | 0
10 | RL\w EM&F1 | 78.37 | 87.87 | 329
11 | RL\w EM&F1 Constrained | 79.84 | 88.39 | 0
12 | RL\w F1 | 78.83 | 88.04 | 310
13 | RL\w F1 Constrained | 80.02 | 88.54 | 0
14 | RL\w ROUGE-L | 78.27 | 87.44 | 304
15 | RL\w ROUGE-L Constrained | 79.39 | 87.97 | 0
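The EM and #Out of Document columns can be reproduced with a few lines. The sketch below uses a simplified version of the official SQuAD answer normalization (lowercasing, punctuation and article removal, whitespace collapsing); edge cases handled by the official evaluation script are omitted.

```python
import re
import string

def normalize(text):
    """Simplified SQuAD-style normalization: lowercase, drop punctuation
    and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """EM: 1.0 iff the normalized strings are identical."""
    return float(normalize(prediction) == normalize(reference))

def out_of_document(prediction, document):
    """True when a generated answer is not a span of the document --
    the failure mode that constrained decoding eliminates."""
    return normalize(prediction) not in normalize(document)
```

Counting `out_of_document` over the test set yields the #Out of Document column: the unconstrained generative models produce hundreds of such answers, while every constrained variant produces none.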

#### 4.3. Case Study and Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Gupta, P.; Gupta, V. A Survey of Text Question Answering Techniques. Int. J. Comput. Appl. **2012**, 53, 1–8.
2. Wang, A.; Singh, A.; Michael, J.; Hill, F.; Levy, O.; Bowman, S.R. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, 1 November 2018; pp. 353–355.
3. Mitra, B.; Craswell, N. An Introduction to Neural Information Retrieval. Found. Trends Inf. Retr. **2018**, 13, 1–126.
4. Bowman, S.R.; Angeli, G.; Potts, C.; Manning, C.D. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 632–642.
5. Hendrycks, D.; Burns, C.; Basart, S.; Zou, A.; Mazeika, M.; Song, D.; Steinhardt, J. Measuring Massive Multitask Language Understanding. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, 3–7 May 2021.
6. Fan, A.; Jernite, Y.; Perez, E.; Grangier, D.; Weston, J.; Auli, M. ELI5: Long Form Question Answering. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, 28 July–2 August 2019; Volume 1: Long Papers, pp. 3558–3567.
7. Wang, L.; Zheng, K.; Qian, L.; Li, S. A Survey of Extractive Question Answering. In Proceedings of the 2022 International Conference on High Performance Big Data and Intelligent Systems (HDIS), Tianjin, China, 10–11 December 2022; pp. 147–153.
8. Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 4171–4186.
9. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020.
10. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv **2019**, arXiv:1907.11692.
11. Yamada, I.; Asai, A.; Shindo, H.; Takeda, H.; Matsumoto, Y. LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, 16–20 November 2020; pp. 6442–6454.
12. Seo, M.J.; Kembhavi, A.; Farhadi, A.; Hajishirzi, H. Bidirectional Attention Flow for Machine Comprehension. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017.
13. Yu, A.W.; Dohan, D.; Luong, M.; Zhao, R.; Chen, K.; Norouzi, M.; Le, Q.V. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018.
14. Fajcik, M.; Jon, J.; Smrz, P. Rethinking the Objectives of Extractive Question Answering. In Proceedings of the 3rd Workshop on Machine Reading for Question Answering, Online, 10 November 2021; pp. 14–27.
15. Chen, D. Neural Reading Comprehension and Beyond. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2018.
16. Liu, S.; Zhang, X.; Zhang, S.; Wang, H.; Zhang, W. Neural Machine Reading Comprehension: Methods and Trends. Appl. Sci. **2019**, 9, 3698.
17. Lee, K.; Kwiatkowski, T.; Parikh, A.P.; Das, D. Learning Recurrent Span Representations for Extractive Question Answering. arXiv **2016**, arXiv:1611.01436.
18. Lee, J.; Sung, M.; Kang, J.; Chen, D. Learning Dense Representations of Phrases at Scale. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 5–6 August 2021; Volume 1: Long Papers.
19. Rajpurkar, P.; Zhang, J.; Lopyrev, K.; Liang, P. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–4 November 2016.
20. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.G.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the Advances in Neural Information Processing Systems 32, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 5754–5764.
21. Clark, K.; Luong, M.; Le, Q.V.; Manning, C.D. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020.
22. Dahlmeier, D.; Ng, H.T. A Beam-Search Decoder for Grammatical Error Correction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, Jeju Island, Republic of Korea, 12–14 July 2012; pp. 568–578.
23. Wang, S.; Jiang, J. Machine Comprehension Using Match-LSTM and Answer Pointer. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017.
24. Richard, M.D.; Lippmann, R.P. Neural Network Classifiers Estimate Bayesian a posteriori Probabilities. Neural Comput. **1991**, 3, 461–483.
25. Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Technical Report, 1961. Available online: https://apps.dtic.mil/sti/pdfs/AD0256582.pdf (accessed on 20 February 2023).
26. Salehinejad, H.; Baarbe, J.; Sankar, S.; Barfett, J.; Colak, E.; Valaee, S. Recent Advances in Recurrent Neural Networks. arXiv **2018**, arXiv:1801.01078.
27. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. **1997**, 9, 1735–1780.
28. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. **2022**, 33, 6999–7019.
29. Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y.N. Convolutional Sequence to Sequence Learning. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017; Volume 70, pp. 1243–1252.
30. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems 27, Montreal, QC, Canada, 8–13 December 2014; pp. 3104–3112.
31. Cho, K.; van Merrienboer, B.; Gülçehre, Ç.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar, 25–29 October 2014; pp. 1724–1734.
32. Wu, Y.; Wu, W.; Yang, D.; Xu, C.; Li, Z. Neural Response Generation With Dynamic Vocabularies. Proc. AAAI Conf. Artif. Intell. **2018**, 32, 5594–5601.
33. Li, S.; Sun, C.; Xu, Z.; Tiwari, P.; Liu, B.; Gupta, D.; Shankar, K.; Ji, Z.; Wang, M. Toward Explainable Dialogue System Using Two-stage Response Generation. ACM Trans. Asian Low-Resour. Lang. Inf. Process. **2023**, 22, 1–18.
34. He, X.; Haffari, G.; Norouzi, M. Sequence to Sequence Mixture Model for Diverse Machine Translation. In Proceedings of the 22nd Conference on Computational Natural Language Learning, CoNLL 2018, Brussels, Belgium, 31 October–1 November 2018; pp. 583–592.
35. Neubig, G. Neural Machine Translation and Sequence-to-sequence Models: A Tutorial. arXiv **2017**, arXiv:1703.01619.
36. Nallapati, R.; Zhou, B.; dos Santos, C.N.; Gülçehre, Ç.; Xiang, B. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, CoNLL 2016, Berlin, Germany, 11–12 August 2016; pp. 280–290.
37. See, A.; Liu, P.J.; Manning, C.D. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1: Long Papers, pp. 1073–1083.
38. Xie, X.; Zhang, N.; Li, Z.; Deng, S.; Chen, H.; Xiong, F.; Chen, M.; Chen, H. From Discrimination to Generation: Knowledge Graph Completion with Generative Transformer. In Proceedings of the Companion of The Web Conference 2022, Virtual Event/Lyon, France, 25–29 April 2022; pp. 162–165.
39. Ye, H.; Zhang, N.; Chen, H.; Chen, H. Generative Knowledge Graph Construction: A Review. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 1–17.
40. Pascual, D.; Egressy, B.; Bolli, F.; Wattenhofer, R. Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation. arXiv **2020**, arXiv:2012.15416.
41. Vijayakumar, A.K.; Cogswell, M.; Selvaraju, R.R.; Sun, Q.; Lee, S.; Crandall, D.J.; Batra, D. Diverse Beam Search for Improved Description of Complex Scenes. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, AAAI-18, New Orleans, LA, USA, 2–7 February 2018; pp. 7371–7379.
42. Sutton, R.; Barto, A. Reinforcement Learning: An Introduction. IEEE Trans. Neural Netw. **1998**, 9, 1054.
43. Keneshloo, Y.; Shi, T.; Ramakrishnan, N.; Reddy, C.K. Deep Reinforcement Learning for Sequence-to-Sequence Models. IEEE Trans. Neural Netw. Learn. Syst. **2020**, 31, 2469–2489.
44. Papineni, K.; Roukos, S.; Ward, T.; Zhu, W. Bleu: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 6–12 July 2002; pp. 311–318.
45. Lin, C.Y.; Hovy, E. Manual and automatic evaluation of summaries. In Proceedings of the ACL-02 Workshop on Automatic Summarization, Philadelphia, PA, USA, 11–12 July 2002.
46. Li, J.; Galley, M.; Brockett, C.; Gao, J.; Dolan, B. A Diversity-Promoting Objective Function for Neural Conversation Models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2016, San Diego, CA, USA, 12–17 June 2016; pp. 110–119.
47. Ranzato, M.; Chopra, S.; Auli, M.; Zaremba, W. Sequence Level Training with Recurrent Neural Networks. In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016.
48. Chen, Y.; Bansal, M. Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, 15–20 July 2018; Volume 1: Long Papers, pp. 675–686.
49. Li, X.; Huang, Z.; Liu, F.; Wang, C.; Hu, M.; Xu, S.; Peng, Y. RAD: Reinforced Attention Decoder Model On Question Generation. In Proceedings of the 2020 International Joint Conference on Neural Networks, IJCNN 2020, Glasgow, UK, 19–24 July 2020; pp. 1–8.
50. Xiong, C.; Zhong, V.; Socher, R. DCN+: Mixed Objective And Deep Residual Coattention for Question Answering. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018.
51. Hu, M.; Peng, Y.; Huang, Z.; Qiu, X.; Wei, F.; Zhou, M. Reinforced Mnemonic Reader for Machine Reading Comprehension. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, 13–19 July 2018; pp. 4099–4106.
52. Saxena, A.; Kochsiek, A.; Gemulla, R. Sequence-to-Sequence Knowledge Graph Completion and Question Answering. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022, Dublin, Ireland, 22–27 May 2022; Volume 1: Long Papers, pp. 2814–2828.
53. Ngai, E.W.; Lee, M.C.; Luo, M.; Chan, P.S.; Liang, T. An intelligent knowledge-based chatbot for customer service. Electron. Commer. Res. Appl. **2021**, 50, 101098.
54. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 5–10 July 2020; pp. 7871–7880.
55. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
56. Hokamp, C.; Liu, Q. Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1: Long Papers, pp. 1535–1546.
57. Post, M.; Vilar, D. Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, LA, USA, 1–6 June 2018; Volume 1 (Long Papers), pp. 1314–1324.
58. McClellan, M.T.; Minker, J.; Knuth, D.E. The Art of Computer Programming, Vol. 3: Sorting and Searching. Math. Comput. **1974**, 28, 1175.
59. Jankowski, R. Advanced Data Structures by Peter Brass, Cambridge University Press 2008. ACM SIGACT News **2010**, 41, 19–20.
60. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models Are Unsupervised Multitask Learners. 2019. Available online: https://paperswithcode.com/paper/language-models-are-unsupervised-multitask (accessed on 20 February 2023).
61. Maier, D. The Complexity of Some Problems on Subsequences and Supersequences. J. ACM **1978**, 25, 322–336.
62. Bergroth, L.; Hakonen, H.; Raita, T. A Survey of Longest Common Subsequence Algorithms. In Proceedings of the Seventh International Symposium on String Processing and Information Retrieval, SPIRE 2000, A Coruña, Spain, 27–29 September 2000; pp. 39–48.
63. Xiong, C.; Zhong, V.; Socher, R. Dynamic Coattention Networks For Question Answering. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017.

**Figure 1.** An extractive question-answering system takes a question and a document as the input and extracts a span from the document as the output answer.

**Figure 2.** An encoder-decoder model autoregressively generates an output word sequence based on the input. <$start$> and <$end$> are the special tokens representing the generation's start and end.

**Figure 3.** The comparison between the proposed model and a baseline model. The input question is "Which NFL team represented the NFC at Super Bowl 50?". The answer is "Santa Clara California". (**a**) The encoder-decoder model is used to solve the extractive QA task, taking advantage of all words in the answer. (**b**) The baseline extractive QA models use the start and end words only.

**Figure 4.** The average length of the answers predicted by the models. (**a**) The number of words in the answers. (**b**) The number of characters in the answers.

**Figure 5.** The F1 scores of the predictions as the answers get longer. (**a**) Grouped by number of words. (**b**) Grouped by number of characters.

**Table 1.** Samples from the SQuAD dataset. An extractive QA model needs to understand the natural language question and the evidence in the document to find the answer span from the document ${}^{*}$.

NO. | Question | Document | Answer
---|---|---|---
1 | Which NFL team represented the AFC at Super Bowl 50? | Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24-10 to earn their third Super Bowl title... | Denver Broncos
2 | Who was in charge of the papal army in the War of Barbastro? | The legendary religious zeal of the Normans was exercised in religious wars long before the First Crusade carved out a Norman principality in Antioch. They were major foreign participants in the Reconquista in Iberia. In 1018, Roger de Tosny traveled to the Iberian Peninsula to carve out a state for himself from Moorish lands, but he failed. In 1064, during the War of Barbastro, William of Montreuil led the papal army... | William of Montreuil

Hyperparameter | Value | Description
---|---|---
Batch size | 32 | Number of samples in each batch
Learning rate (LR) | $5\times {10}^{-5}$ | Coefficient for updating the parameters
LR scheduler | Linear warmup | Tunes the LR as the training step increases ${}^{1}$
LR warmup steps | 500 | The parameter for the LR scheduler
Optimizer | AdamW | AdamW optimizer provided by PyTorch ${}^{2}$
Weight decay | 0.01 | Coefficient for scaling the parameters down
Betas | 0.9, 0.999 | Coefficients used for computing running averages of the gradient and its square ${}^{3}$
k | 4 | Number of sampled sequences in Equation (14)
Beam size | 4 | The number of beams for beam search

^{1}https://huggingface.co/docs/transformers/v4.26.1/en/main_classes/optimizer_schedules#transformers.get_linear_schedule_with_warmup, accessed on 20 February 2023.

^{2}https://pytorch.org/, accessed on 20 February 2023.

^{3}https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html, accessed on 20 February 2023.
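For clarity, the LR schedule listed above (linear warmup for 500 steps followed by linear decay) can be written as a pure function. It mirrors the behavior of `transformers.get_linear_schedule_with_warmup`; `total_steps` is an assumed placeholder, since the actual value depends on the dataset size and number of epochs.

```python
def linear_warmup_lr(step, base_lr=5e-5, warmup_steps=500, total_steps=10_000):
    """LR at a given step: ramp linearly from 0 to base_lr over
    warmup_steps, then decay linearly back to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

In training, this per-step factor is what the scheduler applies on top of the AdamW optimizer configured with the betas and weight decay from the table.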

Model | EM | F1 | Beam Size
---|---|---|---
BART-base | 77.83 | 87.15 | 1
 | 78.10 | 87.17 | 4
 | 77.98 | 87.18 | 16
BART-base RL\w EM&F1 | 77.92 | 87.51 | 1
 | 78.37 | 87.87 | 4
 | 78.09 | 87.81 | 16
BART-base RL\w F1 | 78.43 | 87.77 | 1
 | 78.83 | 88.04 | 4
 | 78.61 | 88.01 | 16
BART-base RL\w ROUGE-L | 77.77 | 87.08 | 1
 | 78.27 | 87.44 | 4
 | 78.14 | 87.41 | 16
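The ROUGE-L reward used by the RL\w ROUGE-L variants reduces to a longest-common-subsequence computation over tokens. A minimal sketch is below; it uses an equal precision/recall weighting (beta = 1), which may differ from the exact weighting used in the experiments.

```python
def lcs_length(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(prediction, reference):
    """ROUGE-L F-measure over token sequences."""
    lcs = lcs_length(prediction, reference)
    if lcs == 0:
        return 0.0
    p = lcs / len(prediction)
    r = lcs / len(reference)
    return 2 * p * r / (p + r)
```

Unlike token F1, ROUGE-L is order-sensitive, which makes it a slightly different training signal; the tables above show that the F1 reward transfers best to the F1-based evaluation.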

**Table 5.** A case study for BERT-base (Baseline) and BART-base RL\w F1 (Our Model) on the SQuAD dataset ${}^{*}$.

NO. | Question | Document | Predictions
---|---|---|---
1 | Who did the Normans team up with in Anatolia? | Some Normans joined Turkish forces to aid in the destruction of the Armenians vassal-states of Sassoun and Taron in far eastern Anatolia. Later, many took up service with... | Baseline: Armenians. Our Model: Turkish forces
2 | What month, day, and year did the Super Bowl 50 take place? | Super Bowl 50 was an American football game used to determine the champion of the National Football League... The game was played on 7 February 2016 at Levi's Stadium in the... | Baseline: February. Our Model: 7 February 2016


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, S.; Sun, C.; Liu, B.; Liu, Y.; Ji, Z. Modeling Extractive Question Answering Using Encoder-Decoder Models with Constrained Decoding and Evaluation-Based Reinforcement Learning. *Mathematics* **2023**, *11*, 1624.
https://doi.org/10.3390/math11071624
