Natural Language Processing (NLP) and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 October 2023) | Viewed by 134915

Special Issue Editors


Prof. Dr. Gui-Lin Qi, Guest Editor
School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
Interests: natural language processing; knowledge graph; multimodal learning

Dr. Tong Xu, Guest Editor
Lab of Big Data Analysis and Application, University of Science and Technology of China, Hefei 230027, China
Interests: natural language processing; social media analysis; multimodal intelligence

Dr. Meng Wang, Guest Editor
School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
Interests: natural language processing; knowledge graph; multimodal learning

Special Issue Information

Dear Colleagues,

Natural Language Processing (NLP) is a key technology of artificial intelligence. In recent years, many highly influential NLP models have emerged, such as BERT and GPT-3. They already power a wide range of applications that we experience daily, such as question answering, machine translation, and smart assistants. They are also crucial to a wide range of other research topics, such as biomedical information processing, knowledge graphs, and multimodal intelligence. However, numerous theoretical and technological problems remain unsolved and await further research. This Special Issue aims to address these challenges by inviting scholarly contributions covering recent advances in NLP and its applications. We welcome original research articles reporting the development of novel NLP models and algorithms, as well as papers presenting novel NLP applications.

Prof. Dr. Gui-Lin Qi
Dr. Tong Xu
Dr. Meng Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language understanding
  • natural language generation
  • question answering
  • machine translation
  • knowledge graph
  • NLP for knowledge extraction
  • NLP for multimodal intelligence
  • NLP applications in specific domains, like life sciences, health, and medicine
  • eGovernment and public administration
  • news and social media

Published Papers (79 papers)

Research

15 pages, 867 KiB  
Article
An Efficient Document Retrieval for Korean Open-Domain Question Answering Based on ColBERT
by Byungha Kang, Yeonghwa Kim and Youhyun Shin
Appl. Sci. 2023, 13(24), 13177; https://doi.org/10.3390/app132413177 - 12 Dec 2023
Cited by 1 | Viewed by 1057
Abstract
Open-domain question answering requires retrieving documents highly relevant to the query from a large-scale corpus. Deep learning-based dense retrieval methods have become the primary approach for finding related documents. Although deep learning-based methods have improved search accuracy compared to traditional techniques, they simultaneously impose a considerable increase in computational burden. Consequently, research on efficient models and methods that optimize the trade-off between search accuracy and time to alleviate computational demands is required. In this paper, we propose a Korean document retrieval method utilizing ColBERT’s late interaction paradigm to efficiently calculate the relevance between questions and documents. For open-domain Korean question answering document retrieval, we construct a Korean dataset using various corpora from AI-Hub. We conduct experiments comparing the search accuracy and inference time among the traditional IR (information retrieval) model BM25, the dense retrieval approach utilizing BERT-based models for Korean, and our proposed method. The experimental results demonstrate that our approach achieves higher accuracy than BM25 and requires less search time than the dense retrieval method employing KoBERT. Moreover, the best performance is observed when using KoSBERT, a pre-trained Korean language model that learned to position semantically similar sentences closely in vector space. Full article
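
The late-interaction scoring this paper builds on is compact enough to sketch directly. Below is a minimal NumPy illustration of ColBERT-style MaxSim relevance; the random matrices stand in for per-token encoder outputs, and `maxsim_score` is our name for the function, not the paper's code.

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token embedding,
    take its maximum cosine similarity over all document token
    embeddings, then sum over query tokens."""
    # L2-normalize rows so dot products become cosine similarities.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T                         # (num_q_tokens, num_d_tokens)
    return float(sim.max(axis=1).sum())   # MaxSim per query token, summed

# Toy example: 4 query tokens, two documents of 6 tokens each.
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))
docs = [rng.normal(size=(6, 128)) for _ in range(2)]
ranked = sorted(range(len(docs)), key=lambda i: -maxsim_score(query, docs[i]))
print("ranking:", ranked)
```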

14 pages, 6671 KiB  
Article
Neural Machine Translation Research on Syntactic Information Fusion Based on the Field of Electrical Engineering
by Yanna Sang, Yuan Chen and Juwei Zhang
Appl. Sci. 2023, 13(23), 12905; https://doi.org/10.3390/app132312905 - 01 Dec 2023
Viewed by 732
Abstract
Neural machine translation has achieved good translation results, but needs further improvement in low-resource and domain-specific translation. To this end, the paper proposed to incorporate source language syntactic information into neural machine translation models. Two novel approaches, namely Contrastive Language–Image Pre-training (CLIP) and Cross-attention Fusion (CAF), were compared to a base transformer model on EN–ZH and ZH–EN pair machine translation focusing on the electrical engineering domain. In addition, an ablation study on the effect of both proposed methods was presented. Among them, the CLIP pre-training method improved significantly compared with the baseline system, and the BLEU values in the EN–ZH and ZH–EN tasks increased by 3.37 and 3.18 percentage points, respectively. Full article

17 pages, 402 KiB  
Article
Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages
by Ephrem Afele Retta, Richard Sutcliffe, Jabar Mahmood, Michael Abebe Berwo, Eiad Almekhlafi, Sajjad Ahmad Khan, Shehzad Ashraf Chaudhry, Mustafa Mhamed and Jun Feng
Appl. Sci. 2023, 13(23), 12587; https://doi.org/10.3390/app132312587 - 22 Nov 2023
Cited by 1 | Viewed by 690
Abstract
In a conventional speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language do not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German, and Urdu. For Amharic, we use our own publicly available Amharic Speech Emotion Dataset (ASED). For English, German and Urdu, we use the existing RAVDESS, EMO-DB, and URDU datasets. We followed previous research in mapping labels for all of the datasets to just two classes: positive and negative. Thus, we can compare performance on different languages directly and combine languages for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers, AlexNet, VGGE (a proposed variant of VGG), and ResNet50. The results, averaged for the three models, were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult. By the same comparison, German SER is more difficult, and Urdu SER is easier. In Experiment 2, we trained on one language and tested on another, in both directions for each of the following pairs: Amharic↔German, Amharic↔English, and Amharic↔Urdu. The results with Amharic as the target suggested that using English or German as the source gives the best result. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percentage points greater than the best accuracy in Experiment 2, suggesting that a better result can be obtained when using two or three non-Amharic languages for training than when using just one non-Amharic language. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for training an SER classifier when resources for a language are scarce. Full article

15 pages, 517 KiB  
Article
Prompt Language Learner with Trigger Generation for Dialogue Relation Extraction
by Jinsung Kim, Gyeongmin Kim, Junyoung Son and Heuiseok Lim
Appl. Sci. 2023, 13(22), 12414; https://doi.org/10.3390/app132212414 - 16 Nov 2023
Cited by 1 | Viewed by 685
Abstract
Dialogue relation extraction identifies semantic relations between entity pairs in dialogues. This research explores a methodology harnessing the potential of prompt-based fine-tuning paired with a trigger-generation approach. Capitalizing on the intrinsic knowledge of pre-trained language models, this strategy employs triggers that underline the relation between entities decisively. In particular, diverging from the conventional extractive methods seen in earlier research, our study leans towards a generative manner for trigger generation. The dialogue-based relation extraction (DialogRE) benchmark dataset features multi-utterance environments of colloquial speech by multiple speakers, making it critical to capture meaningful clues for inferring relational facts. In the benchmark, empirical results reveal significant performance boosts in few-shot scenarios, where the availability of examples is notably limited. Nevertheless, the scarcity of ground-truth triggers for training hints at potential further refinements in the trigger-generation module, especially when ample examples are present. When evaluating the challenges of dialogue relation extraction, combining prompt-based learning with trigger generation offers pronounced improvements in both full-shot and few-shot scenarios. Specifically, integrating a meticulously crafted manual initialization method with the prompt-based model (considering prior distributional insights and relation class semantics) substantially surpasses the baseline. However, further advancements in trigger generation are warranted, especially in data-abundant contexts, to maximize performance enhancements. Full article

13 pages, 446 KiB  
Article
An Improved Nested Named-Entity Recognition Model for Subject Recognition Task under Knowledge Base Question Answering
by Ziming Wang, Xirong Xu, Xinzi Li, Haochen Li, Xiaopeng Wei and Degen Huang
Appl. Sci. 2023, 13(20), 11249; https://doi.org/10.3390/app132011249 - 13 Oct 2023
Cited by 2 | Viewed by 796
Abstract
In the subject recognition (SR) task under Knowledge Base Question Answering (KBQA), a common method is to train and employ a general flat Named-Entity Recognition (NER) model. However, it is not effective and robust enough in cases where the recognized entity cannot be strictly matched to any subject in the Knowledge Base (KB). Compared to flat NER models, nested NER models show more flexibility and robustness in general NER tasks, but it is difficult to employ a nested NER model directly in an SR task. In this paper, we take advantage of the features of a nested NER model and propose an Improved Nested NER Model (INNM) for the SR task under KBQA. In our model, each question token is labeled as either an entity token, a start token, or an end token by a modified nested NER model based on semantics. Then, entity candidates are generated based on such labels, and an approximate matching strategy is employed to score all subjects in the KB based on string similarity to find the best-matched subject. Experimental results show that our model is effective and robust to both single-relation questions and complex questions, outperforming the baseline flat NER model by a margin of 3.3% accuracy on the SimpleQuestions dataset and a margin of 11.0% accuracy on the WebQuestionsSP dataset. Full article
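
The approximate matching step can be sketched roughly as follows. The paper's exact similarity measure is not given here, so the standard-library `difflib` ratio is used as a stand-in, and the KB subjects are invented examples.

```python
from difflib import SequenceMatcher

def best_subject(candidates, kb_subjects):
    """Score every KB subject against each entity candidate by string
    similarity and return the best-matched subject, so near-miss
    recognitions still resolve to a real KB entry."""
    best, best_score = None, -1.0
    for cand in candidates:
        for subj in kb_subjects:
            score = SequenceMatcher(None, cand.lower(), subj.lower()).ratio()
            if score > best_score:
                best, best_score = subj, score
    return best, best_score

kb = ["New York City", "New York Times", "York"]
print(best_subject(["new york cty"], kb))  # -> ('New York City', ...)
```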

15 pages, 967 KiB  
Article
A Scientific Document Retrieval and Reordering Method by Incorporating HFS and LSD
by Ziyang Feng and Xuedong Tian
Appl. Sci. 2023, 13(20), 11207; https://doi.org/10.3390/app132011207 - 12 Oct 2023
Viewed by 637
Abstract
Achieving scientific document retrieval by considering the wealth of mathematical expressions and the semantic text they contain has become an inescapable trend. Current scientific document matching models focus solely on the textual features of expressions and frequently suffer from excessive parameter counts and slow inference in the pursuit of improved performance. To solve this problem, this paper proposes a scientific document retrieval method founded upon hesitant fuzzy sets (HFS) and local semantic distillation (LSD). Concretely, in order to extract both spatial and semantic features for each symbol within a mathematical expression, this paper introduces an expression analysis module that leverages HFS to establish feature indices. Secondly, to enhance contextual semantic alignment, knowledge distillation is employed to refine the pretrained language model and establish a twin network for semantic matching. Lastly, by amalgamating mathematical expressions with contextual semantic features, the retrieval results can be made more efficient and reasonable. Experiments were implemented on the NTCIR dataset and the expanded Chinese dataset. The average MAP for mathematical expression retrieval results was 83.0%, and the average nDCG for sorting scientific documents was 85.8%. Full article

15 pages, 412 KiB  
Article
Assessment of Parent–Child Interaction Quality from Dyadic Dialogue
by Chaohao Lin, Ou Bai, Jennifer Piscitello, Emily L. Robertson, Brittany Merrill, Kellina Lupas and William E. Pelham, Jr.
Appl. Sci. 2023, 13(20), 11129; https://doi.org/10.3390/app132011129 - 10 Oct 2023
Viewed by 955
Abstract
The quality of parent–child interaction is critical for child cognitive development. The Dyadic Parent–Child Interaction Coding System (DPICS) is commonly used to assess parent and child behaviors. However, manual annotation of DPICS codes by parent–child interaction therapists is a time-consuming task. To assist therapists in the coding task, researchers have begun to explore the use of artificial intelligence in natural language processing to classify DPICS codes automatically. In this study, we utilized datasets from the DPICS book manual, five families, and an open-source PCIT dataset. To train DPICS code classifiers, we employed the pre-trained RoBERTa model, fine-tuned for this task, as our learning algorithm. Our study shows that fine-tuning the pre-trained RoBERTa model achieves the highest results compared to other methods in sentence-based DPICS code classification assignments. For the DPICS manual dataset, the overall accuracy was 72.3% (72.2% macro-precision, 70.5% macro-recall, and 69.6% macro-F-score). Meanwhile, for the PCIT dataset, the overall accuracy was 79.8% (80.4% macro-precision, 79.7% macro-recall, and 79.8% macro-F-score), surpassing the previous highest results of 78.3% accuracy (79% precision, 77% recall) averaged over the eight DPICS classes. These results show that fine-tuning the pre-trained RoBERTa model could provide valuable assistance to experts in the labeling process. Full article
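
A minimal sketch of the sentence-level fine-tuning setup with the Hugging Face `transformers` API. The example utterances and integer label codes are illustrative; the eight-class head follows the eight DPICS classes mentioned in the abstract.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=8)  # one output per DPICS code

sentences = ["You did a great job building that tower!",
             "Put the blocks away now."]
labels = torch.tensor([0, 1])      # hypothetical integer-encoded DPICS codes

batch = tokenizer(sentences, padding=True, truncation=True,
                  return_tensors="pt")
outputs = model(**batch, labels=labels)  # cross-entropy loss is included
outputs.loss.backward()                  # one fine-tuning step (optimizer omitted)
print(outputs.logits.argmax(dim=-1))
```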

14 pages, 278 KiB  
Article
Authorship Attribution on Short Texts in the Slovenian Language
by Gregor Gabrovšek, Peter Peer, Žiga Emeršič and Borut Batagelj
Appl. Sci. 2023, 13(19), 10965; https://doi.org/10.3390/app131910965 - 04 Oct 2023
Viewed by 913
Abstract
The study investigates the task of authorship attribution on short texts in Slovenian using the BERT language model. Authorship attribution is the task of attributing a written text to its author, frequently using stylometry or computational techniques. We create five custom datasets for different numbers of included text authors and fine-tune two BERT models, SloBERTa and BERT Multilingual (mBERT), to evaluate their performance in closed-class and open-class problems with varying numbers of authors. Our models achieved an F1 score of approximately 0.95 when using the dataset with the comments of the top five users by the number of written comments. Training on datasets that include comments written by an increasing number of people results in models with a gradually decreasing F1 score. Including out-of-class comments in the evaluation decreases the F1 score by approximately 0.05. The study demonstrates the feasibility of using BERT models for authorship attribution in short texts in the Slovenian language. Full article

15 pages, 909 KiB  
Article
A Chinese–Kazakh Translation Method That Combines Data Augmentation and R-Drop Regularization
by Canglan Liu, Wushouer Silamu and Yanbing Li
Appl. Sci. 2023, 13(19), 10589; https://doi.org/10.3390/app131910589 - 22 Sep 2023
Viewed by 810
Abstract
Low-resource languages often face the problem of insufficient data, which leads to poor quality in machine translation. One approach to address this issue is data augmentation. Data augmentation involves creating new data by transforming existing data through methods such as flipping, cropping, rotating, and adding noise. Traditionally, pseudo-parallel corpora are generated by randomly replacing words in low-resource language machine translation. However, this method can introduce ambiguity, as the same word may have different meanings in different contexts. This study proposes a new approach for low-resource language machine translation, which involves generating pseudo-parallel corpora by replacing phrases. The performance of this approach is compared with other data augmentation methods, and it is observed that combining it with other data augmentation methods further improves performance. To enhance the robustness of the model, R-Drop regularization is also used. R-Drop is an effective method for improving the quality of machine translation. The proposed method was tested on Chinese–Kazakh (Arabic script) translation tasks, resulting in performance improvements of 4.99 and 7.7 for Chinese-to-Kazakh and Kazakh-to-Chinese translations, respectively. By combining the generation of pseudo-parallel corpora through phrase replacement with the application of R-Drop regularization, there is a significant advancement in machine translation performance for low-resource languages. Full article
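
R-Drop itself is easy to sketch: run the same batch through the model twice with dropout active and penalize disagreement between the two predictive distributions. A minimal PyTorch version follows; the weight α and the toy logits are illustrative.

```python
import torch
import torch.nn.functional as F

def r_drop_loss(logits1, logits2, labels, alpha=5.0):
    """R-Drop: two stochastic forward passes of the same model (dropout
    active) should agree; penalize divergence with symmetric KL."""
    ce = 0.5 * (F.cross_entropy(logits1, labels) +
                F.cross_entropy(logits2, labels))
    p, q = F.log_softmax(logits1, -1), F.log_softmax(logits2, -1)
    kl = 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean") +
                F.kl_div(q, p, log_target=True, reduction="batchmean"))
    return ce + alpha * kl

# Toy usage: in practice logits1/logits2 come from calling model(x) twice.
logits1 = torch.randn(4, 10, requires_grad=True)
logits2 = logits1 + 0.1 * torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(r_drop_loss(logits1, logits2, labels))
```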

13 pages, 2003 KiB  
Article
Neural Machine Translation of Electrical Engineering with Fusion of Memory Information
by Yuan Chen, Zikang Liu and Juwei Zhang
Appl. Sci. 2023, 13(18), 10279; https://doi.org/10.3390/app131810279 - 13 Sep 2023
Viewed by 809
Abstract
This paper proposes a new neural machine translation model of electrical engineering that combines a transformer with gated recurrent unit (GRU) networks. By fusing global information and memory information, the model effectively improves the performance of low-resource neural machine translation. Unlike traditional transformers, our proposed model includes two different encoders: one is the global information encoder, which focuses on contextual information, and the other is the memory encoder, which is responsible for capturing recurrent memory information. The model with these two types of attention can encode both global and memory information and learn richer semantic knowledge. Because transformers require global attention calculation for each word position, the time and space complexity both grow quadratically with the length of the source language sequence. When the length of the source language sequence becomes too long, the performance of the transformer will sharply decline. Therefore, we propose a memory information encoder based on the GRU to mitigate this drawback. The model proposed in this paper has a maximum improvement of 2.04 BLEU points over the baseline model in the field of electrical engineering with low resources. Full article
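
A minimal PyTorch sketch of the two-encoder idea: a self-attention branch for global context and a GRU branch for recurrent memory. The linear fusion and all dimensions are assumptions for illustration; the paper's exact wiring may differ.

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    """Two-branch source encoder: global attention plus GRU memory,
    fused by a linear projection (assumed fusion)."""
    def __init__(self, vocab=1000, d=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.global_enc = nn.TransformerEncoderLayer(d, nhead=8,
                                                     batch_first=True)
        self.memory_enc = nn.GRU(d, d, batch_first=True)
        self.fuse = nn.Linear(2 * d, d)

    def forward(self, tokens):
        x = self.emb(tokens)
        g = self.global_enc(x)         # global attention features
        m, _ = self.memory_enc(x)      # recurrent memory features
        return self.fuse(torch.cat([g, m], dim=-1))

enc = DualEncoder()
print(enc(torch.randint(0, 1000, (2, 17))).shape)  # torch.Size([2, 17, 256])
```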

17 pages, 3431 KiB  
Article
Named Entity Recognition in Power Marketing Domain Based on Whole Word Masking and Dual Feature Extraction
by Yan Chen, Zengfu Liang, Zhixiang Tan and Dezhao Lin
Appl. Sci. 2023, 13(16), 9338; https://doi.org/10.3390/app13169338 - 17 Aug 2023
Viewed by 762
Abstract
To address the problems of low utilization of entity features, word-sense ambiguity, and poor recognition of specialized terms in Chinese power marketing domain named entity recognition (PMDNER), this study proposes a Chinese power marketing named entity recognition method based on whole word masking and joint extraction of dual features. Firstly, word vectorization of the electricity text data is performed using the RoBERTa pre-training model; then, it is fed into the constructed dual feature extraction neural network (DFENN) to acquire the local and global features of the text in a parallel manner and fuse them. The output of the RoBERTa layer is used as the auxiliary classification layer, the output of the DFENN layer is used as the master classification layer, and the outputs of the two layers are dynamically combined through an attention mechanism that weights them to fuse new features, which are input into the conditional random field (CRF) layer to obtain the most reasonable label sequence. A focal loss function is used in the training process to alleviate the problem of uneven sample distribution. The experimental results show that the method achieved an F1 value of 88.58% on the constructed named entity recognition dataset in the power marketing domain, which is a significant improvement in performance compared with existing methods. Full article
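
The dynamic combination of the auxiliary and master layers can be sketched as a learned gate. This is one plausible reading of the attention-based weighting, not the paper's exact code; the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Dynamically weight a master representation (e.g., the DFENN
    output) against an auxiliary one (e.g., the RoBERTa output) before
    the CRF layer, via a sigmoid gate."""
    def __init__(self, d=768):
        super().__init__()
        self.gate = nn.Linear(2 * d, d)

    def forward(self, h_master, h_aux):
        alpha = torch.sigmoid(self.gate(torch.cat([h_master, h_aux], -1)))
        return alpha * h_master + (1 - alpha) * h_aux

fuse = GatedFusion()
h1, h2 = torch.randn(2, 30, 768), torch.randn(2, 30, 768)
print(fuse(h1, h2).shape)  # torch.Size([2, 30, 768])
```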

11 pages, 979 KiB  
Article
Medical Named Entity Recognition Fusing Part-of-Speech and Stroke Features
by Fen Yi, Hong Liu, You Wang, Sheng Wu, Cheng Sun, Peng Feng and Jin Zhang
Appl. Sci. 2023, 13(15), 8913; https://doi.org/10.3390/app13158913 - 02 Aug 2023
Viewed by 1100
Abstract
Identifying diseases, symptoms, drugs, examinations, and other medical entities in medical text data is highly significant from a research standpoint and a valuable practice, supporting knowledge graphs, question answering systems, and other downstream tasks that can provide the public with knowledgeable answers. However, in contrast with languages like English, Chinese has no distinct dividing line between words, and medical entities are often long and contain nested entity types. Therefore, to address these issues, this study suggests a medical named entity recognition (NER) approach that combines part-of-speech and stroke features. First, the text is fed into the BERT pre-training model to get the semantic representation of the text, while the part-of-speech feature vector is obtained using the part-of-speech dictionary, and the stroke feature of the text is extracted through a convolution neural network (CNN). The word vector is then joined with the part-of-speech and stroke feature vectors, respectively, and input into the BiLSTM and CRF layer for training. Additionally, to balance the disparity in data volume across several types of entities, the class-weighted loss function is included in the loss function. According to the experimental findings, our model’s F1 score on the CCKS2019 dataset reaches 78.65%, and the recognition performance exceeds many existing algorithms. Full article

23 pages, 2773 KiB  
Article
Domain Knowledge Graph Question Answering Based on Semantic Analysis and Data Augmentation
by Shulin Hu, Huajun Zhang and Wanying Zhang
Appl. Sci. 2023, 13(15), 8838; https://doi.org/10.3390/app13158838 - 31 Jul 2023
Cited by 1 | Viewed by 1386
Abstract
Information retrieval-based question answering (IRQA) and knowledge-based question answering (KBQA) are the main forms of question answering (QA) systems. The answer generated by the IRQA system is extracted from the relevant text but has a certain degree of randomness, while the KBQA system retrieves the answer from structured data, and its accuracy is relatively high. In the field of policy and regulations such as household registration, the QA system requires precise and rigorous answers. Therefore, we design a QA system based on the household registration knowledge graph, aiming to provide rigorous and accurate answers for relevant household registration inquiries. The QA system uses a semantic analysis-based approach to simplify one question into a simple problem consisting of a single event entity and a single intention relationship, and quickly generates accurate answers by searching in the household registration knowledge graph. Due to the scarcity and imbalance of QA corpus data in the field of household registration, we use GPT-3.5 to augment the collected questions dataset and explore the impact of data augmentation on the QA system. The experimental results show that the accuracy rate of the QA system using the augmented dataset reaches 93%, which is 6% higher than before. Full article
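
Once semantic analysis has reduced a question to an (entity, relation) pair, the core of such a system is a graph lookup. A toy sketch follows; the household-registration facts are invented placeholders, not the paper's knowledge graph.

```python
# A household-registration KG reduced to (entity, relation) -> answer.
KG = {
    ("newborn registration", "required documents"):
        "parents' household booklet and the medical birth certificate",
    ("newborn registration", "time limit"): "within one month of birth",
}

def answer(entity: str, relation: str) -> str:
    """Semantic analysis (not shown) reduces a question to one event
    entity and one intention relation; answering is a direct lookup."""
    return KG.get((entity, relation), "no answer found in the graph")

# e.g., "What documents do I need to register a newborn?"
print(answer("newborn registration", "required documents"))
```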

18 pages, 2928 KiB  
Article
Knowledge Interpolated Conditional Variational Auto-Encoder for Knowledge Grounded Dialogues
by Xingwei Liang, Jiachen Du, Taiyu Niu, Lanjun Zhou and Ruifeng Xu
Appl. Sci. 2023, 13(15), 8707; https://doi.org/10.3390/app13158707 - 28 Jul 2023
Viewed by 849
Abstract
In Knowledge Grounded Dialogue (KGD) generation, the explicit modeling of instance-variety of knowledge specificity and its seamless fusion with the dialogue context remains challenging. This paper presents an innovative approach, the Knowledge Interpolated conditional Variational auto-encoder (KIV), to address these issues. In particular, KIV introduces a novel interpolation mechanism to fuse two latent variables: independently encoding dialogue context and grounded knowledge. This distinct fusion of context and knowledge in the semantic space enables the interpolated latent variable to guide the decoder toward generating more contextually rich and engaging responses. We further explore deterministic and probabilistic methodologies to ascertain the interpolation weight, capturing the level of knowledge specificity. Comprehensive empirical analysis conducted on the Wizard-of-Wikipedia and Holl-E datasets verifies that the responses generated by our model perform better than those of strong baselines, with notable performance improvements observed in both automatic metrics and manual evaluation. Full article
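
The interpolation mechanism can be sketched in a few lines of PyTorch. The latent dimension and the sigmoid-sampled weight below are illustrative assumptions; the paper explores both deterministic and probabilistic ways to set the weight.

```python
import torch

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, sigma^2) with the reparameterization trick."""
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

# Two latent variables: one from the dialogue context, one from knowledge.
mu_c, logvar_c = torch.randn(1, 64), torch.zeros(1, 64)
mu_k, logvar_k = torch.randn(1, 64), torch.zeros(1, 64)
z_context = reparameterize(mu_c, logvar_c)
z_knowledge = reparameterize(mu_k, logvar_k)

# The interpolation weight w reflects knowledge specificity; here it is
# sampled (a hypothetical probabilistic variant).
w = torch.sigmoid(torch.randn(1))
z = w * z_knowledge + (1 - w) * z_context  # interpolated latent fed to decoder
print(z.shape)
```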

14 pages, 1911 KiB  
Article
Semantic Similarity Analysis for Examination Questions Classification Using WordNet
by Thing Thing Goh, Nor Azliana Akmal Jamaludin, Hassan Mohamed, Mohd Nazri Ismail and Huangshen Chua
Appl. Sci. 2023, 13(14), 8323; https://doi.org/10.3390/app13148323 - 19 Jul 2023
Viewed by 956
Abstract
Question classification based on Bloom’s Taxonomy (BT) has been widely accepted and used as a guideline in designing examination questions in many institutions of higher learning. The misclassification of questions may happen when the classification task is conducted manually due to a discrepancy in the understanding of BT by academics. Hence, several automated examination question classification systems have been proposed by researchers to perform question classification accurately. Most of this research has focused on specific subject areas only or single-sentence type questions. There has been a lack of research on question classification for multi-sentence type and multi-subject questions. This paper proposes a question classification system (QCS) to perform examination question classification using semantic and syntactic approaches. The questions were taken from various subjects of an engineering diploma course, and the questions were either single- or multi-sentence types. The QCS was developed using the natural language toolkit (NLTK), the Stanford POS tagger (SPOS), the Stanford parser’s universal dependencies (UD), and WordNet similarity approaches. The QCS used the NLTK to process the questions into sentences and then into word tokens, SPOS to tag the word tokens, and UD to identify the important word tokens, which were the verbs of the examination questions. The identified verbs were then compared with BT’s verb list in terms of word sense using the WordNet similarity approach before finally classifying the questions according to BT. The developed QCS achieved an overall 83% accuracy in the classification of a set of 200 examination questions, according to BT. Full article
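
A minimal sketch of the verb-matching step with NLTK's WordNet interface. The Bloom's-level verb lists here are tiny invented samples, and Wu-Palmer similarity stands in for whichever WordNet measure the paper uses.

```python
# pip install nltk; then run nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

BT_VERBS = {"remember": ["define", "list"], "analyze": ["compare", "examine"]}

def classify_verb(question_verb: str) -> str:
    """Map an extracted question verb to the Bloom's level whose verb
    list contains the most similar WordNet sense."""
    best_level, best_score = "unknown", 0.0
    for level, verbs in BT_VERBS.items():
        for verb in verbs:
            for s1 in wn.synsets(question_verb, pos=wn.VERB):
                for s2 in wn.synsets(verb, pos=wn.VERB):
                    score = s1.wup_similarity(s2) or 0.0
                    if score > best_score:
                        best_level, best_score = level, score
    return best_level

print(classify_verb("contrast"))  # likely maps to "analyze"
```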

16 pages, 1937 KiB  
Article
FREDA: Few-Shot Relation Extraction Based on Data Augmentation
by Junbao Liu, Xizhong Qin, Xiaoqin Ma and Wensheng Ran
Appl. Sci. 2023, 13(14), 8312; https://doi.org/10.3390/app13148312 - 18 Jul 2023
Cited by 1 | Viewed by 956
Abstract
The primary task of few-shot relation extraction is to quickly learn the features of relation classes from a few labelled instances and predict the semantic relations between entity pairs in new instances. Most existing few-shot relation extraction methods do not fully utilize the relation information features in sentences, resulting in difficulties in improving the performance of relation classification. Some researchers have attempted to incorporate external information, but the results have been unsatisfactory when applied to different domains. In this paper, we propose a method that utilizes triple information for data augmentation, which can alleviate the issue of insufficient instances and possesses strong domain adaptation capabilities. Firstly, we extract relation and entity pairs from the instances in the support set, forming relation triple information. Next, the sentence information and relation triple information are encoded using the same sentence encoder. Then, we construct an interactive attention module to enable the query set instances to interact separately with the support set instances and relation triple instances. The module pays greater attention to highly interactive parts between instances and assigns them higher weights. Finally, we merge the interacted support set representation and relation triple representation. To our knowledge, we are the first to propose a method that utilizes triple information for data augmentation in relation extraction. In our experiments on the standard datasets FewRel1.0 and FewRel2.0 (domain adaptation), we observed substantial improvements without including external information. Full article

13 pages, 578 KiB  
Article
An Open-Domain Event Extraction Method Incorporating Semantic and Dependent Syntactic Information
by Li He, Qian Zhang, Jianyong Duan and Hao Wang
Appl. Sci. 2023, 13(13), 7942; https://doi.org/10.3390/app13137942 - 06 Jul 2023
Viewed by 908
Abstract
Open-domain event extraction is a fundamental task that aims to extract non-predefined types of events from news clusters. Some researchers have noticed that its performance can be enhanced by improving dependency relationships. Recently, graph convolutional networks (GCNs) have been widely used to integrate dependency syntactic information into neural networks. However, they usually introduce noise and degrade generalization. To tackle this issue, we propose using Bi-LSTM to obtain semantic representations of BERT intermediate layer features and infuse the dependent syntactic information. Compared to current methods, Bi-LSTM is more robust and has less dependency on word vectors and artificial features. Experiments on public datasets show that our approach is effective for open-domain event extraction tasks. Full article
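
A minimal sketch of re-encoding a BERT intermediate layer with a Bi-LSTM, using the Hugging Face `transformers` API. The layer index 8 and the example sentence are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased",
                                 output_hidden_states=True)
bilstm = nn.LSTM(768, 384, batch_first=True, bidirectional=True)

batch = tokenizer(["An earthquake struck the region on Monday."],
                  return_tensors="pt")
with torch.no_grad():
    hidden = bert(**batch).hidden_states  # tuple: embeddings + 12 layers
middle = hidden[8]                        # one intermediate layer
features, _ = bilstm(middle)              # semantic re-encoding of that layer
print(features.shape)                     # (1, seq_len, 768)
```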

24 pages, 1321 KiB  
Article
Impact of Negation and AnA-Words on Overall Sentiment Value of the Text Written in the Bosnian Language
by Sead Jahić and Jernej Vičič
Appl. Sci. 2023, 13(13), 7760; https://doi.org/10.3390/app13137760 - 30 Jun 2023
Cited by 2 | Viewed by 977
Abstract
In this manuscript, we present our efforts to develop an accurate sentiment analysis model for Bosnian-language tweets which incorporated three elements: negation cues, AnA-words (referring to maximizers, boosters, approximators, relative intensifiers, diminishers, and minimizers), and sentiment-labeled words from a lexicon. We used several machine-learning techniques, including SVM, Naive Bayes, RF, and CNN, with different input parameters, such as batch size, number of convolution layers, and type of convolution layers. In addition to these techniques, BOSentiment is used to provide an initial sentiment value for each tweet, which is then used as input for CNN. Our best-performing model, which combined BOSentiment and CNN with 256 filters and a size of 4×4, with a batch size of 10, achieved an accuracy of over 92%. Our results demonstrate the effectiveness of our approach in accurately classifying the sentiment of Bosnian tweets using machine-learning techniques, lexicons, and pre-trained models. This study makes a significant contribution to the field of sentiment analysis for under-researched languages such as Bosnian, and our approach could be extended to other languages and social media platforms to gain insight into public opinion. Full article
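
A toy illustration of how negation cues and AnA-words can modify lexicon scores before the result is handed to a classifier. The word lists below are tiny invented samples, not the paper's lexicon.

```python
NEGATION = {"ne", "nije", "nisam"}        # sample Bosnian negation cues
BOOSTER = {"veoma": 1.5, "malo": 0.5}     # sample AnA-words (intensity)
LEXICON = {"dobar": 1.0, "loš": -1.0}     # sentiment-labeled words

def tweet_sentiment(tokens):
    """Lexicon score with negation flipping and AnA-word scaling."""
    score, flip, scale = 0.0, 1.0, 1.0
    for tok in tokens:
        if tok in NEGATION:
            flip = -1.0
        elif tok in BOOSTER:
            scale = BOOSTER[tok]
        elif tok in LEXICON:
            score += flip * scale * LEXICON[tok]
            flip, scale = 1.0, 1.0        # effects apply to next sentiment word
    return score

print(tweet_sentiment("film nije veoma dobar".split()))  # negated positive -> -1.5
```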

26 pages, 2848 KiB  
Article
Detecting Fine-Grained Emotions in Literature
by Luis Rei and Dunja Mladenić
Appl. Sci. 2023, 13(13), 7502; https://doi.org/10.3390/app13137502 - 25 Jun 2023
Cited by 1 | Viewed by 1758
Abstract
Emotion detection in text is a fundamental aspect of affective computing and is closely linked to natural language processing. Its applications span various domains, from interactive chatbots to marketing and customer service. This research specifically focuses on its significance in literature analysis and understanding. To facilitate this, we present a novel approach that involves creating a multi-label fine-grained emotion detection dataset, derived from literary sources. Our methodology employs a simple yet effective semi-supervised technique. We leverage textual entailment classification to perform emotion-specific weak-labeling, selecting examples with the highest and lowest scores from a large corpus. Utilizing these emotion-specific datasets, we train binary pseudo-labeling classifiers for each individual emotion. By applying this process to the selected examples, we construct a multi-label dataset. Using this dataset, we train models and evaluate their performance within a traditional supervised setting. Our model achieves an F1 score of 0.59 on our labeled gold set, showcasing its ability to effectively detect fine-grained emotions. Furthermore, we conduct evaluations of the model’s performance in zero- and few-shot transfer scenarios using benchmark datasets. Notably, our results indicate that the knowledge learned from our dataset exhibits transferability across diverse data domains, demonstrating its potential for broader applications beyond emotion detection in literature. Our contribution thus includes a multi-label fine-grained emotion detection dataset built from literature, the semi-supervised approach used to create it, as well as the models trained on it. This work provides a solid foundation for advancing emotion detection techniques and their utilization in various scenarios, especially within the cultural heritage analysis. Full article
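
Entailment-based weak labeling can be sketched with the off-the-shelf zero-shot pipeline from `transformers`, which is built on textual entailment. The checkpoint, passage, and three coarse labels are illustrative; the paper's emotion set is finer-grained.

```python
from transformers import pipeline

# Textual entailment reused for emotion-specific weak labeling.
nli = pipeline("zero-shot-classification",
               model="facebook/bart-large-mnli")

passage = ("She slammed the door and stared out of the window, "
           "her hands trembling.")
scores = nli(passage, candidate_labels=["anger", "fear", "joy"],
             multi_label=True)
# Keeping only the highest- and lowest-scoring passages per emotion
# yields the positive/negative pools for the pseudo-labeling classifiers.
print(dict(zip(scores["labels"], scores["scores"])))
```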

18 pages, 3057 KiB  
Article
Automatic Essay Scoring Method Based on Multi-Scale Features
by Feng Li, Xuefeng Xi, Zhiming Cui, Dongyang Li and Wanting Zeng
Appl. Sci. 2023, 13(11), 6775; https://doi.org/10.3390/app13116775 - 02 Jun 2023
Cited by 3 | Viewed by 2216
Abstract
Essays are a pivotal component of conventional exams; accurately, efficiently, and effectively grading them is a significant challenge for educators. Automated essay scoring (AES) is a complex task that utilizes computer technology to assist teachers in scoring. Traditional AES techniques only focus on shallow linguistic features based on the grading criteria, ignoring the influence of deep semantic features. The AES model based on deep neural networks (DNN) can eliminate the need for feature engineering and achieve better accuracy. In addition, the DNN-AES model combining different scales of essays has recently achieved excellent results. However, it has the following problems: (1) It mainly extracts sentence-scale features manually and cannot be fine-tuned for specific tasks. (2) It does not consider the shallow linguistic features that the DNN-AES cannot extract. (3) It does not contain the relevance between the essay and the corresponding prompt. To solve these problems, we propose an AES method based on multi-scale features. Specifically, we utilize Sentence-BERT (SBERT) to vectorize sentences and connect them to the DNN-AES model. Furthermore, the typical shallow linguistic features and prompt-related features are integrated into the distributed features of the essay. The experimental results show that the Quadratic Weighted Kappa of our proposed method on the Kaggle ASAP competition dataset reaches 79.3%, verifying the efficacy of the extended method in the AES task. Full article
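
A minimal sketch of the sentence-scale vectorization step with the `sentence-transformers` package; the checkpoint and essay text are illustrative.

```python
from sentence_transformers import SentenceTransformer

sbert = SentenceTransformer("all-MiniLM-L6-v2")  # any SBERT checkpoint works

essay = ["Recycling reduces landfill waste.",
         "It also lowers the demand for raw materials.",
         "Therefore schools should promote recycling programs."]
sentence_vecs = sbert.encode(essay)   # one vector per sentence
print(sentence_vecs.shape)            # (3, 384) for this checkpoint
# These sentence-scale vectors are then combined with the shallow
# linguistic and prompt-related features before the scoring head.
```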

16 pages, 1123 KiB  
Article
An Event Extraction Approach Based on a Multi-Round Q&A Framework
by Li He, Xiya Zhao, Liang Zhao and Qing Zhang
Appl. Sci. 2023, 13(10), 6308; https://doi.org/10.3390/app13106308 - 22 May 2023
Viewed by 1033
Abstract
Event extraction aims to present unstructured text containing event information in a structured form to help people quickly mine the target information. Most of the traditional event extraction methods focus on the design of complex neural network models, which rely on a large amount of annotated data to train the models. In recent years, some researchers have proposed the use of machine reading comprehension models for event extraction; however, the existing methods are limited to the single-round question-and-answer model, ignoring the dependency relation between the elements of event arguments. In addition, the existing methods do not fully utilize knowledge such as a priori information. To address these shortcomings, a multi-round Q&A framework is proposed for event extraction, which extends the existing methods in two aspects: first, by constructing a multi-round extraction problem framework, the model can effectively exploit the hierarchical dependencies among the argument elements; second, the question-and-answer framework is populated with historical answer information encoding slots, which are integrated into the multi-round Q&A process to assist in inference. Finally, experimental results on a publicly available dataset show that the proposed model achieves superior results compared to existing methods. Full article
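
The multi-round idea (answers from earlier rounds fill slots in later questions) can be sketched abstractly. Here `qa_model` is a hypothetical span extractor and the question templates are invented; the paper's slot encoding is richer than plain string substitution.

```python
def extract_event(context, qa_model, questions):
    """Multi-round extraction: each round's answer is written into a
    history slot and injected into the next question."""
    history = {}
    for slot, template in questions:
        question = template.format(**history)
        history[slot] = qa_model(context, question)
    return history

questions = [
    ("trigger", "What is the event trigger?"),
    ("subject", "Who initiated the {trigger} event?"),
    ("time",    "When did {subject} trigger the {trigger} event?"),
]

def fake_qa(context, question):
    # Stand-in for a real machine reading comprehension model.
    if question.startswith("What"):
        return "acquisition"
    if question.startswith("Who"):
        return "ACME"
    return "Friday"

print(extract_event("ACME announced an acquisition on Friday.",
                    fake_qa, questions))
```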

14 pages, 1156 KiB  
Article
Multi-Stage Prompt Tuning for Political Perspective Detection in Low-Resource Settings
by Kang-Min Kim, Mingyu Lee, Hyun-Sik Won, Min-Ji Kim, Yeachan Kim and SangKeun Lee
Appl. Sci. 2023, 13(10), 6252; https://doi.org/10.3390/app13106252 - 19 May 2023
Viewed by 1618
Abstract
Political perspective detection in news media (identifying political bias in news articles) is an essential but challenging low-resource task. Prompt-based learning (i.e., discrete prompting and prompt tuning) achieves promising results in low-resource scenarios by adapting a pre-trained model to handle new tasks. However, these approaches suffer performance degradation when the target task involves a textual domain (e.g., a political domain) different from the pre-training task (e.g., masked language modeling on a general corpus). In this paper, we develop a novel multi-stage prompt tuning framework for political perspective detection. Our method involves two sequential stages: a domain-specific prompt tuning stage and a task-specific prompt tuning stage. In the first stage, we tune the domain-specific prompts based on a masked political phrase prediction (MP3) task to adjust the language model to the political domain. In the second, task-specific prompt tuning stage, we only tune task-specific prompts with a frozen language model and domain-specific prompts for downstream tasks. The experimental results demonstrate that our method significantly outperforms fine-tuning (i.e., model tuning) methods and state-of-the-art prompt tuning methods on the SemEval-2019 Task 4: Hyperpartisan News Detection and AllSides datasets. Full article
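
The task-specific stage, tuning only soft prompts against a frozen language model, can be sketched as follows. The BERT checkpoint, prompt length, and example text are illustrative assumptions, and the domain-specific MP3 stage is not shown.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
lm = AutoModel.from_pretrained("bert-base-uncased")
for p in lm.parameters():        # the language model stays frozen;
    p.requires_grad = False      # only the prompt vectors are tuned

n_prompts, d = 20, lm.config.hidden_size
task_prompt = nn.Parameter(torch.randn(1, n_prompts, d) * 0.02)

batch = tokenizer(["The senator's plan is a disaster for workers."],
                  return_tensors="pt")
tok_embs = lm.get_input_embeddings()(batch["input_ids"])
inputs = torch.cat([task_prompt, tok_embs], dim=1)
mask = torch.cat([torch.ones(1, n_prompts, dtype=torch.long),
                  batch["attention_mask"]], dim=1)
out = lm(inputs_embeds=inputs, attention_mask=mask).last_hidden_state
print(out.shape)  # prompt positions + token positions, each d-dimensional
```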

18 pages, 1352 KiB  
Article
Multi-Feature Fusion Method for Chinese Shipping Companies Credit Named Entity Recognition
by Lin He, Shengnan Wang and Xinran Cao
Appl. Sci. 2023, 13(9), 5787; https://doi.org/10.3390/app13095787 - 08 May 2023
Cited by 2 | Viewed by 1282
Abstract
Shipping Enterprise Credit Named Entity Recognition (NER) aims to recognize shipping enterprise credit entities from unstructured shipping enterprise credit texts. To address the low entity recognition rate caused by complex, diverse, and nested entities in the field of shipping enterprise credit, a deep learning method based on multi-feature fusion is proposed to improve the recognition of shipping enterprise credit entities. In this study, the shipping enterprise credit dataset is manually labeled using the BIO labeling model, combining the pre-trained model Bidirectional Encoder Representations from Transformers (BERT) and a bidirectional gated recurrent unit (BiGRU) with a conditional random field (CRF) to form the BERT-BiGRU-CRF model, and changing the input of the model from a single feature vector to a multi-feature vector (MF) by concatenating character vector features, word vector features, word length features, and part-of-speech (POS) features; BiGRU is introduced to extract the contextual features of shipping enterprise credit texts. Finally, CRF completes the sequence annotation task. According to the experimental results, using the BERT-MF-BiGRU-CRF model for NER of shipping enterprise credit text data, the F1 Score (F1) reaches 91.7%, which is 8.37% higher than the traditional BERT-BiGRU-CRF model. The experimental results show that the BERT-MF-BiGRU-CRF model can effectively perform NER for shipping enterprise credit text data, which is helpful for constructing a credit knowledge graph for shipping enterprises, while the research results can provide references for recognizing complex and nested entities in other fields. Full article

22 pages, 692 KiB  
Article
Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning
by Heng Gong, Xiaocheng Feng and Bing Qin
Appl. Sci. 2023, 13(9), 5573; https://doi.org/10.3390/app13095573 - 30 Apr 2023
Viewed by 951
Abstract
Data-to-text generation plays an important role in natural language processing by processing structured data and helping people understand those data by generating user-friendly descriptive text. It can be applied to news generation, financial report generation, customer service, etc. However, in practice, it needs to adapt to different domains that may lack an annotated training corpus. To alleviate this dataset scarcity problem, distantly-supervised data-to-text generation has emerged, which constructs a training corpus automatically and is more practical to apply to new domains when well-aligned data is expensive to obtain. However, this distant supervision method of training induces an over-generation problem since the automatically aligned text includes hallucination. These expressions cannot be inferred from the data, misguiding the model to produce unfaithful text. To exploit the noisy dataset while maintaining faithfulness, we empower the neural data-to-text model by dynamically increasing the weights of those well-aligned training instances and reducing the weights of the low-quality ones via meta learning. To the best of our knowledge, we are the first to alleviate the noise in distantly-supervised data-to-text generation via meta learning. In addition, we rewrite those low-quality texts to provide better training instances. Finally, we construct a new distantly-supervised dataset, DIST-ToTTo (abbreviation for Distantly-supervised Table-To-Text), and conduct experiments on both the benchmark WITA (abbreviation for the data source Wikipedia and Wikidata) and DIST-ToTTo datasets. The evaluation results show that our model can improve the state-of-the-art DSG (abbreviation for Distant Supervision Generation) model across all automatic evaluation metrics, with an improvement of 3.72% on the WITA dataset and 3.82% on the DIST-ToTTo dataset in terms of the widely used metric BLEU (abbreviation for BiLingual Evaluation Understudy). Furthermore, based on human evaluation, our model can generate more grammatically correct and more faithful text compared to the state-of-the-art DSG model. Full article
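
The instance-weighting idea reduces to a weighted training loss. A PyTorch sketch follows, with the weights given explicitly rather than produced by the meta-learning inner loop the paper describes; the toy vocabulary and values are illustrative.

```python
import torch
import torch.nn.functional as F

def weighted_nll(logits, targets, weights):
    """Per-instance weighted loss: well-aligned (data, text) pairs get
    larger weights, noisy or hallucinated ones smaller. In the paper
    the weights come from a meta-learning step; here they are fixed."""
    per_example = F.cross_entropy(logits, targets, reduction="none")
    return (weights * per_example).sum() / weights.sum()

logits = torch.randn(4, 50, requires_grad=True)  # toy vocabulary of 50
targets = torch.randint(0, 50, (4,))
weights = torch.tensor([1.0, 0.9, 0.2, 0.1])     # low weight = suspected noise
weighted_nll(logits, targets, weights).backward()
print(logits.grad.abs().mean())
```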

11 pages, 628 KiB  
Article
Improving Named Entity Recognition for Social Media with Data Augmentation
by Wenzhong Liu and Xiaohui Cui
Appl. Sci. 2023, 13(9), 5360; https://doi.org/10.3390/app13095360 - 25 Apr 2023
Cited by 3 | Viewed by 1714
Abstract
Social media is an important source of text information; however, due to its informal and unstructured nature, traditional named entity recognition (NER) methods face the challenge of achieving high accuracy when dealing with social media data. This paper proposes a new method for social media named entity recognition with data augmentation. First, we pre-train the language model using Bidirectional Encoder Representations from Transformers (BERT) to obtain a semantic vector for each word based on its contextual information. Then, we obtain similar entities via data augmentation methods and perform substitution or semantic transformation on these entities. After that, the augmented input is fed into the Bi-LSTM model for training, and the outputs are fused and fine-tuned to obtain the best labels. In addition, a self-attention layer captures the essential information of the features and reduces the reliance on external information. Experimental results on the WNUT16, WNUT17, and OntoNotes 5.0 datasets confirm the effectiveness of our proposed model. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
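As one concrete reading of the entity-substitution augmentation described above, the sketch below replaces BIO-tagged entity spans with same-type entities drawn from a pool. The pool, tag set, and sentence are invented for illustration; a real system would mine similar entities from embeddings or gazetteers.

```python
import random

# Toy same-type entity inventory (hypothetical).
ENTITY_POOL = {
    "PER": ["alice", "bob", "carol"],
    "LOC": ["paris", "tokyo", "cairo"],
}

def substitute_entities(tokens, bio_tags, p=0.5):
    """Replace each entity span with a random same-type entity (BIO scheme)."""
    out, i = [], 0
    while i < len(tokens):
        tag = bio_tags[i]
        if tag.startswith("B-") and random.random() < p:
            etype = tag[2:]
            j = i + 1
            while j < len(tokens) and bio_tags[j] == f"I-{etype}":
                j += 1                      # skip the rest of the span
            out.append((random.choice(ENTITY_POOL[etype]), f"B-{etype}"))
            i = j
        else:
            out.append((tokens[i], tag))
            i += 1
    return zip(*out)                        # (new_tokens, new_tags)

tokens = ["bob", "visited", "new", "york", "today"]
tags = ["B-PER", "O", "B-LOC", "I-LOC", "O"]
print(*substitute_entities(tokens, tags, p=1.0))
```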
13 pages, 415 KiB  
Article
Developing an Urdu Lemmatizer Using a Dictionary-Based Lookup Approach
by Saima Shaukat, Muhammad Asad and Asmara Akram
Appl. Sci. 2023, 13(8), 5103; https://doi.org/10.3390/app13085103 - 19 Apr 2023
Viewed by 1520
Abstract
Lemmatization aims to return the root form of a word. A lemmatizer is envisioned as a vital instrument that can assist many Natural Language Processing (NLP) tasks, including Information Retrieval, Word Sense Disambiguation, Machine Translation, Text Reuse, and Plagiarism Detection. Previous studies in the literature have focused on developing lemmatizers using rule-based approaches for English and other highly resourced languages. However, there have been no thorough efforts to develop a lemmatizer for most South Asian languages, specifically Urdu. Urdu is a morphologically rich language with many inflectional and derivational forms, which makes the development of an efficient Urdu lemmatizer a challenging task. A standardized lemmatizer would contribute towards establishing much-needed methodological resources for this low-resourced language, which are required to boost the performance of many Urdu NLP applications. This paper presents a lemmatization system for Urdu based on a novel dictionary lookup approach. The contributions of this research are the following: (1) the development of a large benchmark corpus for Urdu, (2) the exploration of the relationship between part-of-speech tags and the lemmatizer, and (3) the development of standard approaches for an Urdu lemmatizer. Furthermore, we experimented with the impact of Part-of-Speech (PoS) tags on our proposed dictionary lookup approach. The empirical results show that we achieved a best accuracy score of 76.44% with the proposed dictionary lookup approach. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
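The core dictionary lookup is straightforward to picture. A minimal sketch follows, with invented romanized entries standing in for the paper's Urdu dictionary and PoS tags:

```python
# A minimal dictionary-lookup lemmatizer: (surface form, PoS) pairs map to
# lemmas, with a PoS-free fallback, then identity. Entries are illustrative
# romanized examples, not the paper's actual dictionary.
LEMMA_DICT = {
    ("larkiyan", "NN"): "larki",   # "girls" -> "girl" (hypothetical entry)
    ("gaya", "VB"): "jana",        # "went" -> "to go" (hypothetical entry)
}
FALLBACK_DICT = {"larkiyan": "larki", "gaya": "jana"}

def lemmatize(token, pos=None):
    """PoS-aware lookup first, then PoS-free lookup, then identity."""
    if pos is not None and (token, pos) in LEMMA_DICT:
        return LEMMA_DICT[(token, pos)]
    return FALLBACK_DICT.get(token, token)

print(lemmatize("larkiyan", "NN"))  # -> larki
print(lemmatize("kitab"))           # unknown word -> returned unchanged
```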
18 pages, 1133 KiB  
Article
Conditional Knowledge Extraction Using Contextual Information Enhancement
by Zhangbiao Xu, Botao Zhang, Jinguang Gu and Feng Gao
Appl. Sci. 2023, 13(8), 4954; https://doi.org/10.3390/app13084954 - 14 Apr 2023
Viewed by 1367
Abstract
Conditional phrases provide fine-grained domain knowledge in various industries, including medicine and manufacturing. Most existing knowledge extraction research focuses on mining triplets with entities and relations and treats such triplet knowledge as plain facts, without considering the conditional modality of those facts. We argue that such approaches are insufficient for building knowledge-based decision support systems in vertical domains, where specific and professional instructions on what facts apply under given circumstances are indispensable. To address this issue, this paper proposes a condition-aware knowledge extraction method using contextual information. In particular, this paper first fine-tunes the pre-trained model with a local context enhancement to capture the positional context of conditional phrases; then, a sentence-level context enhancement is used to integrate sentence semantics; finally, the correspondences between conditional phrases and relation triplets are extracted using syntactic attention. Experimental results on public and proprietary datasets show that our model can successfully retrieve conditional phrases with relevant triplets while improving the accuracy of the matching task by 2.68% compared to the baseline. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
17 pages, 1955 KiB  
Article
A Small-Sample Text Classification Model Based on Pseudo-Label Fusion Clustering Algorithm
by Linda Yang, Baohua Huang, Shiqian Guo, Yunjie Lin and Tong Zhao
Appl. Sci. 2023, 13(8), 4716; https://doi.org/10.3390/app13084716 - 08 Apr 2023
Cited by 2 | Viewed by 1766
Abstract
Text classification has been a mainstream research branch in natural language processing, and how to improve classification when labeled samples are scarce is one of the hot issues in this direction. Current models supporting small-sample classification can learn knowledge and train with a small number of labels, but the classification results are not satisfactory enough. To improve the classification accuracy, we propose a Small-sample Text Classification model based on the Pseudo-label fusion Clustering algorithm (STCPC). The algorithm has two cores: (1) mining the potential features of unlabeled data with a training strategy that assigns pseudo-labels by clustering, and then reducing the noise in the pseudo-labeled dataset through consistency training with its augmented samples, so as to improve the quality of the pseudo-labels; and (2) augmenting the labeled data and then using the Easy Plug-in Data Augmentation (EPiDA) framework to balance the diversity and quality of the augmented samples, so as to reasonably improve the richness of the labeled data. The results of comparison tests with other classical algorithms show that the STCPC model can effectively improve classification accuracy. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
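The pseudo-labeling-by-clustering step can be sketched as follows, assuming sentence embeddings are already available. scikit-learn's KMeans and a distance-based confidence filter stand in for the paper's clustering and consistency-training noise reduction:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-ins for sentence embeddings of unlabeled texts (e.g., from BERT).
embeddings = rng.normal(size=(100, 32))

# Step 1: cluster the unlabeled data and treat cluster ids as pseudo-labels.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(embeddings)
pseudo = km.labels_

# Step 2: keep only points close to their centroid, a simple stand-in for
# the paper's consistency filtering with augmented samples.
dists = np.linalg.norm(embeddings - km.cluster_centers_[pseudo], axis=1)
keep = dists < np.percentile(dists, 50)   # retain the most confident half
print(f"kept {keep.sum()} of {len(pseudo)} pseudo-labeled samples")
```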
15 pages, 1259 KiB  
Article
Candidate Term Boundary Conflict Reduction Method for Chinese Geological Text Segmentation
by Yu Tang, Jiqiu Deng and Zhiyong Guo
Appl. Sci. 2023, 13(7), 4516; https://doi.org/10.3390/app13074516 - 02 Apr 2023
Cited by 2 | Viewed by 1110
Abstract
Although Chinese word segmentation (CWS) relies heavily on computing power to train huge models and on human labor to label corpora, models and algorithms are still not accurate enough, especially for segmentation in a specific domain. In this study, a high-degree-of-freedom-priority candidate term boundary conflict reduction method (HFCR) is proposed to avoid the need to manually set thresholds for entropy-based segmentation. We quantify the uncertainty of the left and right character connections of candidate terms and then arrange the terms in descending order for local comparisons to determine term boundaries. Dynamic numerical comparisons are adopted instead of setting a threshold manually and arbitrarily. Experiments show that the average F1-value of CWS for Chinese geological text is higher than 95% and the F1-value for general Chinese datasets is higher than 87%. Compared with representative tokenizers and the SOTA model, our method performs better: it solves the term boundary conflict problem well and performs excellently on single geological texts without any samples or labels. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
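The left/right connection uncertainty that HFCR ranks is essentially branching entropy. A minimal sketch of computing it from raw corpus counts (the toy corpus and function name are illustrative):

```python
import math
from collections import Counter

def branching_entropies(corpus, term):
    """Entropy of the characters adjacent to `term`: high entropy means the
    context varies freely, suggesting `term` is a self-contained unit."""
    left, right = Counter(), Counter()
    for sent in corpus:
        start = 0
        while (i := sent.find(term, start)) != -1:
            if i > 0:
                left[sent[i - 1]] += 1
            j = i + len(term)
            if j < len(sent):
                right[sent[j]] += 1
            start = i + 1

    def entropy(counter):
        total = sum(counter.values())
        if not total:
            return 0.0
        return -sum(c / total * math.log2(c / total) for c in counter.values())

    return entropy(left), entropy(right)

corpus = ["abxcd", "abycd", "abzcd", "qabr"]
print(branching_entropies(corpus, "ab"))  # -> (0.0, 2.0): right side varies
```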
15 pages, 578 KiB  
Article
Affective-Knowledge-Enhanced Graph Convolutional Networks for Aspect-Based Sentiment Analysis with Multi-Head Attention
by Xiaodong Cui, Wenbiao Tao and Xiaohui Cui
Appl. Sci. 2023, 13(7), 4458; https://doi.org/10.3390/app13074458 - 31 Mar 2023
Cited by 4 | Viewed by 1809
Abstract
Aspect-based sentiment analysis (ABSA) is a natural language processing (NLP) task that involves predicting the sentiment polarity towards a specific aspect in a text. Graph neural networks (GNNs) have been shown to be effective tools for sentiment analysis tasks, but current research often overlooks affective information in the text, leading to irrelevant information being learned for specific aspects. To address this issue, we propose a novel GNN model, MHAKE-GCN, based on the graph convolutional network (GCN) and multi-head attention (MHA). Our model incorporates external sentiment knowledge into the GCN and fully extracts semantic and syntactic information from a sentence using MHA. By adding weights to sentiment words associated with aspect words, our model can better learn sentiment expressions related to specific aspects. Our model was evaluated on four public benchmark datasets and compared against twelve other methods. The experimental results demonstrate the effectiveness of the proposed model for aspect-based sentiment analysis. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
18 pages, 4291 KiB  
Article
CWSXLNet: A Sentiment Analysis Model Based on Chinese Word Segmentation Information Enhancement
by Shiqian Guo, Yansun Huang, Baohua Huang, Linda Yang and Cong Zhou
Appl. Sci. 2023, 13(6), 4056; https://doi.org/10.3390/app13064056 - 22 Mar 2023
Cited by 4 | Viewed by 1283
Abstract
This paper proposes a method for improving the XLNet model to address the shortcomings of its segmentation algorithm when processing Chinese, such as long sub-word lengths, long word lists, and incomplete word list coverage. To address these issues, we propose the CWSXLNet (Chinese Word Segmentation XLNet) model, based on Chinese word segmentation information enhancement. The model first pre-processes the Chinese pre-training text with a Chinese word segmentation tool and introduces a Chinese word segmentation attention mask mechanism that combines the PLM (Permuted Language Model) with XLNet's two-stream self-attention mechanism. While performing natural language processing at word granularity, it can reduce the degree of masking between masked and non-masked tokens that belong to the same word. For the Chinese sentiment analysis task, we propose the CWSXLNet-BiGRU-Attention model, which introduces a bi-directional GRU as well as a self-attention mechanism in the downstream task. Experiments show that CWSXLNet achieves 89.91% precision, 91.53% recall, and a 90.71% F1-score, and CWSXLNet-BiGRU-Attention achieves 92.61% precision, 93.19% recall, and a 92.90% F1-score on the ChnSentiCorp dataset, indicating that CWSXLNet performs better than other models in Chinese sentiment analysis. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
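One way to picture the segmentation-aware masking: a character-level matrix marking which positions belong to the same word, which can then be used to soften attention masking inside words. This is a simplified sketch; the paper's mechanism operates inside XLNet's two-stream attention.

```python
import numpy as np

def same_word_mask(segmented):
    """Given a word-segmented sentence (list of words), return a
    character-level matrix marking pairs of characters in the same word."""
    spans, pos = [], 0
    for w in segmented:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    mask = np.zeros((pos, pos), dtype=np.float32)
    for a, b in spans:
        mask[a:b, a:b] = 1.0        # characters of one word see each other
    return mask

# "自然语言" segmented as ["自然", "语言"] (illustrative segmentation)
print(same_word_mask(["自然", "语言"]))
```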
13 pages, 372 KiB  
Article
Regret and Hope on Transformers: An Analysis of Transformers on Regret and Hope Speech Detection Datasets
by Grigori Sidorov, Fazlourrahman Balouchzahi, Sabur Butt and Alexander Gelbukh
Appl. Sci. 2023, 13(6), 3983; https://doi.org/10.3390/app13063983 - 21 Mar 2023
Cited by 1 | Viewed by 1294
Abstract
In this paper, we analyzed the performance of different transformer models for regret and hope speech detection on two novel datasets. For the regret detection task, we compared the averaged macro-scores of the transformer models to the previous state-of-the-art results. We found that the transformer models outperformed the previous approaches: the RoBERTa-based model achieved the highest averaged macro F1-score of 0.83, beating the previous state-of-the-art score of 0.76. For the hope speech detection task, the uncased BERT-based model achieved the highest averaged macro F1-score of 0.72 among the transformer models. However, the performance of each model varied slightly depending on the task and dataset. Our findings highlight the effectiveness of transformer models for hope speech and regret detection tasks, and the importance of considering the effects of context, specific transformer architectures, and pre-training on their performance. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
17 pages, 966 KiB  
Article
Improving Many-to-Many Neural Machine Translation via Selective and Aligned Online Data Augmentation
by Weitai Zhang, Lirong Dai, Junhua Liu and Shijin Wang
Appl. Sci. 2023, 13(6), 3946; https://doi.org/10.3390/app13063946 - 20 Mar 2023
Cited by 1 | Viewed by 1383
Abstract
Multilingual neural machine translation (MNMT) models are theoretically attractive for low- and zero-resource language pairs because of cross-lingual knowledge transfer. Existing approaches mainly focus on English-centric directions and usually underperform their pivot-based counterparts for non-English directions. In this work, we aim to build a many-to-many MNMT system with an emphasis on the quality of non-English directions by exploring selective and aligned online data augmentation algorithms. Based on our finding that augmented synthetic samples are not a case of "the more, the better", we propose selective online back-translation (SOBT) and thoroughly study different selection criteria to pick suitable samples for training. Furthermore, we boost SOBT with cross-lingual online substitution (CLOS) to align token representations and encourage transfer learning. Our intuition is based on the hypothesis that a universal cross-lingual representation leads to better multilingual translation performance, especially for non-English directions. Compared to previous state-of-the-art many-to-many MNMT models and conventional pivot-based methods, experiments on the IWSLT2014 and OPUS-100 translation benchmarks show that our approach achieves competitive or even better performance on English-centric directions and achieves up to ∼12 BLEU for non-English directions. All of our models and code are publicly available. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
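The selection step of SOBT can be pictured as a generic rank-and-filter over synthetic pairs. A toy sketch with placeholder translation and scoring functions follows; the paper studies several real selection criteria, which `score` merely stands in for:

```python
def selective_back_translation(monolingual, translate, score, keep_ratio=0.5):
    """Back-translate target-side monolingual sentences, then keep only the
    synthetic pairs judged most useful by a quality criterion."""
    synthetic = [(translate(t), t) for t in monolingual]       # (src', tgt)
    ranked = sorted(synthetic, key=lambda pair: score(*pair), reverse=True)
    return ranked[: int(len(ranked) * keep_ratio)]

# Toy usage; the "model" and the criterion below are both placeholders.
mono = ["guten morgen", "wie geht es dir", "danke schoen", "gute nacht"]
fake_translate = lambda t: t[::-1]
fake_score = lambda src, tgt: -len(tgt)   # e.g., prefer shorter sentences
print(selective_back_translation(mono, fake_translate, fake_score))
```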
10 pages, 1049 KiB  
Article
Event Detection Using a Self-Constructed Dependency and Graph Convolution Network
by Li He, Qingxin Meng, Qing Zhang, Jianyong Duan and Hao Wang
Appl. Sci. 2023, 13(6), 3919; https://doi.org/10.3390/app13063919 - 19 Mar 2023
Cited by 3 | Viewed by 1099
Abstract
Extant event detection models that rely on dependency parsing have exhibited commendable efficacy. However, for long sentences with many words, the results of dependency parsing are more complex, because each word corresponds to a directed edge with a dependency label. Not all of these edges provide guidance for the event detection model, and the accuracy of dependency parsing tools decreases as sentence length increases, resulting in error propagation. To solve these problems, we developed an event detection model that uses a self-constructed dependency graph and a graph convolution network. First, we statistically analyzed the ACE2005 corpus to prune the dependency parse tree and combined the named entity features in the sentence to generate an undirected graph. Second, we implemented an enhanced graph convolution network with a multi-head attention mechanism to learn the representations of nodes in the graph. Finally, a gating mechanism combines the semantic and structural dependency information of the sentence, enabling us to accomplish the event detection task. A series of experiments conducted on the ACE2005 corpus demonstrates that the proposed method enhances the performance of the event detection model. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
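The graph-construction step (pruned dependency edges plus entity links, made undirected) might look roughly like the sketch below. The relation whitelist, example sentence, and edges are illustrative, not the paper's actual pruning statistics:

```python
import numpy as np

def build_graph(n_tokens, dep_edges, keep_rels, entity_spans):
    """Undirected adjacency matrix for a GCN: keep only whitelisted
    dependency relations and connect tokens inside the same named entity."""
    A = np.eye(n_tokens, dtype=np.float32)            # self-loops
    for head, dep, rel in dep_edges:
        if rel in keep_rels:
            A[head, dep] = A[dep, head] = 1.0
    for start, end in entity_spans:
        A[start:end, start:end] = 1.0                 # link entity tokens
    return A

# "UN forces arrested the suspect" (edges and whitelist are illustrative)
edges = [(2, 1, "nsubj"), (2, 4, "obj"), (4, 3, "det"), (1, 0, "compound")]
print(build_graph(5, edges, keep_rels={"nsubj", "obj"}, entity_spans=[(0, 2)]))
```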
18 pages, 1319 KiB  
Article
PrivacyGLUE: A Benchmark Dataset for General Language Understanding in Privacy Policies
by Atreya Shankar, Andreas Waldis, Christof Bless, Maria Andueza Rodriguez and Luca Mazzola
Appl. Sci. 2023, 13(6), 3701; https://doi.org/10.3390/app13063701 - 14 Mar 2023
Cited by 1 | Viewed by 2166
Abstract
Benchmarks for general language understanding have been developing rapidly in recent years of NLP research, particularly because of their utility in choosing strong-performing models for practical downstream applications. While benchmarks have been proposed in the legal language domain, virtually no such benchmarks exist for privacy policies, despite their increasing importance in modern digital life. This could be explained by privacy policies falling under the legal language domain, but we find evidence to the contrary that motivates a separate benchmark for privacy policies. Consequently, we propose PrivacyGLUE as the first comprehensive benchmark of relevant and high-quality privacy tasks for measuring general language understanding in the privacy language domain. Furthermore, we report the performance of multiple transformer language models and perform model–pair agreement analysis to detect tasks where models benefited from domain specialization. Our findings show the importance of in-domain pre-training for privacy policies. We believe PrivacyGLUE can accelerate NLP research and improve general language understanding for humans and AI algorithms in the privacy language domain, thereby supporting the adoption and acceptance of solutions based on it. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
18 pages, 368 KiB  
Article
Joint Syntax-Enhanced and Topic-Driven Graph Networks for Emotion Recognition in Multi-Speaker Conversations
by Hui Yu, Tinghuai Ma, Li Jia, Najla Al-Nabhan and M. M. Abdel Wahab
Appl. Sci. 2023, 13(6), 3548; https://doi.org/10.3390/app13063548 - 10 Mar 2023
Viewed by 1218
Abstract
Daily conversations contain rich emotional information, and identifying this emotional information has become a hot task in the field of natural language processing. Traditional dialogue sentiment analysis methods study one-to-one dialogues and cannot be effectively applied to multi-speaker dialogues. This paper focuses on the relationships between participants in a multi-speaker conversation and analyzes the influence of each speaker on the emotion of the whole conversation. We summarize the challenges of emotion recognition in multi-speaker dialogue, focusing on the context-topic switching problem caused by its free flow of topics. For this challenge, this paper proposes a graph network that combines syntactic structure and topic information. A syntax module is designed to convert sentences into graphs, using edges to represent dependencies between words and thus addressing the colloquial nature of daily conversations. We use graph convolutional networks to extract the implicit meaning of discourse. In addition, we focus on the impact of topic information on sentiment, so we design a topic module that optimizes the topic extraction and classification of sentences with a VAE. Then, we use a combination of the attention mechanism and syntactic structure to strengthen the model's ability to analyze sentences. Furthermore, topic segmentation is adopted to solve the long-range dependency problem, and a heterogeneous graph is used to model the dialogue. The nodes of the graph combine speaker information and utterance information. To capture the interaction between the subject and the object of the dialogue, different edge types represent different interaction relationships and are assigned different weights. The experimental results of our work on multiple public datasets show that the new model outperforms several alternative methods in sentiment label classification. On the multi-person dialogue dataset, the classification accuracy increases by more than 4%, which verifies the effectiveness of constructing heterogeneous dialogue graphs. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
19 pages, 794 KiB  
Article
Temporal Extraction of Complex Medicine by Combining Probabilistic Soft Logic and Textual Feature Feedback
by Jinguang Gu, Daiwen Wang, Danyang Hu, Feng Gao and Fangfang Xu
Appl. Sci. 2023, 13(5), 3348; https://doi.org/10.3390/app13053348 - 06 Mar 2023
Cited by 1 | Viewed by 1135
Abstract
In medical texts, temporal information describes events and changes in status, such as medical visits and discharges. According to its semantic features, it can be classified into simple time and complex time. Current research on time recognition usually focuses on coarse-grained simple time while ignoring fine-grained complex time. To address this problem, based on the semantic concept of complex time in the Clinical Time Ontology, we define seven basic features and eleven extraction rules and propose a complex medical time-extraction method that combines probabilistic soft logic and textual feature feedback. The framework consists of two parts: (a) text feature recognition based on probabilistic soft logic, which uses probabilistic soft logic for negative feedback adjustment; and (b) complex medical time entity recognition based on textual feature feedback, which builds on the text feature recognition model in (a) for positive feedback adjustment. Finally, the effectiveness of our approach is verified experimentally on text feature recognition and complex temporal entity recognition. In the text feature recognition task, our method shows the largest F1 improvement, 18.09%, on the Irregular Instant Collection type corresponding to utterance l17. In the complex medical temporal entity recognition task, the F1 metric improves most significantly, by 10.42%, on the Irregular Instant Collection type. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
17 pages, 717 KiB  
Article
A Personalized Multi-Turn Generation-Based Chatbot with Various-Persona-Distribution Data
by Shihao Zhu, Tinghuai Ma, Huan Rong and Najla Al-Nabhan
Appl. Sci. 2023, 13(5), 3122; https://doi.org/10.3390/app13053122 - 28 Feb 2023
Viewed by 2397
Abstract
Existing persona-based dialogue generation models focus on the semantic consistency between personas and responses. However, various influential factors can cause persona inconsistency, such as the speaking style in the context. Existing models handle speaking styles inflexibly on datasets with varied persona distributions, resulting in persona style inconsistency. In this work, we propose a dialogue generation model with a persona selection classifier to solve this complex inconsistency problem. The model generates responses in two steps: original response generation and response rewriting. For training, we employ two auxiliary tasks: (1) a persona selection task to fuse the adapted persona into the original responses; and (2) consistency inference to remove inconsistent persona information from the final responses. In our model, the adapted personas are predicted by an NLI-based classifier. We evaluate our model on persona dialogue datasets with different persona distributions, i.e., the persona-dense PersonaChat dataset and the persona-sparse PersonalDialog dataset. The experimental results show that our model outperforms strong models in response quality, persona consistency, and persona distribution consistency. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
16 pages, 3063 KiB  
Article
Refined Answer Selection Method with Attentive Bidirectional Long Short-Term Memory Network and Self-Attention Mechanism for Intelligent Medical Service Robot
by Deguang Wang, Ye Liang, Hengrui Ma and Fengqiang Xu
Appl. Sci. 2023, 13(5), 3016; https://doi.org/10.3390/app13053016 - 26 Feb 2023
Cited by 5 | Viewed by 1198
Abstract
Answer selection, a crucial component of intelligent medical service robots, has become more and more important in natural language processing (NLP). However, there are still some critical issues in answer selection models. On the one hand, such models lack semantic understanding of long questions because of noisy information in a question–answer (QA) pair. On the other hand, some researchers combine two or more neural network models to improve the quality of answer selection, but these models focus on the similarity between questions and answers without considering background information. To this end, this paper proposes a novel refined answer selection method that uses an attentive bidirectional long short-term memory (Bi-LSTM) network and a self-attention mechanism to solve these issues. First, this paper constructs the required knowledge-based text as background information and converts the questions and answers from words to vectors. Furthermore, the self-attention mechanism is adopted to extract global features from the vectors. Finally, an attentive Bi-LSTM network is designed to address long-distance dependency learning problems and to calculate the similarity between the question and answer while taking the background knowledge information into account. To verify the effectiveness of the proposed method, this paper constructs a knowledge-based QA dataset comprising multiple medical QA pairs and conducts a series of experiments on it. The experimental results reveal that the proposed approach achieves impressive performance on the answer selection task, reaching an accuracy of 71.4% and a MAP of 68.8%, and decreasing the BLEU indicator to 3.10. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
15 pages, 566 KiB  
Article
A Multi-Granularity Word Fusion Method for Chinese NER
by Tong Liu, Jian Gao, Weijian Ni and Qingtian Zeng
Appl. Sci. 2023, 13(5), 2789; https://doi.org/10.3390/app13052789 - 21 Feb 2023
Cited by 1 | Viewed by 1484
Abstract
Named entity recognition (NER) plays a crucial role in many downstream natural language processing (NLP) tasks, and it is challenging for Chinese because of certain features of the language. Recently, large-scale pre-trained language models have been used in Chinese NER. However, since some of these models do not use word information, or employ word information of only a single granularity, the semantic information in sentences cannot be fully captured, which affects the models' performance. To take full advantage of word information and obtain richer semantic information, we propose a multi-granularity word fusion method for Chinese NER. We introduce multi-granularity word information into our model and classify this information into three kinds: strong, moderate, and weak. These kinds of information are encoded by encoders and then integrated with each other through a strong-weak feedback attention mechanism. Specifically, we apply two separate attention networks to word embeddings and N-gram embeddings, and their outputs are fused by another attention. In all three attentions, character embeddings serve as the query. We call the result multi-granularity word information. To combine character information and multi-granularity word information, we introduce two fusion strategies for better performance. This process provides our model with rich semantic information and explicitly reduces word segmentation errors and noise. We design experiments to obtain our model's best performance by comparing components, and an ablation study verifies the effectiveness of each module. The final experiments are conducted on four Chinese NER benchmark datasets, yielding F1 scores of 81.51% on OntoNotes 4.0, 95.47% on MSRA, 95.87% on Resume, and 69.41% on Weibo. The best improvement achieved by the proposed method is 1.37%. The experimental results show that our method outperforms most baselines and achieves state-of-the-art performance. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
11 pages, 1134 KiB  
Article
Document-Level Event Role Filler Extraction Using Key-Value Memory Network
by Hao Wang, Miao Li, Jianyong Duan, Li He and Qing Zhang
Appl. Sci. 2023, 13(4), 2724; https://doi.org/10.3390/app13042724 - 20 Feb 2023
Viewed by 1170
Abstract
Previous work has demonstrated that end-to-end neural sequence models work well for document-level event role filler extraction. However, such models cannot utilize global information, resulting in incomplete extraction of document-level event arguments, because the inputs to the BiLSTM are all single-word vectors with no contextual information. This phenomenon is particularly pronounced at the document level. To address this problem, we propose key-value memory networks to enhance document-level contextual information, and the overall model operates at two levels: the sentence level and the document level. At the sentence level, we use a BiLSTM to obtain key sentence information. At the document level, we use a key-value memory network to enhance document-level representations by recording information about the words in articles that are sensitive to contextual similarity. We fuse the two levels of contextual information by means of a fusion formula. We perform various experimental validations on the MUC-4 dataset, and the results show that the model using key-value memory networks works better than the other models. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
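A single key-value memory read, the building block behind the document-level enhancement described above, reduces to attention over keys followed by a weighted sum of values. A minimal PyTorch sketch with random stand-in vectors:

```python
import torch
import torch.nn.functional as F

def kv_memory_read(query, keys, values):
    """One key-value memory lookup: attend over keys, return a blend of
    values. In the paper's setting, the slots would encode document-level
    context rather than random vectors."""
    scores = keys @ query                  # (num_slots,)
    attn = F.softmax(scores, dim=0)        # attention over memory slots
    return attn @ values                   # weighted sum of value vectors

d, slots = 8, 5
query = torch.randn(d)
keys, values = torch.randn(slots, d), torch.randn(slots, d)
print(kv_memory_read(query, keys, values).shape)  # torch.Size([8])
```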
17 pages, 1317 KiB  
Article
Type Hierarchy Enhanced Event Detection without Triggers
by Youcheng Yan, Zhao Liu, Feng Gao and Jinguang Gu
Appl. Sci. 2023, 13(4), 2296; https://doi.org/10.3390/app13042296 - 10 Feb 2023
Viewed by 1104
Abstract
Event detection (ED) aims to detect events in a given text and categorize them into event types. Most current approaches to ED rely heavily on human annotations of triggers, which are often costly and limit the application of ED in other fields. However, triggers are not necessary for the event detection task. To avoid this problem, we propose a novel framework called Type Hierarchy Enhanced Event Detection Without Triggers (THEED). More specifically, we construct a type hierarchy concept module using the external knowledge graph Probase to enhance the semantic representation of event types. In addition, we divide input instances into word-level and context-level representations, which allows the model to use features at different levels. The experimental results indicate that our proposed approach achieves a notable improvement and is highly competitive with mainstream trigger-based models. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
23 pages, 2695 KiB  
Article
A Multi-Attention Approach Using BERT and Stacked Bidirectional LSTM for Improved Dialogue State Tracking
by Muhammad Asif Khan, Yi Huang, Junlan Feng, Bhuyan Kaibalya Prasad, Zafar Ali, Irfan Ullah and Pavlos Kefalas
Appl. Sci. 2023, 13(3), 1775; https://doi.org/10.3390/app13031775 - 30 Jan 2023
Cited by 1 | Viewed by 2149
Abstract
The modern digital world, and the innovative state-of-the-art applications that characterize it, render the current digital age a captivating era for many worldwide. These innovations include dialogue systems, such as Apple's Siri, Google Now, and Microsoft's Cortana, which reside on users' personal devices and assist them in their daily activities. These systems track users' intentions by analyzing their speech, the context of their previous turns, and several other external details, and they respond or act in the form of speech output. For such systems to work efficiently, a dialogue state tracking (DST) module is required to infer the current state of the dialogue in a conversation by processing previous states up to the current one. However, developing a DST module that tracks and exploits dialogue states effectively and accurately is challenging. Notable challenges that warrant immediate attention include scalability, handling unseen slot-value pairs during training, and retraining the model when the domain ontology changes. In this article, we present a new end-to-end framework combining BERT, a stacked Bidirectional LSTM (BiLSTM), and a multiple attention mechanism to formalize DST as a classification problem and address the aforementioned issues. The BERT-based module encodes the user's and system's utterances. The stacked BiLSTM extracts contextual features, and multiple attention mechanisms calculate the attention between its hidden states and the utterance embeddings. We experimentally evaluated our method against current approaches over a variety of datasets, and the results indicate a significant overall improvement. The proposed model is scalable in terms of parameter sharing and considers unseen instances during training. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
19 pages, 3655 KiB  
Article
Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data
by Atnafu Lambebo Tonja, Olga Kolesnikova, Alexander Gelbukh and Grigori Sidorov
Appl. Sci. 2023, 13(2), 1201; https://doi.org/10.3390/app13021201 - 16 Jan 2023
Cited by 11 | Viewed by 3630
Abstract
Despite the many proposals to solve the neural machine translation (NMT) problem for low-resource languages, it continues to be difficult. The issue becomes even more complicated when the few available resources cover only a single domain. In this paper, we discuss the applicability of a source-side monolingual dataset of low-resource languages to improve the NMT system for such languages. In our experiments, we used Wolaytta–English translation as a low-resource language pair. We discuss the use of self-learning and fine-tuning approaches to improve the NMT system for Wolaytta–English translation using both authentic and synthetic datasets. The self-learning approach showed +2.7 and +2.4 BLEU score improvements for Wolaytta–English and English–Wolaytta translations, respectively, over the best-performing baseline model. Further fine-tuning the best-performing self-learning model showed +1.2 and +0.6 BLEU score improvements for Wolaytta–English and English–Wolaytta translations, respectively. We reflect on our contributions and plan for the future of this difficult field of study. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
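The self-learning loop with source-side monolingual data can be summarized in a few lines; `train` and `translate` below are placeholder stand-ins for a full NMT pipeline, so this is a sketch of the loop's shape rather than the authors' code:

```python
def self_learning(authentic_pairs, mono_src, train, translate, rounds=2):
    """Self-learning with source-side monolingual data: the current model
    labels monolingual source sentences, and the synthetic pairs are mixed
    with the authentic corpus for the next round."""
    model = train(authentic_pairs)
    for _ in range(rounds):
        synthetic = [(s, translate(model, s)) for s in mono_src]
        model = train(authentic_pairs + synthetic)
    return model

# Toy usage with placeholder components:
train = lambda pairs: {"size": len(pairs)}       # "model" = dict
translate = lambda model, s: s.upper()           # placeholder decoder
model = self_learning([("a", "A"), ("b", "B")], ["c", "d"], train, translate)
print(model)  # {'size': 4}
```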
15 pages, 1267 KiB  
Article
Span-Based Fine-Grained Entity-Relation Extraction via Sub-Prompts Combination
by Ning Yu, Jianyi Liu and Yu Shi
Appl. Sci. 2023, 13(2), 1159; https://doi.org/10.3390/app13021159 - 15 Jan 2023
Viewed by 1390
Abstract
With the development of information extraction technology, a variety of entity-relation extraction paradigms have been formed. However, approaches guided by these existing paradigms suffer from insufficient information fusion and overly coarse extraction granularity, making it difficult to extract all the triples in a sentence. Moreover, joint entity-relation extraction models cannot easily adapt to the relation extraction task. Therefore, more fine-grained and flexible extraction methods are needed. In this paper, we propose a new extraction paradigm based on existing paradigms. Building on it, we propose SSPC, a method for Span-based Fine-Grained Entity-Relation Extraction via Sub-Prompts Combination. SSPC first decomposes the task into three sub-tasks, namely S,R Extraction, R,O Extraction, and S,R,O Classification, and then uses prompt tuning to fully integrate entity and relation information in each part. This fine-grained extraction framework makes the model easier to adapt to other similar tasks. We conduct experiments on joint entity-relation extraction and relation extraction, respectively. The experimental results show that our model outperforms previous methods and achieves state-of-the-art results on ADE, TACRED, and TACREV. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
12 pages, 543 KiB  
Article
Ensemble-NQG-T5: Ensemble Neural Question Generation Model Based on Text-to-Text Transfer Transformer
by Myeong-Ha Hwang, Jikang Shin, Hojin Seo, Jeong-Seon Im, Hee Cho and Chun-Kwon Lee
Appl. Sci. 2023, 13(2), 903; https://doi.org/10.3390/app13020903 - 09 Jan 2023
Cited by 6 | Viewed by 2847
Abstract
Deep learning chatbot research and development has exploded recently, in order to offer customers in numerous industries personalized services. However, creating a learning dataset for a deep learning chatbot requires substantial human resources. To augment such datasets, the idea of neural question generation (NQG) has evolved, although it is limited in how variously questions can be expressed and has a finite capacity for question generation. In this paper, we propose an ensemble-type NQG model based on the text-to-text transfer transformer (T5). With the proposed model, the number of questions generated by each single NQG model can be greatly increased by considering their mutual similarity and quality using the soft-voting method. For training the soft-voting algorithm, the evaluation score and mutual similarity score weights, based on the context and the question–answer (QA) dataset, are used as the threshold weight. Performance comparisons with existing T5-based NQG models on the SQuAD 2.0 dataset demonstrate the effectiveness of the proposed method for QG. The proposed ensemble model is anticipated to span diverse industrial fields, including interactive chatbots, robotic process automation (RPA), and Internet of Things (IoT) services in the future. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
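A rough reading of the soft-voting idea: blend a quality estimate with average mutual similarity and keep candidates above a threshold. The sketch below uses difflib as a cheap stand-in for the paper's similarity scoring, and the weighting scheme is an assumption:

```python
from difflib import SequenceMatcher

def soft_vote(candidates, quality, sim_w=0.5, threshold=0.6):
    """Keep generated questions whose blended score (quality estimate plus
    average mutual similarity) clears a threshold."""
    sim = lambda a, b: SequenceMatcher(None, a, b).ratio()
    kept = []
    for q in candidates:
        others = [c for c in candidates if c != q]
        mutual = sum(sim(q, o) for o in others) / max(len(others), 1)
        score = sim_w * mutual + (1 - sim_w) * quality(q)
        if score >= threshold:
            kept.append((q, round(score, 3)))
    return kept

cands = ["what is NQG?", "what does NQG mean?", "who likes pizza?"]
toy_quality = lambda q: 0.9 if "NQG" in q else 0.4   # placeholder estimator
print(soft_vote(cands, toy_quality))  # the off-topic question is filtered out
```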
28 pages, 15345 KiB  
Article
GeoBERT: Pre-Training Geospatial Representation Learning on Point-of-Interest
by Yunfan Gao, Yun Xiong, Siqi Wang and Haofen Wang
Appl. Sci. 2022, 12(24), 12942; https://doi.org/10.3390/app122412942 - 16 Dec 2022
Cited by 3 | Viewed by 3031
Abstract
Thanks to the development of geographic information technology, geospatial representation learning based on POIs (Points-of-Interest) has gained widespread attention in the past few years. POIs are an important indicator of urban socioeconomic activities and are widely used to extract geospatial information. However, previous studies often focus on a specific area, such as a city or a district, and are designed only for particular tasks, such as land-use classification. On the other hand, large-scale pre-trained models (PTMs) have recently achieved impressive success and become a milestone in artificial intelligence (AI). Against this background, this study proposes the first large-scale pre-trained geospatial representation learning model, called GeoBERT. First, we collect about 17 million POIs in 30 cities across China to construct pre-training corpora, with 313 POI types as the tokens and level-7 Geohash grids as the basic units. Second, we pre-train GeoBERT to learn grid embeddings in a self-supervised manner by masking POI types and then predicting them. Third, under the "pre-training + fine-tuning" paradigm, we design five practical downstream tasks. Experiments show that, with just one additional fine-tuned output layer, GeoBERT outperforms the NLP methods previously used in geospatial representation learning (Word2vec, GloVe) by 9.21% on average in F1-score on classification tasks, such as store site recommendation and working/living area prediction. For regression tasks, such as POI number prediction, house price prediction, and passenger flow prediction, GeoBERT demonstrates even greater performance improvements. The experimental results prove that pre-training on large-scale POI data can significantly improve the ability to extract geospatial information. In the discussion section, we provide a detailed analysis of what GeoBERT has learned from the perspective of attention mechanisms. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
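The pre-training objective resembles masked language modeling over POI-type tokens within a geohash cell. A toy corruption step follows; the cells and POI types are invented for illustration:

```python
import random

# Each level-7 geohash cell becomes a "sentence" of POI-type tokens
# (cells and types here are hypothetical).
cells = {
    "wtw3sjq": ["cafe", "bank", "metro", "cafe", "pharmacy"],
    "wtw3sjr": ["school", "park", "cafe"],
}

def mask_cell(tokens, mask_token="[MASK]", p=0.15, rng=random.Random(0)):
    """MLM-style corruption for one grid cell: hide some POI types; the
    model would be trained to recover them from the remaining context."""
    inputs, targets = [], []
    for t in tokens:
        if rng.random() < p:
            inputs.append(mask_token)
            targets.append(t)       # to be predicted
        else:
            inputs.append(t)
            targets.append(None)    # not scored
    return inputs, targets

for cell, toks in cells.items():
    print(cell, *mask_cell(toks, p=0.4))  # higher p for a visible effect
```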
20 pages, 1788 KiB  
Article
Towards Domain-Specific Knowledge Graph Construction for Flight Control Aided Maintenance
by Chuanyou Li, Xinhang Yang, Shance Luo, Mingzhe Song and Wei Li
Appl. Sci. 2022, 12(24), 12736; https://doi.org/10.3390/app122412736 - 12 Dec 2022
Cited by 1 | Viewed by 1689
Abstract
Flight control is a key system of modern aircraft. During each flight, pilots use flight control to manage the forces of flight as well as the aircraft's direction and attitude. Whether flight control works properly is closely tied to safety, so daily maintenance is an essential task for airlines. Flight control maintenance heavily relies on expert knowledge. To facilitate knowledge acquisition, aircraft manufacturers and airlines normally provide structured manuals for consulting, and computer-aided maintenance systems are adopted to improve daily maintenance efficiency. However, we find that grass-roots engineers at airlines still inevitably consult unstructured technical manuals from time to time, for example, when they meet an unusual problem or an unfamiliar type of aircraft. Acquiring effective knowledge from unstructured data is inefficient and inconvenient. Aiming at this problem, we propose a knowledge-graph-based maintenance prototype system as a complementary solution. The knowledge graph we built is dedicated to unstructured manuals on flight control. We first build an ontology to represent key concepts and relation types, and then perform entity-relation extraction in a pipeline paradigm with natural language processing techniques. To fully utilize domain-specific features, we present a hybrid method consisting of dedicated rules and a machine learning model for entity recognition. For relation extraction, we leverage a two-stage Bi-LSTM (bi-directional long short-term memory network) based method that improves extraction precision by solving a sample imbalance problem. We conduct comprehensive experiments on real manuals from airlines to study technical feasibility. The average precision of entity recognition reaches 85%, and the average precision of relation extraction comes to 61%. Finally, we design a flight control maintenance prototype system based on the constructed knowledge graph and the graph database Neo4j. The prototype system takes alarm messages expressed in natural language as input and returns maintenance suggestions to serve grass-roots engineers. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
22 pages, 1864 KiB  
Article
An Embedding-Based Approach to Repairing OWL Ontologies
by Qiu Ji, Guilin Qi, Yinkai Yang, Weizhuo Li, Siying Huang and Yang Sheng
Appl. Sci. 2022, 12(24), 12655; https://doi.org/10.3390/app122412655 - 09 Dec 2022
Viewed by 1089
Abstract
High-quality ontologies are critical to ontology-based applications, such as natural language understanding and information extraction, but logical conflicts naturally occur in the lifecycle of ontology development. To deal with such conflicts, conflict detection and ontology repair become two critical tasks, and we focus on repairing ontologies. Most existing approaches to ontology repair rely on the syntax of axioms or on logical consequences but ignore the semantics of axioms. In this paper, we propose an embedding-based approach that considers sentence embeddings of axioms: it translates axioms into semantic vectors and provides facilities for computing semantic similarities among axioms. A threshold-based algorithm and a signature-based algorithm are designed to repair ontologies with the help of detected conflicts and axiom embeddings. In the experiments, our proposed algorithms are compared with existing ones over 20 real-life incoherent ontologies. The threshold-based algorithm is further evaluated with different distance metrics, 10 distinct thresholds, and 3 pre-trained models. The experimental results show that the embedding-based algorithms achieve promising performance. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
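One way axiom-embedding similarity could drive repair: inside a minimal conflict set, remove the axiom least similar on average to the others. This heuristic sketch is a deliberate simplification of the paper's threshold-based algorithm; the toy vectors stand in for real sentence embeddings of axioms:

```python
import numpy as np

def choose_axiom_to_remove(conflict, embed):
    """Pick the axiom whose embedding is least similar (on average) to the
    others in the conflict set, on the intuition that it is the semantic
    odd one out."""
    vecs = {a: embed(a) for a in conflict}
    def avg_sim(a):
        va = vecs[a]
        others = [v for b, v in vecs.items() if b != a]
        return float(np.mean([
            v @ va / (np.linalg.norm(v) * np.linalg.norm(va)) for v in others
        ]))
    return min(conflict, key=avg_sim)

# Toy embeddings standing in for sentence embeddings of axioms:
toy = {
    "Cat SubClassOf Animal": np.array([1.0, 0.1]),
    "Cat SubClassOf Pet":    np.array([0.9, 0.2]),
    "Cat SubClassOf Plant":  np.array([0.0, 1.0]),
}
print(choose_axiom_to_remove(list(toy), toy.get))  # -> Cat SubClassOf Plant
```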
22 pages, 2248 KiB  
Article
Improving User Intent Detection in Urdu Web Queries with Capsule Net Architectures
by Sana Shams and Muhammad Aslam
Appl. Sci. 2022, 12(22), 11861; https://doi.org/10.3390/app122211861 - 21 Nov 2022
Cited by 2 | Viewed by 2964
Abstract
Detecting the communicative intent behind user queries is critically required by search engines to understand a user's search goal and retrieve the desired results. Due to increased web searching in local languages, there is an emerging need to support language understanding for languages other than English. This article presents a distinctive capsule neural network architecture for intent detection from search queries in Urdu, a widely spoken South Asian language. The proposed two-tiered capsule network utilizes LSTM cells and an iterative routing mechanism between the capsules to effectively discriminate diversely expressed search intents. Since no Urdu query dataset was available, a benchmark intent-annotated dataset of 11,751 queries was developed, incorporating 11 query domains and annotated with Broder's intent taxonomy (i.e., navigational, transactional, and informational intents). Through rigorous experimentation, the proposed model attained a state-of-the-art accuracy of 91.12%, significantly improving upon several alternative classification techniques and strong baselines. An error analysis revealed systematic error patterns owing to class imbalance and the large lexical variability of Urdu web queries. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
21 pages, 4512 KiB  
Article
Modularization Method to Reuse Medical Knowledge Graphs
by Maricela Bravo, Darinel González-Villarreal, José A. Reyes-Ortiz and Leonardo D. Sánchez-Martínez
Appl. Sci. 2022, 12(22), 11816; https://doi.org/10.3390/app122211816 - 21 Nov 2022
Cited by 2 | Viewed by 1303
Abstract
During the creation and integration of a health care system based on medical knowledge graphs, it is necessary to review and select the vocabularies and definitions that best fit the information requirements of the system being developed. This implies reusing medical knowledge graphs; however, fully importing knowledge graphs is not a tractable solution in terms of memory requirements. In this paper, we present a modularization-based method for knowledge graph reuse. A case study of graph reuse is presented in which the original model is transformed into a lighter one. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
16 pages, 530 KiB  
Article
Fully Attentional Network for Low-Resource Academic Machine Translation and Post Editing
by İlhami Sel and Davut Hanbay
Appl. Sci. 2022, 12(22), 11456; https://doi.org/10.3390/app122211456 - 11 Nov 2022
Cited by 2 | Viewed by 1443
Abstract
English is accepted as the academic language of the world, which necessitates that speakers of other languages use English in their academic studies. Even when these researchers are competent in the use of the English language, some mistakes may occur while writing an academic article. To solve this problem, academics tend to use automatic translation programs or get assistance from people with an advanced level of English. This study offers an expert system to assist researchers throughout the academic article writing process. In this study, Turkish, which is considered a low-resource language, is used as the source language. The proposed model combines the transformer encoder-decoder architecture with the pre-trained Sci-BERT language model via the shallow fusion method. The model uses a Fully Attentional Network layer instead of a Feed-Forward Network layer in the known shallow fusion method; in this way, a higher success rate can be achieved by increasing attention at the word level. Different metrics were used to evaluate the resulting model, which reached 45.1 BLEU and 73.2 METEOR scores. In addition, the proposed model achieved scores of 20.12 and 20.56, respectively, with the zero-shot translation method on the Workshop on Machine Translation (2017–2018) test datasets. The proposed method could inspire the inclusion of language models in translation systems for other low-resource languages. This study also introduces a corpus composed entirely of academic sentences, consisting of 1.2 million parallel sentences, to be used in the translation system. The proposed model and corpus are made available to researchers on our GitHub page. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
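Shallow fusion itself, before the paper's Fully Attentional Network modification, is a simple idea: at each decoding step, the translation model's log-probabilities are interpolated with a language model's log-probabilities. A minimal sketch, assuming both models expose per-token log-probabilities and using an illustrative fusion weight lam:

```python
import torch

def shallow_fusion_step(tm_logprobs: torch.Tensor,
                        lm_logprobs: torch.Tensor,
                        lam: float = 0.3) -> torch.Tensor:
    """Combine translation-model and language-model scores for one decoding step.

    tm_logprobs, lm_logprobs: (batch, vocab) log-probabilities over the target
    vocabulary. lam is the fusion weight (a tunable assumption, not the paper's value).
    """
    return tm_logprobs + lam * lm_logprobs

# Greedy choice of the next token under the fused score.
tm = torch.log_softmax(torch.randn(1, 32000), dim=-1)  # stand-in decoder output
lm = torch.log_softmax(torch.randn(1, 32000), dim=-1)  # stand-in LM output
next_token = shallow_fusion_step(tm, lm).argmax(dim=-1)
```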
21 pages, 4699 KiB  
Article
Identification and Visualization of Key Topics in Scientific Publications with Transformer-Based Language Models and Document Clustering Methods
by Min-Hsien Weng, Shaoqun Wu and Mark Dyer
Appl. Sci. 2022, 12(21), 11220; https://doi.org/10.3390/app122111220 - 05 Nov 2022
Cited by 3 | Viewed by 4340
Abstract
With the rapidly growing number of scientific publications, researchers face an increasing challenge of discovering the current research topics and methodologies in a scientific domain. This paper describes an unsupervised topic detection approach that utilizes recent developments in transformer-based GPT-3 (Generative Pretrained Transformer 3) similarity embedding models and modern document clustering techniques. In total, 593 publication abstracts across the urban studies and machine learning domains were used as a case study to demonstrate the three phases of our approach. The iterative clustering phase uses the GPT-3 embeddings to represent the semantic meaning of abstracts and deploys the HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) clustering algorithm along with silhouette scores to group similar abstracts. The keyword extraction phase identifies candidate words from each abstract and selects keywords using the Maximal Marginal Relevance ranking algorithm. The keyword grouping phase produces the keyword groups that represent topics in each abstract cluster, again using GPT-3 embeddings, the HDBSCAN algorithm, and silhouette scores. The results are visualized in a web-based interactive tool that allows users to explore abstract clusters and examine the topics in each cluster through keyword grouping. Our unsupervised topic detection approach does not require labeled datasets for training and has the potential to be used for bibliometric analysis of large collections of publications. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
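The iterative clustering phase can be approximated with off-the-shelf libraries: given precomputed abstract embeddings, HDBSCAN produces the clusters and the silhouette score guides the choice of min_cluster_size. A minimal sketch, assuming embeddings is a NumPy array of abstract vectors (the GPT-3 embedding call and the candidate sizes are stand-ins):

```python
import numpy as np
import hdbscan
from sklearn.metrics import silhouette_score

def cluster_abstracts(embeddings: np.ndarray, sizes=(5, 10, 15, 20)):
    """Try several min_cluster_size values and keep the clustering
    with the best silhouette score over non-noise points."""
    best = (None, -1.0)
    for size in sizes:
        labels = hdbscan.HDBSCAN(min_cluster_size=size).fit_predict(embeddings)
        mask = labels != -1  # HDBSCAN marks noise points as -1
        if mask.sum() > 1 and len(set(labels[mask])) > 1:
            score = silhouette_score(embeddings[mask], labels[mask])
            if score > best[1]:
                best = (labels, score)
    return best

embeddings = np.random.rand(593, 1536)  # stand-in for GPT-3 embeddings
labels, score = cluster_abstracts(embeddings)
```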
21 pages, 554 KiB  
Article
Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features
by Ana Meštrović, Milan Petrović and Slobodan Beliga
Appl. Sci. 2022, 12(21), 11216; https://doi.org/10.3390/app122111216 - 05 Nov 2022
Cited by 1 | Viewed by 1638
Abstract
Retweet prediction is an important task in the context of various problems, such as information spreading analysis, automatic fake news detection, social media monitoring, etc. In this study, we explore retweet prediction based on heterogeneous data sources. In order to classify a tweet according to its number of retweets, we combine features extracted from a multilayer network and from text. More specifically, we introduce a multilayer framework for the multilayer network representation of Twitter. This formalism captures different user actions and complex relationships, as well as other key properties of communication on Twitter. Next, we select a set of local network measures from each layer and construct a set of multilayer network features. We also adopt a BERT-based language model, namely Cro-CoV-cseBERT, to capture the high-level semantics and structure of tweets as a set of text features. We then trained six machine learning (ML) models for the retweet-prediction task: random forest, multilayer perceptron, light gradient boosting machine, a category-embedding model, neural oblivious decision ensembles, and an attentive interpretable tabular learning model. We compared the performance of all six models in three setups: with text features only, with multilayer network features only, and with both feature sets, and we evaluated each setup in terms of standard evaluation measures. For this task, we prepared an empirical dataset of 199,431 tweets in Croatian posted between 1 January 2020 and 31 May 2021. Our results indicate that the prediction model performs better when integrating multilayer network features with text features than when using only one set of features. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
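The "both feature sets" setup reduces to concatenating the multilayer network measures with the tweet embeddings before classification. A minimal sketch with scikit-learn, in which net_feats, text_feats, the label scheme, and the random forest choice are all stand-ins for the paper's actual pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in features: 1000 tweets, 24 network measures, 768-dim text embeddings.
net_feats = np.random.rand(1000, 24)
text_feats = np.random.rand(1000, 768)
y = np.random.randint(0, 3, size=1000)  # stand-in retweet-count class labels

X = np.hstack([net_feats, text_feats])  # the combined-feature setup
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```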
15 pages, 4291 KiB  
Article
Mixup Based Cross-Consistency Training for Named Entity Recognition
by Geonsik Youn, Bohan Yoon, Seungbin Ji, Dahee Ko and Jongtae Rhee
Appl. Sci. 2022, 12(21), 11084; https://doi.org/10.3390/app122111084 - 01 Nov 2022
Cited by 1 | Viewed by 1303
Abstract
Named Entity Recognition (NER) is at the core of natural language understanding. The quality and amount of datasets determine the performance of deep-learning-based NER models. As datasets for NER require token-level or word-level labels, annotating them is expensive and time-consuming. To alleviate the effort of manual annotation, many prior studies have utilized weak supervision for NER tasks. However, using weak supervision directly is an obstacle to training deep networks because automatically annotated labels contain a lot of noise. In this study, we propose a framework to better train deep models for NER tasks using weakly labeled data. The proposed framework stems from the idea that mixup, which was recently proposed as a data augmentation strategy, would be an obstacle to deep model training for NER tasks. Inspired by this idea, we use mixup as a perturbation function for consistency regularization, one of the semi-supervised learning strategies. To support our idea, we conducted several experiments on NER benchmarks. The experimental results show that directly using mixup on NER tasks hinders deep model training, while the proposed framework achieves improved performance compared with employing only a small amount of human-annotated data. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
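The core trick, using mixup not as augmentation but as a perturbation for consistency regularization, can be sketched in PyTorch as below. Mixing on embeddings, the KL consistency term, and the Beta(0.5, 0.5) sampling are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn.functional as F

def mixup_consistency_loss(model, emb_a, emb_b):
    """Consistency regularization: the prediction on mixed inputs should match
    the same mixture of the predictions on the original inputs.
    `model` is assumed to map token embeddings to per-token tag logits."""
    lam = torch.distributions.Beta(0.5, 0.5).sample().item()
    mixed = lam * emb_a + (1 - lam) * emb_b  # mixup on token embeddings

    with torch.no_grad():  # targets are not backpropagated
        p_a = F.softmax(model(emb_a), dim=-1)
        p_b = F.softmax(model(emb_b), dim=-1)
    target = lam * p_a + (1 - lam) * p_b

    log_q = F.log_softmax(model(mixed), dim=-1)
    return F.kl_div(log_q, target, reduction="batchmean")
```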
16 pages, 679 KiB  
Article
CRSAtt: By Capturing Relational Span and Using Attention for Relation Classification
by Cong Shao, Min Li, Gang Li, Mingle Zhou and Delong Han
Appl. Sci. 2022, 12(21), 11068; https://doi.org/10.3390/app122111068 - 01 Nov 2022
Cited by 3 | Viewed by 1331
Abstract
Relation classification is an important fundamental task in information extraction, and convolutional neural networks have commonly been applied to it with good results. In recent years, since the introduction of the pre-trained model BERT, whose use as a feature extraction architecture has become increasingly popular, convolutional neural networks have gradually withdrawn from the NLP stage, and relation classification/extraction models based on pre-trained BERT have achieved state-of-the-art results. However, none of these methods consider how to accurately capture the semantic features of the relationships between entities in order to reduce the number of noisy words in a sentence that are unhelpful for relation classification. Moreover, these methods lack a systematic prediction structure that fully utilizes the extracted features for the relation classification task. To address these problems, a SpanBERT-based relation classification model is proposed in this paper. Compared with existing BERT-based architectures, the model understands the semantic information of the relationships between entities more accurately, and it can fully utilize the extracted features to represent the degree of dependency of a pair of entities on each type of relationship. We design a feature fusion method called "SRS" (Strengthen Relational Semantics) and an attention-based prediction structure. Compared with existing methods, the proposed feature fusion method reduces the noise interference of irrelevant words when extracting relational semantics, and the proposed prediction structure makes full use of semantic features for relation classification. We achieved advanced results on the SemEval-2010 Task 8 and KBP37 relation datasets. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
21 pages, 7903 KiB  
Article
Improved Graph-Based Arabic Hotel Review Summarization Using Polarity Classification
by Ghada Amoudi, Amal Almansour and Hanan Saleh Alghamdi
Appl. Sci. 2022, 12(21), 10980; https://doi.org/10.3390/app122110980 - 29 Oct 2022
Viewed by 1394
Abstract
The increasing number of online product and service reviews has created a substantial information resource for individuals and businesses. Automatic review summarization helps overcome information overload. Research in automatic text summarization shows remarkable advancement; however, Arabic text summarization has not been studied as extensively. This study proposes an extractive Arabic review summarization approach that incorporates the polarity and sentiment aspects of reviews and employs a graph-based ranking algorithm, TextRank. We demonstrate the advantages of the proposed methods through a set of experiments using hotel reviews from Booking.com. Reviews were grouped based on their polarity, and TextRank was then applied to produce the summary. Results were evaluated using two primary measures, BLEU and ROUGE, and summaries by two native Arabic speakers were used for evaluation purposes. The results showed that this approach improved the summarization scores in most experiments, reaching an F1 score of 0.6294. The contributions of this work include applying a graph-based approach to a new domain, Arabic hotel reviews; adding a sentiment dimension to summarization; analyzing the algorithms of the two primary summarization metrics to show how these measures work and how they can be used to give accurate results; and providing four human summaries for two hotels that can be utilized in future research. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
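TextRank on one polarity group reduces to PageRank over a sentence-similarity graph. A minimal sketch with scikit-learn and networkx, using TF-IDF cosine similarity as the edge weight; the paper's exact similarity function and Arabic preprocessing are not reproduced here:

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textrank_summary(sentences: list, top_k: int = 3) -> list:
    """Rank sentences by PageRank over their cosine-similarity graph."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)
    graph = nx.from_numpy_array(sim)  # nodes are sentence indices
    scores = nx.pagerank(graph, weight="weight")
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(ranked)]  # keep original order

# Toy positive-polarity group of Arabic review sentences.
reviews = ["الفندق نظيف", "الخدمة ممتازة", "الموقع بعيد عن المركز"]
print(textrank_summary(reviews, top_k=2))
```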
14 pages, 608 KiB  
Article
Robustness Analysis on Graph Neural Networks Model for Event Detection
by Hui Wei, Hanqing Zhu, Jibing Wu, Kaiming Xiao and Hongbin Huang
Appl. Sci. 2022, 12(21), 10825; https://doi.org/10.3390/app122110825 - 25 Oct 2022
Cited by 1 | Viewed by 1206
Abstract
Event Detection (ED), which aims to identify trigger words in a given text and classify them into the corresponding event types, is an important task in Natural Language Processing (NLP); it contributes to several downstream tasks and benefits many real-world applications. Most current SOTA (state-of-the-art) models for ED are based on Graph Neural Networks (GNN). However, few studies have examined the robustness of GNN-based ED models to textual adversarial attacks, a challenge in practical applications of ED that urgently needs to be solved. In this paper, we first propose a robustness analysis framework for ED models. Using this framework, we can evaluate the robustness of an ED model on various kinds of adversarial data. To improve the robustness of the GNN-based ED model, we propose a new multi-order distance representation method and an edge representation update method based on attention weights, and then design an innovative model named A-MDL-EEGCN. Extensive experiments illustrate that the proposed model achieves better performance than other models both on the original data and on various adversarial data. The comprehensive robustness analysis of the experimental results brings new insights into the evaluation and design of robust ED models. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
11 pages, 3640 KiB  
Article
Prediction of Venous Thrombosis Chinese Electronic Medical Records Based on Deep Learning and Rule Reasoning
by Jiawei Chen, Jianhua Yang and Jianfeng He
Appl. Sci. 2022, 12(21), 10824; https://doi.org/10.3390/app122110824 - 25 Oct 2022
Cited by 2 | Viewed by 1242
Abstract
To address the heavy workload of medical staff in the process of venous thrombosis prevention and treatment, as well as erroneous, missed, and inconsistent assessments, we propose a joint extraction model for Chinese electronic medical records based on deep learning. The approach first constructs a handshake annotation scheme, uses bidirectional encoder representations from transformers (BERT) for word vector embedding, extracts contextual features with a bidirectional long short-term memory network (BiLSTM), and integrates the contextual information into the process of normalizing the word vectors. Experiments show that our proposed method achieves entity and relation F1 scores of 93.3% and 94.3% on the constructed electronic medical record dataset, which effectively improves medical information extraction. At the same time, the venous thromboembolism (VTE) risk factors extracted from the electronic medical records were used to assess the risk of venous thrombosis by means of rule reasoning. Compared with clinicians' assessments on the Wells and Geneva scales, accuracy rates of 84.7% and 86.1% were obtained. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
19 pages, 5387 KiB  
Article
Cosine-Based Embedding for Completing Lightweight Schematic Knowledge in DL-Lite_core
by Weizhuo Li, Xianda Zheng, Huan Gao, Qiu Ji and Guilin Qi
Appl. Sci. 2022, 12(20), 10690; https://doi.org/10.3390/app122010690 - 21 Oct 2022
Viewed by 1509
Abstract
Schematic knowledge, an important component of knowledge graphs (KGs), defines a rich set of logical axioms based on concepts and relations to support knowledge integration, reasoning, and heterogeneity elimination over KGs. Although several KGs contain large amounts of factual knowledge, their schematic knowledge (e.g., subclassOf axioms, disjointWith axioms) is far from complete. Existing KG embedding methods for completing schematic knowledge suffer from two limitations. Firstly, embedding methods designed to encode factual knowledge pay little attention to the completion of schematic knowledge (e.g., axioms). Secondly, several methods try to preserve the logical properties of relations for completing schematic knowledge, but they cannot simultaneously preserve the transitivity (e.g., subclassOf) and symmetry (e.g., disjointWith) of axioms well. To solve these issues, we propose a cosine-based embedding method named CosE tailored for completing lightweight schematic knowledge in DL-Lite_core. Specifically, the concepts in axioms are encoded into two semantic spaces defined in CosE. One is an angle-based semantic space, employed to preserve the transitivity or symmetry of relations in axioms. The other is a translation-based semantic space used to measure the confidence of each axiom. We design two types of score functions for these two semantic spaces so as to sufficiently learn the vector representations of concepts. Moreover, we propose a novel negative sampling strategy based on the mutual exclusion between subclassOf and disjointWith. In this way, concepts can obtain better vector representations for schematic knowledge completion. We implement our method and verify it on four standard datasets generated from real ontologies. Experiments show that CosE obtains better results than existing models and preserves the logical properties of transitivity and symmetry simultaneously. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
9 pages, 676 KiB  
Article
The Multi-Hot Representation-Based Language Model to Maintain Morpheme Units
by Ju-Sang Lee, Joon-Choul Shin and Cheol-Young Ock
Appl. Sci. 2022, 12(20), 10612; https://doi.org/10.3390/app122010612 - 20 Oct 2022
Viewed by 1168
Abstract
Natural language models have brought rapid developments in Natural Language Processing (NLP) performance following the emergence of large-scale deep learning models. Language models have previously used token units to represent natural language while reducing the proportion of unknown tokens. However, tokenization in language models raises language-specific issues. One key issue is that separating words by morphemes may distort the original meaning; it can also prove challenging to apply information surrounding a word, such as its semantic network. We propose a multi-hot representation language model that maintains Korean morpheme units. This method represents a single morpheme as a group of syllable-based tokens in cases where no matching token exists. The model demonstrates performance similar to that of existing models in various natural language processing applications. The proposed model retains the minimum unit of meaning by maintaining morpheme units and can easily accommodate the extension of semantic information. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
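The fallback from a morpheme token to a group of syllable-based tokens can be sketched as a simple vocabulary lookup. The vocabulary contents and the grouping format below are illustrative assumptions:

```python
def encode_morpheme(morpheme: str, vocab: dict) -> list:
    """Return the morpheme's own id if it is in the vocabulary; otherwise
    represent it as the group of ids of its syllables (multi-hot group)."""
    if morpheme in vocab:
        return [vocab[morpheme]]
    # Korean syllables are single characters, so fall back character by character.
    return [vocab.get(syllable, vocab["[UNK]"]) for syllable in morpheme]

vocab = {"[UNK]": 0, "학교": 1, "학": 2, "생": 3}  # toy vocabulary
print(encode_morpheme("학교", vocab))   # [1]    -> morpheme token exists
print(encode_morpheme("학생", vocab))   # [2, 3] -> syllable-token group
```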
14 pages, 1203 KiB  
Article
AT-CRF: A Chinese Reading Comprehension Algorithm Based on Attention Mechanism and Conditional Random Fields
by Nawei Shi, Huazhang Wang and Yongqiang Cheng
Appl. Sci. 2022, 12(20), 10459; https://doi.org/10.3390/app122010459 - 17 Oct 2022
Viewed by 1440
Abstract
Machine reading comprehension (MRC) is an important research topic in the field of Natural Language Processing (NLP). However, traditional MRC models often face challenges of information loss, an inability to retain long-distance dependencies, and an inability to deal with unanswerable questions whose answers are not available in the given texts. In this paper, a Chinese reading comprehension algorithm, called the Attention and Conditional Random Field (AT-CRF) Reader, is proposed to address these challenges. Firstly, RoBERTa, a pre-trained language model, is introduced to obtain the embedding representations of the input. Then, a depthwise separable convolutional neural network and attention mechanisms replace the recurrent neural network for encoding. Next, the attention flow and self-attention mechanisms are used to capture the internal context–query relations. Finally, a conditional random field is used to handle unanswerable questions and predict the correct answer. Experiments were conducted on two Chinese machine reading comprehension datasets, CMRC2018 and DuReader-checklist; compared with the baseline model, the F1 scores achieved by our AT-CRF Reader improved by 2.65% and 2.68%, and the EM values increased by 4.45% and 3.88%. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
19 pages, 4519 KiB  
Article
Enhancing Food Ingredient Named-Entity Recognition with Recurrent Network-Based Ensemble (RNE) Model
by Kokoy Siti Komariah and Bong-Kee Sin
Appl. Sci. 2022, 12(20), 10310; https://doi.org/10.3390/app122010310 - 13 Oct 2022
Viewed by 2215
Abstract
Food recipe sharing sites are becoming increasingly popular among people who want to learn how to cook or plan their menu. Through online food recipes, individuals can select ingredients that suit their lifestyle and health condition. Information from online food recipes is useful for developing food-related systems such as recommendation and health care systems. However, the information in online recipes is often unstructured. One way of extracting such information into a well-structured format is named-entity recognition (NER), the process of identifying keywords and phrases in text and classifying them into a set of predetermined categories, such as location, person, time, and others. We present a food ingredient named-entity recognition model called RNE (recurrent network-based ensemble methods) to extract entities from online recipes. RNE is an ensemble-learning framework using recurrent network models such as RNN, GRU, and LSTM. These models are trained independently on the same dataset and combined to produce better predictions in extracting food entities such as ingredient names, products, units, quantities, and states for each ingredient in a recipe. The experimental findings demonstrate that the proposed model achieves an F1 score of 96.09% and outperforms all individual models by 0.2 to 0.5 percentage points. This result indicates that RNE can extract information from food recipes better than a single model, and the extracted information can be used to support various food-related information systems. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
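The ensemble step of RNE can be sketched as combining the per-token class probabilities of the independently trained recurrent taggers. Averaging followed by argmax is an assumed combination rule; the abstract does not specify how the predictions are merged:

```python
import torch

def ensemble_tag(models: list, token_ids: torch.Tensor) -> torch.Tensor:
    """Average per-token probabilities from several taggers (e.g., RNN/GRU/LSTM
    models trained on the same data) and pick the best entity tag per token.
    Each model is assumed to map (batch, seq) ids to (batch, seq, num_tags) logits."""
    probs = [torch.softmax(m(token_ids), dim=-1) for m in models]
    mean_probs = torch.stack(probs).mean(dim=0)  # (batch, seq, num_tags)
    return mean_probs.argmax(dim=-1)             # predicted tag ids
```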
23 pages, 3212 KiB  
Article
Survey of Text Mining Techniques Applied to Judicial Decisions Prediction
by Olga Alejandra Alcántara Francia, Miguel Nunez-del-Prado and Hugo Alatrista-Salas
Appl. Sci. 2022, 12(20), 10200; https://doi.org/10.3390/app122010200 - 11 Oct 2022
Cited by 6 | Viewed by 2978
Abstract
This paper reviews the most recent literature on experiments with different Machine Learning, Deep Learning and Natural Language Processing techniques applied to predicting judicial and administrative decisions. Among the most notable findings, the most-used data mining techniques are Support Vector Machine (SVM), K-Nearest Neighbours (K-NN) and Random Forest (RF), and the most-used deep learning techniques are Long Short-Term Memory (LSTM) and transformers such as BERT. An important finding in the papers reviewed was that the use of machine learning techniques has prevailed over deep learning. Regarding the place of origin of the research, 64% of the works were carried out in English-speaking countries, 8% in Portuguese and 28% in other languages (such as German, Chinese, Turkish, Spanish, etc.); very few works of this type have been carried out in Spanish-speaking countries. The classification criteria of the works are based, on the one hand, on the identification of the classifiers used to predict situations (or events with legal interference) or judicial decisions and, on the other hand, on the application of classifiers to the phenomena regulated by the different branches of law: criminal, constitutional, human rights, administrative, intellectual property, family law, tax law and others. The corpus sizes analyzed in the reviewed works reached 100,000 documents in 2020. Finally, another important finding lies in the accuracy of these predictive techniques, which reaches over 60% in different branches of law. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
14 pages, 1321 KiB  
Article
Leveraging Multi-Modal Information for Cross-Lingual Entity Matching across Knowledge Graphs
by Tianxing Wu, Chaoyu Gao, Lin Li and Yuxiang Wang
Appl. Sci. 2022, 12(19), 10107; https://doi.org/10.3390/app121910107 - 08 Oct 2022
Cited by 6 | Viewed by 1615
Abstract
In recent years, the scale of knowledge graphs and the number of entities have grown rapidly. Entity matching across different knowledge graphs has become an urgent problem to be solved for knowledge fusion. As the importance of entity matching becomes increasingly evident, the use of representation learning technologies to find matched entities has attracted extensive attention due to the computability of vector representations. However, existing studies on representation learning cannot make full use of the multi-modal information relevant to knowledge graphs. In this paper, we propose a new cross-lingual entity matching method (called CLEM) with knowledge graph representation learning on rich multi-modal information. Its core is a multi-view intact space learning method that integrates the embeddings of multi-modal information for matching entities. Experimental results on cross-lingual datasets show the superiority and competitiveness of our proposed method. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
16 pages, 973 KiB  
Article
Automatic Classification of Eyewitness Messages for Disaster Events Using Linguistic Rules and ML/AI Approaches
by Sajjad Haider, Azhar Mahmood, Shaheen Khatoon, Majed Alshamari and Muhammad Tanvir Afzal
Appl. Sci. 2022, 12(19), 9953; https://doi.org/10.3390/app12199953 - 03 Oct 2022
Cited by 1 | Viewed by 1396
Abstract
Emergency response systems require precise and accurate information about an incident to respond accordingly. An eyewitness report is one source of such information. The research community has proposed diverse techniques to identify eyewitness messages on social media platforms. In our previous work, we created grammar rules, by exploiting language structure, linguistics, and word relations, to automatically extract feature words and classify eyewitness messages for different disaster types. That work adopted a manual classification technique and secured a maximum F-score of 0.81, far less than the static dictionary-based approach with an F-score of 0.92. In this work, we enhance our approach by adding more features and fine-tuning the linguistic rules that identify feature words related to Twitter eyewitness messages for disaster events, named the LR-TED approach. We used linguistic characteristics and labeled datasets to train several machine learning and deep learning classifiers for classifying eyewitness messages and secured a maximum F-score of 0.93. The proposed LR-TED can process millions of tweets in real time and is scalable to diverse events and unseen content. In contrast, static dictionary-based approaches require domain experts to create dictionaries of related words for all identified features and disaster types. Additionally, LR-TED can be evaluated on different social media platforms to identify eyewitness reports for various disaster types in the future. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
17 pages, 786 KiB  
Article
Long Text Truncation Algorithm Based on Label Embedding in Text Classification
by Jingang Chen and Shu Lv
Appl. Sci. 2022, 12(19), 9874; https://doi.org/10.3390/app12199874 - 30 Sep 2022
Cited by 1 | Viewed by 1997
Abstract
Long text classification has become a hot research topic in the field of text classification because of the length and redundant information of long documents. At present, common processing methods for long text data, such as the truncation method and the pooling method, are prone either to keeping too many sentences or to losing contextual semantic information. To deal with these issues, we present LTTA-LE (Long Text Truncation Algorithm Based on Label Embedding in Text Classification), which consists of three key steps. Firstly, we build a pretraining prefix template and a label word mapping prefix template to obtain the label word embedding, realizing joint training of the long text and the label words. Secondly, we calculate the cosine similarity between the label word embedding and the long text embedding and filter out redundant information from the long text to reduce its length. Finally, a three-stage model training architecture is introduced to effectively improve the classification performance and generalization ability of the model. We conduct comparative experiments on three public long text datasets, and the results show that LTTA-LE achieves an average F1 improvement of 1.0518% over other algorithms, which proves that our method achieves satisfactory performance. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
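The second step, shortening a long text by keeping only the parts most similar to the label word embedding, can be sketched with plain NumPy. Sentence-level granularity and the fixed top-k policy are assumptions for illustration:

```python
import numpy as np

def truncate_by_label(sent_embs: np.ndarray, label_emb: np.ndarray, keep: int):
    """Keep the `keep` sentences whose embeddings are most cosine-similar
    to the label word embedding, preserving document order."""
    sims = sent_embs @ label_emb / (
        np.linalg.norm(sent_embs, axis=1) * np.linalg.norm(label_emb) + 1e-9)
    top = np.sort(np.argsort(-sims)[:keep])  # best indices, original order
    return top, sims[top]

sent_embs = np.random.rand(40, 768)   # stand-in sentence embeddings
label_emb = np.random.rand(768)       # stand-in label word embedding
indices, scores = truncate_by_label(sent_embs, label_emb, keep=10)
```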
18 pages, 457 KiB  
Article
Research on Chinese Medical Entity Relation Extraction Based on Syntactic Dependency Structure Information
by Qinghui Zhang, Meng Wu, Pengtao Lv, Mengya Zhang and Lei Lv
Appl. Sci. 2022, 12(19), 9781; https://doi.org/10.3390/app12199781 - 28 Sep 2022
Cited by 2 | Viewed by 1477
Abstract
Extracting entity relations from unstructured medical texts is a fundamental task in the field of medical information extraction. In relation extraction, dependency trees contain rich structural information that helps capture long-range relations between entities. However, many models cannot effectively use dependency information or learn sentence information adequately. In this paper, we propose a relation extraction model based on syntactic dependency structure information. First, the model learns sentence sequence information via Bi-LSTM. Then, it learns syntactic dependency structure information through graph convolutional networks. Meanwhile, to remove irrelevant information from the dependencies, the model adopts a new pruning strategy. Finally, the model adds a multi-head attention mechanism to focus on the entity information in the sentence from multiple aspects. We evaluate the proposed model on a Chinese medical entity relation extraction dataset. Experimental results show that our model learns dependency relation information better and achieves higher performance than other baseline models. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
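A single graph-convolution step over the dependency structure, the core of the model's second stage, can be sketched as follows; the adjacency matrix comes from the (pruned) dependency tree, and the normalization and single-layer form are simplifying assumptions:

```python
import torch

def gcn_layer(h: torch.Tensor, adj: torch.Tensor, weight: torch.Tensor):
    """One GCN step: each token aggregates the features of its dependency
    neighbors. h: (seq, dim), adj: (seq, seq) 0/1 dependency adjacency."""
    adj = adj + torch.eye(adj.size(0))   # add self-loops
    deg = adj.sum(dim=1, keepdim=True)   # degree normalization
    return torch.relu((adj / deg) @ h @ weight)

h = torch.randn(12, 128)                              # Bi-LSTM outputs for 12 tokens
adj = torch.zeros(12, 12); adj[0, 3] = adj[3, 0] = 1  # toy pruned dependency edges
out = gcn_layer(h, adj, torch.randn(128, 128))
```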
20 pages, 471 KiB  
Article
SupMPN: Supervised Multiple Positives and Negatives Contrastive Learning Model for Semantic Textual Similarity
by Somaiyeh Dehghan and Mehmet Fatih Amasyali
Appl. Sci. 2022, 12(19), 9659; https://doi.org/10.3390/app12199659 - 26 Sep 2022
Cited by 4 | Viewed by 3049
Abstract
Semantic Textual Similarity (STS) is an important task in the area of Natural Language Processing (NLP) that measures the similarity of the underlying semantics of two texts. Although pre-trained contextual embedding models such as Bidirectional Encoder Representations from Transformers (BERT) have achieved state-of-the-art performance on several NLP tasks, BERT-derived sentence embeddings have been shown to collapse in a sense: sentence embeddings generated by BERT depend on the frequency of words, so almost all of them are mapped into a small region and have high cosine similarity. Hence, sentence embeddings generated by BERT are not robust for the STS task, as they cannot capture the full semantic meaning of the sentences. In this paper, we propose SupMPN, a Supervised Multiple Positives and Negatives Contrastive Learning Model, which accepts multiple hard-positive and multiple hard-negative sentences simultaneously and then tries to bring the hard-positive sentences closer while pushing the hard-negative sentences away. In other words, SupMPN brings similar sentences closer together in the representation space through discrimination among multiple similar and dissimilar sentences. In this way, SupMPN can learn the semantic meanings of sentences by contrasting multiple similar and dissimilar sentences, and it can generate sentence embeddings based on semantic meaning instead of word frequency. We evaluate our model on standard STS and transfer-learning tasks. The results reveal that SupMPN outperforms state-of-the-art SimCSE and all other previous supervised and unsupervised models. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
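The abstract does not give the exact form of the SupMPN objective, but a contrastive loss with multiple hard positives and negatives in the spirit it describes can be sketched with a generic supervised InfoNCE formulation; the temperature tau is an assumption:

```python
import torch
import torch.nn.functional as F

def multi_pos_neg_loss(anchor, positives, negatives, tau: float = 0.05):
    """anchor: (dim,), positives: (P, dim), negatives: (N, dim).
    Pull all positives toward the anchor, push negatives away."""
    pos_sim = F.cosine_similarity(anchor.unsqueeze(0), positives) / tau  # (P,)
    neg_sim = F.cosine_similarity(anchor.unsqueeze(0), negatives) / tau  # (N,)
    denom = torch.logsumexp(torch.cat([pos_sim, neg_sim]), dim=0)
    # Average the InfoNCE-style term over the P positives.
    return -(pos_sim - denom).mean()

loss = multi_pos_neg_loss(torch.randn(768),
                          torch.randn(4, 768), torch.randn(8, 768))
```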
19 pages, 766 KiB  
Article
Multigranularity Syntax Guidance with Graph Structure for Machine Reading Comprehension
by Chuanyun Xu, Zixu Liu, Gang Li, Changpeng Zhu and Yang Zhang
Appl. Sci. 2022, 12(19), 9525; https://doi.org/10.3390/app12199525 - 22 Sep 2022
Viewed by 1819
Abstract
In recent years, pre-trained language models, represented by the bidirectional encoder representations from transformers (BERT) model, have achieved remarkable success in machine reading comprehension (MRC). However, limited by the structure of BERT-based MRC models (for example, restrictions on word count), such models cannot effectively integrate significant features such as syntactic relations, semantic connections, and long-distance semantics between sentences, so they cannot fully understand the intrinsic connections between a text and the questions to be answered about it. In this paper, a multi-granularity syntax guidance (MgSG) module consisting of a "graph with dependence" module and a "graph with entity" module is proposed. MgSG uses both sentence and word granularities to guide the text model in deciphering the text. In particular, syntactic constraints guide the text model while the global nature of graph neural networks is exploited to enhance the model's ability to construct long-range semantics. At the same time, named entities play an important role in texts and answers, and focusing on entities can improve the model's understanding of a text's main idea. Ultimately, fusing multiple embedding representations yields the semantics of the context and the questions. Experiments demonstrate that the performance of the proposed method on the Stanford Question Answering Dataset is better than that of the traditional BERT baseline model. The results illustrate that the proposed MgSG module effectively utilizes the graph structure to learn the internal features of sentences and solve the problem of long-distance semantics, effectively improving the performance of pre-trained language models in machine reading comprehension. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
16 pages, 3476 KiB  
Article
Knowledge Graph Alignment Network with Node-Level Strong Fusion
by Shuang Liu, Man Xu, Yufeng Qin and Niko Lukač
Appl. Sci. 2022, 12(19), 9434; https://doi.org/10.3390/app12199434 - 20 Sep 2022
Cited by 4 | Viewed by 1534
Abstract
Entity alignment refers to the process of discovering entities that represent the same object in different knowledge graphs (KGs). Recently, some studies have incorporated additional information about entities, but these are simple aspect-level associations; only rough entity representations can be obtained, and the advantage of multi-faceted information is lost. In this paper, a novel node-level information strong-fusion framework (SFEA) is proposed, based on four aspects: structure, attributes, relations, and names. The attribute and name information is learned first; structure information is then learned on top of these two aspects through a graph convolutional network (GCN), so the alignment signals from attributes and names are already carried at the start of structure learning. Through the continuous propagation of multi-hop neighborhoods, strong fusion of structure, attribute, and name information is achieved and more fine-grained entity representations are obtained. Additionally, continuous interaction between the sub-alignment tasks enhances entity alignment. An iterative framework is designed to improve performance while reducing the impact on pre-aligned seed pairs. Extensive experiments demonstrate that the model improves the accuracy of entity alignment and significantly outperforms 13 previous state-of-the-art methods. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
14 pages, 1040 KiB  
Article
An End-to-End Mutually Interactive Emotion–Cause Pair Extractor via Soft Sharing
by Beilun Wang, Tianyi Ma, Zhengxuan Lu and Haoqing Xu
Appl. Sci. 2022, 12(18), 8998; https://doi.org/10.3390/app12188998 - 07 Sep 2022
Cited by 1 | Viewed by 1292
Abstract
Emotion–cause pair extraction (ECPE), i.e., extracting pairs of emotions and corresponding causes from text, has recently attracted considerable research interest. However, current ECPE models face two problems: (1) the common two-stage pipeline causes errors to accumulate, and (2) ignoring the mutual connection between the extraction and pairing of emotions and causes limits performance. In this paper, we propose a novel end-to-end mutually interactive emotion–cause pair extractor (Emiece) that effectively extracts emotion–cause pairs from all potential clause pairs. Specifically, we design two soft-shared clause-level encoders in an end-to-end deep model to measure the weighted probability of a clause pair being a potential emotion–cause pair. Experiments on standard ECPE datasets show that Emiece achieves drastic improvements over the original two-step ECPE model and other end-to-end models in the extraction of major emotion–cause pairs. The effectiveness of soft sharing and the applicability of the Emiece framework are further demonstrated by ablation experiments. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
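Soft sharing between the two clause-level encoders can be sketched as an L2 penalty that keeps corresponding parameters close without forcing them to be identical. The pairing of parameters and the penalty weight are illustrative assumptions:

```python
import torch

def soft_sharing_penalty(encoder_a: torch.nn.Module,
                         encoder_b: torch.nn.Module) -> torch.Tensor:
    """Sum of squared distances between corresponding parameters of two
    same-architecture encoders; added to the task loss with a small weight."""
    return sum(((p_a - p_b) ** 2).sum()
               for p_a, p_b in zip(encoder_a.parameters(), encoder_b.parameters()))

# Illustrative use: total_loss = task_loss + 1e-4 * soft_sharing_penalty(enc_e, enc_c)
```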
19 pages, 3118 KiB  
Article
Zero-Shot Emotion Detection for Semi-Supervised Sentiment Analysis Using Sentence Transformers and Ensemble Learning
by Senait Gebremichael Tesfagergish, Jurgita Kapočiūtė-Dzikienė and Robertas Damaševičius
Appl. Sci. 2022, 12(17), 8662; https://doi.org/10.3390/app12178662 - 29 Aug 2022
Cited by 26 | Viewed by 4339
Abstract
We live in a digitized era where our daily life depends on using online resources. Businesses consider the opinions of their customers, while people rely on the reviews/comments of other users before buying specific products or services. These reviews/comments are usually provided in the non-normative natural language within different contexts and domains (in social media, forums, news, blogs, etc.). Sentiment classification plays an important role in analyzing such texts collected from users by assigning positive, negative, and sometimes neutral sentiment values to each of them. Moreover, these texts typically contain many expressed or hidden emotions (such as happiness, sadness, etc.) that could contribute significantly to identifying sentiments. We address the emotion detection problem as part of the sentiment analysis task and propose a two-stage emotion detection methodology. The first stage is the unsupervised zero-shot learning model based on a sentence transformer returning the probabilities for subsets of 34 emotions (anger, sadness, disgust, fear, joy, happiness, admiration, affection, anguish, caution, confusion, desire, disappointment, attraction, envy, excitement, grief, hope, horror, joy, love, loneliness, pleasure, fear, generosity, rage, relief, satisfaction, sorrow, wonder, sympathy, shame, terror, and panic). The output of the zero-shot model is used as an input for the second stage, which trains the machine learning classifier on the sentiment labels in a supervised manner using ensemble learning. The proposed hybrid semi-supervised method achieves the highest accuracy of 87.3% on the English SemEval 2017 dataset. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
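The first, zero-shot stage can be approximated by embedding the text and each emotion name with a sentence transformer and converting cosine similarities into a probability vector that feeds the supervised second stage. The model name and the softmax conversion are assumptions, not the paper's exact setup:

```python
from sentence_transformers import SentenceTransformer, util
import torch

EMOTIONS = ["anger", "sadness", "disgust", "fear", "joy", "happiness"]  # subset

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
emotion_embs = model.encode(EMOTIONS, convert_to_tensor=True)

def emotion_probs(text: str) -> torch.Tensor:
    """Zero-shot emotion feature vector: softmax over label similarities."""
    text_emb = model.encode(text, convert_to_tensor=True)
    sims = util.cos_sim(text_emb, emotion_embs).squeeze(0)  # (len(EMOTIONS),)
    return torch.softmax(sims, dim=0)

features = emotion_probs("The service was wonderful, I loved it!")
# `features` would be stacked for all texts and fed to the stage-2 classifier.
```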
15 pages, 270 KiB  
Article
Identifying Irregular Financial Operations Using Accountant Comments and Natural Language Processing Techniques
by Vytautas Rudžionis, Audrius Lopata, Saulius Gudas, Rimantas Butleris, Ilona Veitaitė, Darius Dilijonas, Evaldas Grišius, Maarten Zwitserloot and Kristina Rudzioniene
Appl. Sci. 2022, 12(17), 8558; https://doi.org/10.3390/app12178558 - 26 Aug 2022
Cited by 2 | Viewed by 1856
Abstract
Finding atypical financial operations is a complicated task. The difficulties arise not only from the sophisticated actions of fraudsters but also from the large number of financial operations performed by business companies, especially large ones. It is highly desirable to have a tool that significantly reduces the number of potentially irregular operations. This paper presents an implementation of NLP-based algorithms to identify irregular financial operations using comments left by accountants. The comments are freely written and usually very short remarks that accountants use for their own reference. Content analysis using cosine similarity showed that identifying the type of operation from accountants' comments is feasible. Further analysis of comment content and financial data showed that the number of potentially suspicious operations can be reduced significantly: analysis of more than half a million financial records of Dutch companies enabled the identification of 0.3% of operations as potentially suspicious. This could make human financial auditing an easier and more robust task. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
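The content analysis the abstract mentions, checking via cosine similarity whether an accountant's comment matches the recorded operation type, can be sketched with scikit-learn. The per-type centroid comparison and the flagging threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

comments = ["monthly office rent", "rent payment march", "consulting invoice"]
op_types = ["rent", "rent", "services"]  # toy labeled history

vec = TfidfVectorizer()
X = vec.fit_transform(comments).toarray()

# Centroid of comment vectors per operation type.
centroids = {t: X[[i for i, ot in enumerate(op_types) if ot == t]].mean(axis=0)
             for t in set(op_types)}

def is_suspicious(comment: str, op_type: str, threshold: float = 0.2) -> bool:
    """Flag an operation whose comment is unlike others of its recorded type."""
    v = vec.transform([comment]).toarray()
    sim = cosine_similarity(v, centroids[op_type].reshape(1, -1))[0, 0]
    return sim < threshold

print(is_suspicious("team building dinner", "rent"))
```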
14 pages, 582 KiB  
Article
Research on Chinese Medical Entity Recognition Based on Multi-Neural Network Fusion and Improved Tri-Training Algorithm
by Renlong Qi, Pengtao Lv, Qinghui Zhang and Meng Wu
Appl. Sci. 2022, 12(17), 8539; https://doi.org/10.3390/app12178539 - 26 Aug 2022
Cited by 2 | Viewed by 1452
Abstract
Chinese medical texts contain a large number of medical named entities. Automatic recognition of these entities from medical texts is key to developing medical informatics. In the field of Chinese medical information extraction, annotated Chinese medical text data are scarce, and the resulting shortage of labeled data for named entity recognition leads to low model recognition performance. Therefore, this paper proposes a Chinese medical entity recognition model based on multi-neural-network fusion and an improved Tri-Training algorithm. The model performs semi-supervised learning with the improved Tri-Training algorithm. According to the characteristics of the medical entity recognition task and medical data, the method is improved in terms of the division of the initial sub-training set, the construction of the base classifiers, and the voting method used to integrate the learners. In addition, this paper proposes a multi-neural-network fusion entity recognition model for base classifier construction, which learns feature information jointly by combining an Iterated Dilated Convolutional Neural Network (IDCNN) and BiLSTM. Experimental verification shows that the proposed model outperforms other models and improves the performance of Chinese medical entity recognition by incorporating and improving the semi-supervised learning algorithm. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
16 pages, 913 KiB  
Article
KGNER: Improving Chinese Named Entity Recognition by BERT Infused with the Knowledge Graph
by Weiwei Hu, Liang He, Hanhan Ma, Kai Wang and Jingfeng Xiao
Appl. Sci. 2022, 12(15), 7702; https://doi.org/10.3390/app12157702 - 30 Jul 2022
Cited by 4 | Viewed by 2329
Abstract
Recently, lexicon-based methods have been proven effective for named entity recognition (NER). However, most existing lexicon-based methods cannot fully utilize the common-sense knowledge in a knowledge graph; for example, word embeddings pretrained by Word2vec or GloVe make little use of contextual semantic information. Hence, how to make the best use of knowledge for the NER task has become a challenging and hot research topic. We propose knowledge graph-inspired named-entity recognition (KGNER), featuring a masking and encoding method to incorporate common sense into bidirectional encoder representations from transformers (BERT). The proposed method not only preserves the original sentence semantics but also takes advantage of the knowledge information in a more reasonable way. Subsequently, we model temporal dependencies by taking a conditional random field (CRF) as the backend, which improves overall performance. Experiments on four dominant datasets demonstrate that KGNER outperforms other lexicon-based models. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
15 pages, 891 KiB  
Article
Boosting the Transformer with the BERT Supervision in Low-Resource Machine Translation
by Rong Yan, Jiang Li, Xiangdong Su, Xiaoming Wang and Guanglai Gao
Appl. Sci. 2022, 12(14), 7195; https://doi.org/10.3390/app12147195 - 17 Jul 2022
Cited by 7 | Viewed by 2258
Abstract
Previous works trained the Transformer and its variants end-to-end and achieved remarkable translation performance when huge parallel corpora are available. However, these models suffer from data scarcity in low-resource machine translation tasks. To deal with the mismatch between the large model capacity of the Transformer and the small parallel training set, this paper adds BERT supervision on the latent representation between the encoder and the decoder of the Transformer and designs a multi-step training algorithm on this basis. The algorithm includes three stages: (1) encoder training, (2) decoder training, and (3) joint optimization. We introduce a BERT model of the target language into encoder and decoder training to alleviate the Transformer's data starvation; after training, BERT is no longer involved in inference. Another merit of our training algorithm is that it can further enhance the Transformer in tasks where limited parallel sentence pairs but large amounts of monolingual target-language corpus are available. The evaluation results on six low-resource translation tasks suggest that the Transformer trained by our algorithm significantly outperforms the baselines trained end-to-end in previous works. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
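The added supervision can be sketched as an auxiliary loss that pulls the Transformer encoder's latent states toward the frozen target-language BERT's hidden states during the encoder-training stage. Mean squared error and a frozen BERT are illustrative assumptions, consistent with the abstract's note that BERT is not involved at inference:

```python
import torch
import torch.nn.functional as F

def bert_supervision_loss(encoder_states: torch.Tensor,
                          bert_states: torch.Tensor) -> torch.Tensor:
    """encoder_states: (batch, seq, dim) latent representation between the
    Transformer encoder and decoder; bert_states: frozen BERT hidden states
    for the target sentence, assumed projected to the same shape."""
    return F.mse_loss(encoder_states, bert_states.detach())

# Stage (1) of the training schedule: the encoder learns to mimic BERT's space.
enc = torch.randn(8, 20, 512, requires_grad=True)
bert = torch.randn(8, 20, 512)  # stand-in for frozen BERT output
loss = bert_supervision_loss(enc, bert)
```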
32 pages, 2220 KiB  
Article
Empirical Analysis of Parallel Corpora and In-Depth Analysis Using LIWC
by Chanjun Park, Midan Shim, Sugyeong Eo, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon and Heuiseok Lim
Appl. Sci. 2022, 12(11), 5545; https://doi.org/10.3390/app12115545 - 30 May 2022
Cited by 4 | Viewed by 2508
Abstract
Machine translation (MT) systems aim to translate a source language into a target language. Recent studies on MT mainly focus on neural machine translation (NMT). One factor that significantly affects NMT performance is the availability of high-quality parallel corpora. However, high-quality parallel corpora for Korean are relatively scarce compared with those for other high-resource languages, such as German or Italian. To address this problem, AI Hub recently released seven types of parallel corpora for Korean. In this study, we conduct an in-depth verification of the quality of these parallel corpora through Linguistic Inquiry and Word Count (LIWC) and several relevant experiments. LIWC is a word-counting software program that can analyze corpora in multiple ways and extract linguistic features from a dictionary base. To the best of our knowledge, this study is the first to use LIWC to analyze parallel corpora in the field of NMT. Our findings suggest directions for further research toward obtaining higher-quality parallel corpora, based on our analysis of the correlation between LIWC features and NMT performance. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
Other

22 pages, 4864 KiB  
Systematic Review
Natural Language Processing Adoption in Governments and Future Research Directions: A Systematic Review
by Yunqing Jiang, Patrick Cheong-Iao Pang, Dennis Wong and Ho Yin Kan
Appl. Sci. 2023, 13(22), 12346; https://doi.org/10.3390/app132212346 - 15 Nov 2023
Viewed by 1588
Abstract
Natural language processing (NLP), which is known as an emerging technology creating considerable value in multiple areas, has recently shown its great potential in government operations and public administration applications. However, while the number of publications on NLP is increasing steadily, there is no comprehensive review for a holistic understanding of how NLP is being adopted by governments. In this regard, we present a systematic literature review on NLP applications in governments by following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol. The review shows that the current literature comprises three levels of contribution: automation, extension, and transformation. The most-used NLP techniques reported in government-related research are sentiment analysis, machine learning, deep learning, classification, data extraction, data mining, topic modelling, opinion mining, chatbots, and question answering. Data classification, management, and decision-making are the most frequently reported reasons for using NLP. The salient research topics being discussed in the literature can be grouped into four categories: (1) governance and policy, (2) citizens and public opinion, (3) medical and healthcare, and (4) economy and environment. Future research directions should focus on (1) the potential of chatbots, (2) NLP applications in the post-pandemic era, and (3) empirical research for government work. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)