Advances in Machine Translation for Low-Resource Languages and Domains

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: 30 September 2024 | Viewed by 7007

Special Issue Editor


Dr. Nasredine Semmar
Guest Editor
Department of Ambient Intelligence and Interactive Systems (DIASI), French Alternative Energies and Atomic Energy Commission, Paris and Gif-sur-Yvette, France
Computer Science Department, Paris-Saclay University, 91190 Paris, France
Interests: machine translation; natural language processing; deep learning; machine learning

Special Issue Information

Dear Colleagues,

In a globalizing world, accurate Machine Translation (MT) systems are becoming indispensable for breaking language barriers between countries. Together with the Internet, MT systems are becoming major tools that companies rely on to promote their products across borders, to reach customers, and to understand their feedback (sentiment, opinion, etc.) regardless of their native language.

However, building an accurate MT system for any pair of languages requires substantial resources and knowledge for modelling both languages. For instance, adding a new language pair to a rule-based MT system requires many years of expert work, which is one reason why, despite the long history of this approach, only a very limited number of language pairs are currently covered, and these tend to comprise only the most common languages. On the other hand, Statistical MT (SMT) systems (of which example-based systems are a variant) and Neural MT (NMT) systems try to learn how to translate by analyzing the translation patterns found in large collections of human translations. As the statistical and neural algorithms used in these systems are largely language-independent, they can be quickly adapted to new language pairs.

The research devoted to SMT and NMT has led to important achievements and improvements for certain pairs of languages. However, current MT systems for low-resource languages and domains have not reached the quality required for large-scale use. Indeed, the creation of such MT systems is more complex because (1) the usage and meanings of words are adapted and modified in the language of specialized domains and genres, and (2) languages evolve over time: new topics and disciplines require the creation or borrowing (e.g., from English) of new terms, while other terms become obsolete.
In addition, statistical MT and neural MT do not work well for morphologically rich languages unless the amount of training data is very large.
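One standard mitigation for morphologically rich and low-resource settings is subword segmentation, which lets a system share statistics across inflected word forms. As an illustration only (the toy corpus and merge count below are invented for this sketch, not taken from any system discussed here), a minimal pure-Python routine for learning byte-pair-encoding (BPE) merges might look like this:

```python
from collections import Counter

def learn_bpe_merges(corpus, num_merges):
    """Learn BPE merges: repeatedly merge the most frequent
    adjacent symbol pair across the corpus vocabulary."""
    # Start with each word as a tuple of single characters.
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing occurrences of the best pair.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# Toy usage: frequent character pairs are merged first.
merges = learn_bpe_merges(["low", "low", "lower", "lowest"], 2)
# merges → [('l', 'o'), ('lo', 'w')]
```

Segmenting unseen words with the learned merges then reduces the open vocabulary that makes morphologically rich languages hard for data-driven MT.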
The objective of this Special Issue is to promote research on, discussion of, and reflection on the latest advances and findings, especially those related to the use of advanced deep neural models to address machine translation issues. This Special Issue welcomes researchers and practitioners from industry and academia to contribute original research that applies recent technologies, such as Deep Learning and Artificial Intelligence, to machine translation for low-resource languages and domains.

Topics include but are not limited to:

  • General research on Machine Translation (MT).
  • Transfer-learning techniques for low-resource language MT (multilingual pre-trained models; unsupervised, semi-supervised, zero-shot, and few-shot training; etc.).
  • MT for morphologically rich languages.
  • MT for low-resource languages.
  • MT for specialized domains.
  • Measuring MT quality.
  • Taking multiword expressions into account in MT.
  • Semantics-based MT.
  • Real-time MT.
  • Hybrid approaches for MT.

Dr. Nasredine Semmar
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (2 papers)


Research

18 pages, 1627 KiB  
Article
Translation Performance from the User’s Perspective of Large Language Models and Neural Machine Translation Systems
by Jungha Son and Boyoung Kim
Information 2023, 14(10), 574; https://doi.org/10.3390/info14100574 - 19 Oct 2023
Cited by 3 | Viewed by 4957
Abstract
The rapid global expansion of ChatGPT, which plays a crucial role in interactive knowledge sharing and translation, underscores the importance of comparative performance assessments in artificial intelligence (AI) technology. This study concentrated on this crucial issue by exploring and contrasting the translation performances of large language models (LLMs) and neural machine translation (NMT) systems. For this aim, the APIs of Google Translate, Microsoft Translator, and OpenAI’s ChatGPT were utilized, leveraging parallel corpora from the Workshop on Machine Translation (WMT) 2018 and 2020 benchmarks. By applying recognized evaluation metrics such as BLEU, chrF, and TER, a comprehensive performance analysis across a variety of language pairs, translation directions, and reference token sizes was conducted. The findings reveal that while Google Translate and Microsoft Translator generally surpass ChatGPT in terms of their BLEU, chrF, and TER scores, ChatGPT exhibits superior performance in specific language pairs. Translations from non-English to English consistently yielded better results across all three systems compared with translations from English to non-English. Significantly, an improvement in translation system performance was observed as the token size increased, hinting at the potential benefits of training models on larger token sizes. Full article
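WMT-style comparisons like this one are typically scored with the sacrebleu toolkit; purely as an illustration of what one of these metrics measures (this sketch is not the study's actual implementation, and its defaults are simplified), a minimal pure-Python sentence-level chrF can be written as a character n-gram F-score:

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams over the whitespace-stripped sentence.
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Minimal sentence-level chrF sketch: average character n-gram
    precision and recall, combined as an F-beta score (recall-weighted)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# A perfect hypothesis scores 1.0; a disjoint one scores 0.0.
print(chrf("the cat sat", "the cat sat"))  # → 1.0
```

BLEU and TER follow the same corpus-level pattern (n-gram precision with a brevity penalty, and edit distance, respectively), which is why the abstract reports all three together.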

11 pages, 1428 KiB  
Article
Chinese–Vietnamese Pseudo-Parallel Sentences Extraction Based on Image Information Fusion
by Yonghua Wen, Junjun Guo, Zhiqiang Yu and Zhengtao Yu
Information 2023, 14(5), 298; https://doi.org/10.3390/info14050298 - 21 May 2023
Cited by 1 | Viewed by 1450
Abstract
Parallel sentences play a crucial role in various NLP tasks, particularly for cross-lingual tasks such as machine translation. However, due to the time-consuming and laborious nature of manual construction, many low-resource languages still suffer from a lack of large-scale parallel data. The objective of pseudo-parallel sentence extraction is to automatically identify sentence pairs in different languages that convey similar meanings. Earlier methods heavily relied on parallel data, which is unsuitable for low-resource scenarios. The current mainstream research direction is to use transfer learning or unsupervised learning based on cross-lingual word embeddings and multilingual pre-trained models; however, these methods are ineffective for languages with substantial differences. To address this issue, we propose a sentence extraction method that leverages image information fusion to extract Chinese–Vietnamese pseudo-parallel sentences from collections of bilingual texts. Our method first employs an adaptive image and text feature fusion strategy to efficiently extract the bilingual parallel sentence pair, and then, a multimodal fusion method is presented to balance the information between the image and text modalities. The experiments on multiple benchmarks show that our method achieves promising results compared to a competitive baseline by infusing additional external image information. Full article
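The extraction step can be illustrated with a simplified, unimodal stand-in: given sentence representations in a shared space (the paper fuses text and image features; the plain vectors, greedy matching rule, and threshold below are hypothetical simplifications), pair each source sentence with its highest-scoring target sentence:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def extract_pseudo_parallel(src_embs, tgt_embs, threshold=0.8):
    """Greedy sketch of pseudo-parallel extraction: keep (source, target)
    pairs whose best cross-lingual similarity clears a threshold."""
    pairs = []
    for i, u in enumerate(src_embs):
        best_score, j = max((cosine(u, v), j) for j, v in enumerate(tgt_embs))
        if best_score >= threshold:
            pairs.append((i, j, best_score))
    return pairs

# Toy usage with 2-D stand-in embeddings: each source sentence
# matches the target whose vector points the same way.
pairs = extract_pseudo_parallel([[1, 0], [0, 1]],
                                [[0.9, 0.1], [0.1, 0.9]])
# → indices (0, 0) and (1, 1)
```

Real systems replace these toy vectors with multilingual (or, as here, multimodal) encoder outputs and usually add margin-based scoring to suppress hub sentences that match everything.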
