Machine Translation for Conquering Language Barriers

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 31 July 2024 | Viewed by 19769

Special Issue Editor


Dr. Ivan Dunđer
Guest Editor
Department of Information and Communication Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 10000 Zagreb, Croatia
Interests: natural language processing; machine translation; machine learning; data science

Special Issue Information

Dear colleagues,

The MDPI journal Information invites submissions to a Special Issue on “Machine Translation for Conquering Language Barriers”.

Machine translation is an increasingly active research topic because of its potential to become one of the most significant disruptive technologies of today. It tackles various aspects of existing language barriers, striving to enable effective communication and transfer of meaning across languages by applying different approaches, technologies, and solutions. The idea behind machine translation is to automate translation within the contemporary translation workflow in order to handle the overwhelming quantity of data that needs to be translated, with special emphasis on speed and quality. However, since machine translation systems depend on large data sets and computing power, numerous issues remain open, especially for less widely spoken and under-resourced languages.

This Special Issue welcomes original, unpublished work related to machine translation and linked areas, especially work with experimental and methodological contributions on: novel solutions; system implementation approaches; new data sets and resources; natural language processing techniques and tools; hybrid solutions; technology combination and integration; computer-assisted translation and its impact on quality and productivity; user studies of various types; incorporation of linguistic knowledge and other digital resources; translation quality evaluation and estimation; post-editing efforts and strategies; and ethical and legal issues. Work on other concerns related to tackling existing language barriers and problems within the field of machine translation is also welcome, as are submissions with a strong theoretical contribution.

Topics of interest include but are not limited to:

  • Machine translation systems and deployment;
  • Analysis of machine translation models and approaches;
  • Evaluation of machine translation quality;
  • Machine translation quality estimation;
  • Corpora and other resources for machine translation;
  • Natural language processing for machine translation;
  • Language technologies for machine translation;
  • Linguistic knowledge in machine translation;
  • Machine translation application in various fields.

Dr. Ivan Dunđer
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Machine translation systems and deployment
  • Analysis of machine translation models and approaches
  • Evaluation of machine translation quality
  • Machine translation quality estimation
  • Corpora and other resources for machine translation
  • Language technologies for machine translation

Published Papers (6 papers)

Research

24 pages, 1246 KiB  
Article
adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds
by Séamus Lankford, Haithem Afli and Andy Way
Information 2023, 14(12), 638; https://doi.org/10.3390/info14120638 - 29 Nov 2023
Cited by 1 | Viewed by 4626
Abstract
The advent of Multilingual Language Models (MLLMs) and Large Language Models (LLMs) has spawned innovation in many areas of natural language processing. Despite the exciting potential of this technology, its impact on developing high-quality Machine Translation (MT) outputs for low-resource languages remains relatively under-explored. Furthermore, an open-source application, dedicated to both fine-tuning MLLMs and managing the complete MT workflow for low-resource languages, remains unavailable. We aim to address these imbalances through the development of adaptMLLM, which streamlines all processes involved in the fine-tuning of MLLMs for MT. This open-source application is tailored for developers, translators, and users who are engaged in MT. It is particularly useful for newcomers to the field, as it significantly streamlines the configuration of the development environment. An intuitive interface allows for easy customisation of hyperparameters, and the application offers a range of metrics for model evaluation and the capability to deploy models as a translation service directly within the application. As a multilingual tool, we used adaptMLLM to fine-tune models for two low-resource language pairs: English to Irish (EN↔GA) and English to Marathi (EN↔MR). Compared with baselines from the LoResMT2021 Shared Task, the adaptMLLM system demonstrated significant improvements. In the EN→GA direction, an improvement of 5.2 BLEU points was observed, and an increase of 40.5 BLEU points was recorded in the GA→EN direction, representing relative improvements of 14% and 117%, respectively. Significant improvements in the translation performance of the EN↔MR pair were also observed, notably in the MR→EN direction, with an increase of 21.3 BLEU points, which corresponds to a relative improvement of 68%. Finally, a fine-grained human evaluation of the MLLM output on the EN→GA pair was conducted using the Multidimensional Quality Metrics and Scalar Quality Metrics error taxonomies. The application and models are freely available.
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)

18 pages, 973 KiB  
Article
Four Million Segments and Counting: Building an English-Croatian Parallel Corpus through Crowdsourcing Using a Novel Gamification-Based Platform
by Rafał Jaworski, Sanja Seljan and Ivan Dunđer
Information 2023, 14(4), 226; https://doi.org/10.3390/info14040226 - 06 Apr 2023
Viewed by 1773
Abstract
Parallel corpora have been widely used in the fields of natural language processing and translation as they provide crucial multilingual information. They are used to train machine translation systems, compile dictionaries, or generate inter-language word embeddings. There are many corpora available publicly; however, support for some languages is still limited. In this paper, the authors present a framework for collecting, organizing, and storing corpora. The solution was originally designed to obtain data for less-resourced languages, but it proved to work very well for the collection of high-value domain-specific corpora. The scenario is based on the collective work of a group of people who are motivated by the means of gamification. The rules of the game motivate the participants to submit large resources, and a peer-review process ensures quality. More than four million translated segments have been collected so far.
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)

21 pages, 2963 KiB  
Article
LiST: A Lightweight Framework for Continuous Indian Sign Language Translation
by Amrutha K, Prabu P and Ramesh Chandra Poonia
Information 2023, 14(2), 79; https://doi.org/10.3390/info14020079 - 29 Jan 2023
Cited by 4 | Viewed by 2518
Abstract
Sign language is a natural, structured, and complete form of communication to exchange information. Non-verbal communicators, also referred to as hearing impaired and hard of hearing (HI&HH), consider sign language an elemental mode of communication to convey information. As this language is less familiar among a large percentage of the human population, an automatic sign language translator that can act as an interpreter and remove the language barrier is mandatory. The advent of deep learning has resulted in the availability of several sign language translation (SLT) models. However, SLT models are complex, resulting in increased latency in language translation. Furthermore, SLT models consider only hand gestures for further processing, which might lead to the misinterpretation of ambiguous sign language words. In this paper, we propose a lightweight SLT framework, LiST (Lightweight Sign language Translation), that simultaneously considers multiple modalities, such as hand gestures, facial expressions, and hand orientation, from an Indian sign video. The Inception V3 architecture handles the features associated with different signer modalities, resulting in the generation of a feature map, which is processed by a two-layered long short-term memory (LSTM) architecture. This sequence helps in sentence-by-sentence recognition and in the translation of sign language into text and audio. The model was tested with continuous Indian Sign Language (ISL) sentences taken from the INCLUDE dataset. The experimental results show that the LiST framework achieved a high translation accuracy of 91.2% and a prediction accuracy of 95.9% while maintaining a low word-level translation error compared to other existing models.
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)

22 pages, 314 KiB  
Article
Post-Editese in Literary Translations
by Sheila Castilho and Natália Resende
Information 2022, 13(2), 66; https://doi.org/10.3390/info13020066 - 28 Jan 2022
Cited by 11 | Viewed by 3349
Abstract
In the present study, we investigated the post-editese phenomenon, i.e., the unique features that set machine translated post-edited texts apart from human-translated texts. We used two literary texts, namely, the English children’s novel by Lewis Carroll Alice’s Adventures in Wonderland (AW) and Paula Hawkins’ popular book The Girl on the Train (TGOTT). Both literary texts were Google translated from English into Brazilian Portuguese to investigate whether the post-editese features can be found on the surface of the post-edited (PE) texts. In addition, we examined how the features found in the PE texts differ from the features encountered in the human-translated (HT) and machine translation (MT) versions of the same source text. Results revealed evidence for post-editese for TGOTT only with PE versions being more similar to the MT output than to the HT texts.
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)

14 pages, 1208 KiB  
Article
Measuring Terminology Consistency in Translated Corpora: Implementation of the Herfindahl-Hirshman Index
by Angelina Gašpar, Sanja Seljan and Vlasta Kučiš
Information 2022, 13(2), 43; https://doi.org/10.3390/info13020043 - 18 Jan 2022
Cited by 4 | Viewed by 3497
Abstract
Consistent terminology can positively influence communication, information transfer, and proper understanding. In multilingual written communication processes, challenges are augmented due to translation variants. The main aim of this study was to implement the Herfindahl-Hirshman Index (HHI) for the assessment of translated terminology in parallel corpora. This research was conducted on three types of legal domain subcorpora, dating from different periods: the Croatian-English parallel corpus (1991–2009), Latin-English and Latin-Croatian versions of the Code of Canon Law (1983), and English and Croatian versions of the EU legislation (2013). After the terminology extraction process, validation of term candidates was performed, followed by an evaluation. Terminology consistency was measured using the HHI, a commonly accepted measure of market concentration. Results show that the HHI can be used for measuring terminology consistency to improve information transfer and message understanding. In translation settings, the process shows the need for quality management solutions.
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
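
As a rough illustration of the measure named in the abstract above, the HHI is the sum of squared shares. The sketch below applies that general definition to the translation variants observed for a single source term; it is not the authors' implementation. The function name `hhi` and the variant labels are hypothetical, and shares are expressed as fractions, so the result lies between 0 and 1 (a term always translated the same way scores 1.0).

```python
from collections import Counter


def hhi(variants):
    """Sum of squared shares of each distinct variant (fractions, so 0 < HHI <= 1)."""
    counts = Counter(variants)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one observation is required")
    return sum((n / total) ** 2 for n in counts.values())


# Hypothetical observations: how one source term was rendered in four segments.
consistent = ["variant_a", "variant_a", "variant_a", "variant_a"]
mixed = ["variant_a", "variant_a", "variant_b", "variant_c"]

print(hhi(consistent))  # 1.0   (a single variant holds the whole share)
print(hhi(mixed))       # 0.375 (0.5**2 + 0.25**2 + 0.25**2)
```

A lower value indicates that usage is spread over more competing variants, which under this reading corresponds to less consistent terminology.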

33 pages, 1378 KiB  
Article
Evaluating the Impact of Integrating Similar Translations into Neural Machine Translation
by Arda Tezcan and Bram Bulté
Information 2022, 13(1), 19; https://doi.org/10.3390/info13010019 - 04 Jan 2022
Cited by 3 | Viewed by 2543
Abstract
Previous research has shown that simple methods of augmenting machine translation training data and input sentences with translations of similar sentences (or fuzzy matches), retrieved from a translation memory or bilingual corpus, lead to considerable improvements in translation quality, as assessed by a limited set of automatic evaluation metrics. In this study, we extend this evaluation by calculating a wider range of automated quality metrics that tap into different aspects of translation quality and by performing manual MT error analysis. Moreover, we investigate in more detail how fuzzy matches influence translations and where potential quality improvements could still be made by carrying out a series of quantitative analyses that focus on different characteristics of the retrieved fuzzy matches. The automated evaluation shows that the quality of neural fuzzy repair (NFR) translations is higher than the NMT baseline in terms of all metrics. However, the manual error analysis did not reveal a difference between the two systems in terms of total number of translation errors; yet, different profiles emerged when considering the types of errors made. Finally, in our analysis of how fuzzy matches influence NFR translations, we identified a number of features that could be used to improve the selection of fuzzy matches for NFR data augmentation.
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
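
The general idea described in the abstract above, augmenting an input sentence with the translation of a similar sentence from a translation memory, can be sketched as follows. This is a simplified illustration and not the authors' system: the token-level similarity from Python's `difflib`, the 0.5 threshold, the `@@@` separator, and the tiny in-memory translation memory are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(source, memory, threshold=0.5):
    """Return (score, tm_source, tm_target) for the most similar entry, or None."""
    best = None
    for tm_source, tm_target in memory:
        # Token-level similarity in [0, 1]; an assumed stand-in for a real fuzzy-match metric.
        score = SequenceMatcher(None, source.split(), tm_source.split()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tm_source, tm_target)
    return best


def augment(source, memory, separator=" @@@ ", threshold=0.5):
    """Append the target side of the best fuzzy match to the source sentence."""
    match = best_fuzzy_match(source, memory, threshold)
    if match is None:
        return source  # no sufficiently similar entry: leave the input unchanged
    return source + separator + match[2]


# Hypothetical translation memory of (source, target) pairs.
memory = [
    ("the cat sat on the mat", "le chat était assis sur le tapis"),
    ("stock prices fell sharply", "les cours ont fortement chuté"),
]

print(augment("the cat sat on the sofa", memory))
print(augment("completely unrelated words", memory))
```

In a setup of this kind, the augmented string rather than the bare source sentence would be passed to the translation model, both during training and at inference time.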