Advanced Natural Language Processing and Machine Translation

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: closed (25 November 2022) | Viewed by 14402

Special Issue Editor


Guest Editor
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
Interests: information retrieval and text mining; interpretability and analysis of models for NLP; language model; machine learning for NLP; question answering; resources and evaluation; semantics and syntax parsing; speech and multimodality; text generation; machine translation systems and deployment; analysis of machine translation models and approaches; evaluation of machine translation quality; machine translation quality estimation; corpora and other resources for machine translation; natural language processing for machine translation

Special Issue Information

Dear Colleagues,

Natural language processing (NLP), also known as computational linguistics, is an interdisciplinary field spanning computer science and linguistics. It is a branch of artificial intelligence that draws on data mining, machine learning, knowledge acquisition, knowledge engineering, and linguistic research related to language computing. The difficulty of the field lies in the diversity, ambiguity, robustness, knowledge dependence, and context dependence of language. The rise of natural language processing is closely related to machine translation (MT), which refers to the use of computers to automatically translate one language into another. MT is increasingly becoming a major research topic due to its significant potential as a disruptive technology. It tackles existing language barriers in innovative ways, striving to enable effective communication and translation across languages through a range of approaches, technologies, and solutions. However, since NLP and MT systems depend on large data sets and substantial computing power, numerous issues remain unsolved.

In this Special Issue, original and unpublished works presenting results related in any way to NLP or MT are welcome. We especially welcome works that include novel experimental and methodological solutions, system implementation approaches, new data sets and resources, natural language processing techniques and tools, hybrid solutions, technology combination and integration, incorporation of linguistic knowledge and other digital resources, translation quality evaluation and estimation, post-editing efforts and strategies, and other ways of tackling existing problems within the field of NLP. Submissions with a strong theoretical contribution are also encouraged.

Prof. Dr. Zhengtao Yu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

Topics of interest include, but are not limited to:

  • information retrieval and text mining
  • interpretability and analysis of models for NLP
  • language model
  • machine learning for NLP
  • question answering
  • resources and evaluation
  • semantics and syntax parsing
  • speech and multimodality
  • text generation
  • machine translation systems and deployment
  • analysis of machine translation models and approaches
  • evaluation of machine translation quality
  • machine translation quality estimation
  • corpora and other resources for machine translation
  • natural language processing for machine translation

Published Papers (5 papers)

Research

17 pages, 388 KiB  
Article
Using Natural Language Processing to Analyze Political Party Manifestos from New Zealand
by Salomon Orellana and Halil Bisgin
Information 2023, 14(3), 152; https://doi.org/10.3390/info14030152 - 01 Mar 2023
Cited by 3 | Viewed by 3211
Abstract
This study explores how natural language processing (NLP) can supplement content analyses of political documents, particularly the manifestos of political parties. NLP is particularly useful for tasks such as estimating the similarity between documents, identifying the topics discussed in documents (topic modeling), and sentiment analysis. This study applies each of these techniques to the study of political party manifestos. Document similarity may be used to gain some insight into the way parties change over time and into which political parties are successful at bringing attention to their policy agenda. Topic modeling may offer an objective way to categorize and visualize the ideas political parties are discussing. Finally, sentiment analysis has the potential to show each political party’s attitude towards a policy area or topic. This study specifically applies these techniques to the manifestos produced by the political parties of New Zealand from 1987 to 2017 (a period of significant party system change in New Zealand). It finds that NLP techniques provide valuable insights, although there is a need for significant fine-tuning.
(This article belongs to the Special Issue Advanced Natural Language Processing and Machine Translation)
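Of the three techniques this abstract names, document similarity is the simplest to illustrate. The sketch below shows a generic TF-IDF weighting followed by cosine similarity, written in standard-library Python. The abstract does not state which tokenizer, weighting scheme, or similarity measure the authors used. The helpers `tokenize`, `tfidf_vectors`, and `cosine` and the one-line "manifestos" are therefore hypothetical, chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighting tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical one-line "manifestos", invented for illustration only.
manifestos = [
    "Lower taxes and smaller government",
    "Cut taxes and reduce government spending",
    "Protect rivers and expand public transport",
]
vecs = tfidf_vectors(manifestos)
for i in range(len(vecs)):
    for j in range(i + 1, len(vecs)):
        print(f"manifesto {i} vs {j}: {cosine(vecs[i], vecs[j]):.3f}")
```

Because the weights use log(N / df), a word that appears in every document (here "and") gets weight 0 and does not affect similarity. A real manifesto analysis would also need stop-word handling, stemming or lemmatization, and a far larger corpus.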

19 pages, 1038 KiB  
Article
Testing the Effectiveness of the Diagnostic Probing Paradigm on Italian Treebanks
by Alessio Miaschi, Chiara Alzetta, Dominique Brunato, Felice Dell’Orletta and Giulia Venturi
Information 2023, 14(3), 144; https://doi.org/10.3390/info14030144 - 22 Feb 2023
Viewed by 1239
Abstract
The outstanding performance recently reached by neural language models (NLMs) across many natural language processing (NLP) tasks has steered the debate towards understanding whether NLMs implicitly learn linguistic competence. Probes, i.e., supervised models trained using NLM representations to predict linguistic properties, are frequently adopted to investigate this issue. However, it is still debated whether probing classification tasks really enable such an investigation or whether they simply pick up surface patterns in the data. This work contributes to this debate by presenting an approach to assessing the effectiveness of a suite of probing tasks aimed at testing the linguistic knowledge implicitly encoded by one of the most prominent NLMs, BERT. To this end, we compared the performance of probes when predicting gold and automatically altered values of a set of linguistic features. Our experiments were performed on Italian and were evaluated across BERT’s layers and for sentences of different lengths. Overall, we observed higher performance in the prediction of gold values, suggesting that the probing model is sensitive to the distortion of feature values. However, our experiments also showed that sentence length is a highly influential factor that can confound the probing model’s predictions.
(This article belongs to the Special Issue Advanced Natural Language Processing and Machine Translation)
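The gold-versus-altered comparison can be sketched with a linear probe trained by gradient descent. Every part of this sketch is a toy assumption:

- random 4-dimensional vectors stand in for BERT layer representations;
- the "gold" feature is linearly encoded in those vectors by construction;
- "altering" the values means shuffling them across sentences.

The abstract specifies neither the probe architecture nor the alteration procedure, so this is not the authors' setup. The helpers `fit_linear_probe`, `mse`, and `probe_error` are hypothetical.

```python
import random


def fit_linear_probe(xs, ys, lr=0.1, epochs=500):
    """Full-batch gradient descent on mean squared error for y ~ w.x + b."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i in range(dim):
                grad_w[i] += 2 * err * x[i] / n
            grad_b += 2 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the probe on held-out examples."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) + b - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def probe_error(reps, targets, split=150):
    """Train on the first `split` examples, evaluate on the rest."""
    w, b = fit_linear_probe(reps[:split], targets[:split])
    return mse(w, b, reps[split:], targets[split:])


rng = random.Random(0)
# Toy "sentence representations" (real BERT layers have 768 dimensions).
reps = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
# Toy gold feature, linearly encoded by construction (an assumption).
true_w = [1.0, -2.0, 0.5, 3.0]
gold = [sum(a * x for a, x in zip(true_w, r)) for r in reps]
# Altered values: the same feature values shuffled across sentences.
altered = gold[:]
rng.shuffle(altered)

gold_mse = probe_error(reps, gold)
altered_mse = probe_error(reps, altered)
print(f"probe MSE on gold values: {gold_mse:.6f}")
print(f"probe MSE on altered values: {altered_mse:.3f}")
```

A clearly lower error on gold values than on altered values indicates that the probe is sensitive to the distortion. That is the contrast the study draws. The study's further finding that sentence length can confound probe predictions is not modelled in this sketch.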

14 pages, 2329 KiB  
Article
Abstractive Summary of Public Opinion News Based on Element Graph Attention
by Yuxin Huang, Shukai Hou, Gang Li and Zhengtao Yu
Information 2023, 14(2), 97; https://doi.org/10.3390/info14020097 - 06 Feb 2023
Cited by 1 | Viewed by 1200
Abstract
The summary of case–public opinion refers to the generation of case-related sentences from public opinion information related to judicial cases. Case–public opinion news refers to news about judicial cases (intentional homicide, rape, etc.) that attract widespread public attention. The public opinion news in these cases usually contains case element information such as the suspect, victim, time, place, process, and sentencing of the case. In multi-document summarization of case–public opinion, different documents about the same case overlap and repeat information. To generate a concise and fluent summary, this paper proposes an abstractive summary model of case–public opinion based on attention over a case element graph. First, multiple public opinion documents about the same case are split into paragraphs. The paragraphs and case elements are then encoded with a transformer, and a heterogeneous graph containing paragraph nodes and case element nodes is constructed. Finally, in the decoding process, a two-layer attention mechanism is applied to the case element nodes and paragraph nodes, so that the model can effectively address the redundancy problem in summary generation.
(This article belongs to the Special Issue Advanced Natural Language Processing and Machine Translation)
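The abstract describes attention applied first to case element nodes and then to paragraph nodes. The sketch below is a generic two-level scaled dot-product attention in standard-library Python. The way the element context steers paragraph attention (simply adding it to the decoder state) is an assumption for illustration, not the paper's formulation. The function names and the 2-dimensional toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, keys):
    """Scaled dot-product attention: weights over keys and their weighted sum."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    context = [sum(w * k[i] for w, k in zip(weights, keys))
               for i in range(len(query))]
    return weights, context


def two_level_attention(decoder_state, element_vecs, paragraph_vecs):
    # Level 1: which case elements (suspect, victim, place, ...) matter now.
    elem_weights, elem_context = attend(decoder_state, element_vecs)
    # Level 2: paragraph attention guided by the element context
    # (assumed here to be a plain vector sum with the decoder state).
    guided_query = [s + c for s, c in zip(decoder_state, elem_context)]
    para_weights, para_context = attend(guided_query, paragraph_vecs)
    return elem_weights, para_weights, para_context


# Hypothetical 2-dimensional node encodings.
elements = [[1.0, 0.0], [0.0, 1.0]]
paragraphs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
elem_w, para_w, context = two_level_attention([1.0, 0.0], elements, paragraphs)
print("element weights:", [round(w, 3) for w in elem_w])
print("paragraph weights:", [round(w, 3) for w in para_w])
```

In the paper, these steps run inside a transformer-based decoder over a heterogeneous graph with learned projections. The sketch omits all of that and only shows how one attention level can condition the next.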

16 pages, 2507 KiB  
Article
Semantic Features-Based Discourse Analysis Using Deceptive and Real Text Reviews
by Husam M. Alawadh, Amerah Alabrah, Talha Meraj and Hafiz Tayyab Rauf
Information 2023, 14(1), 34; https://doi.org/10.3390/info14010034 - 06 Jan 2023
Cited by 3 | Viewed by 2435
Abstract
Social media usage for news, feedback on services, and even shopping is increasing. Hotel services, food cleanliness, and staff behavior are also discussed online. Hotels are reviewed by the public via comments on their websites and social media accounts. This assists potential customers before they book the services of a hotel, but it also creates an opportunity for abuse. Scammers leave deceptive reviews regarding services they never received, or inject fake promotions or fake feedback to lower the ranking of competitors. These malicious attacks will only increase in the future and will become a serious problem not only for merchants but also for hotel customers. To address the problem, many artificial intelligence–based studies have performed discourse analysis on reviews to validate their genuineness. However, it is still a challenge to find a precise, robust, and deployable automated solution to perform discourse analysis. A credibility check via discourse analysis would help create a safer social media environment. The proposed study performs discourse analysis on fake and real reviews automatically. It uses a dataset of real hotel reviews containing both positive and negative reviews. The hypothesis under investigation is that strong, fact-based, realistic words are used in truthful reviews, whereas deceptive reviews lack coherent, structural context. Therefore, frequency weight–based and semantically aware features were used in the proposed study, and a comparative analysis was performed. The semantically aware features performed strongly when tested against this hypothesis. Further, holdout and k-fold methods were applied to validate the proposed methods. The final results indicate that semantically aware features give more confidence in detecting deception in text.
(This article belongs to the Special Issue Advanced Natural Language Processing and Machine Translation)
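The abstract validates its methods with both a holdout split and k-fold cross-validation. The sketch below shows these two schemes in standard-library Python. The classifier is a placeholder that always predicts the majority training label; it stands in for the study's feature-based models, which the abstract does not detail. All function names, the toy reviews, and the labels are hypothetical.

```python
import random


def holdout_split(n, test_fraction=0.2, seed=0):
    """Shuffle indices once and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - test_fraction)))
    return idx[:cut], idx[cut:]


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every example is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(fit, predict, X, y, k=5):
    """Return one accuracy score per fold."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return scores


# Placeholder model: predict the most frequent training label
# (ties broken alphabetically so results are reproducible).
def fit_majority(X, y):
    return max(sorted(set(y)), key=y.count)


def predict_majority(model, x):
    return model


# Hypothetical toy reviews and labels, not the study's dataset.
reviews = ["clean room friendly staff", "amazing best hotel ever ever",
           "breakfast was cold", "perfect perfect perfect stay",
           "noisy street view", "staff helped with luggage"]
labels = ["truthful", "deceptive", "truthful", "deceptive",
          "truthful", "truthful"]
train_idx, test_idx = holdout_split(len(reviews), test_fraction=0.33)
print("holdout sizes:", len(train_idx), len(test_idx))
print("3-fold accuracies:",
      cross_validate(fit_majority, predict_majority, reviews, labels, k=3))
```

Holdout evaluation is cheap but depends on a single split. k-fold averages over k splits, so every review is used for testing exactly once.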

28 pages, 441 KiB  
Article
A Comparative Study of Machine Learning and Deep Learning Techniques for Fake News Detection
by Jawaher Alghamdi, Yuqing Lin and Suhuai Luo
Information 2022, 13(12), 576; https://doi.org/10.3390/info13120576 - 12 Dec 2022
Cited by 13 | Viewed by 5426
Abstract
Researchers in the field of natural language processing (NLP) have devoted considerable effort to detecting and combating fake news using an assortment of machine learning (ML) and deep learning (DL) techniques. In this paper, a review of the existing studies is conducted to understand and curtail the dissemination of fake news. Specifically, we conducted a benchmark study using a wide range of:

(1) classical ML algorithms such as logistic regression (LR), support vector machines (SVM), decision tree (DT), naive Bayes (NB), random forest (RF), XGBoost (XGB), and an ensemble learning method of such algorithms;
(2) advanced ML algorithms such as convolutional neural networks (CNNs), bidirectional long short-term memory (BiLSTM), bidirectional gated recurrent units (BiGRU), CNN-BiLSTM, CNN-BiGRU, and a hybrid approach of such techniques;
(3) DL transformer-based models such as BERT-base and RoBERTa-base.

The experiments are carried out using different pretrained word embedding methods across four well-known real-world fake news datasets (LIAR, PolitiFact, GossipCop, and COVID-19) to examine the performance of different techniques across various datasets. Furthermore, context-independent embedding methods (e.g., GloVe) are compared with BERT-base contextualised representations for their effectiveness in detecting fake news. Compared with state-of-the-art results on the datasets used, we achieve better results by relying solely on news text. We hope this study can provide useful insights for researchers working on fake news detection.
(This article belongs to the Special Issue Advanced Natural Language Processing and Machine Translation)
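Naive Bayes (NB) is one of the classical baselines in this benchmark. The sketch below is a minimal multinomial naive Bayes text classifier with add-alpha (Laplace) smoothing, written in standard-library Python. It uses whitespace tokenization and four invented sentences. It does not reproduce the study's datasets, preprocessing, embeddings, or hyperparameters. The names `train_nb` and `predict_nb` are hypothetical.

```python
import math
from collections import Counter


def train_nb(texts, labels, alpha=1.0):
    """Multinomial naive Bayes with add-alpha smoothing, as log-probabilities."""
    classes = sorted(set(labels))
    vocab = {w for t in texts for w in t.lower().split()}
    priors, likelihoods = {}, {}
    for c in classes:
        docs = [t for t, l in zip(texts, labels) if l == c]
        priors[c] = math.log(len(docs) / len(texts))
        counts = Counter(w for d in docs for w in d.lower().split())
        total = sum(counts.values())
        likelihoods[c] = {
            w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
    return priors, likelihoods


def predict_nb(model, text):
    """Return the class with the highest log-posterior score."""
    priors, likelihoods = model

    def score(c):
        # Words never seen during training are ignored.
        return priors[c] + sum(likelihoods[c][w]
                               for w in text.lower().split()
                               if w in likelihoods[c])

    return max(priors, key=score)


# Hypothetical toy "news" snippets, invented for illustration only.
texts = ["shocking miracle cure doctors hate",
         "you will not believe this shocking secret",
         "ministry publishes quarterly health statistics",
         "council approves new budget for schools"]
labels = ["fake", "fake", "real", "real"]
model = train_nb(texts, labels)
print(predict_nb(model, "shocking secret cure"))        # → fake
print(predict_nb(model, "council budget for schools"))  # → real
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied. Smoothing keeps a word unseen in one class from zeroing out that class's score.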
