Recent Trends and Advances in Natural Language Processing

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 15 October 2024 | Viewed by 5153

Special Issue Editors


Prof. Dr. Chen Li
Guest Editor
Computer Science Department, Xi’an Jiaotong University, Xi’an 710049, China
Interests: natural language processing; text mining; semantic web

Prof. Dr. Jun Liu
Guest Editor
Computer Science Department, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
Interests: natural language processing; text mining; computer vision

Dr. Tieliang Gong
Guest Editor
Computer Science Department, Xi’an Jiaotong University, Xi’an 710049, China
Interests: machine learning; statistical learning theory; information theory

Special Issue Information

Dear Colleagues,

Natural language processing (NLP) is one of the most exciting areas of artificial intelligence. The rapid growth of social media and digital publishing creates significant challenges in analyzing vast amounts of user data to generate insights. Furthermore, interactive automation systems such as chatbots cannot yet fully replace humans, because they lack an understanding of semantics and context. As unstructured data grow, NLP techniques are evolving to better handle the nuances, context, and ambiguity of human language, and novel technologies have been developed to meet the varied requirements of intelligent NLP systems. This Special Issue offers a timely collection of original contributions for researchers and practitioners working on new trends and applications in natural language processing. It focuses on the use and exploration of new technologies (see keywords) for NLP-related tasks, including (but not limited to) information extraction, information retrieval, sentiment analysis, machine translation, text summarization, and dialogue systems.

Prof. Dr. Chen Li
Prof. Dr. Jun Liu
Dr. Tieliang Gong
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • language model
  • few-shot learning for NLP
  • reinforcement learning for NLP
  • lifelong learning for NLP
  • graph learning for NLP
  • multilingual NLP
  • multimodal NLP
  • intelligent applications with NLP

Published Papers (4 papers)


Research

26 pages, 2339 KiB  
Article
Switching Self-Attention Text Classification Model with Innovative Reverse Positional Encoding for Right-to-Left Languages: A Focus on Arabic Dialects
by Laith H. Baniata and Sangwoo Kang
Mathematics 2024, 12(6), 865; https://doi.org/10.3390/math12060865 - 15 Mar 2024
Viewed by 712
Abstract
Transformer models have emerged as frontrunners in the field of natural language processing, primarily due to their adept use of self-attention mechanisms to grasp the semantic linkages between words in sequences. Despite their strengths, these models often face challenges in single-task learning scenarios, particularly when it comes to delivering top-notch performance and crafting strong latent feature representations. This challenge is more pronounced in the context of smaller datasets and is particularly acute for under-resourced languages such as Arabic. In light of these challenges, this study introduces a novel methodology for the classification of Arabic texts. This method harnesses the newly developed Reverse Positional Encoding (RPE) technique and adopts an inductive-transfer learning (ITL) framework combined with a switching self-attention shared encoder, thereby increasing the model’s adaptability and improving its sentence representation accuracy. The integration of Mixture of Experts (MoE) and RPE techniques empowers the model to process longer sequences more effectively. This enhancement is notably beneficial for Arabic text classification, adeptly supporting both the intricate five-point and the simpler ternary classification tasks. The empirical evidence points to its outstanding performance, achieving accuracy rates of 87.20% on the HARD dataset, 72.17% on the BRAD dataset, and 86.89% on the LABR dataset.
(This article belongs to the Special Issue Recent Trends and Advances in Natural Language Processing)
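The abstract describes Reverse Positional Encoding only at a high level. As a rough illustration of the general idea (not the authors' exact formulation), a standard sinusoidal positional encoding can be indexed from the end of the sequence, so the token read first in a right-to-left script receives position 0; all names below are hypothetical:

```python
import numpy as np

def sinusoidal_encoding(positions: np.ndarray, d_model: int) -> np.ndarray:
    """Standard Transformer sinusoidal encoding for the given position indices."""
    dims = np.arange(d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions[:, None] * angle_rates[None, :]
    enc = np.empty((len(positions), d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])  # sine on even dimensions
    enc[:, 1::2] = np.cos(angles[:, 1::2])  # cosine on odd dimensions
    return enc

def reverse_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Hypothetical RPE sketch: count positions from the END of the sequence,
    so the final token of a right-to-left sentence gets index 0."""
    reversed_positions = np.arange(seq_len - 1, -1, -1)  # seq_len-1, ..., 1, 0
    return sinusoidal_encoding(reversed_positions.astype(float), d_model)

pe = reverse_positional_encoding(seq_len=8, d_model=16)  # shape (8, 16)
```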

20 pages, 897 KiB  
Article
Finite State Automata on Multi-Word Units for Efficient Text-Mining
by Alberto Postiglione
Mathematics 2024, 12(4), 506; https://doi.org/10.3390/math12040506 - 6 Feb 2024
Cited by 1 | Viewed by 672
Abstract
Text mining is crucial for analyzing unstructured and semi-structured textual documents. This paper introduces a fast and precise text mining method based on a finite automaton to extract knowledge domains. Unlike simple words, multi-word units (such as credit card) are emphasized for their efficiency in identifying specific semantic areas, due to their predominantly monosemic nature, their limited number, and their distinctiveness. The method focuses on identifying multi-word units within terminological ontologies, where each multi-word unit is associated with a sub-domain of ontology knowledge. The algorithm, designed to handle the challenges posed by very long multi-word units composed of a variable number of simple words, integrates user-selected ontologies into a single finite automaton during a fast pre-processing step. At runtime, the automaton reads the input text character by character, efficiently locating multi-word units even if they overlap. This approach is efficient for both short and long documents and requires no prior training. Ontologies can be updated without additional computational cost. An early system prototype, tested on 100 short and medium-length documents, recognized the knowledge domains of the vast majority of the texts analyzed (over 90%). The authors suggest that this method could be a valuable semantic-based technique for knowledge domain extraction from unstructured documents.
(This article belongs to the Special Issue Recent Trends and Advances in Natural Language Processing)
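The abstract does not spell out the automaton construction. The sketch below approximates the idea with a character trie compiled from an ontology's multi-word units and a scan that starts a traversal at every offset, which is what allows overlapping matches; a production system would add Aho-Corasick failure links for a true single-pass automaton, and all identifiers here are hypothetical:

```python
from collections import defaultdict

def build_trie(mwu_to_domain: dict) -> dict:
    """Compile multi-word units (e.g. 'credit card' -> 'finance') into a character trie."""
    trie: dict = {}
    for mwu, domain in mwu_to_domain.items():
        node = trie
        for ch in mwu:
            node = node.setdefault(ch, {})
        node["$"] = domain  # terminal marker carrying the knowledge sub-domain
    return trie

def scan(text: str, trie: dict) -> dict:
    """Read the text character by character, counting matches per domain.
    Matches may overlap because a fresh traversal starts at every offset."""
    counts: dict = defaultdict(int)
    for start in range(len(text)):
        node = trie
        for ch in text[start:]:
            if ch not in node:
                break
            node = node[ch]
            if "$" in node:
                counts[node["$"]] += 1
    return dict(counts)

trie = build_trie({"credit card": "finance", "card reader": "hardware"})
print(scan("pay with a credit card reader", trie))
# {'finance': 1, 'hardware': 1} -- the two units overlap on 'card'
```

Classifying a document's knowledge domain would then reduce to taking the highest-scoring domain over the whole text.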

20 pages, 1045 KiB  
Article
Transformer Text Classification Model for Arabic Dialects That Utilizes Inductive Transfer
by Laith H. Baniata and Sangwoo Kang
Mathematics 2023, 11(24), 4960; https://doi.org/10.3390/math11244960 - 14 Dec 2023
Cited by 3 | Viewed by 1075
Abstract
In the realm of the five-category classification endeavor, there has been limited exploration of applied techniques for classifying Arabic text. These methods have primarily leaned on single-task learning, incorporating manually crafted features that lack robust sentence representations. Recently, the Transformer paradigm has emerged as a highly promising alternative. However, when these models are trained using single-task learning, they often face challenges in achieving outstanding performance and generating robust latent feature representations, especially when dealing with small datasets. This issue is particularly pronounced in the context of Arabic dialects, which have a scarcity of available resources. Given these constraints, this study introduces an innovative approach to dissecting sentiment in Arabic text. This approach combines Inductive Transfer (INT) with the Transformer paradigm to augment the adaptability of the model and refine the representation of sentences. By employing self-attention (SE-A) and feed-forward sub-layers as a shared Transformer encoder for both the five-category and three-category Arabic text classification tasks, the proposed model adeptly discerns sentiment in Arabic dialect sentences. The empirical findings underscore the commendable performance of the proposed model, as demonstrated in assessments of the Hotel Arabic-Reviews Dataset (HARD), the Book Reviews Arabic Dataset (BRAD), and the LABR dataset.
(This article belongs to the Special Issue Recent Trends and Advances in Natural Language Processing)
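As a sketch of the shared-encoder idea (not the authors' exact architecture), two task-specific heads can sit on one Transformer encoder so that gradients from both the five-category and the three-category task shape the same sentence representation. PyTorch is assumed, and all dimensions and names are hypothetical:

```python
import torch
import torch.nn as nn

class SharedEncoderClassifier(nn.Module):
    """Two task heads over one shared Transformer encoder, so gradients from
    both the five-category and ternary tasks shape the same representation."""

    def __init__(self, vocab_size=30000, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)  # shared by both tasks
        self.head_5way = nn.Linear(d_model, 5)  # five-category head
        self.head_3way = nn.Linear(d_model, 3)  # three-category head

    def forward(self, token_ids, task: str):
        hidden = self.encoder(self.embed(token_ids))  # (batch, seq, d_model)
        sentence = hidden.mean(dim=1)                 # simple mean pooling
        return self.head_5way(sentence) if task == "5way" else self.head_3way(sentence)

model = SharedEncoderClassifier()
logits = model(torch.randint(0, 30000, (2, 20)), task="3way")  # shape (2, 3)
```

Alternating training batches between the two tasks is what lets the smaller-data task benefit from the shared encoder.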

Review

32 pages, 730 KiB  
Review
Event-Centric Temporal Knowledge Graph Construction: A Survey
by Timotej Knez and Slavko Žitnik
Mathematics 2023, 11(23), 4852; https://doi.org/10.3390/math11234852 - 2 Dec 2023
Viewed by 2153
Abstract
Textual documents serve as representations of discussions on a variety of subjects. These discussions can vary in length and may encompass a range of events or factual information. Present trends in constructing knowledge bases primarily emphasize fact-based common-sense reasoning, often overlooking the temporal dimension of events. Given the widespread presence of time-related information, addressing this temporal aspect could enhance the quality of common-sense reasoning within existing knowledge graphs. In this comprehensive survey, we identify and evaluate the key tasks involved in constructing temporal knowledge graphs centered around events. These tasks can be categorized into three main components: (a) event extraction, (b) the extraction of temporal relationships and attributes, and (c) the creation of event-based knowledge graphs and timelines. Our systematic review focuses on the available datasets and language technologies for addressing these tasks. An in-depth comparison of the various approaches reveals that the most promising results are achieved by state-of-the-art models leveraging large pre-trained language models. Despite the existence of multiple datasets, a noticeable gap remains in the availability of annotated data that could facilitate the development of comprehensive end-to-end models. Drawing on our findings, we discuss and propose four future directions for research in this domain: (a) the integration of pre-existing knowledge, (b) the development of end-to-end systems for constructing event-centric knowledge graphs, (c) the enhancement of knowledge graphs with event-centric information, and (d) the prediction of absolute temporal attributes.
(This article belongs to the Special Issue Recent Trends and Advances in Natural Language Processing)
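To make the survey's three-part breakdown concrete, here is a minimal sketch of the data structures such a pipeline might populate; every field name is hypothetical and chosen only for illustration:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Event:
    """An event node extracted from text (task (a) in the survey's breakdown)."""
    trigger: str                                   # e.g. "elected"
    arguments: dict = field(default_factory=dict)  # role -> entity mention
    time: Optional[str] = None                     # absolute temporal attribute, if known

@dataclass
class TemporalRelation:
    """A temporal edge between two events (task (b))."""
    source: Event
    target: Event
    relation: str                                  # e.g. "BEFORE", "AFTER", "OVERLAPS"

@dataclass
class EventKnowledgeGraph:
    """The assembled event-centric graph (task (c))."""
    events: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def timeline(self) -> list:
        """Order events with known absolute times; events without one would
        need their position inferred from the temporal relations."""
        return sorted((e for e in self.events if e.time), key=lambda e: e.time)
```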
