Cross-Applications of Natural Language Processing and Text Mining

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 January 2024) | Viewed by 5075

Special Issue Editor

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Interests: knowledge graph; graph computing; edge computing; network architecture

Special Issue Information

Dear Colleagues,

This Special Issue aims to collect high-quality review papers from the field of cross-applications of natural language processing and text mining. We encourage researchers from various fields within the journal’s scope to contribute review papers highlighting the latest developments in their research field, or to invite relevant experts and colleagues to do so. Topics of interest for this Special Issue include, but are not limited to:

  • Pre-trained language models and applications;
  • Generative pre-training transformer and large language models;
  • Knowledge graph construction and reasoning;
  • Dialogue and interaction systems;
  • Natural language generation;
  • Information retrieval and text mining;
  • Interpretable models;
  • Language diversity;
  • Machine translation;
  • Multimodality;
  • Question answering and machine reading comprehension.

Dr. Yijun Mo
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language processing
  • pre-trained language model
  • knowledge graph

Published Papers (6 papers)


Research

18 pages, 7301 KiB  
Article
Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition
by Li He, Qingxiang Wang, Jie Liu, Jianyong Duan and Hao Wang
Appl. Sci. 2024, 14(6), 2333; https://doi.org/10.3390/app14062333 - 10 Mar 2024
Viewed by 615
Abstract
The goal of multimodal named entity recognition (MNER) is to detect entity spans in given image–text pairs and classify them into corresponding entity types. Despite the success of existing works that leverage cross-modal attention mechanisms to integrate textual and visual representations, we observe three key issues. Firstly, models are prone to misguidance when fusing unrelated text and images. Secondly, most existing visual features are not enhanced or filtered. Finally, due to the independent encoding strategies employed for text and images, a noticeable semantic gap exists between them. To address these challenges, we propose a framework called visual clue guidance and consistency matching (GMF). To tackle the first issue, we introduce a visual clue guidance (VCG) module designed to hierarchically extract visual information from multiple scales. This information is utilized as an injectable visual clue guidance sequence to steer text representations for error-insensitive prediction decisions. To address the second issue, we incorporate a cross-scale attention (CSA) module that mitigates interference across scales, enhancing the image’s capability to capture details. To address the third issue of semantic disparity between text and images, we employ a consistency matching (CM) module based on the idea of multimodal contrastive learning, facilitating the collaborative learning of multimodal data. To validate the proposed framework, we conducted comprehensive comparative experiments, ablation studies, and case studies on two widely used benchmark datasets; the results demonstrate the efficacy of the framework.
(This article belongs to the Special Issue Cross-Applications of Natural Language Processing and Text Mining)
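The consistency matching idea sketched in the abstract can be illustrated with a symmetric InfoNCE-style contrastive loss over matched text and image embeddings. This is a minimal pure-Python sketch under the assumption of pre-computed, fixed-size embeddings; the toy vectors and the temperature value are illustrative, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(text_embs, image_embs, temperature=0.1):
    """Contrastive loss: each matched (text_i, image_i) pair is pulled
    together while mismatched (text_i, image_j) pairs are pushed apart."""
    n = len(text_embs)
    loss = 0.0
    for i in range(n):
        sims = [cosine(text_embs[i], image_embs[j]) / temperature
                for j in range(n)]
        log_denom = math.log(sum(math.exp(s) for s in sims))
        loss += -(sims[i] - log_denom)  # -log softmax of the matched pair
    return loss / n

# toy embeddings: pair 0 and pair 1 are roughly aligned across modalities
text_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.1, 0.9]]
```

Aligned pairs should yield a lower loss than deliberately shuffled ones, which is the signal that drives the two encoders toward a shared semantic space.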

24 pages, 930 KiB  
Article
Optimizing Chatbot Effectiveness through Advanced Syntactic Analysis: A Comprehensive Study in Natural Language Processing
by Iván Ortiz-Garces, Jaime Govea, Roberto O. Andrade and William Villegas-Ch
Appl. Sci. 2024, 14(5), 1737; https://doi.org/10.3390/app14051737 - 21 Feb 2024
Cited by 1 | Viewed by 758
Abstract
In the era of digitalization, the interaction between humans and machines, particularly in Natural Language Processing, has gained crucial importance. This study focuses on improving the effectiveness and accuracy of chatbots based on Natural Language Processing. Challenges such as the variability of human language and high user expectations are addressed, analyzing critical aspects such as grammatical structure, keywords, and contextual factors, with a particular emphasis on syntactic structure. An optimized chatbot model that considers explicit content and the user’s underlying context and intentions is proposed using machine learning techniques. This approach reveals that specific features, such as syntactic structure and keywords, are critical to the accuracy of chatbots. The results show that the proposed model adapts to different linguistic contexts and offers coherent and relevant answers in real-world situations. Furthermore, user satisfaction with this advanced model exceeds that with traditional models, aligning with expectations of more natural and humanized interactions. This study demonstrates the feasibility of improving chatbot–user interaction through advanced syntactic analysis and highlights the need for continued research and development in this field to achieve significant advances in human–computer interaction.
(This article belongs to the Special Issue Cross-Applications of Natural Language Processing and Text Mining)

18 pages, 2575 KiB  
Article
Improving Performance of Massive Text Real-Time Classification for Document Confidentiality Management
by Lingling Tan, Junkai Yi and Fei Yang
Appl. Sci. 2024, 14(4), 1565; https://doi.org/10.3390/app14041565 - 15 Feb 2024
Viewed by 528
Abstract
To standardize and strengthen the confidentiality management of classified and sensitive electronic documents within enterprises and organizations, and to meet the practical needs of secret text classification, an automatic document classification method based on keyword retrieval and the kNN classification algorithm is proposed. The method supports keyword classification management, provides users with keywords of multiple risk levels, and combines a matching scanning algorithm to label keywords of different levels. The labeled text is used as the training set of the kNN algorithm to classify the target text and realize the classification protection of text data. To address the shortcomings of existing kNN text classification methods, namely large feature vector dimensions, low classification efficiency, and low accuracy, an optimization method is proposed that uses a feature selection algorithm and a kNN algorithm based on the AVX instruction set to realize real-time classification of massive texts. By constructing a keyword dictionary and an optimized feature vector, parallel calculation of the feature vector weights and distance vectors is realized, and the accuracy and efficiency of text classification are improved. The experimental results show that the multi-classification effect of the feature selection algorithm used in this paper, tf-DE, is better than that of the traditional tf-idf algorithm, and that the classification effect of kNN is comparable to that of the support vector machine (SVM) algorithm. As feature vector dimensions increase, the classification effect improves and the classification time grows linearly. The AVX-256 accelerated version takes about 55% of the time of the original version, verifying the effectiveness of multi-classification of massive texts for document confidentiality management.
(This article belongs to the Special Issue Cross-Applications of Natural Language Processing and Text Mining)
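The keyword-labeling and kNN pipeline described above can be sketched in a few dozen lines. This is a hedged pure-Python illustration: the keyword dictionary, the risk-level names, and the small corpus are all hypothetical, and plain tf-idf with cosine distance stands in for the paper's tf-DE feature selection and AVX-accelerated distance computation.

```python
import math
from collections import Counter

# hypothetical keyword dictionary: term -> risk level (highest risk first)
RISK_ORDER = ["secret", "confidential", "internal", "public"]
KEYWORDS = {"salary": "internal", "merger": "secret", "prototype": "confidential"}

def label_by_keywords(doc):
    """Scan a document for dictionary keywords; the highest-risk hit wins.
    Stand-in for the paper's multi-level matching/scanning step."""
    hits = [KEYWORDS[w] for w in doc.split() if w in KEYWORDS]
    return min(hits, key=RISK_ORDER.index) if hits else "public"

def tfidf_vectors(docs):
    """Sparse tf-idf vectors as {term: weight} dicts."""
    n = len(docs)
    tokenized = [doc.split() for doc in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({w: (c / len(toks)) * math.log(n / df[w])
                     for w, c in tf.items()})
    return vecs

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(train_vecs, train_labels, query_vec, k=3):
    """Majority vote among the k nearest training documents."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(train_vecs[i], query_vec),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

Keyword scanning produces the labels; tf-idf plus kNN then generalizes those labels to documents that contain no dictionary term at all, which is the point of the two-stage design.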

15 pages, 1332 KiB  
Article
Self-Distillation and Pinyin Character Prediction for Chinese Spelling Correction Based on Multimodality
by Li He, Feng Liu, Jie Liu, Jianyong Duan and Hao Wang
Appl. Sci. 2024, 14(4), 1375; https://doi.org/10.3390/app14041375 - 07 Feb 2024
Viewed by 458
Abstract
Chinese spelling correction (CSC) constitutes a pivotal and enduring goal in natural language processing, serving as a foundational element for various language-related tasks by detecting and rectifying spelling errors in textual content. Numerous methods for Chinese spelling correction leverage multimodal information, including character, character sound, and character shape, to establish connections between incorrect and correct characters. Research indicates that a majority of spelling errors stem from pinyin similarity, with character similarity accounting for half of the errors. Consequently, effectively modeling the relationship between characters and their pinyin emerges as a key challenge in the CSC task. In this study, we propose enhancing the CSC task by introducing a pinyin character prediction task, employing an adaptive weighting method to handle predictions at a finer granularity and to balance the two prediction tasks. The proposed model, SPMSpell, utilizes ChineseBERT as an encoder to capture multimodal feature information simultaneously, and incorporates three parallel decoders for character prediction, pinyin prediction, and self-distillation. To mitigate potential overfitting to pinyin, a self-distillation method is introduced to prioritize character information in predictions. Extensive experiments on three SIGHAN benchmarks show that the proposed model attains a superior level of performance, substantiating the adaptive weighted pinyin character prediction task and underscoring the effectiveness of the self-distillation module.
(This article belongs to the Special Issue Cross-Applications of Natural Language Processing and Text Mining)
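The loss structure implied by the abstract, a character prediction term, an auxiliary pinyin term, and a self-distillation term that keeps the student close to a character-only teacher, can be written out as a small sketch. This is a hedged illustration with fixed weights alpha and beta, whereas the paper uses adaptive weighting; the logits and class counts are toy values.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    # negative log-likelihood of the target class
    return -math.log(softmax(logits)[target])

def kl_div(p_logits, q_logits):
    # KL(teacher || student) between two softmax distributions
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def csc_loss(char_logits, char_target, pinyin_logits, pinyin_target,
             teacher_char_logits, alpha=0.3, beta=0.5):
    """alpha balances the auxiliary pinyin task against character
    prediction; beta weights the self-distillation term that anchors
    the pinyin-aware student to a character-only teacher."""
    l_char = cross_entropy(char_logits, char_target)
    l_pinyin = cross_entropy(pinyin_logits, pinyin_target)
    l_distill = kl_div(teacher_char_logits, char_logits)
    return (1 - alpha) * l_char + alpha * l_pinyin + beta * l_distill
```

Because the distillation term only compares character distributions, it pulls predictions toward character evidence even when the pinyin head is confident, which matches the stated goal of preventing overfitting to pinyin.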

14 pages, 2176 KiB  
Article
A Combined Semantic Dependency and Lexical Embedding RoBERTa Model for Grid Field Relational Extraction
by Qi Meng, Xixiang Zhang, Yun Dong, Yan Chen and Dezhao Lin
Appl. Sci. 2023, 13(19), 11074; https://doi.org/10.3390/app131911074 - 08 Oct 2023
Viewed by 767
Abstract
Relationship extraction is a crucial step in the construction of a knowledge graph. In this research, grid field entity relationship extraction was performed via a labeling approach that used span representation. The subject entity and object entity were used as training instances to bolster the linkage between them. The embedding layer of the RoBERTa pre-training model included word embedding, position embedding, and paragraph embedding information. In addition, semantic dependency was introduced to establish effective linkages between entities, and an additional lexically labeled embedding was introduced to enable the model to acquire deeper semantic insights. After the embedding layer was obtained, the RoBERTa model was used for multi-task learning of entities and relations, and the multi-task information was fused using a hard parameter sharing mechanism. Finally, after a fully connected layer, the predicted entity relations were obtained. The approach was tested on a grid field dataset created for this study, and the results demonstrate that the proposed model performs well.
(This article belongs to the Special Issue Cross-Applications of Natural Language Processing and Text Mining)
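Hard parameter sharing, the multi-task fusion mechanism named above, simply means both tasks flow through one shared encoder before splitting into task-specific heads. This is a minimal pure-Python sketch under stated assumptions: a single tanh projection stands in for the RoBERTa encoder, and the dimensions (8-d input, 16-d shared representation, 4 entity types, 3 relation types) are arbitrary toy values.

```python
import math
import random

random.seed(0)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# Hard parameter sharing: one shared encoder matrix feeds two task heads,
# so gradients from both tasks would update the same shared weights.
W_shared = rand_matrix(8, 16)
W_entity = rand_matrix(16, 4)    # entity-recognition head
W_relation = rand_matrix(16, 3)  # relation-classification head

def forward(x):
    # shared representation, then per-task logits
    h = [[math.tanh(v) for v in row] for row in matmul(x, W_shared)]
    return matmul(h, W_entity), matmul(h, W_relation)

x = rand_matrix(2, 8)  # a toy batch of two token-pair representations
ent_logits, rel_logits = forward(x)
```

The design choice is that the shared weights act as a regularizer: each task constrains the representation the other relies on, which is why the paper fuses entity and relation learning this way rather than training two separate encoders.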

16 pages, 876 KiB  
Article
QZRAM: A Transparent Kernel Memory Compression System Design for Memory-Intensive Applications with QAT Accelerator Integration
by Chi Gao, Xiaofei Xu, Zhizou Yang, Liwei Lin and Jian Li
Appl. Sci. 2023, 13(18), 10526; https://doi.org/10.3390/app131810526 - 21 Sep 2023
Viewed by 1402
Abstract
In recent decades, memory-intensive applications have experienced a boom, e.g., machine learning, natural language processing (NLP), and big data analytics. Such applications often experience out-of-memory (OOM) errors, which cause processes to exit unexpectedly and without warning, degrading a system’s performance and stability. To mitigate OOM errors, many operating systems implement memory compression (e.g., Linux’s ZRAM) to provide flexible and larger memory space. However, these schemes incur two problems: (1) high-compression algorithms consume significant CPU resources, which inevitably degrades application performance; and (2) compromise algorithms with low latency also have low compression ratios, yielding insignificant increases in memory space. In this paper, we propose QZRAM, which achieves a high compression ratio without high computing consumption by integrating QAT (an ASIC accelerator) into ZRAM. To enhance hardware and software collaboration, a page-based parallel write module is introduced to provide a more efficient request processing flow. More importantly, a QAT offloading module is introduced to asynchronously offload compression to the QAT accelerator, reducing CPU computing resource consumption and addressing two challenges: long CPU idle time and low usage of the QAT unit. A comprehensive evaluation validates that QZRAM can reduce CPU resource usage by up to 49.2% on the FIO micro-benchmark, increase memory space (1.66×) compared to ZRAM, and alleviate memory overflow in the Redis benchmark.
(This article belongs to the Special Issue Cross-Applications of Natural Language Processing and Text Mining)
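The ratio-versus-CPU tradeoff that motivates QZRAM is easy to observe directly. This sketch uses Python's standard zlib as a stand-in for the kernel's in-memory compressors (it is not what ZRAM or QAT actually run): a low compression level is cheap but saves less memory, while a high level saves more at greater CPU cost, which is exactly the cost QZRAM offloads to the accelerator.

```python
import zlib

# a toy page-like buffer of repetitive text
data = b"natural language processing and text mining " * 2000

fast = zlib.compress(data, level=1)  # low latency, lower compression ratio
best = zlib.compress(data, level=9)  # higher ratio, more CPU time spent

ratio_fast = len(data) / len(fast)
ratio_best = len(data) / len(best)
print(f"level 1 ratio: {ratio_fast:.1f}x, level 9 ratio: {ratio_best:.1f}x")
```

Both outputs must round-trip losslessly; the only difference is how much CPU is traded for how much reclaimed space, the axis along which QZRAM moves the work onto dedicated hardware.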
