Natural Language Processing: Theory, Methods and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 30 April 2024

Special Issue Editors

Department of Computer Science, University of Vigo, ESEI-Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain
Interests: artificial intelligence; text mining; spam filtering
Department of Electronics and Computer Science, University of Santiago de Compostela, EPSI-Escuela Politécnica Superior de Ingeniería, Campus Terra, 27002 Lugo, Spain
Interests: artificial intelligence; text mining; data mining; drug discovery; unsupervised clustering schemes

Special Issue Information

Dear Colleagues,

We are inviting submissions to this Special Issue on natural language processing.

This Special Issue centers on the analysis of textual information in different contexts (health, e-mail classification, legal analysis, etc.). We welcome new contributions that improve the current state of the art in this field or explain its possible applications. Papers may also address the application of NLP techniques to develop specific solutions that make people's daily lives easier. Contributions may concern a wide variety of techniques including, but not limited to, the following: solutions based on any machine-learning (ML) technique (such as traditional ML models, deep-learning techniques or explainable artificial intelligence methodologies), word-embedding representations, the use of ontologies or ontology dictionaries, and statistical techniques.

The Special Issue is open for the publication of experimental work, properly validated designs for solutions, theoretical studies or state-of-the-art review papers.

Dr. José Ramón Méndez Reboredo
Dr. David Ruano-Ordás
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language processing
  • representation
  • information retrieval
  • text classification
  • semantic analysis
  • word sense disambiguation
  • clustering
  • intent detection

Published Papers (3 papers)

Research

15 pages, 1391 KiB  
Article
Named Entity Recognition in Government Audit Texts Based on ChineseBERT and Character-Word Fusion
by Baohua Huang, Yunjie Lin, Si Pang and Long Fu
Appl. Sci. 2024, 14(4), 1425; https://doi.org/10.3390/app14041425 - 09 Feb 2024
Abstract
Named entity recognition of government audit text is a key task of intelligent auditing. Aiming at the problems of scarcity of corpus in the field of governmental auditing, insufficient utilization of traditional character vector word-level information features, and insufficient capturing of auditing entity features, this study builds its own dataset in the field of auditing and proposes the model CW-CBGC for recognizing named entities in governmental auditing text based on ChineseBERT and character-word fusion. First, the ChineseBERT pre-training model is used to extract the character vector that integrates the features of glyph and pinyin, combining with word vectors dynamically constructed by the BERT pre-training model; then, the sequences of character-word fusion vectors are input into the bi-directional gated recurrent neural network (BiGRU) to learn the textual features. Finally, the global optimal sequence label is generated by Conditional Random Field (CRF), and the GHM classification loss function is used in the model training to solve the problem of error evaluation under the conditions of noisy entities and unbalanced number of entities. The F1 value of this study’s model on the audit dataset is 97.23%, which is 3.64% higher than the baseline model’s F1 value; the F1 value of the model on the public dataset Resume is 96.26%, which is 0.73–2.78% higher than the mainstream model. The experimental results show that the model proposed in this paper can effectively recognize the entities in government audit texts and has certain generalization ability.
(This article belongs to the Special Issue Natural Language Processing: Theory, Methods and Applications)

12 pages, 301 KiB  
Article
An Approach to a Linked Corpus Creation for a Literary Heritage Based on the Extraction of Entities from Texts
by Kenan Kassab and Nikolay Teslya
Appl. Sci. 2024, 14(2), 585; https://doi.org/10.3390/app14020585 - 09 Jan 2024
Abstract
Working with the literary heritage of writers requires the studying of a large amount of materials. Finding them can take a considerable amount of time even when using search engines. The solution to this problem is to create a linked corpus of literary heritage. Texts in such a corpus will be united by common entities, which will make it possible to select texts not only by the occurrence of certain phrases in a query but also by common entities. To solve this problem, we propose the use of a Named Entity Recognition model trained on examples from a corpus of texts and a database structure for storing connections between texts. We propose to automate the process of creating a dataset for training a BERT-based NER model. Due to the specifics of the subject area, methods, techniques, and strategies are proposed to increase the accuracy of the model trained with a small set of examples. As a result, we created a dataset and a model trained on it which showed high accuracy in recognizing entities in the text (the average F1-score for all entity types is 0.8952). The database structure provides for the storage of unique entities and their relationships with texts and a selection of texts based on the entities. The method was tested for a corpus of texts from the literary heritage of Alexander Sergeevich Pushkin, which is also a difficult task due to the specifics of the Russian language.
(This article belongs to the Special Issue Natural Language Processing: Theory, Methods and Applications)
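
The idea of uniting texts by common entities, as described in the abstract above, can be illustrated with a small in-memory index. This is only a sketch under stated assumptions: the class `EntityIndex`, its methods, and the sample text identifiers are hypothetical and are not taken from the paper, which describes a database structure and a BERT-based NER model that are not reproduced here. The entities are assumed to have been extracted already.

```python
from collections import defaultdict


class EntityIndex:
    """Hypothetical two-way index between texts and the entities they mention."""

    def __init__(self):
        # entity -> set of text ids that mention it
        self._texts_by_entity = defaultdict(set)
        # text id -> set of entities mentioned in it
        self._entities_by_text = defaultdict(set)

    @staticmethod
    def _normalize(entity):
        # Assumption: case-insensitive matching is enough for this sketch.
        return entity.strip().lower()

    def add(self, text_id, entities):
        """Register the (already extracted) entities of one text."""
        for entity in entities:
            key = self._normalize(entity)
            self._texts_by_entity[key].add(text_id)
            self._entities_by_text[text_id].add(key)

    def texts_with(self, entity):
        """Return the ids of all texts that mention the given entity."""
        return set(self._texts_by_entity.get(self._normalize(entity), set()))

    def related_texts(self, text_id):
        """Return the ids of other texts sharing at least one entity."""
        related = set()
        for key in self._entities_by_text.get(text_id, set()):
            related |= self._texts_by_entity[key]
        related.discard(text_id)
        return related


# Illustrative data only; the identifiers and entity lists are made up.
index = EntityIndex()
index.add("letter_1", ["Pushkin", "Moscow"])
index.add("poem_1", ["Pushkin"])
index.add("note_1", ["Odessa"])
```

With this structure, `index.texts_with("pushkin")` selects texts by entity rather than by phrase, and `index.related_texts("letter_1")` follows shared entities from one text to others. A persistent database, as the abstract describes, would replace the two dictionaries.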

16 pages, 6689 KiB  
Article
The Question of Studying Information Entropy in Poetic Texts
by Olga Kozhemyakina, Vladimir Barakhnin, Natalia Shashok and Elina Kozhemyakina
Appl. Sci. 2023, 13(20), 11247; https://doi.org/10.3390/app132011247 - 13 Oct 2023
Abstract
One of the approaches to quantitative text analysis is to represent a given text in the form of a time series, which can be followed by an information entropy study for different text representations, such as “symbolic entropy”, “phonetic entropy” and “emotional entropy” of various orders. Studying authors’ styles based on such entropic characteristics of their works seems to be a promising area in the field of information analysis. In this work, the calculations of entropy values of the first, second and third order for the corpus of poems by A.S. Pushkin and other poets from the Golden Age of Russian Poetry were carried out. The values of “symbolic entropy”, “phonetic entropy” and “emotional entropy” and their mathematical expectations and variances were calculated for given corpora using the software application that automatically extracts statistical information, which is potentially applicable to tasks that identify features of the author’s style. The statistical data extracted could become the basis of the stylometric classification of authors by entropy characteristics.
(This article belongs to the Special Issue Natural Language Processing: Theory, Methods and Applications)
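
To make the notion of entropy "of the first, second and third order" in the abstract above more concrete, the following sketch computes the Shannon entropy of character n-grams. It rests on an assumption: the abstract does not define "order", so it is read here as the n-gram length (block entropy over symbols), which may differ from the authors' definition. The function name `ngram_entropy` is hypothetical, and the phonetic and emotional representations mentioned in the abstract are not modeled.

```python
import math
from collections import Counter


def ngram_entropy(text: str, order: int = 1) -> float:
    """Shannon entropy, in bits, of the character n-grams of length `order`.

    Assumption: "order" is taken to mean the n-gram length; the paper may
    use a different definition (for example, a conditional entropy).
    """
    if order < 1:
        raise ValueError("order must be a positive integer")

    # Overlapping n-grams: "abab" with order 2 gives "ab", "ba", "ab".
    ngrams = [text[i:i + order] for i in range(len(text) - order + 1)]
    if not ngrams:
        return 0.0  # text shorter than the requested order

    counts = Counter(ngrams)
    total = len(ngrams)
    # H = -sum(p * log2(p)) over the observed n-gram frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For instance, `ngram_entropy("abab", 1)` evaluates to 1.0 bit because the two symbols are equally frequent, while a text made of a single repeated character has zero entropy. Computing the value per poem for orders 1 to 3 would give the kind of per-author statistics (means and variances) that the abstract refers to.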