When Natural Language Processing Meets Machine Learning—Opportunities, Challenges and Solutions

A special issue of Computers (ISSN 2073-431X).

Deadline for manuscript submissions: 30 June 2024 | Viewed by 5978

Special Issue Editors

School of Computing, Ulster University, Belfast BT15 1ED, UK
Interests: data science; machine learning; pervasive sensing; inertial sensing; neurorehabilitation

Guest Editor
School of Computing, Ulster University, Belfast BT15 1AP, UK
Interests: machine learning; bioinformatics; healthcare informatics; healthcare technology; intelligent data analysis; integrative data analytics; assistive technologies

Guest Editor
School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China
Interests: chemistry; environmental sciences and ecology; imaging science and photographic technology; remote sensing; computer science

Special Issue Information

Dear Colleagues,

The combination of Natural Language Processing (NLP) and Machine Learning (ML) has led to many advancements in the field of artificial intelligence, enabling computers to understand and analyse human language. NLP focuses on the interactions between human language and computers, while ML provides algorithms and techniques to make predictions and automate tasks based on data. The opportunities presented by this combination include improved text classification, sentiment analysis, machine translation, and question-answering systems. However, the integration of NLP and ML still faces several challenges, such as the need for large amounts of annotated data for training, handling the complexity and variability of human language, and ensuring the ethical and fair use of AI systems. To overcome these challenges, NLP and ML researchers are exploring innovative solutions such as transfer learning, semi-supervised learning, and unsupervised learning methods, as well as developing techniques to handle unstructured and diverse data. Additionally, there is a growing emphasis on ensuring the accountability, transparency, and ethical use of AI systems.

Dr. Lu Bai
Prof. Dr. Huiru Zheng
Dr. Zhibao Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Computers is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language processing
  • text classification
  • sentiment analysis
  • machine learning

Published Papers (3 papers)

Research

23 pages, 1225 KiB  
Article
Error Pattern Discovery in Spellchecking Using Multi-Class Confusion Matrix Analysis for the Croatian Language
by Gordan Gledec, Mladen Sokele, Marko Horvat and Miljenko Mikuc
Computers 2024, 13(2), 39; https://doi.org/10.3390/computers13020039 - 29 Jan 2024
Viewed by 1147
Abstract
This paper introduces a novel approach to the creation and application of confusion matrices for error pattern discovery in spellchecking for the Croatian language. The experimental dataset has been derived from a corpus of mistyped words and user corrections collected since 2008 using the Croatian spellchecker available at ispravi.me. The important role of confusion matrices in enhancing the precision of spellcheckers, particularly within the diverse linguistic context of the Croatian language, is investigated. Common causes of spelling errors, emphasizing the challenges posed by diacritic usage, have been identified and analyzed. This research contributes to the advancement of spellchecking technologies and provides a more comprehensive understanding of linguistic details, particularly in languages with diacritic-rich orthographies, like Croatian. The presented user-data-driven approach demonstrates the potential for custom spellchecking solutions, especially considering the ever-changing dynamics of language use in digital communication.

27 pages, 3461 KiB  
Communication
Analyzing Public Reactions, Perceptions, and Attitudes during the MPox Outbreak: Findings from Topic Modeling of Tweets
by Nirmalya Thakur, Yuvraj Nihal Duggal and Zihui Liu
Computers 2023, 12(10), 191; https://doi.org/10.3390/computers12100191 - 23 Sep 2023
Cited by 3 | Viewed by 1502
Abstract
In the last decade and a half, the world has experienced outbreaks of a range of viruses such as COVID-19, H1N1, flu, Ebola, Zika virus, Middle East Respiratory Syndrome (MERS), measles, and West Nile virus, just to name a few. During these virus outbreaks, the usage and effectiveness of social media platforms increased significantly, as such platforms served as virtual communities, enabling their users to share and exchange information, news, perspectives, opinions, ideas, and comments related to the outbreaks. Analysis of this Big Data of conversations related to virus outbreaks using concepts of Natural Language Processing such as Topic Modeling has attracted the attention of researchers from different disciplines such as Healthcare, Epidemiology, Data Science, Medicine, and Computer Science. The recent outbreak of the MPox virus has resulted in a tremendous increase in the usage of Twitter. Prior works in this area of research have primarily focused on the sentiment analysis and content analysis of these Tweets, and the few works that have focused on topic modeling have multiple limitations. This paper aims to address this research gap and makes two scientific contributions to this field. First, it presents the results of performing Topic Modeling on 601,432 Tweets about the 2022 Mpox outbreak that were posted on Twitter between 7 May 2022 and 3 March 2023. The results indicate that the conversations on Twitter related to Mpox during this time range may be broadly categorized into four distinct themes—Views and Perspectives about Mpox, Updates on Cases and Investigations about Mpox, Mpox and the LGBTQIA+ Community, and Mpox and COVID-19. Second, the paper presents the findings from the analysis of these Tweets. The results show that the theme that was most popular on Twitter (in terms of the number of Tweets posted) during this time range was Views and Perspectives about Mpox. This was followed by the theme of Mpox and the LGBTQIA+ Community, which was followed by the themes of Mpox and COVID-19 and Updates on Cases and Investigations about Mpox, respectively. Finally, a comparison with related studies in this area of research is also presented to highlight the novelty and significance of this research work.

21 pages, 2284 KiB  
Article
Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning
by Nasrin Elhassan, Giuseppe Varone, Rami Ahmed, Mandar Gogate, Kia Dashtipour, Hani Almoamari, Mohammed A. El-Affendi, Bassam Naji Al-Tamimi, Faisal Albalwy and Amir Hussain
Computers 2023, 12(6), 126; https://doi.org/10.3390/computers12060126 - 19 Jun 2023
Cited by 3 | Viewed by 2621
Abstract
Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. The outcome is that opinion mining plays a crucial role in analyzing user opinions and applying these to guide choices, making it one of the most popular areas of research in the field of natural language processing. Despite the fact that several languages, including English, have been the subjects of several studies, not much has been conducted in the area of the Arabic language. The morphological complexities and various dialects of the language make semantic analysis particularly challenging. Moreover, the lack of accurate pre-processing tools and limited resources are constraining factors. This novel study was motivated by the accomplishments of deep learning algorithms and word embeddings in the field of English sentiment analysis. Extensive experiments were conducted based on supervised machine learning in which word embeddings were exploited to determine the sentiment of Arabic reviews. Three deep learning algorithms, convolutional neural networks (CNNs), long short-term memory (LSTM), and a hybrid CNN-LSTM, were introduced. The models used features learned by word embeddings such as Word2Vec and fastText rather than hand-crafted features. The models were tested using two benchmark Arabic datasets: Hotel Arabic Reviews Dataset (HARD) for hotel reviews and Large-Scale Arabic Book Reviews (LABR) for book reviews, with different setups. Comparative experiments utilized the three models with two word embeddings and different setups of the datasets. The main novelty of this study is to explore the effectiveness of using various word embeddings and different setups of benchmark datasets relating to balance, imbalance, and binary and multi-classification aspects. Findings showed that the best results were obtained in most cases when applying the fastText word embedding using the HARD 2-imbalance dataset for all three proposed models: CNN, LSTM, and CNN-LSTM. Further, the proposed CNN model outperformed the LSTM and CNN-LSTM models for the benchmark HARD dataset by achieving 94.69%, 94.63%, and 94.54% accuracy with fastText, respectively. Although the worst results were obtained for the LABR 3-imbalance dataset using both Word2Vec and fastText, they still outperformed other researchers’ state-of-the-art outcomes applying the same dataset.