Electronics

Research

Jump to: Review

24 pages, 786 KiB

Open AccessArticle

Experiences on the Improvement of Logic-Based Anaphora Resolution in English Texts

by Stefano Ferilli and Domenico Redavid

Electronics 2022, 11(3), 372; https://doi.org/10.3390/electronics11030372 - 26 Jan 2022

Cited by 3 | Viewed by 2463

Abstract

Anaphora resolution is a crucial task for information extraction. Syntax-based approaches are based on the syntactic structure of sentences. Knowledge-poor approaches aim at avoiding the need for further external resources or knowledge to carry out their task. This paper proposes a knowledge-poor, syntax-based [...] Read more.

Anaphora resolution is a crucial task for information extraction. Syntax-based approaches are based on the syntactic structure of sentences. Knowledge-poor approaches aim at avoiding the need for further external resources or knowledge to carry out their task. This paper proposes a knowledge-poor, syntax-based approach to anaphora resolution in English texts. Our approach improves the traditional algorithm that is considered the standard baseline for comparison in the literature. Its most relevant contributions are in its ability to handle differently different kinds of anaphoras, and to disambiguate alternate associations using gender recognition of proper nouns. The former is obtained by refining the rules in the baseline algorithm, while the latter is obtained using a machine learning approach. Experimental results on a standard benchmark dataset used in the literature show that our approach can significantly improve the performance over the standard baseline algorithm used in the literature, and compares well also to the state-of-the-art algorithm that thoroughly exploits external knowledge. It is also efficient. Thus, we propose to use our algorithm as the new baseline in the literature. Full article

(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

► Show Figures

Figure 1

23 pages, 1128 KiB

Open AccessArticle

Transfer Learning with Social Media Content in the Ride-Hailing Domain by Using a Hybrid Machine Learning Architecture

by Álvaro de Pablo, Oscar Araque and Carlos A. Iglesias

Electronics 2022, 11(2), 189; https://doi.org/10.3390/electronics11020189 - 08 Jan 2022

Cited by 2 | Viewed by 1721

Abstract

The analysis of the content of posts written on social media has established an important line of research in recent years. The study of these texts, as well as their relationship with each other and their dependence on the platform on which they [...] Read more.

The analysis of the content of posts written on social media has established an important line of research in recent years. The study of these texts, as well as their relationship with each other and their dependence on the platform on which they are written, enables the behavior analysis of users and their opinions with respect to different domains. In this work, a hybrid machine learning-based system has been developed to classify texts using topic modeling techniques and different word-vector representations, as well as traditional text representations. The system has been trained with ride-hailing posts extracted from Reddit, showing promising performance. Then, the generated models have been tested with data extracted from other sources such as Twitter and Google Play, classifying these texts without retraining any models and thus performing Transfer Learning. The obtained results show that our proposed architecture is effective when performing Transfer Learning from data-rich domains and applying them to other sources. Full article

(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

► Show Figures

Figure 1

24 pages, 912 KiB

Open AccessArticle

Monolingual and Cross-Lingual Intent Detection without Training Data in Target Languages

by Jurgita Kapočiūtė-Dzikienė, Askars Salimbajevs and Raivis Skadiņš

Electronics 2021, 10(12), 1412; https://doi.org/10.3390/electronics10121412 - 11 Jun 2021

Cited by 6 | Viewed by 2494

Abstract

Due to recent DNN advancements, many NLP problems can be effectively solved using transformer-based models and supervised data. Unfortunately, such data is not available in some languages. This research is based on assumptions that (1) training data can be obtained by the machine [...] Read more.

Due to recent DNN advancements, many NLP problems can be effectively solved using transformer-based models and supervised data. Unfortunately, such data is not available in some languages. This research is based on assumptions that (1) training data can be obtained by the machine translating it from another language; (2) there are cross-lingual solutions that work without the training data in the target language. Consequently, in this research, we use the English dataset and solve the intent detection problem for five target languages (German, French, Lithuanian, Latvian, and Portuguese). When seeking the most accurate solutions, we investigate BERT-based word and sentence transformers together with eager learning classifiers (CNN, BERT fine-tuning, FFNN) and lazy learning approach (Cosine similarity as the memory-based method). We offer and evaluate several strategies to overcome the data scarcity problem with machine translation, cross-lingual models, and a combination of the previous two. The experimental investigation revealed the robustness of sentence transformers under various cross-lingual conditions. The accuracy equal to ~0.842 is achieved with the English dataset with completely monolingual models is considered our top-line. However, cross-lingual approaches demonstrate similar accuracy levels reaching ~0.831, ~0.829, ~0.853, ~0.831, and ~0.813 on German, French, Lithuanian, Latvian, and Portuguese languages. Full article

(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

► Show Figures

Figure 1

14 pages, 1984 KiB

Open AccessArticle

Reinforced Transformer with Cross-Lingual Distillation for Cross-Lingual Aspect Sentiment Classification

by Hanqian Wu, Zhike Wang, Feng Qing and Shoushan Li

Electronics 2021, 10(3), 270; https://doi.org/10.3390/electronics10030270 - 23 Jan 2021

Cited by 8 | Viewed by 2085

Abstract

Though great progress has been made in the Aspect-Based Sentiment Analysis(ABSA) task through research, most of the previous work focuses on English-based ABSA problems, and there are few efforts on other languages mainly due to the lack of training data. In this paper, [...] Read more.

Though great progress has been made in the Aspect-Based Sentiment Analysis(ABSA) task through research, most of the previous work focuses on English-based ABSA problems, and there are few efforts on other languages mainly due to the lack of training data. In this paper, we propose an approach for performing a Cross-Lingual Aspect Sentiment Classification (CLASC) task which leverages the rich resources in one language (source language) for aspect sentiment classification in a under-resourced language (target language). Specifically, we first build a bilingual lexicon for domain-specific training data to translate the aspect category annotated in the source-language corpus and then translate sentences from the source language to the target language via Machine Translation (MT) tools. However, most MT systems are general-purpose, it non-avoidably introduces translation ambiguities which would degrade the performance of CLASC. In this context, we propose a novel approach called Reinforced Transformer with Cross-Lingual Distillation (RTCLD) combined with target-sensitive adversarial learning to minimize the undesirable effects of translation ambiguities in sentence translation. We conduct experiments on different language combinations, treating English as the source language and Chinese, Russian, and Spanish as target languages. The experimental results show that our proposed approach outperforms the state-of-the-art methods on different target languages. Full article

(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

► Show Figures

Figure 1

17 pages, 4231 KiB

Open AccessArticle

Connecting Discourse and Domain Models in Discourse Analysis through Ontological Proxies

by Cesar Gonzalez-Perez

Electronics 2020, 9(11), 1955; https://doi.org/10.3390/electronics9111955 - 19 Nov 2020

Viewed by 2812

Abstract

Argumentation-oriented discourse analysis usually focuses on what is being said and how, following the text under analysis quite literally, and paying little attention to the things in the world to which the text refers. However, to perform argumentation-oriented discourse analysis, one must assume [...] Read more.

Argumentation-oriented discourse analysis usually focuses on what is being said and how, following the text under analysis quite literally, and paying little attention to the things in the world to which the text refers. However, to perform argumentation-oriented discourse analysis, one must assume certain conceptualisations by the speaker in order to interpret and reconstruct propositions and argumentation structures. These conceptualisations are rarely captured as a product of the analysis process. In this paper, we argue that considering the ontology to which a discourse refers as well as the text itself provides a richer and more useful representation of the discourse and its argumentation structures, facilitates intertextual analysis, and improves understandability of the analysis products. To this end, we propose the notion of ontological proxies, i.e., conceptual artefacts that connect elements in the argumentation structure to the associated ontology elements. Full article

(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

► Show Figures

Figure 1

13 pages, 494 KiB

Open AccessArticle

Incorporating External Knowledge into Unsupervised Graph Model for Document Summarization

by Tiancheng Tang, Tianyi Yuan, Xinhuai Tang and Delai Chen

Electronics 2020, 9(9), 1520; https://doi.org/10.3390/electronics9091520 - 17 Sep 2020

Cited by 5 | Viewed by 2332

Abstract

Supervised neural network models have achieved outstanding performance in the document summarization task in recent years. However, it is hard to get enough labeled training data with a high quality for these models to generate different types of summaries in reality. In this [...] Read more.

Supervised neural network models have achieved outstanding performance in the document summarization task in recent years. However, it is hard to get enough labeled training data with a high quality for these models to generate different types of summaries in reality. In this work, we mainly focus on improving the performance of the popular unsupervised Textrank algorithm that requires no labeled training data for extractive summarization. We first modify the original edge weight of Textrank to take the relative position of sentences into account, and then combine the output of the improved Textrank with K-means clustering to improve the diversity of generated summaries. To further improve the performance of our model, we innovatively incorporate external knowledge from open-source knowledge graphs into our model by entity linking. We use the knowledge graph sentence embedding and the tf-idf embedding as the input of our improved Textrank, and get the final score for each sentence by linear combination. Evaluations on the New York Times data set show the effectiveness of our knowledge-enhanced approach. The proposed model outperforms other popular unsupervised models significantly. Full article

(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

► Show Figures

Figure 1

14 pages, 1858 KiB

Open AccessArticle

The Graph Reasoning Approach Based on the Dynamic Knowledge Auxiliary for Complex Fact Verification

by Yongyue Wang, Chunhe Xia, Chengxiang Si, Chongyu Zhang and Tianbo Wang

Electronics 2020, 9(9), 1472; https://doi.org/10.3390/electronics9091472 - 09 Sep 2020

Cited by 3 | Viewed by 2150

Abstract

Complex fact verification (FV) requires fusing scattered sequences and performing multi-hop reasoning over these composed sequences. Recently, by employing some FV models, knowledge is obtained from context to support the reasoning process based on pretrained models (e.g., BERT, XLNET), and this model outperforms [...] Read more.

Complex fact verification (FV) requires fusing scattered sequences and performing multi-hop reasoning over these composed sequences. Recently, by employing some FV models, knowledge is obtained from context to support the reasoning process based on pretrained models (e.g., BERT, XLNET), and this model outperforms previous out-of-the-art FV models. In practice, however, the limited training data cannot provide enough background knowledge for FV tasks. Once the background knowledge changed, the pretrained models’ parameters cannot be updated. Additionally, noise against common sense cannot be accurately filtered out due to the lack of necessary knowledge, which may have a negative impact on the reasoning progress. Furthermore, existing models often wrongly label the given claims as ‘not enough information’ due to the lack of necessary conceptual relationship between pieces of evidence. In the present study, a Dynamic Knowledge Auxiliary Graph Reasoning (DKAR) approach is proposed for incorporating external background knowledge in the current FV model, which explicitly identifies and fills the knowledge gaps between provided sources and the given claims, to enhance the reasoning ability of graph neural networks. Experiments show that DKAR put forward in this study can be combined with specific and discriminative knowledge to guide the FV system to successfully overcome the knowledge-gap challenges and achieve improvement in FV tasks. Furthermore, DKAR is adopted to complete the FV task on the Fake NewsNet dataset, showing outstanding advantages in a small sample and heterogeneous web text source. Full article

(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

► Show Figures

Figure 1

Review

Jump to: Research

31 pages, 653 KiB

Open AccessReview

Classification of Arabic Tweets: A Review

by Meshrif Alruily

Electronics 2021, 10(10), 1143; https://doi.org/10.3390/electronics10101143 - 12 May 2021

Cited by 19 | Viewed by 7739

Abstract

Text classification is a prominent research area, gaining more interest in academia, industry and social media. Arabic is one of the world’s most famous languages and it had a significant role in science, mathematics and philosophy in Europe in the middle ages. During [...] Read more.

Text classification is a prominent research area, gaining more interest in academia, industry and social media. Arabic is one of the world’s most famous languages and it had a significant role in science, mathematics and philosophy in Europe in the middle ages. During the Arab Spring, social media, that is, Facebook, Twitter and Instagram, played an essential role in establishing, running, and spreading these movements. Arabic Sentiment Analysis (ASA) and Arabic Text Classification (ATC) for these social media tools are hot topics, aiming to obtain valuable Arabic text insights. Although some surveys are available on this topic, the studies and research on Arabic Tweets need to be classified on the basis of machine learning algorithms. Machine learning algorithms and lexicon-based classifications are considered essential tools for text processing. In this paper, a comparison of previous surveys is presented, elaborating the need for a comprehensive study on Arabic Tweets. Research studies are classified according to machine learning algorithms, supervised learning, unsupervised learning, hybrid, and lexicon-based classifications, and their advantages/disadvantages are discussed comprehensively. We pose different challenges and future research directions. Full article

(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

► Show Figures

Figure 1

Journal Menu

Journal Browser

Hybrid Methods for Natural Language Processing

Share This Special Issue

Special Issue Editors

Special Issue Information

Keywords

Published Papers (8 papers)

Research

Review

Further Information

Guidelines

MDPI Initiatives

Follow MDPI