Hybrid Methods for Natural Language Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (15 November 2021) | Viewed by 25272

Special Issue Editors


Guest Editor
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, 15782 Galiza, Spain
Interests: natural language processing; distributional semantics; information extraction; dependency parsing

Guest Editor
Information Retrieval Lab (IRLab). Facultade de Informática, Campus de Elviña s/n. Despacho D1.02 Área Científica, Universidade da Coruña, C.P. 15008 A Coruña, Spain
Interests: natural language processing; discourse analysis; digital humanities; knowledge representation; conceptual modelling

Special Issue Information

Dear Colleagues,

The recent improvements in machine/deep learning (M/DL) technologies do not affect all Natural Language Processing (NLP) tasks, specifically those that require deep linguistic knowledge, natural language understanding, semantic inference and reasoning. For those tasks, it is necessary to design hybrid architectures that integrate symbolic information into M/DL models so as to allow machines to learn new knowledge in a more “intelligent” way by endowing them with common sense and deep understanding. Abstract and structured knowledge from human specialists can be used not just as training data to learn uninterpretable black-box models, but also to design the models themselves by making them more transparent, easy to interpret by humans, and more efficient for specific purposes.

This Special Issue of Electronics will provide a forum for discussing exciting research on hybrid methodology for NLP tasks. It is open to any contribution that requires deep semantic analysis, such as semantic relation extraction, discourse analysis, argument mining, rumor detection, and so on. Strategies can combine statistical models with symbolic information based on propositions, regular patterns, rules, or any symbolic language aimed at representing abstract and structured commonsense knowledge.

Topics of interest for this Special Issue include, but are not limited to:

Information Extraction:

  • Semantic relation extraction (including Open Information Extraction);
  • Discovery and identification of multi-word expressions;
  • Named entity recognition and classification;
  • Event detection and temporal analysis.

Language Analysis:

  • Dependency parsing;
  • Discourse analysis;
  • Argument mining;
  • Argument evaluation;
  • Grammar checking.

Semantic Models:

  • (Contextualized) word embeddings;
  • Distributional semantics and compositionality;
  • Sentence similarity and paraphrasing.

Text/Opinion Mining:

  • Sentiment analysis;
  • Hate speech detection;
  • Rumour and fake news detection;
  • Authorship attribution.

Dr. Pablo Gamallo
Dr. Patricia Martín-Rodilla
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • information extraction
  • language analysis
  • semantic models
  • text/opinion mining

Published Papers (8 papers)


Research


24 pages, 786 KiB  
Article
Experiences on the Improvement of Logic-Based Anaphora Resolution in English Texts
by Stefano Ferilli and Domenico Redavid
Electronics 2022, 11(3), 372; https://doi.org/10.3390/electronics11030372 - 26 Jan 2022
Cited by 3 | Viewed by 2463
Abstract
Anaphora resolution is a crucial task for information extraction. Syntax-based approaches rely on the syntactic structure of sentences, while knowledge-poor approaches aim to avoid the need for external resources or knowledge to carry out their task. This paper proposes a knowledge-poor, syntax-based approach to anaphora resolution in English texts. Our approach improves the traditional algorithm that is considered the standard baseline for comparison in the literature. Its most relevant contributions are its ability to handle different kinds of anaphora differently, and to disambiguate alternative associations using gender recognition of proper nouns. The former is obtained by refining the rules of the baseline algorithm, while the latter is obtained using a machine learning approach. Experimental results on a standard benchmark dataset show that our approach significantly improves performance over the standard baseline algorithm, and also compares well with the state-of-the-art algorithm that thoroughly exploits external knowledge. It is also efficient. Thus, we propose our algorithm as the new baseline in the literature.
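The gender-filtering idea described in the abstract can be illustrated with a minimal sketch: a hypothetical salience-and-gender candidate ranking in the knowledge-poor spirit of the paper, not the authors' actual algorithm (the tuple fields and weights are assumptions):

```python
def resolve_pronoun(pronoun_gender, candidates):
    """Pick an antecedent for a pronoun from preceding noun phrases.

    candidates: list of (noun, gender, distance, salience) tuples, where
    distance counts sentences back and salience is a syntactic weight
    (e.g., subjects score higher than objects).
    """
    # Keep only candidates whose gender is compatible with the pronoun;
    # proper nouns of unknown gender stay in the pool.
    compatible = [c for c in candidates if c[1] in (pronoun_gender, "unknown")]
    if not compatible:
        return None
    # Prefer the most salient candidate, breaking ties by recency.
    return max(compatible, key=lambda c: (c[3], -c[2]))[0]

candidates = [
    ("John", "male", 1, 3),     # subject of the previous sentence
    ("Mary", "female", 1, 2),   # object of the previous sentence
    ("report", "neuter", 2, 1),
]
print(resolve_pronoun("female", candidates))  # → Mary
```

In this sketch, gender recognition of proper nouns (which the paper obtains via machine learning) would feed the `gender` field of each candidate tuple.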
(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

23 pages, 1128 KiB  
Article
Transfer Learning with Social Media Content in the Ride-Hailing Domain by Using a Hybrid Machine Learning Architecture
by Álvaro de Pablo, Oscar Araque and Carlos A. Iglesias
Electronics 2022, 11(2), 189; https://doi.org/10.3390/electronics11020189 - 08 Jan 2022
Cited by 2 | Viewed by 1721
Abstract
The analysis of the content of posts written on social media has established an important line of research in recent years. The study of these texts, as well as their relationships with each other and their dependence on the platform on which they are written, enables the analysis of users' behavior and of their opinions with respect to different domains. In this work, a hybrid machine learning system has been developed to classify texts using topic modeling techniques and different word-vector representations, as well as traditional text representations. The system has been trained with ride-hailing posts extracted from Reddit, showing promising performance. The generated models have then been tested with data extracted from other sources such as Twitter and Google Play, classifying these texts without retraining any models and thus performing transfer learning. The results show that the proposed architecture is effective when performing transfer learning from data-rich domains and applying the resulting models to other sources.
(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

24 pages, 912 KiB  
Article
Monolingual and Cross-Lingual Intent Detection without Training Data in Target Languages
by Jurgita Kapočiūtė-Dzikienė, Askars Salimbajevs and Raivis Skadiņš
Electronics 2021, 10(12), 1412; https://doi.org/10.3390/electronics10121412 - 11 Jun 2021
Cited by 6 | Viewed by 2494
Abstract
Due to recent DNN advancements, many NLP problems can be effectively solved using transformer-based models and supervised data. Unfortunately, such data is not available in some languages. This research is based on the assumptions that (1) training data can be obtained by machine-translating it from another language; and (2) there are cross-lingual solutions that work without training data in the target language. Consequently, in this research, we use an English dataset and solve the intent detection problem for five target languages (German, French, Lithuanian, Latvian, and Portuguese). In seeking the most accurate solutions, we investigate BERT-based word and sentence transformers together with eager learning classifiers (CNN, BERT fine-tuning, FFNN) and a lazy learning approach (cosine similarity as the memory-based method). We offer and evaluate several strategies to overcome the data scarcity problem: machine translation, cross-lingual models, and a combination of the two. The experimental investigation revealed the robustness of sentence transformers under various cross-lingual conditions. The accuracy of ~0.842, achieved on the English dataset with completely monolingual models, is considered our top line. However, the cross-lingual approaches demonstrate similar accuracy levels, reaching ~0.831, ~0.829, ~0.853, ~0.831, and ~0.813 on German, French, Lithuanian, Latvian, and Portuguese, respectively.
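The lazy-learning component mentioned above (cosine similarity as a memory-based method) can be sketched roughly as follows; the toy 3-d vectors and intent labels are placeholders for real multilingual sentence-transformer embeddings and an actual intent inventory:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_intent(query_vec, memory):
    """Memory-based classification: return the intent label of the
    training example whose embedding is most similar to the query."""
    return max(memory, key=lambda ex: cosine(query_vec, ex[0]))[1]

# Toy 3-d vectors standing in for multilingual sentence embeddings.
memory = [
    ([0.9, 0.1, 0.0], "book_flight"),
    ([0.1, 0.9, 0.1], "check_weather"),
    ([0.0, 0.2, 0.9], "play_music"),
]
print(predict_intent([0.8, 0.2, 0.1], memory))  # → book_flight
```

Because multilingual sentence transformers map semantically similar sentences from different languages to nearby vectors, this memory-based scheme needs no retraining for a new target language, which matches the cross-lingual setting the abstract describes.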
(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

14 pages, 1984 KiB  
Article
Reinforced Transformer with Cross-Lingual Distillation for Cross-Lingual Aspect Sentiment Classification
by Hanqian Wu, Zhike Wang, Feng Qing and Shoushan Li
Electronics 2021, 10(3), 270; https://doi.org/10.3390/electronics10030270 - 23 Jan 2021
Cited by 8 | Viewed by 2085
Abstract
Though great progress has been made on the Aspect-Based Sentiment Analysis (ABSA) task, most previous work focuses on English-based ABSA problems, and there are few efforts on other languages, mainly due to the lack of training data. In this paper, we propose an approach to the Cross-Lingual Aspect Sentiment Classification (CLASC) task which leverages the rich resources in one language (the source language) for aspect sentiment classification in an under-resourced language (the target language). Specifically, we first build a bilingual lexicon for domain-specific training data to translate the aspect categories annotated in the source-language corpus, and then translate sentences from the source language to the target language via Machine Translation (MT) tools. However, most MT systems are general-purpose and unavoidably introduce translation ambiguities that degrade the performance of CLASC. In this context, we propose a novel approach called Reinforced Transformer with Cross-Lingual Distillation (RTCLD), combined with target-sensitive adversarial learning, to minimize the undesirable effects of translation ambiguities in sentence translation. We conduct experiments on different language combinations, treating English as the source language and Chinese, Russian, and Spanish as target languages. The experimental results show that our proposed approach outperforms state-of-the-art methods on the different target languages.
(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

17 pages, 4231 KiB  
Article
Connecting Discourse and Domain Models in Discourse Analysis through Ontological Proxies
by Cesar Gonzalez-Perez
Electronics 2020, 9(11), 1955; https://doi.org/10.3390/electronics9111955 - 19 Nov 2020
Viewed by 2812
Abstract
Argumentation-oriented discourse analysis usually focuses on what is being said and how, following the text under analysis quite literally, and paying little attention to the things in the world to which the text refers. However, to perform argumentation-oriented discourse analysis, one must assume certain conceptualisations by the speaker in order to interpret and reconstruct propositions and argumentation structures. These conceptualisations are rarely captured as a product of the analysis process. In this paper, we argue that considering the ontology to which a discourse refers as well as the text itself provides a richer and more useful representation of the discourse and its argumentation structures, facilitates intertextual analysis, and improves understandability of the analysis products. To this end, we propose the notion of ontological proxies, i.e., conceptual artefacts that connect elements in the argumentation structure to the associated ontology elements.
(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

13 pages, 494 KiB  
Article
Incorporating External Knowledge into Unsupervised Graph Model for Document Summarization
by Tiancheng Tang, Tianyi Yuan, Xinhuai Tang and Delai Chen
Electronics 2020, 9(9), 1520; https://doi.org/10.3390/electronics9091520 - 17 Sep 2020
Cited by 5 | Viewed by 2332
Abstract
Supervised neural network models have achieved outstanding performance on the document summarization task in recent years. However, it is hard to obtain enough high-quality labeled training data for these models to generate different types of summaries in practice. In this work, we focus on improving the performance of the popular unsupervised TextRank algorithm, which requires no labeled training data, for extractive summarization. We first modify the original edge weights of TextRank to take the relative position of sentences into account, and then combine the output of the improved TextRank with K-means clustering to improve the diversity of the generated summaries. To further improve performance, we incorporate external knowledge from open-source knowledge graphs into our model via entity linking. We use the knowledge-graph sentence embedding and the tf-idf embedding as the input of our improved TextRank, and obtain the final score for each sentence by linear combination. Evaluations on the New York Times dataset show the effectiveness of our knowledge-enhanced approach: the proposed model significantly outperforms other popular unsupervised models.
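The position-aware edge re-weighting can be sketched as a small PageRank-style iteration over a sentence-similarity graph; the inverse-position factor used here is an illustrative assumption, not the paper's exact formula:

```python
def position_weighted_textrank(sim, damping=0.85, iters=50):
    """Score sentences by power iteration over a similarity graph whose
    edges are scaled by an inverse-position factor, so earlier sentences
    receive more weight (an illustrative choice)."""
    n = len(sim)
    # Re-weight edge (j -> i) by the relative position of sentence i.
    w = [[sim[j][i] / (1 + i) if i != j else 0.0 for i in range(n)]
         for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * w[j][i] / (sum(w[j]) or 1.0)
                for j in range(n) if j != i
            )
            for i in range(n)
        ]
    return scores

# Three sentences that are equally similar to one another: the positional
# factor pushes the earliest sentence to the top of the ranking.
scores = position_weighted_textrank([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(scores.index(max(scores)))  # → 0
```

In the full system described in the abstract, `sim` would come from a linear combination of knowledge-graph sentence embeddings and tf-idf vectors, and K-means clustering over the top-ranked sentences would then enforce diversity.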
(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

14 pages, 1858 KiB  
Article
The Graph Reasoning Approach Based on the Dynamic Knowledge Auxiliary for Complex Fact Verification
by Yongyue Wang, Chunhe Xia, Chengxiang Si, Chongyu Zhang and Tianbo Wang
Electronics 2020, 9(9), 1472; https://doi.org/10.3390/electronics9091472 - 09 Sep 2020
Cited by 3 | Viewed by 2150
Abstract
Complex fact verification (FV) requires fusing scattered sequences and performing multi-hop reasoning over the composed sequences. Recently, some FV models have obtained knowledge from context to support the reasoning process using pretrained models (e.g., BERT, XLNet), and these models outperform previous state-of-the-art FV models. In practice, however, the limited training data cannot provide enough background knowledge for FV tasks, and once the background knowledge changes, the pretrained models' parameters cannot be updated. Additionally, noise that contradicts common sense cannot be accurately filtered out due to the lack of necessary knowledge, which may have a negative impact on the reasoning process. Furthermore, existing models often wrongly label the given claims as 'not enough information' due to the lack of the necessary conceptual relationships between pieces of evidence. In the present study, a Dynamic Knowledge Auxiliary Graph Reasoning (DKAR) approach is proposed for incorporating external background knowledge into the current FV model; it explicitly identifies and fills the knowledge gaps between the provided sources and the given claims to enhance the reasoning ability of graph neural networks. Experiments show that DKAR can be combined with specific and discriminative knowledge to guide the FV system to overcome the knowledge-gap challenges and achieve improvements on FV tasks. Furthermore, DKAR is adopted to complete the FV task on the FakeNewsNet dataset, showing outstanding advantages with small samples and heterogeneous web text sources.
(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

Review


31 pages, 653 KiB  
Review
Classification of Arabic Tweets: A Review
by Meshrif Alruily
Electronics 2021, 10(10), 1143; https://doi.org/10.3390/electronics10101143 - 12 May 2021
Cited by 19 | Viewed by 7739
Abstract
Text classification is a prominent research area, gaining increasing interest in academia, industry, and social media. Arabic is one of the world's most widely spoken languages, and it played a significant role in science, mathematics, and philosophy in Europe in the Middle Ages. During the Arab Spring, social media platforms such as Facebook, Twitter, and Instagram played an essential role in establishing, running, and spreading these movements. Arabic Sentiment Analysis (ASA) and Arabic Text Classification (ATC) for these social media tools are hot topics, aiming to obtain valuable insights from Arabic text. Although some surveys are available on this topic, the studies and research on Arabic tweets need to be classified on the basis of machine learning algorithms, since machine learning algorithms and lexicon-based classifications are considered essential tools for text processing. In this paper, a comparison of previous surveys is presented, elaborating the need for a comprehensive study of Arabic tweets. Research studies are classified according to machine learning algorithms, i.e., supervised learning, unsupervised learning, hybrid, and lexicon-based classifications, and their advantages and disadvantages are discussed comprehensively. We also pose several challenges and future research directions.
(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)
