Rich Linguistic Processing for Multilingual Text Mining

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 June 2021)

Special Issue Editors


Guest Editor
CITIC. Grupo LyS, Departamento de Ciencias da Computación e Tecnoloxías da Información, Universidade da Coruña, 15071 A Coruña, Spain
Interests: natural language processing (NLP); multilingual and crosslingual NLP with an emphasis on low-resource languages; sentiment analysis and opinion mining on social media; information retrieval techniques applying NLP

Guest Editor
Universidade da Coruña, CITIC. Grupo LyS, Departamento de Ciencias da Computación e Tecnoloxías da Información, 15071 A Coruña, Spain
Interests: My research interests lie mainly in the field of computational linguistics (or natural language processing). My main research focus is on natural language parsing algorithms, from both a theoretical and a practical standpoint. I am especially interested in: techniques to improve the speed of parsing algorithms, making them practical at web scale, which is the focus of the ERC Starting Grant project FASTPARSE and the focal point of my current research; parsing beyond the “easy cases”, such as non-projective dependency parsing (the search for parsing algorithms that can efficiently handle linguistic structures that contain crossing dependency links or, roughly equivalently, discontinuous phrases), parsing morphologically rich languages, noisy text, etc.; and cognitive aspects of syntax, i.e., how the characteristics and constraints of the human brain shape the evolution of languages, and how we can take inspiration from the human language processing system to build better automatic parsers

Guest Editor
Universidade da Coruña, CITIC. Grupo LyS, Departamento de Ciencias da Computación e Tecnoloxías da Información, 15071 A Coruña, Spain
Interests: My research interests are in the application of natural language processing techniques to improve text mining systems, including information retrieval/extraction and sentiment analysis tasks. More specifically, my research work includes lexical analysis (e.g., tokenization); morphological analysis; (shallow) parsing; information retrieval; cross-language information retrieval; character n-gram level processing; machine translation; microtext processing (e.g., tweets); Spanish and Galician language NLP

Special Issue Information

Dear Colleagues,

Natural language processing and text mining technologies have undergone a revolution in the last few years, with substantial improvements in accuracy due mainly to deep neural networks and large pretrained models that rely on huge amounts of data. Explicit representations of linguistic knowledge (such as parse trees, semantic dependencies, lexicons, and linguistic rules) have lost their leading role in systems where neural networks perform the bulk of the task, often in an end-to-end fashion. However, it is far from guaranteed that the accuracy gains from advances in neural architectures will not plateau, as they have on previous occasions, which highlights the need to combine them with rich linguistic processing. Furthermore, end-to-end neural systems have limitations, especially in multilingual settings involving low-resource languages: a black-box nature with limited explainability, data-induced bias, reliance on large amounts of data that may be unavailable for many of the world's thousands of languages, high computational requirements, and high energy usage with the associated contribution to global warming.

For all these reasons, approaches utilizing explicit linguistic knowledge are highly relevant and should be pursued by the research community. In this Special Issue, we thus focus on approaches to natural language processing and text mining that emphasize multilingualism or low-resource languages and that include rich linguistic processing, in the sense that explicit linguistic knowledge plays a relevant role in the approach, be it exclusively or in combination with machine learning and neural approaches.

Prof. Dr. Miguel A. Alonso
Prof. Dr. Carlos Gómez-Rodríguez
Prof. Dr. Jesús Vilares
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Natural language processing
  • Multilingual language processing
  • Language resources
  • Linguistic knowledge
  • Text mining
  • Information retrieval
  • Sentiment analysis
  • Recommender systems
  • Explainable artificial intelligence
  • Data-induced bias in NLP systems

Published Papers (3 papers)

Research

13 pages, 333 KiB  
Article
Compositional Distributional Semantics with Syntactic Dependencies and Selectional Preferences
by Pablo Gamallo
Appl. Sci. 2021, 11(12), 5743; https://doi.org/10.3390/app11125743 - 21 Jun 2021
Cited by 3 | Viewed by 2218
Abstract
This article describes a compositional model based on syntactic dependencies which has been designed to build contextualized word vectors, by following linguistic principles related to the concept of selectional preferences. The compositional strategy proposed in the current work has been evaluated on a syntactically controlled and multilingual dataset, and compared with Transformer BERT-like models, such as Sentence BERT, the state-of-the-art in sentence similarity. For this purpose, we created two new test datasets for Portuguese and Spanish on the basis of that defined for the English language, containing expressions with noun-verb-noun transitive constructions. The results we have obtained show that the linguistic-based compositional approach turns out to be competitive with Transformer models.
(This article belongs to the Special Issue Rich Linguistic Processing for Multilingual Text Mining)

15 pages, 3903 KiB  
Article
Evaluation of the Coherence of Polish Texts Using Neural Network Models
by Sergii Telenyk, Sergiy Pogorilyy and Artem Kramov
Appl. Sci. 2021, 11(7), 3210; https://doi.org/10.3390/app11073210 - 02 Apr 2021
Cited by 3 | Viewed by 2251
Abstract
Coherence evaluation of texts falls into a category of natural language processing tasks. The evaluation of texts’ coherence implies the estimation of their semantic and logical integrity; such a feature of a text can be utilized during the solving of multidisciplinary tasks (SEO analysis, medicine area, detection of fake texts, etc.). In this paper, different state-of-the-art coherence evaluation methods based on machine learning models have been analyzed. The investigation of the effectiveness of different methods for the coherence estimation of Polish texts has been performed. The impact of text’s features on the output coherence value has been analyzed using different approaches of a semantic similarity graph. Two neural networks based on LSTM layers and a pre-trained BERT model correspondingly have been designed and trained for the coherence estimation of input texts. The results obtained may indicate that both lexical and semantic components should be taken into account during the coherence evaluation of Polish documents; moreover, it is advisable to analyze corresponding documents in a sentence-by-sentence manner taking into account word order. According to the retrieved accuracy of the proposed neural networks, it can be concluded that suggested models may be used in order to solve typical coherence estimation tasks for a Polish corpus.
(This article belongs to the Special Issue Rich Linguistic Processing for Multilingual Text Mining)

Review

24 pages, 408 KiB  
Review
On the Use of Parsing for Named Entity Recognition
by Miguel A. Alonso, Carlos Gómez-Rodríguez and Jesús Vilares
Appl. Sci. 2021, 11(3), 1090; https://doi.org/10.3390/app11031090 - 25 Jan 2021
Cited by 9 | Viewed by 3945
Abstract
Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential to extract knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can be helpful to determine whether or not a certain portion of it is an entity and if so, to establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have chosen to consider shallow approaches to deal with text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.
(This article belongs to the Special Issue Rich Linguistic Processing for Multilingual Text Mining)