Trends in Artificial Intelligence and Data Mining: 2021 and Beyond

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 March 2022) | Viewed by 36654

Special Issue Editors

University Institute for Computer Research, University of Alicante, 03690 Alicante, Spain
Interests: data science; natural language processing
1. Department of Software and Computing Systems, University of Alicante, Alicante, Spain
2. U.I. for Computer Research, University of Alicante, Alicante, Spain
Interests: designing and developing knowledge discovery and representation strategies; embedding semantic information into machine learning (ML)

Special Issue Information

Dear Colleagues,

You are cordially invited to submit your original research or review papers to this Special Issue entitled “Trends in Artificial Intelligence and Data Mining: 2021 and beyond” in Applied Sciences.

The Special Issue aims to meet the increasing demand for scientific enquiry on Artificial Intelligence trends related to Data Mining. The amount of data available every day is enormous and increasing at an exponential rate. Recently, there has been a growing interest in using complex methods to analyze and visualize massive data generated from very different knowledge domains: social networks, smart cities, security, health sciences, medicine, business, education, or multimedia entertainment. This Special Issue is aimed at encouraging researchers and developers to publish original, innovative, and state-of-the-art complex methods, algorithms, resources, and architectures that analyze and visualize large amounts of data and solve a range of problems.

We are particularly interested in research on the theoretical or practical aspects of data mining, in particular text mining and knowledge discovery, which may be complemented by heterogeneous data (geolocation, categories, metadata, etc.) and multimodal data (sound, image, video, etc.). These aspects can range from resources for improving or training machine learning algorithms to algorithms that use complex methods (e.g., deep learning, chaos algorithms, genetic algorithms, cellular automata) and statistical learning methods, applied to one or more domains, such as digital media data, bioinformatics, health care, multimedia entertainment, social networks, natural language processing, and education.

Potential topics include but are not limited to the following:

  • Soft Computing for multimedia and heterogeneous data analysis (text data processing required);
  • Deep learning (DL) in data mining (DM) and knowledge discovery (transfer learning is highly recommended);
  • Automated machine learning (AutoML) algorithms for DM;
  • Bias in machine learning (ML) and resources;
  • Adversarial challenges of ML for DM;
  • Democratization of resources and tool development for DM;
  • Explainable text mining models and semantics in ML;
  • Language generation from DM;
  • Corpora for ML;
  • Multimodal sentiment analysis and opinion mining using DL;
  • DL for education data learning (text data processing required).

The Special Issue is an opportunity to disseminate scientific and technological developments related to the intelligent management of big data. Research accompanied by standardized resources and source code will be positively received.

Dr. José Ignacio Abreu Salas
Dr. Yoan Gutiérrez Vázquez
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Data mining
  • Artificial intelligence
  • Deep learning
  • Transfer learning
  • Auto machine learning
  • Knowledge discovery
  • Knowledge learning
  • Natural language processing
  • Heterogeneous data processing
  • Explainability of machine learning
  • Language generation
  • Corpora for machine learning
  • Language understanding
  • Bias in machine learning
  • Adversarial challenges in machine learning
  • Democratic resource development
  • Semantics in machine learning
  • Bias in machine learning and resources…

Published Papers (8 papers)

Research

21 pages, 2087 KiB  
Article
Cognitively Driven Arabic Text Readability Assessment Using Eye-Tracking
by Ibtehal Baazeem, Hend Al-Khalifa and Abdulmalik Al-Salman
Appl. Sci. 2021, 11(18), 8607; https://doi.org/10.3390/app11188607 - 16 Sep 2021
Cited by 2 | Viewed by 3120
Abstract
Using physiological data helps to identify the cognitive processing in the human brain. One method of obtaining these behavioral signals is by using eye-tracking technology. Previous cognitive psychology literature shows that readable and difficult-to-read texts are associated with certain eye movement patterns, which has recently encouraged researchers to use these patterns for readability assessment tasks. However, although it seems promising, this research direction has not been explored adequately, particularly for Arabic. The Arabic language is defined by its own rules and has its own characteristics and challenges. There is still a clear gap in determining the potential of using eye-tracking measures to improve Arabic text. Motivated by this, we present a pilot study to explore the extent to which eye-tracking measures enhance Arabic text readability. We collected the eye movements of 41 participants while reading Arabic texts to provide real-time processing of the text; these data were further analyzed and used to build several readability prediction models using different regression algorithms. The findings show an improvement in the readability prediction task, which requires further investigation. To the best of our knowledge, this work is the first study to explore the relationship between Arabic readability and eye movement patterns.
(This article belongs to the Special Issue Trends in Artificial Intelligence and Data Mining: 2021 and Beyond)

22 pages, 1778 KiB  
Article
Inferring Agents’ Goals from Observing Successful Traces
by Guillaume Lorthioir, Katsumi Inoue and Gauvain Bourgne
Appl. Sci. 2021, 11(9), 4116; https://doi.org/10.3390/app11094116 - 30 Apr 2021
Viewed by 1425
Abstract
Goal recognition is a sub-field of plan recognition that focuses on the goals of an agent. Current approaches in goal recognition have not yet tried to apply concept learning to a propositional logic formalism. In this paper, we extend our method for inferring an agent’s possible goal by observing this agent in a series of successful attempts to reach its goal and using concept learning on these observations. We propose an algorithm, LFST (Learning From Successful Traces), to produce concise hypotheses about the agent’s goal. We show that if such a goal exists, our algorithm always provides a possible goal for the agent, and we evaluate the performance of our algorithm in different settings. We compare it to another concept-learning algorithm that uses a formalism close to ours, and we obtain better results at producing the hypotheses with our algorithm. We introduce a way to use assumptions about the agent’s behavior and the dynamics of the environment, thus improving the agent’s goal deduction by optimizing the potential goals’ search space.
(This article belongs to the Special Issue Trends in Artificial Intelligence and Data Mining: 2021 and Beyond)

27 pages, 449 KiB  
Article
How Do You Speak about Immigrants? Taxonomy and StereoImmigrants Dataset for Identifying Stereotypes about Immigrants
by Javier Sánchez-Junquera, Berta Chulvi, Paolo Rosso and Simone Paolo Ponzetto
Appl. Sci. 2021, 11(8), 3610; https://doi.org/10.3390/app11083610 - 16 Apr 2021
Cited by 30 | Viewed by 4541
Abstract
Stereotype is a type of social bias massively present in texts that computational models use. There are stereotypes that present special difficulties because they do not rely on personal attributes. This is the case of stereotypes about immigrants, a social category that is a preferred target of hate speech and discrimination. We propose a new approach to detect stereotypes about immigrants in texts, focusing not on the personal attributes assigned to the minority but on the frames, that is, the narrative scenarios, in which the group is placed in public speeches. We have proposed a fine-grained taxonomy, grounded in social psychology, with six categories to capture the different dimensions of the stereotype (positive vs. negative) and annotated a novel StereoImmigrants dataset with sentences that Spanish politicians have stated in the Congress of Deputies. We aggregate these categories into two supracategories: one is Victims, which expresses the positive stereotypes about immigrants, and the other is Threat, which expresses the negative stereotype. We carried out two preliminary experiments: first, to evaluate the automatic detection of stereotypes; and second, to distinguish between the two supracategories of immigrants’ stereotypes. In these experiments, we employed state-of-the-art transformer models (monolingual and multilingual) and four classical machine learning classifiers. We achieve accuracy above 0.83 with the BETO model in both experiments, showing that transformers can capture stereotypes about immigrants with a high level of accuracy.
(This article belongs to the Special Issue Trends in Artificial Intelligence and Data Mining: 2021 and Beyond)

16 pages, 1168 KiB  
Article
Automated Software Vulnerability Detection Based on Hybrid Neural Network
by Xin Li, Lu Wang, Yang Xin, Yixian Yang, Qifeng Tang and Yuling Chen
Appl. Sci. 2021, 11(7), 3201; https://doi.org/10.3390/app11073201 - 02 Apr 2021
Cited by 22 | Viewed by 4130
Abstract
Vulnerabilities threaten the security of information systems. It is crucial to detect and patch vulnerabilities before attacks happen. However, existing vulnerability detection methods suffer from long-term dependency, out-of-vocabulary issues, bias towards global features or local features, and coarse detection granularity. This paper proposes an automatic vulnerability detection framework in source code based on a hybrid neural network. First, the inputs are transformed into an intermediate representation with explicit structure information using low-level virtual machine intermediate representation (LLVM IR) and backward program slicing. After the transformation, the size of samples and the size of vocabulary are significantly reduced. A hybrid neural network model is then applied to extract high-level features of vulnerability, which learns features both from convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The former is applied to learn local vulnerability features, such as buffer size. The latter is utilized to learn global features, such as data dependency. The extracted features are made up of concatenated outputs of CNN and RNN. Experiments are performed to validate our vulnerability detection method. The results show that our proposed method achieves excellent results with F1-scores of 98.6% and accuracy of 99.0% on the SARD dataset. It outperforms state-of-the-art methods.
(This article belongs to the Special Issue Trends in Artificial Intelligence and Data Mining: 2021 and Beyond)

26 pages, 400 KiB  
Article
A Survey on Bias in Deep NLP
by Ismael Garrido-Muñoz, Arturo Montejo-Ráez, Fernando Martínez-Santiago and L. Alfonso Ureña-López
Appl. Sci. 2021, 11(7), 3184; https://doi.org/10.3390/app11073184 - 02 Apr 2021
Cited by 72 | Viewed by 11392
Abstract
Deep neural networks are hegemonic approaches to many machine learning areas, including natural language processing (NLP). Thanks to the availability of large corpora collections and the capability of deep architectures to shape internal language mechanisms in self-supervised learning processes (also known as “pre-training”), versatile and high-performing models are released continuously for every new network design. These networks, somehow, learn a probability distribution of words and relations across the training collection used, inheriting the potential flaws, inconsistencies and biases contained in such a collection. As pre-trained models have been found to be very useful approaches to transfer learning, dealing with bias has become a relevant issue in this new scenario. We introduce bias in a formal way and explore how it has been treated in several networks, in terms of detection and correction. In addition, available resources are identified and a strategy to deal with bias in deep NLP is proposed.
(This article belongs to the Special Issue Trends in Artificial Intelligence and Data Mining: 2021 and Beyond)

15 pages, 275 KiB  
Article
“Here Are the Rules: Ignore All Rules”: Automatic Contradiction Detection in Spanish
by Robiert Sepúlveda-Torres, Alba Bonet-Jover and Estela Saquete
Appl. Sci. 2021, 11(7), 3060; https://doi.org/10.3390/app11073060 - 30 Mar 2021
Cited by 10 | Viewed by 2126
Abstract
This paper tackles automatic detection of contradictions in Spanish within the news domain. Two pieces of information are classified as compatible, contradictory, or unrelated information. To deal with the task, the ES-Contradiction dataset was created. This dataset contains a balanced number of each of the three types of information. The novelty of the research is the fine-grained annotation of the different types of contradictions in the dataset. Presently, four different types of contradictions are covered in the contradiction examples: negation, antonyms, numerical, and structural. However, future work will extend the dataset with all possible types of contradictions. In order to validate the effectiveness of the dataset, a pretrained model is used (BETO), and after performing different experiments, the system is able to detect contradiction with an F1m of 92.47%. Regarding the type of contradictions, the best results are obtained with negation contradiction (F1m = 98%), whereas structural contradictions obtain the lowest results (F1m = 69%) because of the smaller number of structural examples, due to the complexity of generating them. When dealing with a more general dataset such as XNLI, our dataset fails to detect most of the contradictions properly, as the sizes of the two datasets are very different and our dataset only covers four types of contradiction. However, using the classification of the contradictions leads us to conclude that there are highly complex contradictions that will need external knowledge in order to be properly detected and this will avoid the need for them to be previously exposed to the system.
(This article belongs to the Special Issue Trends in Artificial Intelligence and Data Mining: 2021 and Beyond)

15 pages, 2179 KiB  
Article
Identifying Human Daily Activity Types with Time-Aware Interactions
by Renyao Chen, Hong Yao, Runjia Li, Xiaojun Kang, Shengwen Li, Lijun Dong and Junfang Gong
Appl. Sci. 2020, 10(24), 8922; https://doi.org/10.3390/app10248922 - 14 Dec 2020
Cited by 1 | Viewed by 1981
Abstract
Human activities embedded in crowdsourced data, such as social media trajectory, represent individual daily styles and patterns, which are valuable in many applications. However, the accurate identification of human activity types (HATs) from social media is challenging, possibly because interactions between posts and users at different times are overlooked. To fill this gap, we propose a novel model that introduces the interactions hidden in social media and synthesizes a Graph Convolutional Network (GCN) for identifying HATs. The model first characterizes interactions among words, posts, dates, and users, and then derives a Time Gated Human Activity Graph Convolutional Network (TG-HAGCN) to predict the HATs of social media trajectory. To examine the proposed model performance, we built a new dataset including interactions between post content, post time, and users from the open Yelp dataset. Experimental results show that exploiting interactions hidden in social media to recognize HATs achieves state-of-the-art performance with high accuracy. The study indicates that interactions within social media improve the ability of machine learning in social media data mining and intelligent applications, and offers a reference solution for how to fuse multi-type heterogeneous data in social media.
(This article belongs to the Special Issue Trends in Artificial Intelligence and Data Mining: 2021 and Beyond)

Other

61 pages, 1859 KiB  
Systematic Review
K-Means-Based Nature-Inspired Metaheuristic Algorithms for Automatic Data Clustering Problems: Recent Advances and Future Directions
by Abiodun M. Ikotun, Mubarak S. Almutari and Absalom E. Ezugwu
Appl. Sci. 2021, 11(23), 11246; https://doi.org/10.3390/app112311246 - 26 Nov 2021
Cited by 33 | Viewed by 4127
Abstract
The K-means clustering algorithm is a partitional clustering algorithm that has been used widely in many applications for traditional clustering due to its simplicity and low computational complexity. This clustering technique depends on the user specification of the number of clusters generated from the dataset, which affects the clustering results. Moreover, random initialization of cluster centers results in convergence to local minima. Automatic clustering is a recent approach to clustering where the specification of cluster number is not required. In automatic clustering, natural clusters existing in datasets are identified without any background information of the data objects. Nature-inspired metaheuristic optimization algorithms have been deployed in recent times to overcome the challenges of the traditional clustering algorithm in handling automatic data clustering. Some nature-inspired metaheuristic algorithms have been hybridized with the traditional K-means algorithm to boost its performance and capability to handle automatic data clustering problems. This study aims to identify, retrieve, summarize, and analyze recently proposed studies related to the improvements of the K-means clustering algorithm with nature-inspired optimization techniques. A quest approach for article selection was adopted, which led to the identification and selection of 147 related studies from different reputable academic avenues and databases. Moreover, the analysis revealed that although the K-means algorithm has been well researched in the literature, its superiority over several well-established state-of-the-art clustering algorithms in terms of speed, accessibility, simplicity of use, and applicability to solve clustering problems with unlabeled and nonlinearly separable datasets has been clearly observed in the study. The current study also evaluated and discussed some of the well-known weaknesses of the K-means clustering algorithm, for which the existing improvement methods were conceptualized. It is noteworthy that the current systematic review and analysis of existing literature on K-means enhancement approaches presents possible perspectives in the clustering analysis research domain and serves as a comprehensive source of information regarding the K-means algorithm and its variants for the research community.
(This article belongs to the Special Issue Trends in Artificial Intelligence and Data Mining: 2021 and Beyond)