Advances in Machine Learning and Intelligent Information Systems

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Systems".

Deadline for manuscript submissions: 31 July 2024 | Viewed by 25269

Special Issue Editors


Guest Editor
Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Interests: big data; stream processing; machine learning; time series analysis; data warehouses

Guest Editor
Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Interests: machine learning; multivariate time series data analysis; deep learning; software architectures

Special Issue Information

Dear Colleagues,

At present, the success of companies is defined by their ability to cope with and adapt to new needs and emerging trends. These include new and ever-changing patterns and requirements in data generation, data acquisition, data processing, data understanding, and data visualization. Furthermore, extracting meaningful knowledge is both paramount and challenging in such a dynamic, data-driven world. To help industry cope with these needs, there have been numerous technological developments in recent years in big data processing, machine learning on streaming data, cloud data warehouses and data lakes, intelligent decision support systems, etc.

This Special Issue encourages the submission of papers presenting state-of-the-art research and application of machine learning approaches in various industrial settings. Topics of interest include (but are not limited to) the following subject categories:

  • Big data.
  • Streaming data.
  • Stream processing.
  • Scalable cloud infrastructures.
  • Deep learning and machine learning (DL/ML) on big data.
  • Real-time analytics.
  • Multivariate time series.
  • Data fusion.
  • Cloud data warehouses.
  • Data lakes.
  • Multi-cloud data processing architectures.
  • Application of ML in medicine and health informatics.
  • Application of ML in retail.
  • Application of ML in banking, financial services, and insurance (BFSI).
  • Data fabric and data mesh architectures.

Dr. Eftim Zdravevski
Prof. Dr. Petre Lameski
Prof. Dr. Ivan Miguel Pires
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • machine learning
  • industrial applications
  • data lakes
  • data fusion
  • streaming data
  • real-time analytics

Published Papers (10 papers)

Research

16 pages, 359 KiB  
Article
Automated Trace Clustering Pipeline Synthesis in Process Mining
by Iuliana Malina Grigore, Gabriel Marques Tavares, Matheus Camilo da Silva, Paolo Ceravolo and Sylvio Barbon Junior
Information 2024, 15(4), 241; https://doi.org/10.3390/info15040241 - 20 Apr 2024
Viewed by 665
Abstract
Business processes have undergone a significant transformation with the advent of the process-oriented view in organizations. The increasing complexity of business processes and the abundance of event data have driven the development and widespread adoption of process mining techniques. However, the size and noise of event logs pose challenges that require careful analysis. The inclusion of different sets of behaviors within the same business process further complicates data representation, highlighting the continued need for innovative solutions in the evolving field of process mining. Trace clustering is emerging as a solution to improve the interpretation of underlying business processes. Trace clustering offers benefits such as mitigating the impact of outliers, providing valuable insights, reducing data dimensionality, and serving as a preprocessing step in robust pipelines. However, designing an appropriate clustering pipeline can be challenging for non-experts due to the complexity of the process and the number of steps involved. For experts, it can be time-consuming and costly, requiring careful consideration of trade-offs. To address the challenge of pipeline creation, the paper proposes a genetic programming solution for trace clustering pipeline synthesis that optimizes a multi-objective function combining clustering and process quality metrics. The solution is applied to real event logs, and the results demonstrate improved performance in downstream tasks through the identification of sub-logs.
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)

26 pages, 7467 KiB  
Article
Exploring Key Issues in Cybersecurity Data Breaches: Analyzing Data Breach Litigation with ML-Based Text Analytics
by Dominik Molitor, Wullianallur Raghupathi, Aditya Saharia and Viju Raghupathi
Information 2023, 14(11), 600; https://doi.org/10.3390/info14110600 - 5 Nov 2023
Viewed by 3801
Abstract
While data breaches are a frequent and universal phenomenon, their characteristics and dimensions remain unexplored. In this novel exploratory research, we apply machine learning (ML) and text analytics to a comprehensive collection of data breach litigation cases to extract insights from the narratives contained within these cases. Our analysis shows that stakeholders (e.g., litigants) are concerned about major topics related to identity theft, hackers, negligence, the FCRA (Fair Credit Reporting Act), cybersecurity, insurance, phone devices, the TCPA (Telephone Consumer Protection Act), credit cards, merchants, privacy, and others. The topics fall into four major clusters: “phone scams”, “cybersecurity”, “identity theft”, and “business data breach”. By utilizing ML, text analytics, and descriptive data visualizations, our study serves as a foundational piece for comprehensively analyzing large textual datasets. The findings hold significant implications for both researchers and practitioners in cybersecurity, especially those grappling with the challenges of data breaches.
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)

13 pages, 397 KiB  
Article
Knowledge Graph Based Recommender for Automatic Playlist Continuation
by Aleksandar Ivanovski, Milos Jovanovik, Riste Stojanov and Dimitar Trajanov
Information 2023, 14(9), 510; https://doi.org/10.3390/info14090510 - 16 Sep 2023
Viewed by 1619
Abstract
In this work, we present a state-of-the-art solution for automatic playlist continuation through a knowledge graph-based recommender system. By integrating representation learning with graph neural networks and fusing multiple data streams, the system effectively models user behavior, leading to accurate and personalized recommendations. We provide a systematic and thorough comparison of our results with existing solutions and approaches, demonstrating the remarkable potential of graph-based representation in improving recommender systems. Our experiments reveal substantial improvements over existing approaches, further validating the efficacy of this novel approach. Additionally, through comprehensive evaluation, we highlight the robustness of our solution in handling dynamic user interactions and streaming data scenarios, showcasing its practical viability and promising prospects for next-generation recommender systems.
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)

18 pages, 5379 KiB  
Article
AttG-BDGNets: Attention-Guided Bidirectional Dynamic Graph IndRNN for Non-Intrusive Load Monitoring
by Zuoxin Wang and Xiaohu Zhao
Information 2023, 14(7), 383; https://doi.org/10.3390/info14070383 - 4 Jul 2023
Viewed by 1146
Abstract
Most current non-intrusive load monitoring methods focus on traditional load characteristic analysis and algorithm optimization, do not account for users’ electricity consumption habits, and have poor accuracy. We propose a novel attention-guided bidirectional dynamic graph IndRNN approach. The method first extends sequence or multidimensional data to a topological graph structure. It effectively utilizes the global context by following an adaptive graph topology derived from each set of data content. Then, the bidirectional graph IndRNN (Graph IndRNN) encodes the aggregated signals into different graph nodes, which use node information transfer and aggregation based on the entropy measure, power attribute characteristics, and the time-related structural characteristics of the corresponding device signals. The network dynamically incorporates local and global contextual interactions from the positive and negative directions to learn neighboring node information for non-intrusive load decomposition. In addition, using the sequential attention mechanism as a guide while eliminating redundant information facilitates flexible reasoning and establishes good vertex relationships. Finally, we conducted experimental evaluations on multiple open-source datasets, showing that the method has good robustness and accuracy.
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)
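
The approach above builds on IndRNN, whose defining feature is that each hidden unit has its own scalar recurrent weight, so the update is h_t = σ(W x_t + u ⊙ h_(t-1) + b) instead of using a full recurrent matrix. The sketch below shows only a plain bidirectional IndRNN layer in pure Python, assuming a ReLU activation and hypothetical weights; the graph construction, entropy-based aggregation, and attention guidance described in the abstract are omitted, and this is not the authors' implementation.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Tuple[Matrix, Vector, Vector]  # (W, u, b), all hypothetical


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def indrnn_pass(xs: List[Vector], W: Matrix, u: Vector, b: Vector) -> List[Vector]:
    """Run one IndRNN direction: h_t = relu(W x_t + u * h_(t-1) + b).

    The recurrent weight u is a vector, so hidden unit i only sees its own
    previous state h[i] (element-wise product), unlike a standard RNN.
    """
    hidden = len(u)
    h = [0.0] * hidden
    outputs = []
    for x in xs:
        # The comprehension reads the previous h before it is rebound.
        h = [
            relu(sum(W[i][j] * x[j] for j in range(len(x))) + u[i] * h[i] + b[i])
            for i in range(hidden)
        ]
        outputs.append(h)
    return outputs


def bidirectional_indrnn(xs: List[Vector], fwd: Params, bwd: Params) -> List[Vector]:
    """Concatenate forward and backward IndRNN states at each time step."""
    forward = indrnn_pass(xs, *fwd)
    # Run over the reversed sequence, then flip back to align time steps.
    backward = indrnn_pass(xs[::-1], *bwd)[::-1]
    return [f + bk for f, bk in zip(forward, backward)]
```

For example, `bidirectional_indrnn([[1.0], [2.0], [0.5]], params, params)` with `params = ([[0.5], [1.0]], [0.9, 0.1], [0.0, 0.0])` treats a three-sample aggregate power trace as input and returns four features per time step (two forward, two backward). Because `u` is applied element-wise, the state of each unit depends only on its own history.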

20 pages, 4094 KiB  
Article
Towards Safe Cyber Practices: Developing a Proactive Cyber-Threat Intelligence System for Dark Web Forum Content by Identifying Cybercrimes
by Kanti Singh Sangher, Archana Singh, Hari Mohan Pandey and Vivek Kumar
Information 2023, 14(6), 349; https://doi.org/10.3390/info14060349 - 18 Jun 2023
Cited by 2 | Viewed by 2977
Abstract
The untraceable part of the Deep Web, also known as the Dark Web, is one of the most used “secretive spaces” for all sorts of illegal and criminal activities by terrorists, cybercriminals, spies, and offenders. Identifying actions, products, and offenders on the Dark Web is challenging due to its size, intractability, and anonymity. Therefore, it is crucial to deploy intelligent tools and techniques capable of identifying Dark Web activities to support law enforcement agencies. Accordingly, this study proposes classification models based on four deep learning architectures (RNN, CNN, LSTM, and Transformer) that use pre-trained word embedding representations to identify illicit activities related to cybercrimes on Dark Web forums. We used the Agora dataset derived from the DarkNet market archive, which lists 109 activities by category. The listings in the dataset are vaguely described, and several data points are untagged, which rules out the automatic labeling of category items as target classes. To overcome this constraint, we applied a meticulously designed human annotation scheme that takes all attributes into account to infer context. We conducted comprehensive evaluations to assess the performance of our proposed approach, and our BERT-based classification model achieved an accuracy of 96%. Given the class imbalance in the experimental data, our results indicate the advantage of our tailored data preprocessing strategies and validate our annotation scheme. In real-world scenarios, law enforcement agencies can use our work to analyze Dark Web forums and identify cybercrimes, and it can pave the way for more sophisticated systems tailored to their requirements.
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)

19 pages, 624 KiB  
Article
Artificially Intelligent Readers: An Adaptive Framework for Original Handwritten Numerical Digits Recognition with OCR Methods
by Parth Hasmukh Jain, Vivek Kumar, Jim Samuel, Sushmita Singh, Abhinay Mannepalli and Richard Anderson
Information 2023, 14(6), 305; https://doi.org/10.3390/info14060305 - 26 May 2023
Cited by 3 | Viewed by 3356
Abstract
Advanced artificial intelligence (AI) techniques have led to significant developments in optical character recognition (OCR) technologies. OCR applications, which use AI techniques to transform images of typed text, handwritten text, or other forms of text into machine-encoded text, provide a fair degree of accuracy for general text. However, even after decades of intensive research, creating OCR with human-like abilities has remained elusive. One of the challenges has been that OCR models trained on general text do not perform well on localized or personalized handwritten text due to differences in the writing style of alphabets and digits. This study aims to discuss the steps needed to create an adaptive framework for OCR models, with the intent of exploring a reasonable method to customize an OCR solution for a unique dataset of English-language numerical digits developed for this study. We develop a digit recognizer by training a convolutional neural network on the MNIST dataset and contrast it with multiple models trained on combinations of MNIST and the custom digits. Using our methods, we observed results comparable with the baseline and provide recommendations for improving OCR accuracy for localized or personalized handwritten text. This study also offers an alternative perspective on generating data using conventional methods, which can serve as a gold standard for custom data augmentation to help address the challenges of scarce data and data imbalance.
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)

18 pages, 8972 KiB  
Article
Emoji, Text, and Sentiment Polarity Detection Using Natural Language Processing
by Shelley Gupta, Archana Singh and Vivek Kumar
Information 2023, 14(4), 222; https://doi.org/10.3390/info14040222 - 5 Apr 2023
Cited by 5 | Viewed by 3171
Abstract
Virtual users generate a gigantic volume of imbalanced sentiments on various online crowd-sourcing platforms, consisting of text, emojis, or a combination of both. Accurate analysis of this content can bring profits to various industries and their services. State-of-the-art approaches detect sentiment polarity using common sense from text only. This research proposes an emoji-based framework for cognitive–conceptual–affective computing of sentiment polarity based on the linguistic patterns of text and emojis. The proposed emoji- and text-based parser combines the proposed linguistic features with combinations of different emojis to generate part-of-speech n-gram patterns. In this paper, 168,548 tweets expressing the sentiments of 650 world-famous personalities from across the world were downloaded. The results obtained with the proposed natural language processing framework show that the presence of emojis often changes the overall polarity of the sentiment. In addition, the CLDR name of each emoji is used to accurately evaluate the polarity of emoji patterns, and a sentiment dictionary is adopted to evaluate the polarity of text. Finally, the performance of three ML classifiers (SVM, DT, and Naïve Bayes) is evaluated on the proposed distinctive linguistic features. The robust experiments indicate that, with the proposed approach, the SVM classifier outperforms the other ML classifiers. The proposed polarity detection generator achieves an exceptional view of the sentiments presented in a sentence by following the established flow of concepts, based on linguistic features, polarity inversion, coordination, and discourse patterns, surpassing the performance of existing state-of-the-art approaches.
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)

19 pages, 9708 KiB  
Article
Tracking Unauthorized Access Using Machine Learning and PCA for Face Recognition Developments
by Vasile-Daniel Păvăloaia and George Husac
Information 2023, 14(1), 25; https://doi.org/10.3390/info14010025 - 30 Dec 2022
Cited by 3 | Viewed by 3013
Abstract
In the last two decades, there have been tremendous improvements in the field of artificial intelligence (AI), especially in face/facial recognition (FR). Over the years, remarkable technological progress has enhanced the face detection techniques used on common PCs and smartphones. Moreover, the steady progress of programming languages, libraries, frameworks, and tools, combined with the great passion of developers and researchers worldwide, has contributed substantially to open-source AI materials that make machine learning (ML) algorithms available to any scholar with the will to build the software of tomorrow. The study aims to analyze the specialized literature, from the first prototype delivered by Cambridge University to the most recent discoveries in FR. The purpose is to identify the most proficient algorithms and the existing gap in the specialized literature. The research builds an FR application, based on simple and efficient code, that detects a person’s face in a real-time photo and validates access by querying a given database. The paper contributes to the field through the literature review analysis as well as through customized code in Python, using ML with Principal Component Analysis (PCA), AdaBoost, and MySQL, for developing a myriad of applications in a variety of domains.
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)
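
The abstract does not spell out the recognition pipeline. As a hedged illustration of the PCA step only, the sketch below implements a generic eigenfaces-style matcher with NumPy: it learns principal axes from enrolled face images, projects a probe image into that space, and grants access only if the nearest enrolled face lies within a distance threshold. The function names and the `threshold` parameter are hypothetical, and the AdaBoost face detection and MySQL lookup mentioned above are not modeled.

```python
import numpy as np


def fit_eigenfaces(faces: np.ndarray, k: int):
    """Fit PCA on flattened face images (one row per image).

    Returns the mean face and the top-k principal axes ("eigenfaces").
    """
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def project(images: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Map images (rows) into the k-dimensional eigenface space."""
    return (images - mean) @ components.T


def check_access(probe, gallery_codes, gallery_ids, mean, components, threshold):
    """Return the closest enrolled identity, or None if none is close enough.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    code = project(probe[np.newaxis, :], mean, components)[0]
    distances = np.linalg.norm(gallery_codes - code, axis=1)
    best = int(np.argmin(distances))
    return gallery_ids[best] if distances[best] <= threshold else None
```

For instance, with five enrolled 64-pixel images in `gallery`, calling `mean, comps = fit_eigenfaces(gallery, k=4)` and then `check_access(probe, project(gallery, mean, comps), ids, mean, comps, threshold=1.0)` returns the enrolled identity closest to `probe`, or `None` if every distance exceeds the threshold. In practice, `k` and the threshold would have to be tuned on validation data.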

15 pages, 5720 KiB  
Article
The Use of Random Forest Regression for Estimating Leaf Nitrogen Content of Oil Palm Based on Sentinel 1-A Imagery
by Sirojul Munir, Kudang Boro Seminar, Sudradjat, Heru Sukoco and Agus Buono
Information 2023, 14(1), 10; https://doi.org/10.3390/info14010010 - 26 Dec 2022
Cited by 4 | Viewed by 2395
Abstract
Obtaining a spatial map of the distribution of nitrogen nutrients in oil palm plantations requires a fairly complex Leaf Sampling Unit (LSU). In addition, laboratory sample analysis is time-consuming and quite expensive, especially for large plantation areas. The nutrition of oil palm plants can be monitored using remote-sensing technology. The main obstacles to using passive sensors in multispectral imagery are cloud cover and shadow noise. This research used Sentinel C-SAR, which is equipped with active sensors that can overcome cloud barriers. A model to estimate leaf nitrogen nutrient status was constructed using random forest regression (RFR) based on multiple polarization (VV-VH) and local incidence angle (LIA) data from Sentinel-1A imagery. A sample of 1116 LSU data points from different islands (i.e., Sumatra, Java, and Kalimantan) was used to develop the proposed estimation model. Performance evaluation of the model yielded average MAPE, correctness, and MSE values of 9.68%, 90.32%, and 11.03%, respectively. Spatial maps of the distribution of nitrogen values in specific oil palm areas can be produced and visualized on the web, so that they can be accessed easily and quickly for various oil palm management purposes, such as fertilization planning, recommendations, and monitoring.
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)
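
The reported figures satisfy 9.68% + 90.32% = 100%, which suggests that correctness is defined as 100% minus MAPE. That reading is an assumption, not something the abstract states. The sketch below computes MAPE, the assumed correctness, and MSE in pure Python; the function names and sample values are hypothetical.

```python
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Assumes no zero targets."""
    _check(actual, predicted)
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def correctness(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed definition: 100% minus MAPE (consistent with 9.68% / 90.32%)."""
    return 100.0 - mape(actual, predicted)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error, in the squared units of the target."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

For hypothetical leaf nitrogen values `actual = [2.5, 2.4, 2.6]` and predictions `[2.25, 2.64, 2.6]`, the first two predictions are each off by 10% and the third is exact, so MAPE is about 6.67% and the assumed correctness about 93.33%.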

19 pages, 396 KiB  
Article
Regularized Mixture Rasch Model
by Alexander Robitzsch
Information 2022, 13(11), 534; https://doi.org/10.3390/info13110534 - 10 Nov 2022
Cited by 4 | Viewed by 1569
Abstract
The mixture Rasch model is a popular mixture model for analyzing multivariate binary data. A drawback of this model is that the number of estimated parameters increases substantially with the number of latent classes, which, in turn, hinders the interpretability of model parameters. This article proposes regularized estimation of the mixture Rasch model that imposes some sparsity structure on class-specific item difficulties. We illustrate the feasibility of the proposed modeling approach by means of one simulation study and two simulated case studies.
(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Information Systems)
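
For reference, the standard mixture Rasch model gives the probability that person p solves item i, conditional on ability θ_p and membership in latent class c (with class proportion π_c and class-specific ability density f_c), as in the first line below. The penalty in the last line is only one way to impose the sparsity the abstract describes, assumed here for illustration: each class-specific difficulty b_ic is written as a common difficulty b_i plus a deviation δ_ic, and a lasso-type penalty with tuning parameter λ shrinks many deviations to exactly zero, so fewer than C·I distinct difficulties remain to interpret. The article's actual penalty function and identification constraints may differ.

```latex
P(X_{pi} = 1 \mid \theta_p, c) = \frac{\exp(\theta_p - b_{ic})}{1 + \exp(\theta_p - b_{ic})},
\qquad b_{ic} = b_i + \delta_{ic}

P(\mathbf{x}_p) = \sum_{c=1}^{C} \pi_c \int \prod_{i=1}^{I}
  P(X_{pi} = x_{pi} \mid \theta, c)\, f_c(\theta)\, d\theta

\ell_{\mathrm{pen}} = \sum_{p=1}^{N} \log P(\mathbf{x}_p)
  - \lambda \sum_{c=1}^{C} \sum_{i=1}^{I} \lvert \delta_{ic} \rvert
```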

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer-review.

Title: Design, building and deployment of smart applications for predicting Remaining Useful Life (RUL) in industrial use cases
Authors: Marta Zorrilla
Affiliation: Department of Computer Science and Electronics, University of Cantabria, Avda. Los Castros s/n, Santander, 39005, Spain
Abstract: This paper presents a comparative analysis of deep learning techniques for predicting Remaining Useful Life (RUL). We explore various deep learning architectures on distinct datasets, including recurrent neural networks (RNNs, LSTMs, and GRUs), convolutional neural networks (CNNs), and Transformers, to assess their effectiveness in RUL estimation. Furthermore, we employ explainability techniques to elucidate the decision-making processes of these models and evaluate their interpretability. By analysing the inner workings of the models, we aim to provide insights into the factors influencing RUL predictions. Through comprehensive experimentation and analysis, this study contributes to the understanding of deep learning methodologies for RUL prediction and underscores the importance of model interpretability in critical applications such as prognostics and health management. In addition, we specify the smart system using the RAI4.0 Metamodel, intended for designing, configuring, and automatically deploying distributed stream-based industrial applications. Our findings will offer valuable guidance for practitioners seeking to deploy deep learning techniques effectively in predictive maintenance systems, facilitating informed decision-making and enhancing reliability and efficiency in industrial operations.
