Intelligent Data Mining, Analysis and Modeling Based on Machine Learning

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 March 2024) | Viewed by 4049

Special Issue Editors


Guest Editor
College of Information Sciences and Technology, Beijing University of Chemical Technology, Beijing 100029, China
Interests: spatio-temporal big data analysis; artificial intelligence; deep learning; geographic information science

Guest Editor
School of Computer Science, Beijing University of Technology, Beijing 100124, China
Interests: spatio-temporal data analysis and positioning algorithms; geosocial data mining; information retrieval

Guest Editor
School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
Interests: information security; machine learning

Special Issue Information

Dear Colleagues,

In data-driven research, algorithms are now closely intertwined with the digital landscape. This Special Issue focuses on intelligent data mining, analysis, and modeling. It covers the integration of machine learning techniques with data mining, data analysis, and model construction, and the new challenges that this integration raises. The issue aims to present pioneering ideas and experimental results in machine learning-based data mining, from design, services, and theory through to practical applications. Join us in exploring the transformative potential of machine learning for intelligent data exploration.

This Special Issue will publish high-quality and original research papers in the overlapping fields of:

  • Artificial intelligence;
  • Machine learning and deep learning;
  • Computational and data science;
  • Data integration and preprocessing;
  • Modeling methods and techniques;
  • Big data applications and algorithms;
  • Physics-informed neural networks;
  • Spatiotemporal big data.

Dr. Danhuai Guo
Dr. Zhi Cai
Dr. Yuping Lai
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • machine learning and deep learning
  • computational and data science
  • data integration and preprocessing
  • modeling methods and techniques
  • big data applications and algorithms
  • physics-informed neural network
  • spatiotemporal big data

Published Papers (5 papers)


Research

17 pages, 1335 KiB  
Article
Link Prediction Based on Data Augmentation and Metric Learning Knowledge Graph Embedding
by Lijuan Duan, Shengwen Han, Wei Jiang, Meng He and Yuanhua Qiao
Appl. Sci. 2024, 14(8), 3412; https://doi.org/10.3390/app14083412 - 18 Apr 2024
Viewed by 302
Abstract
A knowledge graph is a repository that represents a vast amount of information in the form of triplets. When a knowledge graph completion model is trained, the knowledge graph contains only positive examples, which makes reliable link prediction difficult, especially for complex relations. At the same time, current techniques that rely on distance models embed entities in Euclidean space. This limits their ability to depict nuanced relationships and prevents them from capturing the relationships' semantic importance. This research offers a strategy based on Gibbs sampling and connection embedding to improve the model's ability to handle link prediction for complex relationships. Gibbs sampling is first used to obtain high-quality negative samples. The triplet entities are then mapped onto a hyperplane defined by the connection. Through metric learning, this procedure produces complex relationship embeddings imbued with semantic information. Finally, the method's effectiveness is demonstrated on three link prediction benchmark datasets: FB15k-237, WN18RR, and FB15k.
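The abstract mentions mapping triplet entities onto a hyperplane defined by the connection. As a rough illustration only, the NumPy sketch below assumes a TransH-style formulation. Each relation has a unit normal vector `w_r` and a translation vector `d_r`. Both entities are projected onto the hyperplane, and plausibility is scored by an L2 distance. The function names, the scoring rule, and the toy vectors are assumptions. The authors' Gibbs-sampling negative sampler and metric-learning loss are not reproduced.

```python
import numpy as np

def project_to_hyperplane(e: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Project embedding e onto the hyperplane through the origin with normal w."""
    w = w / np.linalg.norm(w)        # normalize so the projection formula holds
    return e - np.dot(w, e) * w      # subtract the component along the normal

def score(h: np.ndarray, d_r: np.ndarray, t: np.ndarray, w_r: np.ndarray) -> float:
    """Assumed TransH-style distance; lower values mean a more plausible triplet."""
    h_p = project_to_hyperplane(h, w_r)
    t_p = project_to_hyperplane(t, w_r)
    return float(np.linalg.norm(h_p + d_r - t_p))

# Toy 3-D example with hypothetical vectors: the hyperplane is the x-y plane.
w_r = np.array([0.0, 0.0, 1.0])
d_r = np.array([1.0, 0.0, 0.0])
h = np.array([0.0, 1.0, 5.0])
t = np.array([1.0, 1.0, -3.0])
print(score(h, d_r, t, w_r))  # 0.0: after projection, d_r translates h exactly onto t
```

The z-components are discarded by the projection, so the same entity can occupy different positions under different relations. That flexibility is the usual motivation for hyperplane projections over a single shared Euclidean embedding.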

20 pages, 1590 KiB  
Article
Query Optimization in Distributed Database Based on Improved Artificial Bee Colony Algorithm
by Yan Du, Zhi Cai and Zhiming Ding
Appl. Sci. 2024, 14(2), 846; https://doi.org/10.3390/app14020846 - 19 Jan 2024
Viewed by 894
Abstract
Query optimization, which aims to find the query execution plan with the minimum cost, is one of the key factors affecting the performance of database systems. In distributed database systems in particular, multiple copies of the data are stored on different data nodes. This dramatically increases the number of feasible query execution plans for a query statement. As the volume of stored data grows, the cluster size of distributed databases also grows, and current query optimization algorithms perform poorly. To address this, a dynamic perturbation-based artificial bee colony algorithm is proposed to solve the query optimization problem in distributed database systems. The improved artificial bee colony algorithm strengthens its global search capability by incorporating the selection, crossover, and mutation operators of the genetic algorithm, which reduces its tendency to become trapped in local optima. In addition, a dynamic perturbation factor is introduced so that the algorithm parameters vary with the iteration process and with the convergence degree of the whole population, improving convergence efficiency. Finally, comparative experiments were conducted to assess the average execution cost of the Top-k query plans generated by the algorithms and the convergence speed of the algorithms for query statements in six different dimension sets. The results demonstrate that the proposed method generates Top-k query plans with a lower execution cost and converges faster, which can effectively improve query efficiency. However, this method requires more execution time.
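The abstract outlines an artificial bee colony (ABC) search with a dynamic perturbation factor. The sketch below shows only the standard ABC loop, with its employed, onlooker, and scout phases, on a continuous toy cost. A hypothetical, linearly shrinking perturbation range stands in for the paper's dynamic perturbation factor. The GA selection, crossover, and mutation operators and the discrete encoding of query execution plans are not reproduced. The `sphere` cost, all parameter defaults, and the schedule are assumptions.

```python
import numpy as np

def sphere(x: np.ndarray) -> float:
    """Toy cost standing in for a query-plan cost model (assumption)."""
    return float(np.sum(x ** 2))

def abc_minimize(cost, dim=2, n_sources=10, limit=20, iters=100, bound=5.0, seed=0):
    rng = np.random.default_rng(seed)
    foods = rng.uniform(-bound, bound, size=(n_sources, dim))  # food sources = candidate solutions
    costs = np.array([cost(f) for f in foods])
    trials = np.zeros(n_sources, dtype=int)                    # stagnation counters

    def try_neighbor(i: int, scale: float) -> None:
        # Standard ABC move on one random dimension: v_ij = x_ij + phi * (x_ij - x_kj)
        k = rng.choice([j for j in range(n_sources) if j != i])
        j = rng.integers(dim)
        v = foods[i].copy()
        v[j] += rng.uniform(-scale, scale) * (foods[i, j] - foods[k, j])
        v = np.clip(v, -bound, bound)
        c = cost(v)
        if c < costs[i]:           # greedy selection keeps the better source
            foods[i], costs[i], trials[i] = v, c, 0
        else:
            trials[i] += 1

    i0 = int(np.argmin(costs))
    best_x, best_c = foods[i0].copy(), float(costs[i0])
    history = [best_c]
    for t in range(iters):
        # Hypothetical perturbation schedule: the phi range shrinks linearly from 1.0 to 0.1
        scale = 1.0 - 0.9 * t / max(iters - 1, 1)
        for i in range(n_sources):                 # employed-bee phase
            try_neighbor(i, scale)
        fit = 1.0 / (1.0 + costs)                  # onlooker phase: fitness-proportional picks
        for i in rng.choice(n_sources, size=n_sources, p=fit / fit.sum()):
            try_neighbor(int(i), scale)
        i_min = int(np.argmin(costs))              # memorize the best source found so far
        if costs[i_min] < best_c:
            best_x, best_c = foods[i_min].copy(), float(costs[i_min])
        stale = int(np.argmax(trials))             # scout phase: abandon the most stagnant source
        if trials[stale] > limit:
            foods[stale] = rng.uniform(-bound, bound, size=dim)
            costs[stale] = cost(foods[stale])
            trials[stale] = 0
        history.append(best_c)
    return best_x, best_c, history

x_best, c_best, hist = abc_minimize(sphere)
print(c_best)
```

The scout phase can discard the current best food source, so the sketch keeps a separate best-so-far record. As a result, `history` never increases.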

13 pages, 729 KiB  
Article
Hybrid Clustering Algorithm Based on Improved Density Peak Clustering
by Limin Guo, Weijia Qin, Zhi Cai and Xing Su
Appl. Sci. 2024, 14(2), 715; https://doi.org/10.3390/app14020715 - 15 Jan 2024
Viewed by 748
Abstract
In the era of big data, unsupervised learning algorithms such as clustering are particularly prominent, and clustering research has advanced significantly in recent years. Clustering by Fast Search and Find of Density Peaks, commonly called density peak clustering (DPC), was proposed in Science in 2014 and finds cluster centers automatically. It is simple and efficient, does not require iterative computation, and is suitable for large-scale and high-dimensional data. However, DPC and most of its refinements have several drawbacks. The method primarily considers the overall structure of the data, which often causes many clusters to be overlooked. The choice of the cutoff distance dc affects the calculation of local density values, and different dataset sizes may require different computational methods, which affects the quality of the clustering results. In addition, the initial assignment of labels can cause a ‘chain reaction’: if one data point is labeled incorrectly, subsequent data points may also be labeled incorrectly. In this paper, we propose an improved density peak clustering method, DPC-MS, which uses the mean-shift algorithm to find local density extremes. This makes the accuracy of the algorithm independent of the parameter dc. After the local density extreme points are found, the allocation strategy of the DPC algorithm assigns the remaining points to the appropriate local density extreme points, forming the final clusters. The method's robustness to uncertain dataset sizes adds practical value. Several experiments were conducted on synthetic and real datasets to evaluate the performance of the proposed method. The results show that the proposed method outperforms some of the more recent methods in most cases.
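For reference, the baseline DPC procedure that DPC-MS builds on can be sketched in NumPy. The sketch computes a cutoff-kernel local density ρ and the distance δ to the nearest denser point. It then chooses centers by the largest ρ·δ and assigns the remaining points, in decreasing density order, to the cluster of their nearest denser neighbor. That last step is the one responsible for the ‘chain reaction’ described above. The sketch illustrates the original 2014 algorithm, not the authors' mean-shift variant. The function name, the cutoff kernel, and the fixed `n_clusters` are assumptions.

```python
import numpy as np

def density_peak_clustering(X: np.ndarray, dc: float, n_clusters: int) -> np.ndarray:
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    rho = (dist < dc).sum(axis=1) - 1            # cutoff-kernel local density (exclude self)
    order = np.argsort(-rho, kind="stable")      # indices by decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()             # densest point: use its maximum distance
            continue
        higher = order[:rank]                    # points ranked denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    gamma = rho * delta                          # high density AND far from denser points
    gamma[order[0]] = np.inf                     # densest point already has maximal gamma; break ties
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:                              # assign in decreasing density order
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels

# Two well-separated toy groups (hypothetical data)
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
              [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1]], dtype=float)
print(density_peak_clustering(X, dc=0.5, n_clusters=2))  # expected: [0 0 0 0 1 1 1 1]
```

Because each non-center point inherits its label from a single denser neighbor, one wrong label propagates to every point that follows it in the chain.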

22 pages, 23761 KiB  
Article
Robust Ranking Kernel Support Vector Machine via Manifold Regularized Matrix Factorization for Multi-Label Classification
by Heping Song, Yiming Zhou, Ebenezer Quayson, Qian Zhu and Xiangjun Shen
Appl. Sci. 2024, 14(2), 638; https://doi.org/10.3390/app14020638 - 11 Jan 2024
Viewed by 521
Abstract
Multi-label classification has been extensively researched and applied for several decades. However, the performance of existing methods is highly susceptible to noisy data samples, and accuracy decreases significantly when noise levels are high. To address this issue, we propose a robust ranking support vector machine (Rank-SVM) method that incorporates manifold regularized matrix factorization. Unlike traditional Rank-SVM methods, our approach integrates feature selection and multi-label learning into a unified framework. Within this framework, we employ matrix factorization to learn a robust low-rank subspace of the input space, making the data representation more robust under high-noise conditions. Additionally, we incorporate manifold structure regularization into the framework to preserve the manifold relationships among low-rank samples, which further improves the robustness of the low-rank representation. Leveraging this robust low-rank representation, we extract resilient low-rank features and use them to construct a more effective classifier. Finally, the proposed framework is extended to a kernelized ranking approach for constructing nonlinear multi-label classifiers. To solve the resulting non-convex kernelized problem, we employ the augmented Lagrangian multiplier (ALM) and alternating direction method of multipliers (ADMM) techniques to obtain the optimal solution. Experimental evaluations on various datasets demonstrate that our framework achieves superior classification results and significantly enhances performance in high-noise scenarios.
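Manifold regularization of this kind is commonly expressed through a graph Laplacian. The sketch below assumes a symmetric k-nearest-neighbor graph `W` over the samples and row-wise low-rank codes `V`. It computes the penalty tr(VᵀLV) with L = D - W, which equals ½ Σᵢⱼ Wᵢⱼ‖vᵢ - vⱼ‖². The penalty therefore grows when neighboring samples receive distant low-rank codes. The graph construction and the function names are illustrative assumptions. The authors' full objective (Rank-SVM loss, kernelization, ALM/ADMM updates) is not reproduced.

```python
import numpy as np

def knn_graph(X: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 k-nearest-neighbor adjacency matrix (an assumed, common choice)."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = np.argsort(dist[i])[1:k + 1]   # skip the point itself at position 0
        W[i, neighbors] = 1.0
    return np.maximum(W, W.T)                      # symmetrize: i~j if either lists the other

def manifold_penalty(V: np.ndarray, W: np.ndarray) -> float:
    """tr(V^T L V) with graph Laplacian L = D - W; row i of V is the code of sample i."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(V.T @ L @ V))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))   # hypothetical input samples
V = rng.normal(size=(20, 3))   # hypothetical low-rank codes
W = knn_graph(X, k=4)
print(manifold_penalty(V, W))
```

In a manifold-regularized factorization, a term like this would typically be added to the reconstruction loss with a weight λ. Identical codes for connected samples contribute nothing to it.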

29 pages, 7450 KiB  
Article
Efficient Diagnosis of Autism Spectrum Disorder Using Optimized Machine Learning Models Based on Structural MRI
by Reem Ahmed Bahathiq, Haneen Banjar, Salma Kammoun Jarraya, Ahmed K. Bamaga and Rahaf Almoallim
Appl. Sci. 2024, 14(2), 473; https://doi.org/10.3390/app14020473 - 05 Jan 2024
Viewed by 881
Abstract
Autism spectrum disorder (ASD) affects approximately 1.4% of the population and imposes significant social and economic burdens. Because its etiology is unknown, effective diagnosis is challenging. Advances in structural magnetic resonance imaging (sMRI) allow ASD to be assessed objectively by examining structural brain changes. Recently, machine learning (ML)-based diagnostic systems have emerged to expedite and enhance the diagnostic process; however, the expected success in ASD has not yet been achieved. This study evaluates and compares the performance of seven optimized ML models in identifying sMRI-based biomarkers for the early and accurate detection of ASD in children aged 5 to 10 years. The effects of hyperparameter tuning and feature selection techniques are investigated using two public datasets from the Autism Brain Imaging Data Exchange (ABIDE) initiative. Furthermore, these models are tested on a local Saudi dataset to verify their generalizability. The integration of the grey wolf optimizer with a support vector machine achieved the best performance, with an average accuracy of 71% (with further improvement to 71% after adding personal features) using 10-fold cross-validation. The optimized models identified relevant biomarkers for diagnosis, supporting their generalizability and advancing scientific understanding of neurological changes in ASD.
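The grey wolf optimizer (GWO) mentioned above moves each candidate toward the three best solutions found so far: the alpha, beta, and delta wolves. A minimal NumPy sketch on a toy continuous cost is shown below. In a setting like the one described, the cost would instead be something like the cross-validated error of an SVM at a given hyperparameter vector. That use is an assumption, not the authors' exact setup. The function name, population size, bounds, and toy cost are illustrative.

```python
import numpy as np

def gwo_minimize(cost, dim=2, n_wolves=12, iters=100, bound=5.0, seed=0):
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(-bound, bound, size=(n_wolves, dim))
    fitness = np.array([cost(w) for w in wolves])
    top = np.argsort(fitness)[:3]
    leaders, leader_fit = wolves[top].copy(), fitness[top].copy()  # alpha, beta, delta
    history = [float(leader_fit[0])]
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters                  # decreases linearly from 2 toward 0
        for i in range(n_wolves):
            estimates = []
            for leader in leaders:
                A = 2.0 * a * rng.random(dim) - a   # |A| > 1 favors exploration
                C = 2.0 * rng.random(dim)
                D = np.abs(C * leader - wolves[i])  # distance to this leader
                estimates.append(leader - A * D)
            wolves[i] = np.clip(np.mean(estimates, axis=0), -bound, bound)
        fitness = np.array([cost(w) for w in wolves])
        # Keep the three best positions seen so far (previous leaders + current pack)
        pool = np.vstack([leaders, wolves])
        pool_fit = np.concatenate([leader_fit, fitness])
        top = np.argsort(pool_fit)[:3]
        leaders, leader_fit = pool[top].copy(), pool_fit[top].copy()
        history.append(float(leader_fit[0]))
    return leaders[0], float(leader_fit[0]), history

best_x, best_c, hist = gwo_minimize(lambda x: float(np.sum((x - 1.0) ** 2)))
print(best_x, best_c)
```

Keeping the leaders as a best-so-far archive, rather than re-reading them from the current pack, guarantees that the best cost in `history` never increases.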
