Advanced Research in Data-Centric AI

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 31 October 2024 | Viewed by 6841

Special Issue Editors

Dr. Kunpeng Liu
Guest Editor
Department of Computer Science, Portland State University, Portland, OR 97201, USA
Interests: feature engineering; data mining; reinforcement learning

Dr. Pengfei Wang
Guest Editor
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Interests: spatiotemporal graph network; point process

Dr. Yifeng Gao
Guest Editor
Department of Computer Science, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA
Interests: time series; spatial-temporal pattern mining

Dr. Yanjie Fu
Guest Editor
Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
Interests: data mining; big data analytics

Special Issue Information

Dear Colleagues,

The primary focus of machine learning is often on developing models to fit a particular dataset. In real-world scenarios, however, data are often messy, and refining the model may not be the most effective way to improve performance. An alternative approach is to improve the dataset itself rather than treating it as a fixed input. Data-Centric AI (DCAI) is an emerging field concerned with techniques that systematically improve datasets, often leading to notable gains in practical machine learning applications.

The purpose of this Special Issue is to foster a dynamic, collaborative, interdisciplinary community focused on addressing real-world data challenges through DCAI. These challenges span several areas, including data acquisition and creation, data labeling, data preprocessing and enhancement, data quality assessment, data debt, and data governance. As many of these domains are still emerging, we aim to bring experts together to define and shape the DCAI movement so that it can significantly impact the future of AI and ML.

Dr. Kunpeng Liu
Dr. Pengfei Wang
Dr. Yifeng Gao
Dr. Yanjie Fu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass the pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as they are accepted) and are listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • automated data science
  • data pre-processing
  • big data analytics
  • feature engineering
  • reinforcement learning
  • time series
  • graph mining
  • open-source datasets
  • cross-dataset mining
  • spatiotemporal data mining
  • statistical machine learning
  • bioinformatics
  • distribution shift

Published Papers (7 papers)


Research

17 pages, 2220 KiB  
Article
Robust Bias Compensation Method for Sparse Normalized Quasi-Newton Least-Mean with Variable Mixing-Norm Adaptive Filtering
by Ying-Ren Chien, Han-En Hsieh and Guobing Qian
Mathematics 2024, 12(9), 1310; https://doi.org/10.3390/math12091310 - 25 Apr 2024
Viewed by 189
Abstract
Input noise inevitably biases the weight vectors of adaptive filters during adaptation. Moreover, impulse noise at the output of the unknown system can prevent bias compensation from converging. This paper presents a robust bias compensation method for a sparse normalized quasi-Newton least-mean (BC-SNQNLM) adaptive filtering algorithm to address these issues. We mathematically derive the bias-compensation terms in an impulse-noise environment. Inspired by the convex combination of adaptive filters' step sizes, we propose a novel variable mixing-norm method, BC-SNQNLM-VMN, to accelerate the convergence of our BC-SNQNLM algorithm. Simulation results confirm that the proposed method significantly outperforms comparable methods in terms of steady-state normalized mean-squared deviation (NMSD).
(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
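The NMSD criterion mentioned in this abstract can be made concrete with a short sketch. The code below identifies a hypothetical sparse FIR system with a plain normalized LMS (NLMS) filter and reports the NMSD in dB. It is only an illustration under stated assumptions: it is not the authors' BC-SNQNLM or BC-SNQNLM-VMN algorithm (there is no bias compensation, quasi-Newton step, sparsity penalty, or impulse-noise handling), and the system taps, step size `mu`, regularizer `eps`, and noise level are arbitrary choices.

```python
import math
import random
from typing import List


def nmsd_db(w_est: List[float], w_true: List[float]) -> float:
    """Normalized mean-squared deviation in dB: 10*log10(||w_est - w_true||^2 / ||w_true||^2)."""
    num = sum((a - b) ** 2 for a, b in zip(w_est, w_true))
    den = sum(b ** 2 for b in w_true)
    num = max(num, 1e-300)  # guard against log10(0) if the estimate happens to be exact
    return 10.0 * math.log10(num / den)


def nlms_identify(w_true: List[float], n_iter: int = 2000, mu: float = 0.5,
                  eps: float = 1e-6, noise_std: float = 0.0, seed: int = 0) -> List[float]:
    """Plain NLMS system identification (illustration only, no bias compensation)."""
    rng = random.Random(seed)
    taps = len(w_true)
    w = [0.0] * taps       # adaptive weight vector, initialized to zero
    x_buf = [0.0] * taps   # tapped delay line, newest input sample first
    for _ in range(n_iter):
        x_buf = [rng.gauss(0.0, 1.0)] + x_buf[:-1]
        # Desired signal: unknown system output plus Gaussian measurement noise.
        d = sum(h * x for h, x in zip(w_true, x_buf)) + rng.gauss(0.0, noise_std)
        e = d - sum(wi * x for wi, x in zip(w, x_buf))  # a priori error
        norm = eps + sum(x * x for x in x_buf)           # input energy used for normalization
        w = [wi + mu * e * x / norm for wi, x in zip(w, x_buf)]
    return w


# Hypothetical sparse system: most taps are zero.
w_true = [0.0, 0.0, 1.0, 0.0, 0.0, -0.5, 0.0, 0.0]
w_hat = nlms_identify(w_true, noise_std=1e-3)
print(f"steady-state NMSD: {nmsd_db(w_hat, w_true):.1f} dB")
```

Here the input is noise-free; the abstract notes that noisy inputs bias such estimates, which is the effect its bias-compensation terms are designed to remove.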

20 pages, 714 KiB  
Article
TabFedSL: A Self-Supervised Approach to Labeling Tabular Data in Federated Learning Environments
by Ruixiao Wang, Yanxin Hu, Zhiyu Chen, Jianwei Guo and Gang Liu
Mathematics 2024, 12(8), 1158; https://doi.org/10.3390/math12081158 - 12 Apr 2024
Viewed by 349
Abstract
Self-supervised learning has proven effective at addressing data-labeling problems. Its success depends mainly on access to large, high-quality datasets with diverse features, and on exploiting the spatial, temporal, and semantic structure present in the data. However, domains such as finance, healthcare, and insurance rely primarily on tabular data, which poses challenges for traditional data augmentation methods aimed at improving data quality. Furthermore, the privacy-sensitive nature of these domains complicates the acquisition of the extensive, high-quality datasets needed to train effective self-supervised models. To tackle these challenges, we introduce a novel framework that combines self-supervised learning with Federated Learning (FL), aiming to enable distributed training while preserving training quality. Our framework extends the conventional self-supervised data augmentation paradigm by labeling data through segmentation into subsets: it adds noise by splitting the data into subsets and can match the performance of centralized learning in a distributed environment. We evaluate our approach through experiments on various public tabular datasets. The results demonstrate the effectiveness and generalizability of the proposed method in scenarios involving unlabeled data and distributed settings.
(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
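The abstract does not state how client models are aggregated or exactly how subsets are formed, so the sketch below makes two labeled assumptions: server-side aggregation uses standard FedAvg (a sample-size-weighted parameter average), and `split_features` is a hypothetical stand-in for "splitting data into subsets", cutting each tabular row into contiguous feature groups. Parameters are plain lists here; a real system would aggregate model tensors. This is not TabFedSL itself.

```python
import math
from typing import Dict, List

Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """FedAvg aggregation: sample-size-weighted average of each client's parameters."""
    total = sum(client_sizes)
    global_params: Params = {}
    for name, values in client_params[0].items():
        global_params[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(values))
        ]
    return global_params


def split_features(row: List[float], n_subsets: int) -> List[List[float]]:
    """Hypothetical view generator: split one tabular row into contiguous feature
    subsets (at most n_subsets; the last subset may be shorter)."""
    size = math.ceil(len(row) / n_subsets)
    return [row[i:i + size] for i in range(0, len(row), size)]


# Two hypothetical clients holding 100 and 300 local samples.
clients = [{"encoder": [1.0, 2.0]}, {"encoder": [3.0, 4.0]}]
print(fedavg(clients, [100, 300]))                    # {'encoder': [2.5, 3.5]}
print(split_features([0.1, 0.2, 0.3, 0.4, 0.5], 2))   # [[0.1, 0.2, 0.3], [0.4, 0.5]]
```

Weighting by local sample count keeps clients with little data from pulling the global model as strongly as data-rich clients.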

24 pages, 14284 KiB  
Article
Mask2Former with Improved Query for Semantic Segmentation in Remote-Sensing Images
by Shichen Guo, Qi Yang, Shiming Xiang, Shuwen Wang and Xuezhi Wang
Mathematics 2024, 12(5), 765; https://doi.org/10.3390/math12050765 - 04 Mar 2024
Viewed by 989
Abstract
Semantic segmentation of remote sensing (RS) images is vital in various practical applications, including urban construction planning, natural disaster monitoring, and land resource investigation. However, RS images are captured by airplanes or satellites at high altitudes and long distances, so ground objects of the same category are scattered across different parts of the image. Moreover, objects of very different sizes appear simultaneously in RS images: in urban scenes, for example, some objects occupy large areas while others cover only small regions. These two common characteristics make high-quality segmentation of RS images challenging. Based on these observations, this paper proposes a Mask2Former with an improved query (IQ2Former) for this task. The fundamental motivation behind IQ2Former is to enhance the query capability of Mask2Former by exploiting the characteristics of RS images. First, we propose the Query Scenario Module (QSM), which learns and groups queries from feature maps, allowing the selection of distinct scenarios such as urban and rural areas, building clusters, and parking lots. Second, we design the Query Position Module (QPM), which assigns image position information to each query without increasing the number of parameters, thereby enhancing the model's sensitivity to small targets in complex scenarios. Finally, we propose the Query Attention Module (QAM), which leverages query attention to extract valuable features from the preceding queries. Positioned between the repeated transformer decoder layers, QAM ensures comprehensive use of the supervisory information and the exploitation of fine-grained details. Architecturally, the QSM, QPM, and QAM are assembled into an end-to-end model to achieve high-quality semantic segmentation.
Compared with classical and state-of-the-art models (FCN, PSPNet, DeepLabV3+, OCRNet, UPerNet, MaskFormer, and Mask2Former), IQ2Former demonstrates exceptional performance on three challenging public remote-sensing image datasets, achieving 83.59 mIoU on Vaihingen, 87.89 mIoU on Potsdam, and 56.31 mIoU on LoveDA. Overall accuracy, ablation experiments, and visualized segmentation results further indicate the effectiveness of IQ2Former.
(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
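The mIoU scores quoted above are per-class intersection-over-union values averaged over classes and expressed as percentages. A minimal sketch of the metric, assuming flattened integer label maps and a confusion matrix accumulated over all pixels; benchmark-specific conventions (for example, ignored classes on Vaihingen or Potsdam) are not modeled, and IQ2Former itself is not shown.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels with ground-truth class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU = TP / (TP + FP + FN), skipping classes absent from both maps."""
    cm = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                                 # truly c, predicted another class
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


# Four flattened pixels, two classes.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(f"mIoU = {100 * score:.2f}")  # mIoU = 58.33
```

Because IoU penalizes both false positives and false negatives per class, small or rare classes weigh as much as large ones in the mean, which is why gains on small targets show up clearly in mIoU.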

20 pages, 854 KiB  
Article
A Community Detection and Graph-Neural-Network-Based Link Prediction Approach for Scientific Literature
by Chunjiang Liu, Yikun Han, Haiyun Xu, Shihan Yang, Kaidi Wang and Yongye Su
Mathematics 2024, 12(3), 369; https://doi.org/10.3390/math12030369 - 24 Jan 2024
Viewed by 1388
Abstract
This study presents a novel approach that combines community detection algorithms with various Graph Neural Network (GNN) models to improve link prediction in scientific literature networks. Integrating the Louvain community detection algorithm into our GNN frameworks consistently enhanced performance across all models tested. For example, combining Louvain with the GAT model increased the AUC score from 0.777 to 0.823, which is typical of the improvements observed. Similar gains were noted when Louvain was paired with other GNN architectures, confirming the robustness and effectiveness of incorporating community-level information. This consistent improvement, observed across extensive experiments on bipartite graphs of scientific collaborations and citations, highlights the potential of combining community detection with GNNs to overcome common link prediction challenges such as scalability and resolution limits. Our findings support integrating community structure as a significant step toward more accurate network science models and offer a fuller understanding of scientific collaboration patterns through advanced machine learning techniques.
(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
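The AUC reported above can be read as the probability that a true link is scored higher than a non-link. The sketch below computes AUC that way and adds a hypothetical fusion rule, `add_community_bonus`, that raises a pair's score when both endpoints share a community; this rule is an illustrative assumption, not the paper's integration of Louvain into GNNs. In practice the node-to-community map could come from a Louvain implementation such as NetworkX's `louvain_communities`; the scores and partition here are made up.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def auc_score(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a random true link outscores a random non-link
    (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def add_community_bonus(scores: Sequence[float], pairs: Sequence[Tuple[Hashable, Hashable]],
                        community_of: Dict[Hashable, int], alpha: float) -> List[float]:
    """Hypothetical fusion: add alpha to a pair's score when both nodes share a community."""
    return [s + alpha * (community_of[u] == community_of[v]) for s, (u, v) in zip(scores, pairs)]


# Toy example: made-up scores for two true links and two non-links.
community = {0: 0, 1: 0, 2: 1, 3: 1}
pos_pairs, pos = [(0, 1), (2, 3)], [0.6, 0.4]
neg_pairs, neg = [(0, 2), (1, 3)], [0.5, 0.45]
print(auc_score(pos, neg))  # 0.5
print(auc_score(add_community_bonus(pos, pos_pairs, community, 0.2),
                add_community_bonus(neg, neg_pairs, community, 0.2)))  # 1.0
```

The toy data are constructed so that true links fall inside communities; on real citation graphs the benefit depends on how well community structure aligns with link formation.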

24 pages, 6195 KiB  
Article
Enhanced Sea Horse Optimization Algorithm for Hyperparameter Optimization of Agricultural Image Recognition
by Zhuoshi Li, Shizheng Qu, Yinghang Xu, Xinwei Hao and Nan Lin
Mathematics 2024, 12(3), 368; https://doi.org/10.3390/math12030368 - 23 Jan 2024
Viewed by 714
Abstract
Deep learning has made significant progress in agricultural image recognition tasks, but tuning the parameters of deep models usually requires extensive manual intervention, which is time-consuming and inefficient. To address this challenge, this paper proposes an adaptive parameter-tuning strategy that enhances sea horse optimization by combining the sine–cosine algorithm with Tent chaotic mapping, improving the search ability and convergence stability of the standard sea horse optimization (SHO) algorithm. Through adaptive optimization, we determine the best parameter configuration for a ResNet-50 neural network and thereby optimize model performance. The resulting enhanced algorithm (ESHO) outperforms the other compared algorithms on various performance indicators. The optimized model achieves 96.7% accuracy on the corn disease image recognition task and 96.4% accuracy on the jade fungus image recognition task. These results show that ESHO not only effectively improves the accuracy of agricultural image recognition but also reduces the need for manual parameter adjustment.
(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
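The two ingredients named in this abstract, Tent chaotic initialization and a sine–cosine position update, can be sketched on their own. This is a hedged illustration, not ESHO: the sea horse optimizer's movement, predation, and breeding steps are omitted; the Tent-map variant with breakpoint `a = 0.7` and the sine–cosine update follow commonly used textbook forms as an assumption; and `sphere` is a hypothetical stand-in for training ResNet-50 with candidate hyperparameters.

```python
import math
import random
from typing import Callable, List, Tuple


def tent_sequence(n: int, x0: float = 0.37, a: float = 0.7) -> List[float]:
    """Tent chaotic map x <- x/a if x < a else (1 - x)/(1 - a), iterated n times."""
    xs, x = [], x0
    for _ in range(n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        xs.append(x)
    return xs


def sca_minimize(f: Callable[[List[float]], float], dim: int, lower: float, upper: float,
                 pop_size: int = 20, n_iter: int = 200, amp: float = 2.0,
                 seed: int = 0) -> Tuple[List[float], float, float]:
    """Sine-cosine search with Tent-map initialization.
    Returns (best position, best value, best value of the initial population)."""
    rng = random.Random(seed)
    chaos = tent_sequence(pop_size * dim)
    # Chaotic initialization spreads the initial agents over [lower, upper]^dim.
    pop = [[lower + (upper - lower) * chaos[i * dim + j] for j in range(dim)]
           for i in range(pop_size)]
    best = list(min(pop, key=f))
    best_val = initial_best_val = f(best)
    for t in range(n_iter):
        r1 = amp - t * amp / n_iter  # amplitude shrinks linearly: exploration, then exploitation
        for agent in pop:
            for j in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                wave = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
                agent[j] += r1 * wave * abs(r3 * best[j] - agent[j])
                agent[j] = min(max(agent[j], lower), upper)  # keep the agent inside the bounds
            val = f(agent)
            if val < best_val:  # keep the best-so-far position
                best, best_val = list(agent), val
    return best, best_val, initial_best_val


def sphere(x: List[float]) -> float:
    """Toy objective standing in for a validation loss over two hyperparameters."""
    return sum(v * v for v in x)


best, best_val, start_val = sca_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
print(best, best_val, start_val)
```

The chaotic sequence gives a deterministic but well-spread starting population, while the shrinking sine/cosine amplitude moves agents from broad exploration toward refinement around the best-so-far position.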

25 pages, 5269 KiB  
Article
Application of Artificial Intelligence Methods for Predicting the Compressive Strength of Green Concretes with Rice Husk Ash
by Miljan Kovačević, Marijana Hadzima-Nyarko, Ivanka Netinger Grubeša, Dorin Radu and Silva Lozančić
Mathematics 2024, 12(1), 66; https://doi.org/10.3390/math12010066 - 24 Dec 2023
Cited by 1 | Viewed by 1079
Abstract
To promote sustainable growth and reduce the greenhouse effect, rice husk ash can replace part of the cement in concrete. This research models how substituting rice husk ash for ordinary Portland cement affects the compressive strength of concrete. Various machine learning techniques are investigated, and a procedure for determining the optimal model is provided. A database of 909 samples forms the basis for the prediction models, which are assessed using the accuracy criteria RMSE, MAE, MAPE, and R. The results show that artificial intelligence techniques can model the compressive strength of such concrete with acceptable accuracy. They also make it possible to evaluate the importance of specific input variables and their influence on the strength of such concrete.
(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
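The four accuracy criteria listed in this abstract can be computed as follows. This is a minimal sketch under one labeled assumption: R is taken to be Pearson's linear correlation coefficient between measured and predicted values. The strength values in the usage line are hypothetical, not taken from the paper's 909-sample database.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, MAPE (%), and Pearson correlation R between measured and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE assumes no measured value is zero (true for compressive strengths).
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R": r}


# Hypothetical measured vs. predicted compressive strengths in MPa.
print(regression_metrics([30.0, 40.0, 50.0, 60.0], [32.0, 38.0, 50.0, 61.0]))
```

RMSE penalizes large errors more heavily than MAE, MAPE expresses error relative to the measured strength, and R captures how well predictions track the measured trend regardless of scale.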

25 pages, 26532 KiB  
Article
Statistical Image Watermark Algorithm for FAPHFMs Domain Based on BKF–Rayleigh Distribution
by Siyu Yang, Ansheng Deng and Hui Cui
Mathematics 2023, 11(23), 4720; https://doi.org/10.3390/math11234720 - 21 Nov 2023
Viewed by 965
Abstract
In image watermarking, imperceptibility, robustness, and watermarking capacity are key indicators of performance. However, these three factors are often mutually constrained, making a balance among them difficult to achieve. To address this issue, this paper presents a novel image watermark detection algorithm based on local fast and accurate polar harmonic Fourier moments (FAPHFMs) and the BKF–Rayleigh distribution model. First, the original image is divided into non-overlapping blocks, the entropy of each block is calculated, the highest-entropy blocks are selected in descending order, and their local FAPHFM magnitudes are computed. Second, the watermark signal is embedded multiplicatively into the robust local FAPHFM magnitudes, and MMLE based on the RSS method is then used to estimate the statistical parameters of the BKF–Rayleigh distribution model. Finally, a blind image watermark detector is designed using the BKF–Rayleigh distribution and the locally optimum (LO) decision criterion; we also derive a closed-form expression for this detector. Experiments show that the proposed algorithm outperforms existing methods, remains robust under a large watermarking capacity, and at the same time offers excellent imperceptibility, maintaining a good balance among robustness, imperceptibility, and watermarking capacity.
(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
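The first step of the embedding pipeline described above (non-overlapping blocking, entropy computation, and selection of high-entropy blocks in descending order) can be sketched as follows. Assumptions: a grayscale image given as nested lists of integer gray levels, entropy taken over each block's gray-level histogram, and a hypothetical block size and `k`. The FAPHFM computation, multiplicative embedding, and BKF–Rayleigh detector are not shown.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def shannon_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy (bits) of the gray-level histogram of a block."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def select_high_entropy_blocks(image: List[List[int]], block: int,
                               k: int) -> List[Tuple[int, int]]:
    """Split the image into non-overlapping block x block tiles and return the top-left
    corners of the k highest-entropy tiles, in descending order of entropy."""
    rows, cols = len(image), len(image[0])
    scored = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            tile = [image[i][j] for i in range(r, r + block) for j in range(c, c + block)]
            scored.append((shannon_entropy(tile), (r, c)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [corner for _, corner in scored[:k]]


# Hypothetical 4x4 image: the top-right tile is the most textured, the bottom-left next.
img = [[0, 0, 0, 1],
       [0, 0, 2, 3],
       [0, 1, 5, 5],
       [0, 1, 5, 5]]
print(select_high_entropy_blocks(img, block=2, k=2))  # [(0, 2), (2, 0)]
```

Textured, high-entropy regions tend to mask embedding changes better than flat regions, which is a common motivation for choosing them as embedding locations.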
