Advanced Research in Data-Centric AI

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 31 October 2024 | Viewed by 6841

Special Issue Editors

Dr. Kunpeng Liu

Guest Editor

Department of Computer Science, Portland State University, Portland, OR 97201, USA
Interests: feature engineering; data mining; reinforcement learning

Dr. Pengfei Wang

Guest Editor

Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
Interests: spatiotemporal graph network; point process

Dr. Yifeng Gao

Guest Editor

Department of Computer Science, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA
Interests: time series; spatial-temporal pattern mining

Dr. Yanjie Fu

Guest Editor

Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
Interests: data mining; big data analytics

Special Issue Information

Dear Colleagues,

The primary focus of machine learning is often on developing models to fit a particular dataset. However, in real-world scenarios, data can often be untidy, and refining models may not be the most effective way to enhance their performance. An alternative approach is to concentrate on improving the dataset itself rather than considering it as a fixed input. Data-Centric AI (DCAI) is an up-and-coming field that deals with techniques which systematically enhance datasets, often leading to notable improvements in practical machine learning applications.

The purpose of this Special Issue is to encourage the development of a dynamic, collaborative, and interdisciplinary community focused on addressing real-world data challenges through DCAI. These challenges span several areas, including data acquisition and creation, data labeling, data preprocessing and enhancement, data quality assessment, data debt, and data governance. As many of these domains are still emerging, we aim to bring together experts to define and shape the DCAI movement and, in turn, to significantly influence the future of AI and ML.

Dr. Kunpeng Liu
Dr. Pengfei Wang
Dr. Yifeng Gao
Dr. Yanjie Fu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

automated data science
data pre-processing
big data analytics
feature engineering
reinforcement learning
time series
graph mining
open-source datasets
cross-dataset mining
spatiotemporal data mining
statistical machine learning
bioinformatics
distribution shift

Published Papers (7 papers)

Research

17 pages, 2220 KiB

Open Access Article

Robust Bias Compensation Method for Sparse Normalized Quasi-Newton Least-Mean with Variable Mixing-Norm Adaptive Filtering

by Ying-Ren Chien, Han-En Hsieh and Guobing Qian

Mathematics 2024, 12(9), 1310; https://doi.org/10.3390/math12091310 - 25 Apr 2024

Viewed by 189

Abstract

Input noise causes inescapable bias to the weight vectors of the adaptive filters during the adaptation processes. Moreover, the impulse noise at the output of the unknown systems can prevent bias compensation from converging. This paper presents a robust bias compensation method for a sparse normalized quasi-Newton least-mean (BC-SNQNLM) adaptive filtering algorithm to address these issues. We have mathematically derived the biased-compensation terms in an impulse noisy environment. Inspired by the convex combination of adaptive filters’ step sizes, we propose a novel variable mixing-norm method, BC-SNQNLM-VMN, to accelerate the convergence of our BC-SNQNLM algorithm. Simulation results confirm that the proposed method significantly outperforms other comparative works regarding normalized mean-squared deviation (NMSD) in the steady state.

(This article belongs to the Special Issue Advanced Research in Data-Centric AI)

20 pages, 714 KiB

Open Access Article

TabFedSL: A Self-Supervised Approach to Labeling Tabular Data in Federated Learning Environments

by Ruixiao Wang, Yanxin Hu, Zhiyu Chen, Jianwei Guo and Gang Liu

Mathematics 2024, 12(8), 1158; https://doi.org/10.3390/math12081158 - 12 Apr 2024

Viewed by 349

Abstract

Currently, self-supervised learning has shown effectiveness in solving data labeling issues. Its success mainly depends on having access to large, high-quality datasets with diverse features. It also relies on utilizing the spatial, temporal, and semantic structures present in the data. However, domains such as finance, healthcare, and insurance primarily utilize tabular data formats. This presents challenges for traditional data augmentation methods aimed at improving data quality. Furthermore, the privacy-sensitive nature of these domains complicates the acquisition of the extensive, high-quality datasets necessary for training effective self-supervised models. To tackle these challenges, our proposal introduces a novel framework that combines self-supervised learning with Federated Learning (FL). This approach aims to solve the problem of data-distributed training while ensuring training quality. Our framework improves upon the conventional self-supervised learning data augmentation paradigm by incorporating data labeling through the segmentation of data into subsets. Our framework adds noise by splitting subsets of data and can achieve the same level of centralized learning in a distributed environment. Moreover, we conduct experiments on various public tabular datasets to evaluate our approach. The experimental results showcase the effectiveness and generalizability of our proposed method in scenarios involving unlabeled data and distributed settings.

(This article belongs to the Special Issue Advanced Research in Data-Centric AI)

24 pages, 14284 KiB

Open Access Article

Mask2Former with Improved Query for Semantic Segmentation in Remote-Sensing Images

by Shichen Guo, Qi Yang, Shiming Xiang, Shuwen Wang and Xuezhi Wang

Mathematics 2024, 12(5), 765; https://doi.org/10.3390/math12050765 - 04 Mar 2024

Viewed by 989

Abstract

Semantic segmentation of remote sensing (RS) images is vital in various practical applications, including urban construction planning, natural disaster monitoring, and land resources investigation. However, RS images are captured by airplanes or satellites at high altitudes and long distances, resulting in ground objects of the same category being scattered in various corners of the image. Moreover, objects of different sizes appear simultaneously in RS images. For example, some objects occupy a large area in urban scenes, while others only have small regions. Technically, the above two universal situations pose significant challenges to the segmentation with a high quality for RS images. Based on these observations, this paper proposes a Mask2Former with an improved query (IQ2Former) for this task. The fundamental motivation behind the IQ2Former is to enhance the capability of the query of Mask2Former by exploiting the characteristics of RS images well. First, we propose the Query Scenario Module (QSM), which aims to learn and group the queries from feature maps, allowing the selection of distinct scenarios such as the urban and rural areas, building clusters, and parking lots. Second, we design the query position module (QPM), which is developed to assign the image position information to each query without increasing the number of parameters, thereby enhancing the model’s sensitivity to small targets in complex scenarios. Finally, we propose the query attention module (QAM), which is constructed to leverage the characteristics of query attention to extract valuable features from the preceding queries. Being positioned between the duplicated transformer decoder layers, QAM ensures the comprehensive utilization of the supervisory information and the exploitation of those fine-grained details. Architecturally, the QSM, QPM, and QAM as well as an end-to-end model are assembled to achieve high-quality semantic segmentation. 
In comparison to the classical or state-of-the-art models (FCN, PSPNet, DeepLabV3+, OCRNet, UPerNet, MaskFormer, Mask2Former), IQ2Former has demonstrated exceptional performance across three publicly challenging remote-sensing image datasets, 83.59 mIoU on the Vaihingen dataset, 87.89 mIoU on Potsdam dataset, and 56.31 mIoU on LoveDA dataset. Additionally, overall accuracy, ablation experiment, and visualization segmentation results all indicate IQ2Former validity.

(This article belongs to the Special Issue Advanced Research in Data-Centric AI)

20 pages, 854 KiB

Open Access Article

A Community Detection and Graph-Neural-Network-Based Link Prediction Approach for Scientific Literature

by Chunjiang Liu, Yikun Han, Haiyun Xu, Shihan Yang, Kaidi Wang and Yongye Su

Mathematics 2024, 12(3), 369; https://doi.org/10.3390/math12030369 - 24 Jan 2024

Viewed by 1388

Abstract

This study presents a novel approach that synergizes community detection algorithms with various Graph Neural Network (GNN) models to bolster link prediction in scientific literature networks. By integrating the Louvain community detection algorithm into our GNN frameworks, we consistently enhanced the performance across all models tested. For example, integrating the Louvain model with the GAT model resulted in an AUC score increase from 0.777 to 0.823, exemplifying the typical improvements observed. Similar gains were noted when the Louvain model was paired with other GNN architectures, confirming the robustness and effectiveness of incorporating community-level insights. This consistent increase in performance—reflected in our extensive experimentation on bipartite graphs of scientific collaborations and citations—highlights the synergistic potential of combining community detection with GNNs to overcome common link prediction challenges such as scalability and resolution limits. Our findings advocate for the integration of community structures as a significant step forward in the predictive accuracy of network science models, offering a comprehensive understanding of scientific collaboration patterns through the lens of advanced machine learning techniques.

(This article belongs to the Special Issue Advanced Research in Data-Centric AI)

24 pages, 6195 KiB

Open Access Article

Enhanced Sea Horse Optimization Algorithm for Hyperparameter Optimization of Agricultural Image Recognition

by Zhuoshi Li, Shizheng Qu, Yinghang Xu, Xinwei Hao and Nan Lin

Mathematics 2024, 12(3), 368; https://doi.org/10.3390/math12030368 - 23 Jan 2024

Viewed by 714

Abstract

Deep learning technology has made significant progress in agricultural image recognition tasks, but the parameter adjustment of deep models usually requires a lot of manual intervention, which is time-consuming and inefficient. To solve this challenge, this paper proposes an adaptive parameter tuning strategy that combines sine–cosine algorithm with Tent chaotic mapping to enhance sea horse optimization, which improves the search ability and convergence stability of standard sea horse optimization algorithm (SHO). Through adaptive optimization, this paper determines the best parameter configuration in ResNet-50 neural network and optimizes the model performance. The improved ESHO algorithm shows superior optimization effects than other algorithms in various performance indicators. The improved model achieves 96.7% accuracy in the corn disease image recognition task, and 96.4% accuracy in the jade fungus image recognition task. These results show that ESHO can not only effectively improve the accuracy of agricultural image recognition, but also reduce the need for manual parameter adjustment.

(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
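
Editorial illustration (not the authors' code): the abstract above mentions Tent chaotic mapping as one ingredient of the enhanced optimizer. The sketch below shows one common way such a map is used, namely to spread an initial population of candidate hyperparameter values across a search range. The skewed tent map form, the breakpoint a = 0.7, the seed x0 = 0.37, and the function names are all assumptions made for this example; the paper may define the map and its role differently.

```python
def tent_map_sequence(x0, n, a=0.7):
    """Return n successive values of a skewed tent map on (0, 1).

    The breakpoint `a` and the seed `x0` are illustrative assumptions,
    not values taken from the paper.
    """
    if not 0.0 < x0 < 1.0 or not 0.0 < a < 1.0:
        raise ValueError("x0 and a must lie strictly between 0 and 1")
    values = []
    x = x0
    for _ in range(n):
        # Rising branch below the breakpoint, falling branch above it.
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        values.append(x)
    return values


def init_population(size, lower, upper, x0=0.37):
    """Scale chaotic values into [lower, upper] as candidate hyperparameter values."""
    span = upper - lower
    return [lower + span * v for v in tent_map_sequence(x0, size)]


if __name__ == "__main__":
    # Hypothetical search range for a learning rate.
    for candidate in init_population(5, 1e-4, 1e-1):
        print(f"{candidate:.5f}")
```

A skewed breakpoint is used here because the symmetric tent map (breakpoint 0.5) tends to collapse to zero in binary floating-point arithmetic after a few dozen iterations.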

25 pages, 5269 KiB

Open Access Article

Application of Artificial Intelligence Methods for Predicting the Compressive Strength of Green Concretes with Rice Husk Ash

by Miljan Kovačević, Marijana Hadzima-Nyarko, Ivanka Netinger Grubeša, Dorin Radu and Silva Lozančić

Mathematics 2024, 12(1), 66; https://doi.org/10.3390/math12010066 - 24 Dec 2023

Cited by 1 | Viewed by 1079

Abstract

To promote sustainable growth and minimize the greenhouse effect, rice husk fly ash can be used instead of a certain amount of cement. The research models the effects of using rice fly ash as a substitute for regular Portland cement on the compressive strength of concrete. In this study, different machine-learning techniques are investigated and a procedure to determine the optimal model is provided. A database of 909 analyzed samples forms the basis for creating forecast models. The derived models are assessed using the accuracy criteria RMSE, MAE, MAPE, and R. The research shows that artificial intelligence techniques can be used to model the compressive strength of concrete with acceptable accuracy. It is also possible to evaluate the importance of specific input variables and their influence on the strength of such concrete.

(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
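
Editorial illustration (not the authors' code): the four accuracy criteria named in the abstract above have standard definitions, sketched here in plain Python. It is assumed that R denotes the Pearson correlation coefficient between measured and predicted values and that MAPE is reported as a percentage; the function name and the sample strength values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE (in percent) and Pearson's R for paired values.

    MAPE divides by the true values, so every entry of y_true must be nonzero.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R": r}


if __name__ == "__main__":
    # Hypothetical compressive strengths (MPa): measured vs. predicted.
    measured = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 39.0]
    for name, value in regression_metrics(measured, predicted).items():
        print(f"{name}: {value:.4f}")
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses the error relative to the measured value, which is why the criteria are usually reported together.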

25 pages, 26532 KiB

Open Access Article

Statistical Image Watermark Algorithm for FAPHFMs Domain Based on BKF–Rayleigh Distribution

by Siyu Yang, Ansheng Deng and Hui Cui

Mathematics 2023, 11(23), 4720; https://doi.org/10.3390/math11234720 - 21 Nov 2023

Viewed by 965

Abstract

In the field of image watermarking, imperceptibility, robustness, and watermarking capacity are key indicators for evaluating the performance of watermarking techniques. However, these three factors are often mutually constrained, posing a challenge in achieving a balance among them. To address this issue, this paper presents a novel image watermark detection algorithm based on local fast and accurate polar harmonic Fourier moments (FAPHFMs) and the BKF–Rayleigh distribution model. Firstly, the original image is chunked without overlapping, the entropy value is calculated, the high-entropy chunks are selected in descending order, and the local FAPHFM magnitudes are calculated. Secondly, the watermarking signals are embedded into the robust local FAPHFM magnitudes by the multiplication function, and then MMLE based on the RSS method is utilized to estimate the statistical parameters of the BKF–Rayleigh distribution model. Finally, a blind image watermarking detector is designed using BKF–Rayleigh distribution and LO decision criteria. In addition, we derive the closed expression of the watermark detector using the BKF–Rayleigh model. The experiments proved that the algorithm in this paper outperforms the existing methods in terms of performance, maintains robustness well under a large watermarking capacity, and has excellent imperceptibility at the same time. The algorithm maintains a well-balanced relationship between robustness, imperceptibility, and watermarking capacity.

(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
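
Editorial illustration (not the authors' code): the first step described in the abstract above, splitting the image into non-overlapping chunks and ranking them by entropy, can be sketched as follows. Only the block selection is shown; the FAPHFM computation, embedding, and detection steps are not. The use of Shannon entropy over the pixel-value histogram, the function names, and the tiny sample image are assumptions made for this example.

```python
import math
from collections import Counter


def block_entropy(block):
    """Shannon entropy (in bits) of the pixel-value histogram of one block."""
    pixels = [p for row in block for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def top_entropy_blocks(image, block_size, k):
    """Split `image` (a list of rows) into non-overlapping square blocks and
    return the k highest-entropy blocks as (entropy, (row, col)) pairs,
    sorted in descending order of entropy. Partial edge blocks are skipped."""
    height, width = len(image), len(image[0])
    scored = []
    for r in range(0, height - block_size + 1, block_size):
        for c in range(0, width - block_size + 1, block_size):
            block = [row[c:c + block_size] for row in image[r:r + block_size]]
            scored.append((block_entropy(block), (r, c)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 4x4 grayscale image, divided into 2x2 blocks.
    image = [
        [0, 0, 1, 2],
        [0, 0, 1, 2],
        [1, 2, 5, 5],
        [3, 4, 5, 6],
    ]
    for entropy, position in top_entropy_blocks(image, 2, 2):
        print(position, round(entropy, 3))
```

Higher-entropy blocks contain more varied pixel values, so they are the candidates selected first under this ranking.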
