Big Data Engineering and Application

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (29 February 2024) | Viewed by 4467

Special Issue Editors

Associate Professor, School of Physics and Electronics, Central South University, Changsha 410017, China
Interests: big data, machine learning, data mining

Special Issue Information

Dear Colleagues,

We are inviting submissions to the Special Issue on Big Data Engineering and Application. 

Big data is at the forefront of current research and provides effective methods of data and information analysis for artificial intelligence and for intelligent decision-making in many fields. In this Special Issue, we invite submissions exploring cutting-edge research and recent advances in the fields of Big Data Engineering and Application. Both theoretical and experimental studies are welcome, as well as comprehensive reviews and survey papers.

The main topics include computing models, algorithms, frameworks, and related applications, as well as the optimization and application of machine learning theory in big data.

Dr. Linzi Yin
Dr. Zhiwen Chen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data and data mining
  • big data and machine learning
  • big data engineering and application
  • big data-driven decision modeling
  • high performance big data learning architecture, algorithm and system

Published Papers (3 papers)


Research

20 pages, 5161 KiB  
Article
Anomaly Detection and Identification Method for Shield Tunneling Based on Energy Consumption Perspective
by Min Hu, Fan Zhang and Huiming Wu
Appl. Sci. 2024, 14(5), 2202; https://doi.org/10.3390/app14052202 - 06 Mar 2024
Viewed by 377
Abstract
Various abnormal scenarios might occur during the shield tunneling process, which have an impact on construction efficiency and safety. Existing research on shield tunneling construction anomaly detection typically designs models based on the characteristics of a specific anomaly, so the scenarios of anomalies that can be detected are limited. Therefore, the research objective of this article is to establish an accurate anomaly detection model with generalization and identification capabilities on multiple types of abnormal scenarios. Inspired by energy dissipation theory, this paper innovatively detects various anomalies in the shield tunneling process from the perspective of energy consumption and designs the AD_SI model (Anomaly Detection and Scenario Identification model of shield tunneling) based on machine learning. The AD_SI model first monitors the shield machine’s energy consumption status based on the VAE-LSTM (Variational Autoencoder–Long Short-Term Memory) algorithm with a dynamic threshold, thereby detecting abnormal sections. Secondly, the AD_SI model uses the correlation of construction parameters to represent different known scenarios and further clarifies scenarios of the abnormal sections, thus achieving anomaly identification. The application of the AD_SI model in a shield tunneling construction project demonstrates its capability to accurately detect and identify different anomalies, with a recall value exceeding 0.9 and F1 exceeding 0.8, thereby providing guidance for accurately detecting multiple types of anomaly scenarios in practical applications.
(This article belongs to the Special Issue Big Data Engineering and Application)
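
As a rough illustration of what detection with a dynamic threshold can look like, the sketch below flags a reconstruction error when it exceeds a limit computed from recent errors. The abstract does not state the threshold rule used by AD_SI, so the rolling mean plus k standard deviations, the function name dynamic_threshold_flags, the window size, and the sample error values are all assumptions made for illustration; the VAE-LSTM model that would produce the errors is not shown.

```python
from statistics import mean, stdev

def dynamic_threshold_flags(errors, window=5, k=3.0):
    """Flag errors[i] when it exceeds mean + k * std of the previous `window` errors.

    This rolling rule is an assumed stand-in, not the rule from the paper.
    """
    flags = []
    for i, e in enumerate(errors):
        history = errors[max(0, i - window):i]
        if len(history) < 2:
            # Not enough history to estimate a spread yet.
            flags.append(False)
            continue
        limit = mean(history) + k * stdev(history)
        flags.append(e > limit)
    return flags

# Hypothetical reconstruction errors with one obvious spike at index 6.
errors = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.95, 0.10]
flags = dynamic_threshold_flags(errors)
```

Because the limit is recomputed from recent history, it adapts as the baseline error level drifts; a fixed global threshold would not.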

18 pages, 2614 KiB  
Article
A Fast Parallel Random Forest Algorithm Based on Spark
by Linzi Yin, Ken Chen, Zhaohui Jiang and Xuemei Xu
Appl. Sci. 2023, 13(10), 6121; https://doi.org/10.3390/app13106121 - 17 May 2023
Viewed by 1681
Abstract
To improve the computational efficiency and classification accuracy in the context of big data, an optimized parallel random forest algorithm is proposed based on the Spark computing framework. First, a new Gini coefficient is defined to reduce the impact of feature redundancy for higher classification accuracy. Next, to reduce the number of candidate split points and Gini coefficient calculations for continuous features, an approximate equal-frequency binning method is proposed to determine the optimal split points efficiently. Finally, based on the Apache Spark computing framework, the forest sampling index (FSI) table is defined to speed up the parallel training process of decision trees and reduce data communication overhead. Experimental results show that the proposed algorithm improves the efficiency of constructing random forests while ensuring classification accuracy, and is superior to Spark-MLRF in terms of performance and scalability.
(This article belongs to the Special Issue Big Data Engineering and Application)
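
The idea of limiting candidate split points through equal-frequency binning can be sketched in a few lines. The example below is a single-machine illustration under stated assumptions: it uses the standard Gini impurity rather than the paper's redefined Gini coefficient, exact rather than approximate bin boundaries, and no Spark or FSI table. The function names and the sample data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Standard Gini impurity of a list of class labels (not the paper's variant)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def equal_frequency_candidates(values, n_bins):
    """Candidate split points taken at equal-frequency bin boundaries."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = {ordered[(i * n) // n_bins] for i in range(1, n_bins)}
    return sorted(cuts)

def best_split(values, labels, n_bins=4):
    """Return (threshold, weighted_gini) minimizing weighted Gini over the candidates."""
    best = (None, float("inf"))
    n = len(values)
    for t in equal_frequency_candidates(values, n_bins):
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical continuous feature with a clean class boundary at 5.0.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, score = best_split(values, labels)
```

With n_bins candidates instead of one per distinct value, the number of Gini evaluations per continuous feature no longer grows with the number of samples, which is the saving the abstract refers to.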

28 pages, 888 KiB  
Article
Efficient False Positive Control Algorithms in Big Data Mining
by Xuze Liu, Yuhai Zhao, Tongze Xu, Fazal Wahab, Yiming Sun and Chen Chen
Appl. Sci. 2023, 13(8), 5006; https://doi.org/10.3390/app13085006 - 16 Apr 2023
Cited by 3 | Viewed by 1468
Abstract
The typical hypothesis testing issue in statistical analysis is determining whether a pattern is significantly associated with a specific class label. This usually leads to highly challenging multiple-hypothesis testing problems in big data mining scenarios, as millions or billions of hypothesis tests in large-scale exploratory data analysis can result in a large number of false positive results. The permutation testing-based FWER control method (PFWER) is theoretically effective in dealing with multiple hypothesis testing issues. In reality, however, this theoretical approach confronts a serious computational efficiency problem. It takes an extremely long time to compute an appropriate FWER false positive control threshold using PFWER, which is almost impossible to achieve in a reasonable amount of time using human effort on medium- or large-scale data. Although some methods for improving the efficiency of the FWER false positive control threshold calculation have been proposed, most of them are stand-alone, and there is still a lot of room for efficiency improvement. To address this problem, this paper proposes a distributed PFWER false-positive threshold calculation method for large-scale data. The computational effectiveness increases significantly when compared to the current approaches. The FP-growth algorithm is used first for pattern mining, and the mining process reduces the computation of invalid patterns by using pruning operations and index optimization for merging patterns with index transactions. The distributed computing technique is introduced on this basis, and the constructed FP tree is decomposed into a set of subtrees, each corresponding to a subtask. All subtrees (subtasks) are distributed to different computing nodes. Each node independently calculates the local significance threshold according to the designated subtasks. Finally, all local results are aggregated to compute the FWER false positive control threshold, which is completely consistent with the theoretical result. A series of experimental findings on 11 real-world datasets demonstrate that the distributed algorithm proposed in this paper can significantly improve the computation efficiency of PFWER while ensuring its theoretical accuracy.
(This article belongs to the Special Issue Big Data Engineering and Application)
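
To make the permutation-based FWER threshold concrete, the following is a minimal single-machine sketch of a generic min-p permutation procedure. It is not the distributed algorithm from the paper: there is no FP-growth mining, pruning, or subtree decomposition, and the patterns are simply given as lists of transaction indices. The one-sided Fisher exact test, the function names, and the sample data are assumptions chosen for illustration.

```python
import random
from bisect import bisect_right
from math import comb

def fisher_p(k, n_pattern, n_pos, n_total):
    """One-sided Fisher exact p-value: P(X >= k) for a hypergeometric X."""
    upper = min(n_pattern, n_pos)
    denom = comb(n_total, n_pattern)
    tail = sum(comb(n_pos, i) * comb(n_total - n_pos, n_pattern - i)
               for i in range(k, upper + 1))
    return tail / denom

def min_p(patterns, labels):
    """Smallest p-value over all patterns for one labelling."""
    n_total, n_pos = len(labels), sum(labels)
    best = 1.0
    for occurrences in patterns:  # indices of transactions containing the pattern
        k = sum(labels[i] for i in occurrences)
        best = min(best, fisher_p(k, len(occurrences), n_pos, n_total))
    return best

def pfwer_threshold(patterns, labels, alpha=0.05, n_perm=1000, seed=0):
    """Largest delta such that at most alpha * n_perm permutations have min-p <= delta."""
    rng = random.Random(seed)
    shuffled = list(labels)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real association between patterns and labels
        min_ps.append(min_p(patterns, shuffled))
    min_ps.sort()
    budget = int(alpha * n_perm)
    threshold = 0.0
    for p in sorted(set(min_ps)):
        if bisect_right(min_ps, p) <= budget:
            threshold = p
        else:
            break
    return threshold

# Hypothetical data: 20 transactions (10 positive), three patterns.
labels = [1] * 10 + [0] * 10
patterns = [list(range(0, 8)), list(range(5, 15)), [0, 3, 12, 17]]
delta = pfwer_threshold(patterns, labels, alpha=0.05, n_perm=200)
```

Patterns whose p-value on the real labels is at most delta would be reported as significant. Every permutation re-tests every pattern, which is the cost that motivates distributing the computation.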