Recent Advances in Big Data Analytics

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 36165

Special Issue Editor


Guest Editor
Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 611731, China
Interests: data science and complexity science, including link prediction, computational socioeconomics, human dynamics, and epidemic dynamics

Special Issue Information

Dear Colleagues,

The increasing availability of data sources and analysis tools has sharply changed the traditional methodologies of the natural and social sciences. All branches of science are being transformed by data-intensive methodologies, and thus, so-called “big data analytics” has become a core issue for almost every researcher. Reflecting this trend, the aim of this Special Issue is to collect original research papers or surveys on the following four topics: (i) fundamental theoretical analyses, such as the predictability of a system, the minimum error of a classifier, and the reliability of a certain data mining approach; (ii) novel methods, such as methods to uncover hidden causal relationships, to learn multimodal data, and to analyze private data; (iii) the launch of significant data sets with extensive attention, advanced platforms for data analytics, and important analytical tools for some specific problems; (iv) the applications of big data analytics in all disciplines.

Prof. Dr. Tao Zhou
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • data mining
  • artificial intelligence
  • machine learning
  • causal inference
  • predictability
  • minimum error
  • multimodal learning
  • private data analysis

Published Papers (21 papers)


Research

22 pages, 2656 KiB  
Article
Research on the Quality Evaluation Method of Mobile Emergency Big Data Based on the Measure of Medium Truth Degree
by Jianxun Li, Qing Li, Haoxin Fu and Kin Keung Lai
Appl. Sci. 2023, 13(16), 9072; https://doi.org/10.3390/app13169072 - 08 Aug 2023
Viewed by 738
Abstract
Mobile emergency services are better able to meet the needs of frequent public emergencies; however, their data quality problems seriously affect decision-making. In order to reduce the interference of low-quality data and solve the problem of data quality ambiguity, this paper first summarizes the five characteristics of mobile emergency big data. Second, based on the characteristics of mobile emergency big data, four data quality dimensions are defined with reference to existing research and national standards and combined with the measure of medium truth degree to give single-dimension and multi-dimension data quality truth degree measure models. Finally, a subjective-objective, qualitative-quantitative mobile emergency big data quality evaluation method based on the measure of medium truth degree is formed. The validity and practicality of the method are also verified by examples of algorithmic analysis of fire text datasets from Australian mountain fire data and the Chinese Emergency Incident Corpus. The experiments show that the method can realize quantitative mobile emergency big data quality assessment, solve the problem of data quality ambiguity, and reduce the interference of low-quality data, so as to save resources and improve the analysis and decision-making ability. Full article
(This article belongs to the Special Issue Recent Advances in Big Data Analytics)

20 pages, 3459 KiB  
Article
A Network Analysis Approach to Detecting Social Issues with Web-Based Data
by Seunghyun Lee, Jiho Lee, Jae-Min Lee, Hong-Woo Chun and Janghyeok Yoon
Appl. Sci. 2023, 13(14), 8516; https://doi.org/10.3390/app13148516 - 23 Jul 2023
Viewed by 1505
Abstract
Social issues are topics that arise and attract increasing attention in various areas of society. Because issues evolve over time, detecting them requires monitoring the stories formed by members of society over time. Various studies on issue detection have been conducted, but they need to be supplemented in two respects: identifying the time at which issues occurred and prioritizing issues by urgency. As a remedy, this study proposes a new approach to detecting social issues from web-based data through network analysis. Since the stories that form social issues are composed of various keywords and topics, this study detects social issues by monitoring keyword co-occurrence networks constructed from web-based data. Specifically, the approach uses network structure entropy to identify the time period in which social issues occur. Next, a community detection algorithm is used to extract social issue candidates in the identified time period. Finally, social issues are detected by ranking the candidates by the centrality index of the keywords that constitute them. This study detected South Korean social issue topics that attract people’s attention among the various topics of society. The proposed approach contributes to the existing literature by quantitatively identifying when social issues occurred, based on the characteristics of issues. In addition, since the proposed approach detects urgent issues that should be dealt with as a priority, it can support timely responses to social issues.

13 pages, 770 KiB  
Article
Predicting Critical Nodes in Temporal Networks by Dynamic Graph Convolutional Networks
by Enyu Yu, Yan Fu, Junlin Zhou, Hongliang Sun and Duanbing Chen
Appl. Sci. 2023, 13(12), 7272; https://doi.org/10.3390/app13127272 - 18 Jun 2023
Cited by 1 | Viewed by 1881
Abstract
Many real-world systems can be expressed in temporal networks with nodes playing different roles in structure and function, and edges representing the relationships between nodes. Identifying critical nodes can help us control the spread of public opinions or epidemics, predict leading figures in academia, conduct advertisements for various commodities and so on. However, it is rather difficult to identify critical nodes, because the network structure changes over time in temporal networks. In this paper, considering the sequence topological information of temporal networks, a novel and effective learning framework based on the combination of special graph convolutional and long short-term memory network (LSTM) is proposed to identify nodes with the best spreading ability. The special graph convolutional network can embed nodes in each sequential weighted snapshot and LSTM is used to predict the future importance of timing-embedded features. The effectiveness of the approach is evaluated by a weighted Susceptible-Infected-Recovered model. Experimental results on four real-world temporal networks demonstrate that the proposed method outperforms both traditional and deep learning benchmark methods in terms of the Kendall τ coefficient and top k hit rate. Full article
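The two evaluation measures named in this abstract, the Kendall τ coefficient and the top k hit rate, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the node labels, and the scores are hypothetical, and the τ variant shown is the simple tau-a form, in which tied pairs count as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two equal-length score lists.

    Tied pairs contribute to neither the concordant nor the discordant count.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def top_k_hit_rate(predicted, actual, k):
    """Fraction of the actual top-k nodes that also appear in the predicted top-k."""
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return len(top(predicted) & top(actual)) / k


# Hypothetical data: model importance scores and simulated spreading sizes.
predicted = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.1}
simulated = {"a": 30, "b": 12, "c": 18, "d": 3}

nodes = sorted(predicted)
tau = kendall_tau([predicted[n] for n in nodes], [simulated[n] for n in nodes])
hit = top_k_hit_rate(predicted, simulated, k=2)
print(f"Kendall tau = {tau:.3f}, top-2 hit rate = {hit:.2f}")
```

In this toy example only the pair (b, c) is ranked in the opposite order by the two score lists, so five of the six pairs are concordant.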

12 pages, 2016 KiB  
Article
VLSM-Net: A Fusion Architecture for CT Image Segmentation
by Yachun Gao, Jia Guo, Chuanji Fu, Yan Wang and Shimin Cai
Appl. Sci. 2023, 13(7), 4384; https://doi.org/10.3390/app13074384 - 30 Mar 2023
Cited by 2 | Viewed by 1313
Abstract
Region of interest (ROI) segmentation is a key step in computer-aided diagnosis (CAD). With the problems of blurred tissue edges and imprecise boundaries of ROI in medical images, it is hard to extract satisfactory ROIs from medical images. In order to overcome the shortcomings in segmentation from the V-Net model or the level set method (LSM), we propose in this paper a new image segmentation method, the VLSM-Net model, combining these two methods. Specifically, we first use the V-Net model to segment the ROIs, and set the segmentation result as the initial contour. It is then fed through the hybrid LSM for further fine segmentation. That is, the complete segmentation of the V-Net model can be obtained by successively combining the V-Net model and the hybrid LSM. The experimental results conducted in the public datasets LiTS and LUNA show that, compared with the V-Net model or LSM alone, our VLSM-Net model greatly improves the sensitivity, precision and dice coefficient values (DCV) in 3D image segmentation, thus validating our model’s effectiveness. Full article
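For readers unfamiliar with the three segmentation measures mentioned here, the following Python sketch shows how sensitivity, precision, and the Dice coefficient are commonly computed from a predicted and a reference binary mask. It is an illustration under stated assumptions, not the paper's evaluation code: the function name and the toy masks are hypothetical, and the masks are flattened to one-dimensional lists for simplicity.

```python
def segmentation_scores(pred, truth):
    """Return (sensitivity, precision, dice) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # false negatives
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Dice = 2|A ∩ B| / (|A| + |B|); two empty masks are treated as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return sensitivity, precision, dice


# Hypothetical flattened masks (1 = voxel inside the ROI).
truth = [0, 1, 1, 1, 0, 0, 1, 0]
pred = [0, 1, 1, 0, 1, 0, 1, 0]

sens, prec, dice = segmentation_scores(pred, truth)
print(f"sensitivity={sens:.2f} precision={prec:.2f} dice={dice:.2f}")
```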

14 pages, 549 KiB  
Article
A Joint Domain-Specific Pre-Training Method Based on Data Enhancement
by Yi Gan, Gaoyong Lu, Zhihui Su, Lei Wang, Junlin Zhou, Jiawei Jiang and Duanbing Chen
Appl. Sci. 2023, 13(7), 4115; https://doi.org/10.3390/app13074115 - 23 Mar 2023
Cited by 1 | Viewed by 3046
Abstract
State-of-the-art performances for natural language processing tasks are achieved by supervised learning, specifically, by fine-tuning pre-trained language models such as BERT (Bidirectional Encoder Representation from Transformers). With increasingly accurate models, the size of the fine-tuned pre-training corpus is becoming larger and larger. However, very few studies have explored the selection of pre-training corpus. Therefore, this paper proposes a data enhancement-based domain pre-training method. At first, a pre-training task and a downstream fine-tuning task are jointly trained to alleviate the catastrophic forgetting problem generated by existing classical pre-training methods. Then, based on the hard-to-classify texts identified from downstream tasks’ feedback, the pre-training corpus can be reconstructed by selecting the similar texts from it. The learning of the reconstructed pre-training corpus can deepen the model’s understanding of undeterminable text expressions, thus enhancing the model’s feature extraction ability for domain texts. Without any pre-processing of the pre-training corpus, the experiments are conducted for two tasks, named entity recognition (NER) and text classification (CLS). The results show that learning the domain corpus selected by the proposed method can supplement the model’s understanding of domain-specific information and improve the performance of the basic pre-training model to achieve the best results compared with other benchmark methods. Full article

14 pages, 479 KiB  
Article
A Constrained Louvain Algorithm with a Novel Modularity
by Bibao Yao, Junfang Zhu, Peijie Ma, Kun Gao and Xuezao Ren
Appl. Sci. 2023, 13(6), 4045; https://doi.org/10.3390/app13064045 - 22 Mar 2023
Cited by 2 | Viewed by 1752
Abstract
Community detection is a significant and challenging task in network research. Nowadays, many community detection methods have been developed. Among them, the classical Louvain algorithm is an excellent method aiming at optimizing an objective function. In this paper, we propose a modularity function F2 as a new objective function. Our modularity function F2 overcomes certain disadvantages of the modularity functions raised in previous literature, such as the resolution limit problem. It is desired as a competitive objective function. Then, the constrained Louvain algorithm is proposed by adding some constraints to the classical Louvain algorithm. Finally, through the comparison, we have found that the constrained Louvain algorithm with F2 is better than the constrained Louvain algorithm with other objective functions on most considered networks. Moreover, the constrained Louvain algorithm with F2 is superior to the classical Louvain algorithm and the Newman’s fast method. Full article
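As background for the objective functions discussed in this abstract, the sketch below computes the classical Newman modularity Q of a given partition, which is the quantity the classical Louvain algorithm optimizes. It does not implement the paper's F2 function, whose definition is not given in the abstract. The function name and the example graph are hypothetical, and the graph is assumed to be simple, undirected, and unweighted.

```python
def modularity(edges, community):
    """Classical Newman modularity Q for an undirected, unweighted simple graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = {}    # community -> number of internal edges
    degree_sum = {}  # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
print(f"Q(two communities) = {q_split:.4f}, Q(one community) = {q_merged:.4f}")
```

Splitting the graph at the bridge gives a higher Q than placing all nodes in one community, which is the kind of comparison a modularity-optimizing heuristic makes at each step.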

12 pages, 4604 KiB  
Article
Prediction of Air Quality Combining Wavelet Transform, DCCA Correlation Analysis and LSTM Model
by Zheng Zhang, Haibo Chen and Xiaoli Huang
Appl. Sci. 2023, 13(5), 2796; https://doi.org/10.3390/app13052796 - 22 Feb 2023
Cited by 2 | Viewed by 1377
Abstract
In the context of global climate change, air quality prediction has a substantial impact on humans’ daily lives. The extensive use of machine learning models for air quality forecasting has resulted in significant improvements to the sector. The long short-term memory network is a deep learning prediction model, which adds a forgetting layer to a recurrent neural network and has several applications in air quality prediction. The experimental data presented in this research include air pollution data (SO2, NO2, PM10, PM2.5, O3, and CO) and meteorological data (temperature, barometric pressure, humidity, and wind speed). First, the air pollution data are used to calculate the air quality index (AQI), and the wavelet transform with the adaptive Stein risk estimation threshold is used to enhance the quality of the meteorological data. Using detrended cross-correlation analysis (DCCA), the mutual association between pollution elements and meteorological elements is then quantified. On short, medium, and long scales, the prediction model’s accuracy increases by 1%, 1.6%, 2%, and 5% for window sizes (h) of 24, 48, 168, and 5000, and the efficiency increases by 5.72%, 8.64%, 8.29%, and 3.42%, respectively. The model developed in this paper yields a substantial improvement, and its application to air quality forecasting is of great practical significance.

15 pages, 4069 KiB  
Article
Student Behavior Prediction of Mental Health Based on Two-Stream Informer Network
by Jieming Xu, Xuefeng Ding, Hanyu Ke, Cong Xu and Hanlun Zhang
Appl. Sci. 2023, 13(4), 2371; https://doi.org/10.3390/app13042371 - 12 Feb 2023
Cited by 2 | Viewed by 1210
Abstract
Students’ mental health has always been the focus of social attention, and mental health prediction can be regarded as a time-series classification task. In this paper, an informer network based on a two-stream structure (TSIN) is proposed to calculate the interdependence between students’ behaviors and the trend of time cycle, and the intermediate features are integrated layer by layer to realize the prediction of mental health by a gating mechanism. Through experiments on a real campus environment dataset (STU) and an open dataset (MTS), it is verified that the proposed algorithm can obtain higher accuracy than existing methods. Full article

17 pages, 42072 KiB  
Article
Design and Application of Intelligent Transportation Multi-Source Data Collaboration Framework Based on Digital Twins
by Xihou Zhang, Dingding Han, Xiaobo Zhang and Leheng Fang
Appl. Sci. 2023, 13(3), 1923; https://doi.org/10.3390/app13031923 - 02 Feb 2023
Cited by 6 | Viewed by 2512
Abstract
The increasing urban traffic problems have made the transportation system require a large amount of data. Aiming at the current problems of data types redundancy and low coordination rate of intelligent transportation systems (ITS), this paper proposes an improved digital twin architecture applicable to ITS. Based on the improved digital twin architecture, a framework for dynamic and static data collaboration in ITS is constructed. For various collaboration methods, this paper specifically describes the collaboration methods and scopes, and designs the framework and interfaces for data mapping. Finally, the effectiveness of the framework is verified by case studies to mine the spatiotemporal distribution characteristics of data, capture human travel characteristics, and visualize intersections using digital twins. This paper provides a new data fusion idea for digital twin systems in ITS, and the framework covers all data types in digital twin systems for cross-integration analysis. Full article

32 pages, 7263 KiB  
Article
Automatic Detection of Multilevel Communities: Scalable, Selective and Resolution-Limit-Free
by Kun Gao, Xuezao Ren, Lei Zhou and Junfang Zhu
Appl. Sci. 2023, 13(3), 1774; https://doi.org/10.3390/app13031774 - 30 Jan 2023
Cited by 2 | Viewed by 1028
Abstract
Community structure is one of the most important features of complex networks. Modularity-based methods for community detection typically rely on heuristic algorithms to optimize a specific community quality function. Such methods have two major limits: (1) the resolution limit problem, which prohibits communities of heterogeneous sizes being simultaneously detected, and (2) divergent outputs of the heuristic algorithm, which make it difficult to differentiate relevant and irrelevant results. In this paper, we propose an improved method for community detection based on a scalable community “fitness function.” We introduce a new parameter to enhance its scalability, and a strict strategy to filter the outputs. Due to the scalability, on the one hand, our method is free of the resolution limit problem and performs excellently on large heterogeneous networks, while on the other hand, it is capable of detecting more levels of communities than previous methods in deep hierarchical networks. Moreover, our strict strategy automatically removes redundant and irrelevant results; it selectively but inartificially outputs only the best and unique community structures, which turn out to be largely interpretable by the a priori knowledge of the network, including the implanted community structures within synthetic networks, or metadata observed for real-world networks. Full article

17 pages, 399 KiB  
Article
Improving Domain-Generalized Few-Shot Text Classification with Multi-Level Distributional Signatures
by Xuyang Wang, Yajun Du, Danroujing Chen, Xianyong Li, Xiaoliang Chen, Yongquan Fan, Chunzhi Xie, Yanli Li and Jia Liu
Appl. Sci. 2023, 13(2), 1202; https://doi.org/10.3390/app13021202 - 16 Jan 2023
Cited by 1 | Viewed by 1506
Abstract
Domain-generalized few-shot text classification (DG-FSTC) is a new setting for few-shot text classification (FSTC). In DG-FSTC, the model is meta-trained on a multi-domain dataset, and meta-tested on unseen datasets with different domains. However, previous methods mostly construct semantic representations by learning from words directly, which is limited in domain adaptability. In this study, we enhance the domain adaptability of the model by utilizing the distributional signatures of texts that indicate domain-related features in specific domains. We propose a Multi-level Distributional Signatures based model, namely MultiDS. Firstly, inspired by pretrained language models, we compute distributional signatures from an extra large news corpus, and we denote these as domain-agnostic features. Then we calculate the distributional signatures from texts in the same domain and texts from the same class, respectively. These two kinds of information are regarded as domain-specific and class-specific features, respectively. After that, we fuse and translate these three distributional signatures into word-level attention values, which enables the model to capture informative features as domain changes. In addition, we utilize domain-specific distributional signatures for the calibration of feature representations in specific domains. The calibration vectors produced by the domain-specific distributional signatures and word embeddings help the model adapt to various domains. Extensive experiments are performed on four benchmarks. The results demonstrate that our proposed method beats the state-of-the-art method with an average improvement of 1.41% on four datasets. Compared with five competitive baselines, our method achieves the best average performance. The ablation studies prove the effectiveness of each proposed module. Full article

14 pages, 1437 KiB  
Article
Spatiotemporal Patterns of Risk Propagation in Complex Financial Networks
by Tingting Chen, Yan Li, Xiongfei Jiang and Lingjie Shao
Appl. Sci. 2023, 13(2), 1129; https://doi.org/10.3390/app13021129 - 14 Jan 2023
Cited by 3 | Viewed by 1387
Abstract
The methods of complex networks have been extensively used to characterize information flow in complex systems, such as risk propagation in complex financial networks. However, network dynamics are ignored in most cases despite systems with similar topological structures exhibiting profoundly different dynamic behaviors. To observe the spatiotemporal patterns of risk propagation in complex financial networks, we combined a dynamic model with empirical networks. Our analysis revealed that hub nodes play a dominant role in risk propagation across the network and respond rapidly, thus exhibiting a degree-driven effect. The influence of key dynamic parameters, i.e., infection rate and recovery rate, was also investigated. Furthermore, the impacts of two typical characteristics of complex financial systems—the existence of community structures and frequent large fluctuations—on the spatiotemporal patterns of risk propagation were explored. About 30% of the total risk propagation flow of each community can be explained by the top 10% nodes. Thus, we can control the risk propagation flow of each community by controlling a few influential nodes in the community and, in turn, control the whole network. In extreme market states, hub nodes become more dominant, indicating better risk control. Full article

12 pages, 355 KiB  
Article
A Cardiovascular Disease Risk Score Model Based on High Contribution Characteristics
by Mengxiao Peng, Fan Hou, Zhixiang Cheng, Tongtong Shen, Kaixian Liu, Cai Zhao and Wen Zheng
Appl. Sci. 2023, 13(2), 893; https://doi.org/10.3390/app13020893 - 09 Jan 2023
Cited by 3 | Viewed by 1714
Abstract
Cardiovascular disease (CVD) risk prediction shows great significance for disease diagnosis and treatment, especially early intervention for CVD, which has a direct impact on preventing and reducing adverse outcomes. In this paper, we collected clinical indicators and outcomes of 14,832 patients with cardiovascular disease in Shanxi, China, and proposed a cardiovascular disease risk prediction model, XGBH, based on key contributing characteristics to perform risk scoring of patients’ clinical outcomes. The XGBH risk prediction model had high accuracy, with a significant improvement compared to the baseline risk score (AUC = 0.80 vs. AUC = 0.65). At the same time, we found that with the addition of conventional biometric variables, the accuracy of the model’s CVD risk prediction would also be improved. Finally, we designed a simpler model to quantify disease risk based on only three questions answered by the patient, with only a modest reduction in accuracy (AUC = 0.79), and providing a valid risk assessment for CVD. Overall, our models may allow early-stage intervention in high-risk patients, as well as a cost-effective screening approach. Further prospective studies and studies in other populations are needed to assess the actual clinical effect of XGBH risk prediction models. Full article
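The AUC values quoted in this abstract can be read as the probability that a randomly chosen patient with the adverse outcome receives a higher risk score than a randomly chosen patient without it. The Python sketch below computes AUC directly from that definition. It is only an illustration: the function name, labels, and scores are hypothetical and unrelated to the XGBH model or its data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = adverse event) and predicted risk scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.4]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations; rank-based formulas give the same value more efficiently on large cohorts.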

18 pages, 5337 KiB  
Article
Balanced Loss Function for Accurate Surface Defect Segmentation
by Zhouyang Xie, Chang Shu, Yan Fu, Junlin Zhou and Duanbing Chen
Appl. Sci. 2023, 13(2), 826; https://doi.org/10.3390/app13020826 - 06 Jan 2023
Cited by 3 | Viewed by 2102
Abstract
The accurate image segmentation of surface defects is challenging for modern convolutional neural networks (CNN)-based segmentation models. This paper identifies that loss imbalance is a critical problem in segmentation accuracy improvement. The loss imbalance problem includes: label imbalance, which impairs the accuracy on less represented classes; easy–hard example imbalance, which misleads the focus of optimization on less valuable examples; and boundary imbalance, which involves an unusually large loss value at the defect boundary caused by label confusion. In this paper, a novel balanced loss function is proposed to address the loss imbalance problem. The balanced loss function includes dynamical class weighting, truncated cross-entropy loss and label confusion suppression to solve the three types of loss imbalance, respectively. Extensive experiments are performed on surface defect benchmarks and various CNN segmentation models in comparison with other commonly used loss functions. The balanced loss function outperforms the counterparts and brings accuracy improvement from 5% to 30%. Full article

18 pages, 3419 KiB  
Article
Multi-View Multi-Attention Graph Neural Network for Traffic Flow Forecasting
by Fei Wu, Changjiang Zheng, Chen Zhang, Junze Ma and Kai Sun
Appl. Sci. 2023, 13(2), 711; https://doi.org/10.3390/app13020711 - 04 Jan 2023
Cited by 4 | Viewed by 1615
Abstract
The key to intelligent traffic control and guidance lies in accurate prediction of traffic flow. Because traffic flow data are nonlinear, complex, and dynamic, graph neural network techniques are employed to address these challenges. We propose a deep-learning architecture called AMGC-AT and evaluate it on a real passenger flow dataset of the Hangzhou metro. Based on a priori knowledge, we set up multi-view graphs to express the static feature similarity of each station in the metro network, such as geographic location and zone function. These graphs are input to the multi-graph neural network, which extracts and aggregates features to capture the complex spatial dependence of each station’s passenger flow. Furthermore, based on periodic features of historical traffic flows, we categorize the flow data into three time patterns. Specifically, we propose two different self-attention mechanisms to fuse high-order spatiotemporal features of traffic flow. The final step is to integrate the two modules and obtain the output results using a gated convolution and a fully connected neural network. The experimental results show that the proposed model performs better than eight other baseline models at 10 min, 15 min and 30 min time intervals.
(This article belongs to the Special Issue Recent Advances in Big Data Analytics)

11 pages, 1872 KiB  
Article
Prediction of Suzhou’s Industrial Power Consumption Based on Grey Model with Seasonal Index Adjustment
by Huimin Chen, Xiaoyan Sun and Mei Li
Appl. Sci. 2022, 12(24), 12669; https://doi.org/10.3390/app122412669 - 10 Dec 2022
Cited by 1 | Viewed by 890
Abstract
The accurate prediction of industrial power consumption is conducive to the effective allocation of power resources by power and energy institutions, and it is also of great significance for the construction and planning of the national grid. By analyzing the characteristics of Suzhou’s industrial power consumption data between 2003 and 2005, this paper proposes a grey model with a seasonal index adjustment to predict industrial power consumption. The model results are compared with those of the traditional grey model, as well as with the real values of Suzhou’s industrial power consumption, which shows that our model is more suitable for the prediction of industrial power consumption. The latest data on Suzhou’s industrial power consumption, from 2019 to 2021, are also investigated, and the predictions are in very good agreement with the real data. The highlights of the paper are that all precision inspection indexes are excellent and that the present model reflects the seasonal fluctuations in the data.
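As a reading aid, here is a minimal sketch of a standard GM(1,1) grey model combined with a simple seasonal index. The abstract does not specify how the seasonal adjustment is computed, so the season-mean-over-overall-mean index and the deseasonalise, fit, reseasonalise pipeline below are assumptions. The function names are hypothetical.

```python
import math

def gm11_forecast(x0, steps):
    """Fit a textbook GM(1,1) model to x0 and forecast `steps` further values."""
    n = len(x0)
    if n < 3:
        raise ValueError("GM(1,1) needs at least three observations")
    # Accumulated generating operation (1-AGO): running sums of the series.
    x1, running = [], 0.0
    for v in x0:
        running += v
        x1.append(running)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Least-squares fit of the grey equation x0[k] = -a * z[k] + b.
    u = [-zk for zk in z]
    mu, my = sum(u) / len(u), sum(y) / len(y)
    a = sum((ui - mu) * (yi - my) for ui, yi in zip(u, y)) / sum(
        (ui - mu) ** 2 for ui in u
    )
    b = my - a * mu
    if abs(a) < 1e-9:
        return [b] * steps  # essentially flat series: forecast the level

    def x1_hat(k):
        # Time response of the whitened equation, k counted from 0.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Difference the accumulated forecast to return to the original scale.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]

def seasonal_indices(x0, period):
    """Assumed index: mean of each season divided by the overall mean."""
    overall = sum(x0) / len(x0)
    return [sum(x0[j::period]) / len(x0[j::period]) / overall for j in range(period)]

def seasonal_gm11_forecast(x0, period, steps):
    """Deseasonalise, fit GM(1,1) to the trend, then restore seasonality."""
    idx = seasonal_indices(x0, period)
    trend = gm11_forecast([v / idx[k % period] for k, v in enumerate(x0)], steps)
    n = len(x0)
    return [t * idx[(n + i) % period] for i, t in enumerate(trend)]
```

On a purely geometric series the GM(1,1) fit is nearly exact, which makes it a convenient sanity check before applying the seasonal variant to quarterly or monthly consumption data.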
(This article belongs to the Special Issue Recent Advances in Big Data Analytics)

14 pages, 1691 KiB  
Article
A Seismic Phase Recognition Algorithm Based on Time Convolution Networks
by Zhenhua Han, Yu Li, Kai Guo, Gang Li, Wen Zheng and Hongfu Liu
Appl. Sci. 2022, 12(19), 9547; https://doi.org/10.3390/app12199547 - 23 Sep 2022
Cited by 1 | Viewed by 1394
Abstract
Over recent years, frequent earthquakes have caused huge losses in human life and property. Rapid and automatic earthquake detection plays an important role in earthquake warning systems and earthquake operation mechanism research. Temporal convolution networks (TCNs) are frameworks that use expansion (dilated) convolution, which gives them large temporal receptive fields and lets them adapt to time series data. Given the excellent performance of temporal convolution networks on time series data, this paper proposes a deep learning framework based on the temporal convolution network model, which can be used to detect seismic phases and obtain their accurate start times. In addition, a convolutional neural network (CNN) was added to the temporal convolution network model to automatically extract the deep features of seismic waves, and the expansion convolution of each level was added to optimize its structure, which not only reduced the experimental parameters but also produced high-precision seismic phase detection results. Finally, the model was compared to the TCN, CNN-LSTM, SELD-TCN and the traditional AR-AIC methods. Our experimental results showed that the proposed S-TCN method demonstrated great advantages in the accuracy and performance of seismic phase detection.
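To illustrate why stacked convolutions of this kind see far back in time, the sketch below implements a one-dimensional causal dilated convolution. It assumes that the "expansion convolution" of the abstract corresponds to what is usually called dilated convolution. The function names and the toy kernel are hypothetical and do not come from the S-TCN model.

```python
def dilated_causal_conv(x, kernel, dilation):
    """Compute y[t] = sum_i kernel[i] * x[t - i * dilation].

    Samples before the start of the series are treated as zero, so the
    output at time t never depends on future inputs (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation
            if j >= 0:
                acc += w * x[j]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of input steps visible to one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

# A kernel of size 2 with dilation 2 adds each sample to the one two steps back.
y = dilated_causal_conv([1, 2, 3, 4, 5], [1.0, 1.0], dilation=2)
# Doubling the dilation per layer grows the receptive field exponentially.
rf = receptive_field(2, [1, 2, 4, 8])
```

With kernel size 2 and dilations 1, 2, 4 and 8, four layers already cover 16 time steps, whereas four undilated layers would cover only 5.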
(This article belongs to the Special Issue Recent Advances in Big Data Analytics)

17 pages, 3343 KiB  
Article
XLNet-Based Prediction Model for CVSS Metric Values
by Fan Shi, Shaofeng Kai, Jinghua Zheng and Yao Zhong
Appl. Sci. 2022, 12(18), 8983; https://doi.org/10.3390/app12188983 - 07 Sep 2022
Cited by 3 | Viewed by 1731
Abstract
A plethora of software vulnerabilities are exposed daily, posing a severe threat to the Internet. It is almost impossible for security experts or software developers to deal with all vulnerabilities. Therefore, it is imperative to rapidly assess the severity of each vulnerability in order to decide which ones should be given priority. CVSS is now the industry’s de facto evaluation standard; it uses a quantitative formula to measure the severity of a vulnerability. The CVSS formula consists of several metrics related to the vulnerability’s features. Security experts need to determine the value of each metric, which is tedious and time-consuming and therefore hinders the efficiency of severity assessment. To address this problem, we propose a method based on a pre-trained model for the prediction of CVSS metric values. More specifically, this method utilizes the XLNet model, fine-tuned with a self-built corpus, to predict the metric values from the vulnerability description text, thus reducing the burden of the assessment procedure. To verify the performance of our method, we compare the XLNet model with other pre-trained models and conventional machine learning techniques. The experimental results show that the method outperforms these models on evaluation metrics, reaching state-of-the-art performance levels.
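To show how predicted metric values turn into a severity score, the sketch below computes a CVSS base score from a vector string. The abstract does not state which CVSS version the paper targets, so the use of the version 3.1 base-score equations and weights here is an assumption. The function names are illustrative.

```python
import math

# Numeric weights of the CVSS v3.1 base metrics (version assumed).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}

def roundup(value):
    """Round up to one decimal place, avoiding floating-point artefacts."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score(vector):
    """Compute the base score for a vector such as 'AV:N/AC:L/.../A:H'."""
    m = dict(part.split(":") for part in vector.split("/"))
    scope = m["S"]
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10.0))

score = base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
```

Each of the eight base metrics takes one of a handful of categorical values, which is why predicting them from description text can be framed as a set of small classification tasks.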
(This article belongs to the Special Issue Recent Advances in Big Data Analytics)

18 pages, 3565 KiB  
Article
Rethinking Academic Conferences in the Age of Pandemic
by Qing Cai, Zhanwei Du, Ye Wu and Xiaoke Xu
Appl. Sci. 2022, 12(16), 8351; https://doi.org/10.3390/app12168351 - 21 Aug 2022
Cited by 1 | Viewed by 1366
Abstract
The year 2020 witnessed the havoc wreaked by the coronavirus disease COVID-19 following its onset in late 2019. The COVID-19 pandemic is the cruelest public health crisis humankind has ever seen. It profoundly affected every walk of life, and academic research has been no exception. Academic conferences are an indispensable component of research. The pandemic and its variants ravaged the globe in 2020, and their recurrences still cast a deep shadow across 2021 and 2022, with uncertainties for the near future. Under the sway of the pandemic, many conferences are conducted in virtual mode to mitigate the propagation of the virus. It is no surprise that academic conferences charge attendees registration fees, with the amount varying by country and discipline. Here, we collect the registration fee information for conferences held in 2019, 2020 and 2021. Virtual conferences barely cater to attendees except by providing online platforms. However, we discover that most of the virtual conferences held in 2020 and 2021 still charged high registration fees compared to those in 2019, while the remaining conferences only applied small discounts. In light of the current situation of the pandemic as well as uncertainties in the future, virtual conferences could be a common form of academic activity. Considering the sluggish global economy as well as other potential issues, we advocate that going virtual should always be an option for academic conferences in the future. We also suggest that virtual conferences should charge less and that the expenditure of the fees should be open to the public.
(This article belongs to the Special Issue Recent Advances in Big Data Analytics)

Review

10 pages, 1496 KiB  
Review
Medical Application of Big Data: Between Systematic Review and Randomized Controlled Trials
by Sung Ryul Shim, Joon-Ho Lee and Jae Heon Kim
Appl. Sci. 2023, 13(16), 9260; https://doi.org/10.3390/app13169260 - 15 Aug 2023
Viewed by 754
Abstract
In terms of medical health, we are currently living in the era of data science, which has brought tremendous change. Big data related to healthcare includes medical data, genome data, and lifelog data. Among medical data, public medical data are very important for actual research and for informing medical policy because they cover a large number of patients and are representative. However, there are many difficulties in actually using such public health big data and designing a study, and conducting a systematic review (SR) on the research topic can substantially help with the methodology. In this review, in addition to the importance of research using big data for the public interest, we introduce important public medical big data in Korea and show how SR can be specifically applied in research using public medical big data.
(This article belongs to the Special Issue Recent Advances in Big Data Analytics)

19 pages, 3398 KiB  
Review
Imputation Methods for scRNA Sequencing Data
by Mengyuan Wang, Jiatao Gan, Changfeng Han, Yanbing Guo, Kaihao Chen, Ya-zhou Shi and Ben-gong Zhang
Appl. Sci. 2022, 12(20), 10684; https://doi.org/10.3390/app122010684 - 21 Oct 2022
Cited by 5 | Viewed by 2427
Abstract
More and more researchers use single-cell RNA sequencing (scRNA-seq) technology to characterize the transcriptional map at the single-cell level. They use it to study the heterogeneity of complex tissues, transcriptome dynamics, and the diversity of unknown organisms. However, scRNA-seq data generally contain a great deal of technical and biological noise owing to the randomness of gene expression patterns. These data are often characterized by high dimensionality, sparsity, and a large number of “dropout” values, and are affected by batch effects. A large number of “dropout” values in scRNA-seq data seriously conceal the important relationships between genes and hinder downstream analysis. Therefore, the imputation of dropout values in scRNA-seq data is particularly important. We classify, analyze and compare the current advanced scRNA-seq data imputation methods from different angles. By comparing and analyzing the principles, advantages and disadvantages of the algorithms, this review provides suggestions for selecting imputation methods for specific problems and diverse data, and offers basic research significance for the downstream functional analysis of data.
(This article belongs to the Special Issue Recent Advances in Big Data Analytics)
