Big Data Analysis and Visualization

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (29 February 2020) | Viewed by 48118

Special Issue Editors


Guest Editor
Department of Computer Science, Chungbuk National University, 1, Chungdae-ro, Seowon-gu, Cheongju-si 28644, Chungcheongbuk-do, Korea
Interests: big data analysis; data visualization; visual analytics; smart manufacturing; virtual reality; augmented reality

Guest Editor
Department of Computer Science, Chungbuk National University, 1, Chungdae-ro, Seowon-gu, Cheongju-si 28644, Chungcheongbuk-do, Korea
Interests: big data analysis; computer vision and pattern recognition; smart manufacturing

Special Issue Information

Dear Colleagues,

Big data has become a core technology for providing innovative solutions in many fields. Big data analytics is the process of examining data to discover information, such as hidden patterns, unknown correlations, market insights, and customer preferences, that can support a wide range of business decisions. Deep learning, machine learning, and data mining have advanced to the point where they can be used to analyze big data in healthcare, manufacturing, social life, and other domains.

In addition, big data is being investigated with a variety of visual analytics tools. These tools help reveal new meanings and interpretations of big data and can therefore support data exploration and simplify complex big data analytics processes.

Therefore, we invite the academic community and relevant industrial partners to submit papers to this Special Issue in the following areas:

  • Novel algorithms for big data analysis
  • Big data preprocessing techniques (acquisition, integration, and cleaning)
  • Data mining, machine learning, and deep learning for big data analysis
  • Application of computer vision techniques in big data analysis
  • Visual analytics of big data
  • Visualization techniques for supporting the big data analysis process
  • Data structures for big data visualization
  • Application of big data visualization to a variety of fields
  • Big data visualization: case studies and applications

Prof. Dr. Kwan-Hee Yoo
Prof. Dr. Carson K. Leung
Prof. Dr. Aziz Nasridinov
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as they are accepted) and listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Big data 
  • Big data preprocessing
  • Big data analysis 
  • Big data visualization
  • Visual analytics 
  • Data mining 
  • Machine learning 
  • Deep learning 
  • Computer vision 
  • Multimedia big data

Published Papers (12 papers)


Editorial


5 pages, 189 KiB  
Editorial
Big Data Analysis and Visualization: Challenges and Solutions
by Kwan-Hee Yoo, Carson K. Leung and Aziz Nasridinov
Appl. Sci. 2022, 12(16), 8248; https://doi.org/10.3390/app12168248 - 18 Aug 2022
Cited by 8 | Viewed by 3254
Abstract
Big data have become a core technology to provide innovative solutions in numerous applications and services in many fields [...]
(This article belongs to the Special Issue Big Data Analysis and Visualization)

Research


21 pages, 5760 KiB  
Article
Convolutional Neural Network-Based Gear Type Identification from Automatic Identification System Trajectory Data
by Kwang-il Kim and Keon Myung Lee
Appl. Sci. 2020, 10(11), 4010; https://doi.org/10.3390/app10114010 - 10 Jun 2020
Cited by 23 | Viewed by 3337
Abstract
Marine resources are valuable assets to be protected from illegal, unreported, and unregulated (IUU) fishing and overfishing. Detecting IUU fishing and overfishing requires identifying the fishing gear used by fishing ships in operation. This paper is concerned with automatically identifying fishing gear from AIS (automatic identification system)-based trajectory data of fishing ships. It proposes a deep learning-based fishing gear-type identification method in which six fishing gear type groups are identified from AIS-based ship movement data and environmental data. The proposed method conducts preprocessing to handle varying messaging intervals, missing messages, and contaminated messages in the trajectory data. To capture complicated dynamic patterns in the trajectories of different fishing gear types, a sliding window-based data slicing method is used to generate the training data set. The proposed method uses a CNN (convolutional neural network)-based deep neural network model consisting of a feature extraction module and a prediction module. The feature extraction module contains two CNN submodules followed by a fully connected network. The prediction module is a fully connected network that suggests a putative fishing gear type for the features extracted by the feature extraction module from the input trajectory data. The proposed CNN-based model has been trained and tested with a real trajectory data set of 1380 fishing ships collected over a year. A new performance index, DPI (total performance of the day-wise performance index), is proposed to compare the performance of gear type identification techniques. For comparison, SVM (support vector machine)-based models have also been developed. In the experiments, the trained CNN-based model achieved a DPI of 0.963, while the SVM models achieved 0.814 on average for the 24-h window. The high DPI value indicates that the trained model is good at identifying fishing gear types.
(This article belongs to the Special Issue Big Data Analysis and Visualization)
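The sliding window-based data slicing step described in the abstract above can be illustrated with a short Python sketch. It assumes a time-sorted trajectory whose points are hypothetical (timestamp, latitude, longitude, speed, course) tuples and cuts it into overlapping fixed-duration windows. The field layout, the `slice_trajectory` name, and the window and stride values are illustrative assumptions, not the authors' preprocessing code.

```python
from typing import List, Sequence, Tuple

# Hypothetical AIS point layout: (timestamp_s, latitude, longitude, speed_knots, course_deg).
Point = Tuple[float, float, float, float, float]


def slice_trajectory(points: Sequence[Point], window_s: float,
                     stride_s: float) -> List[List[Point]]:
    """Cut a time-sorted trajectory into overlapping fixed-duration windows.

    Window i covers timestamps in [t0 + i*stride_s, t0 + i*stride_s + window_s).
    Windows that contain no messages (e.g. gaps caused by missing AIS messages)
    are skipped.
    """
    if window_s <= 0 or stride_s <= 0:
        raise ValueError("window_s and stride_s must be positive")
    if not points:
        return []
    windows: List[List[Point]] = []
    t0, t_end = points[0][0], points[-1][0]
    lo = 0  # index of the first point not earlier than the current window start
    i = 0
    while t0 + i * stride_s <= t_end:
        start = t0 + i * stride_s  # computed from i to avoid accumulating float error
        while lo < len(points) and points[lo][0] < start:
            lo += 1
        hi = lo
        while hi < len(points) and points[hi][0] < start + window_s:
            hi += 1
        if hi > lo:
            windows.append(list(points[lo:hi]))
        i += 1
    return windows


if __name__ == "__main__":
    # One synthetic message per hour for two days; 24-h windows every 6 h.
    pts = [(3600.0 * h, 35.1, 129.0, 4.5, 90.0) for h in range(48)]
    windows = slice_trajectory(pts, 24 * 3600.0, 6 * 3600.0)
    print(len(windows), len(windows[0]))  # prints 8 24
```

Choosing a stride shorter than the window makes consecutive windows overlap, which multiplies the number of training samples drawn from each ship; skipping empty windows is one simple way to tolerate gaps left by missing messages.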

14 pages, 1412 KiB  
Article
An Enhanced Multimodal Stacking Scheme for Online Pornographic Content Detection
by Kwangho Song and Yoo-Sung Kim
Appl. Sci. 2020, 10(8), 2943; https://doi.org/10.3390/app10082943 - 24 Apr 2020
Cited by 9 | Viewed by 2576
Abstract
An enhanced multimodal stacking scheme is proposed for quick and accurate online detection of harmful pornographic content on the Internet. To detect harmful content accurately, implicative visual and auditory features are extracted using a bidirectional RNN (recurrent neural network) with VGG-16 (a multilayered dilated convolutional network) to implicitly express the signal change patterns over time within each input. A video classifier and an audio classifier are trained using only the implicative visual features and only the implicative auditory features, respectively. A fusion classifier is also trained using both features together. These three component classifiers are then stacked in the enhanced ensemble scheme, in the serial order of fusion classifier, video classifier, and audio classifier, to reduce false negative errors and enable quick online detection. The proposed multimodal stacking scheme yields an improved true positive rate of 95.40% and a false negative rate of 4.60%, which are superior to those reported in previous studies. In addition, the proposed stacking scheme detects harmful content accurately up to 74.58% faster, and 62.16% faster on average, than the previous stacking scheme. Therefore, the proposed enhanced multimodal stacking scheme can be used to filter out harmful content quickly and accurately in online environments.
(This article belongs to the Special Issue Big Data Analysis and Visualization)
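One plausible reading of the serial stacking order described above is an early-exit cascade: the fusion classifier runs first, and the video and audio classifiers are consulted only when every earlier classifier has judged the input benign. The Python sketch below shows that control flow with stub classifiers. The 0.5 threshold, the `cascade_detect` name, and the any-positive decision rule are assumptions, since the abstract does not state the exact combination rule.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# A component classifier returns the estimated probability that an input is harmful.
Classifier = Callable[[Any], float]


def cascade_detect(
    x: Any,
    stages: Sequence[Tuple[str, Classifier]],
    threshold: float = 0.5,
) -> Tuple[bool, Optional[str]]:
    """Run component classifiers in a fixed serial order and stop at the first alarm.

    Returns (is_harmful, name of the stage that fired). A later stage runs only
    when every earlier stage judged the input benign, so each added stage can
    only turn a negative into a positive: false negatives go down, false
    positives may go up, and inputs flagged early skip the remaining stages.
    """
    for name, classify in stages:
        if classify(x) >= threshold:
            return True, name
    return False, None


if __name__ == "__main__":
    # Stub scores standing in for the fusion, video, and audio models.
    stages = [
        ("fusion", lambda clip: clip["fusion"]),
        ("video", lambda clip: clip["video"]),
        ("audio", lambda clip: clip["audio"]),
    ]
    print(cascade_detect({"fusion": 0.2, "video": 0.1, "audio": 0.7}, stages))  # prints (True, 'audio')
```

Placing the strongest classifier (here, the fusion classifier) first lets most harmful inputs exit after a single model call, which is where a cascade of this shape saves time.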

14 pages, 312 KiB  
Article
Career Choice Prediction Based on Campus Big Data—Mining the Potential Behavior of College Students
by Min Nie, Zhaohui Xiong, Ruiyang Zhong, Wei Deng and Guowu Yang
Appl. Sci. 2020, 10(8), 2841; https://doi.org/10.3390/app10082841 - 20 Apr 2020
Cited by 21 | Viewed by 4737
Abstract
Career choice plays a pivotal role in college students’ life planning. In the past, professional career appraisers used questionnaires or diagnoses to quantify the factors potentially influencing career choices. However, because each person’s goals and ideas are complex, it is difficult to forecast their career choices properly. Recent evidence suggests that students’ behavioral data can be used to predict their career choices. Based on the simple premise that the most remarkable characteristics of a class are reflected by its main samples, we propose a model called Approach Cluster Centers Based On XGBOOST (ACCBOX) to predict students’ career choices. Experimental results on 13 M behavioral records from over four thousand students clearly demonstrate the superiority of our method over existing state-of-the-art techniques.
(This article belongs to the Special Issue Big Data Analysis and Visualization)

14 pages, 4120 KiB  
Article
Log Analysis-Based Resource and Execution Time Improvement in HPC: A Case Study
by JunWeon Yoon, TaeYoung Hong, ChanYeol Park, Seo-Young Noh and HeonChang Yu
Appl. Sci. 2020, 10(7), 2634; https://doi.org/10.3390/app10072634 - 10 Apr 2020
Cited by 4 | Viewed by 2881
Abstract
High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. This approach can reduce overall job execution time and increase the capacity to solve large-scale and complex problems. In a supercomputer, the job scheduler, the HPC system’s flagship tool, is responsible for distributing and managing the resources of the large system. In this paper, we analyze the execution log of the job scheduler over a certain period and propose an optimization approach to reduce the idle time of jobs. Our experiment found that the main root cause of delayed jobs is closely related to resource waiting. The execution time of the entire job is significantly delayed by the increase in idle resources that must be held ready when a large-scale job is submitted. The backfilling algorithm can reduce the inefficiency of these idle resources and help reduce job execution time. Therefore, we propose applying the backfilling algorithm to the supercomputer. The experimental results show that the overall execution time is reduced.
(This article belongs to the Special Issue Big Data Analysis and Visualization)
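The abstract does not say which backfilling variant was adopted. The Python sketch below implements the widely used EASY backfilling rule, in which only the job at the head of the queue receives a reservation and later jobs may start early as long as they cannot delay it. It is a single scheduling pass over hypothetical in-memory job records, not the supercomputer's actual scheduler configuration; `Job` and `easy_backfill` are illustrative names, and runtimes are the user-supplied estimates a real scheduler would rely on.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    nodes: int       # nodes requested
    runtime: float   # user-estimated runtime (the scheduler trusts this estimate)


def easy_backfill(now: float, free: int,
                  running: List[Tuple[float, int]],
                  queue: List[Job]) -> List[Job]:
    """One scheduling pass of EASY backfilling; returns the jobs started at `now`.

    running: (end_time, nodes) pairs for jobs already executing.
    queue: waiting jobs in first-come, first-served order.
    """
    started: List[Job] = []
    running = list(running)
    queue = list(queue)

    # 1. Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free:
        job = queue.pop(0)
        free -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job)
    if not queue:
        return started

    # 2. Reserve resources for the blocked head job: walk running jobs in order of
    #    completion until enough nodes would be free (the "shadow time").
    head = queue[0]
    avail, shadow = free, now
    for end, nodes in sorted(running):
        if avail >= head.nodes:
            break
        avail += nodes
        shadow = end
    if avail < head.nodes:
        raise ValueError(f"job {head.name} requests more nodes than can ever be free")
    extra = avail - head.nodes  # nodes still spare once the head job starts

    # 3. Backfill: a later job may jump ahead if it fits now and cannot delay the
    #    reservation, i.e. it ends before the shadow time or only uses spare nodes.
    for job in queue[1:]:
        if job.nodes > free:
            continue
        if now + job.runtime <= shadow:
            free -= job.nodes
            started.append(job)
        elif job.nodes <= extra:
            free -= job.nodes
            extra -= job.nodes
            started.append(job)
    return started


if __name__ == "__main__":
    queue = [Job("A", 8, 50.0), Job("B", 2, 30.0), Job("C", 4, 500.0), Job("D", 2, 500.0)]
    # 4 nodes are free now; a running job holds 6 more nodes until t = 100.
    print([j.name for j in easy_backfill(0.0, 4, [(100.0, 6)], queue)])  # prints ['B', 'D']
```

In the example, job A needs 8 nodes and must wait until t = 100, leaving 2 spare nodes at that point. Job B finishes before t = 100 and job D fits in the spare nodes, so both run immediately on otherwise idle resources, while job C would either exceed the free nodes or delay A and is held back.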

16 pages, 494 KiB  
Article
Graph Dilated Network with Rejection Mechanism
by Bencheng Yan, Chaokun Wang and Gaoyang Guo
Appl. Sci. 2020, 10(7), 2421; https://doi.org/10.3390/app10072421 - 02 Apr 2020
Cited by 2 | Viewed by 2188
Abstract
Recently, graph neural networks (GNNs) have achieved great success in dealing with graph-based data. The basic idea of GNNs is to iteratively aggregate information from neighbors, which is a special form of Laplacian smoothing. However, most GNNs suffer from the over-smoothing problem: as the model goes deeper, the learned representations become indistinguishable. This reflects the inability of current GNNs to explore the global graph structure. In this paper, we propose a novel graph neural network to address this problem. A rejection mechanism is designed to address the over-smoothing problem, and a dilated graph convolution kernel is presented to capture high-level graph structure. Experimental results demonstrate that the proposed model outperforms state-of-the-art GNNs and can effectively overcome the over-smoothing problem.
(This article belongs to the Special Issue Big Data Analysis and Visualization)
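The over-smoothing problem described in the abstract above can be reproduced with a few lines of NumPy. This sketch uses plain, parameter-free mean aggregation (dropping learned weights and nonlinearities is a simplifying assumption) on a small path graph, and shows that repeated rounds pull all node features toward the same value. It only illustrates the problem; it does not implement the paper's rejection mechanism or dilated kernel, and the function names are illustrative.

```python
import numpy as np


def mean_aggregate(adj: np.ndarray, x: np.ndarray, layers: int) -> np.ndarray:
    """Apply `layers` rounds of parameter-free neighborhood averaging.

    Each round replaces a node's feature with the mean over the node itself and
    its neighbors, i.e. x <- D^-1 (A + I) x, a simple form of Laplacian smoothing.
    """
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    p = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
    for _ in range(layers):
        x = p @ x
    return x


def spread(x: np.ndarray) -> float:
    """Largest difference between any two nodes' scalar features."""
    return float(x.max() - x.min())


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 with distinct scalar features at the two ends.
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    x0 = np.array([[1.0], [0.0], [0.0], [-1.0]])
    for k in (0, 1, 4, 16, 64):
        print(k, spread(mean_aggregate(adj, x0, k)))
```

On a connected graph with self-loops, the row-normalized matrix is the transition matrix of an aperiodic random walk, so its powers converge to a rank-one matrix and every row of the output approaches the same vector. That is the indistinguishability of deep representations that the abstract refers to.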

13 pages, 2495 KiB  
Article
Prediction of Weights during Growth Stages of Onion Using Agricultural Data Analysis Method
by Wanhyun Cho, Myung Hwan Na, Yuha Park, Deok Hyeon Kim and Yongbeen Cho
Appl. Sci. 2020, 10(6), 2094; https://doi.org/10.3390/app10062094 - 19 Mar 2020
Cited by 3 | Viewed by 3932
Abstract
In this study, we propose a new agricultural data analysis method that uses a functional regression model to predict the weight of field onions during their growth stages. In the functional regression model, we used onion weight at each growth stage as the response variable and six environmental factors (average temperature, average ground temperature, rainfall, wind speed, sunshine, and humidity) as the explanatory variables. We then define a least minimum integral squared residual (LMISE) measure to obtain an estimate of the functional regression coefficient. In addition, principal component regression analysis was applied to derive the estimates that minimize the defined measure. Next, to evaluate the performance of the proposed model, data were collected, and analyses of the collected data identified the following results. First, graphical and correlation analyses showed that ground temperature, mean temperature, and humidity have a very significant effect on onion weight, whereas wind speed, sunshine, and rainfall have a small negative effect. Second, functional regression analysis showed that ground temperature, sunshine, and precipitation have a significant effect on onion growth and are essential in the goodness-of-fit test. On the other hand, wind speed, mean temperature, and humidity did not significantly affect onion growth. In conclusion, to promote onion growth, appropriate ground temperature and sunshine are essential, rainfall and humidity must be low, and appropriate wind or mean temperature must be maintained.
(This article belongs to the Special Issue Big Data Analysis and Visualization)
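The principal component regression step mentioned above can be sketched generically. Assume each explanatory curve (for example, daily ground temperature over the growth period) is sampled on a common time grid, so that each row of `x` is one discretized curve. PCR then regresses the response on the leading principal component scores and maps the fitted coefficients back to the grid, giving a discretized estimate of the coefficient function scaled by the grid spacing. This is a standard PCR sketch under those assumptions, not the authors' LMISE estimator, and `pcr_fit` is a hypothetical name.

```python
import numpy as np


def pcr_fit(x: np.ndarray, y: np.ndarray, k: int):
    """Principal component regression of y on the first k principal components of x.

    x: (n_samples, n_grid_points) matrix, one discretized curve per row.
    Returns (intercept, coef) such that predictions are intercept + x @ coef.
    """
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc = x - x_mean
    # Right singular vectors of the centered design are the principal directions.
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    v = vt[:k].T                   # (n_grid_points, k) loadings
    scores = xc @ v                # (n_samples, k) component scores
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    coef = v @ gamma               # back to one coefficient per grid point
    intercept = y_mean - x_mean @ coef
    return intercept, coef
```

Keeping fewer components (a smaller `k`) regularizes the fit when neighboring time points are strongly correlated, as environmental curves usually are; with `k` equal to the number of columns and a full-rank design, PCR reduces to ordinary least squares.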

24 pages, 5679 KiB  
Article
HI-Sky: Hash Index-Based Skyline Query Processing
by Jong-Hyeok Choi, Fei Hao and Aziz Nasridinov
Appl. Sci. 2020, 10(5), 1708; https://doi.org/10.3390/app10051708 - 02 Mar 2020
Cited by 5 | Viewed by 2854
Abstract
The skyline query has recently attracted considerable research interest in several fields. The query performs computations using the domination test, where a data point "dominates" another if it has no worse value in any dimension and a better value in at least one dimension. The skyline query can therefore be used to construct efficient queries on data from a variety of fields. However, as the number of dimensions or the amount of data increases, naïve skyline queries degrade overall performance owing to the higher cost of comparisons among data points. Several methods using index structures have been proposed to solve this problem, but they have not improved skyline query performance because their indices are heavily influenced by dimensionality and data volume. Therefore, in this study, we propose HI-Sky, a method that performs quick skyline computations using a hash index to overcome these shortcomings. HI-Sky effectively manages data through the hash index and significantly improves performance by eliminating unnecessary data comparisons when computing the skyline. We provide the theoretical background for HI-Sky and verify its improvement in skyline query performance through comparisons with prevalent methods.
(This article belongs to the Special Issue Big Data Analysis and Visualization)
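The domination test defined in the abstract above, and the naive pairwise computation that HI-Sky is designed to avoid, can be sketched as a block-nested-loop skyline in Python. The sketch assumes smaller values are better in every dimension (for example, hotel price and distance) and does not reproduce the hash index, whose structure the abstract does not describe; `dominates` and `naive_skyline` are illustrative names.

```python
from typing import List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q in every dimension and strictly better in at least one.

    Assumes smaller values are better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_skyline(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Block-nested-loop skyline: keep a window of points not dominated so far."""
    window: List[Sequence[float]] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a point already in the window
        # p survives: evict every window point that p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


if __name__ == "__main__":
    # (price, distance) pairs; lower is better in both dimensions
    hotels = [(50, 8), (60, 2), (80, 1), (70, 3), (55, 9), (90, 5)]
    print(naive_skyline(hotels))  # prints [(50, 8), (60, 2), (80, 1)]
```

Each incoming point may be compared with every point in the current window, so the worst case is O(n²·d) comparisons for n points in d dimensions. This is the cost that grows with data volume and dimensionality, and the reason index-based methods such as HI-Sky try to prune comparisons.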

12 pages, 1251 KiB  
Article
Large-Scale Data Computing Performance Comparisons on SYCL Heterogeneous Parallel Processing Layer Implementations
by Woosuk Shin, Kwan-Hee Yoo and Nakhoon Baek
Appl. Sci. 2020, 10(5), 1656; https://doi.org/10.3390/app10051656 - 01 Mar 2020
Cited by 11 | Viewed by 3827
Abstract
Today, many big data applications require massively parallel tasks to compute complicated mathematical operations. To perform parallel tasks, platforms such as CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are widely used and developed to enhance the throughput of massively parallel tasks. There is also a need for high-level abstractions and platform independence over these massively parallel computing platforms. Recently, the Khronos Group announced SYCL (C++ Single-source Heterogeneous Programming for OpenCL), a new cross-platform abstraction layer that provides an efficient way to perform single-source heterogeneous computing with C++ template-level abstractions. However, since there is no official implementation of SYCL, several different implementations are currently available from various vendors. In this paper, we analyse the characteristics of these SYCL implementations. We also present performance measurements of these implementations, especially for well-known massively parallel tasks. We show that each implementation has its own strengths in computing different types of mathematical operations with different data sizes. Our analysis provides fundamental measurements for the abstract-level, cost-effective use of massively parallel computations, especially in big data applications.
(This article belongs to the Special Issue Big Data Analysis and Visualization)

13 pages, 9909 KiB  
Article
An Efficient Approach to Consolidating Job Schedulers in Traditional Independent Scientific Workflows
by Byungyun Kong, Geonmo Ryu, Sangwook Bae, Seo-Young Noh and Heejun Yoon
Appl. Sci. 2020, 10(4), 1455; https://doi.org/10.3390/app10041455 - 21 Feb 2020
Cited by 2 | Viewed by 1768
Abstract
The current research paradigm is one of data-driven research. Researchers are beginning to deploy computer facilities to produce and analyze large amounts of data. As requirements for computing power grow, data processing on traditional workstations is under constant pressure for efficient resource management. In such an environment, tremendous amounts of data are processed using parallel computing to obtain research results efficiently and effectively. HTCondor, for example, provides researchers with computing power for data analysis. Although such a system works well in a traditional computing cluster environment, an efficient methodology is needed to meet ever-increasing computing demands with limited resources. In this paper, we propose an approach to integrating clusters that can share their computing power on the basis of a priority policy. Our approach makes it possible to share worker nodes while maintaining the resources allocated to each group. In addition, we used historical user usage data to analyze the problems that occurred during job execution due to resource sharing, as well as the actual operating results. Our findings can provide a reasonable guideline for sharing limited computing power among multiple scientific groups.
(This article belongs to the Special Issue Big Data Analysis and Visualization)

15 pages, 2646 KiB  
Article
Real-Time Hand Gesture Spotting and Recognition Using RGB-D Camera and 3D Convolutional Neural Network
by Dinh-Son Tran, Ngoc-Huynh Ho, Hyung-Jeong Yang, Eu-Tteum Baek, Soo-Hyung Kim and Gueesang Lee
Appl. Sci. 2020, 10(2), 722; https://doi.org/10.3390/app10020722 - 20 Jan 2020
Cited by 62 | Viewed by 12406
Abstract
Using hand gestures is a natural method of interaction between humans and computers. We use gestures to express meaning and thoughts in our everyday conversations. Gesture-based interfaces are used in many applications in a variety of fields, such as smartphones, televisions (TVs), and video gaming. With advancements in technology, hand gesture recognition is becoming an increasingly promising and attractive technique in human–computer interaction. In this paper, we propose a novel method for real-time fingertip detection and hand gesture recognition using an RGB-D camera and a 3D convolutional neural network (3DCNN). This system can accurately and robustly extract fingertip locations and recognize gestures in real time. We demonstrate the accuracy and robustness of the interface by evaluating hand gesture recognition across a variety of gestures. In addition, we developed a tool for manipulating computer programs to show the potential of hand gesture recognition. The experimental results showed that our system achieves a high level of hand gesture recognition accuracy. It is thus considered a good approach to gesture-based human–computer interaction in the future.
(This article belongs to the Special Issue Big Data Analysis and Visualization)

14 pages, 1053 KiB  
Article
A Study on Data Profiling: Focusing on Attribute Value Quality Index
by Won-Jung Jang, Sung-Taek Lee, Jong-Bae Kim and Gwang-Yong Gim
Appl. Sci. 2019, 9(23), 5054; https://doi.org/10.3390/app9235054 - 22 Nov 2019
Cited by 6 | Viewed by 2633
Abstract
In the era of the Fourth Industrial Revolution, companies are focusing on securing artificial intelligence (AI) technology, in particular machine learning as the core technology of AI, to enhance their competitiveness and to allow computers to acquire high-quality data through self-learning. Securing good-quality big data is becoming a very important asset for enhancing corporate competitiveness. The volume of digital information is expected to grow rapidly around the world, reaching 90 zettabytes (ZB) by 2020. Presenting a value quality index for each data attribute is meaningful because it lets users evaluate, from their own point of view, whether the data are suitable for use. As a result, users can decide whether to adopt the data based on the data quality index. In this study, we propose a quality index calculation model for structured and unstructured data, as well as a calculation method for the attribute value quality index (AVQI) and the structured data value quality index (SDVQI).
(This article belongs to the Special Issue Big Data Analysis and Visualization)
