Big Data Analysis and Visualization

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (29 February 2020) | Viewed by 48118

Special Issue Editors

Prof. Dr. Kwan-Hee Yoo

Guest Editor

Department of Computer Science, Chungbuk National University, 1, Chungdae-ro, Seowon-gu, Cheongju-si 28644, Chungcheongbuk-do, Korea
Interests: big data analysis; data visualization; visual analytics; smart manufacturing; virtual reality; augmented reality

Prof. Dr. Carson K. Leung

Guest Editor

Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
Interests: AI; data mining and analysis

Prof. Dr. Aziz Nasridinov

Guest Editor

Department of Computer Science, Chungbuk National University, 1, Chungdae-ro, Seowon-gu, Cheongju-si 28644, Chungcheongbuk-do, Korea
Interests: big data analysis; computer vision and pattern recognition; smart manufacturing

Special Issue Information

Dear Colleagues,

Big data has become a core technology to provide innovative solutions in many fields. Big data analytics is the process of examining data to discover information, such as hidden patterns, unknown correlations, market insights, and customer preferences, that can be useful in making various business decisions. Deep learning, machine learning, and data mining have advanced to the point where these techniques can be used to analyze big data in healthcare, manufacturing, social life, etc.

On the other hand, big data are being investigated using various visual analytical tools. These tools assist in visualizing new meanings and interpretations of the big data and, thus, can help explore the data and simplify the complex big data analytics processes.

Therefore, we invite the academic community and relevant industrial partners to submit papers to this Special Issue, in the following fields:

Novel algorithms for big data analysis
Big data preprocessing techniques (acquisition, integration, and cleaning)
Data mining, machine learning, and deep learning for big data analysis
Application of computer vision techniques in big data analysis
Visual analytics of big data
Visualization techniques for supporting the big data analysis process
Data structures for big data visualization
Application of big data visualization to a variety of fields
Big data visualization: case studies and applications

Prof. Dr. Kwan-Hee Yoo
Prof. Dr. Carson K. Leung
Prof. Dr. Aziz Nasridinov
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

Big data
Big data preprocessing
Big data analysis
Big data visualization
Visual analytics
Data mining
Machine learning
Deep learning
Computer vision
Multimedia big data

Published Papers (12 papers)

Editorial

5 pages, 189 KiB

Open Access Editorial

Big Data Analysis and Visualization: Challenges and Solutions

by Kwan-Hee Yoo, Carson K. Leung and Aziz Nasridinov

Appl. Sci. 2022, 12(16), 8248; https://doi.org/10.3390/app12168248 - 18 Aug 2022

Cited by 8 | Viewed by 3254

Abstract

Big data have become a core technology to provide innovative solutions in numerical applications and services in many fields [...]

(This article belongs to the Special Issue Big Data Analysis and Visualization)

Research

21 pages, 5760 KiB

Open Access Article

Convolutional Neural Network-Based Gear Type Identification from Automatic Identification System Trajectory Data

by Kwang-il Kim and Keon Myung Lee

Appl. Sci. 2020, 10(11), 4010; https://doi.org/10.3390/app10114010 - 10 Jun 2020

Cited by 23 | Viewed by 3337

Abstract

Marine resources are valuable assets to be protected from illegal, unreported, and unregulated (IUU) fishing and overfishing. Detecting IUU fishing and overfishing requires identifying the fishing gear used by the fishing ships in operation. This paper is concerned with automatically identifying fishing gear from AIS (automatic identification system)-based trajectory data of fishing ships. It proposes a deep learning-based fishing gear-type identification method in which six fishing gear type groups are identified from AIS-based ship movement data and environmental data. The proposed method conducts preprocessing to handle different lengths of messaging intervals, missing messages, and contaminated messages in the trajectory data. To capture the complicated dynamic patterns in the trajectories of fishing gear types, a sliding window-based data slicing method is used to generate the training data set. The proposed method uses a CNN (convolutional neural network)-based deep neural network model which consists of a feature extraction module and a prediction module. The feature extraction module contains two CNN submodules followed by a fully connected network. The prediction module is a fully connected network which suggests a putative fishing gear type for the features extracted by the feature extraction module from input trajectory data. The proposed CNN-based model has been trained and tested with a real trajectory data set of 1380 fishing ships collected over a year. A new performance index, DPI (total performance of the day-wise performance index), is proposed to compare the performance of gear type identification techniques. To compare the performance of the proposed model, SVM (support vector machine)-based models have also been developed. In the experiments, the trained CNN-based model showed 0.963 DPI, while the SVM models showed 0.814 DPI on average for the 24-h window. The high value of the DPI index indicates that the trained model is good at identifying the types of fishing gear.
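The sliding window-based data slicing mentioned in this abstract can be illustrated with a minimal sketch. The function name `sliding_windows` and the `size` and `step` parameters are hypothetical; the abstract does not state the window length or stride used for training, so the values in the usage note are placeholders only.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(seq: Sequence[T], size: int, step: int) -> List[Sequence[T]]:
    """Slice a sequence into fixed-length, possibly overlapping windows.

    `size` is the window length and `step` is the stride between window
    starts. Both are illustrative parameters, not values from the paper.
    A trailing remainder shorter than `size` is dropped.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]
```

For example, `sliding_windows(list(range(6)), size=4, step=2)` produces two overlapping slices, each of which could serve as one training sample for a trajectory classifier.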

(This article belongs to the Special Issue Big Data Analysis and Visualization)

14 pages, 1412 KiB

Open Access Article

An Enhanced Multimodal Stacking Scheme for Online Pornographic Content Detection

by Kwangho Song and Yoo-Sung Kim

Appl. Sci. 2020, 10(8), 2943; https://doi.org/10.3390/app10082943 - 24 Apr 2020

Cited by 9 | Viewed by 2576

Abstract

An enhanced multimodal stacking scheme is proposed for quick and accurate online detection of harmful pornographic content on the Internet. To accurately detect harmful content, implicative visual features are extracted using a bi-directional RNN (recurrent neural network) with VGG-16, and implicative auditory features using a bi-directional RNN with a multilayered dilated convolutional network, to implicitly express the signal change patterns over time within each input. Using only the implicative visual and auditory features, a video classifier and an audio classifier are trained, respectively. By using both features together, one fusion classifier is also trained. Then, these three component classifiers are stacked in the enhanced ensemble scheme to reduce the false negative errors, in a serial order of the fusion classifier, video classifier, and audio classifier, for quick online detection. The proposed multimodal stacking scheme yields an improved true positive rate of 95.40% and a false negative rate of 4.60%, which are better than the values reported in previous studies. In addition, the proposed stacking scheme can detect harmful content up to 74.58% faster, and 62.16% faster on average, than the previous stacking scheme. Therefore, the proposed enhanced multimodal stacking scheme can be used to quickly and accurately filter out harmful content in online environments.

(This article belongs to the Special Issue Big Data Analysis and Visualization)

14 pages, 312 KiB

Open Access Article

Career Choice Prediction Based on Campus Big Data—Mining the Potential Behavior of College Students

by Min Nie, Zhaohui Xiong, Ruiyang Zhong, Wei Deng and Guowu Yang

Appl. Sci. 2020, 10(8), 2841; https://doi.org/10.3390/app10082841 - 20 Apr 2020

Cited by 21 | Viewed by 4737

Abstract

Career choice has a pivotal role in college students’ life planning. In the past, professional career appraisers used questionnaires or diagnoses to quantify the factors potentially influencing career choices. However, due to the complexity of each person’s goals and ideas, it is difficult to properly forecast their career choices. Recent evidence suggests that we could use students’ behavioral data to predict their career choices. Based on the simple premise that the most remarkable characteristics of classes are reflected by the main samples of a category, we propose a model called the Approach Cluster Centers Based On XGBOOST (ACCBOX) model to predict students’ career choices. Experimental results on 13 M behavioral data of over four thousand students clearly demonstrate the superiority of our method over existing state-of-the-art techniques in predicting students’ career choices.

(This article belongs to the Special Issue Big Data Analysis and Visualization)

14 pages, 4120 KiB

Open Access Article

Log Analysis-Based Resource and Execution Time Improvement in HPC: A Case Study

by JunWeon Yoon, TaeYoung Hong, ChanYeol Park, Seo-Young Noh and HeonChang Yu

Appl. Sci. 2020, 10(7), 2634; https://doi.org/10.3390/app10072634 - 10 Apr 2020

Cited by 4 | Viewed by 2881

Abstract

High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. Such an approach can reduce overall job execution time and increase the capacity for solving large-scale and complex problems. In a supercomputer, the job scheduler, the HPC’s flagship tool, is responsible for distributing and managing the resources of large systems. In this paper, we analyze the execution log of the job scheduler for a certain period of time and propose an optimization approach to reduce the idle time of jobs. In our experiment, we found that the main root cause of delayed jobs is highly related to resource waiting. The execution time of the entire job is affected and significantly delayed due to the increase in idle resources that must be ready when submitting a large-scale job. The backfilling algorithm can optimize the inefficiency of these idle resources and help to reduce the execution time of the job. Therefore, we propose the backfilling algorithm, which can be applied to the supercomputer. The experimental results show that the overall execution time is reduced.

(This article belongs to the Special Issue Big Data Analysis and Visualization)

16 pages, 494 KiB

Open Access Article

Graph Dilated Network with Rejection Mechanism

by Bencheng Yan, Chaokun Wang and Gaoyang Guo

Appl. Sci. 2020, 10(7), 2421; https://doi.org/10.3390/app10072421 - 02 Apr 2020

Cited by 2 | Viewed by 2188

Abstract

Recently, graph neural networks (GNNs) have achieved great success in dealing with graph-based data. The basic idea of GNNs is iteratively aggregating the information from neighbors, which is a special form of Laplacian smoothing. However, most GNNs suffer from the over-smoothing problem, i.e., when the model goes deeper, the learned representations become indistinguishable. This reflects the inability of current GNNs to explore the global graph structure. In this paper, we propose a novel graph neural network to address this problem. A rejection mechanism is designed to address the over-smoothing problem, and a dilated graph convolution kernel is presented to capture the high-level graph structure. A number of experimental results demonstrate that the proposed model outperforms the state-of-the-art GNNs and can effectively overcome the over-smoothing problem.
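The over-smoothing effect described in this abstract can be illustrated with a toy sketch. It uses plain mean aggregation over each node and its neighbors, with no learned weights; it is not the proposed model and does not include the rejection mechanism or the dilated kernel. The function names `aggregate` and `spread`, the four-node path graph, and the scalar features are all hypothetical.

```python
from typing import Dict, List


def aggregate(features: List[float], neighbors: Dict[int, List[int]]) -> List[float]:
    """One round of neighbor aggregation.

    Each node's new value is the mean of its own value and its neighbors'
    values (a simple smoothing step; real GNN layers also apply learned
    weights and nonlinearities, which are omitted here).
    """
    return [
        (features[i] + sum(features[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(features))
    ]


def spread(features: List[float]) -> float:
    """Gap between the largest and smallest node value."""
    return max(features) - min(features)


def run_rounds(features: List[float], neighbors: Dict[int, List[int]], rounds: int) -> List[float]:
    """Apply `rounds` aggregation steps in sequence."""
    for _ in range(rounds):
        features = aggregate(features, neighbors)
    return features


# Hypothetical path graph 0 - 1 - 2 - 3 with two distinct "classes" of nodes.
PATH_GRAPH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
INITIAL = [0.0, 0.0, 1.0, 1.0]
```

Calling `spread(run_rounds(INITIAL, PATH_GRAPH, k))` for increasing `k` shows the node values drifting toward a common value, which is the sense in which deeper stacks of aggregation make representations indistinguishable.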

(This article belongs to the Special Issue Big Data Analysis and Visualization)

13 pages, 2495 KiB

Open Access Article

Prediction of Weights during Growth Stages of Onion Using Agricultural Data Analysis Method

by Wanhyun Cho, Myung Hwan Na, Yuha Park, Deok Hyeon Kim and Yongbeen Cho

Appl. Sci. 2020, 10(6), 2094; https://doi.org/10.3390/app10062094 - 19 Mar 2020

Cited by 3 | Viewed by 3932

Abstract

In this study, we propose a new agricultural data analysis method that can predict the weight during the growth stages of the field onion using a functional regression model. We have used onion weight at each growth stage as the response variable and six environmental factors (average temperature, average ground temperature, rainfall, wind speed, sunshine, and humidity) as the explanatory variables in the functional regression model. We then define a least minimum integral squared residual (LMISE) measure to obtain an estimate of the functional regression coefficient. In addition, a principal component regression analysis was applied to derive the estimates that minimize the defined measures. Next, to evaluate the performance of the proposed model, data were collected, and the following results were identified through analyses of the collected data. First, through graphical and correlation analysis, the ground temperature, mean temperature, and humidity have a very significant effect on the onion weights, but environmental factors such as wind speed, sunshine, and rainfall have a small negative effect on onion weights. Second, through functional regression analysis, we can determine that the ground temperature, sunshine, and precipitation have a significant effect on onion growth and are essential in the goodness-of-fit test. On the other hand, wind speed, mean temperature, and humidity did not significantly affect onion growth. In conclusion, to promote onion growth, the appropriate ground temperature and amount of sunshine are essential, the rainfall and the humidity must be low, and the appropriate wind or mean temperature must be maintained.

(This article belongs to the Special Issue Big Data Analysis and Visualization)

24 pages, 5679 KiB

Open Access Article

HI-Sky: Hash Index-Based Skyline Query Processing

by Jong-Hyeok Choi, Fei Hao and Aziz Nasridinov

Appl. Sci. 2020, 10(5), 1708; https://doi.org/10.3390/app10051708 - 02 Mar 2020

Cited by 5 | Viewed by 2854

Abstract

The skyline query has recently attracted a considerable amount of research interest in several fields. The query conducts computations using the domination test, where “domination” means that a data point does not have a worse value than another in any dimension, and has a better value in at least one dimension. Therefore, the skyline query can be used to construct efficient queries based on data from a variety of fields. However, when the number of dimensions or the amount of data increases, naïve skyline queries lead to a degradation in overall performance owing to the higher cost of comparisons among data. Several methods using index structures have been proposed to solve this problem but have not improved the performance of skyline queries because their indices are heavily influenced by the dimensionality and the amount of data. Therefore, in this study, we propose HI-Sky, a method that can perform quick skyline computations by using a hash index to overcome the above shortcomings. HI-Sky effectively manages data through the hash index and significantly improves performance by eliminating unnecessary data comparisons when computing the skyline. We provide the theoretical background for HI-Sky and verify its improvement in skyline query performance through comparisons with prevalent methods.
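The domination test defined in this abstract can be sketched in a few lines. The code below is the naive pairwise baseline whose comparison cost the abstract describes, not HI-Sky itself, and it uses no hash index. It assumes that smaller values are better in every dimension; the names `dominates` and `naive_skyline` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q.

    p dominates q when p is no worse than q in every dimension and
    strictly better in at least one. Assumption: smaller is better.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def naive_skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates.

    This compares all pairs, so the cost grows quadratically with the
    number of points; index-based methods aim to avoid these comparisons.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

As a usage example, with hypothetical hotels given as `(price, distance)` pairs, `naive_skyline([(50, 8), (60, 3), (80, 1), (70, 5), (90, 2)])` drops `(70, 5)` because `(60, 3)` is both cheaper and closer, and drops `(90, 2)` because `(80, 1)` is both cheaper and closer.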

(This article belongs to the Special Issue Big Data Analysis and Visualization)

12 pages, 1251 KiB

Open Access Article

Large-Scale Data Computing Performance Comparisons on SYCL Heterogeneous Parallel Processing Layer Implementations

by Woosuk Shin, Kwan-Hee Yoo and Nakhoon Baek

Appl. Sci. 2020, 10(5), 1656; https://doi.org/10.3390/app10051656 - 01 Mar 2020

Cited by 11 | Viewed by 3827

Abstract

Today, many big data applications require massively parallel tasks to compute complicated mathematical operations. To perform parallel tasks, platforms like CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are widely used and developed to enhance the throughput of massively parallel tasks. There is also a need for high-level abstractions and platform independence over those massively parallel computing platforms. Recently, the Khronos Group announced SYCL (C++ Single-source Heterogeneous Programming for OpenCL), a new cross-platform abstraction layer, to provide an efficient way for single-source heterogeneous computing, with C++-template-level abstractions. However, since there has been no official implementation of SYCL, we currently have several different implementations from various vendors. In this paper, we analyse the characteristics of those SYCL implementations. We also show performance measures of those SYCL implementations, especially for well-known massively parallel tasks. We show that each implementation has its own strength in computing different types of mathematical operations, along with different sizes of data. Our analysis is available for fundamental measurements of the abstract-level cost-effective use of massively parallel computations, especially for big-data applications.

(This article belongs to the Special Issue Big Data Analysis and Visualization)

13 pages, 9909 KiB

Open Access Article

An Efficient Approach to Consolidating Job Schedulers in Traditional Independent Scientific Workflows

by Byungyun Kong, Geonmo Ryu, Sangwook Bae, Seo-Young Noh and Heejun Yoon

Appl. Sci. 2020, 10(4), 1455; https://doi.org/10.3390/app10041455 - 21 Feb 2020

Cited by 2 | Viewed by 1768

Abstract

The current research paradigm is one of data-driven research. Researchers are beginning to deploy computer facilities to produce and analyze large amounts of data. As requirements for computing power grow, data processing in traditional workstations is always under pressure for efficient resource management. In such an environment, a tremendous amount of data is being processed using parallel computing for efficient and effective research results. HTCondor, as an example, provides computing power for data analysis for researchers. Although such a system works well in a traditional computing cluster environment, we need an efficient methodology to meet the ever-increasing demands of computing using limited resources. In this paper, we propose an approach to integrating clusters that can share their computing power on the basis of a priority policy. Our approach makes it possible to share worker nodes while maintaining the resources allocated to each group. In addition, we have utilized the historical data of user usage in order to analyze problems that have occurred during job execution due to resource sharing and the actual operating results. Our findings can provide a reasonable guideline for sharing limited computing power among multiple scientific groups.

(This article belongs to the Special Issue Big Data Analysis and Visualization)

15 pages, 2646 KiB

Open Access Article

Real-Time Hand Gesture Spotting and Recognition Using RGB-D Camera and 3D Convolutional Neural Network

by Dinh-Son Tran, Ngoc-Huynh Ho, Hyung-Jeong Yang, Eu-Tteum Baek, Soo-Hyung Kim and Gueesang Lee

Appl. Sci. 2020, 10(2), 722; https://doi.org/10.3390/app10020722 - 20 Jan 2020

Cited by 62 | Viewed by 12406

Abstract

Using hand gestures is a natural method of interaction between humans and computers. We use gestures to express meaning and thoughts in our everyday conversations. Gesture-based interfaces are used in many applications in a variety of fields, such as smartphones, televisions (TVs), video gaming, and so on. With advancements in technology, hand gesture recognition is becoming an increasingly promising and attractive technique in human–computer interaction. In this paper, we propose a novel method for fingertip detection and hand gesture recognition in real time using an RGB-D camera and a 3D convolutional neural network (3DCNN). This system can accurately and robustly extract fingertip locations and recognize gestures in real time. We demonstrate the accuracy and robustness of the interface by evaluating hand gesture recognition across a variety of gestures. In addition, we develop a tool to manipulate computer programs to show the possibility of using hand gesture recognition. The experimental results showed that our system achieves a high level of accuracy in hand gesture recognition. This is thus considered to be a good approach to a gesture-based interface for human–computer interaction by hand in the future.

(This article belongs to the Special Issue Big Data Analysis and Visualization)

14 pages, 1053 KiB

Open Access Article

A Study on Data Profiling: Focusing on Attribute Value Quality Index

by Won-Jung Jang, Sung-Taek Lee, Jong-Bae Kim and Gwang-Yong Gim

Appl. Sci. 2019, 9(23), 5054; https://doi.org/10.3390/app9235054 - 22 Nov 2019

Cited by 6 | Viewed by 2633

Abstract

In the era of the Fourth Industrial Revolution, companies are focusing on securing artificial intelligence (AI) technology to enhance their competitiveness via machine learning, which is the core technology of AI, and to allow computers to acquire a high level of quality data through self-learning. Good-quality big data are becoming a very important asset for companies to enhance their competitiveness. The volume of digital information is expected to grow rapidly around the world, reaching 90 zettabytes (ZB) by 2020. It is very meaningful to present a value quality index for each data attribute, as a user may wish to evaluate whether the data are suitable for use from the user’s point of view. This allows the user to decide whether to take the data based on the data quality index. In this study, we propose a quality index calculation model with structured and unstructured data, as well as a calculation method for the attribute value quality index (AVQI) and the structured data value quality index (SDVQI).

(This article belongs to the Special Issue Big Data Analysis and Visualization)
