Emerging Approaches and Advances in Big Data

A special issue of Symmetry (ISSN 2073-8994). This special issue belongs to the section "Computer".

Deadline for manuscript submissions: closed (30 June 2018) | Viewed by 92742

Special Issue Editors


Prof. Dr. Ka Lok Man
Guest Editor
Department of Computer Science and Software Engineering, Xi’an Jiaotong-Liverpool University, Suzhou Dushu Lake Higher Education Town, Suzhou Industrial Park, Suzhou, Jiangsu Province, China
Interests: wireless sensor networks; Internet of Things; artificial intelligence; photovoltaics

Dr. Kevin Lee
Guest Editor
School of Science and Technology, Nottingham Trent University, Clifton Campus, Nottingham NG11 8NS, UK

Special Issue Information

Dear Colleagues,

The growth of big data presents challenges, as well as opportunities, for industry and academia. Accumulated data can be extracted, processed, analyzed, and reported in time to deliver better insights, complex patterns, and valuable predictions for the design and analysis of various systems and platforms, including complex business models, highly scalable systems, reconfigurable hardware and software systems, and wireless sensor and actuator networks. The main building blocks of big data analytics include:

  • big data thinking

  • computational tools

  • data modelling

  • analytical algorithms

  • data governance

Big data thinking is an exciting area that involves not only an organization's data-related culture but also the initiation of big data projects, team formation, and best practices. Computational platforms and tools offer adaptive mechanisms that enable the understanding of data in complex and changing environments. Algorithms and analysis methods are the foundations of many solutions to real problems. Data and information governance and social responsibility directly affect data usage and the social acceptance of business solutions.

This Special Issue on “Emerging Approaches and Advances in Big Data” focuses on emerging approaches and recent advances in architectures, design techniques, modeling, and prototyping solutions for the design of complex business models, highly scalable systems, reconfigurable hardware and software systems, and computing networks in the era of big data.

Prof. Dr. Ka Lok Man
Dr. Kevin Lee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Symmetry is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data architecture, modelling and toolkits

  • big data for business model and intelligence

  • big data challenges for small, medium and large enterprises

  • big data analytics and innovations

  • big data systems/analytics on emerging hardware/software architectures and computing networks

Published Papers (18 papers)


Editorial


2 pages, 128 KiB  
Editorial
Emerging Approaches and Advances in Big Data
by Ka Lok Man and Kevin Lee
Symmetry 2019, 11(2), 213; https://doi.org/10.3390/sym11020213 - 13 Feb 2019
Viewed by 1703
Abstract
This special issue of Symmetry entitled “Emerging Approaches and Advances in Big Data” consists of 17 papers [...]
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)

Research


8 pages, 932 KiB  
Article
Improvement of Speech/Music Classification for 3GPP EVS Based on LSTM
by Sang-Ick Kang and Sangmin Lee
Symmetry 2018, 10(11), 605; https://doi.org/10.3390/sym10110605 - 07 Nov 2018
Cited by 4 | Viewed by 3084
Abstract
Competition in speech recognition technology for smartphones is now in full swing with the widespread adoption of Internet of Things (IoT) devices. For robust speech recognition, it is necessary to detect speech signals in various acoustic environments. Speech/music classification, which facilitates optimized signal processing based on the classification results, has been widely adopted as an essential part of various electronics applications, such as multi-rate audio codecs, automatic speech recognition, and multimedia document indexing. In this paper, we propose a new technique based on long short-term memory (LSTM) to improve the robustness of the speech/music classifier in the enhanced voice service (EVS) codec adopted as the voice-over-LTE (VoLTE) speech codec. For effective speech/music classification, the feature vectors used by the LSTM are chosen from the features of the EVS. To cope with the diversity of music data, a large amount of data is used for training. Experiments show that LSTM-based speech/music classification outperforms the conventional EVS speech/music classification algorithm across various conditions and types of speech/music data, especially at low signal-to-noise ratios (SNRs).
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
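
As an illustration of the classification side of this approach, below is a minimal PyTorch sketch of an LSTM classifier over per-frame feature vectors; the 12-dimensional input and the layer sizes are assumptions standing in for the EVS-derived features used in the paper.

```python
# Sketch: an LSTM classifier over per-frame feature vectors. The input
# dimension and layer sizes are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class SpeechMusicLSTM(nn.Module):
    def __init__(self, n_feats=12, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)      # logits: [speech, music]

    def forward(self, x):                     # x: (batch, frames, n_feats)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])          # classify from the last frame state

model = SpeechMusicLSTM()
frames = torch.randn(8, 50, 12)               # 8 clips, 50 frames each
print(model(frames).argmax(dim=1))            # 0 = speech, 1 = music (untrained)
```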

23 pages, 4317 KiB  
Article
A Robust Distributed Big Data Clustering Based on Adaptive Density Partitioning Using Apache Spark
by Behrooz Hosseini and Kourosh Kiani
Symmetry 2018, 10(8), 342; https://doi.org/10.3390/sym10080342 - 15 Aug 2018
Cited by 15 | Viewed by 4160
Abstract
Unsupervised machine learning and knowledge discovery from large-scale datasets have recently attracted considerable research interest. The present paper proposes a distributed big data clustering approach based on adaptive density estimation. The proposed method is developed on the Apache Spark framework and tested on some of the prevalent datasets. In the first step of this algorithm, the input data are divided into partitions using a Bayesian type of Locality Sensitive Hashing (LSH). Partitioning makes the processing fully parallel and much simpler by avoiding unneeded calculations. Each step of the proposed algorithm is completely independent of the others, and no serial bottleneck exists anywhere in the clustering procedure. Locality preservation also filters out the outliers and enhances the robustness of the proposed approach. Density is defined on the basis of an Ordered Weighted Averaging (OWA) distance, which makes clusters more homogeneous. According to the density of each node, the local density peaks are detected adaptively. By merging the local peaks, the final cluster centers are obtained, and the remaining data points are assigned to the cluster with the nearest center. The proposed method has been implemented and compared with similar recently published methods. Cluster validity indices achieved by the proposed method show its superiority in precision and noise robustness over recent studies, and comparison with similar approaches also shows its advantages in scalability, high performance, and low computation cost. The proposed method is a general clustering approach, and gene expression clustering is presented as a sample application.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
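
A minimal PySpark sketch of the partition-then-cluster idea is shown below: points are hashed into buckets with random-hyperplane LSH so that each bucket can be processed independently, and local density peaks are found per bucket. This is a simplified sketch that assumes plain LSH and Euclidean distance rather than the paper's Bayesian LSH and OWA distance.

```python
# Sketch: LSH partitioning followed by per-bucket density peaks.
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="lsh-density-sketch")

D, N_PLANES = 8, 4                              # feature dim, hyperplanes per hash
rng = np.random.default_rng(0)
planes = sc.broadcast(rng.normal(size=(N_PLANES, D)))

def bucket(x):
    bits = (planes.value @ x > 0).astype(int)   # sign pattern of projections
    return int("".join(map(str, bits)), 2)      # -> integer bucket id

def local_peaks(points, eps=3.0):
    pts = np.array(points)
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    dens = (d < eps).sum(axis=1)                # neighbours within eps
    # a peak has no denser point among its eps-neighbours
    return [pts[i] for i in range(len(pts))
            if all(dens[i] >= dens[j] for j in np.where(d[i] < eps)[0])]

data = sc.parallelize([rng.normal(size=D) for _ in range(1000)])
peaks = (data.map(lambda x: (bucket(x), x))     # fully parallel partitioning
             .groupByKey()
             .flatMap(lambda kv: local_peaks(list(kv[1])))
             .collect())
# merging nearby local peaks would then give the final cluster centres
print(len(peaks), "local density peaks found")
```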

15 pages, 2227 KiB  
Article
A Quick Gbest Guided Artificial Bee Colony Algorithm for Stock Market Prices Prediction
by Habib Shah, Nasser Tairan, Harish Garg and Rozaida Ghazali
Symmetry 2018, 10(7), 292; https://doi.org/10.3390/sym10070292 - 20 Jul 2018
Cited by 26 | Viewed by 4900
Abstract
The objective of this work is to present a Quick Gbest Guided artificial bee colony (ABC) learning algorithm to train a feedforward neural network (QGGABC-FFNN) model for predicting trends in stock markets. Stock market trend prediction is a significant global financial issue: scientists, financial administrators, companies, and national leaderships strive to develop strong financial positions. Several technical, industrial, fundamental, scientific, and statistical tools have been proposed and used with varying results. Still, predicting an exact or near-exact trend of stock market values remains an open problem. In this respect, the present manuscript proposes an algorithm based on ABC to minimize the error between predicted and actual trend values using a hybrid technique based on neural networks and artificial intelligence. The presented approach has been verified and tested on predicting the trend of Saudi Stock Market (SSM) values. The proposed QGGABC-FFNN, built on a bio-inspired learning algorithm with a high degree of accuracy, could serve as an investment advisor for investors and traders in the SSM. The approach is based mainly on SSM historical data covering a large span of time. From the simulation findings, the proposed QGGABC-FFNN outperformed typical computational algorithms for the prediction of SSM values.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
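
For reference, the gbest-guided position update at the core of such ABC variants can be sketched in numpy as follows (in the style of Zhu and Kwong's gbest-guided ABC); the quadratic objective is only a placeholder for the paper's feedforward-network training error.

```python
# Sketch: gbest-guided ABC position update with greedy selection.
import numpy as np

rng = np.random.default_rng(1)
def objective(x):                 # placeholder for the FFNN prediction error
    return np.sum(x ** 2)

SN, D, C, MAX_ITER = 20, 10, 1.5, 200
food = rng.uniform(-5, 5, (SN, D))            # one food source per employed bee
fit = np.array([objective(x) for x in food])

for _ in range(MAX_ITER):
    gbest = food[np.argmin(fit)]              # best solution found so far
    for i in range(SN):
        k = rng.choice([j for j in range(SN) if j != i])  # random partner
        j = rng.integers(D)                               # one dimension
        phi = rng.uniform(-1, 1)
        psi = rng.uniform(0, C)                           # gbest guidance weight
        cand = food[i].copy()
        cand[j] = food[i][j] + phi * (food[i][j] - food[k][j]) \
                             + psi * (gbest[j] - food[i][j])
        f = objective(cand)
        if f < fit[i]:                                    # greedy selection
            food[i], fit[i] = cand, f

print("best error:", fit.min())
```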

16 pages, 609 KiB  
Article
A Local Approximation Approach for Processing Time-Evolving Graphs
by Shuo Ji and Yinliang Zhao
Symmetry 2018, 10(7), 247; https://doi.org/10.3390/sym10070247 - 01 Jul 2018
Cited by 4 | Viewed by 2993
Abstract
To efficiently process time-evolving graphs, where new vertices and edges are inserted over time, an incremental computing model, which processes the newly-constructed graph based on the results of the computation on the outdated graph, is widely adopted in distributed time-evolving graph computing systems. In this paper, we first experimentally study how the results of graph computation on the local graph structure can approximate the results of graph computation on the complete graph structure in distributed environments. Then, we develop an optimization approach that reduces the response time in bulk synchronous parallel (BSP)-based incremental computing systems by processing time-evolving graphs on the local graph structure instead of the complete graph structure. We evaluated our optimization approach using the graph algorithms single-source shortest path (SSSP) and PageRank on the Amazon Elastic Compute Cloud (EC2), a central part of Amazon.com’s cloud-computing platform, with graph datasets of different scales. The experimental results demonstrate that the local approximation approach reduces the response time for the SSSP algorithm by 22% and for the PageRank algorithm by 7% on average, compared to the existing incremental computing framework GraphTau.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
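
The local-approximation intuition can be illustrated with a small incremental single-source shortest path sketch: when an edge is inserted, only vertices whose distances can improve are re-relaxed instead of recomputing over the complete graph. This is a generic sketch, not GraphTau's implementation.

```python
# Sketch: incremental SSSP on a growing graph.
import heapq
from collections import defaultdict

graph = defaultdict(list)      # u -> [(v, w)]
dist = defaultdict(lambda: float("inf"))

def insert_edge(u, v, w):
    graph[u].append((v, w))
    if dist[u] + w < dist[v]:  # the new edge shortens v; propagate locally
        dist[v] = dist[u] + w
        pq = [(dist[v], v)]
        while pq:
            d, x = heapq.heappop(pq)
            if d > dist[x]:
                continue
            for y, wxy in graph[x]:
                if d + wxy < dist[y]:
                    dist[y] = d + wxy
                    heapq.heappush(pq, (dist[y], y))

dist["s"] = 0                  # source vertex
insert_edge("s", "a", 2); insert_edge("a", "b", 1); insert_edge("s", "b", 5)
print(dist["b"])               # 3: updated without a full recomputation
```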

16 pages, 2728 KiB  
Article
Research on Electronic Voltage Transformer for Big Data Background
by Zhen-Hua Li, Yao Wang, Zheng-Tian Wu and Zhen-Xing Li
Symmetry 2018, 10(7), 234; https://doi.org/10.3390/sym10070234 - 21 Jun 2018
Cited by 24 | Viewed by 3484
Abstract
A new type of electronic voltage transformer for the big data background is proposed in this study. Using the conventional inverted SF6 transformer insulation structure, a coaxial capacitor sensor was constructed by designing a middle coaxial electrode between the high-voltage electrode and the ground electrode. The voltage signal can be measured by detecting the capacitive current i of the SF6 coaxial capacitor. To improve the accuracy of the integrator, a high-precision digital integrator based on the Romberg algorithm is proposed. This not only guarantees computational accuracy but also reduces computation time, and the sampling points can be reused. By adopting the double shielding effect of the high-voltage shell and the grounded metal shield, the ability and stability of the coaxial capacitor divider in resisting the interference of stray electric fields can be effectively improved. Factors that affect the value of the coaxial capacitor, such as position, temperature, and pressure, were studied. Tests were carried out to verify the performance. The results showed that the voltage transformer based on the SF6 coaxial capacitor satisfies the requirements of the 0.2 accuracy class. This study can promote the use of new high-performance products for data transmission in the era of big data and specific test analyses.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
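
The integrator's key properties, Richardson extrapolation for accuracy and reuse of earlier sampling points, follow directly from the Romberg scheme, sketched below; the cosine integrand is only a stand-in for the measured capacitive current.

```python
# Sketch: a Romberg-table integrator. Trapezoid estimates are refined by
# Richardson extrapolation, and earlier samples are reused at each level.
import numpy as np

def romberg(f, a, b, levels=6):
    R = np.zeros((levels, levels))
    h = b - a
    R[0, 0] = 0.5 * h * (f(a) + f(b))
    for n in range(1, levels):
        h /= 2.0
        # only the new midpoints are evaluated; earlier samples are reused
        mids = a + h * np.arange(1, 2 ** n, 2)
        R[n, 0] = 0.5 * R[n - 1, 0] + h * np.sum(f(mids))
        for m in range(1, n + 1):  # Richardson extrapolation across the row
            R[n, m] = R[n, m - 1] + (R[n, m - 1] - R[n - 1, m - 1]) / (4 ** m - 1)
    return R[levels - 1, levels - 1]

# integrating a capacitive current i(t) ~ cos(t) recovers the voltage ~ sin(t)
print(romberg(np.cos, 0.0, np.pi / 2))  # ~1.0
```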

13 pages, 292 KiB  
Article
Adaptive Incremental Genetic Algorithm for Task Scheduling in Cloud Environments
by Kairong Duan, Simon Fong, Shirley W. I. Siu, Wei Song and Steven Sheng-Uei Guan
Symmetry 2018, 10(5), 168; https://doi.org/10.3390/sym10050168 - 17 May 2018
Cited by 23 | Viewed by 4386
Abstract
Cloud computing is a new commercial model that enables customers to acquire large amounts of virtual resources on demand. Resources, including hardware and software, can be delivered as services and measured by specific usage of storage, processing, bandwidth, etc. In Cloud computing, task scheduling is the process of mapping cloud tasks to Virtual Machines (VMs). When binding tasks to VMs, the scheduling strategy has an important influence on the efficiency of the datacenter and the related energy consumption. Although many traditional scheduling algorithms have been applied in various platforms, they may not work efficiently due to the large number of user requests, the variety of computation resources, and the complexity of the Cloud environment. In this paper, we tackle the task scheduling problem, which aims to minimize the makespan, using a Genetic Algorithm (GA). We propose an incremental GA with adaptive probabilities of crossover and mutation; the mutation and crossover rates change over the generations and also vary between individuals. Large numbers of tasks are randomly generated to simulate various scales of the task scheduling problem in the Cloud environment. Based on the instance types of Amazon EC2, we implemented virtual machines with different computing capacities on CloudSim. We compared the performance of the adaptive incremental GA with that of the Standard GA, Min-Min, Max-Min, Simulated Annealing, and the Artificial Bee Colony Algorithm in finding the optimal scheme. Experimental results show that the proposed algorithm can achieve feasible solutions with acceptable makespan in less computation time.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
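
A minimal sketch of such an adaptive GA is given below; the crossover and mutation schedules, task lengths, and VM speeds are illustrative assumptions, with makespan as the fitness function.

```python
# Sketch: GA for task-to-VM scheduling with rates that fall over the
# generations and differ per individual.
import numpy as np

rng = np.random.default_rng(7)
N_TASKS, N_VMS, POP, GENS = 100, 10, 40, 200
length = rng.uniform(1e3, 1e5, N_TASKS)        # task lengths (MI), assumed
mips = rng.uniform(500, 2000, N_VMS)           # VM speeds, assumed

def makespan(chrom):                           # chrom[i] = VM of task i
    load = np.zeros(N_VMS)
    np.add.at(load, chrom, length)
    return (load / mips).max()

pop = rng.integers(0, N_VMS, (POP, N_TASKS))
for g in range(GENS):
    fit = np.array([makespan(c) for c in pop])
    pop = pop[np.argsort(fit)]                 # elitist: best individuals first
    rank = np.arange(POP) / POP
    pc = (0.9 - 0.4 * g / GENS) * (1 - 0.5 * rank)   # adaptive crossover rate
    pm = (0.10 - 0.08 * g / GENS) + 0.05 * rank      # adaptive mutation rate
    children = pop.copy()
    for i in range(2, POP, 2):                 # keep two elites untouched
        a, b = pop[rng.integers(POP // 2)], pop[rng.integers(POP // 2)]
        cut = rng.integers(1, N_TASKS)
        if rng.random() < pc[i]:               # single-point crossover
            children[i, :cut], children[i, cut:] = a[:cut], b[cut:]
            children[i + 1, :cut], children[i + 1, cut:] = b[:cut], a[cut:]
        for c in (i, i + 1):                   # per-gene mutation
            mask = rng.random(N_TASKS) < pm[c]
            children[c, mask] = rng.integers(0, N_VMS, mask.sum())
    pop = children

print("best makespan:", min(makespan(c) for c in pop))
```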

16 pages, 1364 KiB  
Article
Carbon Oxides Gases for Occupancy Counting and Emergency Control in Fog Environment
by Kairong Duan, Simon Fong, Yan Zhuang and Wei Song
Symmetry 2018, 10(3), 66; https://doi.org/10.3390/sym10030066 - 15 Mar 2018
Cited by 3 | Viewed by 3833
Abstract
Information on human occupancy plays a crucial role in building management: fewer people means less demand for heat and electricity, and vice versa. Moreover, when there is a fire in a building, it is convenient to know how many people there are in each room in order to plan a more efficient rescue strategy. However, most buildings currently lack adequate devices for counting the number of people, and the most popular embedded fire alarm systems trigger a warning only when a fire breaks out with plenty of smoke. In view of this constraint, in this paper we propose a warning system based on carbon oxide gases to detect potential fire breakouts and to estimate the number of people in the proximity. To validate the efficiency of the devised system, we simulate its application in a Fog Computing environment. Furthermore, we also improve iFogSim by adding data analytics capacity to it. Based on this framework, the energy consumption, latency, and network usage of the designed system obtained from iFogSim are compared with those obtained from a Cloud environment.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
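
For intuition, occupancy can be estimated from CO2 readings with a generic steady-state mass-balance model, sketched below; the ventilation and per-person generation rates are assumed values, and this is not necessarily the estimator used in the paper.

```python
# Sketch: steady-state CO2 mass balance, N = Q * (C_in - C_out) / G.
# The default rates are typical assumed values, not the paper's parameters.
def occupants(c_in_ppm, c_out_ppm=400.0, airflow_m3_h=120.0, gen_m3_h=0.02):
    """airflow: outdoor-air ventilation rate; gen: CO2 generated per person."""
    delta = max(c_in_ppm - c_out_ppm, 0.0) * 1e-6   # ppm -> volume fraction
    return airflow_m3_h * delta / gen_m3_h

print(round(occupants(1100.0)))  # ~4 occupants at 1100 ppm indoors
```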

15 pages, 4123 KiB  
Article
Detecting Ghost Targets Using Multilayer Perceptron in Multiple-Target Tracking
by In-hwan Ryu, Insu Won and Jangwoo Kwon
Symmetry 2018, 10(1), 16; https://doi.org/10.3390/sym10010016 - 04 Jan 2018
Cited by 15 | Viewed by 5449
Abstract
This paper presents a method for removing ghost targets, which are not real objects, from the output of a multiple-object-tracking algorithm. The method uses an artificial neural network (a multilayer perceptron) and introduces its structure together with learning, verification, and evaluation methods. The implemented system was tested at an intersection in a city center. In a 28-minute measurement, the multilayer perceptron for ghost-target classification detected ghost targets with 88% accuracy, while in 6.7% of cases ghost targets were mistaken for actual targets. This method is expected to contribute to the advancement of intelligent transportation systems once the weaknesses revealed during the evaluation of the system are complemented and refined.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
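
A minimal scikit-learn sketch of such a ghost-target filter is shown below; the input features and synthetic data are illustrative assumptions, not the paper's actual tracker outputs.

```python
# Sketch: an MLP that labels tracker outputs as ghost (1) or real (0).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
# columns could be e.g. range, speed, amplitude, track age (assumed features)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] - 0.8 * X[:, 2] > 0).astype(int)   # stand-in ghost/real rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```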

7571 KiB  
Article
A Novel String Grammar Unsupervised Possibilistic C-Medians Algorithm for Sign Language Translation Systems
by Atcharin Klomsae, Sansanee Auephanwiriyakul and Nipon Theera-Umpon
Symmetry 2017, 9(12), 321; https://doi.org/10.3390/sym9120321 - 19 Dec 2017
Cited by 7 | Viewed by 4640
Abstract
Sign language is a basic method for solving communication problems between deaf and hearing people. In order to communicate, deaf and hearing people normally use hand gestures, which combine hand positioning, hand shapes, and hand movements. Thai Sign Language is the communication method for Thai hearing-impaired people. Our objective is to improve the dynamic Thai Sign Language translation method with a video captioning technique that does not require prior hand region detection and segmentation, using the Scale Invariant Feature Transform (SIFT) method and the String Grammar Unsupervised Possibilistic C-Medians (sgUPCMed) algorithm. This work is the first to propose the sgUPCMed algorithm to cope with the unsupervised generation of multiple prototypes in the possibilistic sense for string data. In our experiments, the Thai Sign Language data set (10 isolated sign language words) was collected from 25 subjects. On the blind test data sets within a constrained environment, the best average result was 89–91% for signer-dependent cases and 81–85% for signer semi-independent cases. For the blind test data sets of signer-independent cases, the best average classification rate was 77–80%. The average result of the system without a constrained environment was around 62–80% for the signer-independent experiments. To show that the proposed algorithm can be applied to other sign languages, the American Sign Language (RWTH-BOSTON-50) data set, which consists of 31 isolated American Sign Language words, was also used in the experiments. The system achieves 88.56% and 91.35% on the validation set alone, and on both the training and validation sets, respectively.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
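
The flavor of possibilistic c-medians over string data can be sketched with Levenshtein distance as follows; this simplified version takes initial centers as given and omits sgUPCMed's unsupervised generation of multiple prototypes.

```python
# Sketch: possibilistic c-medians over strings with Levenshtein distance.
import numpy as np

def lev(s, t):
    d = np.arange(len(t) + 1)              # one-row edit-distance DP
    for i, cs in enumerate(s, 1):
        prev, d[0] = d[0], i
        for j, ct in enumerate(t, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (cs != ct))
    return d[-1]

def c_medians(strings, centers, m=2.0, iters=10):
    for _ in range(iters):
        dist = np.array([[lev(s, v) for v in centers] for s in strings])
        eta = dist.mean(axis=0) + 1e-9     # per-cluster scale
        typ = 1.0 / (1.0 + (dist / eta) ** (1.0 / (m - 1.0)))  # typicality
        for k in range(len(centers)):      # median: min typicality-weighted cost
            cost = [(typ[:, k] ** m * np.array([lev(s, v) for s in strings])).sum()
                    for v in strings]
            centers[k] = strings[int(np.argmin(cost))]
    return centers, typ

words = ["abba", "abca", "abba", "xyyz", "xyz", "xxyz"]
print(c_medians(words, centers=["abba", "xxyz"])[0])
```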

3196 KiB  
Article
System Framework for Cardiovascular Disease Prediction Based on Big Data Technology
by Sang Hun Han, Kyoung Ok Kim, Eun Jong Cha, Kyung Ah Kim and Ho Sun Shon
Symmetry 2017, 9(12), 293; https://doi.org/10.3390/sym9120293 - 27 Nov 2017
Cited by 16 | Viewed by 6217
Abstract
Amid growing concern over the changing climate, environment, and health care, the interconnection between cardiovascular disease, rapid industrialization, and a variety of environmental factors has been the focus of recent research. It is necessary to study risk-factor extraction techniques that consider individual external factors and predict diseases and conditions. Therefore, we designed a framework to collect and store data from various domains on the causes of cardiovascular disease, and constructed an integrated big data database. A variety of open-source databases were integrated and migrated onto distributed storage devices. The integrated database comprises clinical data on cardiovascular diseases, national health and nutrition examination surveys, statistical geographic information, population and housing censuses, meteorological administration data, and Health Insurance Review and Assessment Service data. The framework is composed of data, speed, analysis, and service layers, all stored on distributed storage devices. Finally, we propose a framework for a cardiovascular disease prediction system based on the lambda architecture to solve the problems associated with real-time analyses of big data. This system can be used to help predict and diagnose illnesses such as cardiovascular diseases.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
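
The lambda-architecture idea behind the proposed framework can be sketched as follows: a batch layer holds precomputed views, a speed layer holds recent increments, and queries merge the two. The keys and counters are toy stand-ins for the framework's risk-factor views.

```python
# Sketch: batch layer + speed layer + serving-layer merge (lambda architecture).
from collections import defaultdict

class BatchLayer:
    def __init__(self):
        self.view = defaultdict(int)     # e.g. risk-factor counts per region
    def recompute(self, master_dataset):
        self.view.clear()                # full recomputation over all history
        for key in master_dataset:
            self.view[key] += 1

class SpeedLayer:
    def __init__(self):
        self.delta = defaultdict(int)    # increments since the last batch run
    def ingest(self, key):
        self.delta[key] += 1

def query(batch, speed, key):
    return batch.view[key] + speed.delta[key]   # serving layer merges both

batch, speed = BatchLayer(), SpeedLayer()
batch.recompute(["seoul", "seoul", "busan"])    # nightly batch job
speed.ingest("seoul")                           # real-time event
print(query(batch, speed, "seoul"))             # 3
```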

3246 KiB  
Article
Self-Adaptive Pre-Processing Methodology for Big Data Stream Mining in Internet of Things Environmental Sensor Monitoring
by Kun Lan, Simon Fong, Wei Song, Athanasios V. Vasilakos and Richard C. Millham
Symmetry 2017, 9(10), 244; https://doi.org/10.3390/sym9100244 - 21 Oct 2017
Cited by 13 | Viewed by 5801
Abstract
Over the years, advanced IT technologies have facilitated the emergence of new ways of generating and gathering data rapidly, continuously, and in large volumes, associated with a new research and application branch, namely data stream mining (DSM). Among the many scenarios of DSM, the Internet of Things (IoT) plays a significant role and typically presents a tough and challenging computational case of big data. In this paper, we describe a self-adaptive approach to the pre-processing step of data stream classification. The proposed algorithm allows different partitions, with both variable numbers and lengths of sub-windows, under a whole sliding window on an input stream, and clustering-based particle swarm optimization (CPSO) is adopted as the main metaheuristic search method to guarantee that the stream segmentations are effective and self-adaptive. To create a richer search space, statistical feature extraction (SFX) is applied after the variable partitioning of the entire sliding window. We validate and test our algorithm against other temporal methods on several IoT environmental sensor monitoring datasets. The experiments yield encouraging outcomes, supporting the claim that heuristically selecting appropriate variable sub-window segmentations with an integrated clustering technique allows the method to outperform the alternatives.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
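
As an illustration of the pre-processing step, the sketch below applies statistical feature extraction (SFX) over a variable partition of a sliding window; in the paper the sub-window boundaries are searched by CPSO, whereas here they are fixed by assumption.

```python
# Sketch: SFX over variable sub-windows of a sliding window.
import numpy as np

def sfx(window, boundaries):
    """Split `window` at `boundaries` and describe each sub-window."""
    feats = []
    for seg in np.split(np.asarray(window), boundaries):
        feats += [seg.mean(), seg.std(), seg.min(), seg.max()]
    return np.array(feats)

# a noisy sensor window; in the paper the boundaries would come from CPSO
stream_window = np.sin(np.linspace(0, 6, 120)) \
    + 0.1 * np.random.default_rng(0).normal(size=120)
print(sfx(stream_window, boundaries=[30, 75]))  # 3 sub-windows -> 12 features
```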

1360 KiB  
Article
Toward Bulk Synchronous Parallel-Based Machine Learning Techniques for Anomaly Detection in High-Speed Big Data Networks
by Kamran Siddique, Zahid Akhtar, Haeng-gon Lee, Woongsup Kim and Yangwoo Kim
Symmetry 2017, 9(9), 197; https://doi.org/10.3390/sym9090197 - 19 Sep 2017
Cited by 19 | Viewed by 7837
Abstract
Anomaly detection systems, also known as intrusion detection systems (IDSs), continuously monitor network traffic with the aim of identifying malicious actions. Extensive research has been conducted to build efficient IDSs emphasizing two essential characteristics. The first is concerned with finding optimal feature selection, while the other deals with employing robust classification schemes. However, the advent of big data concepts in the anomaly detection domain and the appearance of sophisticated network attacks in the modern era require some fundamental methodological revisions to develop IDSs. We therefore first identify two additional significant characteristics: the need to employ specialized big data processing frameworks, and the use of appropriate datasets for validating a system's performance, which is largely overlooked in existing studies. We then develop an anomaly detection system that comprehensively follows these four characteristics; i.e., the proposed system (i) performs feature ranking and selection using information gain and automated branch-and-bound algorithms, respectively; (ii) employs logistic regression and extreme gradient boosting techniques for classification; (iii) introduces bulk synchronous parallel processing to cater to the computational requirements of high-speed big data networks; and (iv) uses the real-time contemporary dataset of the Information Security Centre of Excellence at the University of New Brunswick for performance evaluation. We present experimental results that verify the efficacy of the proposed system.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
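
A minimal sketch of the feature-ranking and classification stages follows, using scikit-learn's mutual information for the information-gain ranking and gradient boosting as a stand-in for extreme gradient boosting; the synthetic flow features are assumptions.

```python
# Sketch: information-gain feature ranking, then two classifiers.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(3000, 20))                  # flow features (placeholder)
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=3000) > 0).astype(int)

ig = mutual_info_classif(X, y, random_state=0)   # information-gain-style ranking
top = np.argsort(ig)[::-1][:8]                   # keep the 8 best features

for model in (LogisticRegression(max_iter=1000), GradientBoostingClassifier()):
    model.fit(X[:2000][:, top], y[:2000])
    print(type(model).__name__, model.score(X[2000:][:, top], y[2000:]))
```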

2368 KiB  
Article
A Robust Method for Finding the Automated Best Matched Genes Based on Grouping Similar Fragments of Large-Scale References for Genome Assembly
by Jaehee Jung, Jong Im Kim, Young-Sik Jeong and Gangman Yi
Symmetry 2017, 9(9), 192; https://doi.org/10.3390/sym9090192 - 13 Sep 2017
Cited by 8 | Viewed by 4672
Abstract
Big data research on genomic sequence analysis has accelerated considerably with the development of next-generation sequencing. Currently, research on genomic sequencing is conducted using various methods, ranging from the assembly of reads consisting of fragments to the annotation of genetic information using databases of known genome information. Most tools for analyzing the genetic information of new organelles require different input formats, such as FASTA, GenBank (GB), and tab-separated files, and these data formats must be modified to satisfy the requirements of the gene annotation system after genome assembly. In addition, the currently available tools for the analysis of organelles are usually developed only for specific organisms; thus, the need for gene prediction tools that are useful for any organism has increased. The proposed method, termed the genome_search_plotter, is designed for the easy analysis of genome information from related references without any file format modification. Anyone who is interested in intracellular organelles such as the nucleus, chloroplast, and mitochondria can analyze the genetic information using the assembled contig of an unknown genome and a reference model without any modification of the data from the assembled contig.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)

2395 KiB  
Article
An Efficient and Energy-Aware Cloud Consolidation Algorithm for Multimedia Big Data Applications
by JongBeom Lim, HeonChang Yu and Joon-Min Gil
Symmetry 2017, 9(9), 184; https://doi.org/10.3390/sym9090184 - 06 Sep 2017
Cited by 8 | Viewed by 4919
Abstract
It is well known that cloud computing has many potential advantages over traditional distributed systems. Many enterprises can build their own private clouds with open-source infrastructure-as-a-service (IaaS) frameworks. Since enterprise applications and data are migrating to private clouds, the performance of cloud computing environments is of utmost importance for both cloud providers and users. To improve performance, previous studies on cloud consolidation have focused on live migration of virtual machines based on resource utilization. However, these approaches are not suitable for multimedia big data applications. In this paper, we reveal the performance bottleneck of multimedia big data applications in cloud computing environments and propose a cloud consolidation algorithm that considers application types. We show that our consolidation algorithm outperforms previous approaches.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
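
A minimal sketch of type-aware consolidation is shown below: multimedia VMs avoid hosts that are already I/O-heavy, while other VMs are placed by CPU utilization alone. The thresholds and host fields are illustrative assumptions, not the paper's exact policy.

```python
# Sketch: pick a migration target host, taking the application type into account.
def pick_host(vm, hosts):
    ok = [h for h in hosts
          if h["cpu"] + vm["cpu"] <= 0.8                       # CPU headroom
          and (vm["type"] != "multimedia" or h["io"] + vm["io"] <= 0.6)]
    # most-loaded feasible host first -> fewer active hosts, less energy
    return max(ok, key=lambda h: h["cpu"], default=None)

hosts = [{"cpu": 0.55, "io": 0.7}, {"cpu": 0.40, "io": 0.2}]
vm = {"cpu": 0.2, "io": 0.3, "type": "multimedia"}
print(pick_host(vm, hosts))  # the second host: the first is I/O-saturated
```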

2685 KiB  
Article
Using Knowledge Transfer and Rough Set to Predict the Severity of Android Test Reports via Text Mining
by Shikai Guo, Rong Chen and Hui Li
Symmetry 2017, 9(8), 161; https://doi.org/10.3390/sym9080161 - 17 Aug 2017
Cited by 19 | Viewed by 5171
Abstract
Crowdsourcing is an appealing and economic solution to software application testing because of its ability to reach a large international audience. However, crowdsourced testing can bring a large number of bug reports. Thus, in crowdsourced software testing, the inspection of a large number of test reports is an enormous but essential software maintenance task. Automatic prediction of the severity of crowdsourced test reports is therefore important because of their high numbers and large proportion of noise. Most existing approaches to this problem utilize supervised machine learning techniques, which often require users to manually label a large number of training data. However, Android test reports are not labeled with their severity level, and manual labeling is time-consuming and labor-intensive. To address these problems, we propose a Knowledge Transfer Classification (KTC) approach based on text mining and machine learning methods to predict the severity of test reports. Our approach obtains training data from bug repositories and uses knowledge transfer to predict the severity of Android test reports. In addition, our approach uses an Importance Degree Reduction (IDR) strategy based on rough set theory to extract characteristic keywords and obtain more accurate reduction results. The results of several experiments indicate that our approach is beneficial for predicting the severity of Android test reports.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
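
The knowledge-transfer setup can be sketched as follows: a severity classifier trained on labelled bug-repository reports is applied to unlabelled Android test reports. This toy sketch uses TF-IDF with naive Bayes and omits the rough-set-based IDR keyword reduction.

```python
# Sketch: train on a labelled source domain, predict on an unlabelled target.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# labelled source domain: bug-repository summaries (toy examples)
src_texts = ["app crashes on launch", "typo in settings label",
             "data loss after sync", "minor UI misalignment"]
src_severity = ["severe", "minor", "severe", "minor"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(src_texts, src_severity)

# unlabelled target domain: a crowdsourced Android test report
print(clf.predict(["application crashes when syncing data"]))  # expected: ['severe']
```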

19024 KiB  
Article
A Case Study on Iteratively Assessing and Enhancing Wearable User Interface Prototypes
by Hyoseok Yoon, Se-Ho Park, Kyung-Taek Lee, Jung Wook Park, Anind K. Dey and SeungJun Kim
Symmetry 2017, 9(7), 114; https://doi.org/10.3390/sym9070114 - 10 Jul 2017
Cited by 9 | Viewed by 8019
Abstract
Wearable devices are being explored and investigated as a promising computing platform, as well as a source of personal big data, for the post-smartphone era. To deal with a series of rapidly developed wearable prototypes, a well-structured strategy is required to assess the prototypes at various development stages. In this paper, we first design and develop variants of advanced wearable user interface prototypes, including joystick-embedded, potentiometer-embedded, motion-gesture, and contactless infrared user interfaces, for rapidly assessing the hands-on user experience of potential futuristic user interfaces. To achieve this goal systematically, we propose a conceptual test framework and present a case study of using the proposed framework in an iterative cyclic process to prototype, test, analyze, and refine the wearable user interface prototypes. We attempt to improve the usability of the user interface prototypes by integrating initial user feedback into the leading phase of the test framework. In the following phase of the test framework, we track signs of improvement through the overall results of usability assessments, task workload assessments, and user experience evaluations of the prototypes. The presented comprehensive and in-depth case study demonstrates that the iterative approach employed by the test framework was effective in assessing and enhancing the prototypes, as well as in gaining insights on potential applications and establishing practical guidelines for effective and usable wearable user interface development.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)

Other


5818 KiB  
Project Report
A Study on Big Data Thinking of the Internet of Things-Based Smart-Connected Car in Conjunction with Controller Area Network Bus and 4G-Long Term Evolution
by Donghwoon Kwon, Suwoo Park and Jeong-Tak Ryu
Symmetry 2017, 9(8), 152; https://doi.org/10.3390/sym9080152 - 09 Aug 2017
Cited by 19 | Viewed by 10140
Abstract
A smart connected car in conjunction with the Internet of Things (IoT) is an emerging topic. The fundamental concept of the smart connected car is connectivity, and such connectivity can be provided in three forms: Vehicle-to-Vehicle (V2V), Vehicle-to-Infrastructure (V2I), and Vehicle-to-Everything (V2X). To meet the V2V and V2I aspects of connectivity, we developed modules in accordance with the international standards On-Board Diagnostics II (OBDII) and 4G Long Term Evolution (4G-LTE) to obtain and transmit vehicle information, as well as software to visually check the information provided by our modules. Information related to a user’s driving, transmitted to a cloud-based Distributed File System (DFS), was then analyzed for the purpose of big data analysis to provide information on driving habits to users. Since this work is an ongoing research project, we focus on proposing a system architecture and design in terms of big data analysis. Our contributions through this work are as follows: (1) develop modules based on the Controller Area Network (CAN) bus, OBDII, and 4G-LTE; (2) develop software to check vehicle information on a PC; (3) implement a database of vehicle diagnostic codes; (4) propose a system architecture and design for big data analysis.
(This article belongs to the Special Issue Emerging Approaches and Advances in Big Data)
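
As a small illustration of the vehicle-information side, the sketch below decodes two standard OBD-II mode-01 PID responses of the kind such a module reads from the CAN bus; the scaling formulas are from the OBD-II standard, while the example frames are made up.

```python
# Sketch: decode OBD-II mode-01 PID payloads (standard scaling formulas).
def decode_pid(pid, data):
    if pid == 0x0C:                       # engine RPM: ((A*256)+B)/4
        return ((data[0] << 8) | data[1]) / 4.0
    if pid == 0x0D:                       # vehicle speed: A km/h
        return float(data[0])
    raise ValueError(f"unsupported PID {pid:#04x}")

print(decode_pid(0x0C, bytes([0x1A, 0xF8])))  # 1726.0 rpm
print(decode_pid(0x0D, bytes([0x4B])))        # 75.0 km/h
```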
