Cloud Computing for Big Data Analysis

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 August 2021) | Viewed by 17286

Special Issue Editors


Guest Editor
Department of Electronics, Computer Science and System Sciences (DIMES), University of Calabria Via Pietro Bucci – Cubo 41C (5th floor), 87036 Rende (CS), Italy
Interests: cloud computing; social media and Big Data analysis; distributed knowledge discovery; data mining

Special Issue Information

Dear Colleagues,

It is our pleasure to announce the opening of a new Special Issue in Applied Sciences. The main topic of the Issue is cloud computing for big data analysis.

In the era of the Internet of Things, huge amounts of digital data are generated and collected from many sources, such as sensors, mobile devices, and social media. This data, commonly referred to as big data, challenges current storage, processing, and analysis capabilities.

Novel technologies, architectures, and algorithms have been and are being developed to capture and analyze big data. For example, in the scientific and business fields, researchers and data scientists are analyzing big data to extract information and knowledge useful for making new discoveries and supporting decision processes. 

In this context, cloud computing is a valid and cost-effective solution for storing big data and executing data analytics applications. Thanks to elastic resource allocation and high computing power, it allows faster data analysis, which yields more timely results and therefore greater value from the data.

From this perspective, this Special Issue aims to contribute to the field, presenting the most relevant advances in this research area.

Topics of interest for this Special Issue include, but are not limited to, the following:

  • Programming models and algorithms for distributed computing environments;
  • Systems for data processing on cloud platforms;
  • Data analysis workflows for distributed environments;
  • Scalable data mining algorithms;
  • Programming models and scalable algorithms for big data;
  • Big data analytics and applications;
  • Applications of machine learning in big data;
  • Cloud-based data mining applications; and
  • Libraries, algorithms, and applications for big social data analysis.

We hope you will contribute your high-quality research, and we look forward to reading your results.

Dr. Fabrizio Marozzo
Dr. Loris Belcastro
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Cloud computing
  • Big data
  • Scalable data mining
  • Data analysis workflows
  • Social media analysis
  • Parallel and distributed algorithms
  • High performance computing
  • Machine learning applications

Published Papers (6 papers)


Editorial


4 pages, 183 KiB  
Editorial
Cloud Computing for Big Data Analysis
by Fabrizio Marozzo and Loris Belcastro
Appl. Sci. 2022, 12(20), 10567; https://doi.org/10.3390/app122010567 - 19 Oct 2022
Cited by 3 | Viewed by 1514
Abstract
With the spread of the Internet of Things, large amounts of digital data are generated and collected from different sources, such as sensors, cameras, in-vehicle infotainment, smart meters, mobile devices, applications, and web services [...]
(This article belongs to the Special Issue Cloud Computing for Big Data Analysis)

Research


14 pages, 396 KiB  
Article
Knowledge Discovery from Large Amounts of Social Media Data
by Loris Belcastro, Riccardo Cantini and Fabrizio Marozzo
Appl. Sci. 2022, 12(3), 1209; https://doi.org/10.3390/app12031209 - 24 Jan 2022
Cited by 7 | Viewed by 3254
Abstract
In recent years, social media analysis is arousing great interest in various scientific fields, such as sociology, political science, linguistics, and computer science. Large amounts of data gathered from social media are widely analyzed for extracting useful information concerning people’s behaviors and interactions. In particular, they can be exploited to analyze the collective sentiment of people, understand the behavior of user groups during global events, monitor public opinion close to important events, identify the main topics in a public discussion, or detect the most frequent routes followed by social media users. As an example of the countless works in the state-of-the-art on social media analysis, this paper presents three significant applications in the field of opinion and pattern mining from social media data: (i) an automatic application for discovering user mobility patterns, (ii) a novel application for estimating the political polarization of public opinion, and (iii) an application for discovering interesting social media discussion topics through a hashtag recommendation system. Such applications clearly highlight the abundance and wealth of useful information in many application contexts of human life that can be extracted from social media posts.
(This article belongs to the Special Issue Cloud Computing for Big Data Analysis)

12 pages, 933 KiB  
Article
A Computational Intelligence Approach to Predict Energy Demand Using Random Forest in a Cloudera Cluster
by Laura Cáceres, Jose Ignacio Merino and Norberto Díaz-Díaz
Appl. Sci. 2021, 11(18), 8635; https://doi.org/10.3390/app11188635 - 17 Sep 2021
Cited by 7 | Viewed by 2252
Abstract
Society’s energy consumption has shot up in recent years, making the prediction of its demand a current challenge to ensure an efficient and responsible use. Artificial intelligence techniques have proven to be potential tools in handling tedious tasks and making sense of large-scale data to make better business decisions in different areas of knowledge. In this article, the use of random forests algorithms in a Big Data environment is proposed for household energy demand forecasting. The predictions are based on the use of information from different sources, confirming a fundamental role of socioeconomic data in consumer’s behaviours. On the other hand, the use of Big Data architectures is proposed to perform horizontal and vertical scaling of the solution to be used in real environments. Finally, a tool for high-resolution predictions with great efficiency is introduced, which enables energy management in a very accurate way.
(This article belongs to the Special Issue Cloud Computing for Big Data Analysis)

13 pages, 526 KiB  
Article
Employing Vertical Elasticity for Efficient Big Data Processing in Container-Based Cloud Environments
by Jin-young Choi, Minkyoung Cho and Jik-Soo Kim
Appl. Sci. 2021, 11(13), 6200; https://doi.org/10.3390/app11136200 - 04 Jul 2021
Cited by 5 | Viewed by 2122
Abstract
Recently, “Big Data” platform technologies have become crucial for distributed processing of diverse unstructured or semi-structured data as the amount of data generated increases rapidly. In order to effectively manage these Big Data, Cloud Computing has been playing an important role by providing scalable data storage and computing resources for competitive and economical Big Data processing. Accordingly, server virtualization technologies that are the cornerstone of Cloud Computing have attracted a lot of research interests. However, conventional hypervisor-based virtualization can cause performance degradation problems due to its heavily loaded guest operating systems and rigid resource allocations. On the other hand, container-based virtualization technology can provide the same level of service faster with a lightweight capacity by effectively eliminating the guest OS layers. In addition, container-based virtualization enables efficient cloud resource management by dynamically adjusting the allocated computing resources (e.g., CPU and memory) during the runtime through “Vertical Elasticity”. In this paper, we present our practice and experience of employing an adaptive resource utilization scheme for Big Data workloads in container-based cloud environments by leveraging the vertical elasticity of Docker, a representative container-based virtualization technique. We perform extensive experiments running several Big Data workloads on representative Big Data platforms: Apache Hadoop and Spark. During the workload executions, our adaptive resource utilization scheme periodically monitors the resource usage patterns of running containers and dynamically adjusts allocated computing resources that could result in substantial improvements in the overall system throughput.
(This article belongs to the Special Issue Cloud Computing for Big Data Analysis)

13 pages, 1562 KiB  
Article
Benchmarking and Performance Evaluations on Various Configurations of Virtual Machine and Containers for Cloud-Based Scientific Workloads
by Syed Asif Raza Shah, Ahmad Waqas, Moon-Hyun Kim, Tae-Hyung Kim, Heejun Yoon and Seo-Young Noh
Appl. Sci. 2021, 11(3), 993; https://doi.org/10.3390/app11030993 - 22 Jan 2021
Cited by 9 | Viewed by 2813
Abstract
Cloud computing manages system resources such as processing, storage, and networking by providing users with multiple virtual machines (VMs) as needed. It is one of the rapidly growing fields that come with huge computational power for scientific workloads. Currently, the scientific community is ready to work over the cloud as it is considered as a resource-rich paradigm. The traditional way of executing scientific workloads on cloud computing is by using virtual machines. However, the latest emerging concept of containerization is growing more rapidly and gained popularity because of its unique features. Containers are treated as lightweight as compared to virtual machines in cloud computing. In this regard, a few VMs/containers-associated problems of performance and throughput are encountered because of middleware technologies such as virtualization or containerization. In this paper, we introduce the configurations of VMs and containers for cloud-based scientific workloads in order to utilize the technologies to solve scientific problems and handle their workloads. This paper also tackles throughput and efficiency problems related to VMs and containers in the cloud environment and explores efficient resource provisioning by combining four unique methods: hyperthreading (HT), vCPU cores selection, vCPU affinity, and isolation of vCPUs. The HEPSCPEC06 benchmark suite is used to evaluate the throughput and efficiency of VMs and containers. The proposed solution is to implement four basic techniques to reduce the effect of virtualization and containerization. Additionally, these techniques are used to make virtual machines and containers more effective and powerful for scientific workloads. The results show that allowing hyperthreading, isolation of CPU cores, proper numbering, and allocation of vCPU cores can improve the throughput and performance of virtual machines and containers.
(This article belongs to the Special Issue Cloud Computing for Big Data Analysis)

16 pages, 2520 KiB  
Article
Spatiotemporal Analysis of Web News Archives for Crime Prediction
by Areeba Umair, Muhammad Shahzad Sarfraz, Muhammad Ahmad, Usman Habib, Muhammad Habib Ullah and Manuel Mazzara
Appl. Sci. 2020, 10(22), 8220; https://doi.org/10.3390/app10228220 - 20 Nov 2020
Cited by 23 | Viewed by 3427
Abstract
In today’s world, security is the most prominent aspect which has been given higher priority. Despite the rapid growth and usage of digital devices, lucrative measurement of crimes in under-developing countries is still challenging. In this work, unstructural crime data (900 records) from the news archives of the previous eight years were extracted to predict the behavior of criminals’ networks and transform it into useful information using natural language processing (NLP). To estimate the next move of criminals in Pakistan, we performed hotspot-based spatial analysis. Later, this information is fed to two different classifiers for possible identification and prediction. We achieved the maximum accuracy of 92% using K-Nearest Neighbor (KNN) and 62% using the Random Forest algorithm. In terms of crimes, the results showed that the most prevalent crime events are robberies. Thus, the usage of digital information archives, spatial analysis, and machine learning techniques can open new ways of handling a peaceful and sustainable society in eradicating crimes for countries having paucity of financial resources.
(This article belongs to the Special Issue Cloud Computing for Big Data Analysis)
