Application of Machine Learning in Big Data

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (31 August 2023) | Viewed by 6587

Special Issue Editors


Guest Editor
Department of Computer Engineering and Information Security, International Information Technology University, Almaty, Kazakhstan
Interests: big data; data mining; machine learning; WSNs; cloud computing; network security; ambient intelligence

Guest Editor
Department of Computer Science at College of Charleston, Charleston, SC, USA
Interests: blockchains; cyber security; machine learning; smart-grid; vehicular ad hoc networks

Guest Editor
School of Information Security and Applied Computing, College of Engineering & Technology, Eastern Michigan University, Ypsilanti, MI 48197, USA
Interests: microelectronics/hardware assisted security; emerging IoT and connected autonomous systems security; security and privacy of smart building and spaces in modern smart cities environment; trusted next generations smart power grid networks

Guest Editor
College of Computing and Information Technology, University of Tabuk, Tabuk, Saudi Arabia
Interests: cybersecurity; internet of things; blockchain; artificial intelligence

Guest Editor
College of Computing and Information Technology, Shaqra University, Shaqra, Saudi Arabia
Interests: machine learning; deep learning; cybersecurity; IoT

Guest Editor
Information Sciences and Technology, The Pennsylvania State University, State College, PA 16801, USA
Interests: network security; cybersecurity; secure cloud computing; IoT; big data; machine learning

Guest Editor
Department of Computer Science & Engineering, University of Bridgeport, Bridgeport, CT 06604, USA
Interests: neural network; cybersecurity; secure cloud computing; IoT; big data; machine learning

Special Issue Information

Dear Colleagues,

Big data are data sets that are too large and complex to be processed with traditional methods. They therefore require powerful machine learning models, methods, and algorithms to support decision-making. Combining machine learning with big data can benefit many commercial and home applications, including health, education, tourism, security, crime control, and surveillance.

Supported by robust machine learning algorithms, big data systems can incorporate data from IoT devices and provide interconnectivity for the media industry, governments, and global industries. Machine learning applied to big data can also help detect terrorist activities.

Currently, scalable machine learning architectures can provide faster fault detection so that services run without interruption. However, several big data challenges remain to be addressed, such as handling continuous data growth, confusion over big data tool selection, securing data, integrating data from several different sources, and multi-dimensional and multi-variety data. These issues can be addressed with machine learning techniques such as deep learning, convolutional neural networks, supervised learning, unsupervised learning, and reinforcement learning. Thus, the main goal of this Special Issue is to invite high-quality submissions presenting original and novel research on data-driven algorithms for big data, machine intelligence for big data, machine learning classifiers on big data for healthcare fast response, quantum-enhanced machine learning for IoT, and drug discovery and toxicology in big data. Attention will also be paid to industry-driven machine learning algorithms for big data.

Prof. Dr. Abdul Razaque
Dr. Mohamed Baza
Dr. Fathi Amsaad
Dr. Bandar Alotaibi
Dr. Munif Alotaibi
Dr. Syed Rizvi
Dr. Mohamed Ben Haj Frej
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

 

Keywords

  • data-driven algorithms for big data
  • supervised learning, unsupervised learning, and reinforcement learning for big data
  • machine learning models for human interaction
  • deep learning for V-6
  • big data principles for machine learning
  • machine intelligence for big data
  • machine learning models for IoT
  • deep learning revolution for big data
  • automatic speech recognition for terrorist detection in big data
  • drug discovery and toxicology in big data
  • deep learning applications in big data analytics
  • anomalies and statistical binary classification for big data
  • impact of neural networks and TensorFlow for big data
  • big data barriers in machine learning
  • machine learning classifiers on big data for healthcare fast response
  • quantum-enhanced machine learning for IoT
  • fake news detection from big data using deep learning approaches
  • scalable architectures for parallel big data processing
  • big data analytics for fault detection
  • deployment of big data analytics techniques for policymaking

Published Papers (5 papers)

Research

23 pages, 3796 KiB  
Article
Enhancing Workplace Safety: PPE_Swin—A Robust Swin Transformer Approach for Automated Personal Protective Equipment Detection
by Mudassar Riaz, Jianbiao He, Kai Xie, Hatoon S. Alsagri, Syed Atif Moqurrab, Haya Abdullah A. Alhakbani and Waeal J. Obidallah
Electronics 2023, 12(22), 4675; https://doi.org/10.3390/electronics12224675 - 16 Nov 2023
Viewed by 762
Abstract
Accidents occur in the construction industry as a result of non-compliance with personal protective equipment (PPE) requirements. Because construction environments are diverse, it is difficult to detect PPE automatically. Traditional image detection models like convolutional neural networks (CNNs) and vision transformers (ViTs) struggle to capture both local and global features in construction safety. This study introduces a new approach for automating the detection of PPE in the construction industry, called PPE_Swin. By combining global and local feature extraction using the self-attention mechanism based on Swin-Unet, we address challenges related to accurate segmentation, robustness to image variations, and generalization across different environments. In order to train and evaluate our system, we have compiled a new dataset, which supports more reliable and accurate detection of PPE in diverse construction scenarios. Our approach achieves 97% accuracy in detecting workers with and without PPE, surpassing existing state-of-the-art methods. This research presents an effective solution for enhancing worker safety on construction sites by automating PPE compliance detection.
(This article belongs to the Special Issue Application of Machine Learning in Big Data)

19 pages, 879 KiB  
Article
A Fog-Based Privacy-Preserving Federated Learning System for Smart Healthcare Applications
by Maryum Butt, Noshina Tariq, Muhammad Ashraf, Hatoon S. Alsagri, Syed Atif Moqurrab, Haya Abdullah A. Alhakbani and Yousef A. Alduraywish
Electronics 2023, 12(19), 4074; https://doi.org/10.3390/electronics12194074 - 28 Sep 2023
Cited by 2 | Viewed by 1189
Abstract
During the COVID-19 pandemic, the urgency of effective testing strategies had never been more apparent. The fusion of Artificial Intelligence (AI) and Machine Learning (ML) models, particularly within medical imaging (e.g., chest X-rays), holds promise in smart healthcare systems. Deep Learning (DL), a subset of AI, has exhibited prowess in enhancing classification accuracy, a crucial aspect in expediting COVID-19 diagnosis. However, the journey to harness DL’s potential is rife with challenges: notably, the intricate landscape of medical data privacy. Striking a balance between utilizing patient data for insights while upholding privacy is formidable. Federated Learning (FL) emerges as a solution by enabling collaborative model training across decentralized data sources, thus bypassing data centralization and preserving data privacy. This study presents a tailored, collaborative FL architecture for COVID-19 screening via chest X-ray images. Designed to facilitate cooperation among medical institutions, the framework ensures patient data remain localized, eliminating the need for direct data sharing. Addressing imbalanced and non-identically distributed data, the architecture is a robust solution. Implementation entails localized and fog-computing-based FL models. Localized models utilize Convolutional Neural Networks (CNNs) on institution-specific datasets, while the FL model, refined iteratively, takes precedence in the final classification. Intriguingly, the global FL model, fortified by fog computing, emerges as the frontrunner in classification after weight refinement, surpassing local models. Validation within the COLAB platform gauges the model’s performance through metrics such as accuracy, precision, recall, and F1-score. Remarkably, the proposed model excels across these metrics, solidifying its efficacy. This research navigates the confluence of AI, FL, and medical imaging, unveiling insights that could reshape healthcare delivery. 
The study enriches scientific discourse by addressing data privacy in collaborative learning and carries potential implications for enhanced patient care.
(This article belongs to the Special Issue Application of Machine Learning in Big Data)
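
The abstract above does not say how the fog node refines the global weights. As a minimal sketch, assuming the widely used federated averaging (FedAvg) rule and flat lists of floats standing in for model weights, one aggregation round could look like the following. The names `federated_average`, `client_weights`, and `client_sizes` are illustrative and are not taken from the paper.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-institution weights into one global weight vector.

    Assumption: plain FedAvg, where each client's weights are averaged
    in proportion to the number of local training samples it holds.
    Raw patient data never leaves the client; only weights are shared.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("one sample count is needed per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical hospitals: the second holds three times as many X-rays,
# so its weights count three times as much in the global model.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by sample count is one common way to handle clients with unequal data volumes; the paper's own treatment of imbalanced and non-identically distributed data may differ.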

20 pages, 847 KiB  
Article
Distributed ItemCF Recommendation Algorithm Based on the Combination of MapReduce and Hive
by Yijia Feng and Lei Wang
Electronics 2023, 12(16), 3398; https://doi.org/10.3390/electronics12163398 - 10 Aug 2023
Viewed by 1021
Abstract
The ItemCF algorithm is currently the most widely used recommendation algorithm in commercial applications. In the early days of recommender systems, most recommendation algorithms were run on a single machine rather than in parallel. This approach, coupled with the rapid growth of massive user behavior data in the current big data era, has led to a bottleneck in improving the execution efficiency of recommender systems. With the vigorous development of distributed technology, distributed ItemCF algorithms have become a research hotspot. Hadoop is a very popular distributed system infrastructure. MapReduce, which provides massive data computing, and Hive, a data warehousing tool, are the two core components of Hadoop, each with its own advantages and applicable scenarios. Scholars have already utilized MapReduce and Hive for the parallelization of the ItemCF algorithm. However, these pieces of literature make use of either MapReduce or Hive alone without fully leveraging the strengths of both. As a result, it has been difficult for parallel ItemCF recommendation algorithms to feature both simple and efficient implementation and high running efficiency. To address this issue, we proposed a distributed ItemCF recommendation algorithm based on the combination of MapReduce and Hive and named it HiMRItemCF. This algorithm divided ItemCF into six steps: deduplication, obtaining the preference matrixes of all users, obtaining the co-occurrence matrixes of all items, multiplying the two matrices to generate a three-dimensional matrix, aggregating the data of the three-dimensional matrix to obtain the recommendation scores of all users for all items, and sorting the scores in descending order, with Hive being used to carry out steps 1 and 6, and MapReduce for the other four steps involving more complex calculations and operations. The Hive jobs and MapReduce jobs are linked through Hive’s external tables. 
After implementing the proposed algorithm using Java and running the program on three publicly available user shopping behavior datasets, we found that compared to algorithms that only use MapReduce jobs, the program implementing the proposed algorithm has fewer lines of source code, lower cyclomatic complexity and Halstead complexity, and can achieve a higher speedup ratio and parallel computing efficiency when processing all datasets. These experimental results indicate that the parallel and distributed ItemCF algorithm proposed in this paper, which combines MapReduce and Hive, has both the advantages of concise and easy-to-understand code as well as high time efficiency.
(This article belongs to the Special Issue Application of Machine Learning in Big Data)
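
The six steps listed in the abstract can be illustrated on a single machine. The sketch below is not HiMRItemCF, which the authors implemented in Java on MapReduce and Hive. It is a small in-memory Python version that assumes implicit feedback (every recorded interaction counts as a preference of 1) and, as an extra assumption, skips items the user has already interacted with. The function name `item_cf` and the record format are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def item_cf(records):
    """Rank unseen items per user from (user, item) behavior records."""
    # Step 1: deduplicate the behavior records.
    pairs = set(records)

    # Step 2: preference "matrix" as user -> set of items (implicit score 1).
    prefs = defaultdict(set)
    for user, item in pairs:
        prefs[user].add(item)

    # Step 3: item co-occurrence counts, accumulated over all users.
    cooc = defaultdict(int)
    for items in prefs.values():
        for a, b in permutations(items, 2):
            cooc[(a, b)] += 1

    # Steps 4 and 5: multiply preferences by co-occurrences, then sum the
    # partial products for each (user, candidate item) pair.
    scores = defaultdict(lambda: defaultdict(int))
    for user, items in prefs.items():
        for (a, b), count in cooc.items():
            if a in items and b not in items:
                scores[user][b] += count

    # Step 6: sort each user's candidates by score in descending order
    # (ties broken by item name so the output is deterministic).
    return {
        user: sorted(cands.items(), key=lambda kv: (-kv[1], kv[0]))
        for user, cands in scores.items()
    }


records = [
    ("u1", "a"), ("u1", "b"), ("u1", "a"),  # duplicate record on purpose
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
    ("u3", "b"), ("u3", "c"),
]
recommendations = item_cf(records)
```

In the distributed version described in the abstract, steps 1 and 6 are the kind of work a SQL-like engine such as Hive expresses concisely, while steps 2 to 5 correspond to the heavier joins and aggregations assigned to MapReduce jobs.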

26 pages, 1311 KiB  
Article
Towards Fake News Detection: A Multivocal Literature Review of Credibility Factors in Online News Stories and Analysis Using Analytical Hierarchical Process
by Muhammad Faisal Abrar, Muhammad Sohail Khan, Inayat Khan, Mohammed ElAffendi and Sadique Ahmad
Electronics 2023, 12(15), 3280; https://doi.org/10.3390/electronics12153280 - 30 Jul 2023
Cited by 2 | Viewed by 1588
Abstract
Information and communication technologies have grown globally in the past two decades, expanding the reach of news networks. However, the credibility of the information is now in question. Credibility refers to a person’s belief in the truth of a subject, and online readers consider various factors to determine whether a source is trustworthy. Credibility significantly impacts public behaviour, and less credible news spreads faster due to people’s interest in emotions like fear and disgust. This can have negative consequences for individuals and economies. To determine the credibility factors in digital news stories, a Multivocal Literature Review (MLR) was conducted to identify relevant studies in both white and grey literature. A total of 161 primary studies were identified from published (white) literature and 61 were identified from unpublished (grey) literature. As a result, 14 credibility factors were identified, including “number of views”, “reporter reputations”, “source information”, and “impartiality”. These factors were then analysed using statistical tests and the Analytic Hierarchy Process (AHP) for decision-making to determine their criticality and importance in different domains.
(This article belongs to the Special Issue Application of Machine Learning in Big Data)
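
To show how AHP turns pairwise judgments into factor weights, here is a minimal sketch using the row geometric mean approximation of the priority vector. The three factor names come from the abstract, but the comparison values are invented for illustration and do not reproduce the paper's data or results. The function name `ahp_priorities` is hypothetical.

```python
import math


def ahp_priorities(matrix):
    """Approximate AHP priority weights from a pairwise comparison matrix.

    matrix[i][j] states how much more important factor i is than factor j,
    with matrix[j][i] == 1 / matrix[i][j]. The weights are the normalized
    geometric means of the rows.
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


factors = ["source information", "impartiality", "number of views"]
# Hypothetical judgments: "source information" is rated twice as important
# as "impartiality" and four times as important as "number of views".
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = dict(zip(factors, ahp_priorities(comparisons)))
```

For this perfectly consistent matrix the weights come out as 4/7, 2/7, and 1/7. Real judgments are rarely this consistent, so a full AHP analysis would also check a consistency ratio before trusting the ranking.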

27 pages, 4989 KiB  
Article
Quality of Service Generalization using Parallel Turing Integration Paradigm to Support Machine Learning
by Abdul Razaque, Mohamed Ben Haj Frej, Gulnara Bektemyssova, Muder Almi’ani, Fathi Amsaad, Aziz Alotaibi, Noor Z. Jhanjhi, Mohsin Ali, Saule Amanzholova and Majid Alshammari
Electronics 2023, 12(5), 1129; https://doi.org/10.3390/electronics12051129 - 25 Feb 2023
Cited by 2 | Viewed by 1114
Abstract
The Quality-of-Service (QoS) provision in machine learning (ML) is affected by lower accuracy, noise, random error, and weak generalization. The Parallel Turing Integration Paradigm (PTIP) is introduced as a solution to lower accuracy and weak generalization. A logical table (LT) is part of the PTIP and is used to store datasets. The PTIP has elements that enhance classifier learning, enhance 3-D cube logic for security provision, and balance the engineering process of paradigms. The probability weightage function for adding and removing algorithms during the training phase is included in the PTIP. Additionally, it uses local and global error functions to limit overconfidence and underconfidence in learning processes. By utilizing the local gain (LG) and global gain (GG), the optimization of the model’s constituent parts is validated. By blending the sub-algorithms with a new dataset in a foretelling and realistic setting, the PTIP validation is further ensured. A mathematical modeling technique is used to ascertain the efficacy of the proposed PTIP. The results of the testing show that the proposed PTIP obtains lower relative accuracy of 38.76% with error bounds reflection. The lower relative accuracy with low GG is considered good. The PTIP also obtains 70.5% relative accuracy with high GG, which is considered an acceptable accuracy. Moreover, the PTIP achieves a better accuracy of 99.91% with a 100% fitness factor. Finally, the proposed PTIP is compared with cutting-edge, well-established models and algorithms based on different state-of-the-art parameters (e.g., relative accuracy, accuracy with fitness factor, fitness process, error reduction, and generalization measurement). The results confirm that the proposed PTIP demonstrates better results than the contending models and algorithms.
(This article belongs to the Special Issue Application of Machine Learning in Big Data)