Machine Learning Algorithms for Big Data Analysis

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Evolutionary Algorithms and Machine Learning".

Deadline for manuscript submissions: closed (1 September 2023) | Viewed by 11201

Special Issue Editor

Los Alamos National Laboratory, Los Alamos, NM 87544, USA
Interests: visualization; data analysis; graphics

Special Issue Information

Dear Colleagues,

The use of machine learning-based approaches has grown tremendously over the past decade, and these methods continue to attract extensive attention as we enter the exascale era. As more powerful machines and instruments are developed, we continue to generate (e.g., via computer simulations) and collect (e.g., via powerful telescopes) massive amounts of data that require thorough analysis. Machine learning methods, ranging from support vector machines (SVMs) to deep neural networks, hold great promise for aiding big data analysis workflows. They can assist researchers in tasks such as data reduction, prediction, and feature detection, to name but a few.

In this Special Issue, we invite researchers to submit original and innovative algorithms and approaches for handling, processing, and analyzing large-scale data sets with machine learning methods.

Dr. Ayan Biswas
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • streaming
  • parallel
  • computer vision
  • image processing
  • machine learning
  • big data analysis


Published Papers (7 papers)


Research


15 pages, 3369 KiB  
Article
Representing and Inferring Massive Network Traffic Condition: A Case Study in Nashville, Tennessee
by Hairuilong Zhang, Yangsong Gu and Lee D. Han
Algorithms 2023, 16(10), 485; https://doi.org/10.3390/a16100485 - 19 Oct 2023
Viewed by 1151
Abstract
Intelligent transportation systems (ITSs) usually require monitoring of massive road networks and gathering traffic data at high spatial and temporal resolution. This leads to the accumulation of substantial data volumes, necessitating more concise data representations. Approaches like principal component analysis (PCA), which operate within subspaces, can construct precise low-dimensional models. However, interpreting these models can be challenging, primarily because each principal component often involves a multitude of links within the traffic network. To overcome this issue, this study presents a novel approach for representing and indexing network traffic conditions through weighted CUR matrix decomposition integrated with clustering analysis. The proposed approach uses a matrix decomposition method to select a subset of detectors from the original network that represents and indexes traffic conditions, allowing for more efficient management and analysis. The proposed method is evaluated using traffic detector data from the city of Nashville, TN. The results demonstrate that the approach is effective in representing and indexing network traffic conditions, with high accuracy and efficiency. Overall, this study contributes to the field of network traffic monitoring by proposing a novel approach for representing massive traffic networks and exploring the effects of incorporating clustering into CUR decomposition. The proposed approach can help traffic analysts and practitioners manage and analyze traffic conditions more efficiently, ultimately leading to more effective transportation systems.
(This article belongs to the Special Issue Machine Learning Algorithms for Big Data Analysis)
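
For readers unfamiliar with CUR decomposition, the sketch below approximates a data matrix A as C·U·R, where C and R are actual columns and rows of A. It assumes rows are time intervals and columns are detectors, and it uses plain deterministic leverage-score selection from the top-k singular vectors. The weighting and clustering steps of the proposed approach are not reproduced, and the function names are illustrative only.

```python
import numpy as np


def leverage_scores(vectors):
    """Squared row norms of a matrix of top-k singular vectors, normalized to sum to 1."""
    scores = np.sum(vectors ** 2, axis=1)
    return scores / scores.sum()


def cur_decompose(A, k, c, r):
    """Deterministic leverage-score CUR (illustrative): A is approximated by C @ U @ R.

    A: (time steps x detectors) data matrix (assumed layout).
    k: number of singular vectors used to score rows and columns.
    c, r: number of columns (detectors) and rows (time steps) to keep.
    """
    left, _, right_t = np.linalg.svd(A, full_matrices=False)
    col_scores = leverage_scores(right_t[:k].T)  # one score per column (detector)
    row_scores = leverage_scores(left[:, :k])    # one score per row (time step)
    # Keep the highest-scoring columns and rows, in their original order.
    cols = np.sort(np.argsort(col_scores)[::-1][:c])
    rows = np.sort(np.argsort(row_scores)[::-1][:r])
    C = A[:, cols]
    R = A[rows, :]
    # U is chosen so that C @ U @ R is the best fit to A given the fixed C and R.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R, cols, rows
```

Because C consists of real detector columns, each selected column can be read directly as a specific detector, which is the interpretability advantage over PCA that the abstract describes.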

26 pages, 822 KiB  
Article
Data-Driven Analysis of Student Engagement in Time-Limited Computer Laboratories
by Luca Cagliero, Lorenzo Canale and Laura Farinetti
Algorithms 2023, 16(10), 464; https://doi.org/10.3390/a16100464 - 02 Oct 2023
Viewed by 1010
Abstract
Computer laboratories are learning environments where students learn programming languages by practicing under teaching assistants’ supervision. This paper presents the outcomes of a real case study carried out at our university in the context of a database course, where learning SQL is one of the main topics. The aim of the study is to analyze the level of engagement of the laboratory participants by tracing and correlating the students’ accesses to each laboratory exercise, their successful and failed attempts to solve the exercises, their requests for help, and the interventions of teaching assistants. The acquired data are analyzed by means of a sequence pattern mining approach, which automatically discovers recurrent temporal patterns. The mined patterns are mapped to behavioral, cognitive, and affective engagement indicators, allowing students to be profiled according to their level of engagement in all the identified dimensions. To extract the desired indicators efficiently, the mining algorithm enforces ad hoc constraints on the pattern categories of interest. The student profiles and the correlations among different engagement dimensions extracted from the experimental data have proven helpful for planning future learning experiences.
(This article belongs to the Special Issue Machine Learning Algorithms for Big Data Analysis)
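
To make the notion of a sequential pattern concrete, the following brute-force sketch counts how often an ordered pair of events occurs (possibly with other events in between) across per-student event logs, and prunes candidates with a simple constraint on the pattern's last event. The event labels, the pair-only search, and the `must_end_with` constraint are hypothetical simplifications, not the constrained mining algorithm used in the study.

```python
from itertools import product


def contains(sequence, pattern):
    """True if pattern occurs in sequence as an ordered, possibly gapped, subsequence."""
    it = iter(sequence)
    # Each membership test consumes the iterator, so matches must appear in order.
    return all(event in it for event in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


def mine_pairs(sequences, min_support, must_end_with=None):
    """Enumerate all ordered event pairs and keep the frequent ones.

    must_end_with is a hypothetical category constraint: candidates whose last
    event differs are skipped before their support is ever computed.
    """
    events = sorted({e for s in sequences for e in s})
    frequent = {}
    for a, b in product(events, repeat=2):
        if must_end_with is not None and b != must_end_with:
            continue  # constraint-based pruning
        sup = support(sequences, (a, b))
        if sup >= min_support:
            frequent[(a, b)] = sup
    return frequent


# Hypothetical per-student logs from one laboratory session.
logs = [
    ["access", "fail", "help", "success"],
    ["access", "fail", "fail", "help"],
    ["access", "success"],
    ["access", "fail", "success"],
]
print(mine_pairs(logs, min_support=0.5, must_end_with="help"))
```

Pushing the constraint into the enumeration, rather than filtering afterwards, is what saves work: patterns outside the categories of interest are never counted.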

18 pages, 362 KiB  
Article
Nonsmooth Optimization-Based Hyperparameter-Free Neural Networks for Large-Scale Regression
by Napsu Karmitsa, Sona Taheri, Kaisa Joki, Pauliina Paasivirta, Adil M. Bagirov and Marko M. Mäkelä
Algorithms 2023, 16(9), 444; https://doi.org/10.3390/a16090444 - 14 Sep 2023
Cited by 1 | Viewed by 1013
Abstract
In this paper, a new nonsmooth optimization-based algorithm for solving large-scale regression problems is introduced. The regression problem is modeled as a fully connected feedforward neural network with one hidden layer, piecewise linear activation, and the L1 loss function. A modified version of the limited memory bundle method is applied to minimize this nonsmooth objective. In addition, a novel constructive approach for automatically determining the proper number of hidden nodes is developed. Finally, large real-world data sets are used to evaluate the proposed algorithm and to compare it with some state-of-the-art neural network algorithms for regression. The results demonstrate the superiority of the proposed algorithm as a predictive tool on most of the data sets used in the numerical experiments.
(This article belongs to the Special Issue Machine Learning Algorithms for Big Data Analysis)
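
The nonsmooth objective described above can be written down directly. The sketch below assumes ReLU as the piecewise linear activation and computes the mean absolute (L1) error of a one-hidden-layer network, plus one subgradient with respect to the output weights. It does not implement the modified limited memory bundle method or the constructive choice of hidden nodes, and all names are illustrative.

```python
import numpy as np


def predict(X, W, b, v, v0):
    """One hidden layer with ReLU (assumed piecewise linear activation), then a linear output."""
    H = np.maximum(0.0, X @ W + b)  # hidden activations, shape (n_samples, n_hidden)
    return H @ v + v0


def l1_loss(X, y, W, b, v, v0):
    """Mean absolute error; nonsmooth wherever a residual or a ReLU input is exactly zero."""
    return np.mean(np.abs(y - predict(X, W, b, v, v0)))


def output_subgradient(X, y, W, b, v, v0):
    """One subgradient of the L1 loss w.r.t. the output weights (v, v0).

    d|e|/de is sign(e) away from zero; at e = 0 any value in [-1, 1] is valid,
    and np.sign picks 0 there.
    """
    H = np.maximum(0.0, X @ W + b)
    s = np.sign(predict(X, W, b, v, v0) - y)
    return H.T @ s / len(y), s.mean()
```

Because the loss has kinks rather than a gradient everywhere, smooth optimizers are not directly applicable; bundle methods such as the one named in the abstract collect subgradients like these to build a model of the objective.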

14 pages, 15445 KiB  
Article
PigSNIPE: Scalable Neuroimaging Processing Engine for Minipig MRI
by Michal Brzus, Kevin Knoernschild, Jessica C. Sieren and Hans J. Johnson
Algorithms 2023, 16(2), 116; https://doi.org/10.3390/a16020116 - 15 Feb 2023
Cited by 1 | Viewed by 1266
Abstract
Translation of basic animal research to find effective methods of diagnosing and treating human neurological disorders requires parallel analysis infrastructures. Small animals such as mice provide exploratory animal disease models. However, many interventions developed using small animal models fail to translate to human use due to physical or biological differences. Recently, minipigs have emerged as large-animal models in neuroscience due to both their brain similarity and their economic advantages. Medical image processing is a crucial part of research, as it allows researchers to monitor their experiments and understand disease development. By pairing four reinforcement learning models and five deep learning UNet segmentation models with existing algorithms, we developed PigSNIPE, a pipeline for the automated handling, processing, and analysis of large-scale data sets of minipig MR images. PigSNIPE performs image registration, AC-PC alignment, detection of 19 anatomical landmarks, skull stripping, brainmask and intracranial volume segmentation (Dice 0.98), tissue segmentation (Dice 0.82), and caudate-putamen brain segmentation (Dice 0.8) in under two minutes. To the best of our knowledge, this is the first automated pipeline tool aimed at large-animal images, and it can significantly reduce the time and resources needed for analyzing minipig neuroimages.
(This article belongs to the Special Issue Machine Learning Algorithms for Big Data Analysis)
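
The segmentation scores quoted above are Dice similarity coefficients, 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B, ranging from 0 (no overlap) to 1 (identical masks). A minimal NumPy sketch, assuming binary masks of equal shape; the value returned for two empty masks is a convention chosen here, not something the abstract specifies:

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / total)


pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(dice_coefficient(pred, truth))  # 2 shared voxels out of 3 + 3
```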

14 pages, 2372 KiB  
Article
Nemesis: Neural Mean Teacher Learning-Based Emotion-Centric Speaker
by Aryan Yousefi and Kalpdrum Passi
Algorithms 2023, 16(2), 97; https://doi.org/10.3390/a16020097 - 09 Feb 2023
Viewed by 1214
Abstract
Image captioning is the multi-modal task of automatically describing a digital image based on its contents and their semantic relationships. This research area has gained increasing popularity over the past few years; however, most previous studies have focused on purely objective, content-based descriptions of image scenes. In this study, we aim to generate more engaging captions by leveraging human-like emotional responses. To this end, a mean teacher learning-based method is applied to the recently introduced ArtEmis dataset. ArtEmis is the first large-scale dataset for emotion-centric image captioning, containing 455K emotional descriptions of 80K artworks from WikiArt. The method includes a self-distillation relationship between memory-augmented language models with meshed connectivity. These language models are first trained in a cross-entropy phase and then fine-tuned in a self-critical sequence training phase. According to popular natural language processing metrics such as BLEU, METEOR, ROUGE-L, and CIDEr, our proposed model achieves a new state of the art on ArtEmis.
(This article belongs to the Special Issue Machine Learning Algorithms for Big Data Analysis)
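
In the general mean-teacher formulation, a teacher model's weights are an exponential moving average (EMA) of a student's weights, and the student is additionally trained to agree with the teacher's predictions. The sketch below shows only these two ingredients on parameter dictionaries of NumPy arrays. The decay `alpha`, the squared-error consistency term, and the dictionary layout are assumptions; the memory-augmented meshed language models and the self-critical training phase are not modeled.

```python
import numpy as np


def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher step: teacher <- alpha * teacher + (1 - alpha) * student, per parameter.

    The student is updated by gradient descent elsewhere; the teacher is never
    trained directly and only tracks a smoothed copy of the student.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name] for name in teacher}


def consistency_loss(student_probs, teacher_probs):
    """Assumed squared-error consistency term pulling student outputs toward the teacher's."""
    diff = np.asarray(student_probs, dtype=float) - np.asarray(teacher_probs, dtype=float)
    return float(np.mean(diff ** 2))


# With a fixed student, the teacher approaches it geometrically: 1 - alpha**n after n steps.
teacher = {"w": np.zeros(2)}
student = {"w": np.ones(2)}
for _ in range(10):
    teacher = ema_update(teacher, student, alpha=0.9)
```

A larger `alpha` makes the teacher change more slowly, which is what lets it act as a more stable target for the student during training.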

11 pages, 893 KiB  
Communication
Parallel Algorithm for Connected-Component Analysis Using CUDA
by Dominic Windisch, Christian Kaever, Guido Juckeland and André Bieberle
Algorithms 2023, 16(2), 80; https://doi.org/10.3390/a16020080 - 01 Feb 2023
Cited by 3 | Viewed by 1854
Abstract
In this article, we introduce a parallel algorithm for connected-component analysis (CCA) on GPUs that drastically reduces the volume of data transferred from the GPU to the host. CCA algorithms targeting GPUs typically store the extracted features in arrays large enough to hold the maximum possible number of objects for the given image size. Transferring these large arrays to the host accounts for a large portion of the overall execution time. Therefore, we propose an algorithm that uses a CUDA kernel to merge trees of connected-component feature structs. During tree merging, various connected-component properties, such as total area, centroid, and bounding box, are extracted and accumulated. The tree structure then allows only the features of valid objects to be transferred to the host for further processing or storage. Our benchmarks show that this implementation significantly reduces the memory transfer volume for processing results on the host while maintaining performance similar to that of state-of-the-art CCA algorithms.
(This article belongs to the Special Issue Machine Learning Algorithms for Big Data Analysis)
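
The following sequential Python sketch illustrates the idea of accumulating per-component features (area, centroid, bounding box) at the root of a union-find tree and returning only valid objects. It is a CPU stand-in under stated assumptions (4-connectivity and a hypothetical `min_area` validity filter), not the CUDA tree-merging kernel proposed in the article.

```python
def find(parent, i):
    """Return the root label of pixel i, halving the path as it goes."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    """Merge the trees containing a and b; the smaller index becomes the root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def connected_components(image, min_area=1):
    """4-connected labeling of a binary image (list of rows) with per-component features."""
    h, w = len(image), len(image[0])
    parent = list(range(h * w))
    # Pass 1: link each foreground pixel to its left and upper foreground neighbours.
    for y in range(h):
        for x in range(w):
            if not image[y][x]:
                continue
            i = y * w + x
            if x > 0 and image[y][x - 1]:
                union(parent, i, i - 1)
            if y > 0 and image[y - 1][x]:
                union(parent, i, i - w)
    # Pass 2: accumulate area, coordinate sums and bounding box at each root.
    acc = {}
    for y in range(h):
        for x in range(w):
            if not image[y][x]:
                continue
            root = find(parent, y * w + x)
            f = acc.setdefault(root, {"area": 0, "sx": 0, "sy": 0,
                                      "x0": x, "y0": y, "x1": x, "y1": y})
            f["area"] += 1
            f["sx"] += x
            f["sy"] += y
            f["x0"], f["x1"] = min(f["x0"], x), max(f["x1"], x)
            f["y0"], f["y1"] = min(f["y0"], y), max(f["y1"], y)
    # Only components passing the (hypothetical) validity filter are returned.
    return [
        {"area": f["area"],
         "centroid": (f["sx"] / f["area"], f["sy"] / f["area"]),
         "bbox": (f["x0"], f["y0"], f["x1"], f["y1"])}
        for f in acc.values() if f["area"] >= min_area
    ]


img = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
print(connected_components(img))
```

The output list grows with the number of real objects rather than with the image size, which mirrors the transfer-volume argument made in the abstract.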

Review


25 pages, 3370 KiB  
Review
Transfer Learning and Analogical Inference: A Critical Comparison of Algorithms, Methods, and Applications
by Kara Combs, Hongjing Lu and Trevor J. Bihl
Algorithms 2023, 16(3), 146; https://doi.org/10.3390/a16030146 - 07 Mar 2023
Cited by 2 | Viewed by 2509
Abstract
Artificial intelligence and machine learning (AI/ML) research has aimed to achieve human-level performance in tasks that require understanding and decision making. Although major advances have been made, AI systems still struggle to achieve adaptive learning for generalization. One of the main approaches to generalization in ML is transfer learning, where previously learned knowledge is utilized to solve problems in a different, but related, domain. Another approach, pursued by cognitive scientists for several decades, has investigated the role of analogical reasoning in comparisons aimed at understanding human generalization ability. Analogical reasoning has yielded rich empirical findings and general theoretical principles underlying human analogical inference and generalization across distinctively different domains. Though seemingly similar, there are fundamental differences between the two approaches. To clarify differences and similarities, we review transfer learning algorithms, methods, and applications in comparison with work based on analogical inference. Transfer learning focuses on exploring feature spaces shared across domains through data vectorization while analogical inferences focus on identifying relational structure shared across domains via comparisons. Rather than treating these two learning approaches as synonymous or as independent and mutually irrelevant fields, a better understanding of how they are interconnected can guide a multidisciplinary synthesis of the two approaches.
(This article belongs to the Special Issue Machine Learning Algorithms for Big Data Analysis)
