Computer Vision and Deep Learning: Trends and Applications (2nd Edition)

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Computer Vision and Pattern Recognition".

Deadline for manuscript submissions: 31 December 2024 | Viewed by 6178

Special Issue Editors


Guest Editor
National Research Council of Italy (CNR), ISASI Institute of Applied Sciences and Intelligent Systems, Pozzuoli, Italy
Interests: multimedia signal processing; image processing and understanding; image feature extraction and selection; neural network classifiers; object classification and tracking

Guest Editor
Department of Business, Law, Economics and Consumption, Faculty of Communication, IULM University, 20143 Milan, Italy
Interests: computer vision; artificial intelligence; deep learning; image analysis and processing; visual saliency; biomedical image analysis; large language models

Special Issue Information

Dear Colleagues,

The aim of this Special Issue is to present the latest innovations in deep learning technologies applied to computer vision and image processing, viewed from the perspective of software development practice. The Special Issue will focus on the following topics:

  • No-Code Deep Learning—a way of building DL applications without working through the long and arduous pipeline of pre-processing, modeling, algorithm design, data collection, retraining, and deployment;
  • TinyDL—IoT-driven, small-footprint deep learning. Large-scale machine learning applications exist, but their usability on constrained devices is limited: sending data over the web to a large server for processing and back again introduces latency, so smaller-scale models that run on the device itself are often necessary;
  • Full-Stack Deep Learning—the end-to-end integration of deep learning into products. As deep learning frameworks spread, businesses need to ship DL-powered features, which has created a large demand for “full-stack deep learning” engineering, from data and training through to deployment;
  • Generative Adversarial Networks (GANs)—a framework in which a generative network produces samples and a discriminative network judges them, discarding unconvincing output; this adversarial loop yields stronger solutions for tasks such as differentiating between different kinds of images;
  • Unsupervised and Self-Supervised DL—as automation improves, data science solutions that require little or no human intervention are increasingly needed. Machines cannot learn in a vacuum: they must take in new information and analyze those data, yet this has typically required human data scientists to feed that information into the system;
  • Reinforcement Learning—where the machine learning system learns from direct experience with its environment, which assigns value to the observations the system makes through reward and punishment signals;
  • Few-Shot, One-Shot, and Zero-Shot Learning—few-shot learning trains models from limited data; despite this constraint, it has applications in fields such as image classification, facial recognition, and text classification. One-shot learning uses even less data. Zero-shot learning at first seems paradoxical: how can an algorithm classify without any training examples of a class? Zero-shot systems observe a subject and use auxiliary information about it to predict which class it belongs to, much as humans can.
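To make the zero-shot idea above concrete, here is a minimal attribute-based sketch (all class names, attributes, and scores are hypothetical illustrations): an attribute detector scores an image, and the unseen class whose attribute vector lies closest to those scores is predicted.

```python
import numpy as np

# Attribute descriptions for classes never seen during training
# (hypothetical attributes: has_stripes, has_hooves, is_aquatic).
class_attributes = {
    "zebra":   np.array([1.0, 1.0, 0.0]),
    "dolphin": np.array([0.0, 0.0, 1.0]),
}

def zero_shot_predict(attribute_scores, class_attributes):
    """Assign the unseen class whose attribute vector is closest."""
    best_class, best_dist = None, float("inf")
    for name, attrs in class_attributes.items():
        dist = np.linalg.norm(attribute_scores - attrs)
        if dist < best_dist:
            best_class, best_dist = name, dist
    return best_class

# Attribute scores for one image, e.g., from a trained attribute detector.
scores = np.array([0.9, 0.8, 0.1])
print(zero_shot_predict(scores, class_attributes))  # zebra
```

No training example of either class is needed; only the attribute side-information links seen attributes to unseen classes.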

Dr. Pier Luigi Mazzeo
Dr. Alessandro Bruno
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com after registering and logging in to the website; once registered, use the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • machine learning
  • reinforcement learning
  • unsupervised and self-supervised learning
  • generative adversarial networks (GANs)
  • no-code machine learning
  • full-stack deep learning
  • few-shot, one-shot, and zero-shot learning
  • tiny machine learning


Published Papers (6 papers)


Research


19 pages, 10662 KiB  
Article
SVD-Based Mind-Wandering Prediction from Facial Videos in Online Learning
by Nguy Thi Lan Anh, Nguyen Gia Bach, Nguyen Thi Thanh Tu, Eiji Kamioka and Phan Xuan Tan
J. Imaging 2024, 10(5), 97; https://doi.org/10.3390/jimaging10050097 - 24 Apr 2024
Viewed by 261
Abstract
This paper presents a novel approach to mind-wandering prediction in the context of webcam-based online learning. We implemented a Singular Value Decomposition (SVD)-based 1D temporal eye-signal extraction method, which relies solely on eye landmark detection and eliminates the need for gaze tracking or specialized hardware, and then extracted suitable features from the signals to train the prediction model. Our thorough experimental framework facilitates the evaluation of our approach alongside baseline models, particularly in the analysis of temporal eye signals and the prediction of attentional states. Notably, our SVD-based signal captures both subtle and major eye movements, including changes in the eye boundary and pupil, surpassing the limited capabilities of eye aspect ratio (EAR)-based signals. Our proposed model exhibits a 2% improvement in the overall Area Under the Receiver Operating Characteristics curve (AUROC) metric and 7% in the F1-score metric for ‘not-focus’ prediction, compared to the combination of EAR-based and computationally intensive gaze-based models used in the baseline study. These contributions have potential implications for enhancing the field of attentional state prediction in online learning, offering a practical and effective solution to benefit educational experiences.
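The SVD-based extraction can be pictured roughly as follows (a sketch under our own assumptions, not the authors' exact pipeline): per-frame eye-landmark coordinates are stacked into a matrix, and the frames are projected onto the leading singular direction to yield a single 1D temporal signal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical input: 100 frames, each with 6 eye landmarks (x, y) -> 12 values.
landmarks = rng.normal(size=(100, 12))

def svd_temporal_signal(landmarks):
    """Project per-frame landmark vectors onto the dominant singular
    direction, yielding one scalar per frame (a 1D temporal signal)."""
    centered = landmarks - landmarks.mean(axis=0)
    # Rows of vt span landmark space; the first row captures the
    # dominant mode of variation (eye movement) across frames.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

signal = svd_temporal_signal(landmarks)
print(signal.shape)  # (100,)
```

Features for the mind-wandering classifier would then be computed from windows of this signal.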

15 pages, 2379 KiB  
Article
Enhancing Apple Cultivar Classification Using Multiview Images
by Silvia Krug and Tino Hutschenreuther
J. Imaging 2024, 10(4), 94; https://doi.org/10.3390/jimaging10040094 - 17 Apr 2024
Viewed by 549
Abstract
Apple cultivar classification is challenging due to the inter-class similarity and high intra-class variations. Human experts do not rely on single-view features but rather study each viewpoint of the apple to identify a cultivar, paying close attention to various details. Following our previous work, we try to establish a similar multiview approach for machine-learning (ML)-based apple classification in this paper. In our previous work, we studied apple classification using one single view. While these results were promising, it also became clear that one view alone might not contain enough information in the case of many classes or cultivars. Therefore, exploring multiview classification for this task is the next logical step. Multiview classification is nothing new, and we use state-of-the-art approaches as a base. Our goal is to find the best approach for the specific apple classification task and study what is achievable with the given methods towards our future goal of applying this on a mobile device without the need for internet connectivity. In this study, we compare an ensemble model with two cases where we use single networks: one without view specialization trained on all available images without view assignment and one where we combine the separate views into a single image of one specific instance. The two latter options reflect dataset organization and preprocessing to allow the use of smaller models in terms of stored weights and number of operations than an ensemble model. We compare the different approaches based on our custom apple cultivar dataset. The results show that the state-of-the-art ensemble provides the best result. However, using images with combined views shows a decrease in accuracy by 3% while requiring only 60% of the memory for weights. Thus, simpler approaches with enhanced preprocessing can open a trade-off for classification tasks on mobile devices.

17 pages, 687 KiB  
Article
Enhancing Embedded Object Tracking: A Hardware Acceleration Approach for Real-Time Predictability
by Mingyang Zhang, Kristof Van Beeck and Toon Goedemé
J. Imaging 2024, 10(3), 70; https://doi.org/10.3390/jimaging10030070 - 13 Mar 2024
Viewed by 897
Abstract
While Siamese object tracking has witnessed significant advancements, its hard real-time behaviour on embedded devices remains inadequately addressed. In many application cases, an embedded implementation should not only have a minimal execution latency, but this latency should ideally also have zero variance, i.e., be predictable. This study aims to address this issue by meticulously analysing real-time predictability across different components of a deep-learning-based video object tracking system. Our detailed experiments not only indicate the superiority of Field-Programmable Gate Array (FPGA) implementations in terms of hard real-time behaviour but also unveil important time predictability bottlenecks. We introduce dedicated hardware accelerators for key processes, focusing on depth-wise cross-correlation and padding operations, utilizing high-level synthesis (HLS). Implemented on a KV260 board, our enhanced tracker exhibits not only a 6.6× speed-up in mean execution time but also significant improvements in hard real-time predictability, yielding 11 times less latency variation than our baseline. A subsequent analysis of power consumption reveals our approach’s contribution to enhanced power efficiency. These advancements underscore the crucial role of hardware acceleration in realizing time-predictable object tracking on embedded systems, setting new standards for future hardware–software co-design endeavours in this domain.
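Depth-wise cross-correlation, one of the operations the paper accelerates, can be sketched in plain NumPy (a naive reference implementation for illustration only; the shapes are hypothetical and the paper's FPGA/HLS design is of course far more efficient): each channel of the template is correlated only with the matching channel of the search region.

```python
import numpy as np

def depthwise_xcorr(search, kernel):
    """Depth-wise cross-correlation as used in Siamese trackers:
    channel c of the template slides over channel c of the search
    region, producing one response map per channel."""
    c, H, W = search.shape
    _, h, w = kernel.shape
    out = np.zeros((c, H - h + 1, W - w + 1))
    for ch in range(c):
        for i in range(H - h + 1):
            for j in range(W - w + 1):
                out[ch, i, j] = np.sum(search[ch, i:i+h, j:j+w] * kernel[ch])
    return out

# Hypothetical feature maps: 3-channel 8x8 search region, 3x3 template.
search = np.arange(3 * 8 * 8, dtype=float).reshape(3, 8, 8)
kernel = np.ones((3, 3, 3))
response = depthwise_xcorr(search, kernel)
print(response.shape)  # (3, 6, 6)
```

The triple loop makes the regular, data-independent access pattern visible, which is what makes the operation a good fit for a fixed-latency hardware pipeline.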

17 pages, 4016 KiB  
Article
Enhancing Deep Edge Detection through Normalized Hadamard-Product Fusion
by Gang Hu and Conner Saeli
J. Imaging 2024, 10(3), 62; https://doi.org/10.3390/jimaging10030062 - 29 Feb 2024
Viewed by 985
Abstract
Deep edge detection is challenging, especially with the existing methods, like HED (holistic edge detection). These methods combine multiple feature side outputs (SOs) to create the final edge map, but they neglect diverse edge importance within one output. This creates a problem: to include desired edges, unwanted noise must also be accepted. As a result, the output often has increased noise or thick edges, ignoring important boundaries. To address this, we propose a new approach called the normalized Hadamard-product (NHP) operation-based deep network for edge detection. By multiplying the side outputs from the backbone network, the Hadamard-product operation encourages agreement among features across different scales while suppressing disagreed weak signals. This method produces additional Mutually Agreed Salient Edge (MASE) maps to enrich the hierarchical level of side outputs without adding complexity. Our experiments demonstrate that the NHP operation significantly improves performance, e.g., an ODS score reaching 0.818 on BSDS500, outperforming human performance (0.803), achieving state-of-the-art results in deep edge detection.
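The core Hadamard-product fusion can be sketched as follows (the min-max normalization shown is a simple choice of our own for illustration; the paper's exact normalization may differ): multiplying edge-probability maps keeps pixels where all scales agree and multiplicatively suppresses weak, inconsistent responses.

```python
import numpy as np

def nhp_fuse(side_outputs, eps=1e-8):
    """Fuse edge-probability maps by element-wise (Hadamard) product,
    then rescale to [0, 1]. Agreement across scales survives the
    product; isolated weak responses are driven toward zero."""
    fused = np.ones_like(side_outputs[0])
    for so in side_outputs:
        fused = fused * so
    return (fused - fused.min()) / (fused.max() - fused.min() + eps)

# Two hypothetical 2x2 side outputs: both agree only on the top-left pixel.
so1 = np.array([[0.9, 0.2], [0.1, 0.3]])
so2 = np.array([[0.8, 0.1], [0.7, 0.2]])
fused = nhp_fuse([so1, so2])
print(fused)
```

The top-left pixel, supported by both maps, dominates the fused result, while pixels strong in only one map are suppressed.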

13 pages, 2270 KiB  
Article
Multispectral Deep Neural Network Fusion Method for Low-Light Object Detection
by Keval Thaker, Sumanth Chennupati, Nathir Rawashdeh and Samir A. Rawashdeh
J. Imaging 2024, 10(1), 12; https://doi.org/10.3390/jimaging10010012 - 31 Dec 2023
Viewed by 1829
Abstract
Despite significant strides in achieving vehicle autonomy, robust perception under low-light conditions still remains a persistent challenge. In this study, we investigate the potential of multispectral imaging, thereby leveraging deep learning models to enhance object detection performance in the context of nighttime driving. Features encoded from the red, green, and blue (RGB) visual spectrum and thermal infrared images are combined to implement a multispectral object detection model. This has proven to be more effective compared to using visual channels only, as thermal images provide complementary information when discriminating objects in low-illumination conditions. Additionally, there is a lack of studies on effectively fusing these two modalities for optimal object detection performance. In this work, we present a framework based on the Faster R-CNN architecture with a feature pyramid network. Moreover, we design various fusion approaches using concatenation and addition operators at varying stages of the network to analyze their impact on object detection performance. Our experimental results on the KAIST and FLIR datasets show that our framework outperforms the baseline experiments of the unimodal input source and the existing multispectral object detectors.
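The concatenation and addition fusion operators compared in the paper can be sketched like this (shapes and feature values are hypothetical, and real fusion happens on learned feature maps inside the network):

```python
import numpy as np

def fuse_features(rgb_feat, thermal_feat, mode="concat"):
    """Fuse per-modality feature maps of shape (C, H, W) either by
    channel concatenation (doubles channels) or element-wise addition
    (keeps channels, requires matching shapes)."""
    if mode == "concat":
        return np.concatenate([rgb_feat, thermal_feat], axis=0)
    if mode == "add":
        return rgb_feat + thermal_feat
    raise ValueError(f"unknown fusion mode: {mode}")

rng = np.random.default_rng(0)
rgb = rng.random((256, 32, 32))      # hypothetical RGB-branch features
thermal = rng.random((256, 32, 32))  # hypothetical thermal-branch features
print(fuse_features(rgb, thermal, "concat").shape)  # (512, 32, 32)
print(fuse_features(rgb, thermal, "add").shape)     # (256, 32, 32)
```

Concatenation lets later layers learn how to weight each modality at the cost of wider layers; addition is cheaper but forces the two feature spaces to be directly compatible.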

Review


29 pages, 1382 KiB  
Review
Applied Artificial Intelligence in Healthcare: A Review of Computer Vision Technology Application in Hospital Settings
by Heidi Lindroth, Keivan Nalaie, Roshini Raghu, Ivan N. Ayala, Charles Busch, Anirban Bhattacharyya, Pablo Moreno Franco, Daniel A. Diedrich, Brian W. Pickering and Vitaly Herasevich
J. Imaging 2024, 10(4), 81; https://doi.org/10.3390/jimaging10040081 - 28 Mar 2024
Viewed by 1168
Abstract
Computer vision (CV), a type of artificial intelligence (AI) that uses digital videos or a sequence of images to recognize content, has been used extensively across industries in recent years. However, in the healthcare industry, its applications are limited by factors like privacy, safety, and ethical concerns. Despite this, CV has the potential to improve patient monitoring and system efficiencies while reducing workload. In contrast to previous reviews, we focus on the end-user applications of CV. First, we briefly review and categorize CV applications in other industries (job enhancement, surveillance and monitoring, automation, and augmented reality). We then review the developments of CV in the hospital setting, outpatient, and community settings. The recent advances in monitoring delirium, pain and sedation, patient deterioration, mechanical ventilation, mobility, patient safety, surgical applications, quantification of workload in the hospital, and monitoring for patient events outside the hospital are highlighted. To identify opportunities for future applications, we also completed journey mapping at different system levels. Lastly, we discuss the privacy, safety, and ethical considerations associated with CV and outline processes in algorithm development and testing that limit CV expansion in healthcare. This comprehensive review highlights CV applications and ideas for its expanded use in healthcare.
