Deep Learning in Image Processing and Computer Vision

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electronic Multimedia".

Deadline for manuscript submissions: 15 October 2024

Special Issue Editor


Dr. Duc Thanh Nguyen
Guest Editor
School of Information Technology, Deakin University, Geelong, VIC 3220, Australia
Interests: computer vision; pattern recognition; image processing; multimedia

Special Issue Information

Dear Colleagues,

Deep learning has been widely applied across research fields and has played a crucial role in many applications. Its successes are clearly demonstrated in handwritten character recognition, image classification and retrieval, object detection and segmentation, action recognition, video analysis, and 3D scene understanding. Over the last decade, the research community has witnessed the rapid growth of deep learning, with many advanced architectures and learning algorithms developed and applied to solve complex, real-world problems.

This Special Issue aims to promote the field of deep learning with a focus on deep learning-based techniques for image processing and computer vision. We call for submissions showcasing cutting-edge research and novel applications in image processing and computer vision using deep learning approaches. Original research articles and reviews are welcome. Research areas may include (but are not limited to) the following topics:

  • Image recognition;
  • Object detection;
  • Image and object segmentation;
  • Action detection and recognition;
  • Video analysis;
  • 3D vision (scene understanding, point cloud analysis);
  • Image and video synthesis;
  • Image processing/computer vision-based applications (healthcare, robotics, environmental protection).

I look forward to receiving your contributions.

Dr. Duc Thanh Nguyen
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • deep learning
  • image processing
  • computer vision

Published Papers (4 papers)


Research

20 pages, 4242 KiB  
Article
Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain
by Yuan Gao, Yu Zhang, Ping Zeng and Yingjie Ma
Electronics 2024, 13(9), 1749; https://doi.org/10.3390/electronics13091749 - 01 May 2024
Abstract
The rapid advancement of deep learning and large-scale AI models has simplified the creation and manipulation of deepfake technologies, which generate, edit, and replace faces in images and videos. This growing ease of use has turned the malicious application of forged faces into a significant threat, complicating the task of deepfake detection. Despite the notable success of current deepfake detection methods, which predominantly employ data-driven CNN classification models, these methods exhibit limited generalization and insufficient robustness against novel data unseen during training. To tackle these challenges, this paper introduces a novel detection framework, ReLAF-Net. The framework employs a restricted self-attention mechanism that applies self-attention to deep CNN features flexibly, facilitating the learning of local relationships and inter-regional dependencies at both fine-grained and global levels. The attention mechanism has a modular design and can be seamlessly integrated into CNN networks to improve overall detection performance. Additionally, we propose an adaptive local frequency feature extraction algorithm that decomposes RGB images into fine-grained frequency domains in a data-driven manner, effectively isolating fake indicators in the frequency space. Moreover, an attention-based channel fusion strategy is developed to amalgamate RGB and frequency information, achieving a comprehensive facial representation. Tested on the high-quality version of the FaceForensics++ dataset, our method attained a detection accuracy of 97.92%, outperforming other approaches. Cross-dataset validation on Celeb-DF, DFDC, and DFD confirms its robust generalizability, offering a new solution for detecting high-quality deepfake videos.
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)
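
The abstract gives only the outline of ReLAF-Net, but its two central ideas, self-attention restricted to local regions of a deep CNN feature map and attention-based fusion of RGB and frequency features, can be illustrated with a minimal PyTorch sketch. Everything below (module names, window size, gating design) is an illustrative assumption, not the paper's implementation:

```python
import torch
import torch.nn as nn

class RestrictedSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping local windows of a CNN
    feature map, so fine-grained local relationships are modelled at a cost
    linear in image area. The window scheme here is an assumption."""
    def __init__(self, channels, window=7, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W), H and W divisible by window
        b, c, h, w = x.shape
        s = self.window
        # Partition the map into (H/s * W/s) windows of s*s tokens each.
        t = x.view(b, c, h // s, s, w // s, s)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, s * s, c)
        out, _ = self.attn(t, t, t)            # attention stays inside each window
        out = out.view(b, h // s, w // s, s, s, c)
        out = out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        return x + out                         # residual: keep the CNN features

class ChannelFusion(nn.Module):
    """Attention-based fusion of RGB and frequency feature maps: a learned
    per-channel gate decides how much of each branch to keep (assumed form)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb, freq):
        g = self.gate(torch.cat([rgb, freq], dim=1))   # (B, C, 1, 1) weights
        return g * rgb + (1 - g) * freq

feat = torch.randn(2, 64, 28, 28)              # toy deep-CNN feature map
fused = ChannelFusion(64)(RestrictedSelfAttention(64)(feat), feat)
print(fused.shape)                             # torch.Size([2, 64, 28, 28])
```

Restricting attention to windows keeps the cost linear in image area, while the residual connection preserves the original CNN features that carry global context.
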
18 pages, 1827 KiB  
Article
Design and Development of a CCSDS 131.2-B Software-Defined Radio Receiver Based on Graphics Processing Unit Accelerators
by Roberto Ciardi, Gianluca Giuffrida, Matteo Bertolucci and Luca Fanucci
Electronics 2024, 13(1), 209; https://doi.org/10.3390/electronics13010209 - 02 Jan 2024
Abstract
In recent years, the number of Earth Observation missions has been increasing exponentially. Satellites dedicated to these missions usually carry payloads that produce large amounts of data, which must be transmitted to ground stations in time-limited windows. Moreover, the noisy nature of the link between satellites and ground stations makes it hard to achieve reliable communication. To address these problems, the Consultative Committee for Space Data Systems (CCSDS) has defined a standard for a flexible advanced coding and modulation scheme for high-rate telemetry applications. The standard, referred to as CCSDS 131.2-B, makes use of Serially Concatenated Convolutional Codes (SCCC) based on 27 ModCods to optimize transmission quality. A limiting factor in the adoption of this standard is the complexity and cost of the hardware required to build high-performance receivers. In the last decade, software performance has grown with the advancement of general-purpose processing hardware, leading to many high-performance software systems even in the telecommunications sector. These are commonly referred to as Software-Defined Radios (SDR): radio communication systems in which components usually implemented in hardware, by means of FPGAs or ASICs, are instead implemented in software, offering advantages such as flexibility, modularity, extensibility, cheaper maintenance, and cost savings. This paper proposes an SDR based on NVIDIA Graphics Processing Units (GPUs) implementing the receiver end of the CCSDS 131.2-B standard. First, a brief description of the CCSDS 131.2-B standard is given, focusing on the architecture of the transmitter and receiver sides. Then, the receiver architecture is presented, with an overview of its functional blocks and of the implementation choices made to optimize signal processing, especially in the SCCC decoder. Finally, the performance of the system is analyzed in terms of data rate and error correction and compared with other software systems to highlight the achieved improvements. The presented system is shown to be well suited for CCSDS 131.2-B-compliant device testing and for use in science missions, providing a valid low-cost alternative to state-of-the-art hardware receivers.
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)
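
The paper implements the full CCSDS 131.2-B receive chain on GPUs; as a far smaller illustration of "radio in software", the sketch below soft-demaps Gray-coded QPSK symbols (QPSK is among the modulations the standard's ModCods build on) into the log-likelihood ratios that a soft-input decoder such as an SCCC decoder consumes. The function and test parameters are illustrative assumptions, not the paper's code:

```python
import numpy as np

def qpsk_soft_demap(symbols, noise_var):
    """Soft-demap Gray-coded, unit-energy QPSK symbols to per-bit LLRs.

    I and Q act as independent BPSK sub-channels of amplitude 1/sqrt(2);
    under complex AWGN with variance noise_var this gives the standard
    LLR = 2*sqrt(2)*y / noise_var per dimension (positive LLR -> bit 0).
    """
    llr_i = 2.0 * np.sqrt(2.0) * symbols.real / noise_var
    llr_q = 2.0 * np.sqrt(2.0) * symbols.imag / noise_var
    return np.stack([llr_i, llr_q], axis=-1).reshape(-1)

# Toy end-to-end check: modulate random bits, add noise, demap, hard-slice.
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, 1000)
symbols = ((1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])) / np.sqrt(2)
noise_var = 0.1
noise = (rng.normal(size=500) + 1j * rng.normal(size=500)) * np.sqrt(noise_var / 2)
llrs = qpsk_soft_demap(symbols + noise, noise_var)
print("bit errors:", np.sum((llrs < 0).astype(int) != bits))
```
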

17 pages, 3197 KiB  
Article
DepressionGraph: A Two-Channel Graph Neural Network for the Diagnosis of Major Depressive Disorders Using rs-fMRI
by Zhiqiang Xia, Yusi Fan, Kewei Li, Yueying Wang, Lan Huang and Fengfeng Zhou
Electronics 2023, 12(24), 5040; https://doi.org/10.3390/electronics12245040 - 18 Dec 2023
Abstract
Major depressive disorder (MDD) is a prevalent psychiatric condition with a complex and still unknown pathological mechanism. Resting-state functional magnetic resonance imaging (rs-fMRI) has emerged as a valuable non-invasive technology for MDD diagnosis. From rs-fMRI data, a dynamic brain functional connection network (FCN) can be constructed to represent the complex interactions of multiple brain sub-regions. Graph neural network (GNN) models have been widely employed to extract disease-associated information, but the simple averaging or summation readout functions of GNNs may lose critical information. This study introduces a two-channel graph neural network (DepressionGraph) that aggregates more comprehensive graph information through two channels, one based on the number of node features and the other on the number of nodes. The proposed DepressionGraph model leverages the transformer-encoder architecture to extract relevant information from the time-series FCN. The rs-fMRI data were obtained from a cohort of 533 subjects, and the experiments show that DepressionGraph outperforms both traditional GNNs and simple graph readout functions on the MDD diagnosis task. The DepressionGraph framework demonstrates efficacy in extracting complex patterns from rs-fMRI data and exhibits promising capabilities for the precise diagnosis of complex neurological disorders. The study acknowledges a potential gender bias due to an imbalanced gender distribution in the dataset; future research should prioritize gender-balanced datasets to mitigate this limitation and enhance the generalizability of the findings.
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)
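
The abstract names the key design point, a readout that uses both the node count and the node-feature count, without specifying it. A minimal sketch of that idea, under the assumption that each "channel" pools over one axis of the node-embedding matrix, might look as follows; the paper's actual aggregation is more elaborate:

```python
import torch

def two_channel_readout(h):
    """Readout over both axes of a node-embedding matrix h of shape (N, F).

    Illustrative assumption: one channel pools across the N nodes, the other
    across the F node features, and the two are concatenated so the graph
    vector retains information a single mean/sum readout would discard.
    """
    node_channel = torch.cat([h.mean(dim=0), h.amax(dim=0)])      # shape (2F,)
    feature_channel = torch.cat([h.mean(dim=1), h.amax(dim=1)])   # shape (2N,)
    return torch.cat([node_channel, feature_channel])             # shape (2F + 2N,)

h = torch.randn(116, 32)   # e.g. 116 brain regions (AAL atlas), 32 GNN features
print(two_channel_readout(h).shape)   # torch.Size([296])
```
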

17 pages, 7020 KiB  
Article
YOLO-GCRS: A Remote Sensing Image Object Detection Algorithm Incorporating a Global Contextual Attention Mechanism
by Huan Liao and Wenqiu Zhu
Electronics 2023, 12(20), 4272; https://doi.org/10.3390/electronics12204272 - 16 Oct 2023
Abstract
With the significant advancements in deep learning, the domain of remote sensing image processing has witnessed a surge in attention, particularly in object detection. Detecting targets in remotely sensed images is challenging, primarily due to the abundance of small targets and their multi-scale distribution. These challenges often result in inaccurate object detection, with both missed detections and false positives. To overcome these issues, this paper presents a novel algorithm called YOLO-GCRS. The algorithm builds upon YOLOv5s by enhancing the feature-capture capability of the backbone network, integrating a new module, the Global Context Block (GC-C3), with the C3 backbone. Additionally, it incorporates a convolutional block, CBM (Convolution + BatchNormalization + Mish), to strengthen the network's extraction of deep features. Moreover, a detection head, ECAHead, is proposed, which integrates efficient channel attention (ECA) for extracting high-dimensional features from images. YOLO-GCRS achieves higher precision, recall, and mAP@0.5 (98.3%, 94.7%, and 97.7%, respectively) on the publicly available RSOD dataset than the original YOLOv5s (improvements of 5.3%, 0.8%, and 2.7%, respectively). Furthermore, compared with mainstream detectors such as YOLOv7-tiny and YOLOv8s, it improves mAP@0.5 by 2.0% and 7.5%, respectively. These results validate the effectiveness of YOLO-GCRS in addressing missed and false detections in remote sensing object detection.
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)
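
Of the components named in the abstract, ECA and CBM are standard, publicly documented blocks, so they can be sketched directly in PyTorch; how YOLO-GCRS wires them into YOLOv5s is not specified here, and the toy shapes below are assumptions:

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: global average pooling followed by a 1D
    convolution across the channel descriptor, with the kernel size chosen
    adaptively from the channel count (Wang et al., CVPR 2020)."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        k = int(abs((math.log2(channels) + b) / gamma))
        k = k if k % 2 else k + 1                     # kernel size must be odd
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                             # x: (B, C, H, W)
        y = self.pool(x).squeeze(-1).transpose(1, 2)  # (B, 1, C)
        y = torch.sigmoid(self.conv(y))               # local cross-channel interaction
        return x * y.transpose(1, 2).unsqueeze(-1)    # reweight channels

class CBM(nn.Module):
    """Convolution + BatchNormalization + Mish, the block the abstract calls CBM."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.Mish(),
        )

    def forward(self, x):
        return self.block(x)

x = torch.randn(1, 256, 20, 20)           # toy feature map
print(ECA(256)(CBM(256, 256)(x)).shape)   # torch.Size([1, 256, 20, 20])
```
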
