Image/Video Processing and Encoding for Contemporary Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 31 May 2024

Special Issue Editor


Prof. Dr. Jitae Shin
Guest Editor
School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon 440-746, Republic of Korea
Interests: deep learning; image/video signal processing

Special Issue Information

Dear Colleagues,

Although image/video signal processing and compression have been studied for many years, the research trend continues to evolve with emerging ideas and methods for new applications and needs. For example, advances in artificial intelligence (AI) have driven many improvements in image/video signal processing that enhance the quality and understanding of images and video scenes. In addition, a growing number of papers apply machine learning and other learning-based approaches to image/video encoding and image communication. These AI-based image/video processing and encoding techniques achieve state-of-the-art performance in diverse applications: autonomous driving, medical imaging, CCTV surveillance, factory inspection, image/video coding and communication, etc.

The focus of this Special Issue is state-of-the-art research on image/video processing and encoding, covering recent learning-based methods and/or novel approaches for diverse applications. Topics of interest include, but are not limited to:

  • Image/video acquisition, representation, presentation, and display
  • Image/video processing, filtering and transforms, analysis and synthesis
  • Learning and understanding of image/video data
  • Image/video compression, transmission, communication, and networking
  • Image/video pre/post-processing, video restoration, and super-resolution, etc.
  • Machine learning/deep learning schemes for image/video processing and coding
  • Diverse image/video applications such as medical imaging, autonomous driving, etc.

Prof. Dr. Jitae Shin
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image and video processing
  • image and video coding
  • machine learning/deep learning
  • image/video applications
  • image/video communication

Published Papers (5 papers)


Research


22 pages, 4002 KiB  
Article
UFCC: A Unified Forensic Approach to Locating Tampered Areas in Still Images and Detecting Deepfake Videos by Evaluating Content Consistency
by Po-Chyi Su, Bo-Hong Huang and Tien-Ying Kuo
Electronics 2024, 13(4), 804; https://doi.org/10.3390/electronics13040804 - 19 Feb 2024
Viewed by 576
Abstract
Image inpainting and Deepfake techniques have the potential to drastically alter the meaning of visual content, posing a serious threat to the integrity of both images and videos. Addressing this challenge requires the development of effective methods to verify the authenticity of investigated visual data. This research introduces UFCC (Unified Forensic Scheme by Content Consistency), a novel forensic approach based on deep learning. UFCC can identify tampered areas in images and detect Deepfake videos by examining content consistency, assuming that manipulations can create dissimilarity between tampered and intact portions of visual data. The term “Unified” signifies that the same methodology is applicable to both still images and videos. Recognizing the challenge of collecting a diverse dataset for supervised learning due to various tampering methods, we overcome this limitation by incorporating information from original or unaltered content in the training process rather than relying solely on tampered data. A neural network for feature extraction is trained to classify imagery patches, and a Siamese network measures the similarity between pairs of patches. For still images, tampered areas are identified as patches that deviate from the majority of the investigated image. In the case of Deepfake video detection, the proposed scheme involves locating facial regions and determining authenticity by comparing facial region similarity across consecutive frames. Extensive testing is conducted on publicly available image forensic datasets and Deepfake datasets with various manipulation operations. The experimental results highlight the superior accuracy and stability of the UFCC scheme compared to existing methods.
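The "patch that deviates from the majority" idea can be illustrated in a few lines. This is a minimal sketch, not the authors' implementation: plain cosine similarity on toy feature vectors stands in for the paper's learned Siamese similarity, and the threshold is an arbitrary illustrative choice.

```python
import numpy as np

def flag_inconsistent_patches(embeddings, threshold=0.5):
    """Flag patches whose features deviate from the majority.

    embeddings: (N, D) array of per-patch feature vectors,
    assumed to come from a trained feature extractor.
    Returns a boolean mask: True = suspected tampered patch.
    """
    # Normalize so similarity reduces to a dot product (cosine).
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    # Mean similarity of each patch to all other patches.
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    mean_sim = sim.sum(axis=1) / (len(unit) - 1)
    # Patches dissimilar to the majority are flagged.
    return mean_sim < threshold

# Toy example: 7 mutually similar "intact" patches and 1 outlier.
rng = np.random.default_rng(0)
base = rng.normal(size=4)
patches = np.stack([base + 0.01 * rng.normal(size=4) for _ in range(7)]
                   + [-base])
print(flag_inconsistent_patches(patches))  # only the last patch is flagged
```

For video, the same consistency test would be applied to facial-region features across consecutive frames rather than spatial patches.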
(This article belongs to the Special Issue Image/Video Processing and Encoding for Contemporary Applications)

12 pages, 2534 KiB  
Article
Improved Test Input Prioritization Using Verification Monitors with False Prediction Cluster Centroids
by Hyekyoung Hwang, Il Yong Chun and Jitae Shin
Electronics 2024, 13(1), 21; https://doi.org/10.3390/electronics13010021 - 19 Dec 2023
Viewed by 543
Abstract
Deep learning (DL) systems have been remarkably successful in various applications, but they can exhibit critical misbehaviors. To identify the weaknesses of a trained model and overcome them with new data collection(s), one needs to figure out the corner cases of the trained model. Constructing new datasets to retrain a DL model requires extra budget and time. Test input prioritization (TIP) techniques have been proposed to identify corner cases more effectively. The state-of-the-art TIP approach adopts a monitoring method and prioritizes based on Gini impurity, which estimates the similarity between a DL prediction probability and the uniform distribution. This letter proposes a new TIP method that uses the distance between false prediction cluster (FPC) centroids in a training set and a test instance in the last-layer feature space to prioritize error-inducing instances among an unlabeled test set. We refer to the proposed method as DeepFPC. Our numerical experiments show that the proposed DeepFPC method achieves significantly improved TIP performance in several image classification and active learning tasks.
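The core ranking step of such an FPC-distance criterion can be sketched as follows. This is an illustrative reconstruction from the abstract, not the paper's code: the centroids and features are toy data, and Euclidean distance in the last-layer feature space is assumed.

```python
import numpy as np

def prioritize_by_fpc_distance(test_feats, fpc_centroids):
    """Rank unlabeled test inputs by distance to the nearest
    false-prediction cluster (FPC) centroid in feature space.

    test_feats: (N, D) last-layer features of test inputs.
    fpc_centroids: (K, D) centroids of clusters built from
    training samples the model predicted wrongly (assumed given).
    Returns indices sorted so likely error-inducing inputs come first.
    """
    # Pairwise Euclidean distances, shape (N, K), via broadcasting.
    d = np.linalg.norm(test_feats[:, None, :] - fpc_centroids[None, :, :],
                       axis=-1)
    nearest = d.min(axis=1)
    # Smaller distance to an error cluster -> higher priority for labeling.
    return np.argsort(nearest)

centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
feats = np.array([[9.5, 9.8],    # close to an FPC centroid
                  [5.0, 5.0],    # far from both
                  [0.2, -0.1]])  # closest to an FPC centroid
print(prioritize_by_fpc_distance(feats, centroids))  # prints [2 0 1]
```

Inputs at the top of this ranking would be labeled first when building a retraining set, which is the budget-saving point of TIP.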

15 pages, 6647 KiB  
Article
Transform-Based Feature Map Compression Method for Video Coding for Machines (VCM)
by Minhun Lee, Seungjin Park, Seoung-Jun Oh, Younhee Kim, Se Yoon Jeong, Jooyoung Lee and Donggyu Sim
Electronics 2023, 12(19), 4042; https://doi.org/10.3390/electronics12194042 - 26 Sep 2023
Viewed by 966
Abstract
The burgeoning field of machine vision has led the Moving Picture Experts Group (MPEG) to develop a new type of compression technology called video coding for machines (VCM), which enhances machine recognition through video information compression. This research proposes a principal component analysis (PCA)-based compression methodology for multi-level feature maps extracted from the feature pyramid network (FPN) structure. Unlike current PCA-based studies that independently carry out PCA for each feature map, our approach employs a generalized basis matrix and mean vector derived from channel correlations through a generalized PCA process, eliminating the need for a per-feature-map PCA process. Further compression is achieved by amalgamating high-dimensional feature maps, capitalizing on the spatial redundancy within these multi-level feature maps. As a result, the proposed VCM encoder forgoes the PCA process, the generalized data do not incur any compression loss, and only the coefficients for each feature map need to be compressed using versatile video coding (VVC). Experimental results demonstrate superior performance by our method over all feature anchors for each machine vision task, as specified by the MPEG-VCM common test conditions, outperforming previous PCA-based feature map compression methods. Notably, it achieved an 89.3% BD-rate reduction for instance segmentation tasks.
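The general idea of a shared ("generalized") PCA basis can be sketched with synthetic data: a basis and mean are fitted once on training features, so at encoding time only low-dimensional coefficients remain to be coded. This is a toy illustration of the principle, not the MPEG-VCM pipeline; the dimensions and the VVC coefficient-coding stage are omitted.

```python
import numpy as np

def build_generalized_basis(train_maps, k):
    """Fit a shared PCA basis on training feature samples.

    train_maps: (M, C) matrix, each row a feature sample with C channels.
    Returns (mean, basis) with basis of shape (C, k).
    """
    mean = train_maps.mean(axis=0)
    centered = train_maps - mean
    # Right singular vectors give the principal channel directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k].T

def encode(feat, mean, basis):
    # Only these low-dimensional coefficients would be coded (e.g. by VVC).
    return (feat - mean) @ basis

def decode(coeffs, mean, basis):
    return coeffs @ basis.T + mean

rng = np.random.default_rng(1)
# Synthetic training features lying in a 3-D subspace of 16 channels.
latent = rng.normal(size=(200, 3))
mix = rng.normal(size=(3, 16))
train = latent @ mix
mean, basis = build_generalized_basis(train, k=3)

# A new feature from the same subspace is reconstructed without loss,
# mirroring the claim that the generalized data incur no compression loss.
test = rng.normal(size=(1, 3)) @ mix
rec = decode(encode(test, mean, basis), mean, basis)
print(np.allclose(rec, test, atol=1e-6))  # prints True
```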

18 pages, 12846 KiB  
Article
Optical and SAR Image Registration Based on Multi-Scale Orientated Map of Phase Congruency
by Leilei Jia, Jian Dong, Siyuan Huang, Limin Liu and Junning Zhang
Electronics 2023, 12(7), 1635; https://doi.org/10.3390/electronics12071635 - 30 Mar 2023
Cited by 3 | Viewed by 1334
Abstract
Optical and Synthetic Aperture Radar (SAR) images are highly complementary, and their registration is a fundamental task for other remote sensing applications. Traditional feature-matching algorithms fail to handle the significant nonlinear radiation difference (NRD) caused by different sensors. To address this problem, a robust registration algorithm using a multi-scale orientated map of phase congruency (MSPCO) is proposed. First, a nonlinear diffusion scale space is established to obtain the scale invariance of feature points; compared with the linear Gaussian scale space, it better preserves edge and texture information. Second, to ensure the quantity and repeatability of features, corner points and edge points are detected on the moment map of phase congruency, which is the foundation for the subsequent feature matching. Third, the MSPCO descriptor is constructed via the orientation of phase congruency (PCO). PCO is highly robust to NRD, and the different scales of PCOs enhance the robustness of the descriptor. Finally, a feature-matching strategy based on an effective scale ratio is proposed, which reduces the number of comparisons among features and improves computational efficiency. The experimental results show that the proposed method outperforms existing feature-based methods in terms of the number of correct matches and registration accuracy. The registration accuracy is inferior only to that of the most advanced template-matching method, and the accuracy difference is within 0.3 pixels, which fully demonstrates the robustness and accuracy of the proposed method in optical and SAR image registration.
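The efficiency gain from a scale-ratio constraint can be shown with a generic sketch: when an approximate scale ratio between the two images is known, descriptor comparisons are skipped for feature pairs whose detected scales are inconsistent with it. This is an illustration of the pruning idea only, not the paper's exact MSPCO matching strategy; the tolerance and estimated ratio are assumed inputs.

```python
import numpy as np

def match_with_scale_ratio(desc_a, scale_a, desc_b, scale_b,
                           est_ratio, tol=0.2):
    """Nearest-neighbor descriptor matching restricted to feature
    pairs whose scale ratio is close to an estimated image scale
    ratio. Returns a list of (i, j) index pairs.
    """
    matches = []
    for i, (da, sa) in enumerate(zip(desc_a, scale_a)):
        best_j, best_d = -1, np.inf
        for j, (db, sb) in enumerate(zip(desc_b, scale_b)):
            # Prune pairs whose detected scales are inconsistent
            # with the estimated ratio; no descriptor distance is
            # computed for them, which cuts the comparison count.
            if abs(sb / sa - est_ratio) > tol * est_ratio:
                continue
            d = np.linalg.norm(da - db)
            if d < best_d:
                best_j, best_d = j, d
        if best_j >= 0:
            matches.append((i, best_j))
    return matches

desc_a = np.array([[1.0, 0.0], [0.0, 1.0]])
scale_a = [1.0, 1.0]
desc_b = np.array([[0.0, 1.0], [1.0, 0.1], [0.9, 0.0]])
scale_b = [2.0, 5.0, 2.1]  # the middle feature fails the scale check
print(match_with_scale_ratio(desc_a, scale_a, desc_b, scale_b,
                             est_ratio=2.0))  # prints [(0, 2), (1, 0)]
```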

Review

Jump to: Research

20 pages, 3580 KiB  
Review
Enhancing Object Detection in Smart Video Surveillance: A Survey of Occlusion-Handling Approaches
by Zainab Ouardirhi, Sidi Ahmed Mahmoudi and Mostapha Zbakh
Electronics 2024, 13(3), 541; https://doi.org/10.3390/electronics13030541 - 29 Jan 2024
Viewed by 832
Abstract
Smart video surveillance systems (SVSs) have garnered significant attention for their autonomous monitoring capabilities, encompassing automated detection, tracking, analysis, and decision making within complex environments, with minimal human intervention. In this context, object detection is a fundamental task in SVS. However, many current approaches often overlook occlusion by nearby objects, posing challenges to real-world SVS applications. To address this crucial issue, this paper presents a comprehensive comparative analysis of occlusion-handling techniques tailored for object detection. The review outlines the pretext tasks common to both domains and explores various architectural solutions to combat occlusion. Unlike prior studies that primarily focus on a single dataset, our analysis spans multiple benchmark datasets, providing a thorough assessment of various object detection methods. By extending the evaluation to datasets beyond the KITTI benchmark, this study offers a more holistic understanding of each approach’s strengths and limitations. Additionally, we delve into persistent challenges in existing occlusion-handling approaches and emphasize the need for innovative strategies and future research directions to drive substantial progress in this field.
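One concrete example of an occlusion-handling idea in this literature is Soft-NMS, which decays, rather than discards, the scores of detections overlapping a higher-scoring box, so a partially occluded object behind another is less likely to be suppressed outright. Whether this particular method is among those surveyed is not stated in the abstract; the sketch below is illustrative only.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: keep every box but decay the scores of
    those overlapping a higher-scoring detection."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        m = int(np.argmax(scores))
        keep.append((boxes[m], scores[m]))
        best = boxes.pop(m)
        scores.pop(m)
        # Decay remaining scores by overlap with the selected box.
        for k in range(len(boxes)):
            scores[k] *= np.exp(-iou(best, boxes[k]) ** 2 / sigma)
        boxes = [b for b, s in zip(boxes, scores) if s > score_thresh]
        scores = [s for s in scores if s > score_thresh]
    return keep

# Two heavily overlapping boxes (an occlusion pair) plus a distant one:
# hard NMS at IoU 0.5 would drop the second box, Soft-NMS retains all 3.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(len(soft_nms(boxes, scores)))  # prints 3
```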

Planned Papers

The below list represents only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer-review.

Title: Detecting Forged Images and Deepfake Videos via Content Consistency Evaluation
Authors: Po-Chyi Su; Bo-Hong Huang; Tien-Ying Kuo
Affiliation: Dept. of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
Abstract: Image inpainting and Deepfake can significantly change the meaning of imagery contents, so both are considered serious threats to the integrity of visual data. Observing that such manipulations may lead to content inconsistency in images and video frames, we propose a deep-learning-based forensic scheme that evaluates content consistency to identify forged or affected areas in images and videos. Since the ways of tampering are diverse, it is impractical to collect enough tampered data for supervised learning. The proposed method avoids using tampered data of various kinds in the training process and instead employs the information of original/unaltered contents. A feature extraction neural network is trained to classify imagery blocks or patches. Similarity measurement using a Siamese network to evaluate the consistency of patch pairs helps to locate tampered areas. For image manipulation detection, a segmentation network is employed to further refine the manipulated regions. For Deepfake video detection, facial regions are first located, and then the video's authenticity is determined by comparing the similarity of such regions between consecutive frames. Extensive tests are performed on publicly available datasets encompassing images and videos with various manipulation operations. The experimental results demonstrate superior accuracy and stability compared to existing methods.
