Emerging Trends in Advanced Video and Sequence Technology

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electronic Multimedia".

Deadline for manuscript submissions: closed (15 September 2023) | Viewed by 7872

Special Issue Editors


Guest Editor
School of Control Science and Engineering, Shandong University, Jinan 250061, China
Interests: 3D vision; image/video coding and processing; IndRNN
Special Issues, Collections and Topics in MDPI journals

Guest Editor
School of Software, Shandong University, Jinan 250101, China
Interests: video coding; computer vision; 3D video processing

Guest Editor
School of Information and Communication Engineering, North University of China, Taiyuan 030051, China
Interests: action recognition; sequence processing

Guest Editor
Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Interests: image/video coding; image processing; multimedia technology

Guest Editor
School of Cybersecurity, Chengdu University of Information Technology, Chengdu 610225, China
Interests: video coding

Special Issue Information

Dear Colleagues,

With the expansion of short-video applications, video surveillance, and intelligent video analysis, as well as general sensor-based sequence processing (e.g., lidar and mobile sensors) and combined video/sensor applications such as autonomous driving, video and general sequence data have become increasingly prevalent. While video and general sequences carry rich information, they also pose new challenges in coding, transmission, processing, and analysis. Advanced video and sequence technologies are therefore highly desired.

On the other hand, many new sequence processing techniques have been developed, especially in the era of deep learning, such as the transformer and the IndRNN (Independently Recurrent Neural Network). Many conventional video/sequence methods have been reworked or even completely replaced with deep learning. With the power of these new tools, a new age for advanced video and sequence technologies has begun.
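For readers unfamiliar with the IndRNN, its recurrence can be sketched in a few lines (an illustrative NumPy toy with made-up sizes and weights, not any particular implementation): each neuron keeps a single scalar recurrent weight, so hidden units are independent of one another across time steps.

```python
import numpy as np

def indrnn_step(x_t, h_prev, W, u, act=np.tanh):
    # IndRNN update: each neuron has its own scalar recurrent weight (u),
    # so neurons are independent of each other across the recurrence
    return act(W @ x_t + u * h_prev)

rng = np.random.default_rng(0)
T, d_in, d_hid = 5, 3, 4
W = 0.1 * rng.normal(size=(d_hid, d_in))   # input weights
u = rng.uniform(-1.0, 1.0, size=d_hid)     # per-neuron recurrent weights
h = np.zeros(d_hid)
for x_t in rng.normal(size=(T, d_in)):     # unroll over a toy sequence
    h = indrnn_step(x_t, h, W, u)
```

Because each recurrent connection is a scalar multiply rather than a full matrix multiply, gradients per neuron can be regulated independently, which is what allows very deep or very long IndRNN stacks.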

This Special Issue focuses on the emerging trends in advanced video and sequence technologies, including new video/sequence applications and datasets, new video/sequence processing methods, and new general tools for video/sequence tasks. Prospective authors are invited to submit high-quality original contributions and reviews to this Special Issue. Specific topics of interest include, but are not limited to:

  • Video coding, including deep-learning-enhanced methods and fully deep-learning-based methods;
  • Video processing, including super-resolution and denoising;
  • 3D video processing, including video-based depth estimation and vision-based autonomous driving;
  • Video/sequence-based recognition, including action recognition, skeleton-based action recognition, video-based object detection/segmentation, and smartphone-based recognition;
  • New video/sequence representation formats, including light field and virtual reality;
  • Various sequence applications and processing methods;
  • New tools for video/sequence processing, including enhanced and specialized transformers, IndRNN, and others.

Prof. Dr. Shuai Li 
Dr. Yanbo Gao
Dr. Chuankun Li
Dr. Jin Wang
Prof. Dr. Yimin Zhou
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • video coding
  • video processing
  • 3D video
  • sequence processing
  • action/gesture recognition
  • transformer
  • IndRNN

Published Papers (7 papers)


Research

15 pages, 11940 KiB  
Article
An Investigation of ECAPA-TDNN Audio Type Recognition Method Based on Mel Acoustic Spectrograms
by Jian Wang, Zhongzheng Wang, Xingcheng Han and Yan Han
Electronics 2023, 12(21), 4421; https://doi.org/10.3390/electronics12214421 - 27 Oct 2023
Viewed by 1018
Abstract
Audio signals play a crucial role in our perception of our surroundings. People rely on sound to assess motion, distance, direction, and environmental conditions, aiding in danger avoidance and decision making. However, in real-world environments, audio signals are often corrupted during acquisition and transmission by various types of noise that interfere with the intended signals, significantly obscuring their essential features. Under strong noise interference, identifying noise or sound segments and distinguishing audio types becomes pivotal for detecting specific events and sound patterns or isolating abnormal sounds. This study analyzes the characteristics of the Mel spectrogram, explores the application of the deep learning ECAPA-TDNN method to audio type recognition, and substantiates its effectiveness through experiments. The experimental results demonstrate that the ECAPA-TDNN method, using the Mel spectrogram as its input feature, achieves a notably high recognition accuracy.
(This article belongs to the Special Issue Emerging Trends in Advanced Video and Sequence Technology)
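As background, a log-Mel spectrogram of the kind used as the input feature in this paper can be computed from first principles. The sketch below is a minimal NumPy implementation with assumed parameter values (FFT size, hop, number of Mel bands); production code would typically use a library such as librosa, and this is not the authors' feature pipeline.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, sr, n_fft=512, hop=128, n_mels=40):
    # frame the signal and take the power spectrum of each windowed frame
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # triangular Mel filterbank between 0 Hz and the Nyquist frequency
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return np.log(spec @ fb.T + 1e-10)   # shape: (frames, n_mels)

sr = 16000
t = np.arange(sr) / sr                    # one second of audio
m = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
```

The Mel warping concentrates resolution at low frequencies, mimicking human hearing, which is why such spectrograms are a common front end for audio classifiers.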

14 pages, 4972 KiB  
Article
Even-Order Taylor Approximation-Based Feature Refinement and Dynamic Aggregation Model for Video Object Detection
by Liule Chen, Jianqiang Li, Yunyu Li and Qing Zhao
Electronics 2023, 12(20), 4305; https://doi.org/10.3390/electronics12204305 - 18 Oct 2023
Cited by 1 | Viewed by 554
Abstract
Video object detection (VOD) is a sophisticated visual task. It is widely agreed that finding effective supportive information from correlated frames is key to boosting model performance in VOD tasks. In this paper, we not only improve the method of finding supportive information from correlated frames but also strengthen the quality of the features extracted from them, further strengthening the fusion of correlated frames so that the model achieves better performance. The feature refinement module (FRM) in our model refines features through a key-value encoding dictionary based on the even-order Taylor series, and the refined features guide the fusion of features at different stages. In the correlated-frame fusion stage, a generative MLP is applied in the feature aggregation module (DFAM) to fuse the refined features extracted from the correlated frames. Experiments adequately demonstrate the effectiveness of our proposed approach: our YOLOX-based model achieves 83.3% AP50 on the ImageNet VID dataset.
(This article belongs to the Special Issue Emerging Trends in Advanced Video and Sequence Technology)
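The even-order Taylor expansion underlying the FRM's key-value encoding can be illustrated in isolation (a generic sketch of the mathematical building block, not the paper's module): keeping only the even-order terms of the Taylor series of exp(x) yields its even part, cosh(x), and the truncation error shrinks rapidly with the order.

```python
import numpy as np
from math import factorial

def even_taylor_exp(x, order=6):
    # keep only the even-order terms of the Taylor series of exp(x);
    # the even part of exp is cosh, so this converges to cosh(x)
    return sum(x ** k / factorial(k) for k in range(0, order + 1, 2))

approx = even_taylor_exp(1.0)            # 1 + 1/2! + 1/4! + 1/6!
```

Even powers are symmetric in their argument, a property that such approximations of similarity kernels can exploit.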

17 pages, 5844 KiB  
Article
Decoding Electroencephalography Underlying Natural Grasp Tasks across Multiple Dimensions
by Hao Gu, Jian Wang, Fengyuan Jiao, Yan Han, Wang Xu and Xin Zhao
Electronics 2023, 12(18), 3894; https://doi.org/10.3390/electronics12183894 - 15 Sep 2023
Viewed by 693
Abstract
Individuals suffering from motor dysfunction due to various diseases often face challenges in performing essential activities such as grasping objects with their upper limbs, eating, and writing. This limitation significantly impacts their ability to live independently. Brain–computer interfaces offer a promising solution, enabling them to interact with the external environment in a meaningful way. This exploration focused on decoding the electroencephalography of natural grasp tasks across three dimensions: movement-related cortical potentials, event-related desynchronization/synchronization, and brain functional connectivity, aiming to assist the development of intelligent assistive devices controlled by electroencephalography signals generated during natural movements. Electrode selection was conducted using global coupling strength, and a random forest classification model was employed to decode three types of natural grasp tasks (palmar grasp, lateral grasp, and rest state). The results indicated a noteworthy lateralization of brain activity, closely associated with whether the executing hand was the left or the right. Reorganization of the frontal region is closely associated with external visual stimuli, and the central and parietal regions play a crucial role in motor execution. An overall average classification accuracy of 80.3% was achieved in a natural grasp task involving eight subjects.
(This article belongs to the Special Issue Emerging Trends in Advanced Video and Sequence Technology)
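The electrode-selection idea can be sketched with a simple stand-in for global coupling strength (an illustrative NumPy toy using mean absolute correlation on synthetic data; the paper's coupling measure and data differ):

```python
import numpy as np

def select_channels(eeg, k):
    # eeg: (channels, samples). Rank channels by a stand-in for global
    # coupling strength: the mean absolute correlation with all others.
    c = np.abs(np.corrcoef(eeg))
    np.fill_diagonal(c, 0.0)
    return np.argsort(c.mean(axis=1))[::-1][:k]

rng = np.random.default_rng(1)
shared = rng.normal(size=(1, 1000))          # common underlying activity
eeg = rng.normal(size=(8, 1000))
eeg[:3] += 2.0 * shared                      # channels 0-2 strongly coupled
top = select_channels(eeg, 3)
```

Channels that share underlying activity score high on such a coupling measure, so the selection keeps the electrodes most involved in the task-related network before classification.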

22 pages, 4328 KiB  
Article
An Energy Focusing-Based Scanning and Localization Method for Shallow Underground Explosive Sources
by Dan Wu, Liming Wang and Jian Li
Electronics 2023, 12(18), 3825; https://doi.org/10.3390/electronics12183825 - 10 Sep 2023
Viewed by 755
Abstract
To address the slow speed and low accuracy of recognizing and locating explosive sources in complex shallow underground blind spaces, this paper proposes an energy-focusing-based scanning and localization method. First, the three-dimensional (3D) energy field formed by the source explosion is reconstructed using the energy-focusing properties of the steered response power (SRP) localization model, and the velocity field is calculated with a multilayered stochastic medium model that accounts for the random statistical characteristics of the medium. Then, a power-function factor is introduced into quantum particle swarm optimization (QPSO) to search the energy field and approach the true location of the energy focus point, with the initial population constructed from the logistic chaos model to achieve global traversal. Finally, extensive simulation results based on a real-world dataset show that, compared to the baseline algorithm, the energy-field focusing accuracy of the proposed scheme is improved by 117.20%, the root mean square error (RMSE) is less than 0.0551 m, the triaxial relative error (RE) is within 0.2595%, and the average time cost is reduced by 98.40%. The method offers strong global search capability, fast convergence, robustness, and generalization.
(This article belongs to the Special Issue Emerging Trends in Advanced Video and Sequence Technology)
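The core of an SRP scan can be sketched as a delay-and-sum search over candidate source positions (a minimal NumPy toy with synthetic pulse signals and an assumed 2D sensor geometry; the paper's method additionally models the layered medium's velocity field and replaces exhaustive scanning with QPSO):

```python
import numpy as np

def srp_scan(signals, sensors, grid, fs, c=343.0):
    # steered response power: for each candidate point, align the sensor
    # signals by their relative propagation delays and keep the point
    # where the summed (in-phase) energy focuses
    best, best_p = None, -np.inf
    for p in grid:
        delays = np.linalg.norm(sensors - p, axis=1) / c
        shifts = np.round((delays - delays.min()) * fs).astype(int)
        aligned = np.array([np.roll(s, -d) for s, d in zip(signals, shifts)])
        power = np.sum(aligned.sum(axis=0) ** 2)
        if power > best_p:
            best, best_p = p, power
    return best

fs = 8000
sensors = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
source = np.array([1.0, 1.0])
pulse = np.array([1.0, 2.0, 1.0])
signals = np.zeros((3, 512))
for i, s in enumerate(sensors):
    idx = int(round(np.linalg.norm(s - source) / 343.0 * fs))
    signals[i, idx:idx + 3] = pulse          # delayed arrival at each sensor

grid = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]])
found = srp_scan(signals, sensors, grid, fs)
```

Only at the true source position do the delay-compensated signals add coherently, so the steered power peaks there; the optimization in the paper exploits exactly this focusing property without visiting every grid point.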

13 pages, 2288 KiB  
Communication
Unsupervised Multi-Scale-Stage Content-Aware Homography Estimation
by Bin Hou, Jinlai Ren and Weiqing Yan
Electronics 2023, 12(9), 1976; https://doi.org/10.3390/electronics12091976 - 24 Apr 2023
Cited by 2 | Viewed by 986
Abstract
Homography estimation is a critical component in many computer-vision tasks. However, most deep homography methods focus on extracting local features and ignore global features or the correspondence between features from two images or video frames; such methods are effective only for aligning image pairs with small displacement. In this paper, we propose an unsupervised Multi-Scale-Stage Content-Aware Homography Estimation Network (MS2CA-HENet). In this framework, we feed multi-scale input images to different stages to cope with different scales of transformation. In each stage, we consider local and global features via our Self-Attention-augmented ConvNet (SAC), and feature matching is explicitly enhanced using feature-matching modules. By shrinking the error residual of each stage, our network achieves coarse-to-fine results. Experiments show that MS2CA-HENet achieves better results than other methods.
(This article belongs to the Special Issue Emerging Trends in Advanced Video and Sequence Technology)
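For context, the classical (non-deep) baseline that such networks learn to replace is direct linear transform (DLT) homography estimation from point correspondences; a minimal NumPy sketch (not the paper's network):

```python
import numpy as np

def estimate_homography(src, dst):
    # direct linear transform: each correspondence contributes two rows
    # of the homogeneous system A h = 0; h is the null vector of A,
    # recovered as the last right-singular vector
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
H_true = np.array([[1.2, 0.1, 3.0], [0.0, 0.9, -1.0], [0.001, 0.0, 1.0]])
dst = apply_homography(H_true, src)
H_est = estimate_homography(src, dst)
```

Four exact correspondences determine the homography up to scale; deep methods such as the one above aim to produce reliable correspondences (or the transform directly) when matches are noisy or displacements are large.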

12 pages, 3825 KiB  
Article
A Gunshot Recognition Method Based on Multi-Scale Spectrum Shift Module
by Jian Li, Jinming Guo, Mingxing Ma, Yuan Zeng, Chuankun Li and Jibin Xu
Electronics 2022, 11(23), 3859; https://doi.org/10.3390/electronics11233859 - 23 Nov 2022
Cited by 2 | Viewed by 1526
Abstract
In view of issues such as the large model size and low recognition accuracy of current gunshot recognition networks, this paper proposes a neural network based on a multi-scale spectrum shift module to fully mine the relevant information among gunshot spectra. The network employs a densely connected convolutional architecture and uses a multi-scale spectrum shift module on the branch to realize interaction among spectral information. The spectrum shift replaces the down-sampling operation among the spectra, realizes globalized feature extraction, avoids information loss during down-sampling, and further improves the quality of the spectral feature map. Experiments were conducted on the publicly available NIJ Grant 2016-DN-BX-0183 gunshot dataset and a YouTube gunshot dataset, on which classification accuracy reached 83.2% and 95.1%, respectively, with the network model size kept at around 16 MB. The results indicate that, compared with other existing convolutional neural network methods, the proposed network mines globalized time-frequency information more effectively and achieves higher gunshot recognition accuracy.
(This article belongs to the Special Issue Emerging Trends in Advanced Video and Sequence Technology)
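The spirit of a spectrum shift operation can be sketched as follows (an illustrative NumPy toy inspired by shift modules in general; the paper's multi-scale module is more elaborate): a fraction of the channels is shifted one bin along the frequency axis in each direction, so the next convolution mixes neighbouring bins without any down-sampling.

```python
import numpy as np

def spectrum_shift(feat, fold=4):
    # feat: (channels, freq_bins, time). Shift 1/fold of the channels one
    # bin up in frequency and another 1/fold one bin down; the remaining
    # channels are left untouched.
    out = feat.copy()
    c = feat.shape[0] // fold
    out[:c] = np.roll(feat[:c], 1, axis=1)
    out[c:2 * c] = np.roll(feat[c:2 * c], -1, axis=1)
    out[:c, 0] = 0.0          # zero the bins that wrapped around
    out[c:2 * c, -1] = 0.0
    return out

feat = np.arange(8 * 4 * 3, dtype=float).reshape(8, 4, 3)
out = spectrum_shift(feat)
```

Because shifting is free of parameters and multiplications, it enlarges the receptive field across frequency at essentially zero cost, which is consistent with the small (about 16 MB) model size reported above.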

16 pages, 5923 KiB  
Article
Identifying the Acoustic Source via MFF-ResNet with Low Sample Complexity
by Min Cui, Yang Liu, Yanbo Wang and Pan Wang
Electronics 2022, 11(21), 3578; https://doi.org/10.3390/electronics11213578 - 1 Nov 2022
Viewed by 1281
Abstract
Acoustic signal classification plays a central role in acoustic source identification. In practical applications, however, the variety of available training data is typically inadequate, leading to a low-sample regime. Applying classical deep learning methods to identify acoustic signals involves a large number of parameters in the classification model, which demands great sample complexity. Therefore, low-sample-complexity modeling is one of the most important issues affecting acoustic signal classification performance. In this study, the authors propose a novel data fusion model named MFF-ResNet, in which manually designed features and deep representations of log-Mel spectrogram features are fused with bi-level attention. The approach incorporates a degree of prior human knowledge as implicit regularization, leading to an interpretable, low-sample-complexity model for acoustic signal classification. The experimental results suggest that MFF-ResNet is capable of accurate acoustic signal classification with fewer training samples.
(This article belongs to the Special Issue Emerging Trends in Advanced Video and Sequence Technology)
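A heavily simplified picture of attention-weighted fusion of two feature branches (an illustrative NumPy toy with made-up weights and dimensions; MFF-ResNet's bi-level attention is more involved and operates inside a ResNet):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_fuse(hand, deep, w_hand, w_deep):
    # score each branch from its own content, then form a convex
    # combination of the handcrafted and deep feature vectors
    alpha = softmax(np.array([hand @ w_hand, deep @ w_deep]))
    return alpha[0] * hand + alpha[1] * deep, alpha

rng = np.random.default_rng(2)
hand = rng.normal(size=16)                   # handcrafted descriptor
deep = rng.normal(size=16)                   # deep log-Mel representation
fused, alpha = attention_fuse(hand, deep,
                              rng.normal(size=16), rng.normal(size=16))
```

Letting learned scores arbitrate between a knowledge-driven branch and a data-driven branch is what injects the prior knowledge as implicit regularization, easing the demand for training samples.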
