Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing

A topical collection in Sensors (ISSN 1424-8220). This collection belongs to the section "Sensing and Imaging".

Viewed by 80714

Editors

Prof. Dr. Yun Zhang
School of Electronics and Communication Engineering, Sun Yat-Sen University, Shenzhen 518017, China
Interests: video coding; image processing
Prof. Dr. KWONG Tak Wu Sam
Department of Computer Science, City University of Hong Kong, 83 Tatchee Ave., Kowloon, Hong Kong, China
Interests: image processing; video processing; image segmentation; machine learning
Prof. Dr. Xu Long
State Key Laboratory of Space Weather, National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
Interests: image processing; machine learning; solar radio astronomy; aperture synthesis imaging
Prof. Dr. Tiesong Zhao
College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Interests: multimedia signal processing; coding and transmission; computer vision

Topical Collection Information

Dear Colleagues,

Deep learning techniques are capable of discovering knowledge from massive unstructured data and providing data-driven solutions. They have significantly advanced many research fields and applications, such as audio-visual signal processing, computer vision, and pattern recognition. Additionally, deep learning and its improved techniques are expected to be incorporated into future sensors and imaging systems.

Today, with the rapid development of advanced deep learning models and techniques, such as GANs, DNNs, RNNs, and LSTMs, and the increasing demands on the effectiveness of visual signal processing, new opportunities are emerging for advances in deep-learning-based sensing, imaging, and video processing. This Topical Collection aims to promote cutting-edge research along this direction and to offer a timely collection of works for researchers. We welcome high-quality original submissions related to advances in deep-learning-based sensing, imaging, and video processing.

Topics of interest include, but are not limited to:

  • Deep learning theory, framework, database, and learning optimization;
  • Deep-learning-based remote sensing, multispectral, and/or hyperspectral sensing;
  • Deep-learning-based computational imaging and pre-processing;
  • Deep-learning-based visual perceptual model and quality assessment metrics;
  • Deep-learning-based image/video compression and communication;
  • Deep-learning-based 3D/multiview sensing, imaging, and video processing;
  • Deep-learning-based depth sensing and estimation;
  • Deep-learning-based image/video rendering, reconstruction, and enhancement;
  • Deep-learning-based visual object detection, tracking, and understanding;
  • Low-complexity optimizations for deep-learning-based sensing, imaging, and video processing;
  • Other advanced deep-learning-based visual sensing and signal processing.

Prof. Dr. Yun Zhang
Prof. Dr. KWONG Tak Wu Sam
Prof. Dr. Xu Long
Prof. Dr. Tiesong Zhao
Collection Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the collection website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • neural network
  • video compression
  • visual quality assessment
  • image and video enhancement
  • multispectral sensing and imaging

Published Papers (31 papers)

2024


19 pages, 4886 KiB  
Article
Burst-Enhanced Super-Resolution Network (BESR)
by Jiaao Li, Qunbo Lv, Wenjian Zhang, Yu Zhang and Zheng Tan
Sensors 2024, 24(7), 2052; https://doi.org/10.3390/s24072052 - 23 Mar 2024
Viewed by 226
Abstract
Multi-frame super-resolution (MFSR) leverages complementary information between image sequences of the same scene to increase the resolution of the reconstructed image. As a branch of MFSR, burst super-resolution aims to restore image details by leveraging the complementary information between noisy sequences. In this paper, we propose an efficient burst-enhanced super-resolution network (BESR). Specifically, we introduce Geformer, a gate-enhanced transformer, and construct an enhanced CNN-Transformer block (ECTB) by combining convolutions to enhance local perception. ECTB efficiently aggregates intra-frame context and inter-frame correlation information, yielding an enhanced feature representation. Additionally, we leverage reference features to facilitate inter-frame communication, enhancing spatiotemporal coherence among multiple frames. To address the critical processes of inter-frame alignment and feature fusion, we propose optimized pyramid alignment (OPA) and hybrid feature fusion (HFF) modules to capture and utilize complementary information between multiple frames to recover more high-frequency details. Extensive experiments demonstrate that, compared to state-of-the-art methods, BESR achieves higher efficiency and competitively superior reconstruction results. On the synthetic dataset and real-world dataset of BurstSR, our BESR achieves PSNR values of 42.79 dB and 48.86 dB, respectively, outperforming other MFSR models significantly.
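
The PSNR figures quoted in abstracts like the one above follow the standard peak signal-to-noise ratio definition; a minimal Python sketch, assuming 8-bit images stored as NumPy arrays:

    import numpy as np

    def psnr(ref, test, peak=255.0):
        """Peak signal-to-noise ratio in dB between a reference and a test image."""
        mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
        return float(10 * np.log10(peak ** 2 / mse))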

31 pages, 3474 KiB  
Review
A Survey of Deep Learning Road Extraction Algorithms Using High-Resolution Remote Sensing Images
by Shaoyi Mo, Yufeng Shi, Qi Yuan and Mingyue Li
Sensors 2024, 24(5), 1708; https://doi.org/10.3390/s24051708 - 06 Mar 2024
Viewed by 569
Abstract
Roads are the fundamental elements of transportation, connecting cities and rural areas, as well as people's lives and work. They play a significant role in various areas such as map updates, economic development, tourism, and disaster management. The automatic extraction of road features from high-resolution remote sensing images has always been a hot and challenging topic in the field of remote sensing, and deep learning network models have been widely used to extract roads from remote sensing images in recent years. In light of this, this paper systematically reviews and summarizes the deep-learning-based techniques for automatic road extraction from high-resolution remote sensing images. It reviews the application of deep learning network models in road extraction tasks and classifies these models into fully supervised learning, semi-supervised learning, and weakly supervised learning based on their use of labels. Finally, a summary and outlook of the current development of deep learning techniques in road extraction are provided.

2023


22 pages, 2825 KiB  
Article
Thermal Image Super-Resolution Based on Lightweight Dynamic Attention Network for Infrared Sensors
by Haikun Zhang, Yueli Hu and Ming Yan
Sensors 2023, 23(21), 8717; https://doi.org/10.3390/s23218717 - 25 Oct 2023
Cited by 2 | Viewed by 995
Abstract
Infrared sensors capture infrared rays radiated by objects to form thermal images. They have a steady ability to penetrate smoke and fog, and are widely used in security monitoring, the military, etc. However, civilian infrared detectors with lower resolution cannot compare with megapixel RGB camera sensors. In this paper, we propose a dynamic attention mechanism-based thermal image super-resolution network for infrared sensors. Specifically, the dynamic attention modules adaptively reweight the outputs of the attention and non-attention branches according to features at different depths of the network. The attention branch, which consists of channel- and pixel-wise attention blocks, is responsible for extracting the most informative features, while the non-attention branch is adopted as a supplement to extract the remaining ignored features. The dynamic weights block operates with 1D convolution instead of a full multi-layer perceptron on the global average pooled features, reducing parameters and enhancing information interaction between channels; the same structure is adopted in the channel attention block. Qualitative and quantitative results on three testing datasets demonstrate that the proposed network can better restore high-frequency details while improving the resolution of thermal images. Moreover, the lightweight structure of the proposed network, with its lower computing cost, can be practically deployed on edge devices, effectively improving the imaging perception quality of infrared sensors.
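
To illustrate the 1D-convolution trick the abstract describes, here is a minimal PyTorch sketch of channel attention that replaces the squeeze-and-excitation MLP with a single 1D convolution over the pooled channel descriptor (in the spirit of ECA-style attention; this is a hedged sketch, not the authors' exact block):

    import torch
    import torch.nn as nn

    class ChannelAttention1D(nn.Module):
        """Channel attention using a 1D conv across channels instead of an MLP."""
        def __init__(self, kernel_size: int = 3):
            super().__init__()
            self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

        def forward(self, x):                      # x: (B, C, H, W)
            y = x.mean(dim=(2, 3))                 # global average pooling -> (B, C)
            y = self.conv(y.unsqueeze(1))          # 1D conv across channels -> (B, 1, C)
            w = torch.sigmoid(y).squeeze(1)        # per-channel weights in (0, 1)
            return x * w.unsqueeze(-1).unsqueeze(-1)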

19 pages, 1503 KiB  
Article
Scale-Hybrid Group Distillation with Knowledge Disentangling for Continual Semantic Segmentation
by Zichen Song, Xiaoliang Zhang and Zhaofeng Shi
Sensors 2023, 23(18), 7820; https://doi.org/10.3390/s23187820 - 12 Sep 2023
Viewed by 632
Abstract
Continual semantic segmentation (CSS) aims to learn new tasks sequentially and extract object(s) and stuff represented by pixel-level maps of new categories while preserving the original segmentation capabilities even when the old class data is absent. Current CSS methods typically preserve the capacity to segment old classes via knowledge distillation, which encounters two limitations: insufficient utilization of semantic knowledge, i.e., only distilling the last layer of the feature encoder, and the semantic shift of the background caused by directly distilling the entire feature map of the decoder. In this paper, we propose a novel CSS method based on scale-hybrid distillation and knowledge disentangling to address these limitations. Firstly, we propose a scale-hybrid group semantic distillation (SGD) method for encoding, which transfers multi-scale knowledge from the old model's feature encoder with group pooling refinement to improve the stability of new models. Then, a knowledge disentangling distillation (KDD) method for decoding is proposed to distill feature maps with the guidance of the old class regions and reduce incorrect guidance from old models towards better plasticity. Extensive experiments are conducted on the Pascal VOC and ADE20K datasets. Competitive performance compared with other state-of-the-art methods demonstrates the effectiveness of our proposed method.
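
For readers unfamiliar with feature distillation, a minimal sketch of the general idea of matching pooled multi-scale encoder features between an old (teacher) and new (student) model; the group pooling refinement and class-region guidance of SGD/KDD are not reproduced here, and matching channel counts at each scale are assumed:

    import torch.nn.functional as F

    def multiscale_distill_loss(student_feats, teacher_feats, pool=4):
        """MSE between spatially pooled feature maps at several encoder depths."""
        return sum(
            F.mse_loss(F.adaptive_avg_pool2d(s, pool), F.adaptive_avg_pool2d(t, pool))
            for s, t in zip(student_feats, teacher_feats)
        )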

19 pages, 4308 KiB  
Article
Deep Sensing for Compressive Video Acquisition
by Michitaka Yoshida, Akihiko Torii, Masatoshi Okutomi, Rin-ichiro Taniguchi, Hajime Nagahara and Yasushi Yagi
Sensors 2023, 23(17), 7535; https://doi.org/10.3390/s23177535 - 30 Aug 2023
Cited by 1 | Viewed by 803
Abstract
A camera captures multidimensional information of the real world by convolving it into two dimensions using a sensing matrix. The original multidimensional information is then reconstructed from captured images. Traditionally, multidimensional information has been captured by uniform sampling, but by optimizing the sensing matrix, we can capture images more efficiently and reconstruct multidimensional information with high quality. Although compressive video sensing requires random sampling as a theoretical optimum, when designing the sensing matrix in practice, there are many hardware limitations (such as exposure and color filter patterns). Existing studies have found random sampling is not always the best solution for compressive sensing because the optimal sampling pattern is related to the scene context, and it is hard to manually design a sampling pattern and reconstruction algorithm. In this paper, we propose an end-to-end learning approach that jointly optimizes the sampling pattern as well as the reconstruction decoder. We applied this deep sensing approach to the video compressive sensing problem. We modeled the spatio–temporal sampling and color filter pattern using a convolutional neural network constrained by hardware limitations during network training. We demonstrated that the proposed method performs better than the manually designed method in gray-scale video and color video acquisitions.
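
A toy PyTorch sketch of the end-to-end idea described above, jointly learning a (relaxed) temporal exposure pattern and a reconstruction decoder; the hardware constraints and the exact architecture are omitted, and a straight-through estimator would be needed for a truly binary mask:

    import torch
    import torch.nn as nn

    class LearnedVideoSensing(nn.Module):
        """Jointly learn a per-frame exposure pattern and a decoder (sketch only)."""
        def __init__(self, t=8):
            super().__init__()
            self.logits = nn.Parameter(torch.randn(t, 1, 1))   # sampling pattern logits
            self.decoder = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, t, 3, padding=1),                # recover t frames
            )

        def forward(self, video):               # video: (B, T, H, W)
            mask = torch.sigmoid(self.logits)   # relaxed binary exposure mask
            meas = (video * mask).sum(dim=1, keepdim=True)  # single coded image
            return self.decoder(meas)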

22 pages, 898 KiB  
Article
Small Object Detection and Tracking: A Comprehensive Review
by Behzad Mirzaei, Hossein Nezamabadi-pour, Amir Raoof and Reza Derakhshani
Sensors 2023, 23(15), 6887; https://doi.org/10.3390/s23156887 - 03 Aug 2023
Cited by 4 | Viewed by 6497
Abstract
Object detection and tracking are vital in computer vision and visual surveillance, allowing for the detection, recognition, and subsequent tracking of objects within images or video sequences. These tasks underpin surveillance systems, facilitating automatic video annotation, identification of significant events, and detection of abnormal activities. However, detecting and tracking small objects introduce significant challenges within computer vision due to their subtle appearance and limited distinguishing features, which results in a scarcity of crucial information. This deficit complicates the tracking process, often leading to diminished efficiency and accuracy. To shed light on the intricacies of small object detection and tracking, we undertook a comprehensive review of the existing methods in this area, categorizing them from various perspectives. We also presented an overview of available datasets specifically curated for small object detection and tracking, aiming to inform and benefit future research in this domain. We further delineated the most widely used evaluation metrics for assessing the performance of small object detection and tracking techniques. Finally, we examined the present challenges within this field and discussed prospective future trends. By tackling these issues and leveraging upcoming trends, we aim to push forward the boundaries in small object detection and tracking, thereby augmenting the functionality of surveillance systems and broadening their real-world applicability.

19 pages, 6755 KiB  
Article
Pixel Intensity Resemblance Measurement and Deep Learning Based Computer Vision Model for Crack Detection and Analysis
by Nirmala Paramanandham, Kishore Rajendiran, Florence Gnana Poovathy J, Yeshwant Santhanakrishnan Premanand, Sanjeeve Raveenthiran Mallichetty and Pramod Kumar
Sensors 2023, 23(6), 2954; https://doi.org/10.3390/s23062954 - 08 Mar 2023
Cited by 1 | Viewed by 2159
Abstract
This research article is aimed at improving the efficiency of a computer vision system that uses image processing for detecting cracks. Images are prone to noise when captured using drones or under various lighting conditions. To analyze this, the images were gathered under various conditions. To address the noise issue and to classify the cracks based on the severity level, a novel technique is proposed using a pixel-intensity resemblance measurement (PIRM) rule. Using PIRM, the noisy and noiseless images were classified. Then, the noise was filtered using a median filter. The cracks were detected using VGG-16, ResNet-50 and InceptionResNet-V2 models. Once a crack was detected, the images were segregated using a crack risk-analysis algorithm. Based on the severity level of the crack, an alert can be given to the authorized person to take the necessary action to avoid major accidents. The proposed technique achieved a 6% improvement without the PIRM rule and a 10% improvement with it for the VGG-16 model. Similarly, it showed 3% and 10% for ResNet-50, 2% and 3% for Inception ResNet, and 9% and 10% for the Xception model. When the images were corrupted by a single noise type alone, 95.6% accuracy was achieved using the ResNet-50 model for Gaussian noise, 99.65% accuracy through Inception ResNet-v2 for Poisson noise, and 99.95% accuracy by the Xception model for speckle noise.
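
The denoising step described above is a standard median filter; a short OpenCV sketch, with the file name and kernel size as illustrative assumptions:

    import cv2

    # Once the PIRM rule flags an image as noisy, a median filter suppresses
    # impulse-like corruption before the crack classifier is applied.
    img = cv2.imread("crack_sample.jpg", cv2.IMREAD_GRAYSCALE)
    denoised = cv2.medianBlur(img, 5)  # 5x5 median filter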

17 pages, 5156 KiB  
Article
Blind Video Quality Assessment for Ultra-High-Definition Video Based on Super-Resolution and Deep Reinforcement Learning
by Zefeng Ying, Da Pan and Ping Shi
Sensors 2023, 23(3), 1511; https://doi.org/10.3390/s23031511 - 29 Jan 2023
Cited by 1 | Viewed by 1775
Abstract
Ultra-high-definition (UHD) video has brought new challenges to objective video quality assessment (VQA) due to its high resolution and high frame rate. Most existing VQA methods are designed for non-UHD videos—when they are employed to deal with UHD videos, the processing speed will be slow and the global spatial features cannot be fully extracted. In addition, these VQA methods usually segment the video into multiple segments, predict the quality score of each segment, and then average the quality score of each segment to obtain the quality score of the whole video. This breaks the temporal correlation of the video sequences and is inconsistent with the characteristics of human visual perception. In this paper, we present a no-reference VQA method, aiming to effectively and efficiently predict quality scores for UHD videos. First, we construct a spatial distortion feature network based on a super-resolution model (SR-SDFNet), which can quickly extract the global spatial distortion features of UHD videos. Then, to aggregate the spatial distortion features of each UHD frame, we propose a time fusion network based on a reinforcement learning model (RL-TFNet), in which the actor network continuously combines multiple frame features extracted by SR-SDFNet and outputs an action to adjust the current quality score to approximate the subjective score, and the critic network outputs action values to optimize the quality perception of the actor network. Finally, we conduct large-scale experiments on UHD VQA databases and the results reveal that, compared to other state-of-the-art VQA methods, our method achieves competitive quality prediction performance with a shorter runtime and fewer model parameters.

19 pages, 3561 KiB  
Article
Privacy Preserving Image Encryption with Optimal Deep Transfer Learning Based Accident Severity Classification Model
by Uddagiri Sirisha and Bolem Sai Chandana
Sensors 2023, 23(1), 519; https://doi.org/10.3390/s23010519 - 03 Jan 2023
Cited by 15 | Viewed by 1779
Abstract
Effective accident management acts as a vital part of emergency and traffic control systems. In such systems, accident data can be collected from different sources (unmanned aerial vehicles, surveillance cameras, on-site people, etc.), and images are considered a major source. Accident site photos and measurements are the most important evidence. Attackers may steal data and breach personal privacy, causing untold costs. The massive number of images commonly employed poses a significant challenge to privacy preservation, and image encryption can be used to accomplish cloud storage and secure image transmission. Automated severity estimation using deep-learning (DL) models becomes essential for effective accident management. Therefore, this article presents a novel Privacy Preserving Image Encryption with Optimal Deep-Learning-based Accident Severity Classification (PPIE-ODLASC) method. The primary objective of the PPIE-ODLASC algorithm is to securely transmit the accident images and classify accident severity into different levels. In the presented PPIE-ODLASC technique, two major processes are involved, namely encryption and severity classification (i.e., high, medium, low, and normal). For accident image encryption, the multi-key homomorphic encryption (MKHE) technique with a lion swarm optimization (LSO)-based optimal key generation procedure is involved. In addition, the PPIE-ODLASC approach involves a YOLO-v5 object detector to identify the region of interest (ROI) in the accident images. Moreover, the accident severity classification module encompasses an Xception feature extractor, bidirectional gated recurrent unit (BiGRU) classification, and Bayesian optimization (BO)-based hyperparameter tuning. The experimental validation of the proposed PPIE-ODLASC algorithm is tested utilizing accident images, and the outcomes are examined in terms of several measures. The comparative examination revealed that the PPIE-ODLASC technique showed an enhanced performance of 57.68 dB over other existing models.

2022


15 pages, 4035 KiB  
Article
Lightweight Super-Resolution with Self-Calibrated Convolution for Panoramic Videos
by Fanjie Shang, Hongying Liu, Wanhao Ma, Yuanyuan Liu, Licheng Jiao, Fanhua Shang, Lijun Wang and Zhenyu Zhou
Sensors 2023, 23(1), 392; https://doi.org/10.3390/s23010392 - 30 Dec 2022
Cited by 2 | Viewed by 1519
Abstract
Panoramic videos are shot by an omnidirectional camera or a collection of cameras, and can display a view in every direction. They can provide viewers with an immersive feeling. The study of super-resolution of panoramic videos has attracted much attention, and many methods have been proposed, especially deep learning-based methods. However, because of their complex architectures, these methods always involve a large number of parameters. To address this issue, we propose the first lightweight super-resolution method with self-calibrated convolution for panoramic videos. A new deformable convolution module is designed first, with self-calibrated convolution, which can learn more accurate offsets and enhance feature alignment. Moreover, we present a new residual dense block for feature reconstruction, which can significantly reduce the parameters while maintaining performance. The performance of the proposed method is compared to those of the state-of-the-art methods, and is verified on the MiG panoramic video dataset.

23 pages, 5688 KiB  
Article
Automatic Recognition of Road Damage Based on Lightweight Attentional Convolutional Neural Network
by Han Liang, Seong-Cheol Lee and Suyoung Seo
Sensors 2022, 22(24), 9599; https://doi.org/10.3390/s22249599 - 07 Dec 2022
Cited by 4 | Viewed by 2504
Abstract
An efficient road damage detection system can reduce the risk of road defects to motorists and road maintenance costs to traffic management authorities. To this end, a lightweight end-to-end road damage detection network is proposed in this paper, aiming at fast, automatic, and accurate identification and classification of multiple types of road damage. The proposed technique consists of a backbone network based on a combination of lightweight feature detection modules together with a multi-scale feature fusion network, which is more beneficial for target identification and classification at different distances and angles than other studies. An embedded lightweight attention module was also developed that can enhance feature information by assigning weights to multi-scale convolutional kernels to improve detection accuracy with fewer parameters. The proposed model generally has higher performance and fewer parameters than other representative models. According to our practical tests, it can identify many types of road damage from the images captured by vehicle cameras and meet the real-time detection requirements when deployed on mobile systems.

11 pages, 7541 KiB  
Article
Industrial Anomaly Detection with Skip Autoencoder and Deep Feature Extractor
by Ta-Wei Tang, Hakiem Hsu, Wei-Ren Huang and Kuan-Ming Li
Sensors 2022, 22(23), 9327; https://doi.org/10.3390/s22239327 - 30 Nov 2022
Cited by 3 | Viewed by 1848
Abstract
Over recent years, with the advances in image recognition technology for deep learning, researchers have devoted continued efforts toward importing anomaly detection technology into the production line of automatic optical detection. Although unsupervised learning helps overcome the high costs associated with labeling, the accuracy of anomaly detection still needs to be improved. Accordingly, this paper proposes a novel deep learning model for anomaly detection to overcome this bottleneck. Leveraging a powerful pre-trained feature extractor and skip connections, the proposed method achieves better feature extraction and image reconstruction capabilities. Results reveal that the areas under the curve (AUC) for the proposed method are higher than those of previous anomaly detection models for 16 out of 17 categories. This indicates that the proposed method can realize the most appropriate adjustments to the needs of production lines in order to maximize economic benefits.

20 pages, 3364 KiB  
Article
Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network
by Shima Javanmardi, Ali Mohammad Latif, Mohammad Taghi Sadeghi, Mehrdad Jahanbanifard, Marcello Bonsangue and Fons J. Verbeek
Sensors 2022, 22(21), 8376; https://doi.org/10.3390/s22218376 - 01 Nov 2022
Cited by 4 | Viewed by 1747
Abstract
In image captioning models, the main challenge in describing an image is identifying all the objects by precisely considering the relationships between the objects and producing various captions. Over the past few years, many methods have been proposed, from attribute-to-attribute comparison approaches to handling issues related to semantics and their relationships. Despite the improvements, the existing techniques suffer from inadequate treatment of positional and geometrical attributes. The reason is that most of the abovementioned approaches depend on Convolutional Neural Networks (CNNs) for object detection, and CNNs are notorious for failing to capture equivariance and rotational invariance in objects. Moreover, the pooling layers in CNNs cause valuable information to be lost. Inspired by recent successful approaches, this paper introduces a novel framework for extracting meaningful descriptions based on a parallelized capsule network that describes the content of images through a high-level understanding of the semantic contents of an image. The main contribution of this paper is proposing a new method that not only overcomes the limitations of CNNs but also generates descriptions with a wide variety of words by using Wikipedia. In our framework, capsules focus on the generation of meaningful descriptions with more detailed spatial and geometrical attributes for a given set of images by considering the position of the entities as well as their relationships. Qualitative experiments on the benchmark dataset MS-COCO show that our framework outperforms state-of-the-art image captioning models when describing the semantic content of the images.

21 pages, 13912 KiB  
Article
Deep Learning-Based Synthesized View Quality Enhancement with DIBR Distortion Mask Prediction Using Synthetic Images
by Huan Zhang, Jiangzhong Cao, Dongsheng Zheng, Ximei Yao and Bingo Wing-Kuen Ling
Sensors 2022, 22(21), 8127; https://doi.org/10.3390/s22218127 - 24 Oct 2022
Cited by 4 | Viewed by 1857
Abstract
Recently, deep learning-based image quality enhancement models have been proposed to improve the perceptual quality of distorted synthesized views impaired by compression and the Depth Image-Based Rendering (DIBR) process in a multi-view video system. However, due to the lack of Multi-view Video plus Depth (MVD) data, the training data for quality enhancement models is small, which limits the performance and progress of these models. Augmenting the training data to enhance the synthesized view quality enhancement (SVQE) models is a feasible solution. In this paper, a deep learning-based SVQE model using more synthetic synthesized view images (SVIs) is suggested. To simulate the irregular geometric displacement of DIBR distortion, a random irregular polygon-based SVI synthesis method is proposed based on existing massive RGB/RGBD data, and a synthetic synthesized view database is constructed, which includes synthetic SVIs and the DIBR distortion mask. Moreover, to further guide the SVQE models to focus more precisely on DIBR distortion, a DIBR distortion mask prediction network which could predict the position and variance of DIBR distortion is embedded into the SVQE models. The experimental results on public MVD sequences demonstrate that the PSNR performance of the existing SVQE models, e.g., DnCNN, NAFNet, and TSAN, pre-trained on NYU-based synthetic SVIs could be greatly promoted by 0.51, 0.36, and 0.26 dB on average, respectively, while the MPPSNRr performance could also be elevated by 0.86, 0.25, and 0.24 on average, respectively. In addition, by introducing the DIBR distortion mask prediction network, the SVI quality obtained by the DnCNN and NAFNet pre-trained on NYU-based synthetic SVIs could be further enhanced by 0.02 and 0.03 dB on average in terms of the PSNR and 0.004 and 0.121 on average in terms of the MPPSNRr.
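
A small sketch of the random irregular-polygon idea for simulating localized DIBR-style distortion regions; all parameters here are assumptions for illustration, not the paper's settings:

    import numpy as np
    import cv2

    def random_polygon_mask(h, w, n_vertices=8, max_radius=40):
        """Binary mask with one random irregular polygon around a random center."""
        mask = np.zeros((h, w), np.uint8)
        cx, cy = np.random.randint(0, w), np.random.randint(0, h)
        angles = np.sort(np.random.uniform(0, 2 * np.pi, n_vertices))
        radii = np.random.uniform(5, max_radius, n_vertices)
        pts = np.stack([cx + radii * np.cos(angles),
                        cy + radii * np.sin(angles)], axis=1).astype(np.int32)
        cv2.fillPoly(mask, [pts], 255)
        return mask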

17 pages, 4653 KiB  
Article
Semi-Supervised Defect Detection Method with Data-Expanding Strategy for PCB Quality Inspection
by Yusen Wan, Liang Gao, Xinyu Li and Yiping Gao
Sensors 2022, 22(20), 7971; https://doi.org/10.3390/s22207971 - 19 Oct 2022
Cited by 6 | Viewed by 1847
Abstract
Printed circuit board (PCB) defect detection plays a crucial role in PCB production. The popular methods are based on deep learning and require large-scale datasets with high-level ground-truth labels, and it is time-consuming and costly to label these datasets. Semi-supervised learning (SSL) methods, which reduce the need for labeled samples by leveraging unlabeled samples, can address this problem well. However, for PCB defects, the detection accuracy with small numbers of labeled samples still needs to be improved, because the training process can be disturbed by the unlabeled samples. To overcome this problem, this paper proposes a semi-supervised defect detection method with a data-expanding strategy (DE-SSD). The proposed DE-SSD uses both the labeled and unlabeled samples, which can reduce the cost of data labeling, and a batch-adding strategy (BA-SSL) is introduced to leverage the unlabeled data with less disturbance. Moreover, a data-expanding (DE) strategy is proposed to use labeled samples from other datasets to expand the target dataset, which can also prevent disturbance by the unlabeled samples. With these improvements, the proposed DE-SSD achieves competitive results for PCB defects with fewer labeled samples. The experimental results on DeepPCB indicate that the proposed DE-SSD achieves state-of-the-art performance, improving on previous methods by at least 4.7 mAP.

15 pages, 9839 KiB  
Article
Low-Light Image Enhancement Using Photometric Alignment with Hierarchy Pyramid Network
by Jing Ye, Xintao Chen, Changzhen Qiu and Zhiyong Zhang
Sensors 2022, 22(18), 6799; https://doi.org/10.3390/s22186799 - 08 Sep 2022
Viewed by 1730
Abstract
Low-light image enhancement can effectively assist high-level vision tasks that often fail in poor illumination conditions. Most previous data-driven methods, however, implemented enhancement directly from severely degraded low-light images, which may produce undesirable results, including blurred detail, intensive noise, and distorted color. In this paper, inspired by a coarse-to-fine strategy, we propose an end-to-end pipeline that combines image-level alignment with pixel-wise perceptual information enhancement for low-light image enhancement. A coarse adaptive global photometric alignment sub-network is constructed to reduce style differences, which facilitates improving illumination and revealing information in under-exposed areas. The aligned image is then processed by a hierarchical pyramid enhancement sub-network to optimize image quality, which helps to remove amplified noise and enhance the local detail of low-light images. We also propose a multi-residual cascade attention block (MRCAB) that involves a channel split-and-concatenation strategy and a polarized self-attention mechanism, leading to reconstructed images with high perceptual quality. Extensive experiments have demonstrated the effectiveness of our method on various datasets, significantly outperforming other state-of-the-art methods in detail and color reproduction.

3 pages, 166 KiB  
Editorial
Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing
by Yun Zhang, Sam Kwong, Long Xu and Tiesong Zhao
Sensors 2022, 22(16), 6192; https://doi.org/10.3390/s22166192 - 18 Aug 2022
Cited by 3 | Viewed by 1424
Abstract
Deep learning techniques have shown their capabilities to discover knowledge from massive unstructured data, providing data-driven solutions for representation and decision making [...]

14 pages, 3795 KiB  
Article
A Timestamp-Independent Haptic–Visual Synchronization Method for Haptic-Based Interaction System
by Yiwen Xu, Liangtao Huang, Tiesong Zhao, Ying Fang and Liqun Lin
Sensors 2022, 22(15), 5502; https://doi.org/10.3390/s22155502 - 23 Jul 2022
Cited by 2 | Viewed by 1384
Abstract
Booming haptic data significantly improve users' immersion during multimedia interaction. As a result, the study of haptic-based interaction systems has attracted the attention of the multimedia community. To construct such a system, a challenging task is the synchronization of multiple sensorial signals, which is critical to the user experience. Despite audio-visual synchronization efforts, there is still a lack of a haptic-aware multimedia synchronization model. In this work, we propose a timestamp-independent synchronization method for haptic–visual signal transmission. First, we exploit the sequential correlations during delivery and playback in a haptic–visual communication system. Second, we develop key sample extraction of haptic signals based on force feedback characteristics and key frame extraction of visual signals based on deep object detection. Third, we combine the key samples and key frames to synchronize the corresponding haptic–visual signals. Without timestamps in the signal flow, the proposed method remains effective and is more robust under complicated network conditions. Subjective evaluation also shows a significant improvement in user experience with the proposed method.

17 pages, 13643 KiB  
Article
Inspection of Underwater Hull Surface Condition Using the Soft Voting Ensemble of the Transfer-Learned Models
by Byung Chul Kim, Hoe Chang Kim, Sungho Han and Dong Kyou Park
Sensors 2022, 22(12), 4392; https://doi.org/10.3390/s22124392 - 10 Jun 2022
Cited by 8 | Viewed by 1973
Abstract
In this study, we propose a method for inspecting the condition of hull surfaces using underwater images acquired from the camera of a remotely controlled underwater vehicle (ROUV). To this end, a soft voting ensemble classifier comprising six well-known convolutional neural network models was used. Using the transfer learning technique, the images of the hull surfaces were used to retrain the six models. The proposed method exhibited an accuracy of 98.13%, a precision of 98.73%, a recall of 97.50%, and an F1-score of 98.11% for the classification of the test set. Furthermore, the time taken for the classification of one image was verified to be approximately 56.25 ms, which is applicable to ROUVs that require real-time inspection.
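
Soft voting simply averages the class probabilities of the member networks before taking the most likely class; a minimal PyTorch sketch:

    import torch

    def soft_vote(models, x):
        """Average softmax outputs of several classifiers, then take the argmax."""
        probs = [torch.softmax(m(x), dim=1) for m in models]
        return torch.stack(probs).mean(dim=0).argmax(dim=1)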

25 pages, 10159 KiB  
Article
A Hybrid Deep Learning and Visualization Framework for Pushing Behavior Detection in Pedestrian Dynamics
by Ahmed Alia, Mohammed Maree and Mohcine Chraibi
Sensors 2022, 22(11), 4040; https://doi.org/10.3390/s22114040 - 26 May 2022
Cited by 9 | Viewed by 2862
Abstract
Crowded event entrances could threaten the comfort and safety of pedestrians, especially when some pedestrians push others or use gaps in crowds to gain faster access to an event. Studying and understanding pushing dynamics leads to designing and building more comfortable and safe [...] Read more.
Crowded event entrances could threaten the comfort and safety of pedestrians, especially when some pedestrians push others or use gaps in crowds to gain faster access to an event. Studying and understanding pushing dynamics leads to designing and building more comfortable and safe entrances. Researchers—to understand pushing dynamics—observe and analyze recorded videos to manually identify when and where pushing behavior occurs. Despite the accuracy of the manual method, it can still be time-consuming, tedious, and hard to identify pushing behavior in some scenarios. In this article, we propose a hybrid deep learning and visualization framework that aims to assist researchers in automatically identifying pushing behavior in videos. The proposed framework comprises two main components: (i) Deep optical flow and wheel visualization; to generate motion information maps. (ii) A combination of an EfficientNet-B0-based classifier and a false reduction algorithm for detecting pushing behavior at the video patch level. In addition to the framework, we present a new patch-based approach to enlarge the data and alleviate the class imbalance problem in small-scale pushing behavior datasets. Experimental results (using real-world ground truth of pushing behavior videos) demonstrate that the proposed framework achieves an 86% accuracy rate. Moreover, the EfficientNet-B0-based classifier outperforms baseline CNN-based classifiers in terms of accuracy. Full article
Show Figures

Figure 1
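
The motion-information maps come from dense optical flow rendered with a color wheel. The sketch below uses OpenCV's classical Farneback flow as a stand-in for the deep optical flow the authors employ; file names are illustrative:

    import cv2
    import numpy as np

    prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None, pyr_scale=0.5, levels=3,
                                        winsize=15, iterations=3, poly_n=5,
                                        poly_sigma=1.2, flags=0)

    # Wheel visualization: hue encodes direction, brightness encodes magnitude.
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*prev.shape, 3), np.uint8)
    hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    motion_map = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)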

17 pages, 2998 KiB  
Article
Color-Dense Illumination Adjustment Network for Removing Haze and Smoke from Fire Scenario Images
by Chuansheng Wang, Jinxing Hu, Xiaowei Luo, Mei-Po Kwan, Weihua Chen and Hao Wang
Sensors 2022, 22(3), 911; https://doi.org/10.3390/s22030911 - 25 Jan 2022
Cited by 4 | Viewed by 2452
Abstract
The atmospheric particles and aerosols from burning usually cause visual artifacts in single images captured from fire scenarios. Most existing haze removal methods exploit the atmospheric scattering model (ASM) for visual enhancement, which inevitably leads to inaccurate estimation of the atmospheric light and transmission matrix of smoky and hazy inputs. To solve these problems, we present a novel color-dense illumination adjustment network (CIANet) for joint recovery of the transmission matrix, illumination intensity, and the dominant color of aerosols from a single image. Meanwhile, to improve the visual effects of the recovered images, the proposed CIANet jointly optimizes the transmission map, atmospheric optical value, the color of the aerosol, and a preliminary recovered scene. Furthermore, we designed a reformulated ASM, called the aerosol scattering model (ESM), to smooth out the enhancement results while keeping the visual effects and the semantic information of different objects. Experimental results on both the proposed RFSIE and NTIRE'20 datasets demonstrate superior performance against state-of-the-art dehazing methods regarding PSNR, SSIM and subjective visual quality. Furthermore, when concatenating CIANet with Faster R-CNN, we witness an improvement in object detection performance by a large margin.
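
For context, the atmospheric scattering model referenced above is I = J*t + A*(1 - t). Inverting it for the scene radiance J given estimates of the transmission t and atmospheric light A looks like the following generic sketch (not CIANet's reformulated ESM):

    import numpy as np

    def recover_scene(I, t, A, t0=0.1):
        """J = (I - A) / max(t, t0) + A for an HxWx3 image I, HxW transmission t,
        and atmospheric light A (scalar or length-3 vector)."""
        t = np.clip(t, t0, 1.0)[..., None]
        return (I - A) / t + A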

2021


16 pages, 28869 KiB  
Article
Small Object Detection in Traffic Scenes Based on YOLO-MXANet
by Xiaowei He, Rao Cheng, Zhonglong Zheng and Zeji Wang
Sensors 2021, 21(21), 7422; https://doi.org/10.3390/s21217422 - 08 Nov 2021
Cited by 22 | Viewed by 3837
Abstract
For small objects in traffic scenes, general object detection algorithms have low detection accuracy, high model complexity, and slow detection speed. To solve these problems, an improved algorithm (named YOLO-MXANet) is proposed in this paper. Complete-Intersection over Union (CIoU) is utilized to improve the loss function, promoting the positioning accuracy of small objects. To reduce the complexity of the model, we present a lightweight yet powerful backbone network (named SA-MobileNeXt) that incorporates channel and spatial attention. Our approach can extract expressive features more effectively by applying the Shuffle Channel and Spatial Attention (SCSA) module to the SandGlass Block (SGBlock) module while adding only a small number of parameters. In addition, a data enhancement method combining Mosaic and Mixup is employed to improve the robustness of the training model. The Multi-scale Feature Enhancement Fusion (MFEF) network is proposed to fuse the extracted features better. The SiLU activation function is also utilized to optimize the Convolution-Batchnorm-Leaky ReLU (CBL) module and the SGBlock module to accelerate the convergence of the model. Ablation experiments on the KITTI dataset show that each improvement is effective. The improved algorithm reduces the complexity of the model and increases detection speed while improving object detection accuracy. Comparative experiments on the KITTI and CCTSDB datasets against other algorithms show that our algorithm also has certain advantages.
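
The CIoU loss mentioned above augments IoU with a center-distance term and an aspect-ratio consistency term. A compact PyTorch sketch for corner-format (x1, y1, x2, y2) boxes follows; this is the standard formulation, not code from the paper:

    import math
    import torch

    def ciou_loss(b1, b2, eps=1e-7):
        """Complete-IoU loss: 1 - IoU + center distance term + aspect-ratio term."""
        iw = (torch.min(b1[..., 2], b2[..., 2]) - torch.max(b1[..., 0], b2[..., 0])).clamp(0)
        ih = (torch.min(b1[..., 3], b2[..., 3]) - torch.max(b1[..., 1], b2[..., 1])).clamp(0)
        inter = iw * ih
        a1 = (b1[..., 2] - b1[..., 0]) * (b1[..., 3] - b1[..., 1])
        a2 = (b2[..., 2] - b2[..., 0]) * (b2[..., 3] - b2[..., 1])
        iou = inter / (a1 + a2 - inter + eps)
        # squared center distance over squared diagonal of the enclosing box
        cw = torch.max(b1[..., 2], b2[..., 2]) - torch.min(b1[..., 0], b2[..., 0])
        ch = torch.max(b1[..., 3], b2[..., 3]) - torch.min(b1[..., 1], b2[..., 1])
        rho2 = ((b1[..., 0] + b1[..., 2] - b2[..., 0] - b2[..., 2]) ** 2 +
                (b1[..., 1] + b1[..., 3] - b2[..., 1] - b2[..., 3]) ** 2) / 4
        c2 = cw ** 2 + ch ** 2 + eps
        # aspect-ratio consistency
        v = (4 / math.pi ** 2) * (
            torch.atan((b2[..., 2] - b2[..., 0]) / (b2[..., 3] - b2[..., 1] + eps)) -
            torch.atan((b1[..., 2] - b1[..., 0]) / (b1[..., 3] - b1[..., 1] + eps))) ** 2
        alpha = v / (1 - iou + v + eps)
        return 1 - iou + rho2 / c2 + alpha * v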

11 pages, 2880 KiB  
Article
Deepfake Detection Using the Rate of Change between Frames Based on Computer Vision
by Gihun Lee and Mihui Kim
Sensors 2021, 21(21), 7367; https://doi.org/10.3390/s21217367 - 05 Nov 2021
Cited by 12 | Viewed by 9490
Abstract
Recently, artificial intelligence has been successfully used in fields such as computer vision, voice, and big data analysis. However, various problems, such as security, privacy, and ethics, also arise along with the development of artificial intelligence. One such problem is deepfakes. Deepfake is a compound word for deep learning and fake, and refers to a fake video created using artificial intelligence technology, or the production process itself. Deepfakes can be exploited for political abuse, pornography, and fake information. This paper proposes a method to determine integrity by analyzing the computer vision features of digital content. The proposed method extracts the rate of change in the computer vision features of adjacent frames and then checks whether the video has been manipulated. The test demonstrated the highest detection rate of 97% compared to existing methods, including machine learning methods. It also maintained the highest detection rate of 96%, even in a test that manipulates the image matrix to evade convolutional neural network-based detection.
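
A minimal sketch of the frame-to-frame change-rate idea using a simple color histogram as the vision feature; the paper's exact features and decision threshold are not reproduced here:

    import cv2
    import numpy as np

    def change_rates(video_path):
        """Relative change of a per-frame color histogram between adjacent frames."""
        cap = cv2.VideoCapture(video_path)
        rates, prev_hist = [], None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                                [0, 256, 0, 256, 0, 256]).flatten()
            if prev_hist is not None:
                rates.append(float(np.linalg.norm(hist - prev_hist) /
                                   (np.linalg.norm(prev_hist) + 1e-7)))
            prev_hist = hist
        cap.release()
        return rates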

16 pages, 7505 KiB  
Article
Improving the Ability of a Laser Ultrasonic Wave-Based Detection of Damage on the Curved Surface of a Pipe Using a Deep Learning Technique
by Byoungjoon Yu, Kassahun Demissie Tola, Changgil Lee and Seunghee Park
Sensors 2021, 21(21), 7105; https://doi.org/10.3390/s21217105 - 26 Oct 2021
Cited by 8 | Viewed by 3739
Abstract
With the advent of the Fourth Industrial Revolution, the economic, social, and technological demands for pipe maintenance are increasing due to the aging of infrastructure caused by increasing industrial development and the expansion of cities. Owing to this, an automatic pipe damage detection system was built using a laser-scanned pipe's ultrasonic wave propagation imaging (UWPI) data and convolutional neural network (CNN)-based object detection algorithms. The algorithm used in this study was EfficientDet-d0, a CNN-based object detection algorithm that uses the transfer learning method. As a result, the mean average precision (mAP) was measured to be 0.39, higher than the COCO mAP of EfficientDet-d0, which is expected to enable the efficient maintenance of piping used in construction and many other industries.

15 pages, 2472 KiB  
Article
Compressed Video Quality Index Based on Saliency-Aware Artifact Detection
by Liqun Lin, Jing Yang, Zheng Wang, Liping Zhou, Weiling Chen and Yiwen Xu
Sensors 2021, 21(19), 6429; https://doi.org/10.3390/s21196429 - 26 Sep 2021
Cited by 4 | Viewed by 6087
Abstract
Video coding technology reduces the storage and transmission bandwidth required by video services by reducing the bitrate of the video stream. However, the compressed video signals may involve perceivable information loss, especially when the video is overcompressed. In such cases, viewers can observe visually annoying artifacts, namely Perceivable Encoding Artifacts (PEAs), which degrade their perceived video quality. To monitor and measure these PEAs (including blurring, blocking, ringing and color bleeding), we propose an objective video quality metric named Saliency-Aware Artifact Measurement (SAAM), which requires no reference information. The SAAM metric first introduces video saliency detection to extract regions of interest and further splits these regions into a finite number of image patches. For each image patch, a data-driven model is utilized to evaluate the intensities of PEAs. Finally, these intensities are fused into an overall metric using Support Vector Regression (SVR). In the experiments, we compared the SAAM metric with other popular video quality metrics on four publicly available databases: LIVE, CSIQ, IVP and FERIT-RTRK. The results reveal the promising quality prediction performance of the SAAM metric, which is superior to most popular compressed video quality evaluation models.
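
The final fusion stage is a standard Support Vector Regression over the per-patch PEA intensities; a scikit-learn sketch with placeholder data (real training would use the four measured artifact intensities and subjective quality scores):

    import numpy as np
    from sklearn.svm import SVR

    X = np.random.rand(200, 4)   # placeholder: blurring, blocking, ringing, color bleeding
    y = np.random.rand(200)      # placeholder: subjective quality scores
    svr = SVR(kernel="rbf", C=1.0).fit(X, y)
    overall_quality = svr.predict(X[:1])   # fused overall quality estimate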

18 pages, 8410 KiB  
Article
MSF-Net: Multi-Scale Feature Learning Network for Classification of Surface Defects of Multifarious Sizes
by Pengcheng Xu, Zhongyuan Guo, Lei Liang and Xiaohang Xu
Sensors 2021, 21(15), 5125; https://doi.org/10.3390/s21155125 - 29 Jul 2021
Cited by 8 | Viewed by 2372
Abstract
In the field of surface defect detection, the scale difference of product surface defects is often huge. Existing defect detection methods based on Convolutional Neural Networks (CNNs) are more inclined to express macro and abstract features, and their ability to express local and small defects is insufficient, resulting in an imbalance of feature expression capabilities. In this paper, a Multi-Scale Feature Learning Network (MSF-Net) based on a Dual Module Feature (DMF) extractor is proposed. The DMF extractor is mainly composed of optimized Concatenated Rectified Linear Units (CReLUs) and optimized Inception feature extraction modules, which increase the diversity of feature receptive fields while reducing the amount of computation. The feature maps of the middle layers, with different receptive field sizes, are merged to increase the richness of the receptive fields of the last layer of feature maps. Residual shortcut connections, a batch normalization layer, and an average pooling layer are used to replace the fully connected layer to improve training efficiency and, at the same time, make the multi-scale feature learning ability more balanced. Two representative multi-scale defect datasets are used for experiments, and the experimental results verify the advancement and effectiveness of the proposed MSF-Net in the detection of surface defects with multi-scale features.
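
The Concatenated ReLU at the heart of the DMF extractor keeps both the positive and negated responses, doubling the channel count cheaply; a one-module PyTorch sketch:

    import torch
    import torch.nn as nn

    class CReLU(nn.Module):
        """Concatenated ReLU: concat(relu(x), relu(-x)) along the channel axis."""
        def forward(self, x):
            return torch.cat([torch.relu(x), torch.relu(-x)], dim=1)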

21 pages, 8853 KiB  
Article
Wheat Ear Recognition Based on RetinaNet and Transfer Learning
by Jingbo Li, Changchun Li, Shuaipeng Fei, Chunyan Ma, Weinan Chen, Fan Ding, Yilin Wang, Yacong Li, Jinjin Shi and Zhen Xiao
Sensors 2021, 21(14), 4845; https://doi.org/10.3390/s21144845 - 16 Jul 2021
Cited by 36 | Viewed by 3596
Abstract
The number of wheat ears is an essential indicator for wheat production and yield estimation, but accurately obtaining wheat ears requires expensive manual cost and labor time. Meanwhile, the characteristics of wheat ears provide less information, and the color is consistent with the [...] Read more.
The number of wheat ears is an essential indicator for wheat production and yield estimation, but accurately obtaining wheat ears requires expensive manual cost and labor time. Meanwhile, the characteristics of wheat ears provide less information, and the color is consistent with the background, which can be challenging to obtain the number of wheat ears required. In this paper, the performance of Faster regions with convolutional neural networks (Faster R-CNN) and RetinaNet to predict the number of wheat ears for wheat at different growth stages under different conditions is investigated. The results show that using the Global WHEAT dataset for recognition, the RetinaNet method, and the Faster R-CNN method achieve an average accuracy of 0.82 and 0.72, with the RetinaNet method obtaining the highest recognition accuracy. Secondly, using the collected image data for recognition, the R2 of RetinaNet and Faster R-CNN after transfer learning is 0.9722 and 0.8702, respectively, indicating that the recognition accuracy of the RetinaNet method is higher on different data sets. We also tested wheat ears at both the filling and maturity stages; our proposed method has proven to be very robust (the R2 is above 90). This study provides technical support and a reference for automatic wheat ear recognition and yield estimation. Full article
12 pages, 3924 KiB  
Communication
Bionic Birdlike Imaging Using a Multi-Hyperuniform LED Array
by Xin-Yu Zhao, Li-Jing Li, Lei Cao and Ming-Jie Sun
Sensors 2021, 21(12), 4084; https://doi.org/10.3390/s21124084 - 14 Jun 2021
Cited by 1 | Viewed by 2388
Abstract
Digital cameras obtain the color information of a scene using a chromatic filter, usually a Bayer filter, overlaid on a pixelated detector. However, the periodic arrangement of both the filter array and the detector array introduces frequency aliasing in sampling and color misregistration during demosaicking, which degrades image quality. Inspired by the biological structure of avian retinas, we developed a chromatic LED array with a multi-hyperuniform geometric arrangement, which is irregular on small length scales but quasi-uniform on large scales, to suppress frequency aliasing and color misregistration in full-color image retrieval. Experiments were performed with a single-pixel imaging system using the multi-hyperuniform chromatic LED array to provide structured illumination, and a frame rate of 208 fps was achieved at 32 × 32 pixel resolution. Comparing the experimental results with images captured by a conventional digital camera demonstrates that the proposed imaging system forms images with fewer chromatic moiré patterns and color misregistration artifacts. The concept proposed and verified here could provide insights for the design and manufacturing of future bionic imaging sensors.
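For readers unfamiliar with single-pixel imaging, the underlying reconstruction can be sketched in a few lines: each structured-illumination pattern yields one photodetector reading, and correlating the readings with the patterns recovers the scene. The NumPy toy below substitutes random binary patterns for the paper's multi-hyperuniform LED layout, purely to illustrate the measurement-and-correlation principle; the scene and pattern count are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 32                 # matches the 32 x 32 resolution reported
n_patterns = 4096

# Random binary illumination patterns (stand-in for the LED array).
patterns = rng.integers(0, 2, size=(n_patterns, H * W)).astype(float)

# Toy scene: a bright square on a dark background.
scene = np.zeros((H, W))
scene[8:24, 8:24] = 1.0

# Single-pixel measurements: one detector value per pattern.
y = patterns @ scene.ravel()

# Differential correlation reconstruction of the scene.
recon = ((y - y.mean())[:, None] * patterns).mean(axis=0).reshape(H, W)
```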
19 pages, 2182 KiB  
Article
Attention Networks for the Quality Enhancement of Light Field Images
by Ionut Schiopu and Adrian Munteanu
Sensors 2021, 21(9), 3246; https://doi.org/10.3390/s21093246 - 7 May 2021
Cited by 1 | Viewed by 2007
Abstract
In this paper, we propose a novel filtering method based on deep attention networks for the quality enhancement of light field (LF) images captured by plenoptic cameras and compressed using the High Efficiency Video Coding (HEVC) standard. The proposed architecture was built using efficient complex processing blocks and novel attention-based residual blocks. The network takes advantage of the macro-pixel (MP) structure specific to LF images and processes each reconstructed MP in the luminance (Y) channel. The input patch is represented as a tensor that collects, from an MP neighbourhood, four Epipolar Plane Images (EPIs) at four different angles. Experimental results on a common LF image database showed substantial improvements over HEVC in terms of the structural similarity index (SSIM), with average Y-Bjøntegaard Delta (BD)-rate savings of 36.57% and an average Y-BD-PSNR improvement of 2.301 dB. Performance increased further when the HEVC built-in filtering methods were skipped. The visual results illustrate that the enhanced images contain sharper edges and more texture details. An ablation study provides two robust solutions that reduce the inference time by 44.6% and the network complexity by 74.7%. The results demonstrate the potential of attention networks for the quality enhancement of LF images encoded by HEVC.
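The abstract does not spell out the attention-based residual block, so the PyTorch sketch below shows one common form: a channel-attention (squeeze-and-excitation style) gate inside a residual branch, consuming a 4-channel tensor as a stand-in for the four stacked EPIs. All layer sizes and the overall network layout are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttentionResBlock(nn.Module):
    """Residual block whose residual branch is modulated by a
    channel-attention gate (squeeze-and-excitation style); an
    illustrative analogue of an attention-based residual block."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, x):
        r = self.body(x)
        return x + r * self.gate(r)

# Tiny enhancement net: 4 stacked EPIs in, 1 enhanced Y channel out.
net = nn.Sequential(
    nn.Conv2d(4, 64, 3, padding=1),
    AttentionResBlock(64),
    AttentionResBlock(64),
    nn.Conv2d(64, 1, 3, padding=1))
```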
20 pages, 4137 KiB  
Article
DNet: Dynamic Neighborhood Feature Learning in Point Cloud
by Fujing Tian, Zhidi Jiang and Gangyi Jiang
Sensors 2021, 21(7), 2327; https://doi.org/10.3390/s21072327 - 26 Mar 2021
Cited by 4 | Viewed by 1898
Abstract
Neighborhood selection is very important for local region feature learning in point cloud learning networks, and different neighborhood selection schemes may lead to quite different results in point cloud processing tasks. Existing point cloud learning networks mainly adopt hand-crafted neighborhoods, without considering whether the selected neighborhood is reasonable. To solve this problem, this paper proposes a new point cloud learning network, the Dynamic Neighborhood Network (DNet), which dynamically selects the neighborhood and learns the features of each point. The proposed DNet has a multi-head structure with two important modules: the Feature Enhancement Layer (FELayer) and a masking mechanism. The FELayer enhances the manifold features of the point cloud, while the masking mechanism removes neighborhood points with low contribution. DNet can learn both the manifold features and the spatial geometric features of a point cloud, and it obtains the relationship between each point and its effective neighborhood points through the masking mechanism, yielding dynamic neighborhood features for each point. Experimental results on three public datasets demonstrate that the proposed DNet outperforms state-of-the-art learning networks in point cloud processing tasks.
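The masking idea can be illustrated compactly: gather a k-nearest-neighbor candidate set for each point, score every candidate with a small network, and down-weight low-contribution neighbors before aggregating. The PyTorch sketch below is a rough analogue under assumed shapes and scoring, not the published DNet implementation.

```python
import torch
import torch.nn as nn

def knn_indices(xyz, k):
    """Indices of the k nearest neighbours of every point (each point
    is its own nearest neighbour, distance zero). xyz: (B, N, 3)."""
    dist = torch.cdist(xyz, xyz)                 # (B, N, N) pairwise distances
    return dist.topk(k, largest=False).indices   # (B, N, k)

class MaskedNeighborhood(nn.Module):
    """Score each candidate neighbour with a small MLP and softly mask
    low-contribution points before aggregation -- a rough analogue of
    DNet's masking mechanism (exact design assumed)."""
    def __init__(self, in_ch):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(in_ch, 32), nn.ReLU(),
                                   nn.Linear(32, 1))

    def forward(self, feats, idx):
        B, N, C = feats.shape
        k = idx.shape[-1]
        # Gather per-point neighbourhood features: (B, N, k, C).
        nbrs = feats.unsqueeze(1).expand(B, N, N, C).gather(
            2, idx.unsqueeze(-1).expand(B, N, k, C))
        w = torch.sigmoid(self.score(nbrs))      # (B, N, k, 1) soft mask
        return (w * nbrs).sum(2) / w.sum(2).clamp_min(1e-6)

# Usage with random data (shapes assumed):
xyz = torch.rand(2, 1024, 3)
feats = torch.rand(2, 1024, 64)
pooled = MaskedNeighborhood(64)(feats, knn_indices(xyz, k=16))  # (2, 1024, 64)
```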
18 pages, 8556 KiB  
Article
NRA-Net—Neg-Region Attention Network for Salient Object Detection with Gaze Tracking
by Hoijun Kim, Soonchul Kwon and Seunghyun Lee
Sensors 2021, 21(5), 1753; https://doi.org/10.3390/s21051753 - 4 Mar 2021
Cited by 5 | Viewed by 2133
Abstract
In this paper, we propose a method for detecting the salient objects on which a viewer's gaze is focused, from a single image and without a gaze-tracking device. A network was constructed using Neg-Region Attention (NRA), which predicts the objects on which the line of sight is concentrated using deep learning techniques. Existing deep-learning-based methods use an autoencoder structure, which loses features both in the encoding process, where features are compressed and extracted from the image, and in the decoding process, where they are expanded and restored. As a result, features are lost in the object area of the detection results, or other areas are falsely detected as objects. The proposed NRA reduces feature loss and emphasizes object areas within the encoder. After separating the positive and negative regions using the exponential linear unit (ELU) activation function, attention is computed for each region separately. The attention method, applied without a backbone network, emphasizes the object area and suppresses the background area. In the experiments, the proposed method showed higher detection performance than conventional methods.
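The abstract only hints at how the positive and negative regions are separated, so the sketch below is an illustrative guess: ELU leaves positive activations unchanged and compresses negative ones into (−1, 0), so the two polarities can be split and gated independently. Layer shapes and the recombination rule are assumptions, not the authors' NRA design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NegRegionAttention(nn.Module):
    """Split a feature map into its positive part and its ELU-bounded
    negative part, gate each region with its own attention map, then
    recombine; an illustrative analogue of NRA, not the paper's code."""
    def __init__(self, ch):
        super().__init__()
        self.pos_gate = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.neg_gate = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        a = F.elu(x)                 # positives kept, negatives -> (-1, 0)
        pos = torch.relu(a)          # positive region
        neg = a - pos                # negative region
        return pos * self.pos_gate(pos) + neg * self.neg_gate(neg)

# Usage (shapes assumed):
out = NegRegionAttention(64)(torch.randn(1, 64, 56, 56))
```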