Deep Learning Models and Applications to Computer Vision

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Signal and Data Analysis".

Deadline for manuscript submissions: closed (30 September 2023) | Viewed by 20452

Special Issue Editors

Dr. Nadia Kanwal
School of Computer Science and Mathematics, Keele University, Staffordshire ST5 5GB, UK
Interests: machine learning; computer vision; image processing; visual data; privacy; security; object classification; activity recognition; medical image analysis
Dr. Mohammad Samar Ansari
Physical, Mathematical and Engineering Sciences, University of Chester, Parkgate Road, Chester CH1 4BJ, UK
Interests: neural networks; deep learning; IoT; smart cities; resource-efficient machine learning

Special Issue Information

Dear Colleagues,

Information theory has proved to be effective for solving many computer vision and pattern recognition problems (including, but not limited to, entropy thresholding, feature selection, clustering and segmentation, image matching, saliency detection, and optimal classifier design). Increasingly, information-theoretic concepts are being applied in computer vision applications. Examples include measures (mutual information, entropy, information gain, etc.), principles (maximum entropy, minimax entropy, cross-entropy, relative entropy, etc.) and theories (such as rate-distortion theory).
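
To make these measures concrete, the following is a minimal Python/NumPy sketch, not drawn from any paper in this issue, of Shannon entropy and of mutual information as used in tasks such as image matching; the histogram bin counts and the toy arrays are illustrative assumptions.

```python
import numpy as np

def shannon_entropy(img, bins=256):
    """Shannon entropy H(X) of a grayscale image, in bits."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                     # ignore empty bins (0 * log 0 := 0)
    return -np.sum(p * np.log2(p))

def mutual_information(img_a, img_b, bins=64):
    """Mutual information I(A; B) = H(A) + H(B) - H(A, B), a classic
    similarity score for image matching and registration."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    p_ab = joint / joint.sum()
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
    h_a = -np.sum(p_a[p_a > 0] * np.log2(p_a[p_a > 0]))
    h_b = -np.sum(p_b[p_b > 0] * np.log2(p_b[p_b > 0]))
    h_ab = -np.sum(p_ab[p_ab > 0] * np.log2(p_ab[p_ab > 0]))
    return h_a + h_b - h_ab

# Toy usage: a noisy copy of an image shares more information with the
# original than an unrelated random image does.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64))
noisy = np.clip(original + rng.normal(0, 10, size=(64, 64)), 0, 255)
unrelated = rng.integers(0, 256, size=(64, 64))
print(round(shannon_entropy(original), 2))        # close to 8 bits for uniform noise
print(mutual_information(original, noisy))        # high: strong dependence
print(mutual_information(original, unrelated))    # low: near independence
```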

Entropy is a measure of the uncertainty, or disorder, in a system. Using different types of entropy to optimise algorithms such as decision trees or deep neural networks has been shown to improve both speed and performance, because entropy-based criteria are considerably more sensitive to changes in the model than more rigid metrics such as accuracy or even mean-squared error. Entropy-optimised models have been shown to handle uncertainty with greater purpose and awareness, and consequently achieve better overall performance.
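
As a hedged illustration of the two uses mentioned above, and not tied to any specific contribution in this issue, the sketch below shows information gain as a decision-tree split criterion and cross-entropy as a classification loss; the labels, threshold and probabilities are invented.

```python
import numpy as np

def entropy(labels):
    """Empirical Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature, threshold):
    """Entropy reduction achieved by splitting on `feature <= threshold`
    (the criterion maximised by ID3/C4.5-style decision trees)."""
    left, right = labels[feature <= threshold], labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

def cross_entropy_loss(probs, targets, eps=1e-12):
    """Mean categorical cross-entropy between predicted class probabilities
    (rows sum to 1) and integer class targets."""
    return -np.mean(np.log(probs[np.arange(len(targets)), targets] + eps))

# Toy usage with invented data.
y = np.array([0, 0, 0, 1, 1, 1])
x = np.array([1.0, 1.5, 2.0, 8.0, 8.5, 9.0])
print(information_gain(y, x, threshold=5.0))    # 1.0 bit: a perfect split
p = np.array([[0.9, 0.1], [0.2, 0.8]])
print(cross_entropy_loss(p, np.array([0, 1])))  # low loss for confident, correct predictions
```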

The fusion of computer vision with deep learning has produced remarkable results in image classification, object recognition, face recognition and many other vision-related tasks. We have also seen impressive results in medical image analysis and in security- and privacy-enabled image and video processing. Furthermore, lightweight machine learning and deep learning models have opened another opportunity: embedding learning modules into small edge devices, including mobile phones, surveillance cameras and wristwatches, to provide secure data acquisition and transmission. With the strengthening of privacy laws around the world, the research community needs to develop privacy-by-design solutions for vision-based tasks that make use of deep learning methods, including the generation and use of privacy-aware training data.
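
One common route to such lightweight models is to factorise convolutions. The following is a generic PyTorch sketch of the depthwise-separable design used by MobileNet-style edge architectures; the channel sizes are arbitrary assumptions, and no paper in this issue is implied to use exactly this block.

```python
import torch.nn as nn

def param_count(module):
    return sum(p.numel() for p in module.parameters())

# A standard 3x3 convolution mapping 64 -> 128 channels ...
standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)

# ... versus the depthwise-separable factorisation used by many edge-oriented
# architectures: a per-channel 3x3 depthwise conv followed by a 1x1 pointwise conv.
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                        # pointwise
)

print(param_count(standard))   # 73,856 parameters
print(param_count(separable))  # 8,960 parameters (~8x fewer)
```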

This Special Issue aims to publish cutting-edge research on privacy- and security-enabled solutions to various vision-related tasks such as recognition, tracking, autonomous driving, and medical image analysis and classification. The Special Issue will accept unpublished original papers and comprehensive reviews focused on (but not limited to) the following research areas:

  • Entropy-based object recognition;
  • Spatial-entropy-based computer vision models;
  • Mathematical advancement in deep learning models;
  • Lightweight deep learning models for edge and resource-constrained devices;
  • Privacy-aware computer vision solutions;
  • Visual data security;
  • Identification and mitigation techniques for cyber-attacks on image and video data;
  • Deep learning methods for image style transfer;
  • Deep learning methods for image segmentation;
  • Deep learning methods for object detection and classification;
  • Virtual reality applications;
  • Immersive technology;
  • Application of deep learning methods for human–computer interaction;
  • Smart cities.

Dr. Nadia Kanwal
Dr. Mohammad Samar Ansari
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • image segmentation
  • video image analysis
  • medical image analysis
  • cyber attacks on video data
  • image style transfer
  • visual data security
  • visual data privacy
  • entropy
  • information gain

Published Papers (13 papers)


Research

15 pages, 2235 KiB  
Article
Multi-Modal Representation via Contrastive Learning with Attention Bottleneck Fusion and Attentive Statistics Features
by Qinglang Guo, Yong Liao, Zhe Li and Shenglin Liang
Entropy 2023, 25(10), 1421; https://doi.org/10.3390/e25101421 - 07 Oct 2023
Cited by 1 | Viewed by 1320
Abstract
The integration of information from multiple modalities is a highly active area of research. Previous techniques have predominantly focused on fusing shallow features or high-level representations generated by deep unimodal networks, which only capture a subset of the hierarchical relationships across modalities. However, previous methods are often limited in their ability to exploit the fine-grained statistical features inherent in multimodal data. This paper proposes an approach that densely integrates representations by computing image features’ means and standard deviations. The global statistics of features afford a holistic perspective, capturing the overarching distribution and trends inherent in the data, thereby facilitating enhanced comprehension and characterization of multimodal data. We also leverage a Transformer-based fusion encoder to effectively capture global variations in multimodal features. To further enhance the learning process, we incorporate a contrastive loss function that encourages the discovery of shared information across different modalities. To validate the effectiveness of our approach, we conduct experiments on three widely used multimodal sentiment analysis datasets. The results demonstrate the efficacy of our proposed method, achieving significant performance improvements compared to existing approaches. Full article
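
For orientation only, the sketch below illustrates the two generic ingredients named in this abstract, statistics pooling and a contrastive (InfoNCE-style) objective, in PyTorch; it is not the authors' implementation, and the tensor shapes and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def statistics_pool(feats):
    """Statistics pooling sketch: concatenate the mean and standard deviation
    of a feature map over its spatial positions.
    feats: (batch, channels, height, width) -> (batch, 2 * channels)"""
    flat = feats.flatten(2)                               # (batch, channels, H*W)
    return torch.cat([flat.mean(dim=2), flat.std(dim=2)], dim=1)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """InfoNCE-style loss: matched image/text pairs in a batch are pulled
    together, mismatched pairs pushed apart."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(img_emb))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random tensors standing in for unimodal encoder outputs.
img_feats = torch.randn(8, 256, 7, 7)
img_emb = statistics_pool(img_feats)      # (8, 512)
txt_emb = torch.randn(8, 512)
print(contrastive_loss(img_emb, txt_emb))
```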

26 pages, 2160 KiB  
Article
Enhancing Visual Feedback Control through Early Fusion Deep Learning
by Adrian-Paul Botezatu, Lavinia-Eugenia Ferariu and Adrian Burlacu
Entropy 2023, 25(10), 1378; https://doi.org/10.3390/e25101378 - 25 Sep 2023
Viewed by 684
Abstract
A visual servoing system is a type of control system used in robotics that employs visual feedback to guide the movement of a robot or a camera to achieve a desired task. This problem is addressed using deep models that receive a visual representation of the current and desired scene, to compute the control input. The focus is on early fusion, which consists of using additional information integrated into the neural input array. In this context, we discuss how ready-to-use information can be directly obtained from the current and desired scenes, to facilitate the learning process. Inspired by some of the most effective traditional visual servoing techniques, we introduce early fusion based on image moments and provide an extensive analysis of approaches based on image moments, region-based segmentation, and feature points. These techniques are applied stand-alone or in combination, to allow obtaining maps with different levels of detail. The role of the extra maps is experimentally investigated for scenes with different layouts. The results show that early fusion facilitates a more accurate approximation of the linear and angular camera velocities, in order to control the movement of a 6-degree-of-freedom robot from a current configuration to a desired one. The best results were obtained for the extra maps providing details of low and medium levels. Full article

19 pages, 15710 KiB  
Article
Adaptable 2D to 3D Stereo Vision Image Conversion Based on a Deep Convolutional Neural Network and Fast Inpaint Algorithm
by Tomasz Hachaj
Entropy 2023, 25(8), 1212; https://doi.org/10.3390/e25081212 - 15 Aug 2023
Viewed by 2364
Abstract
Algorithms for converting 2D to 3D are gaining importance following the hiatus brought about by the discontinuation of 3D TV production; this is due to the high availability and popularity of virtual reality systems that use stereo vision. In this paper, several depth image-based rendering (DIBR) approaches using state-of-the-art single-frame depth generation neural networks and inpaint algorithms are proposed and validated, including a novel very fast inpaint (FAST). FAST significantly exceeds the speed of currently used inpaint algorithms by reducing computational complexity, without degrading the quality of the resulting image. The role of the inpaint algorithm is to fill in missing pixels in the stereo pair estimated by DIBR. Missing estimated pixels appear at the boundaries of areas that differ significantly in their estimated distance from the observer. In addition, we propose parameterizing DIBR using a singular, easy-to-interpret adaptable parameter that can be adjusted online according to the preferences of the user who views the visualization. This single parameter governs both the camera parameters and the maximum binocular disparity. The proposed solutions are also compared with a fully automatic 2D to 3D mapping solution. The algorithm proposed in this work, which features intuitive disparity steering, the foundational deep neural network MiDaS, and the FAST inpaint algorithm, received considerable acclaim from evaluators. The mean absolute error of the proposed solution does not contain statistically significant differences from state-of-the-art approaches like Deep3D and other DIBR-based approaches using different inpaint functions. Since both the source codes and the generated videos are available for download, all experiments can be reproduced, and one can apply our algorithm to any selected video or single image to convert it. Full article
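
For readers unfamiliar with DIBR, the following is a deliberately naive Python sketch of the idea: shift pixels horizontally in proportion to inverse depth, then fill the disoccluded holes with an inpaint step. It is not the paper's algorithm; OpenCV's Telea inpainting stands in for FAST, and the disparity mapping is an assumption.

```python
import numpy as np
import cv2

def render_right_view(image, depth, max_disparity=16):
    """Naive DIBR sketch: shift each pixel horizontally in proportion to its
    inverse depth, then fill the disoccluded holes with OpenCV inpainting."""
    h, w = depth.shape
    disparity = (max_disparity * (1.0 - depth / depth.max())).astype(np.int32)
    right = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            nx = x - disparity[y, x]
            if 0 <= nx < w:
                right[y, nx] = image[y, x]
                filled[y, nx] = 255
    holes = cv2.bitwise_not(filled)            # pixels no source pixel mapped to
    return cv2.inpaint(right, holes, 3, cv2.INPAINT_TELEA)

# Toy usage: a random RGB frame and a synthetic depth map (near objects on the right).
frame = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
depth = np.tile(np.linspace(1.0, 10.0, 160, dtype=np.float32), (120, 1))
right_view = render_right_view(frame, depth)
```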

17 pages, 3824 KiB  
Article
Distance Matters: A Distance-Aware Medical Image Segmentation Algorithm
by Yuncong Feng, Yeming Cong, Shuaijie Xing, Hairui Wang, Cuixing Zhao, Xiaoli Zhang and Qingan Yao
Entropy 2023, 25(8), 1169; https://doi.org/10.3390/e25081169 - 05 Aug 2023
Viewed by 1084
Abstract
The transformer-based U-Net network structure has gained popularity in the field of medical image segmentation. However, most networks overlook the impact of the distance between each patch on the encoding process. This paper proposes a novel GC-TransUnet for medical image segmentation. The key innovation is that it takes into account the relationships between patch blocks based on their distances, optimizing the encoding process in traditional transformer networks. This optimization results in improved encoding efficiency and reduced computational costs. Moreover, the proposed GC-TransUnet is combined with U-Net to accomplish the segmentation task. In the encoder part, the traditional vision transformer is replaced by the global context vision transformer (GC-VIT), eliminating the need for the CNN network while retaining skip connections for subsequent decoders. Experimental results demonstrate that the proposed algorithm achieves superior segmentation results compared to other algorithms when applied to medical images. Full article

23 pages, 1644 KiB  
Article
Contour Information-Guided Multi-Scale Feature Detection Method for Visible-Infrared Pedestrian Detection
by Xiaoyu Xu, Weida Zhan, Depeng Zhu, Yichun Jiang, Yu Chen and Jinxin Guo
Entropy 2023, 25(7), 1022; https://doi.org/10.3390/e25071022 - 04 Jul 2023
Cited by 1 | Viewed by 971
Abstract
Infrared pedestrian target detection is affected by factors such as the low resolution and contrast of infrared pedestrian images, as well as the complexity of the background and the presence of multiple targets occluding each other, resulting in indistinct target features. To address these issues, this paper proposes a method to enhance the accuracy of pedestrian target detection by employing contour information to guide multi-scale feature detection. This involves analyzing the shapes and edges of the targets in infrared images at different scales to more accurately identify and differentiate them from the background and other targets. First, we propose a preprocessing method to suppress background interference and extract color information from visible images. Second, we propose an information fusion residual block combining a U-shaped structure and residual connection to form a feature extraction network. Then, we propose an attention mechanism based on a contour information-guided approach to guide the network to extract the depth features of pedestrian targets. Finally, we use the clustering method of mIoU to generate anchor frame sizes applicable to the KAIST pedestrian dataset and propose a hybrid loss function to enhance the network’s adaptability to pedestrian targets. The extensive experimental results show that the method proposed in this paper outperforms other comparative algorithms in pedestrian detection, proving its superiority. Full article
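
The anchor-generation step mentioned here follows the general idea of IoU-based k-means clustering popularised by YOLO. A generic sketch, not the authors' code and with invented box sizes, is shown below.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, assuming boxes and anchors share a corner,
    as in the standard YOLO anchor-clustering recipe."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster ground-truth box sizes with distance = 1 - IoU to obtain
    anchor sizes tailored to a dataset (generic sketch)."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)   # nearest anchor by IoU
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

# Toy usage: widths/heights (in pixels) of invented pedestrian boxes.
wh = np.abs(np.random.default_rng(1).normal([40, 100], [10, 25], size=(500, 2)))
print(kmeans_anchors(wh, k=6))
```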

16 pages, 2005 KiB  
Article
Partition-Based Point Cloud Completion Network with Density Refinement
by Jianxin Li, Guannan Si, Xinyu Liang, Zhaoliang An, Pengxin Tian and Fengyu Zhou
Entropy 2023, 25(7), 1018; https://doi.org/10.3390/e25071018 - 02 Jul 2023
Viewed by 1262
Abstract
In this paper, we propose a novel method for point cloud completion called PADPNet. Our approach uses a combination of global and local information to infer missing elements in the point cloud. We achieve this by dividing the input point cloud into uniform local regions, called perceptual fields, which are abstractly understood as special convolution kernels. The set of point clouds in each local region is represented as a feature vector and transformed into N uniform perceptual fields as the input to our transformer model. We also designed a geometric density-aware block to better exploit the inductive bias of the point cloud’s 3D geometric structure. Our method preserves sharp edges and detailed structures that are often lost in voxel-based or point-based approaches. Experimental results demonstrate that our approach outperforms other methods in reducing the ambiguity of output results. Our proposed method has important applications in 3D computer vision and can efficiently recover complete 3D object shapes from incomplete point clouds. Full article

18 pages, 1882 KiB  
Article
Unsupervised Low-Light Image Enhancement Based on Generative Adversarial Network
by Wenshuo Yu, Liquan Zhao and Tie Zhong
Entropy 2023, 25(6), 932; https://doi.org/10.3390/e25060932 - 13 Jun 2023
Cited by 1 | Viewed by 1745
Abstract
Low-light image enhancement aims to improve the perceptual quality of images captured under low-light conditions. This paper proposes a novel generative adversarial network to enhance low-light image quality. Firstly, it designs a generator consisting of residual modules with hybrid attention modules and parallel dilated convolution modules. The residual module is designed to prevent gradient explosion during training and to avoid feature information loss. The hybrid attention module is designed to make the network pay more attention to useful features. A parallel dilated convolution module is designed to increase the receptive field and capture multi-scale information. Additionally, a skip connection is utilized to fuse shallow features with deep features to extract more effective features. Secondly, a discriminator is designed to improve the discrimination ability. Finally, an improved loss function is proposed by incorporating pixel loss to effectively recover detailed information. The proposed method demonstrates superior performance in enhancing low-light images compared to seven other methods. Full article

23 pages, 8161 KiB  
Article
Regional Time-Series Coding Network and Multi-View Image Generation Network for Short-Time Gait Recognition
by Wenhao Sun, Guangda Lu, Zhuangzhuang Zhao, Tinghang Guo, Zhuanping Qin and Yu Han
Entropy 2023, 25(6), 837; https://doi.org/10.3390/e25060837 - 23 May 2023
Cited by 1 | Viewed by 1163
Abstract
Gait recognition is one of the important research directions of biometric authentication technology. However, in practical applications, the original gait data is often short, and a long and complete gait video is required for successful recognition. Also, the gait images from different views have a great influence on the recognition effect. To address the above problems, we designed a gait data generation network for expanding the cross-view image data required for gait recognition, which provides sufficient data input for feature extraction branching with gait silhouette as the criterion. In addition, we propose a gait motion feature extraction network based on regional time-series coding. By independently time-series coding the joint motion data within different regions of the body, and then combining the time-series data features of each region with secondary coding, we obtain the unique motion relationships between regions of the body. Finally, bilinear matrix decomposition pooling is used to fuse spatial silhouette features and motion time-series features to obtain complete gait recognition under shorter time-length video input. We use the OUMVLP-Pose and CASIA-B datasets to validate the silhouette image branching and motion time-series branching, respectively, and employ evaluation metrics such as IS entropy value and Rank-1 accuracy to demonstrate the effectiveness of our design network. Finally, we also collect gait-motion data in the real world and test them in a complete two-branch fusion network. The experimental results show that the network we designed can effectively extract the time-series features of human motion and achieve the expansion of multi-view gait data. The real-world tests also prove that our designed method has good results and feasibility in the problem of gait recognition with short-time video as input data. Full article

14 pages, 7036 KiB  
Article
Repeated Cross-Scale Structure-Induced Feature Fusion Network for 2D Hand Pose Estimation
by Xin Guan, Huan Shen, Charles Okanda Nyatega and Qiang Li
Entropy 2023, 25(5), 724; https://doi.org/10.3390/e25050724 - 27 Apr 2023
Cited by 1 | Viewed by 1045
Abstract
Recently, the use of convolutional neural networks for hand pose estimation from RGB images has dramatically improved. However, self-occluded keypoint inference in hand pose estimation is still a challenging task. We argue that these occluded keypoints cannot be readily recognized directly from traditional appearance features, and sufficient contextual information among the keypoints is especially needed to induce feature learning. Therefore, we propose a new repeated cross-scale structure-induced feature fusion network to learn about the representations of keypoints with rich information, ’informed’ by the relationships between different abstraction levels of features. Our network consists of two modules: GlobalNet and RegionalNet. GlobalNet roughly locates hand joints based on a new feature pyramid structure by combining higher semantic information and more global spatial scale information. RegionalNet further refines keypoint representation learning via a four-stage cross-scale feature fusion network, which learns shallow appearance features induced by more implicit hand structure information, so that when identifying occluded keypoints, the network can use augmented features to better locate the positions. The experimental results show that our method outperforms the state-of-the-art methods for 2D hand pose estimation on two public datasets, STB and RHD. Full article

21 pages, 14200 KiB  
Article
Multi-Modality Image Fusion and Object Detection Based on Semantic Information
by Yong Liu, Xin Zhou and Wei Zhong
Entropy 2023, 25(5), 718; https://doi.org/10.3390/e25050718 - 26 Apr 2023
Cited by 1 | Viewed by 1894
Abstract
Infrared and visible image fusion (IVIF) aims to provide informative images by combining complementary information from different sensors. Existing IVIF methods based on deep learning focus on strengthening the network with increasing depth but often ignore the importance of transmission characteristics, resulting in the degradation of important information. In addition, while many methods use various loss functions or fusion rules to retain complementary features of both modes, the fusion results often retain redundant or even invalid information. In order to accurately extract the effective information from both infrared images and visible light images without omission or redundancy, and to better serve downstream tasks such as target detection with the fused image, we propose a multi-level structure search attention fusion network based on semantic information guidance, which realizes the fusion of infrared and visible images in an end-to-end way. Our network has two main contributions: the use of neural architecture search (NAS) and the newly designed multilevel adaptive attention module (MAAB). These methods enable our network to retain the typical characteristics of the two modes while removing useless information for the detection task in the fusion results. In addition, our loss function and joint training method can establish a reliable relationship between the fusion network and subsequent detection tasks. Extensive experiments on the new dataset (M3FD) show that our fusion method has achieved advanced performance in both subjective and objective evaluations, and the mAP in the object detection task is improved by 0.5% compared to the second-best method (FusionGAN). Full article

20 pages, 4679 KiB  
Article
GL-YOLO-Lite: A Novel Lightweight Fallen Person Detection Model
by Yuan Dai and Weiming Liu
Entropy 2023, 25(4), 587; https://doi.org/10.3390/e25040587 - 29 Mar 2023
Cited by 2 | Viewed by 1784
Abstract
The detection of a fallen person (FPD) is a crucial task in guaranteeing individual safety. Although deep-learning models have shown potential in addressing this challenge, they face several obstacles, such as the inadequate utilization of global contextual information, poor feature extraction, and substantial computational requirements. These limitations have led to low detection accuracy, poor generalization, and slow inference speeds. To overcome these challenges, the present study proposed a new lightweight detection model named Global and Local You-Only-Look-Once Lite (GL-YOLO-Lite), which integrates both global and local contextual information by incorporating transformer and attention modules into the popular object-detection framework YOLOv5. Specifically, a stem module replaced the original inefficient focus module, and rep modules with re-parameterization technology were introduced. Furthermore, a lightweight detection head was developed to reduce the number of redundant channels in the model. Finally, we constructed a large-scale, well-formatted FPD dataset (FPDD). The proposed model employed a binary cross-entropy (BCE) function to calculate the classification and confidence losses. An experimental evaluation of the FPDD and Pascal VOC dataset demonstrated that GL-YOLO-Lite outperformed other state-of-the-art models with significant margins, achieving 2.4–18.9 mean average precision (mAP) on FPDD and 1.8–23.3 on the Pascal VOC dataset. Moreover, GL-YOLO-Lite maintained a real-time processing speed of 56.82 frames per second (FPS) on a Titan Xp and 16.45 FPS on a HiSilicon Kirin 980, demonstrating its effectiveness in real-world scenarios. Full article
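
For reference, the sketch below shows how BCE typically enters a YOLO-style detection head, with objectness and per-class scores treated as independent sigmoid outputs; the shapes and targets are invented, and this is not the GL-YOLO-Lite implementation.

```python
import torch
import torch.nn.functional as F

def detection_bce_losses(pred_obj, pred_cls, tgt_obj, tgt_cls):
    """pred_obj: (N,) raw objectness logits; pred_cls: (N, C) raw class logits;
    targets are 0/1 tensors of the same shapes. Both terms use binary
    cross-entropy, computed on logits for numerical stability."""
    conf_loss = F.binary_cross_entropy_with_logits(pred_obj, tgt_obj)
    cls_loss = F.binary_cross_entropy_with_logits(pred_cls, tgt_cls)
    return conf_loss, cls_loss

# Toy usage with invented predictions for a two-class head (e.g. fallen / not fallen).
pred_obj = torch.randn(8)
pred_cls = torch.randn(8, 2)
tgt_obj = torch.randint(0, 2, (8,)).float()
tgt_cls = torch.randint(0, 2, (8, 2)).float()
print(detection_bce_losses(pred_obj, pred_cls, tgt_obj, tgt_cls))
```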

21 pages, 7501 KiB  
Article
Infrared and Visible Image Fusion via Attention-Based Adaptive Feature Fusion
by Lei Wang, Ziming Hu, Quan Kong, Qian Qi and Qing Liao
Entropy 2023, 25(3), 407; https://doi.org/10.3390/e25030407 - 23 Feb 2023
Cited by 5 | Viewed by 1877
Abstract
Infrared and visible image fusion methods based on feature decomposition are able to generate good fused images. However, most of them employ manually designed simple feature fusion strategies in the reconstruction stage, such as addition or concatenation fusion strategies. These strategies do not pay attention to the relative importance between different features and thus may suffer from issues such as low contrast, blurred results, or information loss. To address this problem, we designed an adaptive fusion network to synthesize decoupled common structural features and distinct modal features under an attention-based adaptive fusion (AAF) strategy. The AAF module adaptively computes different weights assigned to different features according to their relative importance. Moreover, the structural features from different sources are also synthesized under the AAF strategy before reconstruction, to provide more complete structural information. More important features thus automatically receive more attention, and the advantageous information they contain is reflected more faithfully in the final fused images. Experiments on several datasets demonstrated an obvious improvement of image fusion quality using our method. Full article
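
A generic flavour of attention-weighted fusion, as opposed to plain addition or concatenation, is sketched below in PyTorch; it is illustrative only and not the paper's AAF module, and the channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Generic attention-weighted fusion sketch: predict a per-pixel weight
    from the concatenated infrared/visible features and blend them, instead
    of simple addition or concatenation."""
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_ir, feat_vis):
        w = self.weight(torch.cat([feat_ir, feat_vis], dim=1))  # weights in [0, 1]
        return w * feat_ir + (1.0 - w) * feat_vis

# Toy usage with random feature maps standing in for encoder outputs.
fuse = AdaptiveFusion(channels=64)
fused = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```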

12 pages, 3224 KiB  
Article
Lightweight Deep Neural Network Embedded with Stochastic Variational Inference Loss Function for Fast Detection of Human Postures
by Feng-Shuo Hsu, Zi-Jun Su, Yamin Kao, Sen-Wei Tsai, Ying-Chao Lin, Po-Hsun Tu, Cihun-Siyong Alex Gong and Chien-Chang Chen
Entropy 2023, 25(2), 336; https://doi.org/10.3390/e25020336 - 11 Feb 2023
Cited by 4 | Viewed by 1392
Abstract
Fusing object detection techniques and stochastic variational inference, we proposed a new scheme for lightweight neural network models, which could simultaneously reduce model sizes and raise the inference speed. This technique was then applied in fast human posture identification. The integer-arithmetic-only algorithm and the feature pyramid network were adopted to reduce the computational complexity in training and to capture features of small objects, respectively. Features of sequential human motion frames (i.e., the centroid coordinates of bounding boxes) were extracted by the self-attention mechanism. With the techniques of Bayesian neural network and stochastic variational inference, human postures could be promptly classified by fast resolving of the Gaussian mixture model for human posture classification. The model took instant centroid features as inputs and indicated possible human postures in the probabilistic maps. Our model had better overall performance than the baseline model ResNet in mean average precision (32.5 vs. 34.6), inference speed (27 vs. 48 milliseconds), and model size (46.2 vs. 227.8 MB). The model could also alert a suspected human falling event about 0.66 s in advance. Full article
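
As a rough illustration of the final classification stage described here, the sketch below fits one Gaussian mixture per posture class to centroid features and scores new observations by likelihood (using scikit-learn); the class names, feature layout and data are invented, and this is not the authors' model.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit one Gaussian mixture per (invented) posture class on bounding-box
# centroid coordinates, then classify by the highest average log-likelihood.
rng = np.random.default_rng(0)
standing = rng.normal([0.5, 0.3], 0.02, size=(200, 2))   # (cx, cy) centroids
fallen = rng.normal([0.5, 0.8], 0.05, size=(200, 2))

models = {
    "standing": GaussianMixture(n_components=2, random_state=0).fit(standing),
    "fallen": GaussianMixture(n_components=2, random_state=0).fit(fallen),
}

def classify(centroid):
    scores = {name: m.score(centroid.reshape(1, -1)) for name, m in models.items()}
    return max(scores, key=scores.get)

print(classify(np.array([0.49, 0.78])))  # "fallen"
```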
