
Special Issue "Deep Learning Models and Applications to Computer Vision"

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Signal and Data Analysis".

Deadline for manuscript submissions: 31 July 2023 | Viewed by 4465

Special Issue Editors

Dr. Nadia Kanwal
School of Computer Science and Mathematics, Keele University, Staffordshire ST5 5BG, UK
Interests: machine learning; computer vision; image processing; visual data; privacy; security; object classification; activity recognition; medical image analysis

Dr. Mohammad Samar Ansari
Physical, Mathematical and Engineering Sciences, University of Chester, Parkgate Road, Chester CH1 4BJ, UK
Interests: neural networks; deep learning; IoT; smart cities; resource-efficient machine learning

Special Issue Information

Dear Colleagues,

Information theory has proved effective for solving many computer vision and pattern recognition problems, including (but not limited to) entropy-based thresholding, feature selection, clustering and segmentation, image matching, saliency detection, and optimal classifier design. Information-theoretic concepts are increasingly applied in computer vision applications. Examples include measures (mutual information, entropy, information gain, etc.), principles (maximum entropy, minimax entropy, cross-entropy, relative entropy, etc.) and theories (such as rate-distortion theory).
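As a concrete illustration of two of these measures, the following minimal NumPy sketch (ours, not drawn from any of the papers below) estimates Shannon entropy and mutual information from image histograms; mutual information is the quantity classically maximized in entropy-based image matching and registration.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum(p * log2(p)) over the non-zero bins."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(img_a, img_b, bins=64):
    """Mutual information between two equally sized grayscale images,
    estimated from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_ab = joint / joint.sum()   # joint distribution P(a, b)
    p_a = p_ab.sum(axis=1)       # marginal P(a)
    p_b = p_ab.sum(axis=0)       # marginal P(b)
    # I(A; B) = H(A) + H(B) - H(A, B)
    return shannon_entropy(p_a) + shannon_entropy(p_b) - shannon_entropy(p_ab.ravel())

# MI is highest for perfectly aligned images and drops as alignment degrades,
# which is why maximizing it is a classic image-registration criterion.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 128)).astype(float)
print(mutual_information(img, img))               # high: identical images
print(mutual_information(img, np.roll(img, 17)))  # near zero: decorrelated copy
```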

Entropy measures the uncertainty, or disorder, of a system. Entropy-based criteria have long been used to optimise learning algorithms: information gain drives split selection in decision trees, and cross-entropy is the standard training loss for deep neural networks. Because these measures respond to the full predicted distribution rather than to hard decisions alone, they are far more informative during optimisation than coarser metrics such as accuracy or even mean-squared error. Entropy-optimised models have been shown to handle uncertainty more gracefully and, consequently, to achieve better overall performance.
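The decision-tree case can be made concrete in a few lines. The sketch below (our illustration, with toy data) computes information gain as exactly the entropy reduction that a candidate split achieves:

```python
import numpy as np

def entropy(labels):
    """Empirical Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, mask):
    """Reduction in label entropy from splitting on a boolean mask,
    i.e. the criterion a decision tree maximises when choosing a split."""
    left, right = labels[mask], labels[~mask]
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    return entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))

# A feature threshold that separates the classes well has high gain.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x = np.array([1.0, 1.2, 0.9, 1.1, 3.0, 2.8, 3.2, 2.9])
print(information_gain(y, x < 2.0))  # = 1.0 bit: a perfect split
print(information_gain(y, x < 1.0))  # ~0.14 bit: uninformative split
```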

The fusion of computer vision with deep learning has produced remarkable results in image classification, object recognition, face recognition, and many other vision-related tasks. Impressive results have also been achieved in medical image analysis and in security- and privacy-enabled image and video processing. Furthermore, lightweight machine learning and deep learning models have opened another opportunity: embedding learning modules into small edge devices, including mobile phones, surveillance cameras, and wrist watches, to provide secure data acquisition and transmission. With the reinforcement of privacy laws around the world, the research community needs to develop privacy-by-design solutions for vision-based tasks that use deep learning methods, including the generation and use of privacy-aware training data.

This Special Issue aims to publish cutting-edge research on privacy- and security-enabled solutions to vision-related tasks such as recognition, tracking, autonomous driving, and medical image analysis and classification. The Special Issue will accept unpublished original papers and comprehensive reviews focused on (but not limited to) the following research areas:

  • Entropy-based object recognition;
  • Spatial-entropy-based computer vision models;
  • Mathematical advancement in deep learning models;
  • Lightweight deep learning models for edge devices/resource-constrained devices;
  • Privacy-aware computer vision solutions;
  • Visual data security;
  • Identification and mitigation techniques for cyber-attacks on image and video data;
  • Deep learning methods for image style transfer;
  • Deep learning methods for image segmentation;
  • Deep learning methods for object detection and classification;
  • Virtual reality applications;
  • Immersive technology;
  • Application of deep learning methods to human–computer interaction;
  • Smart cities.

Dr. Nadia Kanwal
Dr. Mohammad Samar Ansari
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • image segmentation
  • video image analysis
  • medical image analysis
  • cyber attacks on video data
  • image style transfer
  • visual data security
  • visual data privacy
  • entropy
  • information gain

Published Papers (6 papers)


Research

Article
Regional Time-Series Coding Network and Multi-View Image Generation Network for Short-Time Gait Recognition
Entropy 2023, 25(6), 837; https://doi.org/10.3390/e25060837 - 23 May 2023
Viewed by 351
Abstract
Gait recognition is an important research direction in biometric authentication. In practical applications, however, the available gait data are often short, while successful recognition typically requires a long, complete gait video; moreover, gait images captured from different views strongly affect recognition performance. To address these problems, we designed a gait data generation network that expands the cross-view image data required for gait recognition, providing sufficient input for the feature extraction branch that uses the gait silhouette as its criterion. In addition, we propose a gait motion feature extraction network based on regional time-series coding. By independently time-series coding the joint motion data within different regions of the body, and then combining the time-series features of each region through a secondary coding, we obtain the unique motion relationships between body regions. Finally, bilinear matrix decomposition pooling is used to fuse spatial silhouette features and motion time-series features, yielding complete gait recognition from shorter video input. We use the OUMVLP-Pose and CASIA-B datasets to validate the silhouette image branch and the motion time-series branch, respectively, and employ evaluation metrics such as the IS entropy value and Rank-1 accuracy to demonstrate the effectiveness of the designed network. Finally, we also collected gait motion data in the real world and tested it in the complete two-branch fusion network. The experimental results show that our network effectively extracts the time-series features of human motion and achieves the expansion of multi-view gait data. The real-world tests also show that the proposed method is accurate and feasible for gait recognition with short-time video as input.
(This article belongs to the Special Issue Deep Learning Models and Applications to Computer Vision)
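The fusion step named in this abstract can be illustrated generically. The PyTorch sketch below shows low-rank (factorized) bilinear pooling of two feature branches; it is our simplified stand-in, not the authors' exact layer, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class LowRankBilinearFusion(nn.Module):
    """Generic factorized bilinear pooling: project two feature vectors into
    a shared space, multiply them elementwise (the low-rank equivalent of a
    bilinear outer product), then map the interaction to a fused vector."""
    def __init__(self, dim_a, dim_b, rank=256, out_dim=128):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, rank)
        self.proj_b = nn.Linear(dim_b, rank)
        self.out = nn.Linear(rank, out_dim)

    def forward(self, feat_a, feat_b):
        joint = self.proj_a(feat_a) * self.proj_b(feat_b)  # bilinear interaction
        return self.out(torch.tanh(joint))

# Example: fuse a silhouette feature (512-d) with a motion feature (256-d).
fusion = LowRankBilinearFusion(dim_a=512, dim_b=256)
silhouette = torch.randn(8, 512)    # batch of spatial silhouette features
motion = torch.randn(8, 256)        # batch of motion time-series features
fused = fusion(silhouette, motion)  # -> (8, 128)
```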

Article
Repeated Cross-Scale Structure-Induced Feature Fusion Network for 2D Hand Pose Estimation
Entropy 2023, 25(5), 724; https://doi.org/10.3390/e25050724 - 27 Apr 2023
Viewed by 454
Abstract
Recently, the use of convolutional neural networks for hand pose estimation from RGB images has improved dramatically. However, inferring self-occluded keypoints remains a challenging task. We argue that such occluded keypoints cannot be readily recognized from traditional appearance features alone, and that sufficient contextual information among the keypoints is needed to induce feature learning. We therefore propose a new repeated cross-scale structure-induced feature fusion network that learns keypoint representations rich in information, 'informed' by the relationships between different abstraction levels of features. Our network consists of two modules: GlobalNet and RegionalNet. GlobalNet roughly locates hand joints using a new feature pyramid structure that combines higher-level semantic information with larger-scale spatial information. RegionalNet further refines keypoint representation learning via a four-stage cross-scale feature fusion network, which learns shallow appearance features induced by more implicit hand structure information, so that when identifying occluded keypoints the network can use the augmented features to better locate their positions. The experimental results show that our method outperforms state-of-the-art methods for 2D hand pose estimation on two public datasets, STB and RHD.
(This article belongs to the Special Issue Deep Learning Models and Applications to Computer Vision)
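The feature pyramid idea behind GlobalNet can be sketched generically: coarse, semantically strong maps are upsampled and merged with fine, spatially precise ones. The PyTorch snippet below is our simplified illustration, not the paper's architecture; the channel sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownPyramidFusion(nn.Module):
    """Generic top-down feature-pyramid fusion: 1x1 lateral convs align
    channel counts, then each coarser (more semantic) map is upsampled and
    added to the next finer (more spatial) map."""
    def __init__(self, in_channels=(64, 128, 256), out_channels=64):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )

    def forward(self, feats):  # feats ordered fine -> coarse
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # merge coarse into fine
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest"
            )
        return laterals  # one fused map per scale

# Example with three backbone scales.
fuse = TopDownPyramidFusion()
feats = [torch.randn(1, 64, 64, 64),
         torch.randn(1, 128, 32, 32),
         torch.randn(1, 256, 16, 16)]
fused = fuse(feats)  # each output map has 64 channels
```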

Article
Multi-Modality Image Fusion and Object Detection Based on Semantic Information
Entropy 2023, 25(5), 718; https://doi.org/10.3390/e25050718 - 26 Apr 2023
Viewed by 614
Abstract
Infrared and visible image fusion (IVIF) aims to provide informative images by combining complementary information from different sensors. Existing deep-learning-based IVIF methods focus on strengthening the network by increasing its depth, but often ignore the importance of transmission characteristics, resulting in the degradation of important information. In addition, while many methods use various loss functions or fusion rules to retain complementary features of both modalities, the fusion results often retain redundant or even invalid information. In order to accurately extract the effective information from both infrared and visible-light images, without omission or redundancy, and to better serve downstream tasks such as target detection, we propose a multi-level structure search attention fusion network guided by semantic information, which fuses infrared and visible images in an end-to-end way. Our network makes two main contributions: the use of neural architecture search (NAS) and a newly designed multi-level adaptive attention module (MAAB). These enable the network to retain the typical characteristics of the two modalities while removing information that is useless for the detection task. In addition, our loss function and joint training method establish a reliable relationship between the fusion network and the subsequent detection task. Extensive experiments on the new M3FD dataset show that our fusion method achieves advanced performance in both subjective and objective evaluations, and improves mAP on the object detection task by 0.5% over the second-best method (FusionGAN).
(This article belongs to the Special Issue Deep Learning Models and Applications to Computer Vision)
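The joint training of a fusion network with a downstream detector can be sketched as a combined objective. The snippet below is a toy PyTorch illustration of that coupling, with stand-in networks, a toy intensity loss, and an assumed balancing weight; it is not the authors' actual loss.

```python
import torch
import torch.nn as nn

# Stand-in networks; the real fusion net and detector are far larger.
fusion_net = nn.Conv2d(2, 1, kernel_size=3, padding=1)  # (IR, VIS) -> fused
detector = nn.Conv2d(1, 5, kernel_size=3, padding=1)    # fused -> box/class maps

ir = torch.randn(4, 1, 64, 64)
vis = torch.randn(4, 1, 64, 64)
target = torch.randn(4, 5, 64, 64)

fused = fusion_net(torch.cat([ir, vis], dim=1))
# Fusion term: keep the fused image close to both inputs (toy intensity loss).
loss_fusion = (fused - ir).abs().mean() + (fused - vis).abs().mean()
# Detection term: gradients flow back through the fused image into fusion_net,
# which is what ties the fusion network to the downstream detection task.
loss_detect = nn.functional.mse_loss(detector(fused), target)
lam = 0.5                                  # assumed balancing weight
loss = loss_fusion + lam * loss_detect
loss.backward()                            # updates both networks jointly
```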

Article
GL-YOLO-Lite: A Novel Lightweight Fallen Person Detection Model
Entropy 2023, 25(4), 587; https://doi.org/10.3390/e25040587 - 29 Mar 2023
Viewed by 621
Abstract
Fallen person detection (FPD) is a crucial task for guaranteeing individual safety. Although deep-learning models have shown potential in addressing this challenge, they face several obstacles, such as the inadequate utilization of global contextual information, poor feature extraction, and substantial computational requirements. These limitations lead to low detection accuracy, poor generalization, and slow inference speeds. To overcome these challenges, this study proposes a new lightweight detection model named Global and Local You-Only-Look-Once Lite (GL-YOLO-Lite), which integrates both global and local contextual information by incorporating transformer and attention modules into the popular object-detection framework YOLOv5. Specifically, a stem module replaces the original, inefficient focus module, and rep modules with re-parameterization technology are introduced. Furthermore, a lightweight detection head reduces the number of redundant channels in the model. Finally, we constructed a large-scale, well-formatted FPD dataset (FPDD). The proposed model employs a binary cross-entropy (BCE) function to compute the classification and confidence losses. Experimental evaluation demonstrated that GL-YOLO-Lite outperformed other state-of-the-art models by significant margins of 2.4–18.9 mean average precision (mAP) on FPDD and 1.8–23.3 on the Pascal VOC dataset. Moreover, GL-YOLO-Lite maintained a real-time processing speed of 56.82 frames per second (FPS) on a Titan Xp and 16.45 FPS on a HiSilicon Kirin 980, demonstrating its effectiveness in real-world scenarios.
(This article belongs to the Special Issue Deep Learning Models and Applications to Computer Vision)
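The BCE-based loss mentioned in the abstract can be illustrated in a few lines. The PyTorch sketch below shows the YOLO-style use of binary cross-entropy for both confidence (objectness) and classification targets; the shapes and targets are toy assumptions.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # numerically stable sigmoid + binary cross-entropy

# Toy detector outputs for 8 candidate boxes: one objectness (confidence)
# logit and 2 class logits per box; targets are 0/1 as in YOLO-style training.
obj_logits = torch.randn(8, 1)
cls_logits = torch.randn(8, 2)
obj_target = torch.tensor([[1.], [0.], [0.], [1.], [0.], [0.], [0.], [1.]])
cls_target = torch.zeros(8, 2)
cls_target[[0, 3], 0] = 1.0   # boxes 0 and 3 contain class 0
cls_target[[7], 1] = 1.0      # box 7 contains class 1

# Confidence and classification losses, both binary cross-entropy.
loss_conf = bce(obj_logits, obj_target)
loss_cls = bce(cls_logits, cls_target)  # often masked to positive boxes only
loss = loss_conf + loss_cls
```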

Article
Infrared and Visible Image Fusion via Attention-Based Adaptive Feature Fusion
Entropy 2023, 25(3), 407; https://doi.org/10.3390/e25030407 - 23 Feb 2023
Cited by 1 | Viewed by 866
Abstract
Infrared and visible image fusion methods based on feature decomposition are able to generate good fused images. However, most of them employ manually designed, simple feature fusion strategies in the reconstruction stage, such as addition or concatenation. These strategies do not consider the relative importance of different features and may therefore suffer from low contrast, blurred results, or information loss. To address this problem, we designed an adaptive fusion network that synthesizes decoupled common structural features and distinct modal features under an attention-based adaptive fusion (AAF) strategy. The AAF module adaptively computes the weights assigned to different features according to their relative importance. Moreover, the structural features from different sources are also synthesized under the AAF strategy before reconstruction, to provide more complete structural information. More important features thus automatically receive more attention, and the advantageous information they contain is better reflected in the final fused images. Experiments on several datasets demonstrated a clear improvement in image fusion quality using our method.
(This article belongs to the Special Issue Deep Learning Models and Applications to Computer Vision)
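Attention-based adaptive fusion, as opposed to fixed addition or concatenation, can be sketched generically: score each source, normalize the scores into per-pixel weights, and take the weighted sum. The PyTorch module below is our illustration of that idea, not the paper's AAF module.

```python
import torch
import torch.nn as nn

class AttentionWeightedFusion(nn.Module):
    """Generic attention-based fusion: score each input feature map, convert
    the scores into per-pixel softmax weights, and take the weighted sum, so
    the relative importance of the two sources is learned rather than fixed."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # shared scorer

    def forward(self, feat_ir, feat_vis):
        scores = torch.stack([self.score(feat_ir), self.score(feat_vis)], dim=0)
        weights = torch.softmax(scores, dim=0)  # adaptive, sums to 1 per pixel
        return weights[0] * feat_ir + weights[1] * feat_vis

# Example: fuse two 64-channel feature maps instead of simply adding them.
fuse = AttentionWeightedFusion(64)
fused = fuse(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```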

Article
Lightweight Deep Neural Network Embedded with Stochastic Variational Inference Loss Function for Fast Detection of Human Postures
Entropy 2023, 25(2), 336; https://doi.org/10.3390/e25020336 - 11 Feb 2023
Cited by 2 | Viewed by 773
Abstract
Fusing object detection techniques and stochastic variational inference, we propose a new scheme for lightweight neural network models that simultaneously reduces model size and raises inference speed. The technique was applied to fast human posture identification. An integer-arithmetic-only algorithm and a feature pyramid network were adopted to reduce the computational complexity of training and to capture features of small objects, respectively. Features of sequential human motion frames (i.e., the centroid coordinates of bounding boxes) were extracted by a self-attention mechanism. Using a Bayesian neural network with stochastic variational inference, human postures could be promptly classified by fast resolution of a Gaussian mixture model. The model takes instant centroid features as inputs and indicates possible human postures in probabilistic maps. Our model achieved better overall performance than the baseline ResNet model in mean average precision (32.5 vs. 34.6), inference speed (27 vs. 48 milliseconds), and model size (46.2 vs. 227.8 MB). The model could also give warning of a suspected human fall about 0.66 s in advance.
(This article belongs to the Special Issue Deep Learning Models and Applications to Computer Vision)
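The stochastic variational inference component can be illustrated with a minimal Bayesian layer trained via the reparameterization trick. The sketch below is a generic PyTorch illustration, not the authors' model; the prior, KL scaling, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class VariationalLinear(nn.Module):
    """Minimal Bayesian linear layer for stochastic variational inference:
    weights have a learned Gaussian posterior q(w) = N(mu, sigma^2), and each
    forward pass samples w = mu + sigma * eps (the reparameterization trick)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.w_rho = nn.Parameter(torch.full((out_dim, in_dim), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        sigma = torch.nn.functional.softplus(self.w_rho)  # keep sigma positive
        w = self.w_mu + sigma * torch.randn_like(sigma)   # sample weights
        return x @ w.t() + self.bias

    def kl(self):
        """KL(q(w) || N(0, 1)), the regularizer in the variational loss."""
        sigma = torch.nn.functional.softplus(self.w_rho)
        return (0.5 * (sigma**2 + self.w_mu**2 - 1) - torch.log(sigma)).sum()

# Variational loss = data term + KL term; predictions average several samples.
layer = VariationalLinear(16, 3)
x, y = torch.randn(32, 16), torch.randint(0, 3, (32,))
nll = nn.functional.cross_entropy(layer(x), y)
loss = nll + layer.kl() / 32  # KL scaled by dataset size (assumed here)
probs = torch.stack([layer(x).softmax(-1) for _ in range(10)]).mean(0)
```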
