Computer Vision and Deep Learning: Trends and Applications

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Computer Vision and Pattern Recognition".

Deadline for manuscript submissions: closed (30 June 2023) | Viewed by 43195

Special Issue Editor


Dr. Pier Luigi Mazzeo
Guest Editor
National Research Council of Italy (CNR), ISASI Institute of Applied Sciences & Intelligent Systems, Pozzuoli, Italy
Interests: multimedia signal processing; image processing and understanding; image feature extraction and selection; neural network classifiers; object classification and tracking

Special Issue Information

Dear Colleagues,

The aim of this Special Issue is to discuss the latest innovations in deep learning technologies applied to computer vision and image processing, including from the perspective of software development companies. The Special Issue will focus on:

  • No-Code Deep Learning—a way of programming DL applications without having to go through the long and arduous processes of pre-processing, modeling, designing algorithms, collecting new data, retraining, deployment, and more;
  • TinyDL—IoT-driven deep learning; while large-scale machine learning applications exist, their usability is fairly limited, and smaller-scale, on-device applications are often necessary: a web request that sends data to a large server, waits for a machine learning algorithm to process it, and returns the result can simply take too long;
  • Full-stack Deep Learning—the broad diffusion of deep learning frameworks; the business need to embed deep learning solutions into products has led to a large demand for “full-stack deep learning” engineering;
  • Generative Adversarial Networks (GANs)—a way of producing stronger solutions for tasks such as differentiating between different kinds of images. A generative network produces samples that a discriminative network must check, discarding unwanted generated content (a minimal training-loop sketch follows this list);
  • Unsupervised and self-supervised DL—as automation improves, more and more data science solutions must work without human intervention. We already know from previous techniques that machines cannot learn in a vacuum: they must be able to take in new information and analyze it to produce a solution, yet this typically requires human data scientists to feed that information into the system; unsupervised and self-supervised learning aim to remove precisely that step;
  • Reinforcement Learning—where the machine learning system learns from direct experience with its environment, which can use a reward/punishment system to assign value to the observations that the ML system makes;
  • Few-Shot, One-Shot, and Zero-Shot Learning—few-shot learning focuses on learning from limited data; while this has limitations, it has various applications in fields such as image classification, facial recognition, and text classification. Likewise, one-shot learning uses even less data, down to a single example per class. Zero-shot learning is an initially confusing prospect: how can a machine learning algorithm function without any training examples of a class? Zero-shot systems observe a subject and use auxiliary information about that object to predict which classification it may fall into, much as humans can.
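
As a concrete illustration of the generator/discriminator interplay described in the GAN topic above, here is a minimal training-loop sketch in PyTorch. The network sizes, latent dimension, and optimizer settings are illustrative assumptions rather than a prescribed design; a real application would add a data loader and many refinements.

    import torch
    import torch.nn as nn

    latent_dim, img_dim = 64, 28 * 28
    # Generator maps noise to a flattened image; discriminator scores realism.
    G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                      nn.Linear(256, img_dim), nn.Tanh())
    D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                      nn.Linear(256, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real):  # real: (batch, img_dim) tensor of training images
        batch = real.size(0)
        fake = G(torch.randn(batch, latent_dim))
        # Discriminator learns to separate real samples from generated ones.
        loss_d = bce(D(real), torch.ones(batch, 1)) + \
                 bce(D(fake.detach()), torch.zeros(batch, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # Generator learns to make the discriminator label fakes as real.
        loss_g = bce(D(fake), torch.ones(batch, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()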

Dr. Pier Luigi Mazzeo
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • machine learning
  • reinforcement learning
  • unsupervised and self-supervised learning
  • generative adversarial networks (GANs)
  • no-code machine learning
  • full-stack deep learning
  • few-shot, one-shot, and zero-shot learning
  • tiny machine learning


Published Papers (14 papers)


Research

17 pages, 10707 KiB  
Article
Lithium Metal Battery Quality Control via Transformer–CNN Segmentation
by Jerome Quenum, Iryna V. Zenyuk and Daniela Ushizima
J. Imaging 2023, 9(6), 111; https://doi.org/10.3390/jimaging9060111 - 31 May 2023
Cited by 1 | Viewed by 1527
Abstract
Lithium metal batteries (LMBs) have the potential to be the next-generation battery system because of their high theoretical energy density. However, defects known as dendrites are formed by heterogeneous lithium (Li) plating, which hinders the development and utilization of LMBs. Non-destructive techniques to observe the dendrite morphology often use X-ray computed tomography (XCT) to provide cross-sectional views. To retrieve three-dimensional structures inside a battery, image segmentation becomes essential for quantitatively analyzing XCT images. This work proposes a new semantic segmentation approach using a transformer-based neural network, called TransforCNN, that is capable of segmenting out dendrites from XCT data. In addition, we compare the performance of the proposed TransforCNN with three other algorithms: U-Net, Y-Net, and E-Net, an ensemble network model for XCT analysis. Our results show the advantages of using TransforCNN when evaluating segmentation metrics, such as mean intersection over union (mIoU) and mean Dice similarity coefficient (mDSC), as well as in several qualitative comparative visualizations.
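
For readers less familiar with the metrics quoted above, here is a minimal NumPy sketch of how IoU and Dice can be computed for a pair of binary segmentation masks (illustrative only, not the authors' evaluation code); averaging these scores over classes gives mIoU and mDSC:

    import numpy as np

    def iou_and_dice(pred, gt):
        """IoU and Dice coefficient for two binary masks of equal shape."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        total = pred.sum() + gt.sum()
        iou = inter / union if union else 1.0
        dice = 2 * inter / total if total else 1.0
        return iou, dice

    pred = np.zeros((64, 64)); pred[8:32, 8:32] = 1   # toy prediction mask
    gt = np.zeros((64, 64)); gt[16:40, 16:40] = 1     # toy ground-truth mask
    print(iou_and_dice(pred, gt))                     # ≈ (0.286, 0.444)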

18 pages, 5435 KiB  
Article
Future Prediction of Shuttlecock Trajectory in Badminton Using Player’s Information
by Yuka Nokihara, Ryo Hachiuma, Ryosuke Hori and Hideo Saito
J. Imaging 2023, 9(5), 99; https://doi.org/10.3390/jimaging9050099 - 11 May 2023
Cited by 1 | Viewed by 3252
Abstract
Video analysis has become an essential aspect of net sports, such as badminton. Accurately predicting the future trajectory of balls and shuttlecocks can significantly benefit players by enhancing their performance and enabling them to devise effective game strategies. This paper aims to analyze data to provide players with an advantage in the fast-paced rallies of badminton matches. The paper delves into the innovative task of predicting future shuttlecock trajectories in badminton match videos and presents a method that takes into account both the shuttlecock position and the positions and postures of the players. In the experiments, players were extracted from the match video, their postures were analyzed, and a time-series model was trained. The results indicate that the proposed method improved accuracy by 13% compared to methods that solely used shuttlecock position information as input, and by 8.4% compared to methods that employed both shuttlecock and player position information as input.
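
As a hedged illustration of the kind of time-series model such a task can use, here is a minimal PyTorch sketch that predicts the next shuttlecock position from a window of past 2D positions. The input features, window length, and layer sizes are assumptions, not the paper's architecture (which also encodes player positions and postures):

    import torch
    import torch.nn as nn

    class TrajectoryPredictor(nn.Module):
        """Predicts the next (x, y) shuttlecock position from past positions."""
        def __init__(self, in_dim=2, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 2)

        def forward(self, seq):                 # seq: (batch, time, in_dim)
            out, _ = self.lstm(seq)
            return self.head(out[:, -1])        # (batch, 2): next position

    model = TrajectoryPredictor()
    window = torch.randn(8, 16, 2)              # 8 clips, 16 past frames each
    next_pos = model(window)                    # one predicted point per clip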

16 pages, 4451 KiB  
Article
Novel Light Convolutional Neural Network for COVID Detection with Watershed Based Region Growing Segmentation
by Hassan Ali Khan, Xueqing Gong, Fenglin Bi and Rashid Ali
J. Imaging 2023, 9(2), 42; https://doi.org/10.3390/jimaging9020042 - 13 Feb 2023
Cited by 3 | Viewed by 1803
Abstract
A rapidly spreading epidemic, COVID-19 had a serious effect on millions and took many lives. Therefore, for individuals with COVID-19, early discovery is essential for halting the infection’s progress. To quickly and accurately diagnose COVID-19, imaging modalities, including computed tomography (CT) scans and chest X-ray radiographs, are frequently employed. The potential of artificial intelligence (AI) approaches has further spurred the creation of automated and precise COVID-19 detection systems. Scientists widely use deep learning techniques to identify coronavirus infection in lung imaging. In our paper, we developed a novel light CNN model architecture with watershed-based region-growing segmentation on chest X-rays. Both CT scans and X-ray radiographs were employed along with 5-fold cross-validation. Compared to earlier state-of-the-art models, our model is lighter and outperformed the previous methods, achieving a mean accuracy of 98.8% on X-ray images and 98.6% on CT scans, with positive predictive values (PPV) of 0.99 and 0.97 and negative predictive values (NPV) of 0.98 and 0.99, respectively.
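
A minimal OpenCV sketch of watershed-based region growing, the kind of segmentation step named above; the file name, thresholds, and kernel sizes are illustrative assumptions, not the authors' exact pipeline:

    import cv2
    import numpy as np

    img = cv2.imread("chest_xray.png")           # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Sure foreground from the distance transform; sure background by dilation.
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, cv2.THRESH_BINARY)
    sure_fg = sure_fg.astype(np.uint8)
    sure_bg = cv2.dilate(binary, np.ones((3, 3), np.uint8), iterations=3)
    unknown = cv2.subtract(sure_bg, sure_fg)

    # Label seed regions, then let the watershed grow them over unknown areas.
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1
    markers[unknown == 255] = 0
    markers = cv2.watershed(img, markers)        # region boundaries become -1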

22 pages, 10011 KiB  
Article
CNN-Based Classification for Highly Similar Vehicle Model Using Multi-Task Learning
by Donny Avianto, Agus Harjoko and Afiahayati
J. Imaging 2022, 8(11), 293; https://doi.org/10.3390/jimaging8110293 - 22 Oct 2022
Cited by 5 | Viewed by 2574
Abstract
Vehicle make and model classification is crucial to the operation of an intelligent transportation system (ITS). Fine-grained vehicle information such as make and model can help officers uncover cases of traffic violations when license plate information cannot be obtained. Various techniques have been developed to perform vehicle make and model classification. However, it is very hard to identify the make and model of vehicles with highly similar visual appearances; a classifier has much potential for mistakes because the vehicles look very similar yet come from different models and manufacturers. To solve this problem, a fine-grained classifier based on convolutional neural networks with a multi-task learning approach is proposed in this paper. The proposed method takes a vehicle image as input and extracts features using the VGG-16 architecture. The extracted features are then sent to two different branches, with one branch used to classify the vehicle model and the other the vehicle make. The performance of the proposed method was evaluated using the InaV-Dash dataset, which contains Indonesian vehicle models with highly similar visual appearances. The experimental results show that the proposed method achieves 98.73% accuracy for vehicle make and 97.69% accuracy for vehicle model. Our study also demonstrates that the proposed method is able to improve the performance of the baseline method on highly similar vehicle classification problems.
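
A minimal PyTorch sketch of the two-branch, multi-task idea: a shared VGG-16 feature extractor feeding separate make and model heads. The head layout and class counts are placeholder assumptions, not the paper's exact configuration:

    import torch
    import torch.nn as nn
    from torchvision import models

    class MakeModelNet(nn.Module):
        """Shared VGG-16 backbone with one classification head per task."""
        def __init__(self, n_makes=20, n_models=50):
            super().__init__()
            self.backbone = models.vgg16(weights=None).features
            self.pool = nn.AdaptiveAvgPool2d((7, 7))
            self.make_head = nn.Linear(512 * 7 * 7, n_makes)
            self.model_head = nn.Linear(512 * 7 * 7, n_models)

        def forward(self, x):
            f = torch.flatten(self.pool(self.backbone(x)), 1)
            return self.make_head(f), self.model_head(f)

    net = MakeModelNet()
    make_logits, model_logits = net(torch.randn(4, 3, 224, 224))
    # Multi-task training would sum a cross-entropy loss over each head.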

22 pages, 4793 KiB  
Article
Dual Autoencoder Network with Separable Convolutional Layers for Denoising and Deblurring Images
by Elena Solovyeva and Ali Abdullah
J. Imaging 2022, 8(9), 250; https://doi.org/10.3390/jimaging8090250 - 13 Sep 2022
Cited by 3 | Viewed by 3194
Abstract
A dual autoencoder employing separable convolutional layers for image denoising and deblurring is presented. Two autoencoders are combined to gain higher accuracy and simultaneously reduce the complexity of neural network parameters through separable convolutional layers. In the proposed structure of the dual autoencoder, the first autoencoder aims to denoise the image, while the second one aims to enhance the quality of the denoised image. The research covers Gaussian noise (Gaussian blur), Poisson noise, speckle noise, and random impulse noise. The advantages of the proposed neural network are the reduction in the number of trainable parameters and the increase in the similarity between the denoised or deblurred image and the original one. The similarity is increased by decreasing the mean square error and increasing the structural similarity index. The advantages of a dual autoencoder network with separable convolutional layers are demonstrated by comparing the proposed network with a convolutional autoencoder and a dual convolutional autoencoder.
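
To show why separable convolutions reduce parameters, here is a minimal PyTorch sketch of a depthwise-plus-pointwise block and a small denoising autoencoder built from it; the layer sizes are illustrative assumptions, not the paper's network:

    import torch.nn as nn

    def separable_conv(in_ch, out_ch):
        """Depthwise 3x3 then pointwise 1x1: about in_ch*9 + in_ch*out_ch
        weights, versus in_ch*out_ch*9 for a standard 3x3 convolution."""
        return nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),  # depthwise
            nn.Conv2d(in_ch, out_ch, 1),                          # pointwise
        )

    class SmallDenoiser(nn.Module):
        def __init__(self):
            super().__init__()
            self.encode = nn.Sequential(separable_conv(1, 32), nn.ReLU(),
                                        nn.MaxPool2d(2))
            self.decode = nn.Sequential(nn.Upsample(scale_factor=2),
                                        separable_conv(32, 1), nn.Sigmoid())

        def forward(self, x):
            return self.decode(self.encode(x))

    # In a dual arrangement, a second autoencoder would refine this output.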

24 pages, 2540 KiB  
Article
A Novel Trademark Image Retrieval System Based on Multi-Feature Extraction and Deep Networks
by Sandra Jardim, João António, Carlos Mora and Artur Almeida
J. Imaging 2022, 8(9), 238; https://doi.org/10.3390/jimaging8090238 - 02 Sep 2022
Cited by 7 | Viewed by 2528
Abstract
Graphical Search Engines are conceptually used in many development areas surrounding information retrieval systems that aim to provide a visual representation of results, typically associated with retrieving images relevant to one or more input images. Since the 1990s, efforts have been made to improve the result quality, be it through improved processing speeds or more efficient graphical processing techniques that generate accurate representations of images for comparison. While many systems achieve timely results by combining high-level features, they still struggle when dealing with large datasets and abstract images. Image datasets regarding industrial property are an example of a hurdle for typical image retrieval systems, where the dimensions and characteristics of images make adequate comparison a difficult task. In this paper, we introduce an image retrieval system based on a multi-phase implementation of different deep learning and image processing techniques, designed to deliver highly accurate results regardless of dataset complexity and size. The proposed approach uses image signatures to provide a near-exact representation of an image, with abstraction levels that allow comparison with other signatures as a means to achieve a fully capable image comparison process. To overcome performance disadvantages related to multiple image searches due to the high complexity of image signatures, the proposed system incorporates a parallel processing block responsible for dealing with multi-image search scenarios. The system achieves image retrieval through a new compound similarity formula that accounts for all components of an image signature. The results show that the developed approach performs image retrieval with high accuracy, showing that combining multiple image assets allows for more accurate comparisons across a broad spectrum of image typologies. The use of deep convolutional networks for feature extraction, as a means of semantically describing more commonly encountered objects, allows the system to perform searches with a degree of abstraction.

21 pages, 4675 KiB  
Article
High-Temporal-Resolution Object Detection and Tracking Using Images and Events
by Zaid El Shair and Samir A. Rawashdeh
J. Imaging 2022, 8(8), 210; https://doi.org/10.3390/jimaging8080210 - 27 Jul 2022
Cited by 5 | Viewed by 3867
Abstract
Event-based vision is an emerging field of computer vision that offers unique properties, such as asynchronous visual output, high temporal resolutions, and dependence on brightness changes, to generate data. These properties can enable robust high-temporal-resolution object detection and tracking when combined with frame-based vision. In this paper, we present a hybrid, high-temporal-resolution object detection and tracking approach that combines learned and classical methods using synchronized images and event data. Off-the-shelf frame-based object detectors are used for initial object detection and classification. Then, event masks, generated per detection, are used to enable inter-frame tracking at varying temporal resolutions using the event data. Detections are associated across time using a simple, low-cost association metric. Moreover, we collect and label a traffic dataset using the hybrid sensor DAVIS 240c. This dataset is utilized for quantitative evaluation using state-of-the-art detection and tracking metrics. We provide ground truth bounding boxes and object IDs for each vehicle annotation. Further, we generate high-temporal-resolution ground truth data to analyze tracking performance at different temporal rates. Our approach shows promising results, with minimal performance deterioration at higher temporal resolutions (48–384 Hz) when compared with the baseline frame-based performance at 24 Hz.

12 pages, 6496 KiB  
Article
Indoor Scene Recognition via Object Detection and TF-IDF
by Edvard Heikel and Leonardo Espinosa-Leal
J. Imaging 2022, 8(8), 209; https://doi.org/10.3390/jimaging8080209 - 26 Jul 2022
Cited by 7 | Viewed by 3515
Abstract
Indoor scene recognition and semantic information can be helpful for social robots. Recently, in the field of indoor scene recognition, researchers have incorporated object-level information and shown improved performances. This paper demonstrates that scene recognition can be performed solely using object-level information in line with these advances. A state-of-the-art object detection model was trained to detect objects typically found in indoor environments and then used to detect objects in scene data. These predicted objects were then used as features to predict room categories. This paper successfully combines approaches conventionally used in computer vision and natural language processing (YOLO and TF-IDF, respectively). These approaches could be further helpful in the field of embodied research and dynamic scene classification, which we elaborate on.
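
A minimal sketch of the detected-objects-as-words idea using scikit-learn: each image's detected object labels form a "document", and rooms are classified from TF-IDF vectors. The object lists and room labels below are made-up examples standing in for YOLO detections:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Each "document" is the list of object labels detected in one image.
    scene_objects = ["bed lamp pillow wardrobe",
                     "oven sink refrigerator cup",
                     "sofa tv lamp cushion"]
    rooms = ["bedroom", "kitchen", "living_room"]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(scene_objects)          # TF-IDF features
    clf = LogisticRegression(max_iter=1000).fit(X, rooms)
    print(clf.predict(vectorizer.transform(["sink cup oven"])))  # kitchen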

17 pages, 3670 KiB  
Article
Deep Learning-Based Automatic Detection of Ships: An Experimental Study Using Satellite Images
by Krishna Patel, Chintan Bhatt and Pier Luigi Mazzeo
J. Imaging 2022, 8(7), 182; https://doi.org/10.3390/jimaging8070182 - 28 Jun 2022
Cited by 29 | Viewed by 5323
Abstract
The remote sensing surveillance of maritime areas represents an essential task for both security and environmental reasons. Recently, learning strategies belonging to the field of machine learning (ML) have become a niche of interest for the community of remote sensing. Specifically, a major challenge is the automatic classification of ships from satellite imagery, which is needed for traffic surveillance systems, the protection of fisheries from illegal activity, control systems of oil discharge, and the monitoring of sea pollution. Deep learning (DL) is a branch of ML that has emerged in the last few years as a result of advancements in digital technology and data availability. DL has shown capacity and efficacy in tackling difficult learning tasks that were previously intractable. Specifically, DL methods, such as convolutional neural networks (CNNs), have been reported to be efficient in image detection and recognition applications. In this paper, we focused on the development of an automatic ship detection (ASD) approach using DL methods to assess the Airbus ship dataset (composed of about 40 K satellite images). The paper explores and analyzes the distinct variations of the YOLO algorithm for the detection of ships from satellite images. A comparison of different versions of the YOLO algorithm for ship detection, namely YOLOv3, YOLOv4, and YOLOv5, is presented, after training them on a personal computer with a large dataset of satellite images from the Airbus Ship Challenge and Shipsnet; the differences between the algorithms could be observed even on this hardware. We have confirmed that these algorithms can be used for effective ship detection from satellite images. The conclusion drawn from the conducted research is that the YOLOv5 object detection algorithm outperforms the other versions in terms of accuracy, achieving 99% compared to 98% for YOLOv4 and 97% for YOLOv3.
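
As a hedged illustration, loading the public YOLOv5 release through torch.hub for inference looks roughly like the following; "ship.jpg" is a hypothetical satellite crop, and this is not the authors' training setup:

    import torch

    # Downloads the small pretrained YOLOv5 model from the public repository.
    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
    results = model("ship.jpg")      # accepts file paths, URLs, or arrays
    results.print()                  # summary of detections per class
    boxes = results.xyxy[0]          # tensor rows: x1, y1, x2, y2, conf, class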

11 pages, 2393 KiB  
Article
A Deep-Learning Model for Real-Time Red Palm Weevil Detection and Localization
by Majed Alsanea, Shabana Habib, Noreen Fayyaz Khan, Mohammed F. Alsharekh, Muhammad Islam and Sheroz Khan
J. Imaging 2022, 8(6), 170; https://doi.org/10.3390/jimaging8060170 - 15 Jun 2022
Cited by 11 | Viewed by 2867
Abstract
Background and motivation: Over the last two decades, particularly in the Middle East, Red Palm Weevils (RPW, Rhynchophorus ferrugineus) have proved to be the most destructive pest of palm trees across the globe. Problem: The RPW has caused considerable damage to various palm species. The early identification of the RPW is a challenging task for good date production, since identification will prevent palm trees from being affected by the RPW; this is one of the reasons why the use of advanced technology will help in the prevention of the spread of the RPW on palm trees. Many researchers have worked on finding an accurate technique for the identification, localization and classification of the RPW pest. This study aimed to develop a model that uses a deep-learning approach to identify and discriminate between the RPW and other insects living in palm tree habitats. Deep learning had not previously been applied to the classification of red palm weevils. Methods: In this study, a region-based convolutional neural network (R-CNN) algorithm was used to detect the location of the RPW in an image by drawing bounding boxes around it. A CNN algorithm was applied to extract the features enclosed by the bounding boxes—the selection target. In addition, these features were passed through classification and regression layers to determine the presence of the RPW with a high degree of accuracy and to locate its coordinates. Results: With the developed model, the RPW can be quickly detected, with a high accuracy of 100%, in infested palm trees at an early stage. In the Al-Qassim region, which has thousands of farms, the model sets the path for deploying an efficient, low-cost RPW detection and classification technology for palm trees.
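
For illustration, here is a minimal torchvision sketch of a related region-based detector (Faster R-CNN) producing bounding boxes and confidence scores; the image path and score threshold are assumptions, and this is not the authors' trained RPW model:

    import torch
    from torchvision import transforms
    from torchvision.models.detection import fasterrcnn_resnet50_fpn
    from PIL import Image

    model = fasterrcnn_resnet50_fpn(pretrained=True).eval()
    img = transforms.ToTensor()(Image.open("palm_trap_photo.jpg"))
    with torch.no_grad():
        out = model([img])[0]        # dict of 'boxes', 'labels', 'scores'
    keep = out["scores"] > 0.8       # retain confident detections only
    print(out["boxes"][keep])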

13 pages, 7980 KiB  
Article
A Generic Framework for Depth Reconstruction Enhancement
by Hendrik Sommerhoff and Andreas Kolb
J. Imaging 2022, 8(5), 138; https://doi.org/10.3390/jimaging8050138 - 16 May 2022
Viewed by 1998
Abstract
We propose a generic depth-refinement scheme based on GeoNet, a recent deep-learning approach for predicting depth and normals from a single color image, and extend it to be applied to any depth reconstruction task such as super resolution, denoising and deblurring, as long as the task includes a depth output. Our approach utilizes a tight coupling of the inherent geometric relationship between depth and normal maps to guide a neural network. In contrast to GeoNet, we do not utilize the original input information to the backbone reconstruction task, which leads to a generic application of our network structure. Our approach first learns a high-quality normal map from the depth image generated by the backbone method and then uses this normal map to refine the initial depth image jointly with the learned normal map. This is motivated by the fact that it is hard for neural networks to learn direct mapping between depth and normal maps without explicit geometric constraints. We show the efficiency of our method on the exemplary inverse depth-image reconstruction tasks of denoising, super resolution and removal of motion blur.
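
A minimal NumPy sketch of the depth-to-normal relationship the method builds on: estimating a per-pixel normal map from depth gradients. For simplicity this ignores camera intrinsics, which a full implementation would account for:

    import numpy as np

    def normals_from_depth(depth):
        """Approximate unit surface normals from a depth map of shape (H, W)."""
        dz_dy, dz_dx = np.gradient(depth)           # rows (y), columns (x)
        # The normal is proportional to (-dz/dx, -dz/dy, 1), then normalized.
        n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
        return n / np.linalg.norm(n, axis=2, keepdims=True)

    normals = normals_from_depth(np.random.rand(120, 160))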

15 pages, 15932 KiB  
Article
Extraction and Calculation of Roadway Area from Satellite Images Using Improved Deep Learning Model and Post-Processing
by Varun Yerram, Hiroyuki Takeshita, Yuji Iwahori, Yoshitsugu Hayashi, M. K. Bhuyan, Shinji Fukui, Boonserm Kijsirikul and Aili Wang
J. Imaging 2022, 8(5), 124; https://doi.org/10.3390/jimaging8050124 - 25 Apr 2022
Cited by 6 | Viewed by 3367
Abstract
Roadway area calculation is a novel problem in remote sensing and urban planning. This paper models it as a two-step problem: roadway extraction and area calculation. Roadway extraction from satellite images is a problem that has been tackled many times before. This paper proposes a method that uses the pixel resolution to calculate the area of the roads covered in satellite images. The proposed approach uses recent variants of the U-Net and ResNet architectures, namely U-Net++ and ResNeXt. This state-of-the-art model is combined with the proposed efficient post-processing approach to improve the overlap with ground truth labels. The performance of the proposed road extraction algorithm is evaluated on the Massachusetts dataset, and it is shown that the proposed approach outperforms existing solutions that use models from the U-Net family.
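
The area-calculation step reduces to counting road pixels and scaling by the ground resolution of the imagery; a minimal NumPy sketch follows, with the metres-per-pixel value an assumed figure rather than the paper's:

    import numpy as np

    def roadway_area_m2(mask, metres_per_pixel):
        """mask: binary array where 1 marks a road pixel."""
        return int(mask.sum()) * metres_per_pixel ** 2

    mask = np.zeros((500, 500), dtype=np.uint8)
    mask[200:210, :] = 1                      # a 10-pixel-wide road strip
    print(roadway_area_m2(mask, 0.5))         # 5000 px * 0.25 m^2 = 1250.0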

18 pages, 3502 KiB  
Article
Discriminative Shape Feature Pooling in Deep Neural Networks
by Gang Hu, Chahna Dixit and Guanqiu Qi
J. Imaging 2022, 8(5), 118; https://doi.org/10.3390/jimaging8050118 - 20 Apr 2022
Viewed by 2111
Abstract
Although deep learning approaches are able to generate generic image features from massive labeled data, discriminative handcrafted features still have advantages in providing explicit domain knowledge and reflecting intuitive visual understanding. Much of the existing research focuses on integrating both handcrafted features and deep networks to leverage the benefits. However, the issues of parameter quality have not been effectively solved in existing applications of handcrafted features in deep networks. In this research, we propose a method that enriches deep network features by utilizing injected discriminative shape features (generic edge tokens and curve partitioning points) to adjust the network’s internal parameter update process. Thus, the modified neural networks are trained under the guidance of specific domain knowledge, and they are able to generate image representations that incorporate the benefits of both handcrafted and deep learned features. Comparative experiments were performed on several benchmark datasets. The experimental results confirmed that our method works well on both large and small training datasets. Additionally, compared with existing models using either handcrafted features or deep network representations, our method not only improves the corresponding performance, but also reduces the computational costs.

16 pages, 1428 KiB  
Article
Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations
by Martino Trapanotto, Loris Nanni, Sheryl Brahnam and Xiang Guo
J. Imaging 2022, 8(4), 96; https://doi.org/10.3390/jimaging8040096 - 01 Apr 2022
Cited by 6 | Viewed by 3179
Abstract
The classification of vocal individuality for passive acoustic monitoring (PAM) and census of animals is becoming an increasingly popular area of research. Nearly all studies in this field of inquiry have relied on classic audio representations and classifiers, such as Support Vector Machines (SVMs) trained on spectrograms or Mel-Frequency Cepstral Coefficients (MFCCs). In contrast, most current bioacoustic species classification exploits the power of deep learners and more cutting-edge audio representations. A significant reason for avoiding deep learning in vocal identity classification is the tiny sample size in the collections of labeled individual vocalizations. As is well known, deep learners require large datasets to avoid overfitting. One way to handle small datasets with deep learning methods is to use transfer learning. In this work, we evaluate the performance of three pretrained CNNs (VGG16, ResNet50, and AlexNet) on a small, publicly available lion roar dataset containing approximately 150 samples taken from five male lions. Each of these networks is retrained on eight representations of the samples: MFCCs, spectrogram, and Mel spectrogram, along with several new ones, such as VGGish and Stockwell, and those based on the recently proposed LM spectrogram. The performance of these networks, both individually and in ensembles, is analyzed and corroborated using the Equal Error Rate and shown to surpass previous classification attempts on this dataset; the best single network achieved over 95% accuracy and the best ensembles over 98% accuracy. The contributions this study makes to the field of individual vocal classification include demonstrating that it is valuable and possible, with caution, to use transfer learning with single pretrained CNNs on the small datasets available for this problem domain. We also make a contribution to bioacoustics generally by offering a comparison of the performance of many state-of-the-art audio representations, including for the first time the LM spectrogram and Stockwell representations. All source code for this study is available on GitHub.
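
A minimal transfer-learning sketch in the spirit described above: freeze a pretrained CNN and retrain only a new head for five individual lions. The backbone choice (ResNet-50 here) and input handling are illustrative assumptions, not the paper's exact configuration:

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    for p in model.parameters():              # freeze the pretrained backbone
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, 5)   # new head: 5 lions

    # Spectrograms replicated to 3 channels so they fit the image backbone.
    spectrograms = torch.randn(8, 3, 224, 224)
    loss = nn.CrossEntropyLoss()(model(spectrograms),
                                 torch.randint(0, 5, (8,)))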
