Neural Networks and Deep Learning in Image Sensing

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (31 December 2021) | Viewed by 43636

Special Issue Editors


Prof. Dr. Sukho Lee
Guest Editor
Division of Computer Engineering, Dongseo University, 47 Jurye Road, Sasang Gu, Busan 47011, Republic of Korea
Interests: image deconvolution/restoration; color image compression; computer vision; deep learning

Prof. Dr. Dae-Ki Kang
Guest Editor
Machine Learning/Deep Learning Research Labs, Department of Computer Engineering, Dongseo University, Busan 47011, Korea
Interests: multiagent reinforcement learning; few shot learning/model-agnostic meta-learning; adversarial machine learning; generative adversarial network

Special Issue Information

Dear Colleagues,

Deep-learning-based image sensing is used in a variety of applications today. Major smartphone makers, for example, are adding new deep-learning-based technologies to smartphone camera applications, such as face recognition, panoramic photography, depth/geometry detection, high-quality magnification, and object detection.

More innovative and appealing functions based on deep learning are expected to be included in future imaging systems. Alongside this trend of adding functions to the imaging system, another important trend for smartphone cameras is the increase in resolution. The increased resolution of smartphone cameras reduces the size of the pixel sensors, which in turn reduces the amount of light sensed at each pixel. This reduces not only the dynamic range of the sensed image but also the signal-to-noise ratio (SNR), making it difficult to take clear pictures at night. Therefore, the importance of developing high-performance image signal processing (ISP) techniques is increasing.

Many deep-learning-based ISP technologies have recently been developed and successfully applied to image post-processing tasks such as conversion of mobile photos to DSLR-quality photos, automatic night shots, demosaicing, denoising, dehazing, deblurring, super-resolution, high-dynamic-range imaging, digital image stabilization, etc. Furthermore, deep-learning-based ISP technologies have also been successfully applied to images captured by multispectral filter arrays (MSFAs) to enhance resolution and sensitivity by integrating additional information received across wide spectral bands. Such ISP technologies can be employed in various fields, such as military, surveillance, remote sensing, and scientific imaging.

The goal of this Special Issue is to highlight and invite state-of-the-art research papers related to deep-learning-based image processing and computer vision techniques in image sensing. Topics include but are not limited to:

Deep-learning-based image signal processing techniques:
  - Deep-learning-based demosaicing;
  - Deep-learning-based super-resolution;
  - Deep-learning-based deblurring/denoising;
  - Deep-learning-based dehazing, inpainting, compression;
  - Other advanced image signal processing (ISP) techniques based on deep learning;

Deep-learning-based computational photography:
  - Deep-learning-based panoramic photography;
  - Deep-learning-based generative models for image sensing applications;
  - Deep-learning-based image/video alignment;
  - Deep-learning-based image/video artifact corrections, image/video stabilization;
  - Deep-learning-based rendering;
  - Deep-learning-based image reconstruction and image/data fusion from data acquired by multispectral sensors;

Deep-learning-based computer vision algorithms:
  - Deep-learning-based depth estimation;
  - Deep-learning-based object detection, object tracking, object localization;
  - Deep-learning-based scene understanding, three-dimensional analysis;
  - Deep-learning-based segmentation, shape detection;
  - Few-shot learning.

Prof. Dr. Sukho Lee
Prof. Dr. Dae-Ki Kang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (12 papers)

Research

13 pages, 2698 KiB  
Article
Noise-Resistant Demosaicing with Deep Image Prior Network and Random RGBW Color Filter Array
by Edwin Kurniawan, Yunjin Park and Sukho Lee
Sensors 2022, 22(5), 1767; https://doi.org/10.3390/s22051767 - 24 Feb 2022
Cited by 3 | Viewed by 2187
Abstract
In this paper, we propose a deep-image-prior-based demosaicing method for a random RGBW color filter array (CFA). The color reconstruction from the random RGBW CFA is performed by the deep image prior network, which uses only the RGBW CFA image as the training data. To our knowledge, this work is a first attempt to reconstruct the color image with a neural network using only a single RGBW CFA in the training. Due to the white pixels in the RGBW CFA, more light is transmitted through the CFA than in the case of the conventional RGB CFA. As the image sensor can detect more light, the signal-to-noise ratio (SNR) increases, and the proposed demosaicing method can reconstruct the color image with higher visual quality than other existing demosaicing methods, especially in the presence of noise. We propose a loss function that can train the deep image prior (DIP) network to reconstruct the colors from the white pixels as well as from the red, green, and blue pixels in the RGBW CFA. Apart from the DIP network, no additional complex reconstruction algorithms are required for the demosaicing. The proposed demosaicing method is particularly useful in situations where noise is a major problem, for example, in low-light conditions. Experimental results show the validity of the proposed method for joint demosaicing and denoising. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
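To illustrate the kind of objective the abstract describes, the sketch below shows a deep-image-prior-style reconstruction loss masked by the RGBW CFA sampling positions. The white-pixel model (W taken as the mean of R, G, and B) and the tensor layout are illustrative assumptions, not the authors' exact formulation.

```python
import torch.nn.functional as F

def rgbw_dip_loss(pred_rgb, cfa_obs, masks):
    """pred_rgb: (1, 3, H, W) output of the deep-image-prior network;
    cfa_obs:  (1, 1, H, W) observed mosaicked RGBW image;
    masks:    dict of (1, 1, H, W) binary masks marking R, G, B, and W sample positions."""
    r, g, b = pred_rgb[:, 0:1], pred_rgb[:, 1:2], pred_rgb[:, 2:3]
    w = (r + g + b) / 3.0  # assumed response of a white (panchromatic) pixel
    loss = 0.0
    for plane, key in zip((r, g, b, w), ("R", "G", "B", "W")):
        m = masks[key]
        # Penalize errors only at positions where that channel was actually sampled.
        loss = loss + F.mse_loss(plane * m, cfa_obs * m)
    return loss
```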

22 pages, 2845 KiB  
Article
Localization and Edge-Based Segmentation of Lumbar Spine Vertebrae to Identify the Deformities Using Deep Learning Models
by Malaika Mushtaq, Muhammad Usman Akram, Norah Saleh Alghamdi, Joddat Fatima and Rao Farhat Masood
Sensors 2022, 22(4), 1547; https://doi.org/10.3390/s22041547 - 17 Feb 2022
Cited by 25 | Viewed by 5679
Abstract
The lumbar spine plays a very important role in load transfer and mobility. Vertebrae localization and segmentation are useful in detecting spinal deformities and fractures. Automated understanding of medical imagery is important to help doctors handle the time-consuming manual or semi-manual diagnosis. Our paper presents methods that help clinicians grade the severity of the disease with confidence, as current manual diagnoses by different doctors show dissimilarity and variations in the analysis of the disease. In this paper, we discuss lumbar spine localization and segmentation, which help in the analysis of lumbar spine deformities. The lumbar spine is localized using YOLOv5, the fifth variant of the YOLO family and the fastest and lightest object detector. A mean average precision (mAP) of 0.975 is achieved by YOLOv5. To diagnose lumbar lordosis, we correlated the angles with the region area computed from the YOLOv5 centroids and obtained 74.5% accuracy. Cropped images from the YOLOv5 bounding boxes are passed through HED U-Net, a combination of segmentation and edge detection frameworks, to obtain the segmented vertebrae and their edges. Lumbar lordotic angles (LLAs) and lumbosacral angles (LSAs) are found after detecting the corners of the vertebrae using a Harris corner detector, with very small mean errors of 0.29° and 0.38°, respectively. This paper compares the different object detectors used to localize the vertebrae, the results of the two methods used to diagnose the lumbar deformity, and the results with other researchers. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
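The lordotic and lumbosacral angles mentioned above are ultimately geometric quantities derived from detected vertebra corners. A minimal, hypothetical computation of the angle between two endplate lines (each given by two corner points) could look like the sketch below; it is not the paper's exact procedure.

```python
import numpy as np

def endplate_angle_deg(p1, p2, q1, q2):
    """Angle in degrees between two lines, each defined by two (x, y) corner points,
    e.g., endplates of two vertebrae located with a Harris corner detector."""
    v1 = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    v2 = np.asarray(q2, dtype=float) - np.asarray(q1, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: two nearly parallel endplates differ by about 11 degrees.
print(endplate_angle_deg((0, 0), (10, 0), (0, 0), (10, 2)))
```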

23 pages, 33248 KiB  
Article
DTS-Net: Depth-to-Space Networks for Fast and Accurate Semantic Object Segmentation
by Hatem Ibrahem, Ahmed Salem and Hyun-Soo Kang
Sensors 2022, 22(1), 337; https://doi.org/10.3390/s22010337 - 03 Jan 2022
Cited by 6 | Viewed by 2580
Abstract
We propose Depth-to-Space Net (DTS-Net), an effective technique for semantic segmentation using the efficient sub-pixel convolutional neural network. This technique is inspired by depth-to-space (DTS) image reconstruction, which was originally used for image and video super-resolution tasks, combined with a mask enhancement filtration technique based on multi-label classification, namely, Nearest Label Filtration. In the proposed technique, we employ depth-wise separable convolution-based architectures. We propose both a deep network, that is, DTS-Net, and a lightweight network, DTS-Net-Lite, for real-time semantic segmentation; these networks employ Xception and MobileNetV2 architectures as the feature extractors, respectively. In addition, we explore the joint semantic segmentation and depth estimation task and demonstrate that the proposed technique can efficiently perform both tasks simultaneously, outperforming state-of-the-art (SOTA) methods. We train and evaluate the performance of the proposed method on the PASCAL VOC2012, NYUV2, and CITYSCAPES benchmarks. Hence, we obtain high mean intersection over union (mIOU) and mean pixel accuracy (Pix.acc.) values using simple and lightweight convolutional neural network architectures of the developed networks. Notably, the proposed method outperforms SOTA methods that depend on encoder–decoder architectures, although our implementation and computations are far simpler. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
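The depth-to-space idea is the standard sub-pixel (pixel shuffle) rearrangement: predict num_classes × r² channels at low resolution and reshuffle them into full-resolution logits. The head below is a minimal sketch of that operation only, not the full DTS-Net architecture; the 1×1 projection and channel counts are assumptions.

```python
import torch.nn as nn

class DepthToSpaceHead(nn.Module):
    """Sub-pixel segmentation head: low-resolution features -> full-resolution class logits."""
    def __init__(self, in_channels, num_classes, upscale):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, num_classes * upscale * upscale, kernel_size=1)
        self.shuffle = nn.PixelShuffle(upscale)  # depth-to-space rearrangement

    def forward(self, feats):                    # feats: (N, in_channels, H/upscale, W/upscale)
        return self.shuffle(self.proj(feats))    # (N, num_classes, H, W)
```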

13 pages, 5593 KiB  
Article
Wildfire Smoke Classification Based on Synthetic Images and Pixel- and Feature-Level Domain Adaptation
by Jun Mao, Change Zheng, Jiyan Yin, Ye Tian and Wenbin Cui
Sensors 2021, 21(23), 7785; https://doi.org/10.3390/s21237785 - 23 Nov 2021
Cited by 9 | Viewed by 2253
Abstract
Training a deep learning-based classification model for early wildfire smoke images requires a large amount of rich data. However, due to the episodic nature of fire events, it is difficult to obtain wildfire smoke image data, and most of the samples in public datasets suffer from a lack of diversity. To address these issues, a method using synthetic images to train a deep learning classification model for real wildfire smoke was proposed in this paper. Firstly, we constructed a synthetic dataset by simulating a large amount of morphologically rich smoke in 3D modeling software and rendering the virtual smoke against many virtual wildland background images with rich environmental diversity. Secondly, to better use the synthetic data to train a wildfire smoke image classifier, we applied both pixel-level domain adaptation and feature-level domain adaptation. The CycleGAN-based pixel-level domain adaptation method for image translation was employed. On top of this, a feature-level domain adaptation method incorporating ADDA with DeepCORAL was adopted to further reduce the domain shift between the synthetic and real data. The proposed method was evaluated and compared on a test set of real wildfire smoke and achieved an accuracy of 97.39%. The method is applicable to wildfire smoke classification tasks based on RGB single-frame images and would also contribute to training image classification models without sufficient data. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
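DeepCORAL aligns the second-order statistics (feature covariances) of the source and target batches. The function below is a generic CORAL loss sketch of that feature-level alignment term; the paper's full pipeline also uses CycleGAN image translation and ADDA, which are not shown here.

```python
import torch

def coral_loss(source_feats, target_feats):
    """CORAL loss between two (batch, dim) feature matrices: squared Frobenius
    distance between their covariance matrices, scaled by 1 / (4 d^2)."""
    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)
        return (x.t() @ x) / (x.size(0) - 1)
    d = source_feats.size(1)
    diff = covariance(source_feats) - covariance(target_feats)
    return (diff ** 2).sum() / (4.0 * d * d)
```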

13 pages, 7653 KiB  
Article
Synthetic Source Universal Domain Adaptation through Contrastive Learning
by Jungchan Cho
Sensors 2021, 21(22), 7539; https://doi.org/10.3390/s21227539 - 12 Nov 2021
Cited by 2 | Viewed by 2212
Abstract
Universal domain adaptation (UDA) is a crucial research topic for efficient deep learning model training using data from various imaging sensors. However, its development is affected by unlabeled target data. Moreover, the nonexistence of prior knowledge of the source and target domain makes it more challenging for UDA to train models. I hypothesize that the degradation of trained models in the target domain is caused by the lack of direct training loss to improve the discriminative power of the target domain data. As a result, the target data adapted to the source representations is biased toward the source domain. I found that the degradation was more pronounced when I used synthetic data for the source domain and real data for the target domain. In this paper, I propose a UDA method with target domain contrastive learning. The proposed method enables models to leverage synthetic data for the source domain and train the discriminativeness of target features in an unsupervised manner. In addition, the target domain feature extraction network is shared with the source domain classification task, preventing unnecessary computational growth. Extensive experimental results on VisDa-2017 and MNIST to SVHN demonstrated that the proposed method significantly outperforms the baseline by 2.7% and 5.1%, respectively. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
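One common way to train target-feature discriminativeness without labels is an InfoNCE-style contrastive loss between two augmented views of the same unlabeled target batch. The sketch below is a generic version of that idea, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """z1, z2: (N, dim) embeddings of two augmented views of the same N target images.
    Matching rows are positives; all other rows in the batch act as negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # (N, N) cosine-similarity logits
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```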

24 pages, 9481 KiB  
Article
Multivariate Analysis of Concrete Image Using Thermography and Edge Detection
by Bubryur Kim, Se-Woon Choi, Gang Hu, Dong-Eun Lee and Ronnie O. Serfa Juan
Sensors 2021, 21(21), 7396; https://doi.org/10.3390/s21217396 - 07 Nov 2021
Cited by 7 | Viewed by 2701
Abstract
With the growing demand for structural health monitoring system applications, data imaging is an ideal method for performing regular routine maintenance inspections. Image analysis can provide invaluable information about the health conditions of a structure’s existing infrastructure by recording and analyzing exterior damages. Therefore, it is desirable to have an automated approach that reports defects on images reliably and robustly. This paper presents a multivariate analysis approach for images, specifically for assessing substantial damage such as cracks. The image analysis provides graph representations related to the image, such as the histogram. Image-processing techniques such as grayscale conversion are also implemented, which enhance the information about the object present in the image. In addition, this study uses image segmentation to transform the image for easier analysis and a neural network as a classifier. Initially, each concrete structure image is preprocessed to highlight the crack. A neural network is used to calculate and categorize the visual characteristics of each region, achieving a classification accuracy of 98%. Experimental results show that thermal image extraction yields better histogram and cumulative distribution function features. The system can promote the development of various thermal image applications, such as nonphysical visual recognition and fault detection analysis. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
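The preprocessing steps named in the abstract (grayscale conversion, histogram and cumulative distribution features, edge maps) can be reproduced with standard OpenCV calls. The snippet below is an assumed, generic pipeline shown only for illustration; the file name and the choice of Canny as the edge detector are hypothetical and not taken from the paper.

```python
import cv2
import numpy as np

image = cv2.imread("concrete_thermal.png")                # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)            # grayscale conversion
hist = cv2.calcHist([gray], [0], None, [256], [0, 256])   # intensity histogram
cdf = np.cumsum(hist) / hist.sum()                        # cumulative distribution function feature
edges = cv2.Canny(gray, 50, 150)                          # generic edge map highlighting cracks
```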

24 pages, 7480 KiB  
Article
Depth Completion and Super-Resolution with Arbitrary Scale Factors for Indoor Scenes
by Anh Minh Truong, Wilfried Philips and Peter Veelaert
Sensors 2021, 21(14), 4892; https://doi.org/10.3390/s21144892 - 18 Jul 2021
Cited by 1 | Viewed by 2916
Abstract
Depth sensing has improved rapidly in recent years, which allows for structural information to be utilized in various applications, such as virtual reality, scene and object recognition, view synthesis, and 3D reconstruction. Due to the limitations of the current generation of depth sensors, the resolution of depth maps is often still much lower than the resolution of color images. This hinders applications, such as view synthesis or 3D reconstruction, from providing high-quality results. Therefore, super-resolution, which allows for the upscaling of depth maps while still retaining sharpness, has recently drawn much attention in the deep learning community. However, state-of-the-art deep learning methods are typically designed and trained to handle a fixed set of integer-scale factors. Moreover, the raw depth map collected by the depth sensor usually has many missing or misestimated depth values along the edges and corners of observed objects. In this work, we propose a novel deep learning network for both depth completion and depth super-resolution with arbitrary scale factors. The experimental results on the Middlebury stereo, NYUv2, and Matterport3D datasets demonstrate that the proposed method can outperform state-of-the-art methods. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
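For context, arbitrary (non-integer) scale factors and missing depth values can already be handled by a simple non-learned baseline: mask-normalized bilinear resampling, as sketched below. The paper's contribution is a learned network that goes well beyond this; the baseline only illustrates the problem setup.

```python
import torch.nn.functional as F

def upscale_sparse_depth(depth, valid_mask, scale):
    """depth, valid_mask: (1, 1, H, W) float tensors; scale may be non-integer, e.g., 1.7.
    Missing pixels (valid_mask == 0) are excluded from the interpolation weights."""
    num = F.interpolate(depth * valid_mask, scale_factor=scale,
                        mode="bilinear", align_corners=False)
    den = F.interpolate(valid_mask, scale_factor=scale,
                        mode="bilinear", align_corners=False)
    return num / den.clamp(min=1e-6)  # normalize so missing pixels do not bias the result
```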

13 pages, 25446 KiB  
Article
Fully Learnable Model for Task-Driven Image Compressed Sensing
by Bowen Zheng, Jianping Zhang, Guiling Sun and Xiangnan Ren
Sensors 2021, 21(14), 4662; https://doi.org/10.3390/s21144662 - 07 Jul 2021
Cited by 2 | Viewed by 1871
Abstract
This study primarily investigates image sensing at low sampling rates with convolutional neural networks (CNNs) for specific applications. To improve the image acquisition efficiency in energy-limited systems, this study, inspired by compressed sensing, proposes a fully learnable model for task-driven image-compressed sensing (FLCS). The FLCS, based on Deep Convolutional Generative Adversarial Networks (DCGAN) and the Variational Auto-encoder (VAE), divides the image-compressed sensing model into three learnable parts, i.e., the Sampler, the Solver, and the Rebuilder. To be specific, a measurement matrix suitable for a type of image is obtained by training the Sampler. The Solver calculates the image’s low-dimensional representation from the measurements. The Rebuilder learns a mapping from the low-dimensional latent space to the image space. All of these can be trained jointly or individually for a range of application scenarios. The pre-trained FLCS reconstructs images with few iterations for task-driven compressed sensing. As indicated by the experimental results, compared with existing approaches, the proposed method significantly improves the reconstructed images’ quality while decreasing the running time. This study is of great significance for the application of image-compressed sensing at low sampling rates. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
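The Sampler/Solver/Rebuilder split described above can be pictured as three learnable modules chained end to end. The sketch below uses plain fully connected layers as stand-ins; the actual FLCS builds on DCGAN and VAE components, and all layer sizes here are assumptions.

```python
import torch.nn as nn

class LearnedCompressedSensing(nn.Module):
    def __init__(self, n_pixels, n_measurements, latent_dim):
        super().__init__()
        self.sampler = nn.Linear(n_pixels, n_measurements, bias=False)    # learned measurement matrix
        self.solver = nn.Sequential(nn.Linear(n_measurements, latent_dim), nn.ReLU(),
                                    nn.Linear(latent_dim, latent_dim))    # measurements -> latent code
        self.rebuilder = nn.Sequential(nn.Linear(latent_dim, n_pixels), nn.Sigmoid())  # latent -> image

    def forward(self, x_flat):                   # x_flat: (N, n_pixels) flattened images in [0, 1]
        measurements = self.sampler(x_flat)
        latent = self.solver(measurements)
        return self.rebuilder(latent)
```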

27 pages, 4237 KiB  
Article
An Instance Segmentation and Clustering Model for Energy Audit Assessments in Built Environments: A Multi-Stage Approach
by Youness Arjoune, Sai Peri, Niroop Sugunaraj, Avhishek Biswas, Debanjan Sadhukhan and Prakash Ranganathan
Sensors 2021, 21(13), 4375; https://doi.org/10.3390/s21134375 - 26 Jun 2021
Cited by 5 | Viewed by 2242
Abstract
Heat loss quantification (HLQ) is an essential step in improving a building’s thermal performance and optimizing its energy usage. While this problem is well-studied in the literature, most of the existing studies are either qualitative or minimally driven quantitative studies that rely on localized building envelope points and are, thus, not suitable for automated solutions in energy audit applications. This research work is an attempt to fill this knowledge gap by utilizing intensive thermal data (on the order of 100,000-plus images) and constitutes a relatively new area of analysis in energy audit applications. Specifically, we demonstrate a novel process using deep-learning methods to segment more than 100,000 thermal images collected from an unmanned aerial system (UAS). To quantify the heat loss for a building envelope, multiple stages of computation need to be performed: object detection (using Mask R-CNN/Faster R-CNN), estimating the surface temperature (using two clustering methods), and finally calculating the overall heat transfer coefficient (e.g., the U-value). The proposed model was applied to eleven academic campuses across the state of North Dakota. The preliminary findings indicate that Mask R-CNN outperformed other instance segmentation models with an mIOU of 73% for facades, 55% for windows, 67% for roofs, 24% for doors, and 11% for HVACs. Two clustering methods, namely K-means and threshold-based clustering (TBC), were deployed to estimate surface temperatures, with TBC providing more consistent estimates than K-means across all times of the day. Our analysis demonstrated that thermal efficiency not only depended on the accurate acquisition of thermal images but also relied on other factors, such as the building geometry and seasonal weather parameters, including the outside/inside building temperatures, wind, time of day, and indoor heating/cooling conditions. Finally, the resultant U-values of various building envelopes were compared with recommendations from the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) building standards. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
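The U-value referred to above is the overall heat transfer coefficient, i.e., heat flux divided by the inside–outside temperature difference. The one-line function below states only that definition; how the paper estimates the surface temperatures and flux from clustered thermal imagery is its actual contribution and is not reproduced here.

```python
def u_value(heat_flux_w_per_m2, t_inside_c, t_outside_c):
    """Overall heat transfer coefficient U in W/(m^2*K)."""
    return heat_flux_w_per_m2 / (t_inside_c - t_outside_c)

# Example: 12 W/m^2 of heat flux across a 20 degC indoor / 0 degC outdoor difference -> U = 0.6.
print(u_value(12.0, 20.0, 0.0))
```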

15 pages, 4427 KiB  
Article
Unsupervised Learning of Depth and Camera Pose with Feature Map Warping
by Ente Guo, Zhifeng Chen, Yanlin Zhou and Dapeng Oliver Wu
Sensors 2021, 21(3), 923; https://doi.org/10.3390/s21030923 - 30 Jan 2021
Cited by 3 | Viewed by 2830
Abstract
Estimating image depth and agent egomotion is important for autonomous vehicles and robots to understand the surrounding environment and avoid collisions. Most existing unsupervised methods estimate depth and camera egomotion by minimizing the photometric error between adjacent frames. However, photometric consistency sometimes does not hold in real situations, such as brightness changes, moving objects, and occlusion. To reduce the influence of brightness changes, we propose a feature pyramid matching loss (FPML) which captures the trainable feature error between the current frame and adjacent frames and is therefore more robust than the photometric error. In addition, we propose the occlusion-aware mask (OAM) network, which can indicate occlusion according to changes of the masks to improve the estimation accuracy of depth and camera pose. The experimental results verify that the proposed unsupervised approach is highly competitive against the state-of-the-art methods, both qualitatively and quantitatively. Specifically, our method reduces the absolute relative error (Abs Rel) by 0.017–0.088. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
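A feature pyramid matching loss compares learned feature maps of the current frame with those of the adjacent frame warped into its view, at several pyramid levels. The sketch below uses an unweighted L1 distance; the exact distance, weighting, and warping in the paper's FPML may differ.

```python
import torch.nn.functional as F

def feature_pyramid_matching_loss(feats_current, feats_warped):
    """feats_current, feats_warped: lists of (N, C, H, W) feature maps, one per pyramid level,
    where feats_warped are adjacent-frame features warped into the current view."""
    losses = [F.l1_loss(fc, fw) for fc, fw in zip(feats_current, feats_warped)]
    return sum(losses) / len(losses)
```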

18 pages, 37749 KiB  
Article
CNN-Based Suppression of False Contour and Color Distortion in Bit-Depth Enhancement
by Changmeng Peng, Luting Cai, Xiaoyang Huang, Zhizhong Fu, Jin Xu and Xiaofeng Li
Sensors 2021, 21(2), 416; https://doi.org/10.3390/s21020416 - 08 Jan 2021
Cited by 4 | Viewed by 2017
Abstract
It is a challenge to transmit and store the massive visual data generated in the Visual Internet of Things (VIoT), so compression of the visual data is of great significance to VIoT. Compressing the bit-depth of images is a very cost-effective way to reduce the large volume of visual data. However, compressing the bit-depth introduces false contours, and color distortion occurs in the reconstructed image. Suppressing false contours and color distortion therefore becomes a critical issue for bit-depth enhancement in VIoT. To solve these problems, a Bit-depth Enhancement method with an AUTO-encoder-like structure (BE-AUTO) is proposed in this paper. Based on the convolution-combined-with-deconvolution codec and the global skip connection of BE-AUTO, this method can effectively suppress false contours and color distortion, thus achieving state-of-the-art objective metrics and visual quality in the reconstructed images, making it more suitable for bit-depth enhancement in VIoT. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
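The "convolution-combined-with-deconvolution codec with a global skip" can be read as a residual autoencoder: the network predicts a correction that is added back onto the low-bit-depth input. The sketch below assumes arbitrary channel counts and depths and input sizes divisible by four; it is not the published BE-AUTO network.

```python
import torch.nn as nn

class ResidualBitDepthEnhancer(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 4, stride=2, padding=1))

    def forward(self, low_bit_image):            # (N, 3, H, W) with H, W divisible by 4
        # Global skip: add the predicted correction back onto the low-bit-depth input.
        return low_bit_image + self.decoder(self.encoder(low_bit_image))
```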

Review

44 pages, 5028 KiB  
Review
Practices and Applications of Convolutional Neural Network-Based Computer Vision Systems in Animal Farming: A Review
by Guoming Li, Yanbo Huang, Zhiqian Chen, Gary D. Chesser, Jr., Joseph L. Purswell, John Linhoss and Yang Zhao
Sensors 2021, 21(4), 1492; https://doi.org/10.3390/s21041492 - 21 Feb 2021
Cited by 80 | Viewed by 11489
Abstract
Convolutional neural network (CNN)-based computer vision systems have been increasingly applied in animal farming to improve animal management, but current knowledge, practices, limitations, and solutions of the applications remain to be expanded and explored. The objective of this study is to systematically review applications of CNN-based computer vision systems on animal farming in terms of the five deep learning computer vision tasks: image classification, object detection, semantic/instance segmentation, pose estimation, and tracking. Cattle, sheep/goats, pigs, and poultry were the major farm animal species of concern. In this research, preparations for system development, including camera settings, inclusion of variations for data recordings, choices of graphics processing units, image preprocessing, and data labeling were summarized. CNN architectures were reviewed based on the computer vision tasks in animal farming. Strategies of algorithm development included distribution of development data, data augmentation, hyperparameter tuning, and selection of evaluation metrics. Judgment of model performance and performance comparisons across architectures were discussed. Besides practices in optimizing CNN-based computer vision systems, system applications were also organized based on year, country, animal species, and purposes. Finally, recommendations on future research were provided to develop and improve CNN-based computer vision systems for improved welfare, environment, engineering, genetics, and management of farm animals. Full article
(This article belongs to the Special Issue Neural Networks and Deep Learning in Image Sensing)
