Image Sensing and Processing with Convolutional Neural Networks

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (11 December 2021) | Viewed by 66683

Special Issue Editors

Prof. Sonya A. Coleman
Intelligent Systems Research Centre, Ulster University, Londonderry BT52 1SA, UK
Interests: image processing, robotics, machine learning and financial engineering
Dr. Dermot Kerr
Intelligent Systems Research Centre, Ulster University, Londonderry BT52 1SA, UK
Interests: image processing, bio-inspired vision, robotics, machine learning
Dr. Yunzhou Zhang
College of Information Science and Engineering, Northeastern University, China
Interests: computer vision, machine learning, robotics

Special Issue Information

Dear Colleagues,

Convolutional neural networks (CNNs or ConvNets) are a class of deep neural networks that leverage spatial information and are therefore well suited to classifying images for a range of applications. These networks use an ad hoc architecture inspired by our understanding of processing within the visual cortex. CNNs provide an interesting method for representing and processing image information and form a link between general feed-forward neural networks and adaptive filters. Two-dimensional CNNs are formed by one or more layers of two-dimensional filters, with possible non-linear activation functions and/or down-sampling, and they possess the key properties of translation invariance and spatially local connections (receptive fields). Given this, deep learning with CNNs is quickly becoming the state of the art for challenging computer vision applications. However, deep learning's power consumption and bandwidth requirements currently limit its application in embedded and mobile systems with tight energy budgets. The application of CNNs with different, state-of-the-art image sensors is also a thriving research area.
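
For readers unfamiliar with the layer structure described above, a minimal PyTorch sketch of a two-dimensional CNN follows; the filter counts, kernel sizes, and ten-class head are illustrative placeholders, not taken from any particular paper.

```python
import torch
import torch.nn as nn

# Minimal 2D CNN: stacked convolutional filters with non-linear
# activations and down-sampling, as described above.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # local receptive fields
    nn.ReLU(),
    nn.MaxPool2d(2),                             # down-sampling: 32 -> 16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # down-sampling: 16 -> 8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # illustrative 10-class head
)

x = torch.randn(1, 3, 32, 32)   # one RGB image
print(model(x).shape)           # torch.Size([1, 10])
```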

This Special Issue covers all topics relating to the applications of CNNs with image sensors and for image and vision processing.

Prof. Sonya A. Coleman
Dr. Dermot Kerr
Dr. Yunzhou Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (17 papers)

Editorial

3 pages, 181 KiB  
Editorial
Image Sensing and Processing with Convolutional Neural Networks
by Sonya Coleman, Dermot Kerr and Yunzhou Zhang
Sensors 2022, 22(10), 3612; https://doi.org/10.3390/s22103612 - 10 May 2022
Cited by 7 | Viewed by 1677
Abstract
Convolutional neural networks are a class of deep neural networks that leverage spatial information, and they are therefore well suited to classifying images for a range of applications [...] Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)

Research

19 pages, 89983 KiB  
Article
Litter Detection with Deep Learning: A Comparative Study
by Manuel Córdova, Allan Pinto, Christina Carrozzo Hellevik, Saleh Abdel-Afou Alaliyat, Ibrahim A. Hameed, Helio Pedrini and Ricardo da S. Torres
Sensors 2022, 22(2), 548; https://doi.org/10.3390/s22020548 - 11 Jan 2022
Cited by 25 | Viewed by 6285
Abstract
Pollution in the form of litter in the natural environment is one of the great challenges of our times. Automated litter detection can help assess waste occurrences in the environment. Different machine learning solutions have been explored to develop litter detection tools, thereby supporting research, citizen science, and volunteer clean-up initiatives. However, to the best of our knowledge, no work has investigated the performance of state-of-the-art deep learning object detection approaches in the context of litter detection. In particular, no studies have focused on the assessment of those methods aimed at their use on devices with low processing capabilities, e.g., the mobile phones typically employed in citizen science activities. In this paper, we fill this literature gap. We performed a comparative study involving state-of-the-art CNN architectures (e.g., Faster R-CNN, Mask R-CNN, EfficientDet, RetinaNet, and YOLO-v5), two litter image datasets, and a smartphone. We also introduce a new dataset for litter detection, named PlastOPol, composed of 2418 images and 5300 annotations. The experimental results demonstrate that object detectors based on the YOLO family are promising for the construction of litter detection solutions, with superior performance in terms of detection accuracy, processing time, and memory footprint. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
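
As a pointer to reproducing this kind of comparison, the sketch below loads a small YOLO-v5 detector from the public ultralytics/yolov5 Torch Hub entry point and runs inference on a photo; the file name is hypothetical, and fine-tuning on a litter dataset such as PlastOPol (not shown) would be needed to match the paper's setting.

```python
import torch

# Load a small YOLO-v5 model from the public Torch Hub release.
# 'litter_photo.jpg' is a hypothetical file name.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

results = model('litter_photo.jpg')   # path, URL, or numpy image
results.print()                       # per-detection class, confidence, box
boxes = results.xyxy[0]               # tensor rows: x1, y1, x2, y2, conf, cls
```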

20 pages, 2460 KiB  
Article
Visual Perceptual Quality Assessment Based on Blind Machine Learning Techniques
by Ghislain Takam Tchendjou and Emmanuel Simeu
Sensors 2022, 22(1), 175; https://doi.org/10.3390/s22010175 - 28 Dec 2021
Cited by 3 | Viewed by 2059
Abstract
This paper presents the construction of a new objective method for the estimation of visual perceptual quality. The proposal provides an assessment of image quality without the need for a reference image or a specific distortion assumption. Two main processes have been used to build our models: the first uses deep learning with a convolutional neural network, without any preprocessing; the second computes an objective visual quality by pooling several image features extracted from different concepts: natural scene statistics in the spatial domain, the gradient magnitude, the Laplacian of Gaussian, as well as spectral and spatial entropies. The features extracted from the image file are used as the input of machine learning techniques to build the models that estimate the visual quality level of any image. For the machine learning training phase, two main processes are proposed: the first consists of direct learning using all the selected features in only one training phase, named direct learning blind visual quality assessment (DLBQA); the second is an indirect learning process consisting of two training phases, named indirect learning blind visual quality assessment (ILBQA). This second process includes an additional phase constructing intermediary metrics used for the prediction model. The produced models are evaluated on several benchmark image databases, such as TID2013, LIVE, and the LIVE In the Wild Image Quality Challenge. The experimental results demonstrate that the proposed models produce the best visual perceptual quality predictions compared to state-of-the-art models. The proposed models have been implemented on an FPGA platform to demonstrate the feasibility of integrating the proposed solution on an image sensor. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
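
The hand-crafted features named in the abstract are straightforward to compute; a minimal SciPy sketch follows, with a simple mean/standard-deviation pooling chosen here for illustration (the paper's exact pooling is not specified in the abstract).

```python
import numpy as np
from scipy import ndimage

def quality_features(img: np.ndarray) -> np.ndarray:
    """Pool a few features of the kind described above:
    gradient magnitude and Laplacian-of-Gaussian statistics."""
    gx = ndimage.sobel(img, axis=0)
    gy = ndimage.sobel(img, axis=1)
    grad_mag = np.hypot(gx, gy)                  # gradient magnitude map
    log = ndimage.gaussian_laplace(img, sigma=1.5)  # Laplacian of Gaussian
    # Simple pooling: mean and standard deviation of each map.
    return np.array([grad_mag.mean(), grad_mag.std(), log.mean(), log.std()])

img = np.random.rand(128, 128)     # stand-in for a grayscale image
print(quality_features(img))       # 4-D feature vector for a regressor
```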

23 pages, 4818 KiB  
Article
An Efficient CNN-Based Deep Learning Model to Detect Malware Attacks (CNN-DMA) in 5G-IoT Healthcare Applications
by Ankita Anand, Shalli Rani, Divya Anand, Hani Moaiteq Aljahdali and Dermot Kerr
Sensors 2021, 21(19), 6346; https://doi.org/10.3390/s21196346 - 23 Sep 2021
Cited by 41 | Viewed by 4731
Abstract
The role of 5G-IoT has become indispensable in smart applications, and it plays a crucial part in e-health applications. E-health applications require intelligent schemes and architectures to overcome the security threats against the sensitive data of patients. The information in e-healthcare applications is stored in the cloud, which is vulnerable to security attacks. However, with deep learning techniques, these attacks can be detected, which requires hybrid models. In this article, a new deep learning model (CNN-DMA) is proposed to detect malware attacks based on a convolutional neural network (CNN) classifier. The model uses three layers, i.e., Dense, Dropout, and Flatten. A batch size of 64, 20 epochs, and 25 classes are used to train the network. An input image of 32 × 32 × 1 is used for the initial convolutional layer. Results are obtained on the Malimg dataset, in which 25 families of malware are fed as input, and our model has detected the Alueron.gen!J malware. The proposed CNN-DMA model is 99% accurate, and it is validated against state-of-the-art techniques. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
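
A hedged sketch of a classifier along these lines is shown below; the quoted specifics (32 × 32 × 1 input, Dense/Dropout/Flatten head, 25 classes, batch size 64) come from the abstract, while the convolutional width and hidden size are assumptions.

```python
import torch
import torch.nn as nn

# Sketch of a CNN malware-image classifier of the kind described:
# 32 x 32 x 1 grayscale input, Flatten / Dense / Dropout head, 25 classes.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),               # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 128),  # "Dense" (assumed width)
    nn.ReLU(),
    nn.Dropout(0.5),               # "Dropout" (assumed rate)
    nn.Linear(128, 25),            # 25 malware families (Malimg)
)

x = torch.randn(64, 1, 32, 32)     # batch size 64, as in the abstract
print(model(x).shape)              # torch.Size([64, 25])
```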

25 pages, 1076 KiB  
Article
Pulmonary COVID-19: Learning Spatiotemporal Features Combining CNN and LSTM Networks for Lung Ultrasound Video Classification
by Bruno Barros, Paulo Lacerda, Célio Albuquerque and Aura Conci
Sensors 2021, 21(16), 5486; https://doi.org/10.3390/s21165486 - 14 Aug 2021
Cited by 27 | Viewed by 4191
Abstract
Deep Learning is a very active and important area for building Computer-Aided Diagnosis (CAD) applications. This work aims to present a hybrid model to classify lung ultrasound (LUS) videos captured by convex transducers to diagnose COVID-19. A Convolutional Neural Network (CNN) performed the extraction of spatial features, and the temporal dependence was learned using a Long Short-Term Memory (LSTM) network. Different types of convolutional architectures were used for feature extraction. The hybrid model (CNN-LSTM) hyperparameters were optimized using the Optuna framework. The best hybrid model was composed of an Xception pre-trained on ImageNet and an LSTM containing 512 units, configured with a dropout rate of 0.4, two fully connected layers containing 1024 neurons each, and a sequence of 20 frames in the input layer (20 × 2048). The model presented an average accuracy of 93% and sensitivity of 97% for COVID-19, outperforming models based purely on spatial approaches. Furthermore, feature extraction using transfer learning with models pre-trained on ImageNet provided results comparable to models pre-trained on LUS images. The results corroborate other studies showing that this model for LUS classification can be an important tool in the fight against COVID-19 and other lung diseases. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
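
The sketch below illustrates the CNN-LSTM pattern described: per-frame spatial features feed an LSTM over 20 frames. A ResNet-18 backbone stands in for the paper's ImageNet-pretrained Xception; the LSTM width, dropout rate, and fully connected sizes follow the abstract, and everything else is an assumption.

```python
import torch
import torch.nn as nn
from torchvision import models

class CNNLSTM(nn.Module):
    """Spatial features per frame via a CNN, temporal modelling via LSTM."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = models.resnet18(weights=None)  # stand-in for Xception
        backbone.fc = nn.Identity()               # keep 512-D pooled features
        self.cnn = backbone
        self.lstm = nn.LSTM(512, 512, batch_first=True)
        self.head = nn.Sequential(
            nn.Dropout(0.4),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, num_classes),
        )

    def forward(self, clips):                     # (batch, 20, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))     # (b*t, 512)
        out, _ = self.lstm(feats.view(b, t, -1))  # (b, t, 512)
        return self.head(out[:, -1])              # last time step

model = CNNLSTM()
print(model(torch.randn(2, 20, 3, 224, 224)).shape)  # torch.Size([2, 2])
```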

32 pages, 5827 KiB  
Article
Automatic, Qualitative Scoring of the Clock Drawing Test (CDT) Based on U-Net, CNN and Mobile Sensor Data
by Ingyu Park and Unjoo Lee
Sensors 2021, 21(15), 5239; https://doi.org/10.3390/s21155239 - 03 Aug 2021
Cited by 10 | Viewed by 10410
Abstract
The Clock Drawing Test (CDT) is a rapid, inexpensive, and popular screening tool for cognitive functions. In spite of its qualitative capabilities in the diagnosis of neurological diseases, the assessment of the CDT has depended on quantitative methods as well as manual paper-based methods. Furthermore, given the advancement of mobile smart devices embedding several sensors and deep learning algorithms, the necessity of a standardized, qualitative, and automatic scoring system for the CDT has increased. This study presents a mobile phone application, mCDT, for the CDT and suggests a novel, automatic, and qualitative scoring method using mobile sensor data and deep learning algorithms: CNN, a convolutional network; U-Net, a convolutional network for biomedical image segmentation; and the MNIST (Modified National Institute of Standards and Technology) database. To obtain DeepC, a trained model for segmenting a contour image from a hand-drawn clock image, U-Net was trained with 159 CDT hand-drawn images at 128 × 128 resolution, obtained via mCDT. To construct DeepH, a trained model for segmenting the hands in a clock image, U-Net was trained with the same 159 CDT 128 × 128 resolution images. To obtain DeepN, a trained model for classifying the digit images from a hand-drawn clock image, CNN was trained with the MNIST database. Using DeepC, DeepH, and DeepN with the sensor data, the parameters of contour (0–3 points), numbers (0–4 points), hands (0–5 points), and center (0–1 points) were scored for a total of 13 points. Performance testing was completed with images and sensor data obtained via mCDT from 219 subjects. For an objective performance analysis, all the images were scored and cross-checked by two clinical experts in CDT scaling. The performance analysis derived a sensitivity, specificity, accuracy and precision for the contour parameter of 89.33, 92.68, 89.95 and 98.15%, for the hands parameter of 80.21, 95.93, 89.04 and 93.90%, for the numbers parameter of 83.87, 95.31, 87.21 and 97.74%, and for the center parameter of 98.42, 86.21, 96.80 and 97.91%, respectively. From these results, the mCDT application and its scoring system provide utility in differentiating dementia disease subtypes, being valuable in clinical practice and for studies in the field. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
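
A one-level U-Net sketch for 128 × 128 clock-drawing segmentation is given below to illustrate the encoder-decoder-with-skip pattern behind DeepC and DeepH; the depth and channel widths are assumptions, not the paper's.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    """One-level U-Net: encoder, bottleneck, decoder with skip connection."""
    def __init__(self):
        super().__init__()
        self.enc = block(1, 16)
        self.down = nn.MaxPool2d(2)
        self.mid = block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)
        self.out = nn.Conv2d(16, 1, 1)           # binary mask logits

    def forward(self, x):
        e = self.enc(x)                          # skip connection source
        m = self.mid(self.down(e))
        d = self.dec(torch.cat([self.up(m), e], dim=1))
        return self.out(d)

print(TinyUNet()(torch.randn(1, 1, 128, 128)).shape)  # (1, 1, 128, 128)
```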

25 pages, 13343 KiB  
Article
Forecasting of Typhoon-Induced Wind-Wave by Using Convolutional Deep Learning on Fused Data of Remote Sensing and Ground Measurements
by Chih-Chiang Wei and Hao-Chun Chang
Sensors 2021, 21(15), 5234; https://doi.org/10.3390/s21155234 - 02 Aug 2021
Cited by 12 | Viewed by 2427
Abstract
Taiwan is an island whose economic activities are primarily dependent on maritime transport and international trade. However, Taiwan is also located in the region of typhoon development in the Northwestern Pacific Basin. Thus, it frequently experiences strong winds and large waves brought by typhoons, which pose a considerable threat to port operations. To determine the real-time status of winds and waves brought by typhoons near the coasts of major ports in Taiwan, this study developed models for predicting the wind speed and wave height near the coasts of ports during typhoon periods. The forecasting horizons range from 1 to 6 h. In this study, gated recurrent unit (GRU) neural networks and convolutional neural networks (CNNs) were combined and adopted to formulate the typhoon-induced wind and wave height prediction models. This work designed two wind speed prediction models (WIND-1 and WIND-2) and four wave height prediction models (WAVE-1 to WAVE-4), the latter based on the WIND-1 and WIND-2 model outcomes. The Longdong and Liuqiu buoys were the experiment locations. Observatory data from ground stations and buoys, as well as radar reflectivity images, were adopted. The results indicated that, first, WIND-2 has a superior wind speed prediction performance to WIND-1, as WIND-2 can identify the temporal and spatial changes in wind speed using ground station data and reflectivity images. Second, WAVE-4 has the optimal wave height prediction performance, followed by WAVE-3, WAVE-2, and WAVE-1. The WAVE-4 results revealed that using the designed models with in situ and reflectivity data directly yielded optimal predictions of typhoon-induced wave heights. Overall, the presented combination models were able to extract spatial image features using multiple convolutional and pooling layers, provide useful information from time-series data using the GRU memory cells, and exhibit promising results. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
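
To make the fusion design concrete, here is a minimal sketch in which a CNN encodes a radar reflectivity image, a GRU encodes a ground-station time series, and the two codes are concatenated for a wind speed regression; all shapes and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WindNet(nn.Module):
    """CNN branch for reflectivity images + GRU branch for station series."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Flatten())                        # (B, 16*4*4) for 64x64 input
        self.gru = nn.GRU(input_size=5, hidden_size=32, batch_first=True)
        self.head = nn.Linear(16 * 4 * 4 + 32, 1)

    def forward(self, radar, series):
        img_code = self.cnn(radar)               # (B, 256)
        _, h = self.gru(series)                  # h: (1, B, 32)
        return self.head(torch.cat([img_code, h[-1]], dim=1))

net = WindNet()
radar = torch.randn(4, 1, 64, 64)    # stand-in reflectivity images
series = torch.randn(4, 12, 5)       # 12 past steps x 5 station variables
print(net(radar, series).shape)      # torch.Size([4, 1])
```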

14 pages, 3215 KiB  
Article
Jitter Detection and Image Restoration Based on Generative Adversarial Networks in Satellite Images
by Zilin Wang, Zhaoxiang Zhang, Limin Dong and Guodong Xu
Sensors 2021, 21(14), 4693; https://doi.org/10.3390/s21144693 - 09 Jul 2021
Cited by 8 | Viewed by 2974
Abstract
High-resolution satellite images (HRSIs) obtained from onboard satellite linear array cameras suffer from geometric disturbance in the presence of attitude jitter. Therefore, detection and compensation of satellite attitude jitter are crucial to reduce the geopositioning error and to improve the geometric accuracy of HRSIs. In this work, a generative adversarial network (GAN) architecture is proposed to automatically learn and correct the deformed scene features from a single remote sensing image. In the proposed GAN, a convolutional neural network (CNN) is designed to discriminate the inputs, and another CNN is used to generate so-called fake inputs. To explore the usefulness and effectiveness of a GAN for jitter detection, the proposed GANs are trained on part of the PatternNet dataset and tested on three popular remote sensing datasets, along with a deformed Yaogan-26 satellite image. Several experiments show that the proposed model provides competitive results. The proposed GAN reveals the enormous potential of GAN-based methods for the analysis of attitude jitter from remote sensing images. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
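
The adversarial pairing described (one CNN discriminating inputs, another generating so-called fake inputs) can be sketched in a few lines; the minimal example below shows one discriminator/generator loss computation with illustrative sizes and random stand-in images.

```python
import torch
import torch.nn as nn

# Minimal adversarial pairing: G generates corrected images,
# D discriminates real from generated. All sizes are illustrative.
G = nn.Sequential(                      # image-to-image generator
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1))
D = nn.Sequential(                      # small patch discriminator
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 1, 3, stride=2, padding=1))

bce = nn.BCEWithLogitsLoss()
jittered = torch.randn(2, 1, 64, 64)    # stand-in deformed scenes
clean = torch.randn(2, 1, 64, 64)       # stand-in undistorted references

fake = G(jittered)
real_logits, fake_logits = D(clean), D(fake.detach())
d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
         bce(fake_logits, torch.zeros_like(fake_logits))
g_loss = bce(D(fake), torch.ones_like(fake_logits))  # G tries to fool D
print(d_loss.item(), g_loss.item())
```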

17 pages, 1100 KiB  
Article
An Efficient and Lightweight Deep Learning Model for Human Activity Recognition Using Smartphones
by Ankita, Shalli Rani, Himanshi Babbar, Sonya Coleman, Aman Singh and Hani Moaiteq Aljahdali
Sensors 2021, 21(11), 3845; https://doi.org/10.3390/s21113845 - 02 Jun 2021
Cited by 34 | Viewed by 5387
Abstract
Traditional pattern recognition approaches have gained a lot of popularity. However, they are largely dependent upon manual feature extraction, which makes the generalized model obscure. Sequences of accelerometer data recorded by smartphones can be classified into well-known movements, a task known as human activity recognition (HAR). With the high success and wide adaptation of deep learning approaches for the recognition of human activities, these techniques are widely used in wearable devices and smartphones. In this paper, convolutional layers are combined with long short-term memory (LSTM) units in a deep neural network for HAR. The proposed model extracts features in an automated way and categorizes them with some model attributes. The LSTM is a form of recurrent neural network (RNN) well suited to processing temporal sequences. In the proposed architecture, the UCI-HAR dataset, recorded with a Samsung Galaxy S2, is used for various human activities. The CNN and LSTM models are connected in series: the CNN model is applied to each input window, and its output is transferred to the LSTM classifier as a time step. The number of filter maps used to represent the various portions of the input is the most important hyperparameter. Observations are transformed using Gaussian standardization. The proposed CNN-LSTM model is efficient and lightweight, showing higher robustness and better activity detection capability than traditional algorithms, with an accuracy of 97.89%. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
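
A compact sketch of the series CNN-LSTM arrangement for accelerometer windows follows; the nine input channels, 128-sample windows, and six activity classes match the usual UCI-HAR setup, while the layer widths are assumptions.

```python
import torch
import torch.nn as nn

class HARNet(nn.Module):
    """CNN and LSTM in series: a 1-D convolution extracts features from
    each accelerometer window, and the LSTM consumes the sequence."""
    def __init__(self, n_channels=9, n_classes=6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(), nn.MaxPool1d(2))
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                 # (batch, channels, time)
        f = self.conv(x)                  # (batch, 32, time/2)
        out, _ = self.lstm(f.transpose(1, 2))
        return self.fc(out[:, -1])        # classify from the last step

x = torch.randn(8, 9, 128)    # 8 windows of 128 samples x 9 sensor axes
print(HARNet()(x).shape)      # torch.Size([8, 6])
```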

16 pages, 7256 KiB  
Article
Deep Supervised Residual Dense Network for Underwater Image Enhancement
by Yanling Han, Lihua Huang, Zhonghua Hong, Shouqi Cao, Yun Zhang and Jing Wang
Sensors 2021, 21(9), 3289; https://doi.org/10.3390/s21093289 - 10 May 2021
Cited by 25 | Viewed by 2913
Abstract
Underwater images are important carriers and forms of underwater information, playing a vital role in exploring and utilizing marine resources. However, underwater images have characteristics of low contrast and blurred details because of the absorption and scattering of light. In recent years, deep learning has been widely used in underwater image enhancement and restoration because of its powerful feature learning capabilities, but there are still shortcomings in detailed enhancement. To address the problem, this paper proposes a deep supervised residual dense network (DS_RD_Net), which is used to better learn the mapping relationship between clear in-air images and synthetic underwater degraded images. DS_RD_Net first uses residual dense blocks to extract features to enhance feature utilization; then, it adds residual path blocks between the encoder and decoder to reduce the semantic differences between the low-level features and high-level features; finally, it employs a deep supervision mechanism to guide network training to improve gradient propagation. Experimental results (a PSNR of 36.2, an SSIM of 96.5%, and a UCIQE of 0.53) demonstrated that the proposed method can fully retain the local details of the image while performing color restoration and defogging compared with other image enhancement methods, achieving good qualitative and quantitative effects. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
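
The residual dense block at the heart of DS_RD_Net combines dense connections (each convolution sees all earlier feature maps) with a local residual; a minimal sketch, with assumed channel widths, follows.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Dense connections inside the block plus a local residual."""
    def __init__(self, channels=32, growth=16, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        c = channels
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(c, growth, 3, padding=1), nn.ReLU()))
            c += growth
        self.fuse = nn.Conv2d(c, channels, 1)    # local feature fusion

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual

block = ResidualDenseBlock()
print(block(torch.randn(1, 32, 64, 64)).shape)   # (1, 32, 64, 64)
```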

15 pages, 2803 KiB  
Article
Keypoint-Based Robotic Grasp Detection Scheme in Multi-Object Scenes
by Tong Li, Fei Wang, Changlei Ru, Yong Jiang and Jinghong Li
Sensors 2021, 21(6), 2132; https://doi.org/10.3390/s21062132 - 18 Mar 2021
Cited by 15 | Viewed by 3799
Abstract
Robot grasping is an important research direction in intelligent robotics. However, how to help robots grasp specific objects in multi-object scenes is still a challenging problem. In recent years, due to the powerful feature extraction capabilities of convolutional neural networks (CNNs), various CNN-based algorithms have been proposed to solve the grasp detection problem. Different from anchor-based grasp detection algorithms, in this paper we propose a keypoint-based scheme. We model an object or a grasp as a single point: the center point of its bounding box. The detector uses keypoint estimation to find the center point and regresses all other object attributes, such as size and direction. Experimental results demonstrate that the accuracy of this method is 74.3% on the multi-object grasp dataset VMRD, and its performance on the single-object Cornell dataset is competitive with the current state-of-the-art grasp detection algorithms. Robot experiments demonstrate that this method can help robots grasp the target in single-object and multi-object scenes with overall success rates of 94% and 87%, respectively. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
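
The center-point idea can be illustrated by the standard heatmap-decoding step used in CenterNet-style detectors: local maxima of the predicted heatmap become detections. The sketch below shows only this step; the paper's attribute regression heads are omitted.

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, k=5):
    # Keep only local maxima of the heatmap, then take the top-k peaks.
    pooled = F.max_pool2d(heatmap, 3, stride=1, padding=1)
    peaks = heatmap * (pooled == heatmap)
    scores, idx = peaks.flatten().topk(k)
    w = heatmap.shape[-1]
    ys = torch.div(idx, w, rounding_mode='floor')
    xs = idx % w
    return list(zip(scores.tolist(), ys.tolist(), xs.tolist()))

hm = torch.rand(1, 1, 64, 64)           # stand-in predicted centre heatmap
for score, y, x in decode_centers(hm):
    print(f"centre at ({x}, {y}) with score {score:.2f}")
```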

17 pages, 10641 KiB  
Article
Deep Learning Driven Noise Reduction for Reduced Flux Computed Tomography
by Khalid L. Alsamadony, Ertugrul U. Yildirim, Guenther Glatz, Umair Bin Waheed and Sherif M. Hanafy
Sensors 2021, 21(5), 1921; https://doi.org/10.3390/s21051921 - 09 Mar 2021
Cited by 11 | Viewed by 3126
Abstract
Deep neural networks have received considerable attention in clinical imaging, particularly with respect to the reduction of radiation risk. Lowering the radiation dose by reducing the photon flux inevitably results in the degradation of the scanned image quality. Thus, researchers have sought to exploit deep convolutional neural networks (DCNNs) to map low-quality, low-dose images to higher-dose, higher-quality images, thereby minimizing the associated radiation hazard. Conversely, computed tomography (CT) measurements of geomaterials are not limited by the radiation dose. In contrast to the human body, however, geomaterials may be composed of high-density constituents causing increased attenuation of the X-rays. Consequently, higher-dose images are required to obtain an acceptable scan quality. The problem of prolonged acquisition times is particularly severe for micro-CT-based scanning technologies. Depending on the sample size and exposure time settings, a single scan may require several hours to complete. This is of particular concern if phenomena with an exponential temperature dependency are to be elucidated, as a process may happen too fast to be adequately captured by CT scanning. To address the aforementioned issues, we apply DCNNs to improve the quality of rock CT images and simultaneously reduce exposure times by more than 60%. We highlight current results based on micro-CT-derived datasets and apply transfer learning to improve DCNN results without increasing training time. The approach is applicable to any computed tomography technology. Furthermore, we contrast the performance of DCNNs trained by minimizing different loss functions, such as mean squared error and structural similarity index. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
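
The contrast between the loss functions mentioned at the end of the abstract can be sketched directly; below, a single-window SSIM (a simplification of the usual sliding-window formulation) is compared against mean squared error on stand-in image tensors.

```python
import torch

def ssim_global(a, b, c1=0.01**2, c2=0.03**2):
    """Single-window SSIM over the whole image, for loss comparison only."""
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a**2 + mu_b**2 + c1) * (va + vb + c2))

noisy = torch.rand(1, 1, 128, 128)       # stand-in low-dose scan
target = torch.rand(1, 1, 128, 128)      # stand-in high-dose scan

mse_loss = torch.mean((noisy - target) ** 2)   # pixel-wise fidelity
ssim_loss = 1 - ssim_global(noisy, target)     # structural fidelity
print(mse_loss.item(), ssim_loss.item())
```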

13 pages, 1741 KiB  
Article
MSDU-Net: A Multi-Scale Dilated U-Net for Blur Detection
by Xiao Xiao, Fan Yang and Amir Sadovnik
Sensors 2021, 21(5), 1873; https://doi.org/10.3390/s21051873 - 08 Mar 2021
Cited by 8 | Viewed by 2310
Abstract
Blur detection, which aims to separate the blurred and clear regions of an image, is widely used in many important computer vision tasks, such as object detection, semantic segmentation, and face recognition, and has attracted increasing attention from researchers and industry in recent years. To improve the quality of the separation, many researchers have devoted enormous effort to extracting features from various scales of images. However, how to extract blur features and fuse them synchronously is still a big challenge. In this paper, we regard blur detection as an image segmentation problem. Inspired by the success of the U-net architecture for image segmentation, we propose a multi-scale dilated convolutional neural network called MSDU-net. In this model, we design a group of multi-scale feature extractors with dilated convolutions to extract textural information at different scales at the same time. The U-shape architecture of the MSDU-net fuses the different-scale texture features and the generated semantic features to support the image segmentation task. We conduct extensive experiments on two classic public benchmark datasets and show that the MSDU-net outperforms other state-of-the-art blur detection approaches. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
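
The multi-scale dilated extractor idea is compact enough to sketch: parallel 3 × 3 convolutions with increasing dilation rates observe different context sizes at the same resolution, and their outputs are fused by concatenation. The channel counts and rates below are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleDilated(nn.Module):
    """Parallel dilated convolutions fused by channel concatenation."""
    def __init__(self, c_in=3, c_out=8, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r) for r in rates)

    def forward(self, x):
        # padding == dilation keeps the spatial size for 3x3 kernels
        return torch.cat([b(x) for b in self.branches], dim=1)

m = MultiScaleDilated()
print(m(torch.randn(1, 3, 64, 64)).shape)   # (1, 24, 64, 64)
```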

15 pages, 4955 KiB  
Article
Identification of Crop Type in Crowdsourced Road View Photos with Deep Convolutional Neural Network
by Fangming Wu, Bingfang Wu, Miao Zhang, Hongwei Zeng and Fuyou Tian
Sensors 2021, 21(4), 1165; https://doi.org/10.3390/s21041165 - 07 Feb 2021
Cited by 18 | Viewed by 2677
Abstract
In situ ground truth data are an important requirement for producing accurate cropland type maps, and this is precisely what is lacking at vast scales. Although volunteered geographic information (VGI) has been proven to be a possible solution for in situ data acquisition, processing and extracting valuable information from millions of pictures remains challenging. This paper targets the detection of specific crop types from crowdsourced road view photos. A first large, public, multiclass road view crop photo dataset named iCrop was established for the development of crop type detection with deep learning. Five state-of-the-art deep convolutional neural networks, including InceptionV4, DenseNet121, ResNet50, MobileNetV2, and ShuffleNetV2, were employed to compare baseline performance. ResNet50 outperformed the others in overall accuracy (87.9%), and ShuffleNetV2 outperformed the others in efficiency (13 FPS). A decision fusion scheme, majority voting, was used to further improve crop identification accuracy. The results clearly demonstrate the superior accuracy of the proposed decision fusion over the other non-fusion-based methods in crop type detection on an imbalanced road view photo dataset. The voting method achieved a higher mean accuracy (90.6–91.1%) and can be leveraged to classify crop types in crowdsourced road view photos. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
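
The majority-voting fusion step is simple to illustrate; in the sketch below, the crop labels are hypothetical per-photo CNN predictions for a single field.

```python
from collections import Counter

def majority_vote(predictions):
    """Decision fusion by majority voting: the crop label predicted
    most often across photos of one field wins."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-photo CNN predictions for one roadside field:
photos = ["maize", "maize", "rice", "maize", "soybean"]
print(majority_vote(photos))   # maize
```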

19 pages, 4558 KiB  
Article
A CNN-Based Length-Aware Cascade Road Damage Detection Approach
by Huiqing Xu, Bin Chen and Jian Qin
Sensors 2021, 21(3), 689; https://doi.org/10.3390/s21030689 - 20 Jan 2021
Cited by 6 | Viewed by 2693
Abstract
Accurate and robust detection of road damage is essential for public transportation safety. Currently, deep convolutional neural network (CNN)-based road damage detection algorithms that localize and classify damage with a bounding box have achieved remarkable progress. However, research in this field fails to take into account two key characteristics of road damage: weak semantic information and abnormal geometric properties, resulting in inappropriate feature representation and suboptimal detection results. To boost performance, we propose a CNN-based cascaded damage detection network, called CrdNet. The proposed model has three parts: (1) we introduce a novel backbone network, named LrNet, that reuses low-level features and mixes suitable range-dependency features to learn high-to-low-level feature fusions that represent the weak semantic information of road damage; (2) we apply a multi-scale, multi-aspect-ratio anchor mechanism to generate high-quality positive samples for damage with abnormal geometric properties for network training; (3) we design an adaptive proposal assignment strategy and perform cascade predictions on corresponding branches that can establish different range dependencies. The experiments show that the proposed method achieves a mean average precision (mAP) of 90.92% on a collected road damage dataset, demonstrating the good performance and robustness of the model. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
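
The multi-scale, multi-aspect-ratio anchor mechanism can be sketched generically: enumerate anchor widths and heights for each scale/ratio pair, with an elongated ratio included for crack-like damage. The values below are illustrative, as the abstract does not give CrdNet's exact settings.

```python
import itertools

def make_anchors(base=32, scales=(1, 2, 4), ratios=(0.5, 1.0, 3.0)):
    """Enumerate anchor (width, height) pairs for every scale x ratio."""
    anchors = []
    for s, r in itertools.product(scales, ratios):
        area = (base * s) ** 2
        w = (area * r) ** 0.5            # chosen so that w / h == r
        anchors.append((w, area / w))
    return anchors

for w, h in make_anchors():
    print(f"{w:6.1f} x {h:6.1f}")
```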

31 pages, 6012 KiB  
Article
Automated Sensing System for Real-Time Recognition of Trucks in River Dredging Areas Using Computer Vision and Convolutional Deep Learning
by Jui-Sheng Chou and Chia-Hsuan Liu
Sensors 2021, 21(2), 555; https://doi.org/10.3390/s21020555 - 14 Jan 2021
Cited by 18 | Viewed by 4616
Abstract
Sand theft or illegal mining in river dredging areas has been a problem in recent decades. For this reason, increasing the use of artificial intelligence in dredging areas, building automated monitoring systems, and reducing human involvement can effectively deter crime and lighten the workload of security guards. In this investigation, a smart dredging construction site system was developed using automated techniques arranged to be suitable for various areas. The aim in the initial period of the smart dredging construction was to automate the audit work at the control point, which manages trucks in river dredging areas. Images of dump trucks entering the control point were captured using monitoring equipment in the construction area. The obtained images and the deep learning technique YOLOv3 were used to detect the positions of vehicle license plates. Framed images of the vehicle license plates were captured and used as input to an image classification model, C-CNN-L3, to identify the number of characters on the license plate. Based on the classification results, the images of the vehicle license plates were transmitted to a text recognition model, R-CNN-L3, corresponding to the characters of the license plate. Finally, the models of each stage were integrated into a real-time truck license plate recognition (TLPR) system; the single-character recognition rate was 97.59%, the overall recognition rate was 93.73%, and the speed was 0.3271 s/image. The TLPR system reduces the labor force and time spent identifying license plates, effectively reducing the probability of crime and increasing the transparency, automation, and efficiency of the frontline personnel's work. The TLPR is the first step toward an automated operation to manage trucks at the control point, and the subsequent and ongoing development of system functions can advance dredging operations toward the goal of being a smart construction site. By providing a vehicle LPR system intended to facilitate intelligent and highly efficient management for dredging-related departments, this paper contributes to the current body of knowledge by presenting an objective approach to the TLPR system. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)

18 pages, 5262 KiB  
Article
Deeply Recursive Low- and High-Frequency Fusing Networks for Single Image Super-Resolution
by Cheng Yang and Guanming Lu
Sensors 2020, 20(24), 7268; https://doi.org/10.3390/s20247268 - 18 Dec 2020
Cited by 8 | Viewed by 2477
Abstract
With the development of research on single image super-resolution (SISR) based on convolutional neural networks (CNNs), the quality of recovered images has improved remarkably. Many deep learning-based models have been proposed that outperform the traditional SISR algorithms. According to the results of extensive experiments, the feature representations of a model can be enhanced by increasing the depth and width of the network, which can ultimately improve the image reconstruction quality. However, a larger network generally consumes more computational and memory resources, making it difficult to train the network and increasing the prediction time. In view of the above problems, a novel deeply-recursive low- and high-frequency fusing network (DRFFN) for SISR tasks is proposed in this paper, which adopts a structure of parallel branches to extract the low- and high-frequency information of the image, respectively. The different complexities of the branches reflect the frequency characteristics of the diverse image information. Moreover, an effective channel-wise attention mechanism based on variance (VCA) is designed to make the information distribution of each feature map more reasonable, according to its variance. Owing to the model structure (i.e., cascading recursive learning of recursive units), DRFFN and DRFFN-L are very compact, with the weights shared by all convolutional recursions. Comprehensive evaluations on standard benchmark datasets demonstrate that DRFFN outperforms most existing models and achieves competitive quantitative and visual results. Full article
(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)
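
A variance-based channel attention gate of the kind named (VCA) might look like the sketch below, where each feature map is re-weighted according to its spatial variance; the abstract does not specify DRFFN's exact gating, so this is an assumption-laden illustration.

```python
import torch
import torch.nn as nn

class VarianceChannelAttention(nn.Module):
    """Gate each channel by a weight computed from its spatial variance,
    so high-variance (more informative) channels are emphasised."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # (B, C, H, W)
        var = x.var(dim=(2, 3))                  # per-channel variance
        w = self.gate(var)                       # (B, C) weights in (0, 1)
        return x * w.unsqueeze(-1).unsqueeze(-1)

vca = VarianceChannelAttention(16)
print(vca(torch.randn(2, 16, 32, 32)).shape)     # (2, 16, 32, 32)
```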
