Computer Vision and Deep Learning for Remote Sensing Applications

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "Remote Sensing Image Processing".

Deadline for manuscript submissions: closed (31 August 2021) | Viewed by 107788

Special Issue Editors


Dr. Hyungtae Lee
Guest Editor
Army Research Lab./Booz Allen Hamilton Inc., 2800 Powder Mill Rd., Adelphi, MD 20783, USA
Interests: computer vision; machine learning; deep learning and AI

Dr. Sungmin Eum
Guest Editor
Army Research Lab./Booz Allen Hamilton Inc., 2800 Powder Mill Rd., Adelphi, MD 20783, USA
Interests: computer vision; machine learning; deep learning and AI

Dr. Claudio Piciarelli
Guest Editor
Department of Mathematics, Computer Science and Physics, University of Udine, 33100 Udine, Italy
Interests: computer vision; pattern recognition; machine learning; deep learning; sensor reconfiguration; anomaly detection

Special Issue Information

Dear Colleagues,

Today, computer vision and deep learning are rapidly expanding into many application areas, including remote sensing, thanks to their remarkable performance. In remote sensing especially, a myriad of challenges stemming from difficult data acquisition and annotation remain unsolved. The remote sensing community is awaiting breakthroughs that address these challenges by leveraging high-performance deep learning models, which typically require large-scale annotated datasets.

This Special Issue seeks such breakthroughs, focusing on advances in remote sensing achieved through computer vision, deep learning, and artificial intelligence. Although the scope is broad, contributions with a specific focus are expected.

For this Special Issue, we welcome the most recent advances related, but not limited, to the following topics:

* Deep learning architectures for remote sensing

* Machine learning for remote sensing

* Computer vision methods for remote sensing

* Classification / detection / regression

* Unsupervised feature learning for remote sensing

* Domain adaptation and transfer learning with computer vision and deep learning for remote sensing

* Anomaly/novelty detection for remote sensing

* New datasets and tasks for remote sensing

* Remote sensing data analysis

* New remote sensing applications

* Synthetic remote sensing data generation

* Real-time remote sensing

* Deep learning-based image registration

Dr. Hyungtae Lee
Dr. Sungmin Eum
Dr. Claudio Piciarelli
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as accepted) and listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Deep learning
  • Computer vision
  • Remote sensing
  • Hyperspectral image
  • Supervised / Semi-supervised / Unsupervised learning
  • Classification / Detection / Regression
  • Domain adaptation / Transfer learning
  • Data analysis
  • Synthetic data
  • Generative models

Published Papers (28 papers)

Research

22 pages, 4309 KiB  
Article
Spectral-Spatial Offset Graph Convolutional Networks for Hyperspectral Image Classification
by Minghua Zhang, Hongling Luo, Wei Song, Haibin Mei and Cheng Su
Remote Sens. 2021, 13(21), 4342; https://doi.org/10.3390/rs13214342 - 28 Oct 2021
Cited by 13 | Viewed by 2727
Abstract
In hyperspectral image (HSI) classification, convolutional neural networks (CNNs) have been attracting increasing attention because of their ability to represent spectral-spatial features. Nevertheless, conventional CNN models perform the convolution operation on regular-grid image regions with a fixed kernel size and, as a result, neglect the inherent relations within HSI data. In recent years, graph convolutional networks (GCNs), which represent data in a non-Euclidean space, have been successfully applied to HSI classification. However, conventional GCN methods suffer from a huge computational cost, since they construct the adjacency matrix over all HSI pixels, and they ignore the local spatial context information of hyperspectral images. To alleviate these shortcomings, we propose a novel method termed spectral-spatial offset graph convolutional networks (SSOGCN). Unlike the usual GCN models, which compute the adjacency matrix between all pixels, we construct an adjacency matrix only using the pixels within a patch, which contains rich local spatial context information while reducing the computation cost and memory consumption of the adjacency matrix. Moreover, to emphasize important local spatial information, an offset graph convolution module is proposed to extract more robust features and improve the classification performance. Comprehensive experiments are carried out on three representative benchmark data sets, and the experimental results confirm that the proposed SSOGCN method outperforms recent state-of-the-art (SOTA) methods.
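
The paper's main efficiency idea is to build the graph only over the pixels of a local patch rather than over the whole image. The sketch below illustrates that idea; it is a hypothetical reconstruction, not the authors' code, and the RBF spectral similarity and row normalisation are common GCN conventions assumed here.

```python
import numpy as np

def patch_adjacency(patch, sigma=1.0):
    """Adjacency matrix over the pixels of one HSI patch of shape (H, W, B)."""
    h, w, b = patch.shape
    pixels = patch.reshape(h * w, b)                   # one graph node per pixel
    d2 = ((pixels[:, None, :] - pixels[None, :, :]) ** 2).sum(-1)
    adj = np.exp(-d2 / (2 * sigma ** 2))               # RBF spectral similarity
    return adj / adj.sum(axis=1, keepdims=True)        # row-normalised for propagation

# A graph convolution layer would then propagate features as X' = ReLU(A @ X @ W),
# with A restricted to the patch instead of the full image.
```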

26 pages, 408010 KiB  
Article
Detection of Abnormal Vibration Dampers on Transmission Lines in UAV Remote Sensing Images with PMA-YOLO
by Wenxia Bao, Yangxun Ren, Nian Wang, Gensheng Hu and Xianjun Yang
Remote Sens. 2021, 13(20), 4134; https://doi.org/10.3390/rs13204134 - 15 Oct 2021
Cited by 19 | Viewed by 2623
Abstract
The accurate detection and timely replacement of abnormal vibration dampers on transmission lines are critical for the safe and stable operation of power systems. Recently, unmanned aerial vehicles (UAVs) have become widely used to inspect transmission lines. In this paper, we constructed a data set of abnormal vibration dampers (DAVD) on transmission lines in images obtained by UAVs. There are four types of vibration dampers in this data set, and each vibration damper may be rusty, defective, or normal. The challenges in detecting abnormal vibration dampers in the images captured by UAVs are as follows: the images have high resolution, the vibration dampers are relatively small and sparsely distributed, and the backgrounds of the images are complex, since transmission lines are erected in a variety of outdoor environments. Existing ground-based object detection methods lose significant accuracy when dealing with such complex backgrounds and small objects. To address these issues, we proposed an end-to-end parallel mixed attention You Only Look Once (PMA-YOLO) network to improve the detection performance for abnormal vibration dampers. The parallel mixed attention (PMA) module was introduced and integrated into the YOLOv4 network. This module combines a channel attention block and a spatial attention block that process the input feature maps in parallel, allowing the network to pay more attention to critical regions of abnormal vibration dampers in complex background images. Meanwhile, given that abnormal vibration dampers are prone to missed detections, we analyzed the scale and ratio of the ground truth boxes and used the K-means algorithm to re-cluster new anchors for abnormal vibration dampers in images. In addition, we introduced a multi-stage transfer learning strategy to improve the efficiency of the original training method and prevent overfitting. The experimental results showed that the mAP@0.5 for PMA-YOLO in the detection of abnormal vibration dampers reached 93.8% on the test set of DAVD, 3.5% higher than that of YOLOv4. When the multi-stage transfer learning strategy was used, the mAP@0.5 was improved by a further 0.2%.
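
The anchor re-clustering step is straightforward to sketch. The abstract confirms that K-means is run on the ground-truth box scales and ratios; the minimal version below uses plain Euclidean K-means on (width, height) pairs, whereas YOLO-style pipelines often substitute a 1 − IoU distance, so treat it as an illustration rather than the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def recluster_anchors(boxes_wh, n_anchors=9):
    """Re-cluster anchor shapes from ground-truth (width, height) pairs."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(boxes_wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]   # sort by area, as YOLO expects

# Usage: anchors = recluster_anchors(np.array([[34, 21], [52, 30], [80, 44]]), n_anchors=3)
```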

22 pages, 10818 KiB  
Article
KappaMask: AI-Based Cloudmask Processor for Sentinel-2
by Marharyta Domnich, Indrek Sünter, Heido Trofimov, Olga Wold, Fariha Harun, Anton Kostiukhin, Mihkel Järveoja, Mihkel Veske, Tanel Tamm, Kaupo Voormansik, Aire Olesk, Valentina Boccia, Nicolas Longepe and Enrico Giuseppe Cadau
Remote Sens. 2021, 13(20), 4100; https://doi.org/10.3390/rs13204100 - 13 Oct 2021
Cited by 17 | Viewed by 5142
Abstract
The Copernicus Sentinel-2 mission operated by the European Space Agency (ESA) has provided comprehensive and continuous multi-spectral observations of the Earth's entire land surface since mid-2015. Clouds and cloud shadows significantly decrease the usability of optical satellite data, especially in agricultural applications; therefore, an accurate and reliable cloud mask is mandatory for effective EO optical data exploitation. During the last few years, image segmentation techniques have developed rapidly with the exploitation of neural network capabilities. With this perspective, the KappaMask processor, based on the U-Net architecture, was developed to generate a classification mask over northern latitudes with the following classes: clear, cloud shadow, semi-transparent cloud (thin clouds), cloud and invalid. For training, a Sentinel-2 dataset covering the Northern European terrestrial area was labelled. KappaMask provides a 10 m classification mask for Sentinel-2 Level-2A (L2A) and Level-1C (L1C) products. The total dice coefficient on the test dataset, which was not seen by the model at any stage, was 80% for KappaMask L2A and 76% for KappaMask L1C for the clear, cloud shadow, semi-transparent and cloud classes. A comparison with rule-based cloud mask methods was then performed on the same test dataset, where Sen2Cor reached a 59% dice coefficient for the clear, cloud shadow, semi-transparent and cloud classes, Fmask reached 61% for the clear, cloud shadow and cloud classes, and Maja reached 51% for the clear and cloud classes. The closest machine learning open-source cloud classification mask, S2cloudless, had a 63% dice coefficient providing only cloud and clear classes, while KappaMask L2A, with a more complex classification schema, outperformed S2cloudless by 17%.
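
The headline metric throughout this comparison is the dice coefficient. A minimal per-class implementation over integer-labelled masks might look like the following; the equal class weighting and the set of evaluated classes are assumptions, and the paper's exact protocol may differ.

```python
import numpy as np

def mean_dice(pred, target, class_ids):
    """Mean Dice over the given classes for integer-labelled masks."""
    scores = []
    for c in class_ids:
        p, t = (pred == c), (target == c)
        denom = p.sum() + t.sum()
        if denom > 0:                                  # skip classes absent from both
            scores.append(2.0 * np.logical_and(p, t).sum() / denom)
    return float(np.mean(scores))

# Usage: mean_dice(pred_mask, gt_mask, class_ids=[0, 1, 2, 3])  # clear, shadow, thin, cloud
```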

20 pages, 3244 KiB  
Article
Direct Aerial Visual Geolocalization Using Deep Neural Networks
by Winthrop Harvey, Chase Rainwater and Jackson Cothren
Remote Sens. 2021, 13(19), 4017; https://doi.org/10.3390/rs13194017 - 08 Oct 2021
Cited by 7 | Viewed by 2804
Abstract
Unmanned aerial vehicles (UAVs) must keep track of their location in order to maintain flight plans. Currently, this task is almost entirely performed by a combination of Inertial Measurement Units (IMUs) and reference to GNSS (Global Navigation Satellite System). Navigation by GNSS, however, is not always reliable, due to various causes both natural (reflection and blockage from objects, technical fault, inclement weather) and artificial (GPS spoofing and denial). In such GPS-denied situations, it is desirable to have additional methods for aerial geolocalization. One such method is visual geolocalization, where aircraft use their ground-facing cameras to localize and navigate. The state of the art in many ground-level image processing tasks involves the use of Convolutional Neural Networks (CNNs). We present here a study of how effectively a modern CNN designed for visual classification can be applied to the problem of Absolute Visual Geolocalization (AVL, localization without a prior location estimate). An Xception-based architecture is trained from scratch over a >1000 km² section of Washington County, Arkansas to directly regress latitude and longitude from images from different orthorectified high-altitude survey flights. On unseen image sets over the same region from different years and seasons, it achieves an average localization error as low as 115 m, which localizes to 0.004% of the training area, or about 8% of the width of the 1.5 × 1.5 km input image. This demonstrates that CNNs are expressive enough to encode robust landscape information for geolocalization over large geographic areas. Furthermore, methods of providing uncertainty for CNN regression outputs are discussed, as well as future areas of potential improvement for the use of deep neural networks in visual geolocalization.
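
Treating geolocalization as direct coordinate regression is simple to express in code. The sketch below follows the paper's idea of a classification-style CNN repurposed with a two-value regression head; a torchvision ResNet-50 stands in for the Xception backbone, which is an assumption, as are the input size and the use of normalised coordinates.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class GeoRegressor(nn.Module):
    """CNN that regresses normalised (latitude, longitude) from an aerial image."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)    # stand-in for Xception
        backbone.fc = nn.Identity()                 # expose the 2048-d feature vector
        self.backbone = backbone
        self.head = nn.Linear(2048, 2)              # (lat, lon), scaled to [0, 1]

    def forward(self, x):
        return self.head(self.backbone(x))

# Train with an MSE loss between predicted and true normalised coordinates:
# loss = nn.functional.mse_loss(GeoRegressor()(torch.randn(4, 3, 224, 224)), targets)
```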

20 pages, 3514 KiB  
Article
EAAU-Net: Enhanced Asymmetric Attention U-Net for Infrared Small Target Detection
by Xiaozhong Tong, Bei Sun, Junyu Wei, Zhen Zuo and Shaojing Su
Remote Sens. 2021, 13(16), 3200; https://doi.org/10.3390/rs13163200 - 12 Aug 2021
Cited by 36 | Viewed by 3797
Abstract
Detecting infrared small targets lacking texture and shape information in cluttered environments is extremely challenging. With the development of deep learning, convolutional neural network (CNN)-based methods have achieved promising results in generic object detection. However, existing CNN-based methods with pooling layers may lose the targets in the deep layers and, thus, cannot be directly applied for infrared small target detection. To overcome this problem, we propose an enhanced asymmetric attention (EAA) U-Net. Specifically, we present an efficient and powerful EAA module that uses both same-layer feature information exchange and cross-layer feature fusion to improve feature representation. In the proposed approach, spatial and channel information exchanges occur between the same layers to reinforce the primitive features of small targets, and a bottom-up global attention module focuses on cross-layer feature fusion to enable the dynamic weighted modulation of high-level features under the guidance of low-level features. The results of detailed ablation studies empirically validate the effectiveness of each component in the network architecture. Compared to state-of-the-art methods, the proposed method achieved superior performance, with an intersection-over-union (IoU) of 0.771, normalised IoU (nIoU) of 0.746, and F-area of 0.681 on the publicly available SIRST dataset.
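
The same-layer exchange described here pairs a channel attention branch with a spatial attention branch. A generic parallel form of that pattern is sketched below; it is not the authors' exact EAA block, and the reduction ratio and 7 × 7 spatial kernel are common defaults assumed for illustration.

```python
import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    """Channel and spatial attention applied in parallel, then summed."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel = nn.Sequential(              # squeeze-and-excitation style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(              # per-location gate
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.channel(x) + x * self.spatial(x)

# y = ParallelAttention(64)(torch.randn(1, 64, 128, 128))
```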

20 pages, 5147 KiB  
Article
Domain Adaptive Ship Detection in Optical Remote Sensing Images
by Linhao Li, Zhiqiang Zhou, Bo Wang, Lingjuan Miao, Zhe An and Xiaowu Xiao
Remote Sens. 2021, 13(16), 3168; https://doi.org/10.3390/rs13163168 - 10 Aug 2021
Cited by 14 | Viewed by 2064
Abstract
With the successful application of the convolutional neural network (CNN), significant progress has been made by CNN-based ship detection methods. However, they often face considerable difficulties when applied to a new domain where the imaging conditions change significantly. Although training on the two domains together can solve this problem to some extent, the large domain shift will lead to sub-optimal feature representations and thus weaken the generalization ability on both domains. In this paper, a domain adaptive ship detection method is proposed to better detect ships between different domains. Specifically, the proposed method minimizes the domain discrepancies via both image-level adaptation and instance-level adaptation. In image-level adaptation, we use multiple receptive field integration and channel domain attention to enhance the features' resistance to scale and environmental changes, respectively. Moreover, a novel boundary regression module is proposed in instance-level adaptation to correct the localization deviation of the ship proposals caused by the domain shift. Compared with conventional regression approaches, the proposed boundary regression module is able to make more accurate predictions via the effective extreme point features. The two adaptation components are implemented by learning the corresponding domain classifiers respectively in an adversarial manner, thereby obtaining a robust model suitable for both domains. Experiments on both supervised and unsupervised domain adaptation scenarios are conducted to verify the effectiveness of the proposed method.
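
Adversarially trained domain classifiers of this kind are most often implemented with a gradient reversal layer: the classifier learns to tell the domains apart while the reversed gradient pushes the feature extractor towards domain-invariant features. The sketch below shows that standard construction (the DANN trick of Ganin and Lempitsky); whether this paper uses gradient reversal or an alternating min-max scheme is not stated in the abstract, so take it as a plausible form.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class DomainClassifier(nn.Module):
    """Predicts source vs. target domain from (reversed) detector features."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, feat, lam=1.0):
        return self.net(GradReverse.apply(feat, lam))
```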

21 pages, 3902 KiB  
Article
A Deep Learning Approach to an Enhanced Building Footprint and Road Detection in High-Resolution Satellite Imagery
by Christian Ayala, Rubén Sesma, Carlos Aranda and Mikel Galar
Remote Sens. 2021, 13(16), 3135; https://doi.org/10.3390/rs13163135 - 07 Aug 2021
Cited by 23 | Viewed by 5446
Abstract
The detection of building footprints and road networks has many useful applications, including the monitoring of urban development, real-time navigation, etc. Taking into account that a great deal of human attention is required by these remote sensing tasks, a lot of effort has been made to automate them. However, the vast majority of the approaches rely on very high-resolution satellite imagery (<2.5 m), whose costs are not yet affordable for maintaining up-to-date maps. Working with the limited spatial resolution provided by high-resolution satellite imagery such as Sentinel-1 and Sentinel-2 (10 m) makes it hard to detect buildings and roads, since these labels may coexist within the same pixel. This paper focuses on this problem and presents a novel methodology capable of detecting buildings and roads with sub-pixel width by increasing the resolution of the output masks. This methodology consists of fusing Sentinel-1 and Sentinel-2 data (at 10 m) together with OpenStreetMap to train deep learning models for building and road detection at 2.5 m. This becomes possible thanks to the use of OpenStreetMap vector data, which can be rasterized to any desired resolution. Accordingly, a few simple yet effective modifications of the U-Net architecture are proposed to not only semantically segment the input image, but also to learn how to enhance the resolution of the output masks. As a result, the generated mappings quadruple the input spatial resolution, closing the gap between satellite and aerial imagery for building and road detection. To properly evaluate the generalization capabilities of the proposed methodology, a dataset composed of 44 cities across Spain has been considered and divided into training and testing cities. Both quantitative and qualitative results show that high-resolution satellite imagery can be used for sub-pixel-width building and road detection following the proper methodology.
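
The trick of predicting masks on a finer grid than the input can be isolated in a small decoder head: because the OpenStreetMap labels can be rasterized at 2.5 m, the network can be supervised at 4× the 10 m input resolution. The head below is a generic sketch of that idea, not the paper's specific U-Net modification; the layer choices are assumptions.

```python
import torch
import torch.nn as nn

class UpsamplingSegHead(nn.Module):
    """Predicts segmentation masks at 4x the feature-map resolution."""
    def __init__(self, channels, n_classes=2):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(channels, channels, 2, stride=2), nn.ReLU(inplace=True),
            nn.Conv2d(channels, n_classes, 1),      # per-pixel class logits
        )

    def forward(self, x):                           # x: (B, C, H, W)
        return self.up(x)                           # (B, n_classes, 4H, 4W)

# logits = UpsamplingSegHead(64)(torch.randn(1, 64, 64, 64))  # -> (1, 2, 256, 256)
```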

18 pages, 3929 KiB  
Article
Deep Residual Dual-Attention Network for Super-Resolution Reconstruction of Remote Sensing Images
by Bo Huang, Boyong He, Liaoni Wu and Zhiming Guo
Remote Sens. 2021, 13(14), 2784; https://doi.org/10.3390/rs13142784 - 15 Jul 2021
Cited by 11 | Viewed by 2879
Abstract
Super-resolution (SR) reconstruction of remote sensing images is becoming a highly active area of research. With increasing upscaling factors, richer and more abundant details can progressively be obtained. However, in comparison with natural images, the complex spatial distribution of remote sensing data increases the difficulty of its reconstruction. Furthermore, most SR reconstruction methods suffer from low feature information utilization and equal processing of all spatial regions of an image. To improve the performance of SR reconstruction of remote sensing images, this paper proposes a deep convolutional neural network (DCNN)-based approach, named the deep residual dual-attention network (DRDAN), which achieves the fusion of global and local information. Specifically, we have developed a residual dual-attention block (RDAB) as a building block in DRDAN. In the RDAB, we first use the local multi-level fusion module to fully extract and deeply fuse the features of the different convolution layers. This module can facilitate the flow of information in the network. After this, a dual-attention mechanism (DAM), which includes both a channel attention mechanism and a spatial attention mechanism, enables the network to adaptively allocate more attention to regions carrying high-frequency information. Extensive experiments indicate that the DRDAN outperforms other comparable DCNN-based approaches in both objective evaluation indexes and subjective visual quality.

31 pages, 5089 KiB  
Article
Injection of Traditional Hand-Crafted Features into Modern CNN-Based Models for SAR Ship Classification: What, Why, Where, and How
by Tianwen Zhang and Xiaoling Zhang
Remote Sens. 2021, 13(11), 2091; https://doi.org/10.3390/rs13112091 - 26 May 2021
Cited by 47 | Viewed by 3469
Abstract
With the rise of artificial intelligence, many advanced Synthetic Aperture Radar (SAR) ship classifiers based on convolutional neural networks (CNNs) have achieved better accuracies than classifiers based on traditional hand-crafted features. However, most existing CNN-based models uncritically abandon traditional hand-crafted features and rely excessively on the abstract features of deep networks. This may be problematic, potentially limiting further improvements in classification performance. Therefore, in view of this situation, this paper offers a preliminary exploration of injecting traditional hand-crafted features into modern CNN-based models to further improve SAR ship classification accuracy. Specifically, we (1) illustrate what this injection technique is, (2) explain why it is needed, (3) discuss where it should be applied, and (4) describe how it is implemented. Experimental results on two open datasets, the three-category OpenSARShip-1.0 and the seven-category FUSAR-Ship, indicate that injecting traditional hand-crafted features into CNN-based models is effective for improving classification accuracy. Notably, the maximum accuracy improvement reaches 6.75%. Hence, we hold the view that it is not advisable to uncritically abandon traditional hand-crafted features, because they can also play an important role in CNN-based models.
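
The simplest form of such an injection is late fusion: compute a hand-crafted descriptor per image chip and concatenate it with the CNN's feature vector before the final classifier. The sketch below uses a HOG descriptor as the hand-crafted example; HOG, the fusion point, and the layer sizes are all illustrative assumptions rather than the paper's specific choices (the "where" question is precisely what the paper studies).

```python
import torch
import torch.nn as nn
from skimage.feature import hog

def hog_descriptor(chip):
    """One possible hand-crafted feature for a 2-D grayscale SAR chip."""
    return hog(chip, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

class FusedClassifier(nn.Module):
    """Classifies from CNN features concatenated with hand-crafted features."""
    def __init__(self, cnn_dim, hand_dim, n_classes):
        super().__init__()
        self.fc = nn.Linear(cnn_dim + hand_dim, n_classes)

    def forward(self, cnn_feat, hand_feat):
        return self.fc(torch.cat([cnn_feat, hand_feat], dim=1))
```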

26 pages, 13215 KiB  
Article
Multi-Sector Oriented Object Detector for Accurate Localization in Optical Remote Sensing Images
by Xu He, Shiping Ma, Linyuan He, Le Ru and Chen Wang
Remote Sens. 2021, 13(10), 1921; https://doi.org/10.3390/rs13101921 - 14 May 2021
Cited by 5 | Viewed by 1941
Abstract
Oriented object detection in optical remote sensing images (ORSIs) is a challenging task, since the targets in ORSIs appear at arbitrary orientations and small scales and are densely packed. Current state-of-the-art oriented object detection models used in ORSIs primarily evolved from anchor-based and direct regression-based detection paradigms. Nevertheless, they still encounter design difficulties from handcrafted anchor definitions and learning complexities in direct localization regression. To tackle these issues, in this paper we propose a novel multi-sector oriented object detection framework called MSO2-Det, which quantizes the scale and orientation prediction of targets in ORSIs via an anchor-free classification-to-regression approach. Specifically, we first represent the arbitrarily oriented bounding box as four scale offsets and angles in the four quadrant sectors of the corresponding Cartesian coordinate system. Then, we divide the scale and angle space into multiple discrete sectors and obtain more accurate localization information via a coarse-grained classification followed by fine-grained regression. In addition, to decrease the angular-sector classification loss and accelerate the network's convergence, we design a smooth angular-sector label (SASL) that smoothly distributes label values with a definite tolerance radius. Finally, we propose a localization-aided detection score (LADS) to better represent the confidence of a detected box by combining the category-classification score and the sector-selection score. The proposed MSO2-Det achieves state-of-the-art results on three widely used benchmarks, including the DOTA, HRSC2016, and UCAS-AOD data sets.
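
A smooth angular-sector label assigns the true sector full weight and tapers the weight of neighbouring sectors within a tolerance radius, wrapping circularly. The exact decay function is not given in the abstract, so the linear taper below is an assumed form.

```python
import numpy as np

def smooth_sector_label(true_sector, n_sectors, radius=2):
    """Soft classification target over angular sectors with circular wrap-around."""
    label = np.zeros(n_sectors)
    for k in range(-radius, radius + 1):
        label[(true_sector + k) % n_sectors] = 1.0 - abs(k) / (radius + 1)
    return label

# smooth_sector_label(0, 12, radius=2)
# -> [1.0, 0.67, 0.33, 0, ..., 0, 0.33, 0.67]  (peak at the true sector)
```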

20 pages, 24423 KiB  
Article
Semi-Supervised Segmentation for Coastal Monitoring Seagrass Using RPA Imagery
by Brandon Hobley, Riccardo Arosio, Geoffrey French, Julie Bremner, Tony Dolphin and Michal Mackiewicz
Remote Sens. 2021, 13(9), 1741; https://doi.org/10.3390/rs13091741 - 30 Apr 2021
Cited by 23 | Viewed by 4206
Abstract
Intertidal seagrass plays a vital role in estimating the overall health and dynamics of coastal environments due to its interaction with tidal changes. However, most seagrass habitats around the globe have been in steady decline due to human impacts, disturbing the already delicate balance in the environmental conditions that sustain seagrass. Miniaturization of multi-spectral sensors has facilitated very high resolution mapping of seagrass meadows, which significantly improves the potential for ecologists to monitor changes. In this study, two analytical approaches used for classifying intertidal seagrass habitats are compared: Object-based Image Analysis (OBIA) and Fully Convolutional Neural Networks (FCNNs). Both methods produce pixel-wise classifications in order to create segmented maps. FCNNs are an emerging set of algorithms within Deep Learning. Conversely, OBIA has been a prominent solution within this field, with many studies leveraging in-situ data and multiresolution segmentation to create habitat maps. This work demonstrates the utility of FCNNs in a semi-supervised setting to map seagrass and other coastal features from an optical drone survey conducted at Budle Bay, Northumberland, England. Semi-supervision is also an emerging field within Deep Learning that has the practical benefit of achieving state-of-the-art results using only subsets of labelled data. This is especially beneficial for remote sensing applications, where in-situ data is an expensive commodity. Our results show that FCNNs have comparable performance with the standard OBIA method used by ecologists.

18 pages, 15703 KiB  
Article
MS-Faster R-CNN: Multi-Stream Backbone for Improved Faster R-CNN Object Detection and Aerial Tracking from UAV Images
by Danilo Avola, Luigi Cinque, Anxhelo Diko, Alessio Fagioli, Gian Luca Foresti, Alessio Mecca, Daniele Pannone and Claudio Piciarelli
Remote Sens. 2021, 13(9), 1670; https://doi.org/10.3390/rs13091670 - 25 Apr 2021
Cited by 52 | Viewed by 6786
Abstract
Tracking objects across multiple video frames is a challenging task due to several difficult issues such as occlusions, background clutter, lighting changes, and object and camera view-point variations, which directly affect object detection. These aspects are even more emphasized when analyzing unmanned aerial vehicle (UAV)-based images, where vehicle movement can also impact image quality. A common strategy employed to address these issues is to analyze the input images at different scales to obtain as much information as possible to correctly detect and track the objects across video sequences. Following this rationale, in this paper we introduce a simple yet effective novel multi-stream (MS) architecture, where different kernel sizes are applied to each stream to simulate a multi-scale image analysis. The proposed architecture is then used as the backbone for the well-known Faster R-CNN pipeline, defining an MS-Faster R-CNN object detector that consistently detects objects in video sequences. Subsequently, this detector is jointly used with the Simple Online and Real-time Tracking with a Deep Association Metric (Deep SORT) algorithm to achieve real-time tracking capabilities on UAV images. To assess the presented architecture, extensive experiments were performed on the UMCD, UAVDT, UAV20L, and UAV123 datasets. The presented pipeline achieved state-of-the-art performance, confirming that the proposed multi-stream method can correctly emulate the robust multi-scale image analysis paradigm.
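
The multi-stream idea, the same input processed by parallel branches whose kernel sizes differ, can be captured in a few lines. The sketch below is a generic stem of this kind, with channel counts and kernel sizes chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class MultiStreamStem(nn.Module):
    """Parallel convolutional streams with different kernel sizes, concatenated."""
    def __init__(self, in_ch=3, out_ch=32, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.streams = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=k // 2),   # same spatial size
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        )

    def forward(self, x):
        # Each stream sees the full image at a different receptive field,
        # emulating a multi-scale analysis; outputs are stacked channel-wise.
        return torch.cat([s(x) for s in self.streams], dim=1)

# feats = MultiStreamStem()(torch.randn(1, 3, 256, 256))   # -> (1, 96, 256, 256)
```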

20 pages, 3548 KiB  
Article
LighterGAN: An Illumination Enhancement Method for Urban UAV Imagery
by Junshu Wang, Yue Yang, Yuan Chen and Yuxing Han
Remote Sens. 2021, 13(7), 1371; https://doi.org/10.3390/rs13071371 - 02 Apr 2021
Cited by 6 | Viewed by 2251
Abstract
In unmanned aerial vehicle (UAV)-based urban observation and monitoring, the performance of computer vision algorithms is inevitably limited by degradation caused by low illumination and light pollution; image enhancement is thus an important prerequisite for the performance of subsequent image processing algorithms. We therefore propose LighterGAN, a deep learning model for UAV low-illumination image enhancement based on generative adversarial networks. The design of LighterGAN follows the CycleGAN model, with two improvements, an attention mechanism and a semantic consistency loss, added to the original structure. Additionally, an unpaired dataset captured by urban UAV aerial photography was used to train this unsupervised learning model. Furthermore, in order to explore the advantages of the improvements, both LighterGAN's performance in the illumination enhancement task and its improved generalization ability were demonstrated in comparative experiments combining subjective and objective evaluations. In experiments against five cutting-edge image enhancement algorithms on the test set, LighterGAN achieved the best results in both visual perception and the PIQE (perception-based image quality evaluator, a built-in MATLAB function; the lower the score, the higher the image quality) score of enhanced images, with scores of 4.91 and 11.75 respectively, better than the state-of-the-art EnlightenGAN. In the enhancement of the low-illumination sub-dataset Y (containing 2000 images), LighterGAN also achieved the lowest PIQE score of 12.37, 2.85 points lower than second place. Moreover, the improvement in generalization ability over CycleGAN was also demonstrated: on images generated from the test set, LighterGAN scored 6.66 percentage points higher than CycleGAN in subjective authenticity assessment and 3.84 points lower in PIQE score, while on images generated from the whole dataset, LighterGAN's PIQE score was 11.67, 4.86 points lower than CycleGAN's.

31 pages, 10579 KiB  
Article
Knowledge and Spatial Pyramid Distance-Based Gated Graph Attention Network for Remote Sensing Semantic Segmentation
by Wei Cui, Xin He, Meng Yao, Ziwei Wang, Yuanjie Hao, Jie Li, Weijie Wu, Huilin Zhao, Cong Xia, Jin Li and Wenqi Cui
Remote Sens. 2021, 13(7), 1312; https://doi.org/10.3390/rs13071312 - 30 Mar 2021
Cited by 11 | Viewed by 2913
Abstract
Pixel-based semantic segmentation methods take pixels as recognition units and are restricted by the limited range of receptive fields, so they cannot carry richer and higher-level semantics. This reduces the accuracy of remote sensing (RS) semantic segmentation to a certain extent. Compared with pixel-based methods, graph neural networks (GNNs) usually use objects as input nodes, so they not only have relatively low computational complexity but can also carry richer semantic information. However, traditional GNNs rely heavily on the context information of individual samples and lack geographic prior knowledge reflecting the overall situation of the research area. Therefore, these methods may be disturbed in some areas by the confusion of "different objects with the same spectrum" or by violations of the first law of geography. To address the above problems, we propose a remote sensing semantic segmentation model called the knowledge and spatial pyramid distance-based gated graph attention network (KSPGAT), which is based on prior knowledge, spatial pyramid distance, and a graph attention network (GAT) with a gating mechanism. The model first uses superpixels (geographical objects) to form the nodes of a graph neural network and then uses a novel spatial pyramid distance recognition algorithm to recognize the spatial relationships. Finally, based on the integration of feature similarity and the spatial relationships of geographic objects, a multi-source attention mechanism and gating mechanism are designed to control the process of node aggregation; as a result, high-level semantics, spatial relationships and prior knowledge can be introduced into a remote sensing semantic segmentation network. The experimental results show that our model improves the overall accuracy by 4.43% compared with the U-Net network and by 3.80% compared with the baseline GAT network.

17 pages, 6801 KiB  
Article
ICENETv2: A Fine-Grained River Ice Semantic Segmentation Network Based on UAV Images
by Xiuwei Zhang, Yang Zhou, Jiaojiao Jin, Yafei Wang, Minhao Fan, Ning Wang and Yanning Zhang
Remote Sens. 2021, 13(4), 633; https://doi.org/10.3390/rs13040633 - 10 Feb 2021
Cited by 14 | Viewed by 2433
Abstract
Accurate ice segmentation is one of the most crucial techniques for intelligent ice monitoring. Compared with coarser ice detection, it can provide more information for ice situation analysis, change trend prediction, and so on. Therefore, the study of ice segmentation has important practical significance. In this study, we focused on fine-grained river ice segmentation using unmanned aerial vehicle (UAV) images. This task has the following difficulties: (1) the scale of river ice varies greatly in different images and even within the same image; (2) the same kind of river ice differs greatly in color, shape, texture, size, and so on; and (3) the appearances of different kinds of river ice sometimes look similar due to the complex formation and change processes. To perform this study, the NWPU_YRCC2 dataset was built, in which all UAV images were collected in the Ningxia–Inner Mongolia reach of the Yellow River. Then, a novel semantic segmentation method based on a deep convolutional neural network, named ICENETv2, is proposed. To achieve accurate multiscale prediction, we design a multilevel feature fusion framework, in which multi-scale high-level semantic features and lower-level finer features are effectively fused. Additionally, a dual attention module is adopted to highlight distinguishable characteristics, and a learnable up-sampling strategy is further used to improve the segmentation accuracy of the details. Experiments show that ICENETv2 achieves state-of-the-art performance on the NWPU_YRCC2 dataset. Finally, our ICENETv2 is also applied to solve a realistic problem, calculating drift ice cover density, which is one of the most important factors for predicting the freeze-up date of the river. The results demonstrate that the performance of ICENETv2 meets the actual application demand.
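
Once a per-pixel segmentation mask is available, drift ice cover density reduces to counting pixels. A minimal version is below; the definition (ice pixels over ice-plus-water pixels) and the class indices are assumptions made for illustration, since the paper's exact formula is not given in the abstract.

```python
import numpy as np

def drift_ice_cover_density(seg_mask, ice_labels=(1, 2), water_label=0):
    """Fraction of the river surface covered by ice in a labelled mask."""
    ice = np.isin(seg_mask, ice_labels).sum()
    water = (seg_mask == water_label).sum()
    surface = ice + water                      # ignore shore / other classes
    return float(ice) / surface if surface else 0.0

# density = drift_ice_cover_density(mask)  # mask: (H, W) integer class indices
```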

24 pages, 20594 KiB  
Article
MultEYE: Monitoring System for Real-Time Vehicle Detection, Tracking and Speed Estimation from UAV Imagery on Edge-Computing Platforms
by Navaneeth Balamuralidhar, Sofia Tilon and Francesco Nex
Remote Sens. 2021, 13(4), 573; https://doi.org/10.3390/rs13040573 - 05 Feb 2021
Cited by 58 | Viewed by 7696
Abstract
We present MultEYE, a traffic monitoring system that can detect, track, and estimate the velocity of vehicles in a sequence of aerial images. The presented solution has been optimized to execute these tasks in real-time on an embedded computer installed on an Unmanned Aerial Vehicle (UAV). In order to overcome the limitations of existing object detection architectures related to accuracy and computational overhead, a multi-task learning methodology was employed by adding a segmentation head to an object detector backbone, resulting in the MultEYE object detection architecture. On a custom dataset, it achieved a 4.8% higher mean Average Precision (mAP) score while being 91.4% faster than the state-of-the-art model, and it was able to generalize to different real-world traffic scenes. Dedicated object tracking and speed estimation algorithms were then optimized to reliably track objects from a UAV with limited computational effort. Different strategies for combining object detection, tracking, and speed estimation are also discussed. In our experiments, the optimized detector runs at an average frame rate of up to 29 frames per second (FPS) at a frame resolution of 512 × 320 on an Nvidia Xavier NX board, while the optimally combined detector, tracker and speed estimator pipeline achieves speeds of up to 33 FPS on an image of resolution 3072 × 1728. To our knowledge, the MultEYE system is one of the first traffic monitoring systems that was specifically designed and optimized for a UAV platform under real-world constraints.

20 pages, 15000 KiB  
Article
Sequence Image Interpolation via Separable Convolution Network
by Xing Jin, Ping Tang, Thomas Houet, Thomas Corpetti, Emilien Gence Alvarez-Vanhard and Zheng Zhang
Remote Sens. 2021, 13(2), 296; https://doi.org/10.3390/rs13020296 - 15 Jan 2021
Cited by 6 | Viewed by 2397
Abstract
Remote-sensing time-series data are significant for global environmental change research and a better understanding of the Earth. However, remote-sensing acquisitions often provide sparse time series due to sensor resolution limitations and environmental factors, such as cloud noise for optical data. Image interpolation is the method often used to deal with this issue. This paper presents a deep learning method, called the separable convolution network for sequence image interpolation, that learns the complex mapping to an interpolated intermediate image from its predecessor and successor images. The separable convolution network uses separable 1D convolution kernels instead of 2D kernels to capture the spatial characteristics of the input sequence images, and it is trained end-to-end on sequence images. Our experiments, which were performed with unmanned aerial vehicle (UAV) and Landsat-8 datasets, show that the method is effective in producing high-quality time-series interpolated images, and that the data-driven deep model can better simulate complex and diverse nonlinear image data information.
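
A separable convolution replaces one k × k kernel with a k × 1 kernel followed by a 1 × k kernel, reducing the per-kernel parameter count from k² to 2k. The module below shows the idea in isolation; note that in separable-convolution interpolation networks the 1D kernels are typically predicted per pixel, which this fixed-weight sketch does not attempt.

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Approximates a k x k convolution as a vertical then a horizontal 1D pass."""
    def __init__(self, in_ch, out_ch, k=5):
        super().__init__()
        self.vertical = nn.Conv2d(in_ch, out_ch, (k, 1), padding=(k // 2, 0))
        self.horizontal = nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2))

    def forward(self, x):
        return self.horizontal(self.vertical(x))

# y = SeparableConv2d(3, 16, k=5)(torch.randn(1, 3, 64, 64))  # -> (1, 16, 64, 64)
```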

21 pages, 15328 KiB  
Article
Matching Large Baseline Oblique Stereo Images Using an End-to-End Convolutional Neural Network
by Guobiao Yao, Alper Yilmaz, Li Zhang, Fei Meng, Haibin Ai and Fengxiang Jin
Remote Sens. 2021, 13(2), 274; https://doi.org/10.3390/rs13020274 - 14 Jan 2021
Cited by 11 | Viewed by 2825
Abstract
Available stereo matching algorithms produce a large number of false-positive matches, or only a few true positives, across oblique stereo images with large baselines. This undesired result is due to the complex perspective deformation and radiometric distortion across the images. To address this problem, we propose a novel affine-invariant feature matching algorithm with subpixel accuracy based on an end-to-end convolutional neural network (CNN). In our method, we adopt and modify a Hessian affine network, which we refer to as IHesAffNet, to obtain affine-invariant Hessian regions using a deep learning framework. To improve the correlation between corresponding features, we introduce an empirical weighted loss function (EWLF) based on negative samples selected with K nearest neighbors, and then generate highly discriminative deep learning-based descriptors realized with our multiple hard network structure (MTHardNets). Following this step, conjugate features are produced by using the Euclidean distance ratio as the matching metric, and the accuracy of the matches is optimized through deep learning transform-based least square matching (DLT-LSM). Finally, experiments on large-baseline oblique stereo images acquired by ground close-range and unmanned aerial vehicle (UAV) platforms verify the effectiveness of the proposed approach, and comprehensive comparisons demonstrate that our matching algorithm outperforms state-of-the-art methods in terms of accuracy, distribution and correct ratio. The main contributions of this article are: (i) the proposed MTHardNets can generate high-quality descriptors; and (ii) the IHesAffNet can produce substantial affine-invariant corresponding features with reliable transform parameters.
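
Matching by Euclidean distance ratio is the classic Lowe-style test: a descriptor pair is accepted only if the nearest candidate is sufficiently closer than the second nearest. A compact version is sketched below; the 0.8 threshold is a conventional default, not a value taken from the paper.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match (N, D) descriptors to (M, D) descriptors via the distance-ratio test."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :2]           # two closest candidates per query
    matches = []
    for i, (j1, j2) in enumerate(nearest):
        if d[i, j1] < ratio * d[i, j2]:              # accept only distinctive matches
            matches.append((i, j1))
    return matches
```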

20 pages, 7831 KiB  
Article
A Practical Cross-View Image Matching Method between UAV and Satellite for UAV-Based Geo-Localization
by Lirong Ding, Ji Zhou, Lingxuan Meng and Zhiyong Long
Remote Sens. 2021, 13(1), 47; https://doi.org/10.3390/rs13010047 - 24 Dec 2020
Cited by 48 | Viewed by 5653
Abstract
Cross-view image matching has attracted extensive attention due to its huge potential applications, such as localization and navigation. Unmanned aerial vehicle (UAV) technology has developed rapidly in recent years, and people have more opportunities to obtain and use UAV-view images than ever before. However, algorithms for cross-view image matching between the UAV view (oblique view) and the satellite view (vertical view) are still in their early stages, and the matching accuracy is expected to be further improved when applied in real situations. Within this context, in this study we propose a cross-view matching method based on location classification (hereinafter referred to as LCM), in which the similarity between UAV and satellite views is considered, and we implement the method with the newest UAV-based geo-localization dataset (University-1652). LCM is able to solve the imbalance in the number of input samples between the satellite images and the UAV images. In the training stage, LCM can simplify the retrieval problem into a classification problem and consider the influence of the feature vector size on the matching accuracy. Compared with a previous study, LCM shows higher accuracy, with Recall@K (K ∈ {1, 5, 10}) and the average precision (AP) improved by 5–10%. The expansion of satellite-view images and the multiple queries proposed for the LCM are capable of improving the matching accuracy in our experiments. In addition, the influence of different feature sizes on the LCM's accuracy is determined, and we find that 512 is the optimal feature size. Finally, the LCM model trained on synthetic UAV-view images is evaluated in real-world situations, and the evaluation result shows that it still has satisfactory matching accuracy. LCM can realize bidirectional matching between UAV-view and satellite-view images and can contribute to two applications: (i) UAV-view image localization (i.e., predicting the geographic location of UAV-view images based on satellite-view images with geo-tags) and (ii) UAV navigation (i.e., driving the UAV to the region of interest in the satellite-view image based on the flight record).
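
Recall@K, the retrieval metric reported here, counts the fraction of queries whose true match appears among the K top-ranked gallery images. A straightforward implementation of the standard definition:

```python
import numpy as np

def recall_at_k(similarity, gt_index, ks=(1, 5, 10)):
    """Recall@K from a (n_queries, n_gallery) similarity matrix.

    gt_index[i] is the gallery index of query i's true match.
    """
    ranking = np.argsort(-similarity, axis=1)        # best match first
    ranks = np.array([int(np.where(ranking[i] == gt_index[i])[0][0])
                      for i in range(len(gt_index))])
    return {k: float((ranks < k).mean()) for k in ks}
```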

18 pages, 6316 KiB  
Article
Learning to Track Aircraft in Infrared Imagery
by Sijie Wu, Kai Zhang, Shaoyi Li and Jie Yan
Remote Sens. 2020, 12(23), 3995; https://doi.org/10.3390/rs12233995 - 06 Dec 2020
Cited by 6 | Viewed by 3112
Abstract
Airborne target tracking in infrared imagery remains a challenging task. The airborne target usually has a low signal-to-noise ratio and shows different visual patterns. The features adopted in visual tracking algorithms are usually deep features pre-trained on ImageNet, which are not tightly coupled with the current video domain and therefore might not be optimal for infrared target tracking. To this end, we propose a new approach to learn domain-specific features, which can be adapted to the current video online without pre-training on a large dataset. Considering that only a few samples from the initial frame can be used for online training, general feature representations are encoded into the network for a better initialization. The feature learning module is flexible and can be integrated into tracking frameworks based on correlation filters to improve the baseline method. Experiments on airborne infrared imagery are conducted to demonstrate the effectiveness of our tracking algorithm.

26 pages, 3209 KiB  
Article
Bathymetric Inversion and Uncertainty Estimation from Synthetic Surf-Zone Imagery with Machine Learning
by Adam M. Collins, Katherine L. Brodie, Andrew Spicer Bak, Tyler J. Hesser, Matthew W. Farthing, Jonghyun Lee and Joseph W. Long
Remote Sens. 2020, 12(20), 3364; https://doi.org/10.3390/rs12203364 - 15 Oct 2020
Cited by 15 | Viewed by 3496
Abstract
Resolving surf-zone bathymetry from high-resolution imagery typically involves measuring wave speeds and performing a physics-based inversion process using linear wave theory, or data assimilation techniques which combine multiple remotely sensed parameters with numerical models. In this work, we explored what types of coastal imagery can be best utilized in a 2-dimensional fully convolutional neural network to directly estimate nearshore bathymetry from optical expressions of wave kinematics. Specifically, we explored utilizing time-averaged images (timex) of the surf-zone, which can be used as a proxy for wave dissipation, as well as including a single-frame image input, which has visible patterns of wave refraction and instantaneous expressions of wave breaking. Our results show both types of imagery can be used to estimate nearshore bathymetry. However, the single-frame imagery provides more complete information across the domain, decreasing the error over the test set by approximately 10% relative to using timex imagery alone. A network incorporating both inputs had the best performance, with an overall root-mean-squared-error of 0.39 m. Activation maps demonstrate the additional information provided by the single-frame imagery in non-breaking wave areas which aid in prediction. Uncertainty in model predictions is explored through three techniques (Monte Carlo (MC) dropout, infer-transformation, and infer-noise) to provide additional actionable information about the spatial reliability of each bathymetric prediction.
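
Of the three uncertainty techniques listed, MC dropout is the most common and easiest to sketch: dropout layers are left active at inference, and the spread of repeated stochastic predictions serves as a per-pixel uncertainty estimate. The snippet below shows the generic recipe; the sample count is an arbitrary choice, and the paper's exact setup may differ.

```python
import torch

def mc_dropout_predict(model, x, n_samples=25):
    """Mean prediction and per-pixel std from repeated stochastic forward passes."""
    model.eval()
    for m in model.modules():                  # re-enable only the dropout layers
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

# depth_mean, depth_std = mc_dropout_predict(bathymetry_net, image_batch)
```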

21 pages, 4258 KiB  
Article
Distributed Training and Inference of Deep Learning Models for Multi-Modal Land Cover Classification
by Maria Aspri, Grigorios Tsagkatakis and Panagiotis Tsakalides
Remote Sens. 2020, 12(17), 2670; https://doi.org/10.3390/rs12172670 - 19 Aug 2020
Cited by 14 | Viewed by 4561
Abstract
Deep Neural Networks (DNNs) have established themselves as a fundamental tool in numerous computational modeling applications, overcoming the challenge of defining use-case-specific feature extraction processing by incorporating this stage into unified end-to-end trainable models. Despite their modeling capabilities, training large-scale DNN models is a very computation-intensive task that most single machines are incapable of accomplishing. To address this issue, different parallelization schemes have been proposed. Nevertheless, network overheads and optimal resource allocation pose major challenges, since network communication is generally slower than intra-machine communication, while some layers are more computationally expensive than others. In this work, we consider a novel multimodal DNN based on the Convolutional Neural Network architecture and explore several ways to optimize its performance when training is executed on an Apache Spark cluster. We evaluate the performance of different architectures via the metrics of network traffic and processing power, considering the case of land cover classification from remote sensing observations. Furthermore, we compare our architectures with an identical DNN architecture modeled after a data parallelization approach, using the metrics of classification accuracy and inference execution time. The experiments show that the way a model is parallelized has a tremendous effect on resource allocation and that hyperparameter tuning can reduce network overheads. Experimental results also demonstrate that the proposed model parallelization schemes achieve more efficient resource use and more accurate predictions than data parallelization approaches. Full article
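For contrast with the model-parallel schemes studied, the synchronous data-parallel step can be written as a toy sketch: each simulated worker computes gradients on its own data shard, and the averaged gradient is applied to a master copy. The Spark cluster plumbing is omitted here, and all names are illustrative, not the paper's implementation.

```python
# Toy synchronous data parallelism: per-worker gradients, averaged update.
import copy
import torch

def data_parallel_step(model, shards, loss_fn):
    """shards: list of (inputs, targets) batches, one per simulated worker."""
    replicas = [copy.deepcopy(model) for _ in shards]
    grads = []
    for replica, (x, y) in zip(replicas, shards):
        replica.zero_grad()
        loss_fn(replica(x), y).backward()     # backprop on this worker's shard only
        grads.append([p.grad for p in replica.parameters()])
    # Average the gradients across workers and write them to the master model;
    # an optimizer.step() on the master would follow. In a real cluster this
    # averaging is exactly the network-communication cost the paper measures.
    for p, *worker_grads in zip(model.parameters(), *grads):
        p.grad = torch.stack(worker_grads).mean(dim=0)
```

Model parallelism instead splits the layers themselves across machines, which is why the per-layer cost imbalance mentioned in the abstract matters so much for resource allocation.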
(This article belongs to the Special Issue Computer Vision and Deep Learning for Remote Sensing Applications)

21 pages, 1227 KiB  
Article
Effective Training of Deep Convolutional Neural Networks for Hyperspectral Image Classification through Artificial Labeling
by Wojciech Masarczyk, Przemysław Głomb, Bartosz Grabowski and Mateusz Ostaszewski
Remote Sens. 2020, 12(16), 2653; https://doi.org/10.3390/rs12162653 - 17 Aug 2020
Cited by 17 | Viewed by 4728
Abstract
Hyperspectral imaging is a rich source of data, allowing for a multitude of effective applications. However, such imaging remains challenging because of the large data dimension and, typically, a small pool of available training examples. While deep learning approaches have been shown to provide effective classification solutions, especially for high-dimensional problems, they unfortunately work best when many labelled examples are available. The transfer learning approach can be used to alleviate the second requirement for a particular dataset: first the network is pre-trained on some dataset with a large number of training labels available, and then the actual dataset is used to fine-tune the network. This strategy is not straightforward to apply to hyperspectral images, as it is often the case that only one particular image of some type or characteristic is available. In this paper, we propose and investigate a simple and effective transfer learning strategy that uses an unsupervised pre-training step without label information. This approach can be applied to many hyperspectral classification problems. The performed experiments show that it is very effective at improving classification accuracy without being restricted to a particular image type or neural network architecture. The experiments were carried out on several deep neural network architectures and various sizes of labeled training sets. The greatest improvements in overall accuracy on the Indian Pines and Pavia University datasets are over 21 and 13 percentage points, respectively. An additional advantage of the proposed approach is the unsupervised nature of the pre-training step, which can be performed immediately after image acquisition, without the need for potentially costly expert time. Full article
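One concrete way a label-free pre-training step can be realized is to cluster the raw spectra and use the cluster indices as artificial labels, fine-tuning on the few real labels afterwards. The sketch below follows that idea under stated assumptions; it is not guaranteed to match the paper's exact artificial-labeling scheme, and the cluster count and training loop are illustrative.

```python
# Two-stage training with artificial (cluster-derived) labels: a hedged sketch.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def artificial_labels(spectra, n_clusters=16):
    """spectra: (n_pixels, n_bands) array; returns a pseudo-label per pixel."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(spectra)

def train(model, x, y, epochs=10):
    """Shared supervised loop used for both pre-training and fine-tuning."""
    opt = torch.optim.Adam(model.parameters())
    for _ in range(epochs):
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

# Stage 1: pre-train on cluster labels -- no expert annotation time needed,
#          so it can run immediately after image acquisition.
# Stage 2: swap the output layer for the real class count and fine-tune on
#          the small pool of genuinely labeled pixels.
```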
(This article belongs to the Special Issue Computer Vision and Deep Learning for Remote Sensing Applications)

21 pages, 1969 KiB  
Article
Neural Network Training for the Detection and Classification of Oceanic Mesoscale Eddies
by Oliverio J. Santana, Daniel Hernández-Sosa, Jeffrey Martz and Ryan N. Smith
Remote Sens. 2020, 12(16), 2625; https://doi.org/10.3390/rs12162625 - 14 Aug 2020
Cited by 22 | Viewed by 4166
Abstract
Recent advances in deep learning have made it possible to use neural networks for the detection and classification of oceanic mesoscale eddies from satellite altimetry data. Various neural network models have been proposed in recent years to address this challenge, but they have been trained on different types of input data and evaluated with different performance metrics, making a comparison between them impossible. In this article, we examine the most common dataset and metric choices, analyze the reasons for the divergences between them, and point out the most appropriate choices for a fair evaluation in this scenario. Based on this comparative study, we have developed several neural network models to detect and classify oceanic eddies from satellite images, showing that our most advanced models perform better than those previously proposed in the literature. Full article
(This article belongs to the Special Issue Computer Vision and Deep Learning for Remote Sensing Applications)

22 pages, 5295 KiB  
Article
R2FA-Det: Delving into High-Quality Rotatable Boxes for Ship Detection in SAR Images
by Shiqi Chen, Jun Zhang and Ronghui Zhan
Remote Sens. 2020, 12(12), 2031; https://doi.org/10.3390/rs12122031 - 24 Jun 2020
Cited by 34 | Viewed by 2800
Abstract
Recently, convolutional neural network (CNN)-based methods have been extensively explored for ship detection in synthetic aperture radar (SAR) images due to their powerful feature representation abilities. However, several obstacles still hinder progress. First, ships appear in various scenarios, which makes it difficult to exclude the disruption of the cluttered background. Second, it is complicated to precisely locate targets with large aspect ratios, arbitrary orientations, and dense distributions. Third, the trade-off between accurate localization and detection efficiency needs to be considered. To address these issues, this paper presents a rotate refined feature alignment detector (R2FA-Det), which balances the quality of bounding box prediction with the high speed of the single-stage framework. Specifically, we first devise a lightweight non-local attention module and embed it into the stem network. The recalibration of features not only strengthens the object-related features but also adequately suppresses the background interference. In addition, both forms of anchors are integrated into our modified anchor mechanism, enabling better representation of densely arranged targets with a lower computational burden. Furthermore, considering the feature misalignment present in the cascaded refinement scheme, we adopt a feature-guided alignment module that encodes both the position and shape information of the current refined anchors into the feature points. Extensive experimental validation on two SAR ship datasets shows that our algorithm achieves higher accuracy at a faster speed than some state-of-the-art methods. Full article
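Rotatable boxes of the kind discussed here are commonly parameterized as (cx, cy, w, h, θ). The small helper below converts that parameterization to corner points, the usual first step when computing rotated IoU; it is a generic geometry utility, not the paper's detector.

```python
# Rotated-box parameterization to corner points (generic utility sketch).
import numpy as np

def rbox_to_corners(cx, cy, w, h, theta):
    """theta in radians, measured from the x-axis; returns a (4, 2) array of corners."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])                      # 2-D rotation matrix
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ rot.T + np.array([cx, cy])               # rotate, then translate

# Example: a long, thin ship-like box rotated 30 degrees.
corners = rbox_to_corners(100.0, 50.0, 80.0, 12.0, np.deg2rad(30))
```

The extra angle parameter is exactly what lets such detectors fit targets with large aspect ratios and arbitrary orientations far more tightly than axis-aligned boxes.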
(This article belongs to the Special Issue Computer Vision and Deep Learning for Remote Sensing Applications)

19 pages, 5456 KiB  
Article
Residual Dense Network Based on Channel-Spatial Attention for the Scene Classification of a High-Resolution Remote Sensing Image
by Xiaolei Zhao, Jing Zhang, Jimiao Tian, Li Zhuo and Jie Zhang
Remote Sens. 2020, 12(11), 1887; https://doi.org/10.3390/rs12111887 - 10 Jun 2020
Cited by 33 | Viewed by 4010
Abstract
The scene classification of a remote sensing image has been widely used in various fields as an important task in understanding the content of a remote sensing image. Specifically, a high-resolution remote sensing scene contains rich information and complex content. Because the scene content in a remote sensing image is closely tied to spatial relationships, the design of a feature extraction network that fully mines the spatial information in a high-resolution remote sensing image directly determines the quality of classification. In recent years, convolutional neural networks (CNNs) have achieved excellent performance in remote sensing image classification; in particular, the residual dense network (RDN), as one of the representative CNNs, shows a strong feature learning ability because it fully utilizes the information of all convolutional layers. Therefore, we design an RDN based on channel-spatial attention for the scene classification of high-resolution remote sensing images. First, multi-layer convolutional features are fused with residual dense blocks. Then, a channel-spatial attention module is added to obtain a more effective feature representation. Finally, a softmax classifier is applied to classify the scene, after adopting a data augmentation strategy to meet the training requirements of the network parameters. Five experiments are conducted on the UC Merced Land-Use Dataset (UCM) and the Aerial Image Dataset (AID), and the competitive results demonstrate that our method extracts more effective features and is more conducive to scene classification. Full article
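A channel-spatial attention block can be sketched in a CBAM-like form: channel attention re-weights feature maps from pooled channel statistics, then spatial attention re-weights locations from pooled spatial statistics. The layer sizes below are illustrative assumptions; the paper's exact module may differ.

```python
# CBAM-style channel-spatial attention block (illustrative sketch).
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight each channel.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        # Spatial attention: a 7x7 conv over pooled channel statistics.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel weights from average- and max-pooled descriptors.
        ca = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                           self.mlp(x.amax(dim=(2, 3))))
        x = x * ca.view(b, c, 1, 1)
        # Spatial weights from per-location channel statistics.
        sa = torch.sigmoid(self.conv(torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)))
        return x * sa
```

In an RDN, such a block would sit after the fused residual dense features, emphasizing the channels and spatial regions most informative for the scene label.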
(This article belongs to the Special Issue Computer Vision and Deep Learning for Remote Sensing Applications)

Other

19 pages, 26833 KiB  
Technical Note
Semi-Automated Semantic Segmentation of Arctic Shorelines Using Very High-Resolution Airborne Imagery, Spectral Indices and Weakly Supervised Machine Learning Approaches
by Bibek Aryal, Stephen M. Escarzaga, Sergio A. Vargas Zesati, Miguel Velez-Reyes, Olac Fuentes and Craig Tweedie
Remote Sens. 2021, 13(22), 4572; https://doi.org/10.3390/rs13224572 - 14 Nov 2021
Cited by 9 | Viewed by 2502
Abstract
Precise coastal shoreline mapping is essential for monitoring changes in erosion rates, surface hydrology, and ecosystem structure and function. Monitoring water bodies in the Arctic National Wildlife Refuge (ANWR) is of high importance, especially considering the potential for oil and natural gas exploration in the region. In this work, we propose a modified variant of the deep neural network-based U-Net architecture for the automated mapping of 4-band orthorectified NOAA airborne imagery using sparsely labeled training data, and compare its performance to traditional machine learning (ML) approaches (namely, random forest and XGBoost) and spectral water indices (the Normalized Difference Water Index (NDWI) and the Normalized Difference Surface Water Index (NDSWI)) to support shoreline mapping of Arctic coastlines. We conclude that it is possible to modify the U-Net model to accept sparse labels as input, with results comparable to other ML methods (an Intersection-over-Union (IoU) of 94.86% using U-Net vs. an IoU of 95.05% using the best performing method). Full article
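The spectral-index baselines here are simple band ratios; NDWI, for instance, contrasts the green and near-infrared bands. A minimal sketch follows, where the band assignment is an assumption about the imagery rather than a detail taken from the paper.

```python
# NDWI over a multispectral image (generic formula, assumed band layout).
import numpy as np

def ndwi(green, nir, eps=1e-6):
    """NDWI = (Green - NIR) / (Green + NIR); water pixels trend positive."""
    green, nir = green.astype(np.float32), nir.astype(np.float32)
    return (green - nir) / (green + nir + eps)   # eps guards against divide-by-zero

# Thresholding the index (e.g., ndwi(g, n) > 0) yields a coarse water mask
# that can be compared against the U-Net segmentation output.
```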
(This article belongs to the Special Issue Computer Vision and Deep Learning for Remote Sensing Applications)

17 pages, 5599 KiB  
Technical Note
Detection of Invasive Species in Wetlands: Practical DL with Heavily Imbalanced Data
by Mariano Cabezas, Sarah Kentsch, Luca Tomhave, Jens Gross, Maximo Larry Lopez Caceres and Yago Diez
Remote Sens. 2020, 12(20), 3431; https://doi.org/10.3390/rs12203431 - 19 Oct 2020
Cited by 13 | Viewed by 2636
Abstract
Deep Learning (DL) has become popular due to its ease of use and accuracy, with Transfer Learning (TL) effectively reducing the number of images needed to solve environmental problems. However, this approach has some limitations, which we set out to explore: our goal is to detect the presence of an invasive blueberry species in aerial images of wetlands. This is a key problem in ecosystem protection that is also challenging for DL due to the severe class imbalance present in the data. Results for the ResNet50 network show a high classification accuracy while largely ignoring the blueberry class, rendering these results of limited practical interest for detecting that specific class. However, by using loss function weighting and data augmentation, results better aligned with our practical application can be obtained. Our experiments regarding TL show that ImageNet weights do not produce satisfactory results when only the final layer of the network is trained, and that only minor gains over random weights are obtained when the whole network is retrained. Finally, in a study of state-of-the-art DL architectures, the best results were obtained by the ResNeXt architecture, with a 93.75 true positive rate and 98.11 accuracy for the blueberry class; ResNet50, DenseNet, and WideResNet obtained close results. Full article
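Loss function weighting of the kind described amounts to giving the rare class a larger weight in the criterion, so the network can no longer minimize the loss by ignoring it. The class counts below are invented for illustration, and inverse-frequency weighting is one common choice rather than necessarily the paper's scheme.

```python
# Weighted cross-entropy for a heavily imbalanced two-class problem (sketch).
import torch
import torch.nn as nn

class_counts = torch.tensor([9500.0, 500.0])     # background vs. blueberry (assumed counts)
# Inverse-frequency weights: here the rare class weighs ~19x more than the common one.
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)

# logits: (batch, 2) network outputs; labels: (batch,) ground-truth class indices
# loss = criterion(logits, labels)
```

Combined with augmentation that oversamples rare-class patches, this pushes the optimum away from the trivial "predict background everywhere" solution.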
(This article belongs to the Special Issue Computer Vision and Deep Learning for Remote Sensing Applications)