

Semantic Segmentation of High-Resolution Images with Deep Learning, Volume II

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "Remote Sensing Image Processing".

Deadline for manuscript submissions: closed (31 March 2023) | Viewed by 33220

Special Issue Editors

ICT Convergence Research Center, Kumoh National Institute of Technology, Gumi 39177, Korea
Interests: radio signal processing in 5G networks; signal identification; waveform and modulation recognition; channel estimation in wireless communications; machine learning and deep learning for visual applications and communications
School of Computer and Software, Nanjing University of Information Science and Technology, No. 219 Ningliu Road, Nanjing 210044, China
Interests: hyperspectral remote sensing image processing (including: unmixing, classification, fusion); deep learning
College of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136 Kexue Street, Zhengzhou 450000, China
Interests: image processing; multispectral image analysis; Pan-sharpening; object detection of remote sensing images; deep learning

Special Issue Information

Dear Colleagues,

In recent years, semantic segmentation has been an open research topic in image processing and computer vision, and with the rapid development of deep learning (DL) it is attracting more and more attention. Numerous high-impact DL models, including convolutional neural networks (CNNs), fully convolutional networks (FCNs), graph convolutional networks (GCNs), and Transformers, have been introduced for semantic segmentation. They have achieved remarkable performance in a wide range of applications, from scene understanding for autonomous driving to skin lesion segmentation for medical diagnosis and hyperspectral/multispectral image segmentation for remote sensing.

Thanks to advances in spectral imaging and aerial photography, large numbers of aerial multispectral and hyperspectral images can be captured conveniently and quickly, supporting remote sensing applications such as forest-cover measurement, land-use surveying, and urban-planning estimation. Despite the fruitful results of DL-based semantic segmentation of natural images, many challenges remain for pixel-level or superpixel-level classification/segmentation of remote sensing images (RSIs), including multispectral and hyperspectral images, using DL methods.

Unlike natural images, high-resolution RSIs contain numerous object categories together with redundant object details; therefore, in addition to accounting for the specific characteristics of RSIs (e.g., more channels and higher intensity values), a semantic segmentation method must effectively handle inter-class distinction and intra-class consistency. Additionally, feeding a full high-resolution image into a DL model is nearly impossible, as the computational complexity of the segmentation system grows excessively. Some current approaches therefore sacrifice some segmentation accuracy to boost processing speed through spatial image decomposition. For this Special Issue, we solicit original contributions on high-performance semantic segmentation of high-resolution RSIs that exploit deep learning to address the aforementioned problems.

Dr. Thien Huynh-The
Dr. Sun Le
Dr. Huang Wei
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image segmentation
  • pixel-wise classification/segmentation
  • convolutional neural networks/graph convolutional networks/transformers
  • deep learning
  • scene/object segmentation
  • high-resolution/super-pixel remote sensing image segmentation
  • hyperspectral/multispectral/aerial image analysis

Published Papers (16 papers)


Research


20 pages, 9197 KiB  
Article
BSFCDet: Bidirectional Spatial–Semantic Fusion Network Coupled with Channel Attention for Object Detection in Satellite Images
by Xinchi Wei, Yan Zhang and Yuhui Zheng
Remote Sens. 2023, 15(13), 3213; https://doi.org/10.3390/rs15133213 - 21 Jun 2023
Viewed by 830
Abstract
Due to the increasing maturity of deep learning and remote sensing technology, the performance of object detection in satellite images has significantly improved, playing an important role in military reconnaissance, urban planning, and agricultural monitoring. However, satellite images pose challenges such as small objects, multiscale objects, and complex backgrounds. To solve these problems, a lightweight object detection model named BSFCDet is proposed. First, fast spatial pyramid pooling (SPPF-G) is designed for feature fusion to enrich the spatial information of small targets. Second, a three-layer bidirectional feature pyramid network (BiFPN-G) is suggested to integrate the deep features' semantic information with the shallow features' spatial information, thus improving the scale adaptability of the model. Third, a novel efficient channel attention module (ECAM) is proposed to reduce background interference. Last, a new residual block (Resblock_M) is constructed to balance accuracy and speed. Experimental results show that BSFCDet achieves high detection performance while satisfying real-time requirements. Full article
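For readers unfamiliar with channel attention, the squeeze-and-gate idea behind modules such as ECAM can be sketched in a few lines of NumPy. This is an illustrative sketch of generic ECA-style channel attention, not the paper's actual module; the function name and kernel are hypothetical.

```python
import numpy as np

def channel_attention(x, w):
    """ECA-style channel attention sketch: squeeze each channel by global
    average pooling, re-weight channels with a small 1D convolution over the
    channel descriptor followed by a sigmoid, then rescale the feature map."""
    c, h, wdt = x.shape
    squeeze = x.mean(axis=(1, 2))                     # (C,) per-channel descriptor
    k = len(w)
    padded = np.pad(squeeze, k // 2, mode="edge")
    excite = np.array([np.dot(padded[i:i + k], w) for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-excite))              # sigmoid gate in (0, 1)
    return x * gate[:, None, None]                    # rescale each channel

x = np.random.default_rng(0).normal(size=(4, 8, 8))
y = channel_attention(x, np.array([0.25, 0.5, 0.25]))
```

Because the gate lies in (0, 1), attention can only attenuate channels here; real modules are learned end-to-end.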

19 pages, 4822 KiB  
Article
Detecting High-Resolution Adversarial Images with Few-Shot Deep Learning
by Junjie Zhao, Junfeng Wu, James Msughter Adeke, Sen Qiao and Jinwei Wang
Remote Sens. 2023, 15(9), 2379; https://doi.org/10.3390/rs15092379 - 30 Apr 2023
Viewed by 1391
Abstract
Deep learning models have enabled significant performance improvements to remote sensing image processing. Usually, a large number of training samples is required for detection models. In this study, a dynamic simulation training strategy is designed to generate samples in real time during training. The few adversarial examples are not only directly involved in the training but are also used to fit the distribution model of adversarial noise, helping the real-time generated samples to be similar to adversarial examples. The noise of the training samples is randomly generated according to the distribution model, and the random variation of training inputs reduces the overfitting phenomenon. To enhance the detectability of adversarial noise, the input model is no longer a normalized image but a JPEG error image. Experiments show that with the proposed dynamic simulation training strategy, common classification models such as ResNet and DenseNet can effectively detect adversarial examples. Full article
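The distribution-fitting step can be illustrated with a deliberately simplified sketch: fit an independent Gaussian noise model to a few example noise fields, then sample fresh noise every training step so inputs vary randomly. The function names and the Gaussian assumption are ours, not the paper's.

```python
import numpy as np

def fit_noise_model(noise_samples):
    """Fit an independent-Gaussian model to a few adversarial-noise examples
    stacked as (N, H, W); a stand-in for the distribution-fitting step."""
    mu = noise_samples.mean(axis=0)
    sigma = noise_samples.std(axis=0) + 1e-8   # keep strictly positive
    return mu, sigma

def simulate_noise(mu, sigma, rng):
    """Draw a fresh noise field each training step from the fitted model."""
    return rng.normal(mu, sigma)

rng = np.random.default_rng(1)
few_examples = rng.normal(0.0, 0.05, size=(5, 16, 16))  # stand-in adversarial noise
mu, sigma = fit_noise_model(few_examples)
fake = simulate_noise(mu, sigma, rng)
```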

25 pages, 18095 KiB  
Article
Improving Semantic Segmentation of Roof Segments Using Large-Scale Datasets Derived from 3D City Models and High-Resolution Aerial Imagery
by Florian L. Faltermeier, Sebastian Krapf, Bruno Willenborg and Thomas H. Kolbe
Remote Sens. 2023, 15(7), 1931; https://doi.org/10.3390/rs15071931 - 04 Apr 2023
Cited by 1 | Viewed by 2487
Abstract
Advances in deep learning techniques for remote sensing as well as the increased availability of high-resolution data enable the extraction of more detailed information from aerial images. One promising task is the semantic segmentation of roof segments and their orientation. However, the lack of annotated data is a major barrier for deploying respective models on a large scale. Previous research demonstrated the viability of the deep learning approach for the task, but currently, published datasets are small-scale, manually labeled, and rare. Therefore, this paper extends the state of the art by presenting a novel method for the automated generation of large-scale datasets based on semantic 3D city models. Furthermore, we train a model on a dataset 50 times larger than existing datasets and achieve superior performance while applying it to a wider variety of buildings. We evaluate the approach by comparing networks trained on four dataset configurations, including an existing dataset and our novel large-scale dataset. The results show that the network performance measured as intersection over union can be increased from 0.60 for the existing dataset to 0.70 when the large-scale model is applied on the same region. The large-scale model performs superiorly even when applied to more diverse test samples, achieving 0.635. The novel approach contributes to solving the dataset bottleneck and consequently to improving semantic segmentation of roof segments. The resulting remotely sensed information is crucial for applications such as solar potential analysis or urban planning. Full article
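The intersection-over-union scores quoted above are, for binary masks, the ratio of overlapping to combined foreground pixels; a minimal NumPy version:

```python
import numpy as np

def iou(pred, target):
    """Intersection over union for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0   # empty masks count as a match

a = np.zeros((4, 4), int); a[:2] = 1   # top two rows predicted
b = np.zeros((4, 4), int); b[1:3] = 1  # middle two rows labelled
score = iou(a, b)                      # overlap: 1 row; union: 3 rows
```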

21 pages, 10710 KiB  
Article
Automatic Extraction of Bare Soil Land from High-Resolution Remote Sensing Images Based on Semantic Segmentation with Deep Learning
by Chen He, Yalan Liu, Dacheng Wang, Shufu Liu, Linjun Yu and Yuhuan Ren
Remote Sens. 2023, 15(6), 1646; https://doi.org/10.3390/rs15061646 - 18 Mar 2023
Cited by 6 | Viewed by 1609
Abstract
Accurate monitoring of bare soil land (BSL) is an urgent need for environmental governance and optimal utilization of land resources. High-resolution imagery contains rich semantic information, which is beneficial for the recognition of objects on the ground, but it is also susceptible to the impact of its background. We propose a semantic segmentation model, Deeplabv3+-M-CBAM, for extracting BSL. First, we replaced the Xception backbone of Deeplabv3+ with MobileNetV2 to reduce the number of parameters. Second, to distinguish BSL from the background, we employed the convolutional block attention module (CBAM), which combines channel attention and spatial attention. For model training, we built a BSL dataset based on BJ-2 satellite images. The model achieved a test F1 of 88.42%. Compared with Deeplabv3+, the classification accuracy improved by 8.52%, and the segmentation speed was 2.34 times faster. In addition, compared with visual interpretation, the extraction speed improved by 11.5 times. To verify the transferability of the model, Jilin-1GXA images were used for a transfer test, and the extraction accuracies for F1, IoU, recall and precision were 86.07%, 87.88%, 87.00% and 95.80%, respectively. These experiments show that Deeplabv3+-M-CBAM achieves efficient and accurate extraction of BSL and transfers well. The proposed methodology exhibits application value for the refinement of environmental governance and the surveillance of land use. Full article

19 pages, 16547 KiB  
Article
High-Resolution Semantic Segmentation of Woodland Fires Using Residual Attention UNet and Time Series of Sentinel-2
by Zeinab Shirvani, Omid Abdi and Rosa C. Goodman
Remote Sens. 2023, 15(5), 1342; https://doi.org/10.3390/rs15051342 - 28 Feb 2023
Cited by 5 | Viewed by 2278
Abstract
Southern Africa experiences a great number of wildfires, but the dependence on low-resolution products to detect and quantify fires means both that there is a time lag and that many small fire events are never identified. This is particularly relevant in miombo woodlands, where fires are frequent and predominantly small. We developed a cutting-edge deep-learning-based approach that uses freely available Sentinel-2 data for near-real-time, high-resolution fire detection in Mozambique. The importance of Sentinel-2 main bands and their derivatives was evaluated using TreeNet, and the top five variables were selected to create three training datasets. We designed a UNet architecture, including contraction and expansion paths and a bridge between them with several layers and functions. We then added attention gate units (AUNet) and residual blocks and attention gate units (RAUNet) to the UNet architecture. We trained the three models with the three datasets. The efficiency of all three models was high (intersection over union (IoU) > 0.85) and increased with more variables. This is the first time an RAUNet architecture has been used to detect fire events, and it performed better than the UNet and AUNet models—especially for detecting small fires. The RAUNet model with five variables had IoU = 0.9238 and overall accuracy = 0.985. We suggest that others test the RAUNet model with large datasets from different regions and other satellites so that it may be applied more broadly to improve the detection of wildfires. Full article
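The attention gates added to the UNet can be sketched generically. The following is a plain additive attention gate in the style of attention U-Nets, with hypothetical shapes and names, not the authors' exact RAUNet block: the skip feature and gating signal are projected, summed, passed through a ReLU and a 1x1 projection, and squashed to per-pixel attention coefficients.

```python
import numpy as np

def attention_gate(skip, gate, w_x, w_g, psi):
    """Additive attention gate sketch.
    skip, gate: (C, H, W) feature maps; w_x, w_g: (C_int, C) 1x1 projections;
    psi: (C_int,) final 1x1 projection to a scalar attention map."""
    proj = np.einsum("ic,chw->ihw", w_x, skip) + np.einsum("ic,chw->ihw", w_g, gate)
    relu = np.maximum(proj, 0.0)
    att = np.einsum("i,ihw->hw", psi, relu)
    alpha = 1.0 / (1.0 + np.exp(-att))      # per-pixel coefficients in (0, 1)
    return skip * alpha[None, :, :]         # suppress irrelevant skip regions

rng = np.random.default_rng(2)
skip = rng.normal(size=(3, 8, 8))
gate = rng.normal(size=(3, 8, 8))
out = attention_gate(skip, gate, rng.normal(size=(4, 3)),
                     rng.normal(size=(4, 3)), rng.normal(size=4))
```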

18 pages, 28212 KiB  
Article
Water Body Extraction from Sentinel-2 Imagery with Deep Convolutional Networks and Pixelwise Category Transplantation
by Joshua Billson, MD Samiul Islam, Xinyao Sun and Irene Cheng
Remote Sens. 2023, 15(5), 1253; https://doi.org/10.3390/rs15051253 - 24 Feb 2023
Cited by 3 | Viewed by 2825
Abstract
A common task in land-cover classification is water body extraction, wherein each pixel in an image is labelled as either water or background. Water body detection is integral to the field of urban hydrology, with applications ranging from early flood warning to water resource management. Although traditional index-based methods such as the Normalized Difference Water Index (NDWI) and the Modified Normalized Difference Water Index (MNDWI) have been used to detect water bodies for decades, deep convolutional neural networks (DCNNs) have recently demonstrated promising results. However, training these networks requires access to large quantities of high-quality and accurately labelled data, which is often lacking in the field of remotely sensed imagery. Another challenge stems from the fact that the category of interest typically occupies only a small portion of an image and is thus grossly underrepresented in the data. We propose a novel approach to data augmentation—pixelwise category transplantation (PCT)—as a potential solution to both of these problems. Experimental results demonstrate PCT’s ability to improve performance on a variety of models and datasets, achieving an average improvement of 0.749 mean intersection over union (mIoU). Moreover, PCT enables us to outperform the previous high score achieved on the same dataset without introducing a new model architecture. We also explore the suitability of several state-of-the-art segmentation models and loss functions on the task of water body extraction. Finally, we address the shortcomings of previous works by assessing each model on RGB, NIR, and multispectral features to ascertain the relative advantages of each approach. In particular, we find a significant benefit to the inclusion of multispectral bands, with such methods outperforming visible-spectrum models by an average of 4.193 mIoU. Full article
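A minimal reading of pixelwise category transplantation is to copy the pixels of the class of interest from one scene into another and extend the target mask accordingly; the sketch below is our simplified interpretation, not the authors' implementation.

```python
import numpy as np

def transplant(src_img, src_mask, dst_img, dst_mask):
    """Pixelwise category transplantation sketch: transplant the pixels of
    the category of interest (e.g. water) from a source scene into a target
    scene, so the underrepresented class appears in more training samples."""
    out_img = dst_img.copy()
    out_mask = dst_mask.copy()
    sel = src_mask.astype(bool)
    out_img[sel] = src_img[sel]   # copy category pixels
    out_mask[sel] = 1             # extend the target's label mask
    return out_img, out_mask

src = np.full((4, 4), 9.0); src_m = np.eye(4, dtype=int)   # diagonal "water"
dst = np.zeros((4, 4));     dst_m = np.zeros((4, 4), int)  # empty target
aug_img, aug_mask = transplant(src, src_m, dst, dst_m)
```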

18 pages, 26908 KiB  
Article
Adaptive Slicing-Aided Hyper Inference for Small Object Detection in High-Resolution Remote Sensing Images
by Hao Zhang, Chuanyan Hao, Wanru Song, Bo Jiang and Baozhu Li
Remote Sens. 2023, 15(5), 1249; https://doi.org/10.3390/rs15051249 - 24 Feb 2023
Cited by 7 | Viewed by 3243
Abstract
In the field of object detection, deep learning models have achieved great success in recent years. Despite these advances, detecting small objects remains difficult. Most objects in aerial images have features that are a challenge for traditional object detection techniques, including small size, high density, high variability, and varying orientation. Previous approaches have used slicing methods on high-resolution images or feature maps to improve performance. However, existing slicing methods inevitably lead to redundant computation. Therefore, in this article we present a novel adaptive slicing method named ASAHI (Adaptive Slicing Aided Hyper Inference), which can dramatically reduce redundant computation using an adaptive slicing size. Specifically, ASAHI focuses on the number of slices rather than the slicing size, that is, it adaptively adjusts the slicing size to control the number of slices according to the image resolution. Additionally, we replace the standard non-maximum suppression technique with Cluster-DIoU-NMS due to its improved accuracy and inference speed in the post-processing stage. In extensive experiments, ASAHI achieves competitive performance on the VisDrone and xView datasets. The results show that the mAP50 is increased by 0.9% and the computation time is reduced by 20–25% compared with state-of-the-art slicing methods on the TPH-YOLOV5 pretrained model. On the VisDrone2019-DET-val dataset, our mAP50 result is 56.4% higher, demonstrating the superiority of our approach. Full article
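The core idea of fixing the number of slices rather than the slice size can be sketched as follows; the overlap handling and names here are illustrative assumptions on our part, not the ASAHI code.

```python
import numpy as np

def adaptive_slices(width, height, target_slices, overlap=0.2):
    """Adaptive slicing sketch: derive the slice size from the image
    resolution so the image is covered by roughly `target_slices` tiles,
    then emit overlapping windows (x0, y0, x1, y1)."""
    per_side = max(1, int(round(np.sqrt(target_slices))))
    sw = int(np.ceil(width / per_side))      # slice size adapts to resolution
    sh = int(np.ceil(height / per_side))
    step_w = max(1, int(sw * (1 - overlap)))
    step_h = max(1, int(sh * (1 - overlap)))
    windows = []
    for y in range(0, height, step_h):
        for x in range(0, width, step_w):
            windows.append((x, y, min(x + sw, width), min(y + sh, height)))
            if x + sw >= width:
                break
        if y + sh >= height:
            break
    return (sw, sh), windows

size, wins = adaptive_slices(1000, 800, target_slices=9)
```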

19 pages, 4940 KiB  
Article
Quantitative Short-Term Precipitation Model Using Multimodal Data Fusion Based on a Cross-Attention Mechanism
by Yingjie Cui, Yunan Qiu, Le Sun, Xinyao Shu and Zhenyu Lu
Remote Sens. 2022, 14(22), 5839; https://doi.org/10.3390/rs14225839 - 18 Nov 2022
Cited by 3 | Viewed by 1538
Abstract
Short-term precipitation prediction from abundant observation data (ground observation station data, radar data, etc.) is an essential part of the contemporary meteorological prediction system. However, most current studies use only single-modal data, which leads to problems such as poor prediction accuracy and timeliness. This paper proposes a multimodal data fusion precipitation prediction model integrating station data and radar data. Specifically, our model consists of three parts. First, the radar feature encoder comprises a shallow convolutional neural network and a stacked convolutional long short-term memory network (ConvLSTM), which extracts the spatio-temporal features of radar-echo data. The weather station feature encoder is composed of a fully connected network and an LSTM, which extracts the sequential features of the weather station data. Then, the cross-modal feature encoder obtains cross-modal features by aligning and exchanging the feature information of the radar data and the weather station data through a cross-attention mechanism. Finally, the decoder outputs the quantitative short-term precipitation prediction. Our model integrates station and radar data characteristics, improves prediction accuracy and timeliness, and can flexibly incorporate other modal features. We verified our model on four short-term and impending rainfall datasets in southeastern China, achieving the best performance among the compared algorithms. Full article
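The cross-attention step can be illustrated with generic scaled dot-product cross-attention, where queries come from one modality (e.g. station features) and keys/values from the other (e.g. radar features); this is a textbook sketch, not the paper's encoder, and all names are illustrative.

```python
import numpy as np

def cross_attention(q_feat, kv_feat, wq, wk, wv):
    """Scaled dot-product cross-attention sketch: each query time step
    attends over the other modality's time steps."""
    q = q_feat @ wq                                   # (Tq, d)
    k = kv_feat @ wk                                  # (Tk, d)
    v = kv_feat @ wv                                  # (Tk, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # rows sum to one
    return weights @ v, weights

rng = np.random.default_rng(3)
station = rng.normal(size=(6, 8))    # 6 time steps of station features
radar = rng.normal(size=(10, 8))     # 10 time steps of radar features
out, attn = cross_attention(station, radar,
                            *(rng.normal(size=(8, 8)) for _ in range(3)))
```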

17 pages, 5507 KiB  
Article
SealNet 2.0: Human-Level Fully-Automated Pack-Ice Seal Detection in Very-High-Resolution Satellite Imagery with CNN Model Ensembles
by Bento C. Gonçalves, Michael Wethington and Heather J. Lynch
Remote Sens. 2022, 14(22), 5655; https://doi.org/10.3390/rs14225655 - 09 Nov 2022
Cited by 4 | Viewed by 2421
Abstract
Pack-ice seals are key indicator species in the Southern Ocean. Their large size (2–4 m) and continent-wide distribution make them ideal candidates for monitoring programs via very-high-resolution satellite imagery. The sheer volume of imagery required, however, hampers our ability to rely on manual annotation alone. Here, we present SealNet 2.0, a fully automated approach to seal detection that couples a sea ice segmentation model to find potential seal habitats with an ensemble of semantic segmentation convolutional neural network models for seal detection. Our best ensemble attains 0.806 precision and 0.640 recall on an out-of-sample test dataset, surpassing two trained human observers. Built upon the original SealNet, it outperforms its predecessor by using annotation datasets focused on sea ice only, a comprehensive hyperparameter study leveraging substantial high-performance computing resources, and post-processing through regression head outputs and segmentation head logits at predicted seal locations. Even with a simplified version of our ensemble model, using AI predictions as a guide dramatically boosted the precision and recall of two human experts, showing potential as a training device for novice seal annotators. Like human observers, the performance of our automated approach deteriorates with terrain ruggedness, highlighting the need for statistical treatment to draw global population estimates from AI output. Full article
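Ensembling segmentation models and scoring them with the precision and recall reported above can be sketched minimally; averaging per-model probability maps with a fixed threshold is an assumption of ours, not the paper's exact combination rule.

```python
import numpy as np

def ensemble_predict(prob_maps, threshold=0.5):
    """Average per-model probability maps and threshold the result."""
    return (np.mean(prob_maps, axis=0) >= threshold).astype(int)

def precision_recall(pred, target):
    """Pixel-level precision and recall for binary masks."""
    tp = np.logical_and(pred == 1, target == 1).sum()
    fp = np.logical_and(pred == 1, target == 0).sum()
    fn = np.logical_and(pred == 0, target == 1).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

maps = np.stack([
    np.array([[0.9, 0.2], [0.6, 0.1]]),   # model 1 probabilities
    np.array([[0.8, 0.4], [0.2, 0.3]]),   # model 2 probabilities
])
pred = ensemble_predict(maps)
target = np.array([[1, 0], [1, 0]])
p, r = precision_recall(pred, target)
```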

18 pages, 4885 KiB  
Article
Coupled Tensor Block Term Decomposition with Superpixel-Based Graph Laplacian Regularization for Hyperspectral Super-Resolution
by Hongyi Liu, Wen Jiang, Yuchen Zha and Zhihui Wei
Remote Sens. 2022, 14(18), 4520; https://doi.org/10.3390/rs14184520 - 09 Sep 2022
Cited by 2 | Viewed by 1509
Abstract
Hyperspectral image (HSI) super-resolution aims at improving the spatial resolution of HSI by fusing a high spatial resolution multispectral image (MSI). To preserve local submanifold structures in HSI super-resolution, a novel superpixel graph-based super-resolution method is proposed. Firstly, the MSI is segmented into superpixel blocks to form two-directional feature tensors, then two graphs are created using spectral–spatial distance between the unfolded feature tensors. Secondly, two graph Laplacian terms involving underlying BTD factors of high-resolution HSI are developed, which ensures the inheritance of the spatial geometric structures. Finally, by incorporating graph Laplacian priors with the coupled BTD degradation model, a HSI super-resolution model is established. Experimental results demonstrate that the proposed method achieves better fused results compared with other advanced super-resolution methods, especially on the improvement of the spatial structure. Full article
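A generic way to build graph Laplacian terms from spectral-spatial distances between superpixel features is L = D - W with Gaussian edge weights; the kernel form below is a common choice for illustration, not necessarily the authors' exact construction.

```python
import numpy as np

def spectral_spatial_laplacian(features, coords, sigma_f=1.0, sigma_s=1.0):
    """Graph Laplacian L = D - W where the weight between two superpixels
    combines spectral (feature) and spatial (coordinate) squared distances
    through Gaussian kernels."""
    n = len(features)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                df = np.sum((features[i] - features[j]) ** 2)
                ds = np.sum((coords[i] - coords[j]) ** 2)
                w[i, j] = np.exp(-df / sigma_f ** 2) * np.exp(-ds / sigma_s ** 2)
    return np.diag(w.sum(axis=1)) - w   # degree matrix minus affinity

rng = np.random.default_rng(4)
lap = spectral_spatial_laplacian(rng.normal(size=(5, 3)), rng.normal(size=(5, 2)))
```

By construction the Laplacian is symmetric positive semidefinite with zero row sums, which is what makes it usable as a smoothness prior.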

17 pages, 13872 KiB  
Article
Semi-Supervised Contrastive Learning for Few-Shot Segmentation of Remote Sensing Images
by Yadang Chen, Chenchen Wei, Duolin Wang, Chuanjun Ji and Baozhu Li
Remote Sens. 2022, 14(17), 4254; https://doi.org/10.3390/rs14174254 - 29 Aug 2022
Cited by 8 | Viewed by 2520
Abstract
Deep learning has been widely used in remote sensing image segmentation, while a lack of training data remains a significant issue. The few-shot segmentation of remote sensing images refers to the segmenting of novel classes with a few annotated samples. Although the few-shot segmentation of remote sensing images method based on meta-learning can get rid of the dependence on large data training, the generalization ability of the model is still low. This work presents a few-shot segmentation of remote sensing images with a self-supervised background learner to boost the generalization capacity for unseen categories to handle this challenge. The methodology in this paper is divided into two main modules: a meta learner and a background learner. The background learner supervises the feature extractor to learning latent categories in the image background. The meta learner expands on the classic metric learning framework by optimizing feature representation through contrastive learning between target classes and latent classes acquired from the background learner. Experiments on the Vaihingen dataset and the Zurich Summer dataset show that our model has satisfactory in-domain and cross-domain transferring abilities. In addition, broad experimental evaluations on PASCAL-5i and COCO-20i demonstrate that our model outperforms the prior works of few-shot segmentation. Our approach surpassed previous methods by 1.1% with ResNet-101 in a 1-way 5-shot setting. Full article
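The contrastive learning between target classes and latent background classes can be illustrated with a standard InfoNCE-style loss over cosine similarities; this generic sketch stands in for the paper's specific formulation, and the temperature value is an assumption.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss sketch: pull the anchor toward its
    positive embedding and push it away from negatives (e.g. latent classes
    found in the background), via temperature-scaled cosine similarities."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(anchor, positive)] +
                    [cos(anchor, n) for n in negatives]) / tau
    sims -= sims.max()                        # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[0])                  # positive sits at index 0

anchor = np.array([1.0, 0.0])
loss_easy = info_nce(anchor, np.array([1.0, 0.1]), [np.array([0.0, 1.0])])
loss_hard = info_nce(anchor, np.array([0.0, 1.0]), [np.array([1.0, 0.1])])
```

A well-aligned positive yields a small loss; a positive that looks less like the anchor than a negative yields a large one.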

23 pages, 7978 KiB  
Article
Encoding Contextual Information by Interlacing Transformer and Convolution for Remote Sensing Imagery Semantic Segmentation
by Xin Li, Feng Xu, Runliang Xia, Tao Li, Ziqi Chen, Xinyuan Wang, Zhennan Xu and Xin Lyu
Remote Sens. 2022, 14(16), 4065; https://doi.org/10.3390/rs14164065 - 19 Aug 2022
Cited by 20 | Viewed by 1835
Abstract
Contextual information plays a pivotal role in the semantic segmentation of remote sensing imagery (RSI) due to the imbalanced distributions and ubiquitous intra-class variants. The emergence of the transformer intrigues the revolution of vision tasks with its impressive scalability in establishing long-range dependencies. However, the local patterns, such as inherent structures and spatial details, are broken with the tokenization of the transformer. Therefore, the ICTNet is devised to confront the deficiencies mentioned above. Principally, ICTNet inherits the encoder–decoder architecture. First of all, Swin Transformer blocks (STBs) and convolution blocks (CBs) are deployed and interlaced, accompanied by encoded feature aggregation modules (EFAs) in the encoder stage. This design allows the network to learn the local patterns and distant dependencies and their interactions simultaneously. Moreover, multiple DUpsamplings (DUPs) followed by decoded feature aggregation modules (DFAs) form the decoder of ICTNet. Specifically, the transformation and upsampling loss are shrunken while recovering features. Together with the devised encoder and decoder, the well-rounded context is captured and contributes to the inference most. Extensive experiments are conducted on the ISPRS Vaihingen, Potsdam and DeepGlobe benchmarks. Quantitative and qualitative evaluations exhibit the competitive performance of ICTNet compared to mainstream and state-of-the-art methods. Additionally, the ablation study of DFA and DUP is implemented to validate the effects. Full article

21 pages, 8988 KiB  
Article
Local-Global Based High-Resolution Spatial-Spectral Representation Network for Pansharpening
by Wei Huang, Ming Ju, Zhuobing Zhao, Qinggang Wu and Erlin Tian
Remote Sens. 2022, 14(15), 3556; https://doi.org/10.3390/rs14153556 - 25 Jul 2022
Cited by 2 | Viewed by 1115
Abstract
Due to the inability of convolutional neural networks to effectively obtain long-range information, a transformer was recently introduced into the field of pansharpening to obtain global dependencies. However, a transformer does not pay enough attention to the information of channel dimensions. To solve this problem, a local-global-based high-resolution spatial-spectral representation network (LG-HSSRN) is proposed to fully fuse local and global spatial-spectral information at different scales. In this paper, a multi-scale feature fusion (MSFF) architecture is designed to obtain the scale information of remote sensing images. Meanwhile, in order to learn spatial texture information and spectral information effectively, a local-global feature extraction (LGFE) module is proposed to capture the local and global dependencies in the source images from a spatial-spectral perspective. In addition, a multi-scale contextual aggregation (MSCA) module is proposed to weave hierarchical information with high representational power. The results of three satellite datasets show that the proposed method exhibits superior performance in terms of both spatial and spectral preservation compared to other methods. Full article

22 pages, 7509 KiB  
Article
Satellite Video Tracking by Multi-Feature Correlation Filters with Motion Estimation
by Yan Zhang, Deng Chen and Yuhui Zheng
Remote Sens. 2022, 14(11), 2691; https://doi.org/10.3390/rs14112691 - 3 Jun 2022
Cited by 10 | Viewed by 1768
Abstract
As a novel method of Earth observation, video satellites can observe dynamic changes in ground targets in real time, and target tracking in satellite videos has therefore received extensive interest. Compared with traditional target tracking, however, it faces new challenges such as global occlusion, low resolution, and insufficient target information. To handle these problems, a multi-feature correlation filter with motion estimation is proposed. First, we propose a motion estimation algorithm that combines a Kalman filter with an inertial mechanism to alleviate boundary effects; it can also be used to track an occluded target. Then, we fuse histogram of oriented gradients (HOG) features and optical flow (OF) features to enrich the representation of the target. Finally, we introduce a disruptor-aware mechanism to weaken the influence of background noise. Experimental results verify that our algorithm achieves high tracking performance.
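The motion-estimation component rests on a standard constant-velocity Kalman filter. A self-contained NumPy sketch of one predict/update step (the state layout and noise magnitudes are illustrative assumptions, not the paper's tuning):

```python
import numpy as np

def kalman_predict_update(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One constant-velocity Kalman step for a 2-D target centre.
    x: state [px, py, vx, vy]; P: 4x4 covariance; z: measured [px, py]."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt           # motion model: pos += vel*dt
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0   # we observe position only
    Q, R = q * np.eye(4), r * np.eye(2)             # process / measurement noise
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the measured position
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                  # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

In a tracker, the predicted position re-centres the correlation-filter search window each frame; when occlusion is detected, the update step can be skipped so the target is propagated by prediction alone.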

21 pages, 4361 KiB  
Article
Differential Strategy-Based Multi-Level Dense Network for Pansharpening
by Junru Yin, Jiantao Qu, Qiqiang Chen, Ming Ju and Jun Yu
Remote Sens. 2022, 14(10), 2347; https://doi.org/10.3390/rs14102347 - 12 May 2022
Cited by 2 | Viewed by 1153
Abstract
Due to the discrepancy in spatial structure between multispectral (MS) and panchromatic (PAN) images, a general fusion scheme introduces errors into the fused result. To solve this issue, a differential strategy-based multi-level dense network is proposed. It takes image pairs at different scales as the input to different levels of the network and, by learning the differential information of each level, maps the spatial information of the PAN image well onto each band of the MS image, effectively addressing the scale effect of remote sensing images. An improved dense network with the same hierarchical structure is used to extract richer spatial features that enhance the spatial information of the fused result. Meanwhile, a hybrid loss strategy constrains the network at different levels to obtain better results. Qualitative and quantitative analyses show that the result has a uniform spectral distribution, a complete spatial structure, and optimal evaluation scores, fully demonstrating the superior performance of the proposed method.
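The differential, multi-level idea can be illustrated with a Laplacian-style pyramid: each level separates a low-resolution approximation from the detail (differential) signal that the corresponding network level would learn to inject into the MS bands. A minimal NumPy sketch, with 2× average pooling and nearest-neighbour upsampling as illustrative stand-ins for the paper's operators:

```python
import numpy as np

def down2(img):
    """2x downsample by average pooling (assumes even dimensions)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(img):
    """2x upsample by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def multilevel_pairs(pan, levels=3):
    """Per-level (approximation, detail) pairs: the detail map carries the
    differential information a multi-level network could learn at that scale."""
    pairs, cur = [], pan
    for _ in range(levels):
        low = down2(cur)
        detail = cur - up2(low)     # differential signal at this level
        pairs.append((low, detail))
        cur = low
    return pairs
```

By construction, `up2(low) + detail` reconstructs the input at every level, so no spatial information is lost in the decomposition — each level only isolates the scale of detail it is responsible for.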

Other


14 pages, 3720 KiB  
Technical Note
Sea and Land Segmentation of Optical Remote Sensing Images Based on U-Net Optimization
by Jianfeng Li, Zhenghong Huang, Yongling Wang and Qinghua Luo
Remote Sens. 2022, 14(17), 4163; https://doi.org/10.3390/rs14174163 - 24 Aug 2022
Cited by 2 | Viewed by 1527
Abstract
At present, some studies on semantic segmentation are becoming complicated, adding many feature layers and various skip connections to refine segmentation, which often requires a large number of parameters to ensure a good segmentation effect. For lightweight tasks such as sea-land segmentation, the modeling capacity of these models far exceeds the complexity of the task, and shrinking the model easily weakens its original effect. In response to this problem, this paper proposes a U-Net optimization structure combining Atrous Spatial Pyramid Pooling (ASPP) and FReLU, namely ACU-Net. ACU-Net replaces the two-layer continuous convolution in the feature extraction part of U-Net with a lightweight ASPP module, retains the symmetric U-shaped structure of the original U-Net, and splices the output of the ASPP module with the upsampling part. FReLU improves the modeling of relations between pixels and, together with an attention mechanism, enlarges the perception ability and receptive field of the network, reduces training difficulty, and fully taps the hidden information in the samples to capture more effective features. Experimental results show that ACU-Net surpasses the reduced U-Net and its optimized variant U-Net++ in segmentation accuracy and IoU with a smaller model size.
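An ASPP block runs several dilated (atrous) convolutions with different rates in parallel and stacks the results, enlarging the receptive field without deepening the network. A 1-D NumPy sketch of the mechanism (kernel sizes and rates are illustrative, not ACU-Net's configuration):

```python
import numpy as np

def dilated_conv1d(x, w, rate):
    """'Same'-padded 1-D convolution with dilation `rate` (zero padding)."""
    k = len(w)
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    return np.array([sum(w[j] * xp[i + j * rate] for j in range(k))
                     for i in range(len(x))])

def aspp_1d(x, weights, rates=(1, 6, 12)):
    """Toy ASPP: parallel dilated convs at several rates, stacked as branches."""
    return np.stack([dilated_conv1d(x, w, r) for w, r in zip(weights, rates)])
```

A rate-`r` branch with a 3-tap kernel sees a span of `2r + 1` inputs, so the three branches observe context at three scales simultaneously while sharing the same number of weights per branch.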