Object Detection and Information Extraction Based on Remote Sensing Imagery

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "Remote Sensing Image Processing".

Deadline for manuscript submissions: 31 May 2024 | Viewed by 15235

Special Issue Editors


Prof. Dr. Jie Feng
Guest Editor
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University, Xi’an 710071, China
Interests: deep learning; object detection and tracking; reinforcement learning; hyperspectral image processing

Prof. Dr. Gui-Song Xia
Guest Editor
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
Interests: mathematical models for visual information; graph matching problem and its applications; computer vision and machine learning; large-scale 3D reconstruction of visual scenes; information processing, fusion, and scene understanding in unmanned intelligent systems; interpretation and information mining of remote sensing images

Prof. Dr. Xiangrong Zhang
Guest Editor
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University, Xi’an 710071, China
Interests: remote sensing image processing; hyperspectral remote sensing; deep learning in remote sensing; change detection in remote sensing; remote sensing applications in urban planning; geospatial data analysis and modeling; SAR remote sensing

Prof. Dr. Gong Cheng
Guest Editor
School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Interests: computer vision; pattern recognition; image processing; machine learning; deep learning; object detection and tracking; video analysis; remote sensing applications

Prof. Dr. Lichao Mou
Guest Editor
1. International AI Future Lab on AI4EO, TUM, Munich, Germany
2. Visual Learning and Reasoning Team, EO Data Science Department, DLR-IMF, Oberpfaffenhofen, Germany
Interests: natural language and earth observation; UAV video understanding; 3D structure inference from monocular optical/SAR imagery; recognition in remote sensing imagery

Special Issue Information

Dear Colleagues,

Remote sensing technology has become a fundamental means for humans to observe the Earth, driving progress in many application fields such as environmental surveillance, disaster monitoring, ocean situational awareness, traffic management, and the military. However, the intelligent interpretation of remote sensing data poses unique challenges due to limited imaging capability, extremely high annotation costs, and insufficient multimodal data fusion. In recent years, deep learning techniques, represented by convolutional neural networks (CNNs) and transformers, have shown remarkable success in computer vision tasks owing to their powerful feature extraction and representation capabilities. However, their application to remote sensing imagery is still relatively limited. In this Special Issue, we aim to compile state-of-the-art research on the application of machine learning methods to object detection and information extraction based on remote sensing imagery.

This Special Issue aims to present the latest advancements and emerging trends in the field of object detection and information extraction in remote sensing imagery. Specifically, the topics of interest include, but are not limited to, the following suggested themes:

  • Object detection and tracking in remote sensing images/videos;
  • Scene recognition, road extraction, semantic segmentation;
  • Anomaly detection and quality evaluation of remote sensing data;
  • Multi-modal remote sensing information extraction and fusion;
  • Few/zero-shot learning in remote sensing data.

Prof. Dr. Jie Feng
Prof. Dr. Gui-Song Xia
Prof. Dr. Xiangrong Zhang
Prof. Dr. Gong Cheng
Prof. Dr. Lichao Mou
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection of remote sensing images
  • object detection and tracking of remote sensing videos
  • few/zero-shot learning
  • multi-source data fusion
  • weakly supervised learning
  • semantic segmentation
  • remote sensing image classification

Published Papers (15 papers)

Research

19 pages, 11008 KiB  
Article
SAM-Induced Pseudo Fully Supervised Learning for Weakly Supervised Object Detection in Remote Sensing Images
by Xiaoliang Qian, Chenyang Lin, Zhiwu Chen and Wei Wang
Remote Sens. 2024, 16(9), 1532; https://doi.org/10.3390/rs16091532 - 26 Apr 2024
Viewed by 285
Abstract
Weakly supervised object detection (WSOD) in remote sensing images (RSIs) aims to detect high-value targets using only image-level category labels; however, two problems have not been well addressed by existing methods. Firstly, the seed instances (SIs) are mined relying solely on the category score (CS) of each proposal, which tends to concentrate on the most salient parts of the object; furthermore, the SIs are unreliable because the CS is not sufficiently robust, as inter-category similarity and intra-category diversity are more pronounced in RSIs. Secondly, the localization accuracy is limited by the proposals generated by the selective search or edge box algorithms. To address the first problem, a segment anything model (SAM)-induced seed instance-mining (SSIM) module is proposed, which mines the SIs according to an object quality score reflecting both the category characteristics and the completeness of the object. To handle the second problem, a SAM-based pseudo-ground truth-mining (SPGTM) module is proposed to mine pseudo-ground truth (PGT) instances whose localization is more accurate than that of traditional proposals, by fully exploiting the advantages of SAM; the object-detection heads are then trained on the PGT instances in a fully supervised manner. Ablation studies show the effectiveness of the SSIM and SPGTM modules. Comprehensive comparisons with 15 WSOD methods demonstrate the superiority of our method on two RSI datasets.
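As an illustration of the pseudo-box idea: with the public segment-anything package, a coarse proposal can be tightened into a mask-derived pseudo box. The sketch below is our own minimal reading of that general step, not the paper's SPGTM module; the checkpoint path, model size, and box-from-mask rule are assumptions.

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Hypothetical checkpoint path and model type; adjust to your setup.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

def refine_proposal(image, box_xyxy):
    """Tighten a coarse proposal into a pseudo-ground-truth box via SAM."""
    predictor.set_image(image)                       # RGB uint8 HxWx3
    masks, scores, _ = predictor.predict(
        box=np.asarray(box_xyxy, dtype=np.float32),  # XYXY box prompt
        multimask_output=False,
    )
    mask = masks[0]
    ys, xs = np.nonzero(mask)
    if xs.size == 0:                                 # SAM found nothing
        return box_xyxy, 0.0
    # Pseudo-GT box = tight bounding box of the predicted mask.
    refined = [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]
    return refined, float(scores[0])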

22 pages, 32231 KiB  
Article
MFIL-FCOS: A Multi-Scale Fusion and Interactive Learning Method for 2D Object Detection and Remote Sensing Image Detection
by Guoqing Zhang, Wenyu Yu and Ruixia Hou
Remote Sens. 2024, 16(6), 936; https://doi.org/10.3390/rs16060936 - 07 Mar 2024
Viewed by 550
Abstract
Object detection is dedicated to finding objects in an image and estimating their categories and locations. Recent object detection algorithms suffer from a loss of semantic information in the deeper feature maps due to the deepening of the backbone network; for example, when complex backbone networks are used, existing feature fusion methods cannot fuse information from different layers effectively. In addition, anchor-free object detection methods can fail to predict the same object consistently because the regression and centerness prediction branches have different learning mechanisms. To address these problems, we propose a multi-scale fusion and interactive learning method for fully convolutional one-stage anchor-free object detection, called MFIL-FCOS. Specifically, we designed a multi-scale fusion module to address the loss of local semantic information in high-level feature maps; it strengthens feature extraction by enhancing the local information of low-level features and fusing the rich semantic information of high-level features. Furthermore, we propose an interactive learning module that increases interactivity between the branches and yields more accurate predictions by generating a centerness-position weight-adjusted regression task and a centerness prediction task. Following these strategic improvements, we conduct extensive experiments on the COCO and DIOR datasets, demonstrating the method's superior capabilities in 2D object detection and remote sensing image detection, even under challenging conditions.
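For context on the centerness branch discussed above, the standard FCOS centerness target is computed from a location's distances to the four sides of its ground-truth box; the snippet below is a plain NumPy sketch of that well-known formula (the paper's interactive weighting itself is not reproduced).

import numpy as np

def fcos_centerness(l, t, r, b):
    """Standard FCOS centerness target for a location with distances
    l, t, r, b to the left/top/right/bottom sides of its GT box."""
    return np.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

# A point at the box center scores 1.0; points near an edge approach 0.
print(fcos_centerness(50, 30, 50, 30))  # 1.0
print(fcos_centerness(5, 30, 95, 30))   # ~0.23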

25 pages, 9100 KiB  
Article
Spectral–Spatial Graph Convolutional Network with Dynamic-Synchronized Multiscale Features for Few-Shot Hyperspectral Image Classification
by Shuai Liu, Hongfei Li, Chengji Jiang and Jie Feng
Remote Sens. 2024, 16(5), 895; https://doi.org/10.3390/rs16050895 - 02 Mar 2024
Viewed by 917
Abstract
Classifiers based on the convolutional neural network (CNN) and graph convolutional network (GCN) have demonstrated their effectiveness in hyperspectral image (HSI) classification. However, their performance is limited by the high time complexity of CNNs, the spatial complexity of GCNs, and insufficient labeled samples. To ease these limitations, a spectral–spatial graph convolutional network with dynamic-synchronized multiscale features is proposed for few-shot HSI classification. Firstly, multiscale patches are generated to enrich the training samples in the feature space, and a weighted spectral optimization module is explored to evaluate the discriminative information among the different bands of the patches. Then, an adaptive dynamic graph convolutional module is proposed to extract local and long-range spatial–spectral features of the patches at each scale. Considering that features of different scales can be regarded as sequential data due to their intrinsic correlations, a bidirectional LSTM is adopted to synchronously extract the spectral–spatial characteristics from all scales. Finally, auxiliary classifiers are utilized to predict the labels of samples at each scale and enhance training stability, and label smoothing is introduced into the classification loss to reduce the influence of misclassified samples and class imbalance. Extensive experiments demonstrate the superiority of the proposed method over other state-of-the-art methods, obtaining overall accuracies of 87.25%, 92.72%, and 93.36% on the Indian Pines, Pavia University, and Salinas datasets, respectively.
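Label smoothing, mentioned in the classification loss above, mixes the one-hot target with a uniform distribution over the classes. A minimal NumPy sketch follows; the smoothing factor eps = 0.1 is a common default, not necessarily the paper's value.

import numpy as np

def label_smoothing_ce(logits, label, eps=0.1):
    """Cross-entropy against a smoothed target: (1 - eps) on the true
    class, eps spread uniformly over all K classes."""
    K = logits.shape[-1]
    z = logits - logits.max()              # numerical stability
    log_p = z - np.log(np.exp(z).sum())    # log-softmax
    target = np.full(K, eps / K)
    target[label] += 1.0 - eps
    return -(target * log_p).sum()

print(label_smoothing_ce(np.array([2.0, 0.5, -1.0]), label=0))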

23 pages, 22134 KiB  
Article
Multiobjective Evolutionary Superpixel Segmentation for PolSAR Image Classification
by Boce Chu, Mengxuan Zhang, Kun Ma, Long Liu, Junwei Wan, Jinyong Chen, Jie Chen and Hongcheng Zeng
Remote Sens. 2024, 16(5), 854; https://doi.org/10.3390/rs16050854 - 29 Feb 2024
Viewed by 476
Abstract
Superpixel segmentation has been widely used in the field of computer vision, and the generation of PolSAR superpixels has likewise been widely studied for its feasibility and high efficiency. However, the initial number of PolSAR superpixels is usually set manually by experience, which has a significant impact on the final performance of superpixel segmentation and of the subsequent interpretation tasks. Additionally, the effective information of PolSAR superpixels is not fully analyzed and utilized in the generation process. To address these issues, a multiobjective evolutionary superpixel segmentation method for PolSAR image classification is proposed in this study. It contains two layers: an automatic optimization layer and a fine segmentation layer. By simultaneously considering the similarity information within superpixels and the difference information among superpixels, the automatic optimization layer determines a suitable number of superpixels automatically through multiobjective optimization of the PolSAR superpixel segmentation. Considering the difficulty of finding accurate boundaries of complex ground objects in PolSAR images, the fine segmentation layer further improves superpixel quality by fully using the boundary information of good-quality superpixels during the evolution process. Experiments on different PolSAR image datasets validate that the proposed approach can automatically generate high-quality superpixels without any prior information.

24 pages, 4133 KiB  
Article
OII: An Orientation Information Integrating Network for Oriented Object Detection in Remote Sensing Images
by Yangfeixiao Liu and Wanshou Jiang
Remote Sens. 2024, 16(5), 731; https://doi.org/10.3390/rs16050731 - 20 Feb 2024
Viewed by 647
Abstract
Oriented object detection in remote sensing images poses formidable challenges due to arbitrary orientations, diverse scales, and densely distributed targets. Current investigations in remote sensing object detection have primarily focused on improving the representation of oriented bounding boxes, yet have neglected the significant orientation information of targets in remote sensing contexts. Recent investigations point out that including and fusing orientation information yields substantial benefits in training an accurate oriented object detection system. In this paper, we propose a simple but effective orientation information integrating (OII) network comprising two main parts: an orientation information highlighting (OIH) module and an orientation feature fusion (OFF) module. The OIH module extracts orientation features from those produced by the backbone by modeling the frequency information of the spatial features: given that the low-frequency components of an image capture its primary content while the high-frequency components carry its fine details and edges, the transformation from the spatial domain to the frequency domain can effectively emphasize the orientation information of images. Subsequently, the OFF module employs a combination of a CNN attention mechanism and self-attention to derive weights for the orientation features and the original features; these weights are used to adaptively enhance the original features, yielding integrated features enriched with orientation information. Since the original spatial attention weights are inherently limited in explicitly capturing orientation nuances, the introduced orientation weights serve as a pivotal tool to accentuate and delineate the orientation information of targets. Without unnecessary embellishments, our OII network achieves competitive detection accuracy on two prevalent remote sensing oriented object detection datasets: DOTA (80.82 mAP) and HRSC2016 (98.32 mAP).
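The premise of the OIH module, that the frequency domain separates primary content (low frequencies) from edges and fine detail (high frequencies), can be pictured with a plain FFT high-pass filter. The sketch below shows only this generic idea, not the learned module; the cutoff radius is an arbitrary assumption.

import numpy as np

def highpass(feature, cutoff=0.1):
    """Suppress the low frequencies of a 2D feature map, keeping the
    high-frequency components that carry edge/orientation cues."""
    H, W = feature.shape
    F = np.fft.fftshift(np.fft.fft2(feature))
    yy, xx = np.mgrid[:H, :W]
    dist = np.hypot(yy - H / 2, xx - W / 2)
    mask = dist > cutoff * min(H, W)   # zero out the center (low freq.)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))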

25 pages, 2880 KiB  
Article
Global and Multiscale Aggregate Network for Saliency Object Detection in Optical Remote Sensing Images
by Lina Huo, Jiayue Hou, Jie Feng, Wei Wang and Jinsheng Liu
Remote Sens. 2024, 16(4), 624; https://doi.org/10.3390/rs16040624 - 07 Feb 2024
Viewed by 730
Abstract
Salient object detection (SOD) has been widely applied to natural scene images. However, due to the marked differences between optical remote sensing images and natural scene images, directly applying natural-scene SOD to optical remote sensing images captures global context information poorly, which makes salient object detection in optical remote sensing images (ORSI-SOD) challenging. Optical remote sensing images usually exhibit large scale variations, yet the vast majority of networks are built on convolutional neural network (CNN) backbones such as VGG and ResNet, which can only extract local features. To address this problem, we designed a new model that employs a transformer-based backbone network capable of extracting global information and long-range dependencies, named the Global and Multiscale Aggregate Network for Saliency Object Detection in Optical Remote Sensing Images (GMANet). In this framework, the Pyramid Vision Transformer (PVT) serves as the encoder to capture long-range dependencies, and a multiscale attention module (MAM) is introduced to extract multiscale information. Meanwhile, a global guided branch (GGB), to which four MAMs are densely connected, is used to learn global context information and obtain the complete structure. An aggregate refinement module (ARM) is used to enrich the details of edges and low-level features; it fuses global context information with multilevel encoder features to complement the details while keeping the structure complete. Extensive experiments on two public datasets show that our proposed framework GMANet outperforms 28 state-of-the-art methods on six evaluation metrics, especially E-measure and F-measure, because we apply a coarse-to-fine strategy to merge global context information and multiscale information.

25 pages, 7566 KiB  
Article
Multi-Level Feature Extraction Networks for Hyperspectral Image Classification
by Shaoyi Fang, Xinyu Li, Shimao Tian, Weihao Chen and Erlei Zhang
Remote Sens. 2024, 16(3), 590; https://doi.org/10.3390/rs16030590 - 04 Feb 2024
Viewed by 921
Abstract
Hyperspectral image (HSI) classification plays a key role in earth observation missions. Recently, transformer-based approaches have been widely used for HSI classification due to their ability to model long-range sequences. However, these methods face two main challenges. First, they treat HSIs as linear vectors, disregarding their 3D attributes and spatial structure. Second, the repeated concatenation of encoders leads to information loss and gradient vanishing. To overcome these challenges, we propose a new solution called the multi-level feature extraction network (MLFEN). MLFEN consists of two sub-networks: a hybrid convolutional attention module (HCAM) and an enhanced dense vision transformer (EDVT). HCAM incorporates a band shift strategy to eliminate the edge effect of convolution and utilizes hybrid convolutional blocks to capture the 3D properties and spatial structure of HSIs; an attention module is additionally introduced to identify strongly discriminative features. EDVT reorganizes the original encoders by incorporating dense connections and adaptive feature fusion components, enabling faster propagation of information and mitigating gradient vanishing. Furthermore, we propose a novel sparse loss function to better fit the data distribution. Extensive experiments conducted on three public datasets demonstrate the significant advancements achieved by MLFEN.

23 pages, 9145 KiB  
Article
A Multi-Feature Fusion-Based Method for Crater Extraction of Airport Runways in Remote-Sensing Images
by Yalun Zhao, Derong Chen and Jiulu Gong
Remote Sens. 2024, 16(3), 573; https://doi.org/10.3390/rs16030573 - 02 Feb 2024
Viewed by 630
Abstract
Due to the complex backgrounds of airports and damaged areas of runways, existing runway extraction methods do not perform well. Furthermore, although accurate crater extraction from airport runways plays a vital role in the military field, there are few related studies on this topic. To solve these problems, this paper proposes an effective method for runway crater extraction that consists of two stages: airport runway extraction and runway crater extraction. In the first stage, we apply corner detection and screening strategies based on multiple features of the runway, such as its high brightness, regional texture similarity, and shape, to improve the completeness of runway extraction; the proposed method can automatically achieve complete extraction of runways with different degrees of damage. In the second stage, the craters of the runway are extracted by calculating the edge gradient amplitude and the standard deviation of the grayscale distribution of candidate areas within the runway extraction results. On four typical remote-sensing images and four post-damage remote-sensing images, the average integrity of the runway extraction reaches more than 90%. Comparative experiments show that both the extraction quality and the running speed of our method exceed those of state-of-the-art methods. The final crater extraction experiments show that the proposed method can effectively extract craters from airport runways, with both precision and recall above 80%. Overall, this research is of great significance for the damage assessment of airport runways from remote-sensing images in the military field.
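The two cues used in the second stage are straightforward to compute for a candidate area. A hedged NumPy sketch follows; the thresholds are placeholders, not the paper's calibrated values.

import numpy as np

def is_crater_candidate(region, grad_thresh=20.0, std_thresh=15.0):
    """Score a candidate area inside the extracted runway by its mean
    edge gradient amplitude and grayscale standard deviation."""
    gy, gx = np.gradient(region.astype(np.float64))
    grad_amplitude = np.hypot(gx, gy).mean()   # mean edge gradient amplitude
    gray_std = region.std()                    # grayscale distribution spread
    return grad_amplitude > grad_thresh and gray_std > std_thresh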

22 pages, 6084 KiB  
Article
TreeDetector: Using Deep Learning for the Localization and Reconstruction of Urban Trees from High-Resolution Remote Sensing Images
by Haoyu Gong, Qian Sun, Chenrong Fang, Le Sun and Ran Su
Remote Sens. 2024, 16(3), 524; https://doi.org/10.3390/rs16030524 - 30 Jan 2024
Viewed by 1088
Abstract
There have been considerable efforts to generate tree crown maps from satellite images; however, tree localization in urban environments using satellite imagery remains a challenging task. One of the difficulties in complex urban tree detection lies in the segmentation of dense tree crowns, where methods based on semantic segmentation algorithms have recently made significant progress. We propose to split the tree localization problem into two parts, dense clusters and single trees, and to combine a target detection method with a procedural generation method based on planting rules, which improves the accuracy of single tree detection. Specifically, we propose a two-stage urban tree localization pipeline that leverages deep learning and planting strategy algorithms along with region discrimination methods. This approach ensures the precise localization of individual trees while also facilitating distribution inference within dense tree canopies. Additionally, our method estimates the radius and height of trees, which provides significant advantages for three-dimensional reconstruction from remote sensing images. We compare our results with existing methods, achieving 82.3% accuracy in individual tree localization. The method can be seamlessly integrated with the three-dimensional reconstruction of urban trees; we visualize the reconstructions it generates, which capture the diversity of tree heights and provide a more realistic solution for tree distribution generation.

18 pages, 4197 KiB  
Article
City Scale Traffic Monitoring Using WorldView Satellite Imagery and Deep Learning: A Case Study of Barcelona
by Annalisa Sheehan, Andrew Beddows, David C. Green and Sean Beevers
Remote Sens. 2023, 15(24), 5709; https://doi.org/10.3390/rs15245709 - 13 Dec 2023
Viewed by 1477
Abstract
Accurate traffic data are crucial for a range of applications, such as quantifying vehicle emissions and transportation planning and management. However, the availability of traffic data is geographically fragmented, and the data are rarely held in an accessible form; there is therefore an urgent need for a common approach to developing large urban traffic data sets. Utilising satellite data to estimate traffic offers a cost-effective and standardized alternative to ground-based traffic monitoring. This study used high-resolution satellite imagery (WorldView-2 and 3) and deep learning (DL) to identify vehicles, road by road, in Barcelona (2017–2019). The You Only Look Once (YOLOv3) object detection model was trained, and model accuracy was investigated via parameters such as training-data-set-specific anchor boxes, network resolution, image colour band composition, and input image size. The best-performing vehicle detection model configuration had a precision (proportion of positive detections that were correct) of 0.69 and a recall (proportion of objects in the image correctly identified) of 0.79. We demonstrated that high-resolution satellite imagery and object detection models can be used to identify vehicles at a city scale, although the approach highlights challenges in identifying vehicles on narrow roads, in shadow, under vegetation, and obstructed by buildings. This is the first time that DL has been used to identify vehicles at a city scale, demonstrating the possibility of applying these methods to cities globally, where data are often unavailable.
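The training-data-set-specific anchor boxes referred to above are conventionally obtained with IoU-based k-means over the labelled box sizes, the procedure introduced with YOLOv2 and carried into YOLOv3. Below is a minimal sketch of that standard step; k = 9 and the iteration count are typical defaults, not necessarily the study's settings.

import numpy as np

def iou_wh(wh, anchors):
    """IoU between one (w, h) box and each anchor, all centred at the origin."""
    inter = np.minimum(wh[0], anchors[:, 0]) * np.minimum(wh[1], anchors[:, 1])
    union = wh[0] * wh[1] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes_wh, k=9, iters=100, seed=0):
    """Standard YOLO anchor clustering with distance d = 1 - IoU."""
    rng = np.random.default_rng(seed)
    anchors = boxes_wh[rng.choice(len(boxes_wh), k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # Assign each box to the anchor with the highest IoU (lowest 1 - IoU).
        assign = np.array([np.argmax(iou_wh(wh, anchors)) for wh in boxes_wh])
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes_wh[assign == j].mean(axis=0)
    return anchors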

19 pages, 14479 KiB  
Article
FCOSR: A Simple Anchor-Free Rotated Detector for Aerial Object Detection
by Zhonghua Li, Biao Hou, Zitong Wu, Bo Ren and Chen Yang
Remote Sens. 2023, 15(23), 5499; https://doi.org/10.3390/rs15235499 - 25 Nov 2023
Cited by 11 | Viewed by 1163
Abstract
Although existing anchor-based oriented object detection methods have achieved remarkable results, they require manually preset boxes, which introduce additional hyper-parameters and computations. These methods often use more complex architectures for better performance, which makes them difficult to deploy on computationally constrained embedded platforms such as satellites and unmanned aerial vehicles. We aim to design a high-performance algorithm for aerial image detection that is simple, fast, and easy to deploy. In this article, we propose FCOSR, a one-stage anchor-free rotated object detector that can be deployed on most platforms and uses a well-defined label assignment strategy tailored to the characteristics of aerial image objects. We use an ellipse center sampling method to define a suitable sampling region for an oriented bounding box (OBB), and a fuzzy sample assignment strategy provides reasonable labels for overlapping objects. To solve the problem of insufficient sampling, we designed a multi-level sampling module. Together, these strategies allocate more appropriate labels to training samples. Our algorithm achieves a mean average precision (mAP) of 79.25, 75.41, and 90.13 on the DOTA-v1.0, DOTA-v1.5, and HRSC2016 datasets, respectively. FCOSR outperforms other methods in single-scale evaluation, with the small model achieving an mAP of 74.05 at 23.7 FPS on an RTX 2080 Ti GPU. When converted to the TensorRT format, the lightweight FCOSR model achieves an mAP of 73.93 on DOTA-v1.0 at 17.76 FPS on a Jetson AGX Xavier device at a single scale.
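The ellipse center sampling rule can be pictured as keeping only locations that fall inside a shrunken ellipse inscribed in the oriented box. A minimal sketch follows; the shrink factor is our assumption, not the paper's value.

import numpy as np

def in_sampling_ellipse(px, py, cx, cy, w, h, theta, shrink=0.5):
    """Test whether point (px, py) lies inside the ellipse inscribed in
    an OBB (center cx, cy; size w, h; rotation theta in radians),
    shrunk towards the center to keep only high-quality positives."""
    dx, dy = px - cx, py - cy
    # Rotate the point into the box's local frame.
    x = dx * np.cos(theta) + dy * np.sin(theta)
    y = -dx * np.sin(theta) + dy * np.cos(theta)
    a, b = shrink * w / 2.0, shrink * h / 2.0   # ellipse semi-axes
    return (x / a) ** 2 + (y / b) ** 2 <= 1.0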

23 pages, 2929 KiB  
Article
Misaligned RGB-Infrared Object Detection via Adaptive Dual-Discrepancy Calibration
by Mingzhou He, Qingbo Wu, King Ngi Ngan, Feng Jiang, Fanman Meng and Linfeng Xu
Remote Sens. 2023, 15(19), 4887; https://doi.org/10.3390/rs15194887 - 09 Oct 2023
Viewed by 1391
Abstract
Object detection based on RGB and infrared images has emerged as a crucial research area in computer vision, and the synergy of RGB and infrared ensures the robustness of object-detection algorithms under varying lighting conditions. However, captured RGB-IR image pairs typically exhibit spatial misalignment due to sensor discrepancies, leading to compromised localization performance. Furthermore, since the distributions of deep features from the two modalities are inconsistent, directly fusing multi-modal features weakens the feature difference between the object and the background, thereby interfering with RGB-infrared object-detection performance. To address these issues, we propose an adaptive dual-discrepancy calibration network (ADCNet) for misaligned RGB-infrared object detection, comprising spatial-discrepancy and domain-discrepancy calibration. Specifically, the spatial discrepancy calibration module conducts an adaptive affine transformation to achieve spatial alignment of the features. Then, the domain-discrepancy calibration module separately aligns the object and background features from the different modalities, making the object and background distributions of the fused features easier to distinguish and therefore enhancing the effectiveness of RGB-infrared object detection. Our ADCNet outperforms the baseline by 3.3% and 2.5% in mAP50 on the FLIR and misaligned M3FD datasets, respectively. Experimental results demonstrate the superiority of our proposed method over state-of-the-art approaches.
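The spatial discrepancy calibration can be pictured as regressing an affine transform from the paired features and resampling the infrared features onto the RGB grid. The PyTorch sketch below illustrates only this mechanism; the head architecture and initialization are invented for the example, not ADCNet's design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineAlign(nn.Module):
    """Predict a 2x3 affine matrix from the concatenated RGB/IR features
    and warp the IR features into spatial alignment with the RGB ones."""
    def __init__(self, channels):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 6),
        )
        # Start from the identity transform so training begins unwarped.
        self.head[-1].weight.data.zero_()
        self.head[-1].bias.data.copy_(
            torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, f_rgb, f_ir):
        theta = self.head(torch.cat([f_rgb, f_ir], dim=1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, f_ir.shape, align_corners=False)
        return F.grid_sample(f_ir, grid, align_corners=False)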

19 pages, 4284 KiB  
Article
AOGC: Anchor-Free Oriented Object Detection Based on Gaussian Centerness
by Zechen Wang, Chun Bao, Jie Cao and Qun Hao
Remote Sens. 2023, 15(19), 4690; https://doi.org/10.3390/rs15194690 - 25 Sep 2023
Cited by 2 | Viewed by 918
Abstract
Oriented object detection is a challenging task in scene text detection and remote sensing image analysis, and it has attracted extensive attention with the development of deep learning in recent years. Currently, mainstream oriented object detectors are anchor-based methods, which increase the computational load of the network and produce a large amount of anchor box redundancy. To address this issue, we propose an anchor-free oriented object detection method based on Gaussian centerness (AOGC), a single-stage anchor-free detection method. Our method uses a contextual attention FPN (CAFPN) to obtain the contextual information of targets. We then designed a label assignment method for oriented objects that selects positive samples of higher quality and is suited to targets with large aspect ratios. Finally, we developed a Gaussian kernel-based centerness branch that can effectively determine the significance of different anchors. AOGC achieves an mAP of 74.30% on the DOTA-1.0 dataset and 89.80% on the HRSC2016 dataset. Our experimental results show that AOGC outperforms other single-stage oriented object detection methods and achieves performance similar to that of two-stage methods.
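One plausible reading of a Gaussian kernel-based centerness is a 2D Gaussian in the box's local frame whose per-axis widths follow the box's width and height, so that elongated targets keep usable positives along their long axis. A hedged NumPy sketch; the sigma choice is our assumption, not AOGC's exact parameterization.

import numpy as np

def gaussian_centerness(px, py, cx, cy, w, h, theta):
    """Weight a location by a 2D Gaussian in the OBB's local frame,
    with per-axis sigmas proportional to the box's width and height."""
    dx, dy = px - cx, py - cy
    x = dx * np.cos(theta) + dy * np.sin(theta)    # rotate into box frame
    y = -dx * np.sin(theta) + dy * np.cos(theta)
    sx, sy = w / 4.0, h / 4.0                      # assumed sigma choice
    return np.exp(-0.5 * ((x / sx) ** 2 + (y / sy) ** 2))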

20 pages, 185958 KiB  
Article
High-Resolution Network with Transformer Embedding Parallel Detection for Small Object Detection in Optical Remote Sensing Images
by Xiaowen Zhang, Qiaoyuan Liu, Hongliang Chang and Haijiang Sun
Remote Sens. 2023, 15(18), 4497; https://doi.org/10.3390/rs15184497 - 13 Sep 2023
Cited by 1 | Viewed by 1056
Abstract
Small object detection in remote sensing enables the identification and analysis of inconspicuous but important information, playing a crucial role in various ground monitoring tasks. Because of their small size, such objects contain very limited feature information and are easily buried by complex backgrounds. Although many breakthroughs have been made in this research hotspot, existing approaches still have two significant shortcomings: first, the down-sampling operation commonly used for feature extraction can barely preserve the weak features of tiny objects; second, convolutional neural network methods are limited in modeling the global context needed to address cluttered backgrounds. To tackle these issues, a high-resolution network with transformer-embedded parallel detection (HRTP-Net) is proposed in this paper. A high-resolution feature fusion network (HR-FFN) is designed to solve the first problem by maintaining high-spatial-resolution features with enhanced semantic information. Furthermore, a Swin-transformer-based mixed attention module (STMA) is proposed to augment object information in the transformer block by establishing pixel-level correlations, thereby enabling global background–object modeling and addressing the second shortcoming. Finally, a parallel detection structure for remote sensing is constructed by integrating the attentional outputs of STMA with standard convolutional features. The proposed method effectively mitigates the impact of intricate backgrounds on small objects. Comprehensive experiments on three representative remote sensing datasets with small objects (MASATI, VEDAI, and DOTA) demonstrate that HRTP-Net achieves promising and competitive performance.

20 pages, 43092 KiB  
Article
RTV-SIFT: Harnessing Structure Information for Robust Optical and SAR Image Registration
by Siqi Pang, Junyao Ge, Lei Hu, Kaitai Guo, Yang Zheng, Changli Zheng, Wei Zhang and Jimin Liang
Remote Sens. 2023, 15(18), 4476; https://doi.org/10.3390/rs15184476 - 12 Sep 2023
Viewed by 906
Abstract
Registration of optical and synthetic aperture radar (SAR) images is challenging because it is difficult to extract identically located, distinctive features from both images. This paper proposes a novel optical and SAR image registration method based on relative total variation (RTV) and the scale-invariant feature transform (SIFT), named RTV-SIFT, which extracts feature points on the edges of structures and constructs structural edge descriptors to improve registration accuracy. First, a novel RTV-Harris feature point detection method, combining RTV with the multiscale Harris algorithm, is proposed to extract feature points on the significant structures of both images, ensuring a high repetition rate of the feature points. Second, the feature point descriptors are constructed on an enhanced phase congruency edge (EPCE) map, which combines the Sobel operator with the maximum moment of phase congruency (PC) to extract edges from structured images, enhancing robustness to nonlinear intensity differences and speckle noise. Finally, after coarse registration, the position and orientation Euclidean distance (POED) between feature points is utilized to achieve fine feature point matching and improve the registration accuracy. The experimental results demonstrate the superiority of the proposed RTV-SIFT method across different scenes and image capture conditions, indicating its robustness and effectiveness in optical and SAR image registration.
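One plausible reading of the POED is a joint Euclidean distance over keypoint positions and dominant orientations. The sketch below follows that reading; the balance weight lam is a hypothetical parameter, not the paper's definition.

import numpy as np

def poed(p1, theta1, p2, theta2, lam=1.0):
    """Joint position/orientation distance between two feature points.
    p1, p2: (x, y) coordinates; theta1, theta2: orientations in radians;
    lam: assumed weight balancing the two terms."""
    d_pos = np.hypot(p1[0] - p2[0], p1[1] - p2[1])
    # Wrap the orientation difference into [-pi, pi] before measuring it.
    d_ori = np.abs(np.arctan2(np.sin(theta1 - theta2), np.cos(theta1 - theta2)))
    return np.sqrt(d_pos ** 2 + lam * d_ori ** 2)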
