Topic Editors

Prof. Dr. Moulay A. Akhloufi
Perception, Robotics, and Intelligent Machines Research Group (PRIME), Department of Computer Science, Université de Moncton, Moncton, NB E1A 3E9, Canada

Dr. Mozhdeh Shahbazi
Department of Geomatics Engineering, University of Calgary, 2500 University Dr. NW, Calgary, AB T2N 1N4, Canada

Deep Learning and Transformers’ Methods Applied to Remotely Captured Data

Abstract submission deadline
closed (1 July 2023)
Manuscript submission deadline
closed (30 September 2023)
Viewed by 26962

Topic Information

Dear Colleagues,

The areas of machine learning and deep learning have experienced impressive progress in recent years. This progress has mainly been driven by the availability of high processing performance at an affordable cost and a large quantity of data. Most state-of-the-art techniques today are based on deep neural networks or the more recently proposed transformers. This progress has sparked innovations in technologies, algorithms, and approaches and led to results that were unachievable until recently. Among the various research areas that have been significantly impacted by this progress is the processing of remotely captured data such as airborne and spaceborne passive and active imagery, underwater imagery, mobile mapping data, etc. This Topic aims to gather cutting-edge contributions from researchers using deep learning and transformers for remote sensing and for processing remotely captured data. Contributions are accepted in different areas of application, including but not limited to environmental studies, precision agriculture, forestry, disaster response, building information modeling, infrastructure inspection, defense and security, benchmarking, and open-access datasets for remote sensing. Studies using active or passive sensors from satellites, airborne platforms, drones, and underwater and terrestrial vehicles are welcome. Contributions can be submitted in various forms, such as research papers, review papers, datasets, and comparative analyses.

Prof. Dr. Moulay A. Akhloufi
Dr. Mozhdeh Shahbazi
Topic Editors

Keywords

  • multispectral, hyperspectral remote sensing, photogrammetry
  • LiDAR, UAV, sensors
  • underwater drones
  • mobile robots
  • forest monitoring, forest fires, precision agriculture, environmental monitoring, natural risks
  • defense and security
  • machine learning, deep learning, data fusion, image processing
  • space sensing and exploration
  • remote sensing datasets
  • navigation
  • 3-D mapping and modelling

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
AI (ai) | - | - | 2020 | 20.8 Days | CHF 1600
Applied Sciences (applsci) | 2.7 | 4.5 | 2011 | 16.9 Days | CHF 2400
Big Data and Cognitive Computing (BDCC) | 3.7 | 4.9 | 2017 | 18.2 Days | CHF 1800
Remote Sensing (remotesensing) | 5.0 | 7.9 | 2009 | 23 Days | CHF 2700
Sensors (sensors) | 3.9 | 6.8 | 2001 | 17 Days | CHF 2600

Preprints.org is a multidisciplinary platform providing preprint services, dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of this by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being scooped with a time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (17 papers)

18 pages, 5695 KiB  
Article
Machine-to-Machine Visual Dialoguing with ChatGPT for Enriched Textual Image Description
by Riccardo Ricci, Yakoub Bazi and Farid Melgani
Remote Sens. 2024, 16(3), 441; https://doi.org/10.3390/rs16030441 - 23 Jan 2024
Viewed by 834
Abstract
Image captioning is a technique that enables the automatic extraction of natural language descriptions of the contents of an image. On the one hand, information in the form of natural language can enhance accessibility by reducing the expertise required to process, analyze, and exploit remote sensing images, while on the other, it provides a direct and general form of communication. However, image captioning is usually restricted to a single sentence, which barely describes the rich semantic information that typically characterizes remote sensing (RS) images. In this paper, we aim to move one step forward by proposing a captioning system that, mimicking human behavior, adopts dialogue as a tool to explore and dig for information, leading to more detailed and comprehensive descriptions of RS scenes. The system relies on a question–answer scheme fed by a query image and summarizes the dialogue content with ChatGPT. Experiments carried out on two benchmark remote sensing datasets confirm the potential of such an approach in the context of semantic information mining. Strengths and weaknesses are highlighted and discussed, as well as some possible future developments. Full article
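To make the dialogue scheme concrete, here is a minimal sketch of a machine-to-machine question–answer loop; `vqa_answer` and `chat` are hypothetical placeholders for a visual question answering model and a ChatGPT-style text API, not the authors' implementation.

```python
# Minimal sketch of a machine-to-machine visual dialogue loop (illustrative
# only; `vqa_answer` and `chat` are hypothetical placeholder callables).

def enrich_description(image, vqa_answer, chat, n_rounds=5):
    dialogue = []
    question = "Describe the remote sensing scene in one sentence."
    for _ in range(n_rounds):
        answer = vqa_answer(image, question)      # vision model answers
        dialogue.append((question, answer))
        # The language model asks a follow-up grounded in the dialogue so far.
        question = chat(
            "Given this dialogue about an aerial image, ask one new question "
            f"to uncover more detail:\n{dialogue}"
        )
    # Summarize the whole exchange into a single rich caption.
    return chat(f"Summarize this dialogue as a detailed image description:\n{dialogue}")
```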

23 pages, 2966 KiB  
Article
DVST: Deformable Voxel Set Transformer for 3D Object Detection from Point Clouds
by Yaqian Ning, Jie Cao, Chun Bao and Qun Hao
Remote Sens. 2023, 15(23), 5612; https://doi.org/10.3390/rs15235612 - 03 Dec 2023
Viewed by 1351
Abstract
The use of a transformer backbone in LiDAR point-cloud-based models for 3D object detection has recently gained significant interest. The larger receptive field of the transformer backbone improves its representation capability but also results in excessive attention being given to background regions. To solve this problem, we propose a novel approach called deformable voxel set attention, which we utilized to create a deformable voxel set transformer (DVST) backbone for 3D object detection from point clouds. The DVST aims to efficaciously integrate the flexible receptive field of the deformable mechanism and the powerful context modeling capability of the transformer. Specifically, we introduce the deformable mechanism into voxel-based set attention to selectively transfer candidate keys and values of foreground queries to important regions. An offset generation module was designed to learn the offsets of the foreground queries. Furthermore, a globally responsive convolutional feed-forward network with residual connection is presented to capture global feature interactions in hidden space. We verified the validity of the DVST on the KITTI and Waymo open datasets by constructing single-stage and two-stage models. The findings indicated that the DVST enhanced the average precision of the baseline model while preserving computational efficiency, achieving a performance comparable to state-of-the-art methods. Full article
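The core idea of steering attention toward foreground regions can be sketched as follows; this is a simplified deformable attention over a 2D bird's-eye-view feature map under assumed shapes, not the authors' exact DVST module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSetAttention(nn.Module):
    """Simplified deformable attention over a 2D (BEV) feature map: foreground
    queries predict offsets that steer where keys/values are sampled."""

    def __init__(self, dim, n_points=4):
        super().__init__()
        self.n_points = n_points
        self.offset_net = nn.Linear(dim, 2 * n_points)   # offset generation module
        self.attn_logits = nn.Linear(dim, n_points)
        self.value_proj = nn.Conv2d(dim, dim, 1)

    def forward(self, queries, ref_xy, feat):
        # queries: (B, N, C); ref_xy: (B, N, 2) in [-1, 1]; feat: (B, C, H, W)
        B, N, C = queries.shape
        offsets = self.offset_net(queries).view(B, N, self.n_points, 2).tanh() * 0.1
        loc = (ref_xy.unsqueeze(2) + offsets).clamp(-1, 1)       # (B, N, P, 2)
        v = F.grid_sample(self.value_proj(feat), loc, align_corners=False)
        v = v.permute(0, 2, 3, 1)                                # (B, N, P, C)
        w = self.attn_logits(queries).softmax(-1).unsqueeze(-1)  # (B, N, P, 1)
        return (w * v).sum(2)                                    # (B, N, C)
```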

17 pages, 168716 KiB  
Technical Note
Modeling the Global Relationship via the Point Cloud Transformer for the Terrain Filtering of Airborne LiDAR Data
by Libo Cheng, Rui Hao, Zhibo Cheng, Taifeng Li, Tengxiao Wang, Wenlong Lu, Yulin Ding and Han Hu
Remote Sens. 2023, 15(23), 5434; https://doi.org/10.3390/rs15235434 - 21 Nov 2023
Cited by 1 | Viewed by 747
Abstract
Due to the irregularity and complexity of ground and non-ground objects, filtering non-ground data from airborne LiDAR point clouds to create Digital Elevation Models (DEMs) remains a longstanding and unresolved challenge. Recent advancements in deep learning have offered effective solutions for understanding three-dimensional semantic scenes. However, existing studies lack the capability to model global semantic relationships and fail to integrate global and local semantic information effectively, both of which are crucial for the ground filtering of point cloud data, especially for larger objects. This study focuses on ground filtering challenges in large scenes and introduces an elevation offset-attention (E-OA) module, which considers global semantic features and integrates them into existing network frameworks. The performance of this module was validated on three classic benchmark models (RandLA-Net, point transformer, and PointMeta-L) and compared with two traditional filtering methods and the advanced CDFormer model. Additionally, the E-OA module was compared with three state-of-the-art attention frameworks. Experiments were conducted on two distinct data sources. The results show that our proposed E-OA module improves the filtering performance of all three benchmark models across both data sources, with a maximum improvement of 6.15%. With the E-OA module, model performance consistently exceeded that of the traditional methods and all competing attention frameworks. The proposed E-OA module can serve as a plug-and-play component, compatible with existing networks featuring local feature extraction capabilities. Full article
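As a rough illustration, an offset-attention block extended with a per-point elevation cue might look like the sketch below (shapes and design details are assumptions; the paper's E-OA module may differ).

```python
import torch
import torch.nn as nn

class ElevationOffsetAttention(nn.Module):
    """Offset-attention (in the style of the Point Cloud Transformer) with a
    per-point elevation cue appended to the features; an illustrative sketch."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim + 1, dim // 4, bias=False)
        self.k = nn.Linear(dim + 1, dim // 4, bias=False)
        self.v = nn.Linear(dim + 1, dim)
        self.out = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x, z):
        # x: (B, N, C) point features; z: (B, N, 1) normalized elevation.
        xz = torch.cat([x, z], dim=-1)
        attn = torch.softmax(self.q(xz) @ self.k(xz).transpose(1, 2), dim=-1)
        agg = attn @ self.v(xz)           # globally aggregated context per point
        return x + self.out(x - agg)      # offset-attention: residual of (x - agg)
```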

18 pages, 5263 KiB  
Article
Super Resolution of Satellite-Derived Sea Surface Temperature Using a Transformer-Based Model
by Runtai Zou, Li Wei and Lei Guan
Remote Sens. 2023, 15(22), 5376; https://doi.org/10.3390/rs15225376 - 16 Nov 2023
Viewed by 858
Abstract
Sea surface temperature (SST) is one of the most important factors related to the ocean and the climate. In studying the domains of eddies, fronts, and current systems, high-resolution SST data are required. However, the passive microwave radiometer achieves higher spatial coverage but lower resolution, while the thermal infrared radiometer has lower spatial coverage but higher resolution. In this paper, in order to improve the performance of super-resolution SST images derived from microwave SST data, we propose a transformer-based SST reconstruction model comprising a transformer block and a residual block, rather than a purely convolutional approach. The outputs of the transformer model are compared with those of three other deep learning super-resolution models; the transformer model obtains lower root-mean-squared error (RMSE), mean bias (Bias), and robust standard deviation (RSD) values than the other three models, as well as higher entropy and definition, making it the best-performing of all the models compared. Full article
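The reported metrics are straightforward to compute; the sketch below assumes the robust standard deviation (RSD) is the scaled median absolute deviation, which the abstract does not specify.

```python
import numpy as np

def sst_sr_metrics(pred, ref):
    """RMSE, mean bias, and a robust standard deviation for SST fields.
    The RSD definition (1.4826 * median absolute deviation) is an assumption."""
    diff = (pred - ref).ravel()
    diff = diff[np.isfinite(diff)]                # ignore land/cloud gaps
    rmse = np.sqrt(np.mean(diff ** 2))
    bias = np.mean(diff)
    rsd = 1.4826 * np.median(np.abs(diff - np.median(diff)))
    return rmse, bias, rsd
```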

16 pages, 2931 KiB  
Technical Note
A Seabed Terrain Feature Extraction Transformer for the Super-Resolution of the Digital Bathymetric Model
by Wuxu Cai, Yanxiong Liu, Yilan Chen, Zhipeng Dong, Hanxiao Yuan and Ningning Li
Remote Sens. 2023, 15(20), 4906; https://doi.org/10.3390/rs15204906 - 11 Oct 2023
Cited by 1 | Viewed by 999
Abstract
The acquisition of high-resolution (HR) digital bathymetric models (DBMs) is crucial for oceanic research activities. However, obtaining HR DBM data is challenging, and existing interpolation methods suffer from low precision, which limits their practicality; this has motivated the use of super-resolution (SR) methods to improve DBM resolution. To address this issue, we propose a seabed terrain feature extraction transformer model that combines a seabed terrain feature extraction module with an efficient transformer module, focusing on the terrain characteristics of DBMs. By taking advantage of these two modules, we improved the efficient extraction of seabed terrain features both locally and globally, and as a result, we obtained a highly accurate SR reconstruction of DBM data within the study area, including the Mariana Trench in the Pacific Ocean and the adjacent sea. A comparative analysis with bicubic interpolation, SRCNN, SRGAN, and SRResNet shows that the proposed method decreases the root mean square error (RMSE) by 16%, 10%, 13%, and 12%, respectively. These experimental results confirm the high accuracy of the proposed method in terms of reconstructing HR DBMs. Full article
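For reference, the bicubic interpolation baseline used in the comparison can be reproduced in a few lines (the scale factor here is illustrative):

```python
import torch
import torch.nn.functional as F

def bicubic_baseline(dbm_lr, scale=4):
    """Bicubic upsampling of a low-resolution DBM grid, the simplest baseline
    against which learned super-resolution models are compared."""
    x = torch.as_tensor(dbm_lr, dtype=torch.float32)[None, None]   # (1, 1, H, W)
    hr = F.interpolate(x, scale_factor=scale, mode="bicubic", align_corners=False)
    return hr[0, 0].numpy()
```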

20 pages, 4702 KiB  
Article
Multi-Attention Multi-Image Super-Resolution Transformer (MAST) for Remote Sensing
by Jiaao Li, Qunbo Lv, Wenjian Zhang, Baoyu Zhu, Guiyu Zhang and Zheng Tan
Remote Sens. 2023, 15(17), 4183; https://doi.org/10.3390/rs15174183 - 25 Aug 2023
Cited by 1 | Viewed by 2311
Abstract
Deep-learning-driven multi-image super-resolution (MISR) reconstruction techniques have significant application value in the field of aerospace remote sensing. In particular, Transformer-based models have shown outstanding performance in super-resolution tasks. However, current MISR models have some deficiencies in the application of multi-scale information and the modeling of the attention mechanism, leading to an insufficient utilization of complementary information in multiple images. In this context, we innovatively propose a Multi-Attention Multi-Image Super-Resolution Transformer (MAST), which involves improvements in two main aspects. Firstly, we present a Multi-Scale and Mixed Attention Block (MMAB). With its multi-scale structure, the network is able to extract image features from different scales to obtain more contextual information. Additionally, the introduction of mixed attention allows the network to fully explore high-frequency features of the images in both channel and spatial dimensions. Secondly, we propose a Collaborative Attention Fusion Block (CAFB). By incorporating channel attention into the self-attention layer of the Transformer, we aim to better establish global correlations between multiple images. To improve the network’s perception ability of local detailed features, we introduce a Residual Local Attention Block (RLAB). With the aforementioned improvements, our model can better extract and utilize non-redundant information, achieving a superior restoration effect that balances the global structure and local details of the image. The results from the comparative experiments reveal that our approach demonstrated a notable enhancement in cPSNR, with improvements of 0.91 dB and 0.81 dB observed in the NIR and RED bands of the PROBA-V dataset, respectively, in comparison to the existing state-of-the-art methods. Extensive experiments demonstrate that the method proposed in this paper can provide a valuable reference for solving multi-image super-resolution tasks for remote sensing. Full article
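A generic channel-plus-spatial ("mixed") attention block in the spirit of the MMAB can be sketched as follows; this is a standard CBAM-style layout, not the authors' exact module.

```python
import torch
import torch.nn as nn

class MixedAttention(nn.Module):
    """Channel attention followed by spatial attention over a feature map."""

    def __init__(self, dim, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(),
            nn.Conv2d(dim // reduction, dim, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):                                      # x: (B, C, H, W)
        x = x * self.channel(x)                                # channel attention
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)        # avg + max maps
        return x * self.spatial(s)                             # spatial attention
```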

18 pages, 19796 KiB  
Article
Daytime Sea Fog Identification Based on Multi-Satellite Information and the ECA-TransUnet Model
by He Lu, Yi Ma, Shichao Zhang, Xiang Yu and Jiahua Zhang
Remote Sens. 2023, 15(16), 3949; https://doi.org/10.3390/rs15163949 - 09 Aug 2023
Viewed by 1597
Abstract
Sea fog is a weather hazard along the coast and over the ocean that seriously threatens maritime activities. In the deep learning approach, it is difficult for convolutional neural networks (CNNs) to fully consider global context information in sea fog research due to their own limitations, and the recognition of sea fog edges is relatively vague. To solve the above problems, this paper puts forward an ECA-TransUnet model for daytime sea fog recognition, which consists of a combination of a CNN and a transformer. By designing a two-branch feed-forward network (FFN) module and introducing an efficient channel attention (ECA) module, the model can effectively take into account long-range pixel interactions and feature channel information to capture the global contextual information of sea fog data. Meanwhile, to solve the problem of insufficient existing sea fog detection datasets, we investigated sea fog events occurring in the Yellow Sea and Bohai Sea and their territorial waters, extracted remote sensing images from Moderate Resolution Imaging Spectroradiometer (MODIS) data at corresponding times, and combined data from the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO), cloud and sea fog texture features, and waveband feature information to produce a manually annotated sea fog dataset. Our experiments showed that the proposed model achieves 94.5% accuracy and an 85.8% F1 score. Compared with the existing models relying only on CNNs such as UNet, FCN8s, and DeeplabV3+, it achieves state-of-the-art performance in sea fog recognition. Full article
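The ECA module referenced here is a well-known lightweight design (Wang et al., ECA-Net); a minimal PyTorch version looks like this, with kernel size 3 as a typical choice (the original paper derives it adaptively from the channel count).

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: a 1D convolution over globally pooled
    channel descriptors, capturing local cross-channel interaction."""

    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                        # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                   # global average pooling -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)        # cross-channel interaction
        return x * torch.sigmoid(y)[..., None, None]    # reweight feature channels
```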

25 pages, 41212 KiB  
Article
ReBiDet: An Enhanced Ship Detection Model Utilizing ReDet and Bi-Directional Feature Fusion
by Zexin Yan, Zhongbo Li, Yongqiang Xie, Chengyang Li, Shaonan Li and Fangwei Sun
Appl. Sci. 2023, 13(12), 7080; https://doi.org/10.3390/app13127080 - 13 Jun 2023
Cited by 1 | Viewed by 1304
Abstract
To enhance ship detection accuracy in the presence of complex scenes and significant variations in object scales, this study introduces three enhancements to ReDet, resulting in a more powerful ship detection model called the rotation-equivariant bidirectional feature fusion detector (ReBiDet). Firstly, the feature pyramid network (FPN) structure in ReDet is substituted with a rotation-equivariant bidirectional feature fusion feature pyramid network (ReBiFPN) to effectively capture and enrich multiscale feature information. Secondly, K-means clustering is utilized to group the aspect ratios of ground truth boxes in the dataset and adjust the anchor size settings accordingly. Lastly, the difficult positive reinforcement learning (DPRL) sampler is employed instead of the random sampler to address the scale imbalance issue between objects and backgrounds in the dataset, enabling the model to prioritize challenging positive examples. Through numerous experiments conducted on the HRSC2016 and DOTA remote sensing image datasets, the effectiveness of the proposed improvements in handling complex environments and small object detection tasks is validated. The ReBiDet model demonstrates state-of-the-art performance in remote sensing object detection tasks. Compared to the ReDet model and other advanced models, our ReBiDet achieves mAP improvements of 3.20, 0.42, and 1.16 on HRSC2016, DOTA-v1.0, and DOTA-v1.5, respectively, with only a slight increase of 0.82 million parameters. Full article
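The K-means step for anchor design is simple to reproduce; the sketch below clusters ground-truth aspect ratios with scikit-learn (the cluster count and data are illustrative, not the authors' settings).

```python
import numpy as np
from sklearn.cluster import KMeans

def anchor_ratios_from_dataset(boxes_wh, n_clusters=3):
    """Cluster ground-truth box aspect ratios (w/h) to pick anchor settings."""
    ratios = (boxes_wh[:, 0] / boxes_wh[:, 1]).reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(ratios)
    return sorted(float(c) for c in km.cluster_centers_.ravel())

# e.g. boxes_wh = np.array([[120, 18], [90, 14], [60, 20]]) for elongated ships
```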

22 pages, 8069 KiB  
Article
Tree Species Classification in UAV Remote Sensing Images Based on Super-Resolution Reconstruction and Deep Learning
by Yingkang Huang, Xiaorong Wen, Yuanyun Gao, Yanli Zhang and Guozhong Lin
Remote Sens. 2023, 15(11), 2942; https://doi.org/10.3390/rs15112942 - 05 Jun 2023
Cited by 3 | Viewed by 2227
Abstract
We studied the use of self-attention mechanism networks (SAN) and convolutional neural networks (CNNs) for forest tree species classification using unmanned aerial vehicle (UAV) remote sensing imagery in Dongtai Forest Farm, Jiangsu Province, China. We trained and validated representative CNN models, such as ResNet and ConvNeXt, as well as the SAN model, which incorporates Transformer models such as Swin Transformer and Vision Transformer (ViT). Our goal was to compare and evaluate the performance and accuracy of these networks when used in parallel. Due to various factors, such as noise, motion blur, and atmospheric scattering, the quality of low-altitude aerial images may be compromised, resulting in indistinct tree crown edges and deficient texture. To address these issues, we adopted Real-ESRGAN technology for image super-resolution reconstruction. Our results showed that the image dataset after reconstruction improved classification accuracy for both the CNN and Transformer models. The final classification accuracies, validated by ResNet, ConvNeXt, ViT, and Swin Transformer, were 96.71%, 98.70%, 97.88%, and 98.59%, respectively, with corresponding improvements of 1.39%, 1.53%, 0.47%, and 1.18%. Our study highlights the potential benefits of Transformer and CNN for forest tree species classification and the importance of addressing the image quality degradation issues in low-altitude aerial images. Full article

21 pages, 5382 KiB  
Article
MCPT: Mixed Convolutional Parallel Transformer for Polarimetric SAR Image Classification
by Wenke Wang, Jianlong Wang, Bibo Lu, Boyuan Liu, Yake Zhang and Chunyang Wang
Remote Sens. 2023, 15(11), 2936; https://doi.org/10.3390/rs15112936 - 05 Jun 2023
Cited by 1 | Viewed by 1342
Abstract
Vision transformers (ViT) require massive training data and complex models, which prevents them from being directly applied to polarimetric synthetic aperture radar (PolSAR) image classification tasks. Therefore, a mixed convolutional parallel transformer (MCPT) model based on ViT is proposed for fast PolSAR image classification. First of all, a mixed depthwise convolution tokenization is introduced. It replaces the learnable linear projection in the original ViT to obtain patch embeddings. This tokenization reduces computational and parameter complexity and extracts features of different receptive fields as input to the encoder. Furthermore, building on the idea that shallow networks have lower latency and are easier to optimize, a parallel encoder is implemented by pairing the same modules and recombining them to form parallel blocks, which decreases the network depth and computing power requirements. In addition, the original class embedding and position embedding are removed during tokenization, and a global average pooling layer is added after the encoder for category feature extraction. Finally, the experimental results on the AIRSAR Flevoland and RADARSAT-2 San Francisco datasets show that the proposed method achieves a significant improvement in training and prediction speed, with overall accuracies of 97.9% and 96.77%, respectively. Full article
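A mixed depthwise-convolution tokenizer in the spirit of the one described can be sketched as follows; channel counts and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MixedConvTokenizer(nn.Module):
    """Patch embedding via mixed depthwise convolutions: channel groups are
    processed with different kernel sizes (receptive fields), then flattened
    into a token sequence with no class/position embeddings."""

    def __init__(self, in_ch=9, dim=72, kernels=(3, 5, 7), patch_stride=4):
        super().__init__()
        self.split = dim // len(kernels)          # dim must divide evenly
        self.proj = nn.Conv2d(in_ch, dim, patch_stride, stride=patch_stride)
        self.mix = nn.ModuleList(
            nn.Conv2d(self.split, self.split, k, padding=k // 2, groups=self.split)
            for k in kernels                      # depthwise convs per group
        )

    def forward(self, x):                         # x: (B, in_ch, H, W) PolSAR features
        x = self.proj(x)
        chunks = torch.split(x, self.split, dim=1)
        x = torch.cat([m(c) for m, c in zip(self.mix, chunks)], dim=1)
        return x.flatten(2).transpose(1, 2)       # (B, N_patches, dim) tokens
```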

20 pages, 4525 KiB  
Article
Gradient Structure Information-Guided Attention Generative Adversarial Networks for Remote Sensing Image Generation
by Baoyu Zhu, Qunbo Lv, Yuanbo Yang, Kai Zhang, Xuefu Sui, Yinhui Tang and Zheng Tan
Remote Sens. 2023, 15(11), 2827; https://doi.org/10.3390/rs15112827 - 29 May 2023
Viewed by 1239
Abstract
A rich and effective dataset is an important foundation for the development of AI algorithms, and the quantity and quality of the dataset determine the upper limit of the algorithms. For aerospace remote sensing datasets, due to the high cost of data collection and susceptibility to meteorological and airway conditions, existing datasets have two problems: firstly, the number of samples is clearly insufficient, and, secondly, there is a large imbalance between different categories. One effective solution is to use neural networks to generate synthetic data by learning from real data, but existing methods still struggle to generate remote sensing sample images with good texture detail and without geometric distortion. To address the shortcomings of existing image generation algorithms, this paper proposes a gradient structure information-guided attention generative adversarial network (SGA-GAN) for remote sensing image generation, which contains two innovations: on the one hand, a learnable gradient structure information extraction branch network is added to the generator network to obtain complex structural information in the sample image, thus alleviating the geometric distortion of generated remote sensing images; on the other hand, a multidimensional self-attention feature selection module is proposed to further improve the quality of the generated remote sensing images by connecting cross-attention modules as well as spatial and channel attention modules in series to guide the generator to better utilize global information. The proposed algorithm outperformed other methods, such as StyleGAN-XL and FastGAN, in both qualitative and quantitative evaluations, whereby the FID on the DOTA dataset decreased by 23.927 and the IS improved by 2.351. The comparison experiments show that the method proposed in this paper can generate more realistic sample images, and images generated by this method can improve object detection metrics by increasing the number of single-category datasets and the number of targets in under-represented categories in multi-category datasets, which means it can be effectively used in the field of intelligent processing of remote sensing images. Full article
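The gradient-structure branch is learnable in the paper; a fixed Sobel operator, shown below, is a simplified stand-in for the idea of extracting gradient structure maps to guide the generator.

```python
import torch
import torch.nn.functional as F

def sobel_gradients(img):
    """Gradient magnitude map of an image batch via fixed Sobel kernels."""
    gray = img.mean(dim=1, keepdim=True)                     # (B, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                                  # Sobel y kernel
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)              # gradient magnitude
```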

26 pages, 34880 KiB  
Article
AF-OSD: An Anchor-Free Oriented Ship Detector Based on Multi-Scale Dense-Point Rotation Gaussian Heatmap
by Zizheng Hua, Gaofeng Pan, Kun Gao, Hengchao Li and Su Chen
Remote Sens. 2023, 15(4), 1120; https://doi.org/10.3390/rs15041120 - 18 Feb 2023
Cited by 3 | Viewed by 1425
Abstract
Due to the complexity of airborne remote sensing scenes, strong background and noise interference, positive and negative sample imbalance, and multiple ship scales, ship detection is a critical and challenging task in remote sensing. This work proposes an end-to-end anchor-free oriented ship detector (AF-OSD) framework based on a multi-scale dense-point rotation Gaussian heatmap (MDP-RGH) to tackle these challenges. First, to solve the sample imbalance problem and suppress the interference of negative samples such as background and noise, the oriented ship is modeled via the proposed MDP-RGH according to its shape and direction to generate ship labels with more accurate information, while the imbalance between positive and negative samples is adaptively learned for ships of different scales. Then, the AF-OSD based on MDP-RGH is further devised to detect multi-scale oriented ships, enabling accurate identification and information extraction for multi-scale vessels. Finally, a multi-task object size adaptive loss function is designed to guide the training process, improving its detection quality and performance for multi-scale oriented ships. Extensive experiments on the HRSC2016 and DOTA ship datasets reveal that the proposed method significantly outperforms the compared state-of-the-art methods. Full article
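Rendering a single rotated Gaussian for an oriented ship illustrates the heatmap labeling idea (without the multi-scale dense-point machinery):

```python
import numpy as np

def rotated_gaussian_heatmap(h, w, cx, cy, sx, sy, theta):
    """One rotated 2D Gaussian for a ship centered at (cx, cy) with
    half-extents (sx, sy) and orientation theta (radians)."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    c, s = np.cos(theta), np.sin(theta)
    # Rotate pixel coordinates into the ship-aligned frame.
    u = c * (xs - cx) + s * (ys - cy)
    v = -s * (xs - cx) + c * (ys - cy)
    return np.exp(-0.5 * ((u / sx) ** 2 + (v / sy) ** 2))

heatmap = rotated_gaussian_heatmap(256, 256, 128, 128, sx=40, sy=8, theta=np.pi / 6)
```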

24 pages, 16385 KiB  
Article
FERA-Net: A Building Change Detection Method for High-Resolution Remote Sensing Imagery Based on Residual Attention and High-Frequency Features
by Xuwei Xu, Yuan Zhou, Xiechun Lu and Zhanlong Chen
Remote Sens. 2023, 15(2), 395; https://doi.org/10.3390/rs15020395 - 09 Jan 2023
Cited by 4 | Viewed by 2153
Abstract
Buildings can represent the process of urban development, and building change detection can support land use management and urban planning. However, existing building change detection models are unable to extract multi-scale building features effectively or fully utilize the local and global information of the feature maps, such as building edges. These deficiencies affect the detection accuracy and may restrict further applications of the models. In this paper, we propose the feature-enhanced residual attention network (FERA-Net) to improve the performance of the ultrahigh-resolution remote sensing image change detection task. The FERA-Net is an end-to-end network with a U-shaped encoder–decoder structure. The Siamese network is used as the encoder, with an attention-guided high-frequency feature extraction module (AGFM) extracting building features and enriching detail information, and the decoder applies a feature-enhanced skip connection module (FESCM) to aggregate the enhanced multi-level differential feature maps and gradually recover the change feature maps. The FERA-Net can generate predicted building change maps under the joint supervision of building change information and building edge information. The performance of the proposed model is tested on the WHU-CD dataset and the LEVIR-CD dataset. The experimental results show that our model outperforms the state-of-the-art models, with 93.51% precision and a 92.48% F1 score on the WHU-CD dataset, and 91.57% precision and an 89.58% F1 score on the LEVIR-CD dataset. Full article
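One plausible way to obtain the edge labels used for joint supervision is a morphological gradient of the binary change mask, as sketched below; the paper may construct them differently.

```python
import cv2
import numpy as np

def edge_labels(change_mask):
    """Derive building-edge supervision targets from a binary change mask."""
    mask = (change_mask > 0).astype(np.uint8) * 255
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    edges = cv2.morphologyEx(mask, cv2.MORPH_GRADIENT, kernel)
    return (edges > 0).astype(np.float32)        # thin band around building outlines
```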

17 pages, 951 KiB  
Article
Sensor Data Prediction in Missile Flight Tests
by Sang-Gyu Ryu, Jae Jin Jeong and David Hyunchul Shim
Sensors 2022, 22(23), 9410; https://doi.org/10.3390/s22239410 - 02 Dec 2022
Cited by 2 | Viewed by 1631
Abstract
Sensor data from missile flights are highly valuable, as a test requires considerable resources, but some sensors may be detached or fail to collect data. Remotely acquired missile sensor data are incomplete, and the correlations between the missile data are complex, which makes the prediction of sensor data difficult. This article proposes a deep learning-based prediction network combined with the wavelet analysis method. The proposed network includes an imputer network and a prediction network. In the imputer network, the data are decomposed using the wavelet transform, and generative adversarial networks assist the decomposed data in reproducing the detailed information. The prediction network consists of long short-term memory with an attention and dilation network for accurate prediction. In the test, actual sensor data from missile flights were used. For the performance evaluation, the test was conducted from data with no missing values to data with five different missing rates. The test results showed that the proposed system predicts the missile sensor data most accurately in all cases. In the frequency analysis, the proposed system produced frequency responses similar to those of the actual sensors, showing that it accurately predicted the sensors in both tendency and frequency aspects. Full article
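The wavelet decomposition step can be reproduced with PyWavelets; the wavelet family and decomposition level below are illustrative choices, not necessarily the authors'.

```python
import numpy as np
import pywt

def wavelet_decompose(signal, wavelet="db4", level=3):
    """Multi-level discrete wavelet decomposition of a 1D sensor channel."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return coeffs            # [approximation, detail_level3, ..., detail_level1]

def wavelet_reconstruct(coeffs, wavelet="db4"):
    return pywt.waverec(coeffs, wavelet)

# x = np.sin(np.linspace(0, 20, 1024)) + 0.1 * np.random.randn(1024)
# x_rec = wavelet_reconstruct(wavelet_decompose(x))[:1024]  # near-perfect round trip
```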

13 pages, 1677 KiB  
Article
CE-BART: Cause-and-Effect BART for Visual Commonsense Generation
by Junyeong Kim, Ji Woo Hong, Sunjae Yoon and Chang D. Yoo
Sensors 2022, 22(23), 9399; https://doi.org/10.3390/s22239399 - 02 Dec 2022
Cited by 1 | Viewed by 1371
Abstract
“A Picture is worth a thousand words”. Given an image, humans are able to deduce various cause-and-effect captions of past, current, and future events beyond the image. The task of visual commonsense generation has the aim of generating three cause-and-effect captions for a given image: (1) what needed to happen before, (2) what is the current intent, and (3) what will happen after. However, this task is challenging for machines, owing to two limitations: existing approaches (1) directly utilize conventional vision–language transformers to learn relationships between input modalities and (2) ignore relations among target cause-and-effect captions, but consider each caption independently. Herein, we propose Cause-and-Effect BART (CE-BART), which is based on (1) a structured graph reasoner that captures intra- and inter-modality relationships among visual and textual representations and (2) a cause-and-effect generator that generates cause-and-effect captions by considering the causal relations among inferences. We demonstrate the validity of CE-BART on the VisualCOMET and AVSD benchmarks. CE-BART achieved SOTA performance on both benchmarks, while an extensive ablation study and qualitative analysis demonstrated the performance gain and improved interpretability. Full article

20 pages, 3616 KiB  
Article
Automatic Laboratory Martian Rock and Mineral Classification Using Highly-Discriminative Representation Derived from Spectral Signatures
by Juntao Yang, Zhizhong Kang, Ze Yang, Juan Xie, Bin Xue, Jianfeng Yang and Jinyou Tao
Remote Sens. 2022, 14(20), 5070; https://doi.org/10.3390/rs14205070 - 11 Oct 2022
Cited by 1 | Viewed by 1929
Abstract
The optical properties of rocks and minerals provide a reliable way to measure their chemical and mineralogical composition due to their specific reflection behaviors, which is also the key insight behind most automatic identification and classification approaches. However, the inter-category spectral similarity poses a great challenge to automatic identification and classification tasks because of the diversity of rocks and minerals. Therefore, this paper develops a recognition and classification approach for rocks and minerals using highly discriminative representations derived from their raw spectral signatures. More specifically, a transformer-based classification approach integrated with category-aware contrastive learning is constructed and trained in an end-to-end manner, which forces instances of the same category to remain close by while pushing instances of dissimilar categories far apart in the high-dimensional feature space, in order to produce highly discriminative feature representations of the rocks and minerals. Experiments are conducted, from both qualitative and quantitative views, on a laboratory sample dataset with 30 types of rocks and minerals shared by the National Mineral Rock and Fossil Specimens Resource Center; the spectral information of the laboratory rocks and minerals was captured using a multi-spectral sensor that duplicates the payload onboard the Zhurong rover. Quantitative results demonstrate that the developed approach can effectively distinguish the 30 types of rocks and minerals, with a high overall accuracy of 96.92%. The developed approach is also remarkably superior to other existing methods, with an average difference of 4.75% in overall accuracy. In addition, we visualized the derived highly discriminative features of different types of rocks and minerals by projecting them onto a two-dimensional map, where the same categories tend to be modeled by nearby locations and dissimilar categories by distant locations with high probability. Compared with those in the raw spectral feature space, the clusters are formed better in the derived highly discriminative feature space, which further confirms the promising representation capability. Full article
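The category-aware contrastive objective is closely related to the generic supervised contrastive loss (Khosla et al., 2020), a sketch of which follows; the paper's exact variant may differ.

```python
import torch
import torch.nn.functional as F

def category_aware_contrastive_loss(feats, labels, tau=0.1):
    """Supervised contrastive loss: pull same-category spectra together,
    push different categories apart in the embedding space."""
    z = F.normalize(feats, dim=1)                       # (B, D) embeddings
    sim = z @ z.t() / tau                               # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))     # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability over each anchor's positives (skip anchors with none).
    valid = pos.sum(1) > 0
    loss = -(log_prob * pos).sum(1)[valid] / pos.sum(1)[valid]
    return loss.mean()
```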

17 pages, 1117 KiB  
Article
Dual-Scale Doppler Attention for Human Identification
by Sunjae Yoon, Dahyun Kim, Ji Woo Hong, Junyeong Kim and Chang D. Yoo
Sensors 2022, 22(17), 6363; https://doi.org/10.3390/s22176363 - 24 Aug 2022
Cited by 2 | Viewed by 1507
Abstract
This paper considers a Deep Convolutional Neural Network (DCNN) with an attention mechanism referred to as Dual-Scale Doppler Attention (DSDA) for human identification given a micro-Doppler (MD) signature as input. The MD signature includes unique gait characteristics produced by differently sized body parts moving at different rates: the arms and legs move rapidly, while the torso moves slowly. Each person is identified based on his/her unique gait characteristic in the MD signature. DSDA provides attention at different time-frequency resolutions to cater to different MD components, composed of both fast-varying and steady parts. Through this, DSDA can capture the unique gait characteristic of each person used for human identification. We demonstrate the validity of DSDA on a recently published benchmark dataset, IDRad. The empirical results show that the proposed DSDA outperforms previous methods, and a qualitative analysis demonstrates its interpretability on MD signatures. Full article
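The dual-scale input can be illustrated by computing two spectrograms of the radar return at different window lengths, trading time resolution against frequency resolution (window sizes below are illustrative):

```python
import numpy as np
from scipy.signal import stft

def dual_scale_spectrograms(radar_iq, fs, short_win=64, long_win=512):
    """Two micro-Doppler spectrograms at different time-frequency resolutions:
    a short window for fast-varying limb motion, a long window for the slowly
    varying torso component."""
    _, _, fast = stft(radar_iq, fs=fs, nperseg=short_win, return_onesided=False)
    _, _, slow = stft(radar_iq, fs=fs, nperseg=long_win, return_onesided=False)
    return np.abs(fast), np.abs(slow)
```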
