3D Scene Understanding and Object Recognition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 September 2023) | Viewed by 11246

Special Issue Editors


Guest Editor
University Institute for Computer Research, University of Alicante, P.O. Box 99, 03080 Alicante, Spain
Interests: machine learning; computer vision; pattern recognition; gesture recognition; object recognition; neural networks; artificial intelligence

Guest Editor
University Institute for Computer Research, University of Alicante, P.O. Box 99, 03080 Alicante, Spain
Interests: computer vision; deep learning; 3D object recognition; mapping; navigation; robotics

Special Issue Information

Dear Colleagues,

Three-dimensional data have become widespread in recent years, due largely to the rise of self-driving cars and intelligent vehicles. These new transportation systems are fitted with LiDAR, ToF cameras, stereo setups, and a range of other devices providing 3D data on their surroundings. Furthermore, 3D data are actively used in industry for quality inspection and other tasks, as well as in consumer devices such as smartphones. Moreover, most robots are equipped with a device able to perceive this kind of information.

Managing 3D data is thus of the utmost importance, and the ability to optimally perform guidance and navigation, object recognition and detection, noise reduction, and other related tasks is an active research topic today.

Against this background, we propose this Special Issue focused on 3D scene understanding and object recognition with an emphasis on new algorithms and applications using 3D data. Topics of interest include:

  • Learning-based 3D object recognition;
  • Monocular depth estimation;
  • Navigation algorithms based on 3D data;
  • Registration and map creation;
  • Noise reduction in 3D data.

Dr. Francisco Gomez-Donoso
Dr. Félix Escalona Moncholí
Prof. Dr. Miguel Cazorla
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • 3D scene understanding
  • registration
  • mapping
  • 3D object recognition
  • depth estimation
  • noise reduction

Published Papers (8 papers)


Research

19 pages, 11374 KiB  
Article
3D Point Cloud Completion Method Based on Building Contour Constraint Diffusion Probability Model
by Bo Ye, Han Wang, Jingwen Li, Jianwu Jiang, Yanling Lu, Ertao Gao and Tao Yue
Appl. Sci. 2023, 13(20), 11246; https://doi.org/10.3390/app132011246 - 13 Oct 2023
Viewed by 1008
Abstract
Building point cloud completion is the process of reconstructing missing parts of a building’s point cloud, which have been affected by external factors during data collection, to restore the original geometric shape of the building. However, the uncertainty in filling point positions in the areas where building features are missing makes it challenging to recover the original distribution of the building’s point cloud shape. To address this issue, we propose a point cloud generation diffusion probability model based on building outline constraints. This method constructs building-outline-constrained regions using information related to the walls on the building’s surface and adjacent roofs. These constraints are encoded by an encoder and fused into latent codes representing the incomplete building point cloud shape. This ensures that the completed point cloud adheres closely to the real geometric shape of the building by constraining the generated points within the missing areas. The quantitative and qualitative results of the experiment clearly show that our method performs better than other methods in building point cloud completion. Full article
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
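As a hedged illustration of the kind of pipeline this abstract describes (not the authors' implementation), the Python fragment below sketches one reverse-diffusion step for point cloud completion in which the observed partial scan is re-imposed at every step, so that only the missing regions are generated, while the outline constraint is assumed to enter through the conditioning latent. All names (denoiser, cond, the noise schedules) are hypothetical.

import torch

@torch.no_grad()
def reverse_step(x_t, t, denoiser, cond, partial, observed_mask, betas, alphas_cumprod):
    # x_t: (N, 3) noisy points; observed_mask: (N,) bool, True for points of the partial scan.
    # betas, alphas_cumprod: 1-D tensors holding the diffusion noise schedule.
    a_bar = alphas_cumprod[t]
    eps_hat = denoiser(x_t, t, cond)  # predicted noise, (N, 3); cond encodes the outline constraint
    mean = (x_t - betas[t] * eps_hat / torch.sqrt(1.0 - a_bar)) / torch.sqrt(1.0 - betas[t])
    x_prev = mean if t == 0 else mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)
    # Re-anchor the observed points so that generation only fills the missing areas.
    x_prev[observed_mask] = partial[observed_mask]
    return x_prev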

14 pages, 2826 KiB  
Article
Study of Root Canal Length Estimations by 3D Spatial Reproduction with Stereoscopic Vision
by Takato Tsukuda, Noriko Mutoh, Akito Nakano, Tomoki Itamiya and Nobuyuki Tani-Ishii
Appl. Sci. 2023, 13(15), 8651; https://doi.org/10.3390/app13158651 - 27 Jul 2023
Viewed by 1071
Abstract
Extended Reality (XR) applications are considered useful for skill acquisition in dental education. In this study, we examined the functionality and usefulness of an application called “SR View for Endo” that measures root canal length using a Spatial Reality Display (SRD) capable of naked-eye stereoscopic viewing. Three-dimensional computer graphics (3DCG) data of dental models were obtained and output to both the SRD and conventional 2D display devices. Forty dentists working at the Kanagawa Dental University Hospital measured root canal length using both types of devices and provided feedback through a questionnaire. Statistical analysis using one-way analysis of variance evaluated the measurement values and time, while multivariate analysis assessed the relationship between questionnaire responses and measurement time. There was no significant difference in the measurement values between the 2D device and SRD, but there was a significant difference in measurement time. Furthermore, a negative correlation was observed between the frequency of device usage and the extended measurement time of the 2D device. Measurements using the SRD demonstrated higher accuracy and shorter measurement times compared to the 2D device, raising expectations for its application in dental education and clinical training. However, a certain percentage of participants experienced symptoms resembling motion sickness associated with virtual reality (VR). Full article
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)

17 pages, 58573 KiB  
Article
A 3D Estimation Method Using an Omnidirectional Camera and a Spherical Mirror
by Yuya Hiruta, Chun Xie, Hidehiko Shishido and Itaru Kitahara
Appl. Sci. 2023, 13(14), 8348; https://doi.org/10.3390/app13148348 - 19 Jul 2023
Viewed by 918
Abstract
As the demand for 3D information continues to grow in various fields, technologies for acquiring it are developing rapidly. Laser-based estimation and multi-view images are popular methods for sensing 3D information, while deep learning techniques are also being developed. However, the former requires precise sensing equipment or large observation systems, while the latter relies on substantial prior information in the form of extensive learning datasets. Given these limitations, our research aims to develop a method that is independent of learning and makes it possible to capture a wide range of 3D information using a compact device. This paper introduces a novel approach for estimating the 3D information of an observed scene from a monocular image, based on a catadioptric imaging system that combines an omnidirectional camera with a spherical mirror. By employing a curved mirror, it is possible to capture a large area in a single observation. At the same time, using an omnidirectional camera enables the creation of a simplified imaging system. The proposed method focuses on a spherical or spherical cap-shaped mirror in the scene. It estimates the mirror’s position from the captured images, allowing for the estimation of the scene with great flexibility. Simulation evaluations are conducted to validate the characteristics and effectiveness of our proposed method. Full article
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
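The core geometric step behind such a catadioptric setup can be illustrated with a short, self-contained sketch (not the paper's implementation): intersect a camera ray with the spherical mirror and apply the law of reflection, with all quantities assumed to be expressed in the camera frame.

import numpy as np

def reflect_off_sphere(origin, direction, center, radius):
    # Return (hit_point, reflected_dir) for the first intersection of the ray with the sphere,
    # or None if the ray misses the mirror. All inputs are 3-vectors in the camera frame.
    d = direction / np.linalg.norm(direction)
    oc = origin - center
    b = np.dot(oc, d)
    disc = b * b - (np.dot(oc, oc) - radius ** 2)
    if disc < 0:
        return None
    t = -b - np.sqrt(disc)                   # nearest intersection along the ray
    if t < 0:
        return None
    hit = origin + t * d
    n = (hit - center) / radius              # outward surface normal of the sphere
    reflected = d - 2.0 * np.dot(d, n) * n   # law of reflection
    return hit, reflected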

14 pages, 5688 KiB  
Article
FANet: Improving 3D Object Detection with Position Adaptation
by Jian Ye, Fushan Zuo and Yuqing Qian
Appl. Sci. 2023, 13(13), 7508; https://doi.org/10.3390/app13137508 - 25 Jun 2023
Viewed by 998
Abstract
Three-dimensional object detection plays a crucial role in achieving accurate and reliable autonomous driving systems. However, the current state-of-the-art two-stage detectors lack flexibility and have limited feature extraction capabilities to effectively handle the disorder and irregularity of point clouds. In this paper, we propose a novel network called FANet, which combines the strengths of PV-RCNN and PAConv (position adaptive convolution). The goal of FANet is to address the irregularity and disorder present in point clouds. In our network, the convolution operation constructs convolutional kernels from a basic weight matrix, and the coefficients of these kernels are adaptively learned by LearnNet from relative point positions. This approach allows for the flexible modeling of complex spatial variations and geometric structures in 3D point clouds, leading to improved extraction of point cloud features and the generation of high-quality 3D proposal boxes. Extensive experiments on the KITTI dataset demonstrate that FANet achieves superior 3D object detection accuracy compared to other methods. Full article
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
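A minimal sketch of a position-adaptive point convolution in the spirit of PAConv, with a small coefficient MLP playing the role the abstract assigns to LearnNet, is shown below. It is a generic illustration under assumed tensor shapes, not the FANet code.

import torch
import torch.nn as nn

class PositionAdaptiveConv(nn.Module):
    def __init__(self, in_ch, out_ch, num_kernels=8):
        super().__init__()
        # Bank of basic weight matrices that are mixed per neighbor.
        self.weight_bank = nn.Parameter(0.02 * torch.randn(num_kernels, in_ch, out_ch))
        # Small MLP that predicts mixing coefficients from relative positions ("LearnNet" role).
        self.learn_net = nn.Sequential(
            nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, num_kernels), nn.Softmax(dim=-1))

    def forward(self, feats, rel_pos):
        # feats: (B, N, K, in_ch) neighbor features; rel_pos: (B, N, K, 3) relative coordinates.
        coeff = self.learn_net(rel_pos)                              # (B, N, K, M)
        kernels = torch.einsum('bnkm,mio->bnkio', coeff, self.weight_bank)
        out = torch.einsum('bnki,bnkio->bnko', feats, kernels)       # per-neighbor transform
        return out.max(dim=2).values                                 # aggregate over the K neighbors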

25 pages, 5804 KiB  
Article
NGLSFusion: Non-Use GPU Lightweight Indoor Semantic SLAM
by Le Wan, Lin Jiang, Bo Tang, Yunfei Li, Bin Lei and Honghai Liu
Appl. Sci. 2023, 13(9), 5285; https://doi.org/10.3390/app13095285 - 23 Apr 2023
Viewed by 1206
Abstract
Perception of the indoor environment is the basis of mobile robot localization, navigation, and path planning, and it is particularly important to construct semantic maps in real time using minimal resources. Existing methods are too dependent on the graphics processing unit (GPU) for acquiring semantic information about the indoor environment and cannot build the semantic map in real time on the central processing unit (CPU). To address these problems, this paper proposes a lightweight, GPU-free indoor semantic map construction algorithm named NGLSFusion. In the visual odometry (VO) method, ORB features are used for the initialization of the first frame, new keyframes are created by the optical flow method, and feature points are extracted by the direct method, which accelerates tracking. In the semantic map construction method, a pretrained model of the lightweight network LinkNet is optimized to provide semantic information in real time on devices with limited computing power, and a semantic point cloud is fused using OctoMap and Voxblox. Experimental results show that the proposed algorithm preserves camera pose accuracy while accelerating tracking, and obtains a reconstructed semantic map with a complete structure without using a GPU. Full article
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
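A rough sketch of the kind of CPU-only tracking front end the abstract describes — ORB keypoints on a keyframe followed by pyramidal Lucas–Kanade optical flow — could look like the following. It is an assumption-laden illustration using OpenCV, not the NGLSFusion code.

import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)

def init_keyframe(gray):
    # Detect ORB keypoints on the first (key)frame and return them as an (M, 1, 2) float array.
    keypoints = orb.detect(gray, None)
    return np.float32([kp.pt for kp in keypoints]).reshape(-1, 1, 2)

def track(prev_gray, cur_gray, prev_pts):
    # Track the keyframe points into the current frame with pyramidal Lucas-Kanade optical flow.
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, prev_pts, None, winSize=(21, 21), maxLevel=3)
    good = status.reshape(-1) == 1
    return prev_pts[good], cur_pts[good]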

19 pages, 9929 KiB  
Article
Boundary–Inner Disentanglement Enhanced Learning for Point Cloud Semantic Segmentation
by Lixia He, Jiangfeng She, Qiang Zhao, Xiang Wen and Yuzheng Guan
Appl. Sci. 2023, 13(6), 4053; https://doi.org/10.3390/app13064053 - 22 Mar 2023
Cited by 1 | Viewed by 1181
Abstract
In a point cloud semantic segmentation task, misclassification usually appears on the semantic boundary. A few studies have taken the boundary into consideration, but they relied on complex modules for explicit boundary prediction, which greatly increased model complexity. It is challenging to improve the segmentation accuracy of points on the boundary without dependence on additional modules. For every boundary point, this paper divides its neighboring points into different collections, and then measures its entanglement with each collection. A comparison of the measurement results before and after utilizing boundary information in the semantic segmentation network showed that the boundary could enhance the disentanglement between the boundary point and its neighboring points in inner areas, thereby greatly improving the overall accuracy. Therefore, to improve the semantic segmentation accuracy of boundary points, a Boundary–Inner Disentanglement Enhanced Learning (BIDEL) framework with no need for additional modules and learning parameters is proposed, which can maximize feature distinction between the boundary point and its neighboring points in inner areas through a newly defined boundary loss function. Experiments with two classic baselines across three challenging datasets demonstrate the benefits of BIDEL for the semantic boundary. As a general framework, BIDEL can be easily adopted in many existing semantic segmentation networks. Full article
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
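The exact BIDEL loss is defined in the paper; purely as an illustration of the idea of pushing a boundary point's feature away from the features of inner-area neighbors that belong to other classes, a hypothetical loss term might look like this.

import torch
import torch.nn.functional as F

def boundary_disentangle_loss(feats, neighbor_idx, is_boundary, labels):
    # feats: (N, C) point features; neighbor_idx: (N, K) neighbor indices;
    # is_boundary: (N,) bool; labels: (N,) semantic labels.
    nf = F.normalize(feats, dim=-1)
    nbr = nf[neighbor_idx]                                   # (N, K, C) neighbor features
    sim = torch.einsum('nc,nkc->nk', nf, nbr)                # cosine similarity to each neighbor
    foreign = labels[neighbor_idx] != labels.unsqueeze(1)    # neighbors from other classes
    inner = ~is_boundary[neighbor_idx]                       # neighbors lying in inner areas
    mask = is_boundary.unsqueeze(1) & foreign & inner        # boundary point vs. foreign inner neighbor
    if not mask.any():
        return feats.new_zeros(())
    return sim[mask].clamp(min=0).mean()                     # drive those similarities toward zero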

27 pages, 4579 KiB  
Article
An Accurate, Efficient, and Stable Perspective-n-Point Algorithm in 3D Space
by Rui Qiao, Guili Xu, Ping Wang, Yuehua Cheng and Wende Dong
Appl. Sci. 2023, 13(2), 1111; https://doi.org/10.3390/app13021111 - 13 Jan 2023
Cited by 1 | Viewed by 2331
Abstract
The Perspective-n-Point problem is usually addressed by means of a projective imaging model of 3D points, but the spatial distribution and quantity of 3D reference points vary, making it difficult for the Perspective-n-Point algorithm to balance accuracy, robustness, and computational efficiency. To address this issue, this paper introduces Hidden PnP, a hidden variable method. After parameterizing the rotation matrix with CGR parameters, the method, unlike the best existing matrix synthesis technique (Gröbner basis methods), does not require the construction of a larger matrix elimination template in the polynomial solution phase. It can therefore solve for the CGR parameters rapidly and refine the solution accurately using the Gauss–Newton method. In synthetic data tests, the hidden-variable-based PnP algorithm outperforms the best existing Perspective-n-Point methods in accuracy and robustness in the ordinary 3D, planar, and quasi-singular cases. Furthermore, its computational efficiency can be up to seven times that of leading existing algorithms when the number of spatially redundant reference points is increased to 500. In physical experiments on pose reprojection from monocular cameras, this algorithm even showed higher accuracy than the best existing algorithm. Full article
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
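Two generic ingredients such a solver relies on — the CGR (Cayley–Gibbs–Rodrigues) rotation parameterization and the reprojection residual that a Gauss–Newton refinement minimizes — can be sketched as follows. This is standard material, not the Hidden PnP solver itself.

import numpy as np

def cgr_to_rotation(s):
    # s: (3,) CGR/Gibbs vector; returns the 3x3 rotation matrix given by the Cayley transform.
    s = np.asarray(s, dtype=float)
    sx = np.array([[0.0, -s[2], s[1]],
                   [s[2], 0.0, -s[0]],
                   [-s[1], s[0], 0.0]])
    ss = float(s @ s)
    return ((1.0 - ss) * np.eye(3) + 2.0 * np.outer(s, s) + 2.0 * sx) / (1.0 + ss)

def reprojection_residuals(s, t, pts_3d, pts_2d, K):
    # Stacked 2D reprojection errors; this is the residual vector a Gauss-Newton step would reduce.
    R = cgr_to_rotation(s)
    cam = pts_3d @ R.T + t              # (n, 3) points transformed into the camera frame
    proj = cam @ K.T
    proj = proj[:, :2] / proj[:, 2:3]   # perspective division
    return (proj - pts_2d).ravel()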

15 pages, 1736 KiB  
Article
LUMDE: Light-Weight Unsupervised Monocular Depth Estimation via Knowledge Distillation
by Wenze Hu, Xue Dong, Ning Liu and Yuanfeng Chen
Appl. Sci. 2022, 12(24), 12593; https://doi.org/10.3390/app122412593 - 08 Dec 2022
Cited by 1 | Viewed by 1689
Abstract
Unsupervised monocular depth estimation has seen rapid progress in recent years, as it avoids the use of ground truth data, and also because monocular cameras are readily available in most autonomous devices. Although some effective monocular depth estimation networks have been reported previously, such as Monodepth2 and SC-SfMLearner, most of these approaches are still computationally expensive for lightweight devices. Therefore, in this paper, we introduce a knowledge-distillation-based approach named LUMDE to deal with the pixel-by-pixel unsupervised monocular depth estimation task. Specifically, we use a teacher network and a lightweight student network to distill the depth information, and further integrate a pose network into the student module to improve the depth performance. Moreover, drawing on the idea of the Generative Adversarial Network (GAN), the outputs of the student network and teacher network are taken as fake and real samples, respectively, and a Transformer is introduced as the GAN discriminator to further improve the depth prediction results. The proposed LUMDE method achieves state-of-the-art (SOTA) results in the knowledge distillation of unsupervised depth estimation and also outperforms some dense networks. The proposed LUMDE model loses only 2.6% in δ1 accuracy on the NYUD-V2 dataset compared with the teacher network but reduces the computational complexity by 95.2%. Full article
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
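Purely as a hedged sketch of the distillation objective described here (the exact LUMDE losses are defined in the paper), a student depth map can be fitted to the teacher's output in log space, with an optional GAN-style term in which a discriminator scores student outputs as fake and teacher outputs as real. All function and parameter names are illustrative.

import torch
import torch.nn.functional as F

def distill_loss(student_depth, teacher_depth, disc=None, adv_weight=0.1, eps=1e-6):
    # student_depth, teacher_depth: (B, 1, H, W) positive depth maps; disc: optional discriminator.
    loss = F.l1_loss(torch.log(student_depth + eps), torch.log(teacher_depth + eps))
    if disc is not None:
        # Non-saturating generator loss: the student tries to make its depth maps look "real".
        fake_logits = disc(student_depth)
        loss = loss + adv_weight * F.binary_cross_entropy_with_logits(
            fake_logits, torch.ones_like(fake_logits))
    return loss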
