Augmented Reality, Virtual Reality & Semantic 3D Reconstruction

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Optics and Lasers".

Deadline for manuscript submissions: closed (5 November 2020) | Viewed by 75284

A printed edition of this Special Issue is available.

Special Issue Editors


Dr. Jing-Yan Wang
Guest Editor
PEGASUS FZ LLC, United Arab Emirates
Interests: machine learning research & applications; convolutional neural networks; sparse coding; learning-to-rank

Prof. Neeraj Kumar
Guest Editor
Computer Science and Engineering, Thapar Institute of Engineering and Technology, Deemed University, Patiala 147004, India
Interests: SDN; cyber physical systems; security; smart cities; deep learning; blockchain

Prof. Jaime Lloret Mauri
Guest Editor
Integrated Management Coastal Research Institute, Universitat Politecnica de Valencia, 46022 Valencia, Spain
Interests: network protocols; network algorithms; wireless sensor networks; ad hoc networks; multimedia streaming

Special Issue Information

Dear Colleagues,

Augmented reality (AR) is a key technology that will facilitate a major paradigm shift in the way users interact with data, and it has only recently been recognized as a viable solution for many critical needs. AR can be used to visualize data from hundreds of sensors simultaneously, overlaying relevant and actionable information on the user's environment through a headset. Semantic 3D reconstruction makes AR technology even more promising by enriching reconstructed models with semantic information. Although several post-processing methods are currently available for extracting semantic information from reconstructed 3D models, the results they produce are uncertain and often incorrect. Thus, it is necessary to explore and develop novel 3D reconstruction approaches that automatically recover the 3D geometry and obtain semantic information simultaneously.

The rapid advent of deep learning has brought new opportunities to the field of semantic 3D reconstruction from photo collections. Deep learning-based methods can not only extract semantic information but also enhance fundamental techniques in semantic 3D reconstruction, including feature matching and tracking, stereo matching, camera pose estimation, and multi-view stereo. Moreover, deep learning techniques can be used to extract priors from photo collections, and the obtained information can in turn improve the quality of 3D reconstruction.
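
As a concrete illustration of one of these fundamental techniques, the sketch below shows the classical feature-matching baseline (SIFT keypoints followed by Lowe's ratio test) that learned matchers aim to improve upon. It is a minimal sketch assuming OpenCV (opencv-python); the image paths are placeholders.

```python
# Classical feature matching with a ratio test; a baseline sketch, not code
# from any Special Issue paper. "view1.jpg" / "view2.jpg" are placeholders.
import cv2

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# k-nearest-neighbour matching, then Lowe's ratio test to discard ambiguous
# correspondences before they reach pose estimation or multi-view stereo.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in knn if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative correspondences")
```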

The aim of this Special Issue is to provide a platform for researchers to share innovative work in the field of semantic 3D reconstruction, virtual reality, and augmented reality, including deep learning-based approaches to 3D reconstruction and deep learning software platforms for virtual and augmented reality.

This Special Issue will focus on (but is not limited to) the following topics:

  • Virtual reality
  • Augmented reality
  • Semantic 3D reconstruction
  • Color transfer in virtual reality
  • Color consistency in augmented reality
  • Feature detection and matching in 3D reconstruction
  • Dynamic Simultaneous Localization and Mapping
  • Large-scale Structure from Motion
  • Augmented reality software platform
  • Virtual reality hardware platform
  • Applications, including cultural heritage, environmental recording, etc.
Dr. Zhihan Lv
Dr. Jing-Yan Wang
Prof. Neeraj Kumar
Prof. Jaime Lloret Mauri
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

augmented reality; virtual reality; semantic 3D reconstruction; color transfer in virtual reality; color consistency in augmented reality; feature detection and matching in 3D reconstruction; dynamic Simultaneous Localization and Mapping; large-scale Structure from Motion; augmented reality software platforms; virtual reality hardware platforms; applications including cultural heritage, environmental recording, etc.

Published Papers (20 papers)


Editorial


6 pages, 197 KiB  
Editorial
Special Issue on “Augmented Reality, Virtual Reality & Semantic 3D Reconstruction”
by Zhihan Lv, Jing-Yan Wang, Neeraj Kumar and Jaime Lloret
Appl. Sci. 2021, 11(18), 8590; https://doi.org/10.3390/app11188590 - 16 Sep 2021
Cited by 2 | Viewed by 1575
Abstract
Augmented Reality is a key technology that will facilitate a major paradigm shift in the way users interact with data and has only just recently been recognized as a viable solution for solving many critical needs [...] Full article

Research


11 pages, 5417 KiB  
Article
A Novel Real-Time Virtual 3D Object Composition Method for 360° Video
by Jaehyun Lee, Sungjae Ha, Philippe Gentet, Leehwan Hwang, Soonchul Kwon and Seunghyun Lee
Appl. Sci. 2020, 10(23), 8679; https://doi.org/10.3390/app10238679 - 04 Dec 2020
Cited by 1 | Viewed by 2659
Abstract
As highly immersive virtual reality (VR) content, 360° video allows users to observe all viewpoints within the desired direction from the position where the video is recorded. In 360° video content, virtual objects are inserted into recorded real scenes to provide a higher sense of immersion. These techniques are called 3D composition. For a realistic 3D composition in a 360° video, it is important to obtain the internal (focal length) and external (position and rotation) parameters of the 360° camera. Traditional methods estimate the trajectory of a camera by extracting feature points from the recorded video. However, incorrect results may occur owing to stitching errors, since a 360° camera is assembled from several high-resolution cameras whose images must be stitched, and a large amount of time is spent on feature tracking owing to the high resolution of the video. We propose a new method for pre-visualization and 3D composition that overcomes the limitations of existing methods. The system achieves real-time position tracking of the attached camera using a ZED stereo-vision camera, and real-time stabilization using a Kalman filter. The proposed system shows high time efficiency and accurate 3D composition. Full article
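
The paper's filter parameters are not given in this listing, but the kind of real-time stabilization it describes can be sketched with a textbook constant-velocity Kalman filter over one coordinate of the tracked camera position; the process and measurement noise values below are assumptions.

```python
# Constant-velocity Kalman filter for smoothing a 1D position track; a
# generic sketch, not the authors' implementation. q and r are assumed.
import numpy as np

def kalman_smooth(zs, dt=1 / 30, q=1e-3, r=1e-2):
    F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])             # we observe position only
    Q = q * np.eye(2)                      # process noise covariance
    R = np.array([[r]])                    # measurement noise covariance
    x = np.array([[zs[0]], [0.0]])
    P = np.eye(2)
    smoothed = []
    for z in zs:
        x = F @ x                          # predict
        P = F @ P @ F.T + Q
        y = z - (H @ x)[0, 0]              # innovation
        S = (H @ P @ H.T + R)[0, 0]
        K = P @ H.T / S                    # Kalman gain
        x = x + K * y                      # update
        P = (np.eye(2) - K @ H) @ P
        smoothed.append(x[0, 0])
    return smoothed

print(kalman_smooth([0.0, 0.12, 0.09, 0.22, 0.18]))
```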

16 pages, 2221 KiB  
Article
Skeleton-Based Dynamic Hand Gesture Recognition Using an Enhanced Network with One-Shot Learning
by Chunyong Ma, Shengsheng Zhang, Anni Wang, Yongyang Qi and Ge Chen
Appl. Sci. 2020, 10(11), 3680; https://doi.org/10.3390/app10113680 - 26 May 2020
Cited by 23 | Viewed by 4798
Abstract
Dynamic hand gesture recognition based on one-shot learning requires full assimilation of the motion features from only a few annotated examples. However, how to effectively extract the spatio-temporal features of hand gestures remains a challenging issue. This paper proposes a skeleton-based dynamic hand gesture recognition method based on one-shot learning that uses an enhanced network (GREN) built by improving the memory-augmented neural network, which can rapidly assimilate the motion features of dynamic hand gestures. In addition, the network effectively combines and stores the shared features between dissimilar classes, which lowers the prediction error caused by unnecessary hyper-parameter updates and improves recognition accuracy as the number of categories increases. In this paper, the public dynamic hand gesture database (DHGD) is used for an experimental comparison with the state of the art: although only 30% of the dataset was used for training, the accuracy of skeleton-based dynamic hand gesture recognition based on one-shot learning reached 82.29%. Experiments with the Microsoft Research Asia (MSRA) hand gesture dataset verified the robustness of the GREN network. The experimental results demonstrate that the GREN network is feasible for skeleton-based dynamic hand gesture recognition based on one-shot learning. Full article
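
The GREN architecture is not reproduced here, but the one-shot matching principle it builds on can be sketched as nearest-neighbour classification over embeddings with a single stored support example per class; the embedding size and class names below are hypothetical.

```python
# One-shot classification by cosine similarity to per-class support
# embeddings; an illustrative sketch, not the GREN network itself.
import numpy as np

def one_shot_predict(support, query):
    """support: dict mapping class name -> embedding; query: embedding."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(support, key=lambda c: cos(support[c], query))

rng = np.random.default_rng(0)
support = {"swipe": rng.standard_normal(64), "pinch": rng.standard_normal(64)}
print(one_shot_predict(support, rng.standard_normal(64)))
```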

15 pages, 2324 KiB  
Article
Exploring Visual Perceptions of Spatial Information for Wayfinding in Virtual Reality Environments
by Ju Yeon Kim and Mi Jeong Kim
Appl. Sci. 2020, 10(10), 3461; https://doi.org/10.3390/app10103461 - 17 May 2020
Cited by 7 | Viewed by 3465
Abstract
Human cognitive processes in wayfinding may differ depending on the time taken to accept visual information in environments. This study investigated users’ wayfinding processes using eye-tracking experiments, simulating a complex cultural space to analyze human visual movements in the perception and the cognitive processes through visual perception responses. The experiment set-up consisted of several paths in COEX Mall, Seoul—from the entrance of the shopping mall Starfield to the Star Hall Library to the COEX Exhibition Hall—using visual stimuli created by virtual reality (four stimuli and a total of 60 seconds stimulation time). The participants in the environment were 24 undergraduate or graduate students, with an average age of 24.8 years. Participants’ visual perception processes were analyzed in terms of the clarity and the recognition of spatial information and the activation of gaze fixation on spatial information. That is, the analysis of the visual perception process was performed by extracting “conscious gaze perspective” data comprising more than 50 consecutive 200 ms continuous gaze fixations; “visual understanding perspective” data were also extracted for more than 300 ms of continuous gaze fixation. The results show that the methods for analyzing the gaze data may vary in terms of processing, analysis, and scope of the data depending on the purpose of the virtual reality experiments. Further, they demonstrate the importance of what purpose statements are given to the subject during the experiment and the possibility of a technical approach being used for the interpretation of spatial information. Full article
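
One plausible reading of the duration-based analysis is a simple threshold split of fixation events, as sketched below; the 200 ms and 300 ms thresholds come from the abstract, while the data layout and labels are assumptions.

```python
# Split fixation events by duration: >= 200 ms as "conscious gaze
# perspective" samples, >= 300 ms as "visual understanding perspective"
# samples. The (area, duration_ms) records are hypothetical.
fixations = [("exit_sign", 180), ("map_kiosk", 240), ("storefront", 350)]

conscious_gaze = [f for f in fixations if f[1] >= 200]
visual_understanding = [f for f in fixations if f[1] >= 300]
print(conscious_gaze)
print(visual_understanding)
```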

25 pages, 4863 KiB  
Article
Semi-Immersive Virtual Reality as a Tool to Improve Cognitive and Social Abilities in Preschool Children
by Maria Luisa Lorusso, Simona Travellini, Marisa Giorgetti, Paola Negrini, Gianluigi Reni and Emilia Biffi
Appl. Sci. 2020, 10(8), 2948; https://doi.org/10.3390/app10082948 - 24 Apr 2020
Cited by 16 | Viewed by 5961
Abstract
Virtual reality (VR) creates computer-generated virtual environments where users can experience and interact in a similar way as they would do in real life. VR systems are increasingly being used for rehabilitation goals, mainly with adults but also with children, extending their application to the educational field. This report concerns a study of the impact of a semi-immersive VR system on a group of 25 children in a kindergarten context. The children were involved in several different games and activity types, specifically developed with the aim of teaching specific skills and fostering team collaboration. Their reactions and behaviors were recorded by their teachers and by trained psychologists through observation grids addressing task comprehension, participation and enjoyment, interaction and cooperation, conflict, strategic behaviors, and adult-directed questions concerning the activity, the device or general help requests. The grids were compiled at the initial, intermediate and final timepoints during each session. The results show that the activities are easy to understand, enjoyable, and stimulate strategic behaviors, interaction and cooperation, while they do not elicit the need for many explanations. These results are discussed within a neuroconstructivist educational framework, and the suitability of semi-immersive, virtual-reality-based activities for cognitive empowerment and rehabilitation purposes is discussed. Full article

12 pages, 2077 KiB  
Article
FCN-Based 3D Reconstruction with Multi-Source Photometric Stereo
by Ruixin Wang, Xin Wang, Di He, Lei Wang and Ke Xu
Appl. Sci. 2020, 10(8), 2914; https://doi.org/10.3390/app10082914 - 23 Apr 2020
Cited by 3 | Viewed by 2308
Abstract
As a classical method widely used in 3D reconstruction tasks, the multi-source Photometric Stereo can obtain more accurate 3D reconstruction results compared with the basic Photometric Stereo, but its complex calibration and solution process reduces the efficiency of this algorithm. In this paper, we propose a multi-source Photometric Stereo 3D reconstruction method based on the fully convolutional network (FCN). We first represent the 3D shape of the object as a depth value corresponding to each pixel as the optimized object. After training in an end-to-end manner, our network can efficiently obtain 3D information on the object surface. In addition, we added two regularization constraints to the general loss function, which can effectively help the network to optimize. Under the same light source configuration, our method can obtain a higher accuracy than the classic multi-source Photometric Stereo. At the same time, our new loss function can help the deep learning method to get a more realistic 3D reconstruction result. We have also used our own real dataset to experimentally verify our method. The experimental results show that our method has a good effect on solving the main problems faced by the classical method. Full article
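
The two regularization terms are not spelled out in this listing, so the sketch below is only illustrative: an L1 data-fidelity term on the predicted depth map plus an assumed gradient-smoothness term and an assumed global-scale term.

```python
# Regularized depth-regression loss; an illustrative sketch under stated
# assumptions, not the paper's actual loss.
import numpy as np

def depth_loss(pred, target, lam1=0.1, lam2=0.01):
    data = np.mean(np.abs(pred - target))               # data fidelity
    dx = np.mean(np.abs(np.diff(pred, axis=-1)))        # horizontal gradient
    dy = np.mean(np.abs(np.diff(pred, axis=-2)))        # vertical gradient
    scale = abs(float(pred.mean() - target.mean()))     # global scale anchor
    return data + lam1 * (dx + dy) + lam2 * scale

rng = np.random.default_rng(0)
print(depth_loss(rng.random((64, 64)), rng.random((64, 64))))
```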

8 pages, 248 KiB  
Article
Is an ADHD Observation-Scale Based on DSM Criteria Able to Predict Performance in a Virtual Reality Continuous Performance Test?
by Débora Areces, Celestino Rodríguez, Trinidad García and Marisol Cueli
Appl. Sci. 2020, 10(7), 2409; https://doi.org/10.3390/app10072409 - 01 Apr 2020
Cited by 5 | Viewed by 2834
Abstract
The diagnosis of Attention Deficit/Hyperactivity Disorder (ADHD) requires an exhaustive and objective assessment in order to design an intervention that is adapted to the peculiarities of the patients. The present study aimed to determine whether the most commonly used ADHD observation scale, the Evaluation of Attention Deficit and Hyperactivity (EDAH) scale, is able to predict performance in a Continuous Performance Test based on Virtual Reality (VR-CPT). One hundred and fifty students (76% boys and 24% girls) aged 6 to 16 (M = 10.35; SD = 2.39) participated in the study. Regression analyses showed that only the EDAH subscale referring to inattention symptoms was a statistically significant predictor of performance in the VR-CPT. More specifically, this subscale showed 86.5% prediction accuracy for the Omissions variable, 80.4% for the Commissions variable, and 74.5% for the Response-time variable. The EDAH subscales referring to impulsivity and hyperactivity were not statistically significant predictors of any variables in the VR-CPT. Our findings may partially explain why the impulsive-hyperactive and combined presentations of ADHD might be considered unique and qualitatively different sub-categories of ADHD. These results also highlight the importance of measuring not only the observable behaviors of ADHD individuals, but also the scores that the patients themselves attain in performance tests. Full article
27 pages, 6065 KiB  
Article
Development and Assessment of a Sensor-Based Orientation and Positioning Approach for Decreasing Variation in Camera Viewpoints and Image Transformations at Construction Sites
by Mohsen Foroughi Sabzevar, Masoud Gheisari and James Lo
Appl. Sci. 2020, 10(7), 2305; https://doi.org/10.3390/app10072305 - 27 Mar 2020
Cited by 3 | Viewed by 2516
Abstract
Image matching techniques offer valuable opportunities for the construction industry. Image matching, a fundamental process in computer vision, is required for different purposes such as object and scene recognition, video data mining, reconstruction of three-dimensional (3D) objects, etc. During the image matching process, two images that are captured from a scene at random (i.e., from different positions and orientations) are compared using image matching algorithms in order to identify their similarity. However, this process is complex and error prone, because pictures captured from a scene at random vary in viewpoint, so main image features such as the position, orientation, and scale of objects are transformed. Sometimes, image matching algorithms cannot correctly identify the similarity between such images. Logically, if these features remain unchanged during the picture capturing process, then image transformations are reduced, similarity increases, and consequently the chances of algorithms successfully completing the image matching process increase. One way to improve these chances is to hold the camera at a fixed viewpoint. However, in messy, dusty, and temporary locations such as construction sites, holding the camera at a fixed viewpoint is not always feasible. Is there any way to repeat and retrieve the camera's viewpoints across different captures at locations such as construction sites? This study developed and evaluated an orientation and positioning approach that decreased the variation in camera viewpoints and image transformation on construction sites. The results showed that images captured while using this approach had less image transformation than images not captured using this approach. Full article

13 pages, 1875 KiB  
Article
Generative Adversarial Network for Image Super-Resolution Combining Texture Loss
by Yuning Jiang and Jinhua Li
Appl. Sci. 2020, 10(5), 1729; https://doi.org/10.3390/app10051729 - 03 Mar 2020
Cited by 15 | Viewed by 4158
Abstract
Objective: Super-resolution reconstruction is an increasingly important area in computer vision. To alleviate the problems that super-resolution reconstruction models based on generative adversarial networks are difficult to train and contain artifacts in their results, we propose a novel and improved algorithm. Methods: This paper presents the TSRGAN (Super-Resolution Generative Adversarial Network Combining Texture Loss) model, which is also based on generative adversarial networks. We redefined the generator and discriminator networks. Firstly, regarding the network structure, residual dense blocks without excess batch normalization layers were used to form the generator network, and the Visual Geometry Group (VGG)19 network was adopted as the basic framework of the discriminator network. Secondly, in the loss function, a weighted combination of four losses (texture loss, perceptual loss, adversarial loss and content loss) was used as the objective function of the generator. Texture loss was proposed to encourage local information matching. Perceptual loss was enhanced by computing it on features before the activation layer. Adversarial loss was optimized based on WGAN-GP (Wasserstein GAN with Gradient Penalty) theory. Content loss was used to ensure the accuracy of low-frequency information. During the optimization process, the target image information was reconstructed from both the high- and low-frequency perspectives. Results: The experimental results showed that our method raised the average Peak Signal-to-Noise Ratio of reconstructed images to 27.99 dB and the average Structural Similarity Index to 0.778 without losing too much speed, which was superior to other comparison algorithms in objective evaluation indices. Moreover, TSRGAN significantly improved subjective visual evaluations such as brightness information and texture details. It could generate images with more realistic textures and more accurate brightness, which were more in line with human visual evaluation. Conclusions: Our improvements to the network structure reduce the model's computation and stabilize the training direction. In addition, the loss function we present for the generator provides stronger supervision for restoring realistic textures and achieving brightness consistency. Experimental results prove the effectiveness and superiority of the TSRGAN algorithm. Full article
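
The generator objective described above is a weighted sum of four losses; a minimal sketch follows. The weights are purely illustrative assumptions, as the paper's actual values are not given in this listing.

```python
# Weighted four-term generator objective (content, perceptual, texture,
# adversarial); the weights below are assumptions, not the paper's values.
def generator_loss(l_content, l_perceptual, l_texture, l_adversarial,
                   w_content=1.0, w_perc=6e-3, w_tex=2e-6, w_adv=1e-3):
    return (w_content * l_content + w_perc * l_perceptual
            + w_tex * l_texture + w_adv * l_adversarial)

print(generator_loss(0.05, 12.0, 3.1e4, 0.7))
```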

11 pages, 8632 KiB  
Article
The Imperial Cathedral in Königslutter (Germany) as an Immersive Experience in Virtual Reality with Integrated 360° Panoramic Photography
by Alexander P. Walmsley and Thomas P. Kersten
Appl. Sci. 2020, 10(4), 1517; https://doi.org/10.3390/app10041517 - 23 Feb 2020
Cited by 44 | Viewed by 6121
Abstract
As virtual reality (VR) and the corresponding 3D documentation and modelling technologies evolve into increasingly powerful and established tools for numerous applications in architecture, monument preservation, conservation/restoration and the presentation of cultural heritage, new methods for creating information-rich interactive 3D environments are increasingly in demand. In this article, we describe the development of an immersive virtual reality application for the Imperial Cathedral in Königslutter, in which 360° panoramic photographs were integrated within the virtual environment as a novel and complementary form of visualization. The Imperial Cathedral (Kaiserdom) of Königslutter is one of the most important examples of Romanesque architecture north of the Alps. The Cathedral had previously been subjected to laser-scanning and recording with 360° panoramic photography by the Photogrammetry & Laser Scanning lab of HafenCity University Hamburg in 2010. With the recent rapid development of consumer VR technology, it was subsequently decided to investigate how these two data sources could be combined within an immersive VR application for tourism and for architectural heritage preservation. A specialised technical workflow was developed to build the virtual environment in Unreal Engine 4 (UE4) and integrate the panorama photographs so as to ensure the seamless integration of these two datasets. A simple mechanic was developed using the native UE4 node-based programming language to switch between these two modes of visualisation. Full article

14 pages, 9172 KiB  
Article
Semantic 3D Reconstruction with Learning MVS and 2D Segmentation of Aerial Images
by Zizhuang Wei, Yao Wang, Hongwei Yi, Yisong Chen and Guoping Wang
Appl. Sci. 2020, 10(4), 1275; https://doi.org/10.3390/app10041275 - 14 Feb 2020
Cited by 6 | Viewed by 4654
Abstract
Semantic modeling is a challenging task that has received widespread attention in recent years. With the help of mini Unmanned Aerial Vehicles (UAVs), multi-view high-resolution aerial images of large-scale scenes can be conveniently collected. In this paper, we propose a semantic Multi-View Stereo (MVS) method to reconstruct 3D semantic models from 2D images. Firstly, the 2D semantic probability distribution is obtained by a Convolutional Neural Network (CNN). Secondly, the calibrated camera poses are determined by Structure from Motion (SfM), while the depth maps are estimated by learning-based MVS. Combining 2D segmentation and 3D geometry information, dense point clouds with semantic labels are generated by a probability-based semantic fusion method. In the final stage, the coarse 3D semantic point cloud is optimized by both local and global refinements. By making full use of multi-view consistency, the proposed method efficiently produces a fine-level 3D semantic point cloud. The experimental result, evaluated by re-projection maps, achieves 88.4% Pixel Accuracy on the Urban Drone Dataset (UDD). In conclusion, our graph-based semantic fusion procedure and refinement based on local and global information can suppress and reduce the re-projection error. Full article
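
The probability-based fusion step can be sketched in a simplified form: sum the per-view class-probability vectors observed for each 3D point and take the argmax. This omits the paper's graph-based formulation and its local and global refinements.

```python
# Simplified probability-based semantic fusion across views; a sketch of the
# idea only, not the paper's graph-based procedure.
import numpy as np

def fuse_semantics(observations):
    """observations: list of (point_id, class_probability_vector) pairs."""
    acc = {}
    for pid, probs in observations:
        acc[pid] = acc.get(pid, 0) + np.asarray(probs)
    return {pid: int(np.argmax(p)) for pid, p in acc.items()}

obs = [(0, [0.7, 0.2, 0.1]), (0, [0.4, 0.5, 0.1]), (1, [0.1, 0.1, 0.8])]
print(fuse_semantics(obs))  # {0: 0, 1: 2}
```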

13 pages, 2724 KiB  
Article
Semantic 3D Reconstruction for Robotic Manipulators with an Eye-In-Hand Vision System
by Fusheng Zha, Yu Fu, Pengfei Wang, Wei Guo, Mantian Li, Xin Wang and Hegao Cai
Appl. Sci. 2020, 10(3), 1183; https://doi.org/10.3390/app10031183 - 10 Feb 2020
Cited by 9 | Viewed by 4177
Abstract
Three-dimensional reconstruction and semantic understanding have attracted extensive attention in recent years. However, current reconstruction techniques mainly target large-scale scenes, such as indoor environments or self-driving cars. There are few studies on small-scale, high-precision scene reconstruction for manipulator operation, which plays an essential role in decision-making and intelligent control systems. In this paper, a group of images captured from an eye-in-hand vision system carried on a robotic manipulator is segmented by deep learning and geometric features, and a semantic 3D reconstruction is created using a map stitching method. The results demonstrate that the quality of the segmented images and the precision of the semantic 3D reconstruction are effectively improved by our method. Full article

9 pages, 2373 KiB  
Article
3D Face Model Super-Resolution Based on Radial Curve Estimation
by Fan Zhang, Junli Zhao, Liang Wang and Fuqing Duan
Appl. Sci. 2020, 10(3), 1047; https://doi.org/10.3390/app10031047 - 05 Feb 2020
Cited by 2 | Viewed by 2102
Abstract
Consumer depth cameras bring about cheap and fast acquisition of 3D models. However, the precision and resolution of these consumer depth cameras cannot satisfy the requirements of some 3D face applications. In this paper, we present a super-resolution method for reconstructing a high resolution 3D face model from a low resolution 3D face model acquired from a consumer depth camera. We used a group of radial curves to represent a 3D face. For a given low resolution 3D face model, we first extracted radial curves on it, and then estimated their corresponding high resolution ones by radial curve matching, for which Dynamic Time Warping (DTW) was used. Finally, a reference high resolution 3D face model was deformed to generate a high resolution face model by using the radial curves as the constraining feature. We evaluated our method both qualitatively and quantitatively, and the experimental results validated our method. Full article
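
The curve-matching step relies on Dynamic Time Warping; the sketch below is the textbook DTW distance between two 1D sequences, not the authors' exact implementation for radial curves.

```python
# Textbook dynamic time warping (DTW) distance between two sequences.
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw([0.0, 1.0, 2.0], [0.0, 1.1, 1.9, 2.0]))
```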

13 pages, 13598 KiB  
Article
Projection-Based Augmented Reality Assistance for Manual Electronic Component Assembly Processes
by Marco Ojer, Hugo Alvarez, Ismael Serrano, Fátima A. Saiz, Iñigo Barandiaran, Daniel Aguinaga, Leire Querejeta and David Alejandro
Appl. Sci. 2020, 10(3), 796; https://doi.org/10.3390/app10030796 - 22 Jan 2020
Cited by 28 | Viewed by 4719
Abstract
Personalized production is moving the progress of industrial automation forward, and demanding new tools for improving the decision-making of the operators. This paper presents a new, projection-based augmented reality system for assisting operators during electronic component assembly processes. The paper describes both the hardware and software solutions, and depicts the results obtained during a usability test with the new system. Full article

12 pages, 1867 KiB  
Article
Automatic Lip Reading System Based on a Fusion Lightweight Neural Network with Raspberry Pi
by Jing Wen and Yuanyao Lu
Appl. Sci. 2019, 9(24), 5432; https://doi.org/10.3390/app9245432 - 11 Dec 2019
Cited by 4 | Viewed by 4507
Abstract
Virtual Reality (VR) is a kind of interactive experience technology. Human vision, hearing, expression, voice and even touch can be added to the interaction between humans and machines. Lip reading recognition is a new technology in the field of human-computer interaction with broad development prospects. It is particularly important in noisy environments and for the hearing-impaired population, and it uses visual information from a video to make up for the deficiency of voice information. This information is a visual language that benefits from Augmented Reality (AR). The purpose is to establish an efficient and convenient way of communication. However, the traditional lip reading recognition system places high demands on the running speed and performance of the equipment because of its long recognition process and large number of parameters, so it is difficult to meet the requirements of practical application. In this paper, a mobile lip-reading recognition system based on the Raspberry Pi is implemented for the first time. Our mobile lip-reading recognition system can be divided into three stages: First, we extract key frames from our own independent database, and then use a multi-task cascaded convolutional network (MTCNN) to align the face, so as to improve the accuracy of lip extraction. In the second stage, we use MobileNets to extract lip image features and long short-term memory (LSTM) to extract sequence information between key frames. Finally, we compare three lip reading models: (1) a fusion model of Bi-LSTM and AlexNet; (2) a fusion model with an attention mechanism; (3) the LSTM and MobileNets hybrid network model proposed by us. The results show that our model has fewer parameters and lower complexity. The accuracy of the model on the test dataset is 86.5%. Therefore, our mobile lip reading system is simpler and smaller than other PC platforms and saves computing resources and memory space. Full article
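
A hedged sketch of the MobileNets + LSTM stage is shown below using Keras: per-frame CNN features, a recurrent layer over key frames, then a word classifier. The frame size, number of key frames, and vocabulary size are assumptions, and pretrained weights are omitted so the sketch runs offline.

```python
# MobileNet-per-frame features + LSTM over key frames; an architectural
# sketch under stated assumptions, not the paper's exact model.
import tensorflow as tf
from tensorflow.keras import layers

cnn = tf.keras.applications.MobileNet(
    include_top=False, pooling="avg", weights=None, input_shape=(96, 96, 3))

model = tf.keras.Sequential([
    layers.Input(shape=(16, 96, 96, 3)),    # 16 key frames of the lip region
    layers.TimeDistributed(cnn),            # MobileNet features per frame
    layers.LSTM(128),                       # temporal modeling across frames
    layers.Dense(10, activation="softmax"), # hypothetical 10-word vocabulary
])
model.summary()
```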

16 pages, 2273 KiB  
Article
Design, Application and Effectiveness of an Innovative Augmented Reality Teaching Proposal through 3P Model
by Alejandro López-García, Pedro Miralles-Martínez and Javier Maquilón
Appl. Sci. 2019, 9(24), 5426; https://doi.org/10.3390/app9245426 - 11 Dec 2019
Cited by 10 | Viewed by 4118
Abstract
Augmented reality (AR) has evolved hand in hand with advances in technology, and today is considered as an emerging technique in its own right. The aim of our study was to analyze students’ perceptions of how useful AR is in the school environment. A non-experimental quantitative design was used in the form of a questionnaire in which 106 primary sixth-grade students from six schools in the Region of Murcia (Spain) participated. During the study, a teaching proposal using AR related to the content of some curricular areas was put forward in the framework of the 3P learning model. The participants’ perceptions of this technique were analyzed according to each variable, both overall and by gender, via a questionnaire of our own making, which had previously been validated by AR experts, analyzing its psychometric qualities. The initial results indicate that this technique is, according to the students, useful for teaching the curriculum. The conclusion is that AR can increase students’ motivation and enthusiasm while enhancing teaching and learning at the same time. Full article

12 pages, 4228 KiB  
Article
Aroma Release of Olfactory Displays Based on Audio-Visual Content
by Safaa Alraddadi, Fahad Alqurashi, Georgios Tsaramirsis, Amany Al Luhaybi and Seyed M. Buhari
Appl. Sci. 2019, 9(22), 4866; https://doi.org/10.3390/app9224866 - 14 Nov 2019
Cited by 10 | Viewed by 2664
Abstract
Variant approaches used to release scents in most recent olfactory displays rely on time for decision making. The applicability of such an approach is questionable in scenarios like video games or virtual reality applications, where the specific content is dynamic in nature and thus not known in advance. All of these elements are required to enhance the experience and involvement of the user while watching or participating virtually in 4D cinemas or fun parks associated with short films. Recently, associating the release of scents with the visual content of the scenario has been studied. This research enhances one such work by considering the auditory content along with the visual content. Minecraft, a computer game, was used to collect the necessary dataset of 1200 audio segments. The Inception v3 model was used to classify the sound and image datasets. Further ground-truth classification of this dataset resulted in four classes: grass, fire, thunder, and zombie. Accuracies of 91% and 94% were achieved using the transfer learning approach for the sound and image models, respectively. Full article
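
The transfer-learning setup can be sketched with a frozen Inception v3 backbone and a new four-class head, as below; the four classes follow the abstract, while the input size, optimizer, and frozen-backbone choice are illustrative assumptions.

```python
# Inception v3 transfer-learning sketch in Keras; assumptions as noted in
# the lead-in. Downloading ImageNet weights requires network access.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.InceptionV3(
    include_top=False, pooling="avg", weights="imagenet",
    input_shape=(299, 299, 3))
base.trainable = False                       # freeze pretrained features

model = tf.keras.Sequential([
    base,
    layers.Dense(4, activation="softmax"),   # grass / fire / thunder / zombie
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```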

14 pages, 14546 KiB  
Article
Construction Hazard Investigation Leveraging Object Anatomization on an Augmented Photoreality Platform
by Hai Chien Pham, Nhu-Ngoc Dao, Sungrae Cho, Phong Thanh Nguyen and Anh-Tuan Pham-Hang
Appl. Sci. 2019, 9(21), 4477; https://doi.org/10.3390/app9214477 - 23 Oct 2019
Cited by 27 | Viewed by 3522
Abstract
Hazard investigation education plays a crucial role in equipping students with adequate knowledge and skills to avoid or eliminate construction hazards at workplaces. With the emergence of various visualization technologies, virtual photoreality as well as 3D virtual reality have been adopted and proved advantageous in various educational disciplines. Despite the significant benefits of providing an engaging and immersive learning environment to promote construction education, recent research has also pointed out that virtual photoreality lacks 3D object anatomization tools to support learning, while 3D virtual reality cannot provide a real-world environment. In recent years, research efforts have studied these virtual reality applications separately, and there is a lack of research integrating the two technologies to overcome their limitations and maximize their advantages for enhancing learning outcomes. In this regard, this paper develops a construction hazard investigation system leveraging object anatomization on an Interactive Augmented Photoreality platform (iAPR). The proposed iAPR system integrates virtual photoreality with 3D virtual reality. The iAPR consists of three key learning modules, namely the Hazard Understanding Module (HUM), the Hazard Recognition Module (HRM), and the Safety Performance Module (SPM), which adopt the revised Bloom's taxonomy theory. A prototype is developed and evaluated objectively through interactive system trials with educators, construction professionals, and learners. The findings demonstrate that the iAPR platform provides effective pedagogic methods to improve learners' construction hazard investigation knowledge and skills, which improves safety performance. Full article

21 pages, 10609 KiB  
Article
Superpixel-Based Feature Tracking for Structure from Motion
by Mingwei Cao, Wei Jia, Zhihan Lv, Liping Zheng and Xiaoping Liu
Appl. Sci. 2019, 9(15), 2961; https://doi.org/10.3390/app9152961 - 24 Jul 2019
Cited by 2 | Viewed by 2768
Abstract
Feature tracking in image collections significantly affects the efficiency and accuracy of Structure from Motion (SfM). Insufficient correspondences may result in disconnected structures and incomplete components, while redundant correspondences containing incorrect ones may yield folded and superimposed structures. In this paper, we present a superpixel-based feature tracking method for structure from motion. In the proposed method, we first use a joint approach to detect local keypoints and compute descriptors. Second, a superpixel-based approach is used to generate labels for the input image. Third, we combine Speeded-Up Robust Features (SURF) and binary tests in the generated label regions to produce a set of combined descriptors for the detected keypoints. Fourth, locality-sensitive hashing (LSH)-based k-nearest-neighbor (KNN) matching is utilized to produce feature correspondences, and the ratio-test approach is then used to remove outliers from the matching collection. Finally, we conduct comprehensive experiments on several challenging benchmark datasets, including highly ambiguous and duplicated scenes. Experimental results show that the proposed method performs better than state-of-the-art methods. Full article
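
The superpixel-labeling step can be sketched with SLIC: segment the frame and tag each detected keypoint with the label of the superpixel containing it, so descriptors can then be combined per label region. ORB stands in here for the paper's SURF-plus-binary-test descriptor, and the image path is a placeholder; scikit-image and opencv-python are assumed.

```python
# Tag keypoints with SLIC superpixel labels; a sketch of the grouping idea,
# with ORB substituted for the paper's SURF + binary-test descriptor.
import cv2
from skimage.segmentation import slic

img = cv2.imread("frame.jpg")                         # placeholder input
labels = slic(img[:, :, ::-1], n_segments=300, compactness=10)  # BGR -> RGB

orb = cv2.ORB_create()
keypoints = orb.detect(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), None)
tagged = [(kp, int(labels[int(kp.pt[1]), int(kp.pt[0])])) for kp in keypoints]
print(f"{len(tagged)} keypoints tagged with superpixel labels")
```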

Review


17 pages, 2416 KiB  
Review
Analysis of the Productive, Structural, and Dynamic Development of Augmented Reality in Higher Education Research on the Web of Science
by Jesús López Belmonte, Antonio-José Moreno-Guerrero, Juan Antonio López Núñez and Santiago Pozo Sánchez
Appl. Sci. 2019, 9(24), 5306; https://doi.org/10.3390/app9245306 - 05 Dec 2019
Cited by 47 | Viewed by 3995
Abstract
Augmented reality is an emerging technology that has gained great relevance thanks to the benefits of its use in learning spaces. The present study focuses on determining the performance and scientific production of augmented reality in higher education (ARHE). A bibliometric methodology for scientific mapping has been used, based on processes of estimation, quantification, analytical tracking, and evaluation of scientific research, taking as its reference the analysis protocols included in the Preferred Reporting Items for Systematic reviews and Meta-analyses for Protocols (PRISMA-P) matrix. A total of 552 scientific publications on the Web of Science (WoS) have been analyzed. Our results show that scientific productions on ARHE are not abundant, tracing its beginnings to the year 1997, with its most productive period beginning in 2015. The most abundant studies are communications and articles (generally in English), with a wide thematic variety in which the bibliometric indicators “virtual environments” and “higher education” stand out. The main sources of origin are International Technology, Education and Development Conference (INTED) Proceedings and Education and New Learning Technologies (EDULEARN) Proceedings, although Spanish institutions are the most prolific. In conclusion, studies related to ARHE in the WoS have become increasingly abundant since ARHE’s research inception in 1997 (and especially since 2009), dealing with a wide thematic variety focused on “virtual environments” and “higher education”; abundant manuscripts are written in English (communications and articles) and originate from Spanish institutions. The main limitation of the study is that the results only reveal the status of this issue in the WoS database. Full article
