1. Introduction
Recently, there has been a progressive shift in virtual space, and technologies such as Mixed Reality (MR) and Extended Reality (XR) have promised cyber users the next level of visual interaction. These technologies allow a close interaction between the virtual and the physical space and have opened up innovation and industrial opportunities. This shift is possible due to technical advancements in computer vision, Three-Dimensional (3D) rendering engines, and access to Graphics Processing Units (GPUs). XR is increasingly deployed in mainstream applications such as gaming, infotainment services, healthcare, and 3D avatar designs [
1]. With technical advancements in deep learning, responsive analytics, and security integration over open communication channels, the barriers between the physical and virtual worlds are fast diminishing. This integration has uplifted the Metaverse, a term that was coined to represent the virtual world, or space, where humans, applications, processes, and services interact and affect one another. The prefix Meta is a Greek word meaning beyond, and verse stands for universe. The term “Metaverse” was coined in 1992 by Neal Stephenson in his novel Snow Crash.
The Metaverse is closely related to cyberspace and defines user interaction between real and digital worlds. The Metaverse is defined as a holistic space in the virtual world, where multiple worlds can also intermix, and the concept is often called the Omniverse [
2]. The Metaverse’s initial prominence rose via
Fortnite, which is accessible via VR headsets and allows users to create, buy, and sell objects. However, the term
Meta attracted people’s attention in October 2021 when Facebook rebranded itself as
Meta [
3]. In the same period, immediately after this rebranding was announced, another technical giant, Microsoft, also announced its vision and entry into the Metaverse market. Their vision is to build an environment where people not only interact socially with others, but can create a working space and use it. They called this
Mesh, where people come to their virtual offices and interact and discuss their meeting agendas in a shared space. The usage of Mesh for Microsoft Teams has increased significantly owing to the global pandemic. Productivity experts’ analysis has revealed that the virtual meeting space allows a high level of interaction among remote workers and is a positive tool for building relationships. The engagement levels have increased significantly, which has improved the working dynamics. Seeing the positive aspects, Microsoft shifted towards a merger with Activision Blizzard Inc. [
4], and the vision is to accelerate the development of Metaverse gaming platforms across a wide array of mobile platforms and personal computers. The merger was valued at an estimated USD 68.7 billion. These developments have propelled the Metaverse and its social communities toward being omnipresent and accessible from the comfort of home. With the Metaverse being developed, the Internet will create communities and facilitate experiences that exist alongside physical reality [
5]. Another joint venture by Microsoft, Meta, and Accenture is in deployment. Meta Quest Pro has been set up to bring real immersive experiences during Mesh meetings or socializing with other persons (via avatars or real forms) [
6]. The venture is working toward developing high-resolution sensor-based LCD designs that track eye movements and capture facial expressions to make the avatars more realistic and lively. Currently, the deployment is in VR and supports various applications such as Microsoft Teams, Office 365, Microsoft Intune, and Azure Active Directory setups.
The Metaverse will have a device-agnostic and cross-platform experience [
7], meaning that it will be accessible regardless of which devices are used and will support open, decentralized applications. At first glance, the Metaverse might come across as a fantasy world. The idea of interacting in a multi-dimensional environment has encouraged the development of new technologies to synergize and augment its functionality. Technically, three key components will build the Metaverse, namely the hardware resources (GPU and I/O), base communication networks such as Sixth-Generation (6G) networks, and artificial-intelligence (AI)-enabled models [
8,
9]. It operates on the underlying Web 3.0 technology, which is intelligent and fully decentralized, supports transactional payments via Blockchains (BC), and is supported via Non-Fungible-Token (NFT)-enabled cryptocurrencies. Web 3.0 allows the semantic interpretation of context and lets users interact with machines more intelligently. The Web 3.0 front end will support tailored Smart Contracts (SCs) or chain codes to automate transactions on Decentralized Applications (DApps). It will also support custom consensus protocols that drive business logic across industry verticals.
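As a rough illustration of how an SC can automate a transaction between two parties, the following Python sketch models an escrow-style agreement as a small state machine. It is not code for any real blockchain platform: the class `EscrowContract`, its method names, and the in-memory `balances` dictionary are hypothetical stand-ins for on-chain state, chosen only to show the idea of funds moving when agreed conditions are met.

```python
from dataclasses import dataclass, field


@dataclass
class EscrowContract:
    """Toy escrow agreement; every name here is an illustrative assumption."""
    buyer: str
    seller: str
    price: int
    balances: dict = field(default_factory=dict)  # stand-in for on-chain wallets
    state: str = "CREATED"
    held: int = 0

    def deposit(self) -> None:
        # The buyer locks the agreed price inside the contract.
        if self.state != "CREATED":
            raise RuntimeError("deposit not allowed in state " + self.state)
        if self.balances.get(self.buyer, 0) < self.price:
            raise ValueError("insufficient funds")
        self.balances[self.buyer] -= self.price
        self.held = self.price
        self.state = "FUNDED"

    def confirm_delivery(self) -> None:
        # Once delivery is confirmed, the locked funds go to the seller
        # without an intermediary acting manually.
        if self.state != "FUNDED":
            raise RuntimeError("nothing to release in state " + self.state)
        self.balances[self.seller] = self.balances.get(self.seller, 0) + self.held
        self.held = 0
        self.state = "COMPLETED"


def run_demo() -> dict:
    """Walk one purchase through the contract and return the final wallets."""
    wallets = {"alice": 100, "bob": 0}
    contract = EscrowContract(buyer="alice", seller="bob", price=40, balances=wallets)
    contract.deposit()
    contract.confirm_delivery()
    return wallets
```

The state checks are the point of the sketch: each step is only valid in one state, so the order of the transaction is enforced by the contract logic rather than by trust between the parties.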
On the industrial front, companies such as Meta (previously Facebook) are working on devices such as the
Oculus Quest to bring the immersive vision a step closer. Games such as
Beat Saber and
GTA San Andreas have already shifted to the Metaverse, and these games allow players to interact immersively and also allow objects to be traded online [
10]. The scope is currently limited mostly to gaming, but is expected to expand soon. The fitness sector will also transform with the advent of the Metaverse, and virtual outdoor activities such as body exercises, gym training, and 3D meetings will be conducted [
11]. It also has the potential to revolutionize the education sector as students can study in virtual classrooms that will feel like a physical classroom with co-students [
12]. It will also support industry applications such as digital twins to model real-time industry processes [
13]. In healthcare [
14], the Metaverse will support telesurgery and 3D anatomy, bringing new insights and significant developments to healthcare.
Table 1 presents the list of abbreviations and their descriptions.
In real-world setups, tech companies are integrating many avant-garde technologies to realize the vision of the Future Internet. Some of these technologies include AR and VR, 3D virtual space, BC, holographic projection (WIMI six-dimensional light field technology), 6G-enabled holographic communication, AI models (with explainable modules for interpretability), massive content-centric Internet-of-Things (mC-IoT), and Web 3.0. Spark AR, a studio tool, was launched by Meta in 2017 to enable users to create their AR effects [
15]. AI aids the Metaverse experience by providing features such as face tracking, object detection, segmentation, sentiment recognition, voice control, and others. Low latency and higher bandwidth are also critical for smooth interactions in the Metaverse. Service innovation in 6G technologies would be crucial to provide real-time Quality-of-Experience (QoE) to the users [
16]. The economy of the Metaverse is based on trading and ownership. Cryptocurrencies are for trading, while the ownership transfer is managed via NFTs [
17]. Thus, it will be the Future Internet, where all sorts of physical activities will be shifted to a digital avatar. This will allow the world to become a global digital community and bridge the gap in trading, socializing, and interacting without boundaries, increasing the connectivity of a user at the remotest end.
Table 2 presents projects initiated in key domains to assist the Metaverse by different industry giants. The key focus is to drive applications toward digital platforms that assist users intelligently in diving into a seamless space and trade effectively on decentralized applications.
In the Metaverse, some popular use cases on real estate management, e-skills training, and online shopping experience through AR and VR are already operational. For example, in Decentraland [
18], a virtual ecosystem is presented to persons who wish to buy and sell land properties. Virtual tours are conducted, and the transactional aspects are handled through NFT assets. In the education sector [
34], technologies such as XR and Metaverse projection will greatly improve e-skills training and development, as the gap between theory and practical visualization will be reduced. Furthermore, the decentralization of the Metaverse using Web 3.0 will allow customers to buy goods online from another country and have them shipped to their digital home [
35]. Hence, the Metaverse will enable people to trade irrespective of their geographical location, making the world a huge global community. The Metaverse is potentially a disruption that will revolutionize different industrial and application domains.
1.1. Market Trends and Research Activities
Recent innovations in AR and VR have boosted the global economy. As per a report by
Crunchbase News [
36], USD 1.9 billion of venture capital is being invested in software and hardware to support the Metaverse. It can potentially boost the global economy by
trillion jobs. Owing to the global pandemic, the Metaverse’s progress has accelerated four-fold. NFT purchases rose to USD
billion during the novel Coronavirus Disease-2019 (COVID-19) global second peak around May–August 2021. As per the report by
Influencer Marketing Hub, the revenue generated by companies that have adopted the Metaverse as a key technology is expected to reach a staggering figure of USD 800 billion by 2024 [
37], which resonates with the increase in the AR and VR markets’ revenue, which is predicted to reach ≈USD 300 billion by the same year.
On the research front, industry–academia collaborative grants are offered.
Reality Labs has granted 17 live project grants amounting to CAD 510,000 to design effective hardware and GPU resources to support the Metaverse [
38]. Meta has also announced an investment of USD 2.5 million for independent academic research in Europe for the Metaverse. Comprehensively, the global Metaverse market is expected to reach USD 1607.12 billion by 2030, at a Compounded Annual Growth Rate (CAGR) of
%. Object recognition, tracking, and AI-assisted AR and VR projects have started to support motion assistance. Explainable AI models are built in such scenarios to ensure fairness and interpretability. In smart manufacturing, project
XMANAI has demonstrated four real-life manufacturing cases, where explainable models are built for collecting the data of human-in-the-loop over AI models [
39]. Likewise,
TikTok purchased a Chinese VR headset manufacturer named
Pico [
40]. On gaming platforms, games such as
Roblox and
Fortnite are designed with trading and immersive interactions with peer gamers [
41].
Walmart is expected to launch a smart VR-assisted online shopping platform that lets buyers try on products virtually. Microsoft HoloLens [
42] enables medical professionals to collaborate on a surgical operation using hand gestures and voice commands. NVIDIA recently launched the
Omniverse Avatar [
43], a platform to create real-time self-driven AI avatars that can see, talk, and converse about various topics.
Figure 1 presents the article’s overall organization and reading map.
1.2. Survey Contributions
The survey contributions of the article are summarized as follows:
A generic Metaverse architecture is proposed with a functional perspective to support the Future Internet use-cases in industrial applications such as healthcare, finance, infotainment and gaming, and vehicular networks. The architecture presents Web 3.0 as an engineering frontend and communicates with applications through 6G-assisted virtual service endpoints.
The open challenges and future directions of the Metaverse in terms of cryptocurrency, communication, and AI aspects are presented.
The effectiveness of the proposed scheme is demonstrated through a case study on Metaverse-assisted real estate management, in which we compare our scheme with the traditional SC-based Buyer–Broker–Seller (BBS) architecture in terms of parameters such as realization cost, trust probability, and the number of transactions (to address the scalability of diverse connections).
1.3. Article Structure
The article is organized as follows.
Section 2 presents the survey’s review process.
Section 3 presents the existing surveys on the Metaverse and highlights the necessity of the proposed survey, where existing gaps are presented.
Section 4 presents the background of key assistive Metaverse technologies.
Section 5 presents the generic Metaverse architecture from a functional viewpoint.
Section 6 presents the integration of the Metaverse with different industrial applications.
Section 7 discusses the open issues and future directions.
Section 8 discusses the proposed case study, with an experimental evaluation against an SC-enabled BBS architecture.
Section 9 presents the discussion of the Metaverse’s realization and lessons learned in the Metaverse’s deployment in real-world scenarios. Finally,
Section 10 presents the concluding remarks and the future scope of work.
2. The Review Method
This section discusses the review selection process and presents the research questions that frame and outline the survey. The details are presented as follows.
2.1. The Review and Article Selection Process
Figure 2 presents the article selection and the inclusion–exclusion process. The selection process followed the guidelines by Brereton et al. [
44], who outlined the pre-process to be followed for conducting a survey.
We started by selecting articles explicitly related to the Metaverse, and we found that the related literature is specific to two years, i.e., 2021 and 2022. Articles published before 2021 were not selected, as they did not directly link to the Metaverse and focused on essential discussions of NFTs, AR and VR realization, and the basics of web applications. This shows an absence of Metaverse-focused articles in the literature databases before 2021. Initially, we used search strings such as “the Metaverse”, “AR and VR in Industry”, “Web 3.0, Modern Internet”, “Metaverse, Metaverse and industry”, “Web 3.0 and NFTs”, “Blockchain and Metaverse”, “Metaverse and networks”, “Metaverse and 5G”, and “Metaverse and 6G” to search the titles, abstracts, and article bodies. Some useful links to projects, key conferences in computer vision, and invited talks by leading experts were found. We referred to the databases IEEE Xplore, ACM Library, Springer, and ScienceDirect to find the relevant papers. In the first attempt, we found 233 articles that fit our scope, but we rejected 36 articles due to misleading titles. Next, we excluded 32 more articles based on the relevance of their abstracts and conclusions to our requirements. This narrowed the pool to 165 articles, from which we excluded 28 more based on the full text. This left us with only relevant articles to support the survey. However, some articles still had studies in common with web links/portals, which led us to exclude 26 more articles. Finally, we present the 109 articles/web links/talks as part of the survey’s references.
2.2. The Research Questions
Before starting the survey, the authors brainstormed and presented the research questions to highlight the survey objectives.
Table 3 presents the identified research questions and the objectives they were expected to meet through the proposed survey. The research questions are presented keeping in mind the changing technical landscape of Web 3.0 and its potential to support the Metaverse. Furthermore, technical advancements in communication technologies such as 5G and Beyond networks to support the smooth functioning of the Metaverse avatars were considered. Finally, we focused on a holistic core for the Metaverse to support industrial applications with a generic reference architecture.
3. Related Work
This section provides the surveys related to the Metaverse and its assistive technologies.
Table 4 presents a comparative study of existing works with the proposed survey. The section is designed to achieve the objectives of RQ 1. In AR and VR and the networking domain, Bhattacharya et al. [
1] proposed a survey that integrated 6G and BC to support AR and VR environments. The work presented a solution taxonomy concerning AR and VR from a communication and security point of view in different industry verticals. The survey did not address resource support or the role of AI in AR and VR applications. Lee et al. [
45] presented a comprehensive review of the Metaverse’s development in user-centric designs. The latest technologies to build the Metaverse and the potential pitfalls were presented. In the industrial domain, Far et al. [
46] presented an applicative digital twin for the Metaverse and proposed a three-tier architecture that links IoT objects and the Metaverse. Finally, a set of challenges, solutions, and future work was discussed for integrating digital twins with the Metaverse environment. Xu et al. [
47] highlighted the architecture, state-of-the-art implementations, solution taxonomy, and frameworks currently utilized in the Metaverse environment. The work also focused on using BC-based edge networks for cloud–edge and computation-efficient AI techniques in resource-constrained networks. Yang et al. [
48] presented the economic assets in the Metaverse implementation and discussed the fusion of BC and AI to store distributed AI models of the Metaverse, with potential directions for security and data collection. Luca Turchet [
49] proposed a survey on the potential of a musical Metaverse, which sets up exciting opportunities in the Internet-of-Musical-Things eco-space. An integrative XR-related platform was suggested, and for music ownership transfer, the NFT assets of artists were considered. Special attention to digital rights was also given, and recommendation models were presented for the music choice of the other user. However, privacy and ethical concerns were not discussed. The authors in [
50] presented an opinion-mining scheme for NFT transfers in the education industry and suggested a model of thinking, making, sharing, and improving that enhances Metaverse-assisted education. Zhang et al. [
51] presented a healthcare framework in the Metaverse, where the healthcare data are stored on public servers and are analyzed for predictions. To respect the privacy of healthcare records, attribute-based encryption was presented with minimal overheads to support a large number of concurrent client requests. A simulation was performed on the PBC library for scalability and latency, and the model showed better performance with a good level of security.
In the tourism sector, Dongying Wei [
55] presented a novel architecture,
Gemiverse, which integrates BC-enabled professional tourism certification and a travel platform. The developed platform includes online travel, gameplay, learn-and-earn, and long-term growth in business through the support of mobile applications (via iOS and Android) with real-time UX support. Chu et al. [
58] proposed
Metaslicing, a multi-tier semi-Markov framework for efficient resource allocation in the Metaverse. The scheme implements two techniques and an algorithm for optimal policy decisions to meet the Metaverse’s dynamics and uncertainty of resources. The results showed a long-term revenue of up to 80% for the Metaverse providers. The authors in [
56] proposed a federated learning and BC-based framework for the industrial Metaverse. To enhance privacy, they presented an Age-of-Information (AoI) metric, which defines the age of shared contracts between multiple industrial nodes. Alpala et al. [
54] presented a VR-based Metaverse framework for Industry 4.0. The work also discussed a case study for VR in the Metaverse concerning major applications in Industry 4.0. In the Metaverse, high-quality videos are presented, but captioning is a crucial challenge. In the same regard, Yan et al. [
62] presented a framework named Global–Local Representation Granularity (GL-RG), which generates rich video captions from high-quality video frames. The model exploits a global–local encoder, which has a large vocabulary and is supported by temporal representations, to generate accurate video captions. Long-range frames are considered to determine the spatiotemporal relations, and short-range sequences are used to capture user motion; thus, fine-grained details are preserved. An incremental training strategy was used, consisting of a seeding phase and a cross-entropy phase, and it eliminates the limitations of annotation marking through a discriminative reinforcement learning model. The approach was evaluated on the MSR-VTT and MSVD datasets, and the model achieved significantly high accuracy and precision.
In mobile communication networks, Khan et al. [
63] presented the importance of the Metaverse towards improvements of wireless network technology and mobile services. The work devised a two-tier architecture and highlighted the requirements and components required to address Metaverse-enabled wireless networks. The work also presented a case example for healthcare and a live support system in an Industry 4.0 context. Finally, the authors discussed the open challenges and future directions of the Metaverse-enabled wireless systems to analyze and construct an end-to-end system to unlock their full potential. Braud et al. [
64] emphasized the utilization of large-scale and persistent AR at schools or universities for a Metaverse environment. The work presented a layered architecture for a single- and multi-user environment to provide real-time content sharing and creation.
Survey Necessities and Gaps
Recent Metaverse surveys have been focused mostly on the AI control, AR and VR rendering, and crypto-markets of the Metaverse. In the manufacturing sector, the surveys and case studies propose digital twins in the Metaverse landscape. Solution taxonomies, architectures, and frameworks have been presented in these surveys. The recent surveys indicate a limited exploration of the utilization of the Metaverse in different industrial verticals. As the Metaverse’s realization is in its early stages, there is a necessity for a survey that studies the Metaverse as a potential candidate to build the Future Internet and presents a generic and functional view of the Metaverse in terms of finance, applications, and its interaction with web platforms.
Thus, this survey fills in these existing gaps and presents the critical aspects (the technology, key drivers, and functional architecture) to shape the Future Internet. A functional perspective of the Metaverse is presented in industrial applications, with challenges in actual realization. The proposed survey also highlights the future of the Metaverse and the emerging technologies required to bridge the gap between theory and potential realization. We also highlight a case study for real estate management in the Metaverse and compare the Metaverse real estate tour with the traditional physical tour. Thus, the article builds and connects the dots of the Metaverse components and demonstrates its practicality in interacting with the existing technologies. We believe the Future Internet will integrate the Metaverse with key technologies and fuel a revolution in user experience ecosystems.
4. Nuts and Bolts of the Future Internet: The Metaverse’s Key Drivers
The concept of the Metaverse has undergone several technological advancements and iterations; at each iteration, it has evolved and provided an immersive experience closer to reality with the help of fast Internet and 3D rendering technology. There are many technologies involved in the Metaverse. Driven by competition to produce novel solutions, many organizations have kick-started development towards the Metaverse. Thus, the technologies that build up the Metaverse, such as wearable devices, XR, NFTs, and BC, are also gaining momentum. With the advancements in wireless communication infrastructures, affordable AR, VR, and edge computing have matured and are supported by the low latency and high bandwidth of 5G and Beyond networks [
65]. The section aims to address the objectives of RQ 2 through a rational discussion on the Metaverse’s components.
4.1. The Timeline of the Evolution of Metaverse Assistive Technologies
The subsection highlights the evolution of key technologies that are building up the Metaverse. The timeline addresses the objectives of RQ 3 by unfolding the Metaverse’s progression.
Figure 3 presents the timeline of the key Metaverse technologies.
In the early 1980s, the term VR was coined by Jaron Lanier. In collaboration with Thomas Zimmerman, he invented the data glove. In 1990, Tim Berners-Lee invented the World Wide Web. At around the same time, the term AR was coined by Tom Caudell, followed by Paul Milgram and Fumio Kishino, who proposed the reality–virtuality continuum [
66], which presents a scale between the physical and virtual worlds. The real environment is at one end of the scale, and a completely virtual environment is at the other. Thus, the continuum covers all possible projections and compositions of objects in real and virtual spaces. On the networking front, Third-Generation (3G) wireless IP handsets built on the Third-Generation Partnership Project specifications were launched by NTT DoCoMo in 2001. In 2009, two breakthrough developments occurred. First, wireless networking shifted towards Fourth-Generation (4G) Long-Term Evolution (LTE) networks, which brought visible improvements in browsing speed over High-Speed Packet Access (HSPA+) networks, which allowed speeds of up to 42 Mbps [
67]. Furthermore, in 2009, Satoshi Nakamoto invented Bitcoin, which laid the foundation for the BC cryptocurrency (Blockchain 1.0) [
68]. In 2015, Blockchain 2.0 emerged, shifting from cryptocurrency-based primitives towards Smart Contracts (SCs), which paved the way for automated agreements via digital contracts between autonomous parties, where funds are transferred between user wallets. Earlier, in 2014, Kevin McCoy and Anil Dash developed NFTs as financial assets. This allowed financial security for distributed transactions and digital assets stored on the BC, as the ownership of NFTs is recorded on the BC ledger. NFTs are typically digitally signed references and are uniquely identifiable. Cryptocurrency is a digital form of currency with economic value whose units are fungible (mutually interchangeable), whereas NFTs are not mutually interchangeable.
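The distinction between fungible cryptocurrency units and uniquely identifiable NFTs can be made concrete with a small data-structure sketch. The two classes below are hypothetical, in-memory illustrations (not the interface of any real token standard or ledger): a fungible ledger only needs a balance per owner, whereas a non-fungible registry must record an owner for each individual token identifier.

```python
class FungibleLedger:
    """Balances only: any unit is interchangeable with any other unit."""

    def __init__(self):
        self.balances = {}

    def mint(self, owner, amount):
        self.balances[owner] = self.balances.get(owner, 0) + amount

    def transfer(self, sender, receiver, amount):
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount


class NonFungibleRegistry:
    """One owner per unique token ID: tokens are not interchangeable."""

    def __init__(self):
        self.owner_of = {}

    def mint(self, token_id, owner):
        # Uniqueness is enforced at creation time.
        if token_id in self.owner_of:
            raise ValueError("token ID already exists")
        self.owner_of[token_id] = owner

    def transfer(self, token_id, sender, receiver):
        # Only the recorded owner may hand over the token.
        if self.owner_of.get(token_id) != sender:
            raise PermissionError("sender does not own this token")
        self.owner_of[token_id] = receiver


coins = FungibleLedger()
coins.mint("alice", 10)
coins.transfer("alice", "bob", 3)        # any 3 units will do

deeds = NonFungibleRegistry()
deeds.mint("land-42", "alice")           # "land-42" is a made-up token ID
deeds.transfer("land-42", "alice", "bob")
```

In this sketch a transfer of currency names only an amount, while a transfer of an NFT names one specific token, which is what allows the ledger to act as a record of ownership for a particular asset.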
In 2016, with the evolution of Decentralized Applications (DApps), we shifted towards Blockchain 3.0, which refines its predecessor’s data storing and structuring capabilities to leverage decentralized transactions on the BC with effective consensus protocols. This marked the adoption of BC as a mainstream technology in different industrial verticals, including the IoT, healthcare, edge networks, supply chain logistics, and others. In 2019, the deployment of Fifth-Generation (5G) networks changed the operational landscape of 4G-LTE networks. 5G is designed over a new radio interface and uses higher frequencies (28 GHz) than 4G-LTE (2500 MHz). It supports the millimeter wave (mmWave) spectrum and has high coverage over 4G. In antenna design, it supports massive Multiple-Input, Multiple-Output (mMIMO), which combines small cellular antennas to improve throughput and supports ultra-high-quality video streaming and control [
69].
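To see why the higher 5G band is associated with the term millimeter wave, the free-space wavelength can be computed from the carrier frequency as wavelength = c / f. The short sketch below applies this relation to the two frequencies mentioned above; the function name `wavelength_mm` is introduced here only for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def wavelength_mm(frequency_hz: float) -> float:
    """Free-space wavelength in millimetres for a carrier frequency in Hz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz * 1000.0


for label, freq in [("4G-LTE, 2500 MHz", 2.5e9), ("5G, 28 GHz", 28e9)]:
    print(f"{label}: {wavelength_mm(freq):.1f} mm")
```

The 2500 MHz carrier has a wavelength of roughly 120 mm, while the 28 GHz carrier is close to 10.7 mm, i.e., on the order of millimeters. Shorter wavelengths also allow physically smaller antenna elements, which is consistent with the use of many small antennas in mMIMO.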
The developments in web technology also accelerated, which led to Web 2.0, which differed from the traditional WWW (or Web 1.0) in terms of dynamic content, sharing, and user interaction. The web consists of structured and unstructured data, and several natural language models are used to extract the structured data and sequence them. However, extracting meaningful information from the unstructured web is challenging. Researchers are working on a web page transformer model to extract the web resources. In a similar direction, Wang et al. [
70] proposed a framework, named
WebFormer, which presents a web page transformer model, where the document’s structural information is extracted for semantic interpretation. The authors encoded the HTML, content field, and text information into a unified transformer for language uniformity. Initially, HTML tokens were introduced, and attention-rich patterns were captured. Based on the web layout and language structure, the model showed promising results over long textual sequences via a rich attention mechanism. Currently, we are progressing quickly towards the design of Web 3.0, which will be content-interpretable, dynamic, and semantic. Furthermore, the emerging 5G and Beyond networks and 6G will involve AI-based radio, with increased cognitive and computational capabilities over 5G. This will support real-time XR interaction in Web 3.0, allowing the Metaverse to function smoothly. Peak data rates of 1–10 Tbps, with a 0.1 ms latency, are envisioned in the 6G radio design. In the BC generation, we will witness Blockchain 4.0, called web-scalable BC. It will mostly focus on addressing the issues of scalability, power consumption, user experience, and latency through low-powered and customizable consensus protocols. This will be an effective fit for massive IoT networks and work closely with Web 3.0. However, the potential realization is still in the early stages.
4.2. NFTs, BCs, and Cryptocurrency
BC is a shared, distributed, and immutable digital ledger that enables the secure transfer of assets and digital currencies between distributed and autonomous stakeholders. BC technology allows decentralized solutions for creating digital evidence of ownership, trading, and interoperability. Cryptocurrency and NFTs are both governed by BC technology. However, they differ from each other in several aspects. Cryptocurrency is a fungible digital currency that is nearly impossible to counterfeit or double-spend. Cryptocurrencies enable people in the Metaverse to buy or sell digital assets such as virtual land, offices, homes, NFTs, and other services. Hence, cryptocurrency connects the physical world to the virtual world commercially. On the other hand, NFTs are non-fungible digital assets, meaning that each NFT is distinct from every other one. They are not interchangeable and act as proof of ownership for any digital asset, such as digital art, signatures, in-game avatars, and other object forms such as images, audio, and videos. NFTs are characterized by the fact that they are one-of-a-kind, indivisible, and unchangeable. Cryptocurrency and NFTs both serve as key concepts in the Metaverse. While the former plays a crucial role in trading digital assets, the latter is essential in establishing the true ownership of those assets. Therefore, cryptocurrency and NFTs are the Metaverse’s cornerstones in the commercial sector. In terms of value proposition, blockchain offers a comprehensive framework to support Metaverse-assisted digital platforms and content creation. Furthermore, the decentralized nature of blockchain allows easy management of data migration, which makes it a good fit for fintech-related applications [
71]. For supply chain management problems, researchers have applied Newsvendor models, which support the operational and inventory management of transactional goods when the demand for asset transfers over the Metaverse is uncertain [
72]. This will allow seamless control over financial payment tracking and add economic value to the supply chain.
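For readers unfamiliar with the Newsvendor model, its classical single-period form chooses an order quantity q* that balances the cost of ordering too little against the cost of ordering too much: q* = F^-1(c_u / (c_u + c_o)), where F is the demand distribution, c_u is the per-unit underage cost, and c_o is the per-unit overage cost. The sketch below is a minimal textbook instance, not the formulation used in the cited work; the normal demand distribution and all parameter names are assumptions made for illustration.

```python
from statistics import NormalDist


def newsvendor_quantity(price, cost, salvage, mean_demand, std_demand):
    """Optimal single-period order quantity, assuming normally distributed demand."""
    if not (price > cost > salvage):
        raise ValueError("expected price > cost > salvage")
    underage = price - cost      # profit lost per unit of unmet demand
    overage = cost - salvage     # loss per unit left unsold
    critical_ratio = underage / (underage + overage)
    # The optimal quantity is the demand quantile at the critical ratio.
    return NormalDist(mean_demand, std_demand).inv_cdf(critical_ratio)


# Hypothetical numbers: when underage and overage costs are equal,
# the critical ratio is 0.5 and the optimal quantity is the mean demand.
balanced = newsvendor_quantity(price=10, cost=5, salvage=0,
                               mean_demand=100, std_demand=20)
# A higher margin makes stock-outs costlier, so the quantity rises above the mean.
high_margin = newsvendor_quantity(price=10, cost=2, salvage=0,
                                  mean_demand=100, std_demand=20)
```

The greater the uncertainty in demand (a larger `std_demand`), the further the optimal quantity moves away from the mean in whichever direction the cost asymmetry favors.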
4.3. Immersive Experience: Extended Reality
XR is an umbrella term that encompasses AR, VR, MR, and the combined virtual and real-world surroundings created by computers. AR and VR are currently the gateways to the Metaverse. VR uses computer-generated visuals and images to create an entirely virtual 3D environment. VR headsets and various sensors can be used to interact with the virtual space. In the Metaverse, VR is used to build an entire ecosystem (multiverse), where users can interact with each other and with objects in the space. However, one feature that VR lacks is the capacity to view customized objects in the actual world. This can be accomplished through the use of AR. AR is a technology that casts computer-generated graphical content into the real world, where it ostensibly interacts with the surrounding elements. AR can be directly accessed from most mobile phones or digital devices with built-in cameras. The game Pokémon GO is a well-known commercial example of AR. However, the drawback of AR is that it does not recognize real-world objects in the environment. Hence, communication between real objects and augmented objects is not possible.
MR [
73] straddles the boundaries between the physical and virtual worlds. MR attempts to create an environment that feels as genuine to the user as possible, resulting in an enhanced world. Here, computer-generated objects are visibly obscured by actual objects present in the real world. The Metaverse is built on XR to provide users with an immersive experience.
4.4. Communication Networks
In the Metaverse, millions of people will interact through digital avatars and share their experiences. The communication backbone therefore has to be resilient enough to support this interaction smoothly, which requires high bandwidth, low latency, and efficient, fault-tolerant systems. The most fundamental way to bring these systems online is to upgrade the communication infrastructure through innovation in wireless channels. The 5G and emerging 6G networks are a possible fit to support real-time communication in the Metaverse. 5G and 6G support services such as ultra-Reliable Low-Latency Communications (uRLLC), enhanced Mobile Broadband (eMBB), and massive Machine-Type Communications (mMTC), which can deal with the communication requirements of the Metaverse [
74]. In 6G, the service sets are supported through flexible network-in-a-box configurations, which enhance the performance capabilities [
8]. 6G eMBB supports AI-assisted channel-state information to offer average download speeds of 100 Gbps, which is 10 times that of its 5G counterpart. A low latency of <1 ms is proposed under Enhanced uRLLC (ERLLC), with 99.9999999% reliability. However, practical, deployable configurations are still at the proposal stage and have yet to be standardized. In the Metaverse, the stored data need to be accessed in near-real time; this requires sub-ms latency channels, which makes 6G networks a potential choice. The Metaverse’s resources will be managed in a decentralized manner, and edge networks will support effective scheduling and caching [
75]. This reduces the burden on cloud servers, which supports the vision of real-time response to avatar movements in the Metaverse.
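The role of edge caching can be illustrated with a small sketch. The least-recently-used eviction policy, the class name, and the asset keys below are assumptions chosen for illustration; the cited work may use a different caching strategy.

```python
from collections import OrderedDict

class EdgeCache:
    """Toy edge cache with least-recently-used (LRU) eviction (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, fetch_from_cloud):
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = fetch_from_cloud(key)         # slow path: go to the cloud server
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used item
        return value

# Hypothetical request trace for Metaverse assets served from one edge node.
cache = EdgeCache(capacity=2)
for asset in ["avatar", "scene", "avatar", "texture", "scene"]:
    cache.get(asset, lambda key: f"asset:{key}")
```

Every hit is a request answered at the edge without a round trip to the cloud, which is the effect the decentralized resource management above relies on.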
4.5. Role of AI in the Metaverse
AI will play a prominent role in the Metaverse to improve the immersive XR experience and personalize the user environment. Creating digital avatars in the Metaverse requires high-end configuration and storage resources, and developers require access to high-end machines, which can support the rendering and the environment learning and meta-cognition process. For instance, NVIDIA’s GauGAN2 [
76] creates images from input text and sketches. As the Metaverse involves high-end computer vision models, fast and accurate models for video segmentation and instance identification are crucial. An effective model was proposed by Liu et al. [
77] and is named the Spatial Granularity Network (SG-Net) for single-stage video instance segmentation. In contrast to two-stage approaches based on masked regional convolutional neural networks, the model uses a compact detection and segmentation architecture. Every task in SG-Net is classified into the detection, segmentation, and tracking phases, and an optimal solution is achieved for accuracy and scalability via a joint optimization formulation. Mask prediction is made on sub-regions of the videos, and the model ensures that fine details are captured. The framework scales well because it avoids the overhead of region-of-interest proposals through accurate prediction of tracked movements, which reduces the overall runtime. The authors evaluated the model on the YouTube-VIS dataset, where the one-stage architecture performed best on both the accuracy and speed metrics compared with baseline image segmentation models. AI will enable features such as intelligent networking, immersive digital worlds, inclusive user interfaces, accurate avatar creation, multilingual accessibility, and many others. The Metaverse concept is user-centric, and the higher the accuracy of the avatar, the better the experience. Multilingual accessibility and automatic translation are essential features that will allow users worldwide to access the Metaverse. In addition, Machine Learning (ML) algorithms are implemented to create personalized content and attract more users. User experience is enhanced through innovation in scene generation through VR headsets. The Metaverse will also augment brain–computer interface mapping, where brain signals will be sent to the avatars to trigger actions in virtual scenarios. As a real use-case, Neuralink has conducted experiments where a monkey played the game Pong through brain signals [
78]. Human trials are scheduled for early 2023. However, to improve the accuracy of brain mapping, Explainable AI (XAI) will be critical in providing interpretations to the trained models and improving the AI black box operations [
79]. The Metaverse requires XAI as a core component, from gesture controls to scenery visualization.
4.6. Decentralizing the Metaverse: Web 3.0
The most prominent aspect of Metaverse marketing is its interoperability: the digital assets a person owns in one Metaverse application can be used in other applications. This, however, is not conceivable in Web 2.0, where the users never own the assets. For example, in the popular game Fortnite, users have the illusion of owning skins, cards, coins, and other items that are in fact held by the corporation that makes the game. The problem is more visible for social media companies. Facebook, Instagram, YouTube, and other social media platforms own the content the users post on the platform. The data are mostly stored on a centralized web server. Web 3.0 addresses these drawbacks of Web 2.0. The issue of the ownership of assets can be solved by using NFTs. However, this is only the beginning of Web 3.0’s numerous advantages. Data can be stored in a distributed system using InterPlanetary File System (IPFS) storage facilities. In addition, Web 3.0 allows companies to own Metaverse stakes via Decentralized Autonomous Organizations (DAOs). A DAO is comparable to a Limited Liability Company (LLC), but has no single commanding authority for decision-making. Thus, Web 3.0 will lay the foundation for the Metaverse and Future Internet commercialization.
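The ownership model that NFTs provide can be sketched as a registry in which every token ID is unique and has exactly one owner. This is a simplified in-memory illustration; the class, method names, and token IDs are hypothetical and do not reproduce any on-chain token standard.

```python
class NFTRegistry:
    """Toy ownership ledger: each token ID is unique and has exactly one owner."""

    def __init__(self):
        self._owner_of = {}

    def mint(self, token_id, owner):
        if token_id in self._owner_of:
            raise ValueError(f"token {token_id!r} already exists")
        self._owner_of[token_id] = owner

    def transfer(self, token_id, sender, recipient):
        # Only the current owner may hand the asset over.
        if self._owner_of.get(token_id) != sender:
            raise PermissionError("only the current owner can transfer")
        self._owner_of[token_id] = recipient

    def owner_of(self, token_id):
        return self._owner_of[token_id]

# Hypothetical in-game asset that the user, not the platform, controls.
registry = NFTRegistry()
registry.mint("skin-001", "alice")
registry.transfer("skin-001", "alice", "bob")
```

On a real BC the registry state would live in an SC and the sender would be authenticated by a signature rather than a plain string.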
5. The Metaverse Architecture: A Functional Perspective
This section proposes a generic reference architecture of the Metaverse to support diverse industrial applications. The section addresses the objectives of RQ 4 as it functionally demarcates the Metaverse components closely connected to the different verticals in a generic sense. We first discuss the basics of avatar generation and modeling in the Metaverse and scenario generation, and then, the generic reference architecture is presented. The details are presented as follows.
5.1. Avatar Generation and Scenario Depiction
This subsection discusses avatar creation and modeling and then virtual scenario generation and enhancement. The details are presented as follows.
5.1.1. Avatar Creation and Modeling
In the Metaverse, avatars form the digital representation of users and can be given different styles. Furthermore, imaginary shapes, objects, and creatures can be added to the virtual space. Two important mechanisms for avatar generation are avatar creation and avatar modeling. Generative Adversarial Network (GAN) models are considered a good choice for avatar creation. The generator network learns to produce synthetic (fake) images, and the discriminator network learns to distinguish real images from generated ones. Apart from GANs, mesh designs are most prominent in creating a 3D model of an avatar. Chalas et al. presented a 3D design model for avatar generation, which scans the facial expressions and renders them to a 2D pipeline [
80]. In the Metaverse, the avatar generator is most prominent in the gaming Metaverse, such as Drivatars, a racing game where online users interact while driving cars through their created avatars. The data are collected and trained through Deep Neural Networks (DNNs) for realistic scenery [
81]. Other approaches involve reinforcement learning to train networks based on the action–penalty critic.
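The interplay between the GAN generator and discriminator described for avatar creation can be made concrete with a small numeric sketch of the two losses. It assumes the common binary cross-entropy formulation with the non-saturating generator loss; the source does not specify which loss is used, and the probabilities below are made up.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real images labeled 1, generated images labeled 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss: low when the discriminator scores fakes as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Each value is the discriminator's probability that an image is real.
confident = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
confused = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

A discriminator that separates the two sets well has a low loss, while one that outputs 0.5 everywhere cannot tell real from generated images, which is the equilibrium the generator pushes toward.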
For visual localization and mapping, human activity recognition in the physical world is crucial. The spatial features are captured to build the 3D reconstruction, and the localization of the user avatar in the environment is finalized. A class of computer vision algorithms known as Simultaneous Localization And Mapping (SLAM) is normally used for this purpose. However, the algorithm has to address the challenges of space generation, camera motion, and robust feature tracking. Overall, SLAM has a rich set of algorithms, such as ORB-SLAM2 [
82], which blends well with AR headsets. These algorithms have three steps: feature extraction, the mapping of 2D frames into a 3D mesh, and loop-closure detection. In the first step, feature points and descriptors are extracted. Recently, CNN-based algorithms have been a popular choice with SLAM. In the second step, the visual SLAM algorithm maps the 2D camera frames to a 3D pose estimate. For each captured frame, the 3D coordinates are marked in the virtual scene and mapped to determine a user’s position in the scene. ORB-SLAM adds additional data to refine camera poses and movements and forms a movement-by-movement correspondence. The key points from 2D frames are mapped to 3D locations over successive iterations, and localization errors are minimized. Finally, the semantic information is connected to the scenario in loop-closure detection. However, modern state-of-the-art schemes are still challenged in acquiring the 3D structure of the environment and sensing the motion of objects. The accuracy of the object becomes a crucial aspect in such cases. Another aspect of avatar generation is tracking the eye movements (the location and orientation of the pupils) captured from the real world. Through VR, the captured visual information is presented as joint positions mapped to enrich the interaction with other avatars. Eye-tracking algorithms also use gaze prediction and are based on continuous measurement of the distance between the center of the pupil and the cornea. Moreover, the gaze moves at certain angles; the range is termed vergence. Computer vision algorithms in eye tracking measure the angles and the distance relative to the pupil to accurately form the avatar’s eye movements. A big challenge is the estimation of 3D depth by the VR device, and recent research is aligned toward depth estimation in eye-tracking movements.
Another challenge is precise distance estimation during incomplete gaze, and this opens up exciting opportunities for users to realistically model avatars based on gaze estimation.
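The link between vergence and depth can be illustrated with simple geometry. The sketch below assumes symmetric convergence (both eyes rotate equally toward a point straight ahead) and a hypothetical interpupillary distance of 63 mm; real eye trackers must also handle asymmetric gaze and measurement noise.

```python
import math

def fixation_depth(ipd_m, vergence_deg):
    """Distance to the fixation point, assuming symmetric convergence.

    ipd_m: interpupillary distance in meters (assumed known per user).
    vergence_deg: angle between the two lines of sight, in degrees.
    """
    half_angle = math.radians(vergence_deg) / 2.0
    return (ipd_m / 2.0) / math.tan(half_angle)

# A larger vergence angle means the user fixates on a closer object.
near = fixation_depth(ipd_m=0.063, vergence_deg=7.2)
far = fixation_depth(ipd_m=0.063, vergence_deg=1.8)
```

Because the angle shrinks quickly with distance, small angular errors translate into large depth errors for far objects, which is one reason depth estimation from gaze remains difficult.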
5.1.2. Scenario Generation and Control
In this subsection, we discuss the understanding of scene generation, which is mainly based on object detection and semantic segmentation. For this, we need to estimate the distance between the referenced objects in the scene and the placed avatar. Another aspect is the stereo matching and the depth estimation. In semantic segmentation, we consider an image classified into various classes based on pixel-level information [
83]. This helps the developer understand the environment’s intricacies and the placed object. The AR-based headsets require semantic segmentation at ≈60 frames per second (fps), which becomes a challenging task. Thus, more real-time semantic segmentation algorithms are required for seamless interaction in the Metaverse. All objects are different and, thus, require additional meta-information to be trained. Another crucial aspect is object detection, which aims at determining the fundamental understanding of objects placed in the scene. For example, a face detection algorithm typically performs object detection in VR. For AR, projecting a new object onto an existing scene requires object detection. The 3D virtual object is placed, and the movements are synchronized with the physical object in the Metaverse. Algorithms such as Support Vector Machine (SVM) and Scale-Invariant Feature Transform (SIFT) are generally applied to AR and VR systems. Recent research has applied CNNs for the same in MR environments [
84]. These algorithms mainly work on detecting instances (faces, markers, and textual information) and generic category detection (chairs, tables, persons). Face detection algorithms should work reliably under varying illumination and lighting conditions for avatar-to-avatar communication in the Metaverse. Shadow estimation is another challenging task and requires integrating illumination models in the training instances. Algorithms such as Faster Region-based CNN (Faster R-CNN), You Only Look Once (YOLO), and the Single-Shot Detector (SSD) are currently applied to AR systems. However, there are challenges for small objects in the 3D environment. Perspective projections also hinder detection accuracy, as small objects might be outside the field of view of the camera lens. In such cases, the viewpoint needs to be adjusted, and the object detector needs to be trained accordingly. Another issue is the computational requirements of the Metaverse, as large-scale data might be distributed into many classes, which increases the computational complexity of the deep learning algorithm. Thus, research has focused on the design of lightweight detection mechanisms. Finally, stereo-depth estimation determines the position of the objects in the created scenario. The estimation is based on the virtual object’s distance from the camera lens. Mainly, we consider the egocentric depth in such cases [
85]. Depth-measurement sensors are normally placed in headsets to cater to these requirements. Once the avatar is successfully projected into the scenario, image enhancement algorithms are applied to reduce the haze and correct the luminosity of the virtual world. In such cases, image restoration and enhancement algorithms are applied to reconstruct a clean image from a blurred one. This also includes removing noise from captured body movements, which would otherwise render the generated avatar in low resolution. In image restoration for VR, optimization-based methods have been proposed, which apply techniques such as color correction, texture restoration, and blur estimation to improve the image quality in the Metaverse. After restoration, image filters are applied for scenario enhancement, such that a super-resolution display is presented as the output. This involves techniques such as optical imaging and high-resolution display adapters to render the images in the Metaverse with accurate descriptions.
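Two quantities from this subsection lend themselves to a short worked example: the per-frame time budget implied by a target frame rate, and depth recovered from stereo disparity. The sketch assumes a rectified stereo pair and the standard pinhole relation (depth = focal length × baseline / disparity); the camera parameters are hypothetical.

```python
def frame_budget_ms(fps):
    """Time available to process one frame at the given frame rate."""
    return 1000.0 / fps

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified stereo pair (pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Segmentation at about 60 fps leaves under 17 ms per frame.
budget = frame_budget_ms(60)

# Hypothetical headset cameras: 700 px focal length, 10 cm baseline.
depth_m = depth_from_disparity(focal_px=700.0, baseline_m=0.10, disparity_px=35.0)
```

Depth is inversely proportional to disparity, so distant objects produce small disparities that are harder to measure accurately.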
5.2. The Proposed Reference Architecture
The main components of the architecture are a contract interface, a provider, and the front-end for User Interaction (UI).
Figure 4 shows the reference architecture. The major functionalities and applications of this architecture are discussed in the next subsections.
5.2.1. Back-End: Contract Interface
The back-end holds the underlying business logic of any application; computation tasks and the storage of users’ data are performed there. In the proposed architecture, we envision a decentralized functional Metaverse, and the back-end is an SC interface that stores the application transactions on the BC. In this case, the BC functions as the ledger, which underlying applications can read, and transactions can be stored and retrieved later. The BC is designed to be a state machine, which means that data can be written, but existing data cannot be modified. A contract interface is created to communicate with the back-end part. All the nodes on the BC collectively run a specific virtual machine, such as the Ethereum Virtual Machine (EVM). There are several Dockers running on the EVM. Docker enables developers to create isolated applications configured to run many parallel applications and functionally bound data. On each Docker, we can execute n SCs, and the EVM records the state change. Each SC is attached to an ordering service, which orders the transactions and assigns them to the respective world state actions. The ordered transactions are then bundled into blocks for the consensus algorithm to verify the transactions. The back-end, in general, can be the public EVM or may be permissioned, which can execute a custom-designed consensus protocol to fit the application requirements [
86].
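The append-only behavior of the ledger, together with the ordering and bundling of transactions into blocks, can be sketched as a hash chain. The block layout, the `seq` field used for ordering, and the use of SHA-256 over JSON are illustrative assumptions and do not mirror the EVM’s actual data structures.

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 digest of a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transactions):
    """Order the transactions, bundle them, and link the block to its predecessor."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    block = {
        "index": len(chain),
        "prev_hash": previous,
        "transactions": sorted(transactions, key=lambda tx: tx["seq"]),
    }
    chain.append(block)
    return block

def is_valid(chain):
    """A chain is valid if every block references the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
append_block(chain, [{"seq": 2, "op": "transfer"}, {"seq": 1, "op": "mint"}])
append_block(chain, [{"seq": 3, "op": "transfer"}])
```

Modifying a transaction in an earlier block changes that block’s hash, so the next block’s `prev_hash` no longer matches and `is_valid` returns `False`; this is the sense in which existing data cannot be modified unnoticed.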
5.2.2. Front-End and Supportive Microservices
At the front-end, user applications in the Metaverse interact with the back-end through a Web 3.0 browser engine. The engine creates a service point for each application. Applications can thus access the front-end part using the service point. The front-end part of the application can be built using the Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and different JavaScript frameworks, which can be supported by the browser or tuned for a native application engine. For example, developers now use game engines with prebuilt, ready-to-use game components. Unity, Unreal, Blender, Nuke, and Maya are popular tools for building 3D games for more than 25 different platforms. The front-end application can be hosted anywhere, including cloud services such as Amazon Web Services. However, this might create a single choke-point, considering the web service is centralized. To eliminate this centralization, the front-end app can be hosted using decentralized storage such as the IPFS or Swarm. In addition, to avoid a monolithic architecture, which binds the application to a specific process, different services can be invoked through function calls, which allows the decoupling of application and service data. Such small services (microservices) are functionally independent and communicate through service end-points over the web reference framework.
5.2.3. Communication between Front-End and Back-End
The front-end app should be able to communicate with the SC on the BC to execute different functionalities. To achieve this, the front-end code needs to interact with one of the nodes on the BC. Hence, an application might set up its own node on the BC to accomplish this task. However, setting up a node might take a few days and require substantial storage, as all the previous blocks must be synchronized with this node. An alternative is to use a third-party provider that offers a BC node to interact with. Once an application is connected to the BC via a provider, it can read the data present on the BC. However, writing data to the BC requires the user’s signature, produced with his/her private key. A signer node has users’ private keys stored in the browser and can access the front-end whenever it has to sign a transaction. Storing all the data and SCs on the BC may seem sensible for an application. However, it becomes extremely costly after some point, as every transaction needs to be validated, and it becomes costly for nodes to maintain the state of the BC. Hence, using off-chain decentralized storage such as the IPFS or Swarm is a viable solution. As the application scales, the number of transactions increases, making on-chain execution expensive for the application. Layer 2 (L2) scaling methods can be used, which verify transactions on a secondary chain and transmit an aggregate of validated blocks to the main chain.
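One way to picture the "aggregate" that an L2 method sends to the main chain is a Merkle root over a batch of transactions. The sketch below is an assumption about how such an aggregate could be formed; actual L2 designs differ in their commitment and proof schemes, and the transaction strings are hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(transactions):
    """Fold a batch of transactions into one fixed-size commitment."""
    if not transactions:
        raise ValueError("cannot aggregate an empty batch")
    level = [sha256_hex(tx.encode("utf-8")) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            sha256_hex((level[i] + level[i + 1]).encode("utf-8"))
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Many transactions are validated off the main chain; one digest goes on-chain.
batch = ["alice->bob:5", "bob->carol:2", "carol->dave:1"]
commitment = merkle_root(batch)
```

The main chain then stores a single 32-byte digest per batch instead of every transaction, which is where the cost saving comes from.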
5.2.4. Interaction of Web 3.0 with Diverse Application Verticals
Now, we present the schematics of Web 3.0 interaction with different industrial applications. However, the Metaverse focuses on combining all the applications into a single virtual space. The communication between the applications and the servers (BC) is carried out via different channels over the 6G network using a service gateway or Application Programming Interface (API). Every application uses different services of 6G specific to the requirements of the application. There is an IPFS storage local to the application. Data important for inter-application communication (such as avatars and transactions) are stored in a common IPFS storage. The details of application-specific communication are presented below:
Industry 5.0—With the advancement in technology, there has been much talk about Industry 5.0. Digital twins are one of the most prominent technologies; they allow the control of the factory in a virtual environment without physical interaction with the machine. Digital twins of the machines can work in real time because of the sensor data. All of these are managed and provided using AI algorithms. However, humans are to be brought back into the factories. This could be accomplished using cobots (collaborative robots that can work alongside humans). The applications for Industry 5.0 in the Metaverse require the 6G-umMTC service, as many devices and machines are to be connected.
Healthcare—One of the most-challenging areas to implement the Metaverse is the healthcare domain. Accuracy, reliability, and trust in technologies are very important. The most-debatable technology is AI, as there is no way to understand how and why AI makes a particular decision. However, with the advancement of Explainable AI (XAI), the AI algorithms will include interpretability, and this will change the dynamics of the AI model predictions, which will be trustable, i.e., the model allows you to understand the decision made by the AI. For example, in 3D tomography, VR systems are included to capture the organ’s view, and the collected data can be analyzed via XAI modules, which will improve the decision capability. A more salient feature would be flawless telesurgery using the interface provided by the Metaverse. Healthcare applications such as telesurgery rely on real-time communication; hence, the Tactile Internet will be useful. The Tactile Internet provides ultra-low latency and extremely high security, reliability, and availability.
Infotainment and gaming—The gaming industry was the first to adopt the Metaverse concept. Infotainment and gaming industries earn much from creating innovative ways to present their products. Scenario generation with objects requires much information. The common avatar of a person throughout the Metaverse can be created and customized through one of the gaming applications. All of this requires a toolbox supporting the XR technologies. A separate IPFS with Filecoin-enabled storage can be used. Filecoin provides incentives to the nodes that permanently store the data. However, users would have to pay to store their data. As large amounts of data are required to render scenarios as quickly as possible, the FeMBB and ERLLC services of 6G can be used [
87].
Vehicular networks—Autonomously driving vehicles are gaining popularity. The progress of vehicular networks will be fast-forwarded with advances in the Metaverse. Traffic management and optimal routes for many vehicles can be performed efficiently using faster connections and smarter devices [
88]. However, the computations required by vehicles for making a quick decision cannot all be performed on the core BC-assisted Metaverse back-end. This would require support from cloud platforms providing Edge-as-a-Service (EaaS). Low-latency communication is possible using edge servers. Low latency is essential for these networks, and hence, 6G-ERLLC rate-splitting services can be utilized to minimize the latency bounds.
Financial applications—The Metaverse aims to make all finances decentralized. Cryptocurrency plays the most important role in this. NFTs are also an integral part of the Metaverse, as the ownership of any object, land, or avatar is based on NFTs, i.e., with the help of NFTs, the ownership of a product can be transferred from one customer to another. Applications providing decentralized finance need low latency for faster communication. The 6G-ERLLC service can be used for this.
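The mapping between application verticals and communication services described in the list above can be summarized as a small lookup table. The dictionary keys and the helper function are hypothetical names; the service assignments follow the text of this subsection.

```python
# Communication service(s) named for each vertical in this subsection.
SERVICES_BY_VERTICAL = {
    "industry_5_0": ["umMTC"],
    "healthcare": ["Tactile Internet"],
    "infotainment_gaming": ["FeMBB", "ERLLC"],
    "vehicular": ["ERLLC"],
    "finance": ["ERLLC"],
}

def verticals_using(service):
    """Return the verticals that rely on a given service, in sorted order."""
    return sorted(
        vertical
        for vertical, services in SERVICES_BY_VERTICAL.items()
        if service in services
    )

low_latency_verticals = verticals_using("ERLLC")
```

Grouping the verticals by service shows that low-latency ERLLC is shared by several applications, whereas massive connectivity is specific to Industry 5.0.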
5.3. Flow Diagram between Users and Components in the Proposed Architecture
In this subsection, we discuss the flow diagram and the actions performed when users of different applications want to interact with the Metaverse via an enabled AR headset.
Figure 5 presents the flow details. Earlier, in
Figure 4, we demonstrated that the different industrial verticals communicate with the Web 3.0 cognitive layer, which has AI models to improve the immersive experience and quality of communication. We now explicitly present the communication flow and interaction between the users and associated components.
In
Figure 5, we consider a case study, where two users (in different application verticals) of our proposed architecture wish to communicate in the Metaverse (virtual space) or share AR data with each other via Web 3.0. We considered that both users operate through a resilient communication network (5G and Beyond), which uses Network Function Virtualization (NFV) as a key element to communicate with the web service. Essentially, NFV provides Virtual Network Function (VNF) points, which allow virtual machines to communicate with the base hardware networking components and control them through software functions [
89]. Thus, network service functions such as bandwidth assignment, load balancing, routing, and security (firewall) setup can be performed through software functions. The functions are highly modular (in 5G and Beyond networks); thus, depending on the specific application requirements, they can be changed, allowing the virtual machine to communicate with the base network accordingly. VNFs thus allow easy provisioning and automation of the network. For the two users, we considered two VNF functions, one per user. The functions interact with Web 3.0, which provides the semantic interpretation of the data and associated context through different AI models to provide a better experience to the digital user.
Web 3.0 supports massive machine-to-machine communications, which are interpreted at the semantic web engine (cognitive web layer), where different deep learning algorithms (mostly natural language and speech processing) allow the web to understand the provided inputs accurately. A Web 3.0 engine would understand the user’s actions, choices, and preferred content based on the content presented. The cognitive layer might generate user recommendations once he/she opens the web platform (similar to the Netflix, Flipkart, or Amazon recommendation engines). The only difference is that, now, this would be performed by the web itself and not at the end application, which would improve the user experience.
In the flow diagram, we assumed that one user (the requester) requests WebAR content from a second user (the provider) residing in another application. WebAR content normally has six Degrees of Freedom (6DoF) to track a 3D avatar in the Metaverse. The Field of View (FOV) of the camera provides the required projection of AR objects in the real space and also avatar generation in the virtual space (VR-based). The request is forwarded to the requester’s VNF function, which selects an appropriate VNF to communicate with the base network and forwards the request to the provider API of Web 3.0. At Web 3.0, semantic parsing of the context is performed, and the communication URL is secured via a service web API. A unified web service such as Representational State Transfer (REST) is used. The request is then forwarded to the provider’s VNF function, which selects the appropriate network service at the receiver application and forwards the request to the provider, who responds with the AR content. Once a response is received, the provider’s VNF function sets up buffer and flow management to perform load balancing and control. It encrypts the HTTP headers (secure HTTP, SSL/TLS, or other) and sends them back to Web 3.0.
Once the content is received, the header is decrypted, and the data are parsed and stored in the local IPFS. The data might contain additional information (textual or noise) that is not useful for the requester; thus, mining algorithms (AI-based) are applied to identify precisely the AR content and DoFs, which helps the AR content to be projected accurately into the real-world space. The content, along with the IPFS hash, is shared with the requester’s VNF function, which forwards it to the requester via the 6G FeMBB service. Here, bandwidth is the prime requirement, and thus, the VNF selects the broadband service accordingly. During the transfer, the receiver follows the transport layer’s flow and buffer management. The requester accesses the stored data by providing the IPFS content key authorized at the Web 3.0 layer. Once done, a bidirectional connection is established between the browser and the user, over which inputs can be sent and received to alter the interactions in the Metaverse.
Based on the content type and duration of the AR content, pricing and offers are attached by different users. The requester asks Web 3.0 for the price information and associated offers, and Web 3.0 forwards the request to the provider. Here, real-time communication is required, so the ERLLC service is selected. The provider forwards the final price and offers associated with the WebAR content to Web 3.0, which sets up a broker mechanism for communication (as there may be multiple requests from different clients, a message queue is formed). It also notifies both users about its wallet address and sets up a smart contract to execute the transactional payment. Once the contract is executed, the meta-information is stored on the BC by the Web 3.0 browser, and the content is streamed via a connection-oriented duplex service. This use-case is specific to AR content transfer through Web 3.0, but the interaction flow is generic enough to support the transfer of VR and XR content among industrial applications.
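The step in which AR content is stored and later retrieved by its content key can be sketched as content-addressed storage. This is a simplified in-memory stand-in: the class name is hypothetical, and a plain SHA-256 hex digest is used as the key, whereas IPFS uses its own content identifier format.

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: the key is derived from the data itself."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # The receiver can verify integrity by re-hashing what it got.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its key")
        return data

# The provider stores the AR payload; the requester fetches it by key.
store = ContentStore()
content_key = store.put(b"ar-scene-v1")
payload = store.get(content_key)
```

Because the key depends only on the content, identical payloads map to the same key, and any change to the payload yields a different key.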
6. Applications of the Metaverse in Industry
This section lists the applications of the Metaverse in the context of diverse industrial scenarios. In addition, this section addresses the requirements of RQ 5 by presenting the Metaverse as a viable solution to assist different applications. The details are presented as follows.
Table 5 discusses the potential Metaverse applications, the integration components, the key challenges, and the potential Metaverse solutions.
6.1. Manufacturing
The Metaverse can enable the ease of manufacturing without physical simulation and testing, making the process efficient and easy to access. The Metaverse enables the simplified generation of user design with low cost, controlled customization, the integration of software for 3D modeling (e.g., computer-aided design), collaborative design and manufacturing using XR and immersive technologies, and knowledge sharing [
90]. It also improves visibility and transparency through 3D representation, thereby decreasing the time to market for products.
The Metaverse optimizes the building-block industrial structure of manufacturing factories. Integrated with digital twins, it enables structural and production-line reformation of smart factories in real-time to speed up the production process. Adopting VR in smart factories with human interaction and online multi-user scenarios creates a new approach to planning and designing production plants. The user can make visits and execute practices in the production environment of the plant. Autonomous mobile robots can automate the industrial process, thereby increasing production speed [
54]. Workers can optimize the working capacity, workforce, and necessary equipment/instrumentation to maximize production efficiency in the Metaverse [
91].
6.2. Internet-of-Senses
In the future, Internet-of-Senses technology will form an important integration with the Metaverse combined with 6G wireless technology. Advanced holography techniques will process and transmit digital representations of human senses and receptors using a 3D camera to provide an immersive multi-sensory experience, supported by an ultra-low-latency, ultra-reliable, and extremely high-bandwidth 6G communication network. The ubiquitous 6G network (1 Gbps data rate, 0.1 ms latency) will improve various Metaverse applications such as telepresence, teleportation, and ubiquitous interaction. It will reduce the boundaries between the physical and virtual sensory experience, improving the perception of reality. The Internet-of-Senses will integrate XR over 6G communication channels, where AI-enabled cognition skills will be improved. Eventually, it will give the model a deeper perception and awareness of the actual physical world [
92].
6.3. Marketing
The Metaverse provides an economic ecosystem in which clothes and other products can be continuously bought from and sold by the production company. It also lets a company exploit its potential on social media. It enhances business models, in particular in the retail and entertainment sectors, which utilize AR and VR technology [
93,
94]. Furthermore, the Metaverse connects different people and agencies with virtual identities due to its immersive nature and empowered virtual economy. It enables peer-to-peer interactions, crypto transactions on digital goods and assets (including NFTs), user-generated content, and “world-building”.
6.4. Industry
The Metaverse increases productivity along multiple product life cycle phases by creating multiple accurate simulations at a lower cost instead of building physical samples. Using digital twins, the Metaverse enhances operational efficiency, increases product quality, reduces quality risk, and provides real-time interaction between customers and developers to upscale product life cycle design efficiency in the development cycle. A well-known example of the industrial Metaverse is NVIDIA Omniverse™ [
95]. The industrial Metaverse obtains data from various sensors and operational lines and provides sensible data analytics and decision-making to enhance the production efficiency of the space, reducing costs and maximizing sales value. Furthermore, adopting federated learning, AI, and BC can optimize and enhance trust, decentralization, and privacy preservation among Industrial-IoT nodes to maximize data sharing efficiency [
56].
6.5. Education
Compared to AR- and VR-based learning experiences, the Metaverse lets language experts use language itself as a learning environment across the working, entertainment, and learning sectors. It enables professionals to have trainees undergo authentic and creative training rather than short-term assignments. People with different disabilities can use AI to improve their functioning, and AI-supported tools will help the elderly and people who need special tools for intellectual communication. AI brings futuristic technology to the educational field, such as adaptive personalization and intelligent tutorials with automatic assessment features [
96,
97]. The concept of virtual labs and the IoT provides an interactive, immersive, and safe experience to students learning at different levels [
98]. A technology-based simulation environment such as XR creates a powerful learning experience that provides an opportunity for collaborative teaching [
99]. The Metaverse’s Non-Player Characters (NPCs) remember user settings, emotions, behaviors, and audio–video interactions to enable fast growth. NPCs can help tutors and tutees interact in real-time to solve real-world problems. Similarly, NPC peers help social constructivism by providing real-time discussions and learning among colleagues [
100]. The Metaverse creators provide several learning opportunities to serve at the technical and managerial levels. This enables learners to develop skills and cognition in real-time threat situations. It also allows long-term involvement and practice, provides a low-cost, effective learning and simulation environment, increases collaboration, and improves professionalism in career opportunities.
6.6. Internet-of-Bio-Things
The Metaverse can enable an immersive experience for the users through a human–Metaverse interface. Various energy-harvesting self-powered technologies will utilize environmental energy and collect data. The output can address human motions, mechanical and sensory activities, haptic and virtual feedback, and other avatar and environment construction indicators. Recent advances in self-powered sensing technologies and the IoT can help provide a ubiquitous experience to create a cognitive digital twin representation in the Metaverse environment [
101]. With the assistance of physical and chemical self-powered biosensors, it would formulate a constructive environment to support the Internet-of-Bio-Things, which can capture avatar motions and emotions through bio-generated signals. These signals are converted to the electrical waveform for simulation devices. Thus, it would allow users to work from anywhere, which facilitates the creation of virtual offices and presentations (such as conferences, expos, and exhibitions). Moreover, data collected from biosensors can help the manufacturing/production sector test in the virtual world before implementation in the real world, thereby saving costs and shortening time to market.
6.7. Vehicle-to-Everything
Autonomous vehicles in V2X networks respond to real-time traffic conditions for time-sensitive and critical applications. Incorporating V2X in the Metaverse enables decision-makers to test the developed technology against its challenges by simulating various vehicle traffic scenarios. As BC is an integral part of the Metaverse, the overall system would ensure reliability, fast connectivity, security, and trust. The Metaverse will make it easier for autonomous vehicles to work as free-roaming robots or shared taxis. Surveillance through the personal avatars of car-sharing customers in the Metaverse can help detect fraudulent activity or damage to the vehicles, as implemented by Zipcar [
102].
6.8. Internet-of-Gaming
Using the VR and XR environment, the Metaverse enhances immersion to improve the gaming experience. It provides 3D visualization among users and inbuilt gaming elements. The Metaverse generates AR to provide live game streaming, gaming with NFT mining/trading, and earning with cryptos, which builds the Internet-of-Gaming (IoG) ecosystem. In the IoG, users can invite friends from the real world to interact and develop relationships and connect with other players in the virtual world. Players/users can create a sub-game within a game in a virtual user-defined environment to perform various activities. Moreover, due to the Metaverse’s portable and interoperable architecture, players can port avatars/weapons from one environment to another through NFT ownership. The popular games based on the Metaverse include Decentraland, Otherside, Roblox, Star Atlas, Axie Infinity, Illuvium, and Cryptovoxels. However, cyber gambling laws in many countries restrict users or players from exchanging assets that are not real but involve real-world assets such as NFTs. Recently, Steam, the video game service operated by Valve, has removed all the games from its platform that trade assets with the help of NFTs in the blockchain environment.
In terms of game engine platforms in the IoG, these are supported via a rich set of Web Graphics Libraries (WebGLs), which have a rich set of Javascript APIs to render 2D and 3D graphics in the browser itself [
103]. The gaming support of WebGLs is closely related to the OpenGL ES 2.0 specifications, where standard HTML markup elements (notably the canvas element) are used for design elements. Supported by rich API calls and hardware acceleration engines, a seamless experience is provided to the Metaverse gaming platforms. Recently, advanced game engines have been developed using the Unity and Unreal platforms. Unity uses C# as the scripting language in editor mode while communicating with other applications, but the performance degrades with a large number of concurrent applications. Unity follows a component-based logic model, in which users can inherit behaviors for game objects in the editor. Unreal uses C++ as the programming language, together with the Blueprint node-based visual scripting system. It is based on engine-specific macro calls, where specific AR and VR content can have a plug-and-play basis, making it more suitable for concurrent users.
In terms of 3D rendering, Unity has a more robust system than Unreal. Unity provides a visual node editor named Shader Graph, which works on the High-Level Shader Language (HLSL) and whose modules have low abstraction layers, providing higher control. Context switching and effects in Unity are a limitation for high-definition video. In Unreal, customized shader code is built on top of the HLSL, accelerating the rendering process. Thus, highly cinematic experiences are better served by Unreal. In terms of real-time Visual Effects (VFX), both Unity and Unreal have a vertex shader, which aligns the animation planes with the camera angles. Unity uses the Visual Effect Graph with visual scripting, allowing the objects (nodes) to work seamlessly with different mobile devices. Unreal, on the other hand, uses Niagara, a visual editor quite close to the native Shader Graph of Unity. Niagara uses stacked modules, and the game parameters of each module can be coded with a very high level of abstraction, providing higher VFX accuracy and control.
7. Challenges and Future Directions
As discussed in
Section 5, there are inherent challenges in the Metaverse’s integration with Web 3.0 services.
Table 6 depicts the key parameters, the challenges faced in the technologies, and the prospective future directions to overcome those challenges. Thus, this section discusses the potential challenges for the presented architecture and highlights the possible solutions. The section addresses RQ 6 and highlights the potential for the Metaverse’s realization with current technologies.
7.1. Ownership of NFTs
With technological advancements, the risk boundaries of cyber threats and online fraud have also increased. For example, an intruder might exploit vulnerable variables of NFTs or steal the user’s private key to transfer ownership of NFTs, resulting in identity theft. Furthermore, once the NFT is minted on multiple BCs, counterfeit NFTs can be created and sold on the Internet. To avoid this situation, a unified system with chronological asset tracking must be developed to determine the true owners of any NFT [
17].
7.2. Security of NFTs
Most digital wallet holders assume NFTs are inherently secure as they are stored in a wallet and rely on BC. As NFTs combine confidential user data, they are exposed to privacy-based attacks. Currently, there are no effective solutions to ensure the security of NFT tokens. Thus, privacy-preserving techniques such as anonymization, unlinkability, and the diversity of data must be incorporated in the back-end SCs to ensure NFT ownership security [
17].
7.3. Lack of Non-Repudiation
The lack of non-repudiation is a serious issue in the Metaverse. Parties can only withdraw once a transaction is confirmed on the BC. To avoid the issue of repudiation, multi-signature contracts can be implemented, which ensures the binding from both communicating sides who wish to transact via the Metaverse [
17].
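The binding effect of a multi-signature contract can be illustrated with a minimal sketch. The class and party names below are hypothetical, and approvals are modeled as plain names in a set; an actual SC would verify cryptographic signatures instead.

```python
from dataclasses import dataclass, field


@dataclass
class MultiSigTransaction:
    """Toy 2-of-2 multi-signature transaction (hypothetical model, not a real SC)."""
    payload: str
    required_signers: frozenset
    approvals: set = field(default_factory=set)
    confirmed: bool = False

    def approve(self, signer: str) -> None:
        # Only the parties named in the contract may sign.
        if signer not in self.required_signers:
            raise PermissionError(f"{signer} is not a party to this transaction")
        self.approvals.add(signer)

    def confirm(self) -> bool:
        # The transaction becomes binding only once every required party
        # has signed, so neither side can later deny having agreed to it.
        if self.approvals == set(self.required_signers):
            self.confirmed = True
        return self.confirmed


tx = MultiSigTransaction("transfer nft_asset", frozenset({"buyer", "seller"}))
tx.approve("seller")
partial = tx.confirm()  # one signature is not enough
tx.approve("buyer")
final = tx.confirm()    # both parties have signed
```

In this sketch, the recorded approvals serve as the evidence that both sides consented before confirmation.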
7.4. High-Bandwidth and Low-Latency Requirement
Holographic communication in the Metaverse requires real-time operation and sufficient bandwidth and throughput. In addition, network security, reliability, and confidentiality must be enhanced while projecting a holographic image from a distant location. V2X networks and the IoG require low-latency protocols over wireless links to function seamlessly and efficiently. Thus, services such as 6G-ERLLC and 6G-uMMTC should be integrated into the Metaverse communication layers to ensure high service quality and the desired QoE in user applications.
7.5. Network Security
6G networks are also vulnerable to access control attacks such as password attacks, spoofing, and Distributed-Denial-of-Service (DDoS). Physical layer security techniques must be implemented to resolve this issue. Advanced techniques such as quantum coding and secured non-orthogonal access mechanisms must be included during communication in the Metaverse ecosystem.
7.6. Wide-Scale Adoption of Hardware
The hardware used presently for AR and VR is neither lightweight nor affordable, which makes the wide-scale adoption of these devices a critical issue. To handle this issue, new lightweight and affordable hardware devices must be created with flexible plug-and-play interactions among different devices.
7.7. Harmless Devices
Another issue with AR and VR technologies is creating devices that are not harmful to health after prolonged usage. As VR headsets can cause eye discomfort and blurred vision, manufacturers must resolve this with appropriate retina displays and pixel densities.
7.8. Bulky AR and VR Gadgets
Everyone wants a headset the size of normal glasses and that is easily portable, but the current AR and VR gadgets are bulky. Thus, future research towards fabricating lightweight and sleek AR and VR wearables is a prime requirement.
7.9. High-Quality Input Data
Systems such as face recognition, prediction, and voice and speech recognition give accurate results only on high-quality data. Hence, users must ensure that the input data provided to these systems are of high quality in terms of accuracy, integrity, and completeness.
7.10. Accuracy in Predictions
Applications such as V2X networks, healthcare, education, and manufacturing pipelines require more intelligent control and accurate prediction modeling. With the advent of digital twins, virtual inputs can be supplied to prototype models, and AI models can be designed for accurate predictions [
104].
7.11. AI Is a Black Box
The most-prominent issue with AI systems is that they are black boxes, which involve complex programming that the user is unaware of. This issue can be resolved by implementing XAI modules in AI in the design phase to improve the transparency and auditability of AI systems. XAI will help the user and the developer understand the decision made by the model.
7.12. A Unified Reference Framework
The central aspect in the creation of the Metaverse is the Web 3.0 architecture. However, integrating all the necessary technologies into a single unified architecture requires much work. To resolve this issue, we need to develop generic protocols that may be utilized in different industrial domains per the user requirements.
7.13. Decentralized Web
The main attraction of Web 3.0 is that it is decentralized and will make the Metaverse decentralized. However, the challenge is to fully decentralize all the Metaverse components; only then will a successful realization of a decentralized web come into existence. This is a challenge as web components and servers still function through legacy controls and protocols, and thus, a widespread change will be required in the backbone infrastructure. Thus, future research is required on the design of decentralized protocols that function over the web, and open standards and APIs need to be integrated with the Web 3.0 engines, so that data can be seamlessly transferred to different endpoints.
8. Case-Study: BC-Assisted Metaverse Real Estate Management
This section proposes a case study based on Real Estate Management (REM) in the Metaverse. The section addresses RQ 7 by conceptualizing a practical use-case scenario in the Metaverse. The selection of the REM application was made as, at present, such schemes are operational through AR and VR in real setups. Thus, with the widespread adoption of the Metaverse in industrial applications, an important aspect would be integrating the Metaverse for buying and selling real estate and virtual properties. As per the report by
PR Newswire [
105], the real estate Metaverse market is expected to increase to USD 5.37 billion by 2026, with a CAGR of 61.74%. The real estate spaces in the Metaverse will be flexible, where users can virtually socialize, sell NFTs, and attend meetings in real-time from the comfort of their homes. These applications will allow buyers and sellers to interact, and fund transfers will be supported via SCs. The asset ownership transfer will be conducted via NFTs, which are recorded on BC ledgers.
Thus, in the case study, we present a Buyer–Broker–Seller (BBS)-based reference architecture of the Metaverse-assisted REM and divide it into operational phases, namely the Metaverse engine, the communication interface, and the back-end business logic, which is supported through a Web 3.0 browser engine. We begin with the traditional approach, followed by the proposed approach, where we present the application interface, back-end, and business logic of the Metaverse-assisted REM. Finally, the performance of the proposed case study is analyzed against the traditional approach. The details are presented as follows.
8.1. Traditional Approach
The traditional Buyer–Broker–Seller (BBS) approach involves sellers handing over the land property to a broker to sell. The broker’s task is to find the potential buyers and the best deal for the seller, and for each successful transaction, some amount of brokerage is earned from both parties. The process is highly time consuming and requires trust and essential paperwork. The land registration is performed through a registry office, and it involves lawyers to form the legal documents, which are presented to the judiciary for legal transfers. In addition, proper seller verification is required such that the land is not sold to multiple buyers on a lease basis. Thus, a better solution to incorporate trust in the BBS ecosystem would require SCs to replace the legal formalities, where the asset transfer is recorded on the BC. However, the buyer still has to physically visit the property, which can be replaced by a virtual AR and VR tour, making it comfortable for the buyer. The traditional approach, however, does not consider buying and selling virtual lands on the Metaverse.
8.2. Proposed Approach
In the proposed BBS architecture, we considered a VR-integrated virtual REM touring service.
Figure 6 presents the details of the functional architecture.
To improve the QoE of the buyer, we considered the architecture on the backdrop of 6G-FeMBB communication to address the high-bandwidth requirements. Our proposed approach considered that avatars are created for each physical entity (buyer, broker, and seller, respectively), who meet in a 3D virtual Metaverse environment and interact and socialize. The proposed REM architecture has three main components: (1) application interface, (2) back-end and business logic, and (3) connection with the Metaverse. The virtual world in the Metaverse is rendered and deployed through VR, and land properties are available for trading on the EVM, where records are stored on the Ethereum BC. In the Ethereum chain, REM agreements are stored as transferable NFT assets, and BBS wallets are linked to seamlessly allow payment transactions over SCs. The BBS avatars and wallet references are linked over the Web 3.0 AI models, which makes the interaction with the Metaverse smooth and functional. We next present the details of the individual components.
Figure 7 shows the snapshot of SC asset transfer and interaction between a single buyer–seller, where the buyer is denoted as AHB1 and the seller is denoted as AHS1.
The contract logic for virtual land transfer (asset) is governed in the form of NFT tokens and is thus denoted as nft_asset in the pseudo procedures. The contract can be executed over a public blockchain network (where the functions and methods are public and visible) or might be kept permissioned to ensure privacy. We took the use-case of a permissioned network, where Docker containers are utilized to execute the contracts (called chain codes) [
106]. An operational agent was set up for events listed in the contract, executed in Hyperledger Fabric. In the Fabric, we maintained the logic through a ledger state containing the endorsement policies. The chain code execution requires the contract nft_asset ownership transfer, and three main functions are executed, namely the query() function, the transfer() function, and the update() function. As Docker containers provide isolation, the execution state of the contract is managed via functions in a world ledger. The flow is as follows. Initially, the seller AHS1 initiates an application request, which contains the buyer address, seller address, and transfer function call. The parameters for the transfer function are (nft_asset, seller address, buyer address). The call invokes an asynchronous call to the CreatePropertyNFTAsset, which contains the ledger counter ctx value, property identifier (prop ID), owner (AHB1), registration of virtual land identifier (regID), Metaverse environment identifier (metaID), and property value (in terms of ERC tokens). The state variables are initialized and added to the fabric buffer, where the asset contract is executed. In virtual land buying and selling (for example, Decentraland), the ERC-1155 token is normally used [
107], which is a multi-token standard and supports most contract implementations and compilers. The erc1155 identifier is set up in the contract asset, and the query() function invokes the get() method to know the value of the NFT token. In the transfer function, the nft_asset is transferred, and ownership is changed from AHS1 to AHB1. Finally, the ledger state is updated through the update() function, and the property value is debited (in the form of ERC tokens) from the AHB1 wallet address. It is credited to the AHS1 wallet address. The details are timestamped through the put() function. The ledger state is updated at a low level, and the fabric buffer is written with the new value. The channel state of the Hyperledger Fabric contains the newly created instance, the ledger detail, the transfer timestamp, and the chain code object. Once the contract is finished, all buffers and states are flushed, and the data are written to the transactional ledger, which is mined and stored on the BC.
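The query(), transfer(), and update() flow described above can be sketched with an in-memory ledger. This is only an illustration under stated assumptions: the NFTLedger class, its method signatures, and the example identifiers ("prop-1", "reg-1", "meta-1") are hypothetical and do not use the Hyperledger Fabric API. The sketch also registers the seller AHS1 as the initial owner so that the transfer step has an ownership change to perform.

```python
import time


class NFTLedger:
    """In-memory stand-in for the world state; not the Hyperledger Fabric API."""

    def __init__(self):
        self.assets = {}   # prop_id -> asset record
        self.wallets = {}  # wallet address -> ERC token balance

    def create_property_nft_asset(self, prop_id, owner, reg_id, meta_id, value):
        # Initialize the state variables of the asset.
        self.assets[prop_id] = {
            "owner": owner, "regID": reg_id, "metaID": meta_id,
            "value": value, "timestamp": time.time(),
        }

    def query(self, prop_id):
        # query(): read the current asset state (the get() step).
        return self.assets[prop_id]

    def transfer(self, prop_id, seller, buyer):
        # transfer(): change ownership from the seller to the buyer.
        asset = self.query(prop_id)
        if asset["owner"] != seller:
            raise ValueError("seller does not own this asset")
        if self.wallets.get(buyer, 0) < asset["value"]:
            raise ValueError("buyer wallet has insufficient tokens")
        asset["owner"] = buyer
        self.update(prop_id, seller, buyer)

    def update(self, prop_id, seller, buyer):
        # update(): debit the buyer, credit the seller, and timestamp
        # the new state (the put() step).
        asset = self.assets[prop_id]
        self.wallets[buyer] -= asset["value"]
        self.wallets[seller] = self.wallets.get(seller, 0) + asset["value"]
        asset["timestamp"] = time.time()


ledger = NFTLedger()
ledger.wallets = {"AHB1": 100, "AHS1": 0}
ledger.create_property_nft_asset("prop-1", "AHS1", "reg-1", "meta-1", 40)
ledger.transfer("prop-1", "AHS1", "AHB1")
```

After the call, the asset is owned by AHB1 and 40 tokens have moved from the AHB1 wallet to the AHS1 wallet.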
8.2.1. Application Interface
The application interface provides a way to deal with two different entities, i.e., real and virtual properties. For real estate properties, the application offers a VR view, where buyers can feel and experience the visuals of the renovated property using AR technology. Virtual lands are developed on a computer in total synchronization with the real world. In case of any new development in the physical REM, the corresponding changes will be reflected on the virtual REM, and the meta-information of the land will be recorded on the BC. This technique allows real-time construction updates to be chronologically recorded and timestamped; thus, the buyer is updated on all seller developments. In the case of virtual lands, VR-enabled touring services are the only potential solution to find interested buyers.
8.2.2. Backend and Business Logic
At the BBS back-end, we considered b buyers, c sellers, and d brokers. In total, we considered a property lands (both virtual and physical), which can be shown to the buyers via the brokers. The details of the lands are stored on the public Ethereum BC, and the private details are stored on the local IPFS. IPFS storage is adopted to web-scale the application, as the information will have many fields, and thus, it is not feasible to store it on a public BC. Moreover, the mining rate on public chains is slow, and thus, we generate an IPFS content key, which is a 32-byte hash, and link it to the corresponding entry. The hash address is stored as transaction information on the BC, allowing many transactions to be appended to a single block.
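The idea of keeping the full record off-chain and only a 32-byte key on-chain can be sketched as follows. The sketch assumes SHA-256 over a canonical JSON encoding as the hash; real IPFS content identifiers wrap such a digest in a multihash-based encoding, and the dictionaries below are hypothetical stand-ins for the IPFS store and the BC ledger.

```python
import hashlib
import json


def content_key(record: dict) -> bytes:
    """Return a 32-byte SHA-256 digest of a canonical JSON encoding of the record."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).digest()


off_chain_store = {}   # stand-in for IPFS: content key -> full record
on_chain_entries = []  # stand-in for BC transactions: only the key is kept

# Hypothetical land record with a few illustrative fields.
land = {"prop_id": "prop-1", "seller": "seller-1", "description": "virtual plot"}
key = content_key(land)
off_chain_store[key] = land         # full details go off-chain
on_chain_entries.append(key.hex())  # only the fixed-size hash goes on-chain
```

Because the key is derived from the content, any later change to the stored record yields a different key, which is what makes the on-chain entry usable for verification.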
For every property, there is an associated SC.
Figure 8 presents the SC execution between the entities. Any seller who wishes to sell his/her property to the registered buyers must register the property on the BC via the IPFS. The detailed information is stored on the IPFS, which can be accessed with the help of the 32-byte content address. To make a deal, the buyer needs to execute the SC associated with the property. The transaction is recorded on the BC, and ownership is transferred from the seller to the buyer. The wallet addresses of all entities are stored with the contract address to expedite future payments. Along with the SCs, an NFT is created in the Metaverse to transfer asset ownership from the seller to the buyer. A transaction is recorded once the deal is finalized. Multiple potential buyers book tours for the listed properties, and minimal booking charges are included in the contract as an incentive function. The broker is responsible for the tours and has to pay for the BC resources utilized during the tour. When a buyer finalizes a property for buying or renting, a smart contract is executed involving the buyer, broker, and seller entities with their respective wallets.
8.2.3. Connection with the Metaverse
The application connects each property to the Metaverse through NFT ownership. Each property is associated with one unique NFT, which can be transferred from one owner to another. Any change in the NFT record is recorded on the BC as a transaction. Now, we envision that the avatars of the buyer, broker, and seller are rendered in the Metaverse, which is identified by the Metaverse ID of the property. The broker initiates the tour and invites the buyer to visit the virtual land. If the buyer confirms the purchase, the seller is notified, and the brokerage amount is added to the SC. Now, the seller generates an NFT ownership token, which is linked to the property. The wallets are checked for fund sufficiency, and the automated transfer of funds takes place from the buyer’s wallet to the seller’s wallet for the property and the brokerage amount to the broker’s wallet. A similar brokerage amount is also debited from the seller’s wallet to the broker’s wallet as a selling incentive and to cover the costs of avatars visiting the property. Once done, the property’s ownership is transferred to the buyer (which indicates the transfer of NFT rights). The details are publicly verifiable through the linked IPFS record, which is accessible through a web-API gateway. Please note that all entity wallets are registered via Ethereum SCs. However, as SCs are vulnerable to code dependency and injection attacks [
108], a private or consortium version of SC execution can be considered via a Hyperledger Fabric channel. In this case, the contracts are executed in isolated Docker containers, and endorsers’ addresses are authorized. Another benefit of including a Fabric channel is customized consensus, which is fine-tuned to execute with high throughput and low block mining time. Thus, the time to transaction finality decreases rapidly. The only downside is that the transactions are not publicly verifiable, and future legal claims would not hold as only authorized stakeholders have the view rights.
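The fund-sufficiency check and the automated settlement between the three wallets can be sketched as below. The settle function, the wallet names, and the amounts are hypothetical; the sketch also assumes that the seller may pay the brokerage out of the sale proceeds, which the description above does not specify.

```python
def settle(wallets, buyer, seller, broker, price, brokerage):
    """Move funds for one sale; raise ValueError if a wallet cannot cover its share."""
    if wallets[buyer] < price + brokerage:
        raise ValueError("buyer has insufficient funds")
    if wallets[seller] + price < brokerage:
        raise ValueError("seller cannot cover the brokerage")
    wallets[buyer] -= price + brokerage   # price to the seller, brokerage to the broker
    wallets[seller] += price - brokerage  # seller receives the price and pays brokerage
    wallets[broker] += 2 * brokerage      # broker is paid by both sides
    return wallets


wallets = settle({"buyer": 120, "seller": 0, "broker": 0},
                 "buyer", "seller", "broker", price=100, brokerage=5)
```

The total token supply across the three wallets is unchanged by the settlement, which gives a simple consistency check for such a contract.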
8.3. Performance Evaluation
This section presents the proposed case study’s performance evaluation and simulation results. We assumed that the Metaverse-enabled REM provides an avatar for each buyer that enables him/her to explore all properties listed on the public chain.
Figure 9 represents the cost comparison with the traditional way of selling a property, where the broker provides limited options to a potential buyer and obtains the brokerage from both the seller and buyer. The graph shows the cost of visiting a particular place in a defined kilometer range with a traditional transportation system. On the other hand, a potential buyer can explore all the possible properties just by paying a small transaction fee for the standard BC. AR and VR provide the same constant cost of exploring lands in the Metaverse irrespective of the actual location, as the 360° version of the land is available. The Metaverse-enabled REM is linked with the BC to provide trust between the potential buyer and seller. A buyer can purchase a property by executing the smart contract, which transfers the ownership of the NFT associated with the land to the buyer once the transaction is complete. The details are stored on the IPFS, and the meta-information is recorded on the BC ledger. This creates trust compared to the traditional way of buying land, where the same land can be sold to multiple buyers, and the process is tedious and time-consuming.
As the BBS ecosystem scales itself, the inherent trust in the ecosystem reduces.
Figure 10 depicts the conditions.
As the records stored in the BC are immutable, the trust remains constant, whereas the trust in the traditional approach (non-BC or distributed storage) drastically drops with collusion among users. We introduced trust probability as an indicator, defined as the ratio of verified transactions to the total number of transactions. As indicated in the diagram, the probability drops to 0.79 for 100 transactions, which shows that fraudulent transactions are authorized in the system. Similarly, for 500 transactions, the probability drops to less than half, which renders the ecosystem non-trustable.
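The trust probability indicator can be written compactly as follows. The symbols are our own shorthand for illustration and are not notation taken from the case study: N_verified is the number of verified transactions and N_total is the total number of transactions.

```latex
P_{\text{trust}} = \frac{N_{\text{verified}}}{N_{\text{total}}},
\qquad 0 \le P_{\text{trust}} \le 1
```

Under this definition, the reported value of 0.79 for 100 transactions corresponds to 79 verified and 21 unverified transactions.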
All land information is first stored in the IPFS, and a hashed content address is stored in the BC to reduce the mining cost of the system. Land information consists of the seller’s private information and the full land description. The information might take one or more blocks of the BC, which would increase the cost of storing the land record. The mining cost, however, remains the same if we store only the hashed IPFS content address of each land.
Figure 11 presents the details of the mining cost.
A single block consists of 4 bytes of block information, an 80-byte block header, and 2 bytes for the transaction counter. Each land content address is 4 bytes. For 1000 land transactions, the approximate size of the block is ≈3.66 KB. At present, one block of 5k transactions is mined at a cost of USD 5. We compared our work to Chopade et al. [
109], which can store ≈200 transactions at the same cost.
9. Discussions
In industrial applications, decentralization in cyber–physical space has allowed Metaverse applications to share data through supported Web 3.0 engines. As discussed in earlier sections, the Future Internet has a prominent role in supporting the Metaverse’s content, mainly through assisted AI models, where the virtual world would move in tandem with the physical world. The Metaverse would blend the human element into the production units and pipelines in the manufacturing sector and allow assets and ownership among different stakeholders [
110]. The BC-assisted Metaverse would play a major role in the supply chain ecosystem, where logistics and goods ownership would be traded via NFTs and ownership transfer would be supported via SCs. Similarly, in the crypto and NFT gaming ecosystems, many games such as Decentraland, CryptoKitties, and the Sandbox are played using NFTs. The games use the BC to verify and track assets, letting users trade and sell their game objects for real money. The games also allow rewards and incentives for players who perform many trades. However, the NFT game market now requires a unified interface to enable business-to-consumer transactions and instantly support wallet transfers. Currently, the cryptocurrency exchanges are not in tandem with the NFT assets, which makes it difficult to trade items. Another issue is that, sometimes, the trading takes the gamers to external pages, which are not secure and can be hacked by cybercriminals. NFT trading should be supported in the game environment itself, but currently, many game makers route trading through third-party crypto-exchange gateways. Another issue is that play-to-earn gaming (as in the case of Axie Infinity) is quite popular, but the initial price of the game has increased. Thus, cracked versions and codes, which are neither verified nor trustworthy, are being circulated among the gamer communities. With the lure of easy money, many gamers expose sensitive information in the open domain. Another revolution to support the industrial sector would be the generation of automated digital twins [
111], which would involve rapid prototyping of real industrial plants in the Metaverse-enabled virtual spaces. A 3D representation of virtual plants would ensure the real-time integration of process, control, and operational data flow. This would greatly strengthen the industrial pipelines, and any change to the real world would be emulated in the twin plant. This would bring a revolution to the emerging Industry 5.0, which supports the principles of massive personalization and customization [
112]. The quest is towards responsive services in the healthcare domain, and interesting use cases such as telesurgery, telemedicine, and surgeon telepresence would be supported in the Metaverse-enabled health spaces. The progression would allow virtual clinic designs, remote monitoring, and patient avatar wellness. In telesurgery, remote surgical operations would be performed on the patient’s digital twins through a robotic arm, and the results would be analyzed [
113,
114].
In smart city infrastructures, IoT sensors are deployed to monitor the city’s resources and building models. AI models can be trained on the collected data, and digital twins can be constructed to monitor the resources. These twins can be hosted on a Metaverse setup, which would facilitate the planning and management of urban resources. Similarly, the Metaverse can support V2X networks to improve driver interaction with road conditions and traffic and to design virtual maps that can be projected in front of road spaces, among other uses. Thus, there are numerous ways in which the Metaverse would contribute to industrial applications in the near future.
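The idea of a twin that tracks sensor data can be illustrated with a minimal sketch. It assumes a twin is simply a per-sensor state table that accepts only readings newer than the last one seen. The class name `DigitalTwin`, the method `apply_reading`, and the sensor name `power_kw` are hypothetical and do not come from any surveyed system.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DigitalTwin:
    """Hypothetical virtual mirror of one physical asset (e.g., a building)."""
    asset_id: str
    state: Dict[str, float] = field(default_factory=dict)
    updated_at: Dict[str, float] = field(default_factory=dict)

    def apply_reading(self, sensor: str, value: float, timestamp: float) -> bool:
        """Mirror a sensor reading into the twin.

        Returns False and leaves the twin unchanged when the reading is not
        newer than the last one recorded for the same sensor.
        """
        if timestamp <= self.updated_at.get(sensor, float("-inf")):
            return False
        self.state[sensor] = value
        self.updated_at[sensor] = timestamp
        return True


twin = DigitalTwin("building-7")
twin.apply_reading("power_kw", 12.5, timestamp=100.0)  # accepted
twin.apply_reading("power_kw", 9.0, timestamp=90.0)    # stale, ignored
```

Rejecting out-of-order readings is one simple way to keep the twin consistent with the physical asset when sensor messages arrive late. A real deployment would also need transport, persistence, and the AI-based analysis described above.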
However, numerous challenges limit the spread of the Metaverse. The Metaverse needs to address data leakage and privacy in any ecosystem, as most of the data are transferred through third-party integrations. The revenue model of the Metaverse needs to be formalized, and there are no regulations for content transfer and ownership. Advertisements on the web store data in client systems as cookies, which can be exploited to leak sensitive information. Another problem is that every virtual world in the Metaverse has its own regulations or laws and is controlled through autonomous AI agents. Thus, the governance role needs to be justified to ensure fairness in the AI ecosystems. No virtual model describes real-world properties perfectly, down to their minutest details. Thus, AI models can become biased towards particular objects or users in the Metaverse. Metaverse worlds are mostly restricted to gaming platforms, where the content of one game creator cannot be supplied to others. This raises interoperability issues when communicating between different parallel Metaverses. Another challenge that the Metaverse must address is user addiction, where users spend most of their time surfing these worlds through AR and VR platforms. Prolonged addiction to virtual Metaverses would be highly detrimental and might cause serious health issues. Thus, mechanisms that cap the time users spend in the Metaverse are required. As on social media platforms, cyberbullying in the Metaverse is another unfortunate issue that needs to be detected and addressed [
115]. Only authorized users should be able to interact in the Metaverse; thus, user authentication before communication is a prime requirement [
116]. Only authenticated user avatars should be allowed to interact and socialize, which can help prevent cyberbullying incidents.
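The authentication-before-communication requirement can be sketched as a simple gate, assuming the platform issues each avatar a keyed token (an HMAC over the avatar identifier) and checks it before relaying any message. The names `SECRET_KEY`, `issue_token`, `is_authenticated`, and `send_message` are hypothetical. This is an illustration of the principle, not the scheme of any cited work.

```python
import hashlib
import hmac

# Assumption: a secret held only by the platform (demo value, not for real use).
SECRET_KEY = b"demo-only-secret"


def issue_token(avatar_id: str) -> str:
    """Issue a token that binds the avatar identifier to the platform key."""
    return hmac.new(SECRET_KEY, avatar_id.encode(), hashlib.sha256).hexdigest()


def is_authenticated(avatar_id: str, token: str) -> bool:
    """Check the token in constant time to avoid timing side channels."""
    expected = issue_token(avatar_id)
    return hmac.compare_digest(expected.encode(), token.encode())


def send_message(sender_id: str, token: str, text: str) -> str:
    """Relay a message only if the sending avatar is authenticated."""
    if not is_authenticated(sender_id, token):
        raise PermissionError("unauthenticated avatar")
    return f"{sender_id}: {text}"


alice_token = issue_token("alice")
send_message("alice", alice_token, "hello")      # relayed
# send_message("mallory", alice_token, "hello")  # would raise PermissionError
```

A production system would add token expiry, revocation, and a binding to the user's real credentials. The sketch shows only that the identity check runs before any interaction is allowed.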
10. Concluding Notes
The Metaverse is the future communication and interaction medium for Web 3.0 users, and thus, it lays the founding principles on which the Future Internet will be built. High-tech companies have realized the potential of the Metaverse and have therefore taken progressive steps towards its practical realization. In the research communities, novel Metaverse-related schemes are being proposed for different industrial applications, as the Metaverse enhances the user’s experience in both the physical and virtual worlds. However, the Metaverse is not a singular concept. It would require radical security, communication, and computer vision efforts to make it practically feasible and enhance user interactivity. We systematically surveyed recent Metaverse research through a proposed review method and studied the different technologies, drivers, and principles that support the Metaverse. Based on our survey, we discussed the interaction of the Metaverse and Web 3.0 in industrial applications ranging from manufacturing, healthcare, V2X, digital twins, education, and gaming to other sectors. We presented a generic reference architecture that can functionally support these industrial verticals and proposed a case study of Metaverse-assisted REM ecosystems. However, there are still challenges in content generation, security and privacy, and AI model predictions, which must be addressed before the Metaverse becomes a reality and blends with our daily activities. We reflected on these potential challenges and discussed possible future directions.
As part of the future scope of this work, we will delve deeper into the privacy and security aspects of the Metaverse and propose a privacy-preserving and trusted Metaverse ecosystem for the manufacturing sector. We will look towards the design of automated digital twins and cobots that could communicate with human avatars in virtual manufacturing spaces, improving the productivity, customizability, and scalability of the manufactured product in the industrial pipeline.