Just as the Internet faces numerous security threats, the Metaverse is no exception; it is impossible to discuss the future of the Metaverse without discussing cyber security challenges. Although Internet threats and Metaverse threats are very similar, dealing with threats in a virtual environment can be extremely challenging and expensive. Businesses and users in the Metaverse face many security challenges, including money laundering in virtual currency exchanges, security threats in online games, art forgery, and privacy risks such as theft of personal data, impersonation, and disputes over avatar ownership, among many others. As biometric data are increasingly used to verify Metaverse users' identities, especially in social networks, a large amount of personal information is added to the existing data, which is difficult to safeguard and creates a new kind of cyber security threat. Our goal in this section is to categorize the existing security challenges by topic and highlight notable results for researchers.
3.1. Security in Metaverse Based on Biometric Data
A VR environment can be created with various types of visual stimuli and scenarios, which can be easily switched or repeated, while eye tracking shows exactly where the participant's attention is at any given moment and which visual elements trigger certain responses. In particular, VR experiences can be enhanced by eye tracking. According to [44
], virtual reality devices store most of a user's personal information, such as account numbers and biometric information, so they are constantly at risk of being attacked by hackers; such attacks can also damage the headset's display. Many authentication solutions have been proposed, but the most common methods (e.g., PINs, patterns, and brain biometrics) demand a significant amount of user time and are also highly insecure. Therefore, the authors of this study presented BlinKey, a method for authenticating users on VR devices equipped with eye tracking. In this method, authentication is performed by blinking one's eyes according to a rhythm known only to the user: a new kind of passcode in which, instead of numbers, letters, or characters, users blink to different rhythms. To evaluate BlinKey, the authors examined the effects of the following common attacks: zero-effort attacks, statistical attacks, shoulder-surfing attacks, and credential-aware attacks. Their experiments were conducted using two machine learning algorithms for classification: support vector machine (SVM) and k-nearest neighbors (kNN). They achieved an average equal error rate (EER) of 4% using this method, making it effective against all of the considered attacks. Compared with methods commonly used on mobile devices (e.g., passwords, PINs, and pattern locks), BlinKey exceeded user expectations in terms of both security and usability. As pupil expansion and contraction are governed by an individual genetic pattern, the accuracy and speed of BlinKey are very high; moreover, the pattern of pupil size changes between blinks is as unique as a fingerprint.
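To illustrate the classification step behind such blink-rhythm passcodes, the following Python sketch applies a minimal kNN classifier to vectors of inter-blink intervals. This is an illustrative toy, not the authors' implementation; the enrolled rhythms, the feature encoding, and the value of k are all assumptions made for the example.

```python
import math

def knn_classify(query, samples, k=3):
    """Classify a blink-rhythm vector by majority vote over its
    k nearest neighbours (Euclidean distance on inter-blink intervals)."""
    dists = sorted((math.dist(query, vec), label) for label, vec in samples)
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy enrolment data: inter-blink intervals (seconds) per rhythm "passcode".
enrolled = [
    ("alice", [0.30, 0.31, 0.90, 0.32]),
    ("alice", [0.28, 0.33, 0.88, 0.30]),
    ("alice", [0.31, 0.30, 0.92, 0.31]),
    ("bob",   [0.60, 0.59, 0.20, 0.61]),
    ("bob",   [0.62, 0.58, 0.22, 0.60]),
    ("bob",   [0.61, 0.60, 0.19, 0.62]),
]

print(knn_classify([0.29, 0.32, 0.91, 0.31], enrolled))  # → alice
```

In a real system, the feature vector would combine blink timing with pupil-size dynamics, and an SVM would be a natural alternative classifier.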
An artificial-intelligence-driven deepfake is a picture, video, or audio recording that looks real but is actually a fake. Deepfakes can be created by AI technologies such as autoencoders (artificial neural networks that reconstruct the input from simpler representations) and generative adversarial networks (GANs). Metaverse users can create their own hyper-real avatars using their biometric data. However, deepfakes pose many privacy and cybersecurity challenges. Based on a deep convolutional neural network, ref. [45
] has proposed a lip-based speaker authentication system that defends against severe deepfake attacks. Deepfake manipulation methods usually fall into two types: (1) face swapping and (2) lip syncing. Visual speaker authentication (VSA) systems are vulnerable to deepfake attacks that mimic the original user's pronunciation. It has been shown that a person's lips can be used as a biometric feature to distinguish speakers. Lip-based authentication, which offers high security and can be used in most systems, contains both static and dynamic identity-related information (i.e., the shape and appearance of the lips as well as their movement). The authors proposed a deep-learning-based VSA algorithm that can detect deepfake attacks without prior knowledge of the manipulation. As static information is vulnerable to deepfakes, they used dynamic information for authentication. To extract the lip region from a video of a face, the authors used the Dlib detector, and to verify the spoken text, they used connectionist temporal classification (CTC). The proposed network consists of two subnets: (1) a low-level fundamental lip feature extraction subnet (FFE-Net) and (2) a high-level representative lip feature extraction and classification subnet (RC-Net). MOBIO (a data set of bimodal audio–visual data from 152 people) and GRID (thirty-three speakers, each of whom speaks 1000 sentences composed of commands, colors, prepositions, letters, numbers, and adverbs) were used as data sets. To verify whether the lip movements match the user's speaking habits, the speaker authentication network based on dynamic talking habit (SA-DTH-Net) was used. They tested three deepfake attack methods: Faceswap (FS), Deepfacelab-Quick96 (DFL), and Faceswap-GAN (FS-GAN).
According to their results, detection methods based on biometric features, such as SA-DTH-Net, performed better than other detection methods, especially in detecting fake videos produced by FS-GAN. As their method does not require any prior information about the deepfake spoofing technique, it can be applied to defend against different kinds of deepfake attacks.
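As an aside on the lip-region extraction step: the standard 68-point facial landmark model used by detectors such as Dlib assigns indices 48 to 67 to the mouth. The following Python sketch (an illustration, not the authors' pipeline) shows how a padded bounding box around the lip landmarks could be computed from such a landmark list.

```python
def lip_bounding_box(landmarks, margin=5):
    """Given 68 (x, y) facial landmarks in the standard Dlib ordering,
    return a padded bounding box around the mouth points (indices 48-67)."""
    lips = landmarks[48:68]
    xs = [p[0] for p in lips]
    ys = [p[1] for p in lips]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)

# Toy landmarks: 48 zero points followed by a synthetic mouth contour.
pts = [(0, 0)] * 48 + [(100 + i, 200 + (i % 5)) for i in range(20)]
print(lip_bounding_box(pts))  # → (95, 195, 124, 209)
```

The cropped region would then be fed to the feature extraction subnets described above.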
Online photos on social networks can be used to retrieve features from a user's face, which poses a serious threat. To provide security for users, face recognition tools must be sufficiently robust and efficient. In [46
], a face authentication system was attacked: using images from social networks, models of the user's face were created that undermine the security of the system. The authors deceived liveness detectors using a VR system and specific facial movements, such as smiling. VR-based spoofing is one of the most common attacks on face authentication systems, and any system that relies on color images and camera movement is vulnerable to it. To extract 68 2D facial landmarks, the authors used the supervised descent method (SDM). Due to a median alignment error of 2.7 pixels, they applied high-fidelity pose and expression normalization (HPEN) with a 3D morphable model (3DMM). In most cases, SDM works well even on low-resolution online images. As examples, they spoofed True Key, BioID, and KeyLemon, achieving 98% and 100% accuracy for indoor logins and 96% and 100% accuracy for outdoor logins, respectively. According to their results, the method works well when the image height is at least 50 pixels; however, when the image resolution is below 30 pixels, their system fails, as no useful features can be obtained from the image. Furthermore, three defenses can defeat this approach: the random projection of light patterns, the detection of minor pulse-related skin tone fluctuations, and the use of illuminated infrared (IR) sensors. The first two can be circumvented with additional adversarial effort, but avoiding the third would require significant changes to their method.
According to [47
], in the Metaverse, human–machine interactions are fundamental, especially where augmented reality and virtual reality are combined, and sensors must be used to accomplish this. The authors used an all-in-one multi-point touch sensor (AIOM) with two electrodes to learn and recognize human–machine interactions using a deep learning method. Touch sensors can also be used to protect security by enabling biometric verification, preventing the leakage of passwords. An Arduino Leonardo was used as the microcontroller in a circuit connected to a computer by USB. In the AIOM, mechanical receptors convert the touch of fingers on the skin into a transient receptor potential, mimicking the function of biological sensory neural systems; upon receiving this potential, the brain decodes it and reacts accordingly. In addition to detecting the spatial and temporal dynamics of stimulation when different points are touched, the AIOM touch sensor can detect the mechanical information of these spatio-temporal dynamics. To validate the linear touch sensor, it was designed as a linear interactive control interface for playing the piano; it can also be used to control drones programmatically. To protect privacy, the authors further proposed a biometric approach utilizing an artificial neural network (ANN): the AIOM is mechanically stimulated by touch, the mechanical signals are converted into digital signals, and the neural network then classifies the features extracted from these digital signals. Key-pressing examples from three users and their dynamic characteristics were used as the training data set, considering three parameters: holding time, interval, and signal magnitude. A back-propagation (BP) algorithm was used to calculate the output of each node in the ANN model. Furthermore, the ANN model was able to accurately identify each user as they entered their password.
Based on the test results, about 98% of the identifications made by the algorithm were accurate. A key component of the AI-based Metaverse is human–machine interaction, which, when combined with a variety of sensors and deep learning algorithms, greatly assists in the construction of all kinds of prosthetics, robotics, and so on.
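To give a flavor of how the three AIOM features can separate users, the following Python sketch replaces the paper's ANN with a much simpler nearest-centroid matcher over (holding time, interval, signal magnitude) vectors. The enrolled values are toy assumptions made for the example, not measured data.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def identify(sample, profiles):
    """Match a (hold_time, interval, magnitude) feature vector to the
    enrolled user whose centroid is closest in Euclidean distance."""
    return min(profiles, key=lambda user: math.dist(sample, profiles[user]))

# Toy enrolment: per-user centroids of the three keystroke features.
profiles = {
    "user_a": centroid([[0.12, 0.30, 0.8], [0.14, 0.28, 0.9]]),
    "user_b": centroid([[0.40, 0.10, 0.3], [0.38, 0.12, 0.2]]),
}

print(identify([0.13, 0.29, 0.85], profiles))  # → user_a
```

A trained ANN would learn a non-linear decision boundary over the same features, which matters when user profiles overlap.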
It is important to keep in mind that privacy is one of the biggest dangers associated with augmented reality in the Metaverse. AR technologies are capable of observing what a person does, which puts their privacy at risk. Compared to other types of technology—such as social media networks—AR gathers a great deal of information about the user. The authors in [48
] described the development of an augmented reality app that uses computer vision and raw input data to deliver real-time attack guidance to an attacker's phone. This method mimics the unique typing behavior of users on a smartphone without modifying the device, installing software, or using special hardware. The attacker runs the app on their own smartphone, positioning it such that the camera can see the victim's hands as they grasp their phone. Using this information, the attacker can immediately imitate the victim's input behavior, with instructions superimposed over the camera stream. As a baseline for the augmented reality prototype, the authors also built a simple physical model, in which an attacker uses audio beeps to time their taps and attaches a transparent film with spatial hints to the victim's smartphone. In this study, the authors gathered 31 volunteers to test over 400 mimicking attacks. Their application utilized the OpenCV 2.4 library and Android KitKat 4.4 (Google, San Jose, CA, USA). Furthermore, spatial, temporal, and contact features were extracted and evaluated using an SVM classifier. Key-hold interval, inter-stroke interval, down pressure, down area, down x, and down y were the six target features identified in their study. To compare the results, they used both the proposed AR-based method and an audiovisual method, dividing the data into training and testing sets; their data set was also assessed using the zero-effort attacker model. Within four minutes, they succeeded in 87% of the attacks, and using AR raised the attack success rate from 6% to 73%. Biometrics based on input behavior can also be attacked using this method. Note that the difficulty of capturing user behavior makes this attack relatively narrow in scope.
In [49], a user authentication system called GaitLock was introduced: an innovative method that can identify users based on their gait signatures. While typical previous methods for walking-based authentication have relied on dedicated sensors or pre-defined gestures performed by the users, this system uses the onboard inertial measurement units (IMUs) present in virtually all popular VR/AR headsets, whose sensors assist in identifying and stopping attackers. To this end, the inertial signals generated during walking are analyzed to identify unique gait patterns. To extract walking patterns, dynamic time warping (DTW) and a sparse representation classifier (SRC) were combined into Dynamic-SRC, a new model proposed for gait recognition. In this study, both internal and external threats were considered. Six walking detection methods were compared with Dynamic-SRC: dynamic time warping with nearest neighborhood (DTW+NN), time-delay embeddings with template matching (TDE+TM), nearest neighborhood (NN), and three variations of the SRC approach (zero padding, sparse fusion, and majority voting). Finally, the authors assessed the performance of GaitLock against zero-effort and mimicking attacks. Compared with other user authentication methods, their results showed an increase in accuracy of about 20%, and they achieved an equal error rate (EER) of 2.9%. In this work, sparse fusion enhances detection accuracy by fusing the sparse coefficient vectors of multiple sequential step cycles at the same time, as they must have been generated by the same subject. For their experiments, GaitLock was implemented on Google Glass.
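Dynamic time warping, one of the two building blocks of Dynamic-SRC, can be sketched compactly. The following Python example is a generic textbook DTW, not GaitLock's implementation; it aligns two gait cycles of different lengths.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences,
    used here to align gait cycles of different lengths."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Two accelerometer step cycles from the same gait, slightly time-shifted.
cycle_a = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
cycle_b = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
print(dtw_distance(cycle_a, cycle_b))  # → 0.0
```

Because walking speed varies between sessions, this elastic alignment is what lets step cycles of different durations be compared before sparse-representation classification.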
In augmented reality (AR) headsets, voice-based inputs can be used to recognize the user. However, attackers with unintelligible voice commands can attack devices that do not have a voice verification system. To defend against voice-spoofing attacks, a voice-spoofing defense system for AR headsets has been proposed [50
]. Two techniques commonly used by older systems to identify sound are based on broadcast reverberation analysis and noise analysis; these techniques present unsatisfying performance, with a 17% false-positive rate. To combat voice-spoofing attacks, numerous liveness detection systems have recently been proposed, which use phoneme locations, articulatory gestures, the magnetic fields of loudspeakers, and throat voices to analyze the differences between the human voice system and loudspeakers. As all of these methods were primarily developed for smartphones, current liveness detection technologies are generally not compatible with AR headsets. The proposed system defends against voice spoofing by exploiting the fact that the human voice propagates both internally (through the body) and externally (through the air), deploying a low-cost contact microphone on the user's head to collect body sounds and confirm the user's voice, in combination with the common features in the frequency bands of human voices. The authors noted two key challenges: (1) the low signal-to-noise ratio (SNR) of the body-propagated voice makes it difficult to extract voice features from the raw time-domain signals and (2) a correlation must be determined between the user's internal body voice and air voice. The SNR issue can be resolved by transforming the raw signal into the time–frequency domain and utilizing spectrogram enhancement techniques. To robustly estimate the correlation and similarity between the two voices, correlated high-energy blocks are matched across both spectrograms. Two attacks were considered. In an obstruction attack, a malicious user positioned close to the normal user issues a high-volume voice command. The second is a replay attack on voice-based authentication.
It is assumed that an attacker can physically access a victim's headset unnoticed, enabling them to record the user's voice and replay it. Hidden Markov model (HMM)-based word segmentation techniques can be applied to each audio sample to segment it into words. The proposed method was implemented on a Raspberry Pi using an iRig HD 2 soundcard and an AXL contact microphone. Their system correctly accepted the normal user, with a mean accuracy of 97% across all users and a high accuracy of 92.3% for normal users. In terms of their defensive approach, they achieved mean accuracies of 99.2% and 98% against obstruction and replay attacks, respectively.
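The correlation step between the air-conducted and body-conducted voices can be illustrated with a simple Pearson-correlation check over per-frame energy envelopes. This Python sketch is a deliberate simplification of the system's spectrogram block matching; the threshold and signal values are assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def voices_match(air_energy, body_energy, threshold=0.8):
    """Accept the command only when the per-frame energy envelopes of the
    air microphone and the contact microphone are strongly correlated."""
    return pearson(air_energy, body_energy) >= threshold

# Live speech: both channels rise and fall together.
air  = [0.1, 0.9, 1.2, 0.8, 0.2, 0.1]
body = [0.2, 1.0, 1.1, 0.9, 0.3, 0.1]
# Replay through a loudspeaker: no matching body-conduction energy.
replay_body = [0.1, 0.1, 0.2, 0.1, 0.1, 0.1]

print(voices_match(air, body))         # → True
print(voices_match(air, replay_body))  # → False
```

The real system works on enhanced spectrogram blocks rather than raw energy envelopes, which is what makes it robust at low SNR.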
3.2. Security in Metaverse Based on Transportation Data
Drones, i.e., unmanned aerial vehicles (UAVs), are gaining popularity across a wide range of industries, including the Metaverse. Recently, a number of companies, including Walmart, Google-owned Wing, Magellan Health, and Brinker International, have been experimenting with drone deliveries. Additionally, Drone Orange is currently building a giant Metaverse platform in South Korea using drones; in this massive project, drones are used to collect all the images and data. As a result, drones have become increasingly significant in the Metaverse. Drones rely on AI algorithms and come with many sub-challenges, such as privacy issues, identity theft, and security concerns. A fog-assisted Internet of Drones (IoD) was presented in [51
]. The fog node is responsible for analyzing the vast amount of data transferred by drones in the IoD. This volume of drone data collection inevitably leads to traffic generation and privacy leakage. Federated learning (FL) has been proposed as a means of protecting drone privacy; even so, drone privacy can still be threatened through other means, such as eavesdropping. In this paper, a power control scheme for drones was investigated in order to maximize the security of the FL system. As part of the FL system, the drones alternately download the global model parameters, train them with their own data, and then send them to a fog node, where they are aggregated into a new global model. To avoid excessive power consumption, the authors proposed a power control in secure FL (PCSF) algorithm, as the drone's transmission rate depends on the air-to-ground and fog channels. To maximize system security, this algorithm considers all FL times, optimizes the wireless transmission power of the drones, and selects the best FL time and optimal power control method. They compared their algorithm with a delay-aware algorithm (which minimizes FL time) and, to determine whether their algorithm minimizes energy consumption, with an energy-aware algorithm (which minimizes energy consumption). They considered N = 16 drones flying within a 1000 × 1000 m area. Based on the obtained results, the security rate of the system increases as the number of drones increases. Compared to the delay-aware and energy-aware algorithms, the proposed PCSF algorithm presented a greater increase in security rate, while the FL training time remained the same regardless of how many drones were used. The PCSF algorithm's training time was similar to that of the delay-aware algorithm but shorter than that of the energy-aware algorithm.
The delay-aware algorithm behaves similarly to the PCSF algorithm, and both outperform the energy-aware algorithm in terms of security rate. On the other hand, the energy-aware algorithm uses the most energy, as its training period is the longest. Additionally, the security rate of all three methods increases as the amount of eavesdropping grows. Regarding the quality of service (QoS), the security rate of each of the three algorithms falls as the QoS requirement rises; when the QoS requirement is low, more transmission power can be devoted to security, so the system's security rate rises. The results also indicate that the delay-aware algorithm's security rate and FL training time are unrelated to battery capacity, as battery capacity only restricts the minimum wireless transmission power. However, as battery capacity increases, the security rate of the energy-aware algorithm decreases and its FL training time increases. Ultimately, the PCSF algorithm performed the best of the three considered algorithms. In terms of accuracy, the FL training times of PCSF and the delay-aware algorithm both grew shorter as the required accuracy increased, while the energy-aware algorithm showed no such trend. This problem can be considered a non-linear programming problem. Additionally, as eavesdroppers typically conceal their locations, the channel power from the drone to the eavesdropping node is treated as a random parameter. It can be concluded that more wireless transmission power offers a higher level of system security.
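The security rate maximized by such power control schemes is typically grounded in the classical notion of secrecy rate: the capacity of the legitimate link minus the capacity of the eavesdropper's link. The following Python sketch uses assumed channel gains, noise power, and bandwidth (not the paper's model) to show why more transmission power helps when the legitimate channel is the stronger one.

```python
import math

def secrecy_rate(p_tx, gain_main, gain_eve, noise=1e-9, bandwidth=1.0):
    """Wyner-style secrecy rate: capacity of the drone-to-fog link minus
    the capacity the eavesdropper achieves, clamped at zero when the
    eavesdropper's channel is the stronger one."""
    c_main = bandwidth * math.log2(1 + p_tx * gain_main / noise)
    c_eve = bandwidth * math.log2(1 + p_tx * gain_eve / noise)
    return max(0.0, c_main - c_eve)

# More transmission power widens the gap when the main channel is stronger.
for p in (0.01, 0.1, 1.0):
    print(round(secrecy_rate(p, gain_main=1e-6, gain_eve=1e-7), 3))
```

The clamp at zero reflects the information-theoretic fact that no confidential rate is achievable once the eavesdropper's channel dominates.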
Thanks to cutting-edge technologies such as virtual reality, augmented reality, and the Internet of Things, automobiles should be built to interact with the Metaverse. Upcoming vehicles—including planes, trains, trucks, and cars—will be based on computing platforms with the ability to receive and transmit data, on the basis of which they will operate. Data transmission is made possible by the use of sensors, which are prone to hacking at all times. The issues surrounding safe data transmission between sensors were examined by the authors in [52
]. The authors looked at how jamming and eavesdropping attacks affect wireless network sensors. To address this problem as an optimization problem based on a Stackelberg game, they considered both a single-antenna model and a multi-antenna model. Jammers have been employed in numerous studies to stop attacks; however, a jammer can also be used for harm. A jammer can lower the overall power usage of a cyber–physical transportation system (CPTS), a setup the authors referred to as a green cyber–physical transportation system (GCPTS). As it broadcasts noise into the GCPTS, the jammer can partially prevent eavesdropping attacks in addition to interfering with sensor–controller communications. The authors constructed two kinds of communication method: a single-antenna sensor, in which the information is conveyed over a single channel, and a multi-antenna sensor, in which the information is separated into several packets delivered over various channels. They modeled the power allocation problem as a Stackelberg game, which can be regarded as an optimization problem, with the sensor acting as the leader and the jammer as the follower. The sensor, as the leader, has priority, as both players want to make the most of their resources. A stochastic algorithm with feedback (SAF) and a newly developed intelligent simulated annealing (RISA) algorithm were proposed to obtain the Stackelberg equilibrium (SE) strategies. The common wiretapping model (CWM), the wiretapping model with friendly jammer (WMFJ), and the wiretapping model with malicious jammer (WMMJ) were all employed in their experiment. As a result, CWM always presented the lowest secrecy capacity and the smallest expansion window. There was not much difference between WMFJ and WMMJ in terms of their secrecy capacity or level.
Additionally, the allied sides have the same authority and capacity for secrecy. Due to the power contributed by the friendly jammer, WMFJ consumed significantly more power than WMMJ. Moreover, while WMMJ used less power, it still achieved a relatively high secrecy capacity.
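The leader-follower structure of such a Stackelberg game can be sketched with an exhaustive search: for every candidate leader strategy, compute the follower's best response, then keep the leader strategy with the highest resulting utility. The utility functions and power levels below are hypothetical, chosen purely for illustration (they are not the paper's GCPTS model).

```python
def follower_best_response(leader_power, jam_levels, follower_utility):
    """The jammer (follower) picks its utility-maximizing power level
    given the sensor's committed transmission power."""
    return max(jam_levels, key=lambda j: follower_utility(leader_power, j))

def stackelberg_equilibrium(sensor_levels, jam_levels,
                            leader_utility, follower_utility):
    """Exhaustive Stackelberg search: the sensor (leader) commits to the
    power that maximizes its utility given the jammer's best response."""
    best = None
    for p in sensor_levels:
        j = follower_best_response(p, jam_levels, follower_utility)
        u = leader_utility(p, j)
        if best is None or u > best[0]:
            best = (u, p, j)
    return best[1], best[2]

# Hypothetical utilities: the sensor values secrecy minus power cost,
# the jammer values interference minus its own power cost.
leader_u = lambda p, j: 10 * p / (1 + j) - 2 * p
follower_u = lambda p, j: p * j - j ** 2

levels = [0.0, 0.5, 1.0, 1.5, 2.0]
print(stackelberg_equilibrium(levels, levels, leader_u, follower_u))  # → (1.5, 0.5)
```

SAF and RISA replace this brute-force search with stochastic exploration, which is necessary when the strategy spaces are continuous.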
3.3. Security in Metaverse Based on Virtual Learning
An innovative use of the Metaverse is the incorporation of a virtual world into an educational setting. Studies have investigated its viability as a digital complement to the teaching–learning process in a university context, where flexible access to synchronous and asynchronous information offers an alternative means of transmitting and acquiring knowledge through technological means. In [53
], social virtual reality-based learning environments (VRLEs) were studied. A risk assessment method was proposed by the authors in this paper. In particular, to study young people with autism spectrum disorder (ASD), they used social VRLEs such as vSocial. It is important for VRLEs to provide a safe environment for young people with learning disabilities. VRLEs use emotion-tracking sensors, and their data are stored in a cloud environment. Attacks on these data can lead to negative effects, such as changing content and learning results (Figure 3
). VRLEs were assessed in this paper for the first time in terms of security and privacy concerns, considering all three aspects of security, privacy, and safety (SPS). Three attack scenarios were presented in this study, each involving a different aspect of security: loss of nodes, packet sniffing, and malicious network changes. Attack scenarios are depicted as tree structures, in which the root node is the target and the leaf nodes are attacker activities, illustrating the relationship between possible system vulnerabilities and attack scenarios. The authors used this concept to analyze server threats to vSocial and to compute risk scores based on the likelihood of each threat occurring. To create the tree, they used the SecurITree tool, with countermeasure nodes, rates, and weights as inputs. Besides frequency rates (i.e., the number of times an attack occurs over a period of time), they also used the duration of attacks to calculate risk. Additionally, network discrepancy, packet loss, and sniffing attacks were launched against the system, and the authors evaluated how these attacks impact storage, the network, and VR rendering. From this tree, they determined the probability of occurrence, a well-known measure of risk. The SPS assessment creates a risk score based on the probability of occurrence and the impact of the threat; generally speaking, the higher the risk score associated with a threat, the greater the risk. These results make it easier to devise a defensive strategy for the system. Users connect to the virtual classroom using head-mounted display (HMD) devices, such as the HTC Vive (HTC, New Taipei City, Taiwan) or Oculus Rift (Meta Platforms, Menlo Park, CA, USA), through a cloud-based application hosted on the global environment for network innovations (GENI).
They used Steam (an online game platform for simulation), NetLimiter (an Internet traffic control tool used to simulate DoS attacks), Wireshark (an open-source network protocol analyzer), and Clumsy 0.2 (a Windows-based tool for controlling network conditions). Their results indicated that any upload speed below 30 Kbps caused the High Fidelity client to crash, although the frame rate was not significantly affected. In terms of privacy, they simulated packet sniffing attacks and demonstrated that avatar information, confidential host information, and server details were completely exposed, implying that all user information could be captured and deciphered. From a safety standpoint, they demonstrated that reducing the bandwidth can result in abrupt changes in VR content. Regarding the attack tree results, an ad hoc attack tree under-represents system vulnerabilities, resulting in a lower risk score that falsely suggests low susceptibility to threats even when stronger countermeasures are needed; consequently, an ad hoc attack tree falls short in quality.
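The probability-of-occurrence computation over an attack tree can be sketched as a recursion over AND/OR gates, assuming independent basic events. The tree shape and leaf probabilities below are toy values chosen for illustration, not the paper's vSocial figures.

```python
def attack_probability(node):
    """Probability that the goal of an attack tree is reached.
    Leaves carry their own probability; an OR node succeeds if any child
    does, an AND node only if every child does (independence assumed)."""
    if "prob" in node:
        return node["prob"]
    child_ps = [attack_probability(c) for c in node["children"]]
    if node["gate"] == "AND":
        p = 1.0
        for cp in child_ps:
            p *= cp
        return p
    # OR gate: complement of "no child succeeds".
    q = 1.0
    for cp in child_ps:
        q *= 1.0 - cp
    return 1.0 - q

# Toy scenario: disrupt the VRLE session either by packet sniffing, or
# by combining a malicious network change with the loss of a node.
tree = {"gate": "OR", "children": [
    {"prob": 0.2},                       # packet sniffing
    {"gate": "AND", "children": [
        {"prob": 0.5},                   # malicious network change
        {"prob": 0.4},                   # loss of a node
    ]},
]}
print(round(attack_probability(tree), 3))  # → 0.36
```

A risk score then follows by weighting this probability by the impact of the threat, as in the SPS assessment described above.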
VRLEs were investigated from a security standpoint in [54
]. The authors developed a new framework for security and privacy by employing vSocial and a new attack-fault tree (AFT) in order to demonstrate the outcomes of cyber-attacks. AFTs can be used to model security issues, such as loss of confidentiality (LoC), loss of integrity (LoI), and loss of availability (LoA) scenarios, as well as privacy issues such as privacy leakage. To analyze the attack model, these attack-fault trees were converted into stochastic timed automata (STA) representations. Finally, in a VRLE session, they demonstrated how their attack-fault tree model incorporates suitable design principles, such as hardening, diversity, redundancy, and the principle of least privilege, to ensure user safety. Through the use of AFTs, they generated graphical models of different cyber-attack/fault scenarios and their corresponding consequences toward a common goal of system disruption. Using their framework, they measured cybersickness and evaluated security/privacy attacks in the context of cybersickness in the vSocial application; according to their results, a denial of service (DoS) attack and data leakage were the most likely causes of high levels of cybersickness in VRLE sessions. Moreover, they assessed the impact of the identified threat vectors on cybersickness levels in a social VRLE. Using the results of their previous work [53
], they modeled security, privacy, and safety attacks using an AFT. This AFT, which includes the SPS attack scenarios that cause cybersickness, is called the safety-AFT. Technical factors of cybersickness, such as low bandwidth and network failure scenarios, are also modeled in the safety-AFT. They again used the simulation tools Clumsy 0.2 and Wireshark to perform a boundary test. VRLEs use distributed wearable devices and head-mounted displays, making the user experience sensitive to distributed denial of service attacks. For a DoS attack scenario executed through packet tampering, packet duplication, and packet drops affecting the server, the results showed that packet drops can disrupt the communication between the user and the VRLE server by as much as 80%; moreover, a tamper rate of 20% can crash the VRLE server for VRLE users.
3.4. Security in Metaverse Based on Other Data
With the use of cryptocurrency in the Metaverse, users would be inclined to hold digital assets and conduct daily transactions in digital tokens. Since the advent of cryptocurrencies, security problems have drastically increased. Bitcoin is a peer-to-peer cryptocurrency system whose transactions are very difficult to trace, leading to an increase in illegal activities such as money laundering in the Bitcoin system. In [55
], the authors attempted to detect the mixing services that facilitate Bitcoin money laundering. A feature-based framework was introduced to identify statistical features at three levels: networks, accounts, and transactions. The authors also described transaction patterns using attributed temporal heterogeneous (ATH) motifs. Furthermore, they tackled the mixing detection task as a positive and unlabeled (PU) learning problem and developed a detection model leveraging the considered features. To analyze the transaction records, they created two kinds of temporal directed transaction networks: a homogeneous address–address interaction network (AAIN) and a heterogeneous transaction–address interaction network (TAIN). ATH motifs were proposed for the TAIN to analyze the complicated dynamic processes in the Bitcoin transaction network. Hybrid motifs, combining temporal homogeneous motifs in the AAIN and ATH motifs in the TAIN, were employed as crucial features for detecting mixing services. The authors used three real data sets with labels from WalletExplorer, which provides label information for addresses by making transactions with certain services and observing how the Bitcoin flows combine. They used logistic regression (LR) as the classifier. For the training set in stage one, they selected 70% of the unlabeled addresses and 70% of the labeled addresses in order to obtain some reliable negative instances. In stage two, they used 70% of the reliable negative instances as well as the labeled addresses used in stage one. For the testing set, they selected the remaining 30% of the reliable negative instances and 30% of the labeled addresses. They evaluated the model in terms of the true-positive rate (TPR), false-positive rate (FPR), and geometric mean (G-Mean) in order to assess its classification performance on imbalanced data sets. A comparison was made between their model and the one-class support vector machine (OCSVM), isolation forest (IF), decision tree (DT), and InterScore (IS).
As a result, they found that OCSVM, IF, and IS, which are unsupervised anomaly detection techniques, recovered most of the positive instances but had a higher FPR than the other techniques. The performance of IS varied significantly across data sets. LR and DT, as supervised techniques, exhibited over-fitting and relatively poor performance. The results showed that, on extremely imbalanced data sets, the PU learning framework performed better for Bitcoin mixing detection, with a TPR exceeding 91% and an FPR below 4%. Despite the good performance of the proposed method, mixing service providers might update their techniques to avoid detection; because the detection model relies on prior information and the data are unlabeled, such evolving services remain difficult to detect.
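The two-stage training scheme described above can be sketched as follows. The plain-NumPy logistic regression and the synthetic two-dimensional features are illustrative stand-ins for the LR classifier and the network/account/transaction features used in [55]; the `neg_fraction` parameter is an assumption for this sketch, not a value from the paper.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=500):
    """Gradient-descent logistic regression (stand-in for the LR classifier)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on weights
        b -= lr * float(np.mean(p - y))          # gradient step on bias
    return w, b

def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def two_stage_pu(X_pos, X_unlabeled, neg_fraction=0.5):
    """Two-stage PU learning sketch.

    Stage 1: treat all unlabeled samples as negatives, fit a classifier,
    and keep the lowest-scoring unlabeled samples as 'reliable negatives'.
    Stage 2: retrain on labeled positives vs. reliable negatives only.
    """
    X1 = np.vstack([X_pos, X_unlabeled])
    y1 = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unlabeled))])
    w, b = train_logistic(X1, y1)
    scores = predict_proba(X_unlabeled, w, b)
    n_rel = int(neg_fraction * len(X_unlabeled))
    reliable_neg = X_unlabeled[np.argsort(scores)[:n_rel]]
    X2 = np.vstack([X_pos, reliable_neg])
    y2 = np.concatenate([np.ones(len(X_pos)), np.zeros(len(reliable_neg))])
    return train_logistic(X2, y2)
```

The key design point is that stage one never trusts the unlabeled pool as ground-truth negatives; it only uses it to mine instances that are confidently negative, which is what allows the method to cope with hidden positives among the unlabeled addresses.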
In the Metaverse, extended reality (XR) is expected to enable users to have better experiences through the use of devices such as headsets, smart glasses, haptics, and so on. A key consideration in XR privacy and security is protecting the vast amount of data gathered by these devices, as they are vulnerable to attack. According to [56
], XR technology requires advanced human–computer interaction (HCI) devices for integration into the Metaverse. HCI devices communicate over Internet of Things (IoT) wireless networks, in which the Routing Protocol for Low-Power and Lossy Networks (RPL) is the backbone of an IPv6-based low-power and lossy network (LLN). IPv6 handles packets more efficiently in order to reduce the size of routing tables and increase security. An analysis of countermeasures to Sybil attacks on RPL-based networks was presented in this paper. A simulation environment with 100 nodes in various 200 m areas was implemented using Contiki-NG and MATLAB. Additionally, two countermeasures were compared for analysis purposes. The first, the Gini model, is a countermeasure based on the Gini index that is used to recognize and mitigate Sybil attacks. In this attack, the malicious node broadcasts a destination-oriented directed acyclic graph (DODAG) information object (DIO) with fake identities, which activates the trickle algorithm (a technique for propagating and maintaining code updates in wireless sensor networks) and causes the limited energy resources of genuine nodes to be depleted. The Gini model was emulated using OMNeT++, and to enable universal status awareness, a warning message was issued to every node. The second countermeasure was the ABC model, which uses a swarm-based meta-heuristic algorithm, lacks an alert system, and can identify Sybil nodes from the perspective of local nodes. Their findings indicated that the ABC model's detection performance was inferior to that of the Gini model, but it performed better in terms of the average expected time. Due to the nature of Sybil attacks, detection takes time; therefore, when designing a model, detection delay and routing stability should be given equal consideration.
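To illustrate the intuition behind a Gini-index-based detector, the sketch below computes the Gini index over per-identity DIO message counts: a legitimate neighbourhood advertises identities fairly evenly, whereas a Sybil attacker concentrates many advertisements behind few physical nodes. Using DIO counts as the input distribution is an assumption made for this illustration, not the exact model of [56].

```python
def gini(counts):
    """Gini index of a list of non-negative counts.

    0.0 means a perfectly even distribution; values approaching 1.0 mean
    the total is concentrated in a few entries (a suspicious pattern).
    """
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    # Standard formula over sorted values:
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with i starting at 1.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

# Hypothetical per-identity DIO counts observed by a monitoring node:
benign_counts = [4, 5, 5, 6]     # advertisements spread evenly
sybil_counts = [1, 1, 1, 30]     # one identity dominates the traffic
```

In practice a detector would compare the index against a tuned threshold; the threshold itself is deployment-specific and not specified here.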
While the Metaverse depends on many cloud technologies to function, its ability to operate successfully is also tied to the physical world. Edge computing is a part of that physical infrastructure which can enhance the safety of the Metaverse by distributing infrastructure closer to the end user and moving the compute, storage, and data-processing functions to the edge, enabling improved network response times and reduced network bandwidth consumption. In [57
], the authors presented a privacy-preserving framework for the wireless edge Metaverse. In this framework, a Metaverse service provider (MSP) dedicates bandwidth to VR users, making access to the Metaverse from edge access points (EAPs) possible. To preserve privacy, a covert communication method (the exchange of data over a covert channel) is used in the downlink. In line with this notion of covertness, targeted advertising is used to promote bandwidth sales and to prevent competitors from making counter-offers or attackers from interrupting services. An optimal advertising plan was obtained with the help of the Vidale–Wolfe model and the Hamiltonian function. Meta-immersion, a novel metric for measuring the feelings of Metaverse users, was also introduced in this work. The authors considered a jamming-aided covert wireless edge Metaverse access system. To determine the bandwidth necessary for normal-quality access, the detection error probability, downlink covert rate (CR), and uplink bit-error rate (BER) were derived under different modulations. For the system implementation, an MSP with K users was considered, where the VR users access the Metaverse via head-mounted displays (HMDs) through the EAPs. The authors introduced a friendly jammer that assists communication by actively generating jamming signals to prevent the data transmission from being detected by a malicious supervisor. In order to determine how much bandwidth an MSP should allocate to its users, a new metric representing user feelings in the Metaverse was proposed. The user experience and service indicators included three groups: the downlink data rate (which should be sufficiently high), the uplink-tracking bit-error rate, and the virtual experience, which is influenced by users' subjective behavior, such as their activity within the Metaverse, their online time, their physical health, and so on. Analysis based on the BER demonstrated that the interference between EAPs comes from the jamming signals.
With frequency division multiplexing, these signals can be ignored. User channel conditions can greatly influence the Metaverse experience, and targeted advertising contributes to the sale of acceleration bandwidth.
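The role of the jamming power in this trade-off can be illustrated with the textbook BPSK bit-error-rate formula. Treating the friendly jammer's signal as additional Gaussian noise at a receiver is a deliberate simplification for illustration and not the exact channel model of [57].

```python
import math

def bpsk_ber(snr_linear):
    """Theoretical BPSK bit-error rate: Q(sqrt(2*SNR)) = 0.5 * erfc(sqrt(SNR))."""
    return 0.5 * math.erfc(math.sqrt(snr_linear))

def effective_snr(signal_power, noise_power, jamming_power):
    """Effective SNR when the jamming signal is modeled as extra noise
    (illustrative simplification)."""
    return signal_power / (noise_power + jamming_power)

# Raising the jamming power degrades the eavesdropping supervisor's
# reception: its BER climbs toward the 0.5 of pure guessing, which is
# exactly what makes the downlink covert.
ber_no_jam = bpsk_ber(effective_snr(1.0, 0.1, 0.0))
ber_jammed = bpsk_ber(effective_snr(1.0, 0.1, 1.0))
```

The same formula also shows why the legitimate users' channels must be protected from this interference, for example via the frequency-division multiplexing mentioned above.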
Generally, a watermark appears as a logo, text, or pattern on top of another image, making it more difficult to copy or use the original image without permission. Clearly, the current NFTs used in the Metaverse are not as secure as is often assumed. The introduction of watermarking technology is significant because it provides much more than basic copy protection for an NFT; it can also act as a visual indicator of which NFTs are safe to buy. According to [58
], discrete wavelet transform (DWT), discrete cosine transform (DCT), and singular value decomposition (SVD) are used for multiple watermarking in health applications. In this paper, to solve the authentication problem, three watermarks are used: a medical lump image, the doctor's signature/identification code, and the patient's diagnostic information. In order to remove noise effects, a backpropagation neural network (BPNN) is used. An arithmetic compression technique and a Hamming error correction code are used to encode the signature watermark. This algorithm performs well against a variety of signal-processing attacks. For increased security, multiple watermarks are embedded simultaneously; however, as the number of watermarks rises, the peak signal-to-noise ratio (PSNR) performance suffers and the computation time grows. The text watermark is compressed using a lossless arithmetic compression method, while the Hamming error-correcting code (ECC) is utilized to secure the doctor's signature watermark, lessen distortion, and compensate for the bit error rate (BER) degradation of the text watermark. The experimental findings demonstrate that, in the absence of an attack, the PSNR and normalized cross-correlation (NC) values without BPNN are 43.88 and 0.9344, respectively; with BPNN, the NC value is 0.9547 (considering a gain factor of 0.01) and 0.9888 (assuming a gain factor of 0.08). In the absence of the BPNN algorithm, the highest NC value obtained is 0.9852. A median filtering attack yields the lowest NC value (0.0123), and a crop attack yields the highest BER, at 47.619% and 44.7% for the symptoms and signature watermarks, respectively. Based on these results, DCT, DWT, and SVD can be combined to achieve the best performance.
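The PSNR and NC figures reported above can be computed as follows. Note that several NC definitions circulate in the watermarking literature; the one below is one common variant, not necessarily the exact formula used in [58].

```python
import numpy as np

def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between a cover image and its
    watermarked/attacked version; higher means less visible distortion."""
    mse = np.mean((original.astype(float) - distorted.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def normalized_cross_correlation(w_original, w_extracted):
    """NC between the embedded and extracted watermarks; 1.0 = perfect
    recovery, values near 0 indicate the watermark was destroyed."""
    a = w_original.astype(float).ravel()
    b = w_extracted.astype(float).ravel()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))
```

Together, the two metrics capture the core trade-off of the scheme: PSNR measures how invisible the embedding is, while NC measures how robustly the watermark survives attacks such as median filtering or cropping.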