Article

Cross-Perspective Human Behavior Recognition Based on a Joint Sparse Representation and Distributed Adaptation Algorithm Combined with Wireless Optical Transmission

1
Guangxi Key Laboratory of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, Nanning 530001, China
2
Department of Logistics Management and Engineering, Nanning Normal University, Nanning 530001, China
3
College of Computer Science and Information Engineering, Nanning Normal University, Nanning 530001, China
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(9), 1980; https://doi.org/10.3390/electronics12091980
Submission received: 3 March 2023 / Revised: 19 April 2023 / Accepted: 21 April 2023 / Published: 24 April 2023
(This article belongs to the Special Issue Smart Electronics, Energy, and IoT Infrastructures for Smart Cities)

Abstract

Traditional human behavior recognition requires many training samples. Transmitting image and video signals through visible light is crucial for detecting specific actions and accelerating behavior recognition. Joint sparse representation techniques improve identification accuracy by exploiting multi-perspective information, while distribution adaptation techniques enhance robustness by aligning feature distributions across perspectives. Combining the two improves both recognition accuracy and robustness, enabling efficient behavior recognition in complex multi-perspective environments. In this paper, joint sparse representation is combined with a distribution adaptation algorithm to recognize human behavior, and the feasibility of the fusion algorithm is verified through experimental analysis. The objective of this article is to explore how the combination of joint sparse representation and distribution adaptation affects the recall and precision of human detection in cross-perspective human behavior recognition with wireless optical transmission. The experimental results showed that, in human detection, the recall and precision of the proposed fusion algorithm reached 92% and 90%, respectively, slightly higher than the comparison algorithms. In the experiment on recognition accuracy for different actions, the recognition accuracy of the fusion algorithm was also higher than that of the comparison algorithms. The fusion of joint sparse representation and distribution adaptation algorithms, together with wireless optical communication technology, is thus of great significance for human behavior recognition.

1. Introduction

Cross-perspective human behavior recognition refers to identifying human behavior through wireless optical communication technology without being limited by viewing angle. This can be achieved with a wireless optical sensor network (WOSN), a sensor network based on wireless optical communication technology that consists of multiple optical sensor nodes and can sense objects and human behavior in the environment in real time [1]. Through communication between the optical sensor nodes, the perceived information can be transmitted to a base station for processing and analysis, optimizing recognition efficiency. Traditional human behavior recognition methods focus on recognizing training objects from a single viewpoint; when the viewpoint difference is too large, the recognition results deviate greatly from the expected results. In cross-perspective human behavior recognition, the viewing direction changes frequently: both the orientation of the human body and the position of the camera vary. In this case, behavior detection from a single viewpoint cannot achieve the desired effect. To solve the cross-perspective recognition problem, the data discrepancies between different views must be reduced, and the available view data must be used effectively to improve recognition accuracy. The focus of this article is to explore the use of wireless optical communication technology for transmitting video image signals, aiming to improve recognition performance in the presence of the motion errors introduced when video is captured from different camera viewpoints.
With the development of computer vision, human behavior recognition has attracted increasing attention. Chen et al. introduced human behavior recognition using the channel state information of wireless communication and proposed two new schemes, providing insights for the study of human behavior recognition [2]. Yousefi et al. surveyed the latest progress of channel state information in indoor human behavior recognition and found that the movement of human body parts changes wireless signal reflection, which in turn changes the channel state information [3]. Wang et al. argued that recent developments in wireless technology have enabled a new paradigm of behavior recognition that works in a device-free and non-invasive way [4]. Sajjad et al. regarded facial expression analysis as a prominent cue for determining human behavior; the co-occurrence matrix obtained from facial recognition and expression was used to predict human behavior, showing better performance in facial expression recognition and human behavior understanding [5]. However, the methods used in these studies are rather traditional and not sufficiently convincing.
Through joint sparse representation and distribution adaptation, the human behavior recognition process can be optimized, and several studies have explored this direction. Tian et al. proposed a new sparse coding method to obtain expressive sparse representations of video sequences [6]. Zhang et al. considered human behavior recognition essential for many practical applications and proposed a new framework that projects real-world 2D video into a view-invariant sparse representation [7]. Shahroudy et al. proposed a structured sparse learning machine based on the feature structure to obtain better classification performance [8]. Liu et al. regarded recognizing human behavior across views as challenging and addressed the problem by hierarchically learning view-invariant representations with a new method of joint sparse representation and distribution adaptation [9]. However, due to the lack of data sources, the above research remains largely theoretical and has not been applied in practice.
In this paper, cross-perspective human behavior recognition is analyzed by combining a joint sparse representation and distribution adaptation algorithm with wireless optical communication technology. In the analysis of optical wireless communication methods, the field mirror achieved the highest optical efficiency as a receiving antenna, reaching 92%, and optical properties such as the optical signal-to-noise ratio and gain also met the index requirements. For the optical antenna to achieve higher gain, it must be continuously zoomed to improve the performance of the optical wireless communication system. In the study of the fusion algorithm, its recall and precision reached 92% and 90%, respectively. In addition, in the recognition accuracy experiment, the average recognition accuracy of the fusion algorithm reached 98.28%, while the two comparison algorithms reached only 95.92% and 93.88%. This shows that the fusion algorithm can effectively improve human behavior recognition.

2. Design of a Cross-Perspective Human Behavior Recognition Method Using Wireless Optical Communication Technology

2.1. Human Behavior Recognition

Human behavior recognition is a research focus of computer vision; it refers to the process of analyzing video or image data collected by cameras and other devices to learn human actions [10]. At present, human behavior recognition is widely used in various fields [11]. The application scenarios of human behavior recognition are shown in Figure 1.
In the human behavior recognition process, image or video data can be annotated to guide feature extraction, while the subsequent feature processing and analysis are based on the extracted features [12]. Traditional human behavior recognition methods mainly use hand-designed features to represent human behavior and carry out recognition with different classification methods. Depending on the feature extraction method, traditional human behavior recognition can be divided into overall and local representation methods [13]. The classification of traditional human behavior recognition is shown in Figure 2.
Traditional human behavior recognition can be mainly divided into two types of representation methods: overall and local [14]. Overall representation methods use either the silhouette method or the human joint points method. The silhouette method is characterized by simple key areas, rich information, and strong descriptive ability, and is primarily used for recognizing human behavior in simple backgrounds with low human obstruction. On the other hand, the human joint points method does not require extracting a large number of pixels or the human model, but it is sensitive to light and shooting angles and requires complex calculations. Local representation techniques include the spatiotemporal interest point method and the motion trajectory method. The spatiotemporal interest point method has a high degree of automation and adaptability to different scenes, does not require background pruning, and can be used for analyzing complex backgrounds. Meanwhile, the motion trajectory method is robust and can ignore background interference, preserving complete human behavior information [15,16].

2.2. Design of the Cross-Perspective Human Behavior Recognition Method

Human behavior recognition is essentially the classification and recognition of captured images, which requires careful image processing. High standards of accuracy and clarity must be met, and different processing programs should be designed to support diverse processing methods. However, the enormous amount of information contained in images renders conventional image processing methods inadequate [17,18,19,20]. Meeting these requirements simultaneously is challenging, and thus this paper combines a joint sparse representation algorithm with a joint distribution adaptation algorithm in an attempt to improve the recognition of human behavior images.
Because of the large visual differences in multi-perspective human behavior recognition, recognition performance may degrade. To solve this problem, separate mappings must be learned for the source perspective and the target perspective, mapping each into its own subspace so that the learned representation adapts to viewpoint changes; this yields a view-invariant representation learning algorithm based on joint sparse representation and distribution adaptation. The algorithm comprises three stages. The first stage is shared feature learning: information transfer between the different views is balanced, and a richer feature representation is achieved through information sharing, improving recognition accuracy. The second stage is transfer dictionary learning: a dictionary is learned for the training perspective and another for the testing perspective, and each dictionary produces the sparse representation of its perspective. The third stage is unsupervised distribution adaptation: to reduce the differences between perspectives, the marginal and conditional distribution differences between views are reduced by distribution adaptation, which also reduces the domain differences between views. The specific procedure of the joint sparse representation and distribution adaptation algorithm is as follows:

2.2.1. Joint Sparse Representation

The essence of sparse representation is to represent a target sample y as a linear combination of training samples. The mathematical model for human behavior recognition is:
y = Ax (1)
In Formula (1), A = [a_1, a_2, …, a_n] is a matrix whose columns a_1, a_2, …, a_n are the n training sample images of all classes (each image converted into a column vector), and x = (α_1, α_2, …, α_n)^T is the coefficient vector, with one coefficient per training sample image. Assuming there are n training sample images X = {x_1, x_2, …, x_n} belonging to k categories, where x_i ∈ R^m, the data set can be expressed as:
W = [W_1, W_2, …, W_k] ∈ R^{m×n} (2)
W_i = [v_{i,1}, v_{i,2}, …, v_{i,n_i}] (3)
In Formula (3), W_i is the sub-matrix of column vectors belonging to category i, v_{i,j} denotes the j-th sample image of category i, and n_i is the number of sample images of category i. According to sparse representation theory, a target sample y of category i can ideally be represented linearly by the training samples W_i, which gives Formula (4):
y = a_{i,1} v_{i,1} + a_{i,2} v_{i,2} + … + a_{i,n_i} v_{i,n_i} (4)
In Formula (4), a_{i,j} ∈ R is the sparse representation coefficient of y, with j = 1, 2, …, n_i. Allowing for image noise and writing the combination over the whole dictionary, the formula can be rewritten as:
y = Wx (5)
To obtain the sparsest x, the l_1-norm minimization problem must be solved:
x* = argmin ‖x‖_1  s.t.  Wx = y (6)
In Formula (6), x* is the sparsest reconstruction of x, and ‖x‖_1 = Σ_i |x_i| is the l_1 norm. For each category, the mapping function δ_i(x) = (0, …, 0, x_i, 0, …, 0) is constructed, which keeps only the coefficients associated with category i, and the target sample y is assigned to the class with the minimum residual:
class(y) = argmin_i ‖y − W δ_i(x)‖_2 (7)
In Formula (7), ‖y − W δ_i(x)‖_2 is the l_2 norm of the residual, and i = 1, 2, …, k.
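As an illustration, the classification scheme of Formulas (1)-(7) can be sketched in a few lines of numpy. The l_1 problem is solved here with plain iterative soft thresholding (ISTA), a simple stand-in for whatever solver is used in practice; the dictionary and the test sample are synthetic.

```python
import numpy as np

def ista(W, y, lam=0.01, n_iter=500):
    """Minimize 0.5*||Wx - y||^2 + lam*||x||_1 by iterative soft thresholding."""
    L = np.linalg.norm(W, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(W.shape[1])
    for _ in range(n_iter):
        g = x - W.T @ (W @ x - y) / L          # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return x

def src_classify(W, labels, y):
    """Formula (7): assign y to the class with the smallest reconstruction residual."""
    x = ista(W, y)
    residuals = {}
    for c in np.unique(labels):
        delta = np.where(labels == c, x, 0.0)  # delta_i(x): keep class-c coefficients
        residuals[c] = np.linalg.norm(y - W @ delta)
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 12))
W /= np.linalg.norm(W, axis=0)                 # unit-norm dictionary atoms
labels = np.repeat([0, 1, 2], 4)               # 3 classes, 4 atoms per class
y = 0.9 * W[:, 5] + 0.01 * rng.normal(size=20) # noisy copy of a class-1 atom
print(src_classify(W, labels, y))              # atom 5 belongs to class 1
```

Since the test sample is nearly a multiple of one class-1 atom, the class-1 residual is far smaller than the others, which is exactly the decision rule of Formula (7).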
In the process of classification and recognition, it is also necessary to deal with the problems of dictionary construction and sparse solution. For this, a joint sparse model needs to be introduced, through which multiple relevant signals can be processed to achieve better recognition results. The identification process through the joint sparse model is shown in Figure 3.
In the joint sparse model, only the public features and private features of the test image need to be extracted for reconstruction and recognition [21]. Assuming that all sample images are divided into k categories with J samples each, and that the j-th sample of the i-th category is y_{i,j}, Formula (8) is obtained:
y_{i,j} = z_i^c + z_{i,j}^i (8)
In Formula (8), z_i^c represents the public (shared) feature of category i, and z_{i,j}^i represents the private feature of sample j. All features are combined to build a joint feature dictionary:
D = [z_1^c, z_2^c, …, z_k^c, z_{1,1}^i, …, z_{1,J}^i, z_{2,1}^i, …, z_{k,1}^i, …, z_{k,J}^i] (9)
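One simple way to realize the decomposition of Formula (8) and the joint dictionary of Formula (9) is to take the public feature of each class as the class mean and the private feature of each sample as its deviation from that mean. This is an illustrative choice for the sketch below, not necessarily the construction used in the paper.

```python
import numpy as np

def joint_dictionary(samples):
    """samples: dict class -> (J, m) array. Returns an (m, k + k*J) dictionary
    whose first k columns are public features and the rest private features."""
    public, private = [], []
    for c in sorted(samples):
        Y = samples[c]
        z_c = Y.mean(axis=0)        # public feature z_i^c shared by the class
        public.append(z_c)
        private.extend(Y - z_c)     # private features z_{i,j}^i, one per sample
    return np.column_stack(public + private)

rng = np.random.default_rng(1)
# 2 classes, 3 samples each, 5-dimensional features (made-up data)
samples = {c: rng.normal(loc=c, size=(3, 5)) for c in range(2)}
D = joint_dictionary(samples)
print(D.shape)   # (5, 8): 2 public atoms + 2*3 private atoms
```

By construction, each sample is exactly recovered as its class's public atom plus its own private atom, mirroring Formula (8).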

2.2.2. Joint Distribution Adaptation

When combining image features, a joint distribution adaptation algorithm can be incorporated [22]. The algorithm assumes that the feature spaces and label spaces of the source domain and target domain are the same, while their marginal and conditional probability distributions differ. These distributions are adjusted during dimensionality reduction; that is, a transformation matrix A is sought that brings the marginal and conditional probability distributions of the two domains close. The specific process is as follows:
Assuming that X = [x_1, x_2, …, x_n] ∈ R^{m×n}, H = I − (1/n)11^T is the centering matrix, the covariance matrix is XHX^T, and A is the transformation matrix, Formula (10) holds:
max_{A^T A = I} tr(A^T X H X^T A) (10)
The eigenvectors corresponding to the k largest eigenvalues of XHX^T form the columns of the orthogonal matrix A. Letting Z denote the mapped data, Formula (11) is obtained:
Z = A^T X, Z ∈ R^{k×n} (11)
The distribution difference between the source domain data and the target domain data is then minimized:
min ‖ (1/n_s) Σ_{i=1}^{n_s} A^T x_i − (1/n_t) Σ_{j=1}^{n_t} A^T x_j ‖² = min tr(A^T X M_0 X^T A) (12)
Here x_i and x_j denote the n_s source-domain and n_t target-domain samples, respectively, and M_0 is the MMD (maximum mean discrepancy) matrix, whose entries are 1/n_s² when both samples come from the source domain, 1/n_t² when both come from the target domain, and −1/(n_s n_t) otherwise. With the above formula, the difference in marginal probability between the mapped source and target domain data can be reduced. To match the conditional distributions, the data labels of the target domain are also needed in order to estimate its conditional probability distribution. For each category c in the label set {1, 2, …, C}, the conditional distribution distance between the source and target domains on that category is computed:
Σ_{c=1}^{C} ‖ (1/n_s^{(c)}) Σ_{x_i ∈ D_s^{(c)}} A^T x_i − (1/n_t^{(c)}) Σ_{x_j ∈ D_t^{(c)}} A^T x_j ‖² = Σ_{c=1}^{C} tr(A^T X M_c X^T A) (13)
Here D_s^{(c)} and D_t^{(c)} are the source and target samples of class c, n_s^{(c)} and n_t^{(c)} their numbers, and M_c the class-conditional MMD matrix, defined analogously to the marginal case but restricted to class-c samples.
The final objective function is:
min_{A^T X H X^T A = I} Σ_{c=0}^{C} tr(A^T X M_c X^T A) + λ tr(A^T A) (14)
In Formula (14), Σ_{c=0}^{C} tr(A^T X M_c X^T A) is the total distribution difference, with M_0 and M_c the MMD matrices for the marginal and class-conditional distributions; λ tr(A^T A) is a regularization term controlling the complexity of A; and the constraint A^T X H X^T A = I preserves the variance of the projected data. Solving this optimization problem, for example via a generalized eigendecomposition, yields the final transformation matrix A, which reduces the difference between the source domain features and the target domain features, after which the final recognition result for the human behavior image can be obtained.
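The marginal adaptation step can be sketched in numpy/scipy, assuming the standard JDA-style construction: build the MMD matrix for source versus target, then solve a generalized eigenproblem for the transformation matrix A. The variable names (ns, nt, lam, k) and the synthetic data are illustrative only.

```python
import numpy as np
from scipy.linalg import eigh

def jda_marginal(Xs, Xt, k=2, lam=1.0):
    """Xs: (m, ns) source data, Xt: (m, nt) target data.
    Returns the k-dimensional projections of both domains."""
    X = np.hstack([Xs, Xt])
    m, n = X.shape
    ns, nt = Xs.shape[1], Xt.shape[1]
    # MMD matrix M0: tr(A^T X M0 X^T A) equals the squared projected mean gap
    e = np.vstack([np.full((ns, 1), 1.0 / ns), np.full((nt, 1), -1.0 / nt)])
    M = e @ e.T
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    # min tr(A^T X M X^T A) + lam*tr(A^T A)  s.t.  A^T X H X^T A = I
    a = X @ M @ X.T + lam * np.eye(m)
    b = X @ H @ X.T + 1e-6 * np.eye(m)           # jitter keeps b positive definite
    vals, vecs = eigh(a, b)                      # generalized eigenproblem
    A = vecs[:, :k]                              # k smallest eigenvalues
    return A.T @ Xs, A.T @ Xt

rng = np.random.default_rng(2)
Xs = rng.normal(size=(6, 40))
Xt = rng.normal(loc=2.0, size=(6, 30))           # target domain with shifted mean
Zs, Zt = jda_marginal(Xs, Xt)
gap_before = np.linalg.norm(Xs.mean(1) - Xt.mean(1))
gap_after = np.linalg.norm(Zs.mean(1) - Zt.mean(1))
print(gap_after < gap_before)                    # True: the domain gap shrinks
```

The conditional step of Formula (13) would repeat the same construction per class, restricting the MMD weights to (pseudo-)labeled class-c samples.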
The principle of human behavior recognition method based on joint sparse representation and joint distribution adaptation is as follows: The wireless communication optical technology is used to collect a variety of signals of human behavior, and some of the signals are labeled, so as to divide the collected signal data into original data and label signal data. Feature extraction of the two kinds of signals is carried out respectively, and signal image classification and recognition are carried out on this basis. The human behavior recognition process on the basis of joint sparse representation and joint distribution adaptation is shown in Figure 4.

3. Experiments on Cross-Perspective Human Behavior Recognition

3.1. Methods of Optical Wireless Communication

Compared with optical fiber technology, optical wireless communication technology does not require wiring, and can avoid the cost and workload of laying optical cable in buildings or areas with complex terrain. Moreover, optical wireless communication devices are easy to deploy and can be quickly installed on buildings, cable poles or other high sites, thus enabling fast broadband access. While recognizing human behaviors, specific propagation signals can be formed at the human body through wireless communication optical technology, and specific behaviors can be identified according to the changes of signal propagation. Therefore, the analysis of optical wireless communication is extremely important. The signal transmission of optical wireless communication depends on the receiving end. Different receiving ends have different effects on the reception of specific propagation signals, as shown in Table 1.
Table 1 shows that current receiver systems mainly use a small receiving angle and directional reception. Because of the problems of spatial optical signal coverage and signal processing in human behavior image recognition, high performance is required of the receiving system. In the actual communication process, however, when the communication distance changes, the detected signal may change, which indirectly affects the reception efficiency and thus the communication performance of the system.
In order to obtain better communication performance, the receiving antenna of wireless communication needs to be further analyzed. Three different receiving antennas were selected for comparison. Table 2 shows the comparison results.
According to the comparison data in Table 2, the optical efficiency of the field mirror as the receiving antenna was the highest, reaching 92%. Its distribution uniformity was the best, and the optical performance such as optical signal-to-noise ratio and gain also met the index requirements. Therefore, the field mirror was selected as the receiving antenna of the receiving end of optical wireless communication.

3.2. Comparison of Algorithms

In this paper, the joint sparse representation algorithm and the joint distribution adaptation algorithm were fused to study human behavior recognition. To evaluate the detection performance of the proposed method on human behavior images, target detection data sets were collected according to the experimental requirements; the detection targets included non-occluded and partially occluded human bodies. The partially occluded human body images were labeled to improve the detection effect after mapping and to verify the mapping effect. At the same time, human behavior recognition based on the joint sparse representation model (JSM) algorithm and on the joint distribution adaptation (JDA) algorithm were selected as comparisons to verify the feasibility of the proposed fusion algorithm.

3.2.1. Recall and Precision

In order to prove that the fusion algorithm in this paper is helpful to improve the performance of target detection, the fusion algorithm in this paper and the two methods in the control group were used for multi view human detection, and the recall and precision were counted for performance evaluation. If the coincidence degree between the human body area and the real human body area located by the detection model under different perspectives was higher than 50%, it meant that the detection model can accurately locate the target human body area. The test results are shown in Figure 5.
It can be seen from Figure 5 that in the process of human body area detection, even if the detection target had significant visual changes and occlusion when establishing the experimental dataset, the detection effect of the target could still meet the detection requirements. Among them, the recall and precision of JSM algorithm were 78% and 80% respectively; the recall and precision of JDA algorithm were 83% and 69% respectively; the recall and precision of the fusion algorithm were 92% and 90% respectively. It can be found that the fusion algorithm in this paper can effectively improve the recall and precision of target detection. Moreover, JDA algorithm could maintain a high recall, but the precision was relatively low when detecting objects.
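The evaluation protocol above (a detection counts as correct when its overlap with the ground-truth human body area exceeds 50%) can be made concrete with a small intersection-over-union (IoU) based recall/precision computation. The boxes below are made-up examples, not data from the experiment.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def recall_precision(detections, truths, thr=0.5):
    """Greedy matching: a detection is a true positive if it overlaps an
    unmatched ground-truth box with IoU above the threshold."""
    matched = set()
    tp = 0
    for d in detections:
        for i, t in enumerate(truths):
            if i not in matched and iou(d, t) > thr:
                matched.add(i)
                tp += 1
                break
    return tp / len(truths), tp / len(detections)   # recall, precision

truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(1, 1, 10, 10), (50, 50, 60, 60)]     # one good hit, one spurious
r, p = recall_precision(detections, truths)
print(r, p)   # 0.5 0.5
```

Counting true positives this way over the whole multi-view test set yields the recall and precision figures compared in Figure 5.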

3.2.2. Running Time

Human behavior recognition requires not only image detection, but also video image detection. In this regard, this paper selected a segment of target image for experiment, and tested the running time of various algorithms of target image under different training samples. The experimental results are shown in Figure 6.
It can be seen from Figure 6 that with the increase of the proportion of training samples, the running time of various algorithms on the target image increased gradually. Among them, the running time of the fusion algorithm in this paper was 7.8 s, 13.8 s, 22.5 s and 30.1 s respectively when the training samples were 10%, 20%, 30% and 40%, which was obviously less than the other two algorithms, indicating the superiority of the fusion algorithm in this paper.

3.2.3. Training Speed

Since the number of training samples had a greater impact on the performance of various algorithms, in order to increase the number of training samples so that various algorithms can achieve adequate training detection, different methods were used to expand the number of samples in this experimental study. This paper adopted two groups of experiments to analyze the algorithm of target image, and compared the training speed of JSM algorithm and the fusion algorithm in this paper. The training speed comparison diagram is shown in Figure 7.
It can be seen from Figure 7A that the JSM algorithm took 400 s to train on the detection target, while Figure 7B shows that the fusion algorithm in this paper took no more than 300 s. The experiment showed that fusing the JSM and JDA algorithms can effectively improve the training speed on the target image. The difference in loss values between the JSM training curve and the fusion algorithm's training curve is small, less than 1.2. The JSM algorithm's training loss reached 0 at 400 s, while the fusion algorithm's loss reached 0 at approximately 275 s. Moreover, the fluctuations of the fusion algorithm's training curve are larger than those of the JSM algorithm's.

3.2.4. Recognition Accuracy of Different Actions

This experiment was to verify the recognition accuracy of different algorithms for different human actions. For this, some actions were selected as research objects in the human behavior database. The experimental results are shown in Table 3.
Table 3 shows that each algorithm could achieve 100% accuracy in human behavior recognition of simple actions such as nodding, shaking and standing. However, with the complexity of human behavior, the recognition accuracy of JSM algorithm and JDA algorithm was lower than that of the fusion algorithm in this paper. Among them, the average recognition accuracy of JSM algorithm was 95.92%; the average recognition accuracy of JDA algorithm was 93.88%, and the average recognition accuracy of the fusion algorithm in this paper was 98.28%. It can be seen that the accuracy of human behavior recognition can be improved through the fusion algorithm.
In conclusion, the fusion algorithm proposed in this paper has stronger object detection ability, shorter running time, and higher recognition accuracy across different actions, which fully demonstrates its excellence.

4. Conclusions

This article combines wireless optical communication technology with sensor devices, which can be applied in many fields, and applies it to cross-perspective human behavior recognition. To explore methods for human behavior recognition, joint sparse representation is combined with a distribution adaptation algorithm, providing a basis for cross-perspective recognition, and a reasonable recognition process is designed around the joint sparse model. Experimental analysis shows that combining joint sparse representation with distribution adaptation improves the recall and precision of object detection as well as the running time and training speed on the training samples. The importance of cross-perspective human behavior recognition based on joint sparse representation and distribution adaptation is thereby verified. However, some issues still require further research. The sparse representation algorithm is sensitive to noise and abnormal data and may overfit or underfit, which calls for better regularization or parameter tuning. In addition, its high computational complexity requires algorithmic optimization to run efficiently in practical applications. In summary, the algorithms and ideas proposed in this article have their importance and value, but further improvement and refinement are needed to apply them better to fields such as human behavior recognition.

Author Contributions

X.Y. and L.L.: writing original draft; Y.O. and X.Z.: the acquisition, analysis, and interpretation of data for the work. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No.62066032); Natural Science Foundation of Guangxi Province (No.2021GXNSFAA075019); The “Fourteenth Five Year Plan” of Guangxi Education and Science Annual Project in 2023 (No.2023A028); The “Fourteenth Five Year Plan” of Guangxi Education and Science special project of college innovation and entrepreneurship education (No.2022ZJY2727); Middle-aged and Young Teachers’ Basic Ability of Scientific Research Promotion Project of Guangxi (No.2021KY0130); Philosophy and Social Science Foundation of Guangxi (No.21FYJ041); Higher Education Undergraduate Teaching Reform Project of Guangxi (No.2021JGA243 and No.2022JGB175). The Open Research Fund of Guangxi Key Lab of Human-machine Interaction and Intelligent Decision (No.GXHIID2213). This study acknowledge the support of National First-class Undergraduate Major-The Major of Logistics Management, Demonstrative Modern Industrial School of Guangxi University-Smart Logistics Industry School Construction Project, the Logistics Engineering Innovation Laboratory, Logistics Engineering Technology Laboratory and Smart Logistics Exhibition Center of Nanning Normal University. The authors gratefully acknowledge the support of Construction project of Practice conditions and practice Base for industry-university cooperation of the Ministry of Education (No.202102079139).

Data Availability Statement

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yücel, M.; Açikgöz, M. Optical Communication Infrastructure in New Generation Mobile Networks. Fiber Integr. Opt. 2023, 42, 53–92.
  2. Chen, L.; Chen, X.; Ni, L.; Peng, Y.; Fang, D. Human Behavior Recognition Using Wi-Fi CSI: Challenges and Opportunities. IEEE Commun. Mag. 2017, 55, 112–117.
  3. Yousefi, S.; Narui, H.; Dayal, S.; Ermon, S.; Valaee, S. A Survey on Behavior Recognition Using WiFi Channel State Information. IEEE Commun. Mag. 2017, 55, 98–104.
  4. Wang, Z.; Guo, B.; Yu, Z.; Zhou, X. Wi-Fi CSI-Based Behavior Recognition: From Signals and Actions to Activities. IEEE Commun. Mag. 2018, 56, 109–115.
  5. Sajjad, M.; Zahir, S.; Ullah, A.; Akhtar, Z.; Muhammad, K. Human Behavior Understanding in Big Multimedia Data Using CNN-Based Facial Expression Recognition. Mob. Netw. Appl. 2020, 25, 1611–1621.
  6. Tian, Y.; Kong, Y.; Ruan, Q.; An, G.; Fu, Y. Hierarchical and Spatio-Temporal Sparse Representation for Human Action Recognition. IEEE Trans. Image Process. 2017, 27, 1748–1762.
  7. Zhang, J.; Shum, H.P.H.; Han, J.; Shao, L. Action Recognition from Arbitrary Views Using Transferable Dictionary Learning. IEEE Trans. Image Process. 2018, 27, 4709–4723.
  8. Shahroudy, A.; Ng, T.-T.; Gong, Y.; Wang, G. Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1045–1058.
  9. Liu, Y.; Lu, Z.; Li, J.; Yang, T. Hierarchically Learned View-Invariant Representations for Cross-View Action Recognition. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 2416–2430.
  10. Qu, J.; Qiao, N.; Shi, H.; Su, C.; Razi, A. Convolutional Neural Network for Human Behavior Recognition Based on Smart Bracelet. J. Intell. Fuzzy Syst. 2020, 38, 5615–5626.
  11. Gao, Z.; Xuan, H.-Z.; Zhang, H.; Wan, S.; Choo, K.-K.R. Adaptive Fusion and Category-Level Dictionary Learning Model for Multiview Human Action Recognition. IEEE Internet Things J. 2019, 6, 9280–9293.
  12. Dai, C.; Liu, X.; Lai, J.; Li, P.; Chao, H.-C. Human Behavior Deep Recognition Architecture for Smart City Applications in the 5G Environment. IEEE Netw. 2019, 33, 206–211.
  13. Wang, L. Three-Dimensional Convolutional Restricted Boltzmann Machine for Human Behavior Recognition from RGB-D Video. EURASIP J. Image Video Process. 2018, 2018, 120.
  14. Zheng, B.; Yun, D.; Liang, Y. Research on Behavior Recognition Based on Feature Fusion of Automatic Coder and Recurrent Neural Network. J. Intell. Fuzzy Syst. 2020, 39, 8927–8935.
  15. Saleem, G.; Bajwa, U.I.; Raza, R.H. Toward Human Activity Recognition: A Survey. Neural Comput. Appl. 2023, 35, 4145–4182.
  16. Kamel, A.; Sheng, B.; Yang, P.; Li, P.; Shen, R.; Feng, D.D. Deep Convolutional Neural Networks for Human Action Recognition Using Depth Maps and Postures. IEEE Trans. Syst. Man Cybern. Syst. 2018, 49, 1806–1819.
  17. Al-Kinani, A.; Wang, C.-X.; Zhou, L.; Zhang, W. Optical Wireless Communication Channel Measurements and Models. IEEE Commun. Surv. Tutor. 2018, 20, 1939–1962.
  18. Menaka, D.; Gauni, S.; Manimegalai, C.T.; Kalimuthu, K. Vision of IoUT: Advances and Future Trends in Optical Wireless Communication. J. Opt. 2021, 50, 439–452.
  19. Maier, A.; Syben, C.; Lasser, T.; Riess, C. A Gentle Introduction to Deep Learning in Medical Image Processing. Z. Für Med. Phys. 2019, 29, 86–101.
  20. Wiley, V.; Lucas, T. Computer Vision and Image Processing: A Paper Review. Int. J. Artif. Intell. Res. 2018, 2, 29–36.
  21. Peng, J.; Sun, W.; Du, Q. Self-Paced Joint Sparse Representation for the Classification of Hyperspectral Images. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1183–1194.
  22. Vishwakarma, R.; Monani, R.; Hedayatipour, A.; Rezaei, A. Reliable and Secure Memristor-Based Chaotic Communication Against Eavesdroppers and Untrusted Foundries. Discov. Internet Things 2023, 3, 2.
Figure 1. Application scenario of human behavior recognition.
Figure 2. Classification of traditional human behavior recognition.
Figure 3. Identification process.
Figure 4. Human behavior recognition process.
Figure 5. Recall ratio and precision ratio of different algorithms.
Figure 6. Running time of different algorithms.
Figure 7. Comparison of training speed. (A) Training speed curve of the JSM algorithm; (B) training speed curve of the fusion algorithm in this paper.
Table 1. Receiver of optical wireless communication.

|                        | Conventional Receiver | 2D Image Sensor Receiving Terminal | Compound Eye Structure Receiving Terminal |
|------------------------|-----------------------|------------------------------------|-------------------------------------------|
| Field angle            | 60°                   | 90°                                | 180°                                      |
| Gain                   | 1                     | 320                                | 19                                        |
| Communication distance | 31.6 m                | 189.5 m                            | 45 m                                      |
| Detection range        | Directional           | Directional                        | Omnidirectional                           |
Table 2. Comparison of performance of different receiving antennas.

|                               | Field Mirror | Immersion Lens | Light Cone |
|-------------------------------|--------------|----------------|------------|
| Optical efficiency            | 92%          | 87%            | 83%        |
| Gain                          | 22.1         | 12.5           | 11.4       |
| Coefficient                   | 7.8          | 8.0            | 7.1        |
| Optical signal-to-noise ratio | 44.2         | 38.6           | 35.4       |
| Uniformity                    | Good         | Common         | Poor       |
Table 3. Recognition accuracy of different algorithms for each action.

| Action           | JSM Algorithm | JDA Algorithm | Fusion Algorithm |
|------------------|---------------|---------------|------------------|
| Having dinner    | 90.4%         | 89.2%         | 95.2%            |
| Applause         | 93.1%         | 90.7%         | 96.2%            |
| Nodding          | 100%          | 100%          | 100%             |
| Phoning          | 95.7%         | 92.1%         | 99.8%            |
| Shaking head     | 100%          | 100%          | 100%             |
| Waving           | 98.2%         | 95.2%         | 100%             |
| Embracing        | 96.2%         | 93.1%         | 99.6%            |
| Falling          | 93.5%         | 89.5%         | 96.8%            |
| Standing up      | 100%          | 100%          | 100%             |
| Running          | 92.1%         | 89.0%         | 95.2%            |
| Average accuracy | 95.92%        | 93.88%        | 98.28%           |
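The "average accuracy" row of Table 3 is the plain arithmetic mean of the ten per-action accuracies for each algorithm. A quick sanity check, with the values transcribed directly from the table:

```python
# Per-action recognition accuracies (%) from Table 3, in row order:
# dinner, applause, nodding, phoning, shaking head, waving,
# embracing, falling, standing up, running.
accuracies = {
    "JSM":    [90.4, 93.1, 100, 95.7, 100, 98.2, 96.2, 93.5, 100, 92.1],
    "JDA":    [89.2, 90.7, 100, 92.1, 100, 95.2, 93.1, 89.5, 100, 89.0],
    "Fusion": [95.2, 96.2, 100, 99.8, 100, 100, 99.6, 96.8, 100, 95.2],
}

# Mean of each column, rounded to two decimals as in the table.
averages = {name: round(sum(vals) / len(vals), 2)
            for name, vals in accuracies.items()}
print(averages)  # {'JSM': 95.92, 'JDA': 93.88, 'Fusion': 98.28}
```

The computed means match the table's final row, confirming the reported averages are internally consistent.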
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yu, X.; Long, L.; Ou, Y.; Zhou, X. Cross-Perspective Human Behavior Recognition Based on a Joint Sparse Representation and Distributed Adaptation Algorithm Combined with Wireless Optical Transmission. Electronics 2023, 12, 1980. https://doi.org/10.3390/electronics12091980


