Article

SM-TCNNET: A High-Performance Method for Detecting Human Activity Using WiFi Signals

by Tianci Li, Sicong Gao, Yanju Zhu, Zhiwei Gao, Zihan Zhao, Yinghua Che and Tian Xia
1 School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, China
2 School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 1466, Australia
3 Hebei Key Laboratory of Electromagnetic Environmental Effects and Information Processing, Shijiazhuang Tiedao University, Shijiazhuang 050043, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(11), 6443; https://doi.org/10.3390/app13116443
Submission received: 16 April 2023 / Revised: 8 May 2023 / Accepted: 23 May 2023 / Published: 25 May 2023

Abstract

Human activity recognition (HAR) is an important research area with a wide range of application scenarios, such as smart homes, healthcare, and abnormal behavior detection. Wearable sensors, computer vision, radar, and other technologies are commonly used to detect human activity, but they are severely limited by issues such as cost, lighting, context, and privacy. This paper therefore explores a high-performance method for recognizing human activities from channel state information (CSI): a deep learning model called the spatial module–temporal convolutional network (SM-TCNNET). The model consists of a spatial feature extraction module and a temporal convolutional network (TCN) that together extract the spatiotemporal features of CSI signals. Extensive experiments on a self-collected dataset and a public dataset (StanWiFi) show that the accuracy reaches 99.93% and 99.80%, respectively. Compared with existing methods, the recognition accuracy of the proposed SM-TCNNET model is improved by 1.8%.

1. Introduction

Human activity recognition (HAR) is the process of automatic detection and classification of human motion states using sensor data, machine learning, and pattern recognition. In recent years, HAR has been applied in many different fields, such as smart homes [1], healthcare [2] (monitoring the health status of elderly and disabled people), and security [3] (biometric or face recognition). Traditional human motion recognition methods include computer vision [4], wearable sensors [5], and radar [6].
For the application of computer vision to human action recognition, Qi et al. [7] proposed a depth-vision-guided deep convolutional framework for recognizing 12 complex daily activities. Janardhanan et al. [8] proposed a deep learning technique based on depthwise separable convolution (DSC) and bidirectional long short-term memory (DSC-BLSTM). Aggarwal et al. [9] implemented a virtual mouse and keyboard with gesture recognition using computer vision techniques. Mujahid et al. [10] proposed a CNN-LSTM model for gesture recognition using visual data processing and feature extraction. Adama et al. [11] reported that incorporating transfer learning (TL) into vision-based HAR can improve the recognition performance of existing systems. However, vision-based human activity recognition still has practical limitations, such as weak protection of user privacy and susceptibility to camera occlusion.
For wearable-sensor-based human action recognition, Lee et al. [12] integrated a human activity recognition system into outdoor clothing to address activity recognition on outdoor terrain. Gao et al. [13] proposed an effective bilinear spatio-temporal attention network (Bi-STAN) that adaptively focuses on the important content and locations in the raw data, with strong results on three public datasets and on their own real dataset. Sun et al. [14] proposed an attention-based long short-term memory (LSTM) network for wearable human activity recognition. Rosati et al. [15] had volunteers wear a MIMU-based device while performing seven different daily activities, showing that wearable-based human activity recognition is feasible. Janarthanan et al. [16] proposed an unsupervised deep-learning-assisted reconstructed coder (UDR-RC) to optimize the data during preprocessing on wearable sensor nodes, reducing the computation time to 11.25 ns on the test set. These papers pointed out two main limitations of wearable sensors for human action recognition: first, the device must be carried on the body, which makes for a poor user experience; second, the relatively high cost of the devices makes them unsuitable for wide deployment.
Radar technology can also be used for human action recognition. Yao et al. [17] applied hyperdimensional computing (HDC) to a radar system for the first time and achieved 92.93% accuracy on six human activities. Cao et al. [18] constructed a source-hypothesis transfer learning system to build environment-adaptation mechanisms for cross-environment radar HAR. Chen et al. [19] proposed a temporal three-dimensional convolutional neural network (3DCNN) with a temporal attention module to emphasize the sequential relationship between frames. Radhityo et al. [20] successfully detected three types of human motion changes using FMCW radar. Janardhana et al. [21] combined radar and cameras to analyze time-frequency images and object images in order to classify objects and the activities they perform. These papers pointed out that radar-based human activity recognition requires substantial human and material resources for data acquisition, which makes it difficult.
Compared with the above methods, several studies [22,23,24,25] have shown that WiFi-based human action recognition has been favored by many researchers in recent years because of its wide coverage, strong privacy protection, low cost, non-line-of-sight sensing capability, and contactless, device-free operation. WiFi-based human action recognition mainly relies on two signals: (1) the received signal strength indicator (RSSI) and (2) channel state information (CSI). RSSI is strongly affected by interference and multipath effects. In contrast, CSI is a fine-grained signal and has been shown to outperform RSSI in complex environments for applications such as micro-motion detection [26], pose estimation [27], and handwriting recognition [28]. Hao et al. [29] used an attention-based bidirectional long short-term memory (ABLSTM) model to train an action model for human action recognition. Cheng et al. [30] proposed a maximum-distance-based dynamic antenna selection algorithm to classify human actions from CSI signals. Alsaify et al. [31] proposed a multi-environment human activity recognition system based on CSI signals; using an SVM classifier, they achieved an accuracy of 91.27%. Fard Moshiri et al. [32] converted CSI data into images and used them as input to a 2D convolutional neural network (CNN) classifier, achieving approximately 95% accuracy on seven actions. Showmik et al. [33] used an adaptive activity segmentation algorithm with a wavelet CNN for classification, which performed well on a real dataset. These CSI-based methods still have limitations in feature extraction and recognition accuracy, leaving room for improvement.
From the above discussion, it is clear that various methods exist for recognizing human behavior based on CSI signals, including hand-crafted feature extraction, machine learning, and deep learning. However, CSI data contain both temporal and spatial information, so a robust model is needed to learn both kinds of features. In this paper, we propose the SM-TCNNET model, which effectively uses both kinds of information and achieves high performance for human activity recognition. The main contributions of this paper are as follows:
  • A deep-learning-based model (SM-TCNNET) is proposed. The SM module is mainly used to extract the spatial features in CSI signals. The TCN probes the temporal information in the CSI signals, which improves the expressiveness of the model and its ability to accurately identify human activities.
  • The model requires only a small amount of pre-processing of the CSI data, which makes it easily scalable to other activity recognition datasets.
  • We verify the accuracy of the SM-TCNNET model at various distances between the transmitter and the receiver, and, with that distance fixed, at various distances from the human body to the line connecting the two. This evaluates the reliability and validity of CSI data and reveals how the signal characteristics change with distance.
  • The performance of SM-TCNNET is evaluated on both a self-collected dataset and a public dataset under line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios. On the self-collected dataset, the accuracy of human activity recognition reaches 99.93% and 97.56% in the LOS and NLOS cases, respectively; on the public dataset, our model detects human activity with 99.80% accuracy.
The remainder of this paper is organized as follows: Section 2 presents a detailed description of the datasets used. Section 3 presents the system design, including data pre-processing, the model, and the training and testing procedure. Section 4 presents the experimental results of SM-TCNNET. Section 5 concludes the paper.

2. Dataset Description

Two datasets were used in this paper to validate the performance and robustness of the model. The first dataset was collected in a 4 m × 8 m office. We used an Intel 5300 NIC on an Ubuntu 14.04 machine as the receiver and a regular home TP-LINK router as the transmitter. With the distance between transmitter and receiver set to 1 m, 2 m, and 3 m in turn, three sets of data were collected using the CSI tool, as shown in Figure 1. In addition, we collected four more sets of data with the transmitter–receiver distance fixed at 2 m and the straight-line distance from the human body to the line connecting them set to 0 m, 0.5 m, 1 m, and 1.5 m, respectively, as shown in Figure 2.
Human activities were collected in both the LOS and NLOS cases. To enrich the diversity of the samples, three volunteers took part in the data collection: two men and one woman of different builds. Seven common human actions were involved: sweeping, mopping, walking, trotting, falling, lying down, and raising hands. Each dataset contained 512 action samples, of which 384 were used for training and 128 for testing. Among these actions, sweeping and mopping, walking and trotting, and lying down and falling are similar action pairs used to verify the discriminative ability of the model. The hand-raising action was captured to explore the feasibility of gesture-based control in the smart home domain. We also collected data at different distances to evaluate the reliability and validity of CSI data and to investigate how the signal characteristics vary with distance. Figure 3 shows the simulation and state diagrams for the LOS and NLOS cases at the time of data collection.
To further validate the robustness of the model, a public dataset (StanWiFi) [34] was used in addition to the self-collected dataset. It contains seven common household actions: lying down, falling, raising hands, walking, running, sitting, and standing up. There are 4880 action samples in total, of which 4392 are training data and 488 are test data.

3. System Design

Figure 4 depicts the overall architecture of our proposed system, which is separated into four major parts: (1) data gathering; (2) data pre-processing; (3) four-fold cross-validation for splitting training and testing data; and (4) model training and evaluation. Data collection is explained in Section 2, so the discussion here begins with data pre-processing.

3.1. Data Preprocessing

After collecting the data, further processing was necessary. For outlier removal we chose the Hampel filter, which is robust and accurate and can handle data containing outliers without being disturbed by single or small numbers of extreme values. Compared with other filters, such as the mean or median filter, the Hampel filter is more robust when the data contain outliers and can estimate the true value of the data more accurately, because it considers not only the neighboring values of each data point but all values within a window around it. In this paper, the window size of the Hampel filter is set to 10; the data are processed with a sliding window, and a threshold of 3 MADs (median absolute deviations) is used to identify outliers. The core mathematical relationships of the Hampel filter are as follows:
$$\mathrm{MAD}_i = \mathrm{median}\left(\left|x_j - x_i\right|\right), \quad j \in [i-k,\ i+k] \tag{1}$$

$$r_i = \left|x_i - \mathrm{MAD}_i\right| \tag{2}$$
If the residual of a data point (i.e., the difference between the actual observed value and the predicted value) exceeds a specified threshold T, that point is considered an outlier. To deal with such outliers, we take the median-replacement approach, i.e., we replace the outlier with the median of the surrounding data points. The processed results are shown in Figure 5.
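To make the outlier step concrete, below is a minimal NumPy sketch of Hampel filtering with the settings described above (window size 10, threshold of 3 MADs); the function name, the boundary handling, and the 1.4826 Gaussian-consistency factor are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def hampel_filter(x, window=10, n_sigmas=3):
    """Replace outliers with the local median (illustrative sketch).

    A point is flagged as an outlier when its deviation from the window
    median exceeds n_sigmas * 1.4826 * MAD (1.4826 makes the MAD
    consistent with the standard deviation under Gaussian noise).
    """
    y = x.copy()
    n = len(x)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        med = np.median(x[lo:hi])                 # local median
        mad = np.median(np.abs(x[lo:hi] - med))   # local MAD
        if np.abs(x[i] - med) > n_sigmas * 1.4826 * mad:
            y[i] = med                            # median replacement
    return y
```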
Since the acquisition of human activity CSI signals is usually affected by high-frequency noise, a low-pass filter is typically used to remove it. We chose the Butterworth filter for its good smoothness and low phase distortion. Compared with other filters, the Butterworth filter's frequency response changes more smoothly, so it effectively retains the low-frequency components of the signal without introducing additional distortion or oscillation. In addition, its phase response is approximately linear, so it introduces little phase distortion and preserves the temporal information of the signal. The core mathematical relationship of the Butterworth filter is shown in Equation (3).
$$\left|H(\omega)\right|^2 = \frac{1}{1+\left(\frac{\omega}{\omega_c}\right)^{2n}} = \frac{1}{1+\epsilon^2\left(\frac{\omega}{\omega_p}\right)^{2n}} \tag{3}$$
In this paper, the Butterworth cutoff frequency is set to 3 Hz, the sampling frequency is 1000 Hz, and the filter order is 25. The denoising results are shown in Figure 6.
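As an illustration of this denoising step, the sketch below applies a low-pass Butterworth filter with the stated parameters (cutoff 3 Hz, sampling rate 1000 Hz, order 25) using SciPy; the synthetic input signal and the use of second-order sections with zero-phase filtering are our assumptions for numerical stability, not details taken from the paper.

```python
import numpy as np
from scipy import signal

fs, fc, order = 1000, 3, 25   # sampling rate (Hz), cutoff (Hz), filter order

# Synthetic stand-in for one CSI amplitude stream with high-frequency noise.
t = np.arange(0, 2, 1 / fs)
csi_amp = np.sin(2 * np.pi * t) + 0.3 * np.random.randn(t.size)

# Second-order sections keep a filter of this order numerically stable.
sos = signal.butter(order, fc, btype="low", fs=fs, output="sos")

# Zero-phase filtering avoids phase distortion, preserving temporal structure.
csi_denoised = signal.sosfiltfilt(sos, csi_amp)
```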
After the above processing, the time-series data were further segmented with a sliding window. Based on the size of the data, we specified a window size of 125 and set a threshold of 60%: if the proportion of action samples in the current window exceeded the threshold, the window was labeled 1; otherwise, it was labeled 2. The threshold is needed because the transition between no behavior and performing an action is not easy to pin down; after verification and testing, we set it to 60%. After all of the above preprocessing, we used four-fold cross-validation to divide the data into training and test sets and fed them into SM-TCNNET for training. A sketch of the segmentation step follows.
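The following is a minimal sketch of this windowing and labeling rule; the stride and the assumption of binary per-sample labels (1 for action, anything else for no action) are ours, since the paper does not state them.

```python
import numpy as np

def segment(samples, sample_labels, window=125, stride=125, threshold=0.6):
    """Slice a CSI stream into fixed-size windows and label each window.

    A window is labeled 1 (activity) when the fraction of action-labeled
    samples inside it exceeds `threshold`, and 2 (no activity) otherwise.
    """
    windows, labels = [], []
    for start in range(0, len(samples) - window + 1, stride):
        seg = samples[start:start + window]
        active = np.mean(sample_labels[start:start + window] == 1)
        windows.append(seg)
        labels.append(1 if active > threshold else 2)
    return np.stack(windows), np.array(labels)
```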

3.2. SM-TCNNET Model

3.2.1. Model Introduction

Our proposed SM-TCNNET is divided into two main parts, a spatial module and a TCN module; the residual blocks are part of the TCN. The model structure is shown in Figure 7.
The first part is the spatial feature extraction module. It takes time-series data as input and applies convolution, batch normalization, max pooling, and dropout regularization to extract spatial features from the input data. These spatial features are contained in the output tensor, which serves as the input to the next module for further processing of the time series. In this way, the network can focus on the more important temporal features and ignore irrelevant spatial ones in subsequent convolution operations, improving both the effectiveness and the efficiency of the network. A sketch of such a block is given below.
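The paper does not publish code, so the following PyTorch-style sketch only illustrates a spatial block built from the stated operations; the channel counts (90 input channels, i.e., 3 antennas × 30 subcarriers of the Intel 5300), kernel size, and dropout rate are assumptions.

```python
import torch.nn as nn

class SpatialModule(nn.Module):
    """Spatial feature extraction: Conv1d -> BatchNorm -> ReLU -> MaxPool -> Dropout."""

    def __init__(self, in_channels=90, out_channels=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(in_channels, out_channels, kernel_size=5, padding=2),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),   # halves the temporal length
            nn.Dropout(0.5),
        )

    def forward(self, x):                  # x: (batch, subcarriers, time)
        return self.block(x)               # -> (batch, out_channels, time // 2)
```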
The second part is the TCN module, which is composed of three residual blocks, dilated convolution kernels, and a 1D convolutional layer. Each residual block consists of a 1D convolutional layer with a dilation rate, a dropout layer, a 1 × 1 convolutional layer, and a skip connection that adds the output of the current layer to the output of the previous layer. The overall role of the TCN is to model the data in time, allowing the model to take the temporal characteristics of the input data into account: the one-dimensional convolutional layers help the model quickly capture long-range dependencies, and the residual blocks help it learn complex time-series relationships. In Figure 7, the input to the residual block is $\hat{Z}_{i-1}$, the output of the previous layer, and $\hat{Z}_i$ denotes the output of the network at layer $i$; each of its elements is a normalized version of the corresponding element of $\hat{Z}_{i-1}$. Figure 8 compares the dilated convolution kernel with a standard CNN in terms of capturing the global information of the whole sequence. A standard CNN obtains a larger receptive field by adding pooling layers, which inevitably lose information. Dilated convolution instead injects holes into the standard convolution to increase the receptive field, so the receptive field grows without the information loss caused by pooling and each convolutional output covers a larger range of the input. The dilated convolution kernel is defined as follows:
$$F(s) = \sum_{i=0}^{k-1} f(i)\cdot x_{s-d\cdot i} \tag{4}$$
where $F(s)$ is the value of the output sequence at position $s$; $k$ is the length of the convolution kernel; $x$ is the input sequence; $f(i)$ is the weight at the $i$-th position of the convolution kernel; and $d$ is the dilation rate, indicating how many input samples are skipped between successive kernel taps when convolving with the input signal.
The mathematical relationship of the residual blocks is as follows:
$$\mathrm{output} = \mathrm{Activation}\left(x + f(x)\right) \tag{5}$$
The residual connection sums the input tensor $x$ with the tensor $f(x)$ produced by the convolution operations and applies the nonlinear activation function Activation, yielding the output tensor.
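A PyTorch-style sketch of one such residual block, and of stacking three of them with growing dilation, is given below; the kernel size, dropout rate, channel widths, and the 1 × 1 convolution on the skip path (used here to match channel counts before the addition in Equation (5)) are illustrative assumptions.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """TCN residual block: dilated Conv1d + dropout, plus a skip connection."""

    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1, dropout=0.2):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2        # keep sequence length
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              padding=pad, dilation=dilation)
        self.drop = nn.Dropout(dropout)
        self.skip = nn.Conv1d(in_ch, out_ch, kernel_size=1)  # 1x1 channel match
        self.act = nn.ReLU()

    def forward(self, x):
        f_x = self.drop(self.conv(x))
        return self.act(self.skip(x) + f_x)   # output = Activation(x + f(x))

# Dilations 1, 2, 4 widen the receptive field exponentially without pooling.
tcn = nn.Sequential(*[ResidualBlock(64, 64, dilation=2 ** i) for i in range(3)])
```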

3.2.2. Training and Testing

How well a model trains and performs depends on having sufficient data variation and choosing appropriate hyperparameters. Hyperparameter selection includes the number of epochs, batch size, activation function, and learning rate, among others; the hyperparameters are chosen during the training phase, while performance evaluation uses the validation set. We empirically selected plausible ranges of hyperparameter values and used a grid search to exhaust the different combinations and evaluate their performance; through cross-validation, we selected the best-performing combination as the final hyperparameter values. Accordingly, we used the following hyperparameters for the self-collected dataset in the training phase: n_epochs = 200, batch size = 30, and learning rate α = 1 × 10−4. For the StanWiFi dataset we used n_epochs = 1500, batch size = 200, and learning rate α = 1 × 10−4. An important issue in the training phase was overfitting; we applied the Adam optimizer to minimize the risk of overfitting and used a softmax-based cross-entropy loss function. This loss applies the softmax transform to the network outputs and then computes the cross-entropy between the resulting probability distribution and the true labels, which avoids performing the softmax transformation manually. The mathematical relationship is shown in Equation (6). In addition, we used four-fold cross-validation on the self-collected dataset and ten-fold cross-validation on the StanWiFi dataset, so as to better evaluate the generalization ability and stability of the model.
$$\mathrm{loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{i,j}\,\log\!\left(\frac{\exp\left(z_{i,j}\right)}{\sum_{k=1}^{C}\exp\left(z_{i,k}\right)}\right) \tag{6}$$
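Reusing the sketches above, the snippet below shows one way to wire up a training step with the stated Adam optimizer, learning rate, and softmax cross-entropy loss of Equation (6); the pooling head and the seven-class linear layer are assumptions about the unpublished classification head.

```python
import torch
import torch.nn as nn

# Classification head: pool over time, then map 64 features to 7 activity classes.
model = nn.Sequential(SpatialModule(), tcn,
                      nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 7))

criterion = nn.CrossEntropyLoss()   # applies softmax internally, as in Eq. (6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(x, y):
    """One optimization step on a batch x with integer class labels y."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```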
The training phase was followed by testing and evaluation. We used well-known performance evaluation criteria, namely accuracy, F1-score, precision, and recall [34], to measure the recognition performance of the proposed system.
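For reference, the four reported metrics can be computed per fold as in the sketch below; macro averaging over the seven classes is our assumption, since the paper does not state the averaging mode.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for one cross-validation fold."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1-score": f1}
```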

4. Validation and Evaluation

In this section, we present the experimental results of the SM-TCNNET model on two distinct datasets: the self-collected dataset and the StanWiFi dataset. In addition to confirming the performance of SM-TCNNET on the self-collected dataset, we conduct a comparative analysis against other state-of-the-art models on the StanWiFi dataset.

4.1. Experimental Results and Analysis Based on the Self-Collected Dataset

In this section, three aspects are verified on the self-collected dataset: first, the activity recognition accuracy when TX and RX were at different distances; second, the accuracy when the TX–RX distance was fixed and the human body was at different distances from the line connecting them; and third, the results for the LOS and NLOS cases at a fixed TX–RX distance.

4.1.1. Activity Accuracy Results and Analysis with TX and RX at Different Distances

With the human body between TX and RX, we varied the TX–RX spacing; the setup is shown in Figure 1 of Section 2. The recognition accuracy of each action is shown in Figure 9. In general, the closer the AP is to the receiving computer, the more accurate the results, because as the communication distance shortens, the received WiFi signal strengthens, providing more reliable CSI features for capturing the different actions of the human body. However, the results were not optimal at a distance of 1 m: although the multipath effect weakens as the distance decreases, the perturbation caused by irregular limb movements increases. The experimental results show that a transmitter–receiver distance of 2 m outperformed the other distances, so the following experiments in this paper use a TX–RX distance of 2 m.

4.1.2. Recognition Results and Analysis of Changing Human Position When the Distance between TX and RX Is Fixed

With the TX–RX spacing fixed at 2 m, the human position was varied over 0 cm, 50 cm, 100 cm, and 150 cm; the setup is shown in Figure 2 of Section 2. The recognition accuracy of each action is shown in Figure 10. For distances greater than or equal to 50 cm, the average recognition rate decreased with increasing distance, because the signal strength decreases as the distance increases. As the distance decreased toward 50 cm, the average recognition rate increased, but at 0 cm it decreased again: the multipath effect weakens as the distance shrinks, while the interference caused by irregular limb movements grows. For SM-TCNNET applications, a distance of about 20 cm was therefore ideal.

4.1.3. Results and Analysis of the LOS and NLOS Cases When the Distance between TX and RX Is Fixed

The experimental results of the SM-TCNNET model for human activity recognition under LOS and NLOS are shown in Table 1 and Table 2, where the 1st, 2nd, 3rd, and 4th columns are the results of the four folds and the final column is their average. Our SM-TCNNET model shows strong classification performance for all human activities: as Table 1 shows, the averages of all four metrics exceed 99% under LOS, and as Table 2 shows, they exceed 97% under NLOS. This indicates that the SM-TCNNET model has excellent classification performance for the human activity recognition task.
The confusion matrix of the proposed SM-TCNNET model for the LOS case on the self-collected dataset is shown in Figure 11, where the main diagonal indicates the recognition accuracy of each activity. Figure 11 shows that all human activities were classified correctly and that similar actions could be distinguished reliably. The confusion matrix for the NLOS case is shown in Figure 12; the occlusion between TX and RX caused reflections, scattering, and attenuation along the signal transmission path, making signal propagation more complex and diverse, and two of the walking samples were predicted incorrectly. Overall, the accuracy of the model meets the requirements for identifying human activities, indicating that the proposed SM-TCNNET model can capture the spatiotemporal characteristics of the WiFi CSI signal and accurately recognize human activities.

4.2. Experimental Results Based on StanWiFi Dataset

The StanWiFi dataset was collected under LOS conditions with a distance of 3 m between TX and RX; the receiver was equipped with a commercial Intel 5300 NIC sampling at 1 kHz. The accuracy and loss curves of the SM-TCNNET model compared with the LSTM [25] model are shown in Figure 13.
Table 3 compares the SM-TCNNET model on the StanWiFi dataset with other work in addition to the LSTM model. The experiments on this larger dataset show that the model achieves high performance and is competitive with the other models.
The performance of the SM-TCNNET model on the StanWiFi dataset is summarized by the confusion matrix in Figure 14. One standing-up sample was misclassified, probably because data frames containing motion signals were not fully considered in the data stream, so the model did not capture all the features of the motion signal, which affected its classification accuracy. As a whole, however, the model accurately classified the various human activities and performed well on this dataset. Notably, although the StanWiFi dataset is larger than the self-collected dataset, SM-TCNNET performed well on both datasets of different sizes, further demonstrating the model's robustness.

5. Conclusions

This study treats human activities captured in WiFi CSI signals as time-series data with spatiotemporal characteristics. To automatically identify human activities from WiFi CSI signals, we proposed the SM-TCNNET model, which extracts spatial and temporal features simultaneously: a spatial module extracts spatial features, a TCN module captures temporal dependencies using residual blocks with dilated convolutions, and a linear layer produces the final classification output. We verified the effect of the TX–RX spacing on recognition accuracy and found 2 m to be optimal, and, for a fixed TX–RX spacing, a distance of about 20 cm from the human body to the line connecting them gave the best results. We achieved 99.93% and 99.80% activity recognition accuracy on the self-collected dataset and the StanWiFi dataset, respectively, and comparison with other methods shows that the proposed approach offers high recognition accuracy and robustness.

Author Contributions

Conceptualization, Y.Z.; data curation, Z.Z.; formal analysis, Z.G.; investigation, S.G.; methodology, T.L.; project administration, Y.C.; resources, Y.Z.; supervision, S.G. and Y.C.; validation, T.L. and Z.Z.; visualization, T.X.; writing—original draft, T.L.; writing—review & editing, Y.Z. and T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Hebei Provincial Education Department under grant CXY2023005. This study was supported by the Hebei Provincial Science and Technology Program under grant 21350701D.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Aloulou, H.; Abdulrazak, B.; de Marassé-Enouf, A.; Mokhtari, M. Participative Urban Health and Healthy Aging in the Age of AI: 19th International Conference, ICOST 2022, Paris, France, 27–30 June 2022, Proceedings; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
  2. Islam, M.M.; Nooruddin, S.; Karray, F. Multimodal Human Activity Recognition for Smart Healthcare Applications. In Proceedings of the 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Prague, Czech Republic, 9–12 October 2022; pp. 196–203. [Google Scholar]
  3. Liagkou, V.; Sakka, S.; Stylios, C. Security and Privacy Vulnerabilities in Human Activity Recognition systems. In Proceedings of the 2022 7th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM), Ioannina, Greece, 23–25 September 2022; pp. 1–6. [Google Scholar]
  4. Qu, H.; Rahmani, H.; Xu, L.; Williams, B.; Liu, J. Recent Advances of Continual Learning in Computer Vision: An Overview. arXiv 2021, arXiv:2109.11369v2. [Google Scholar]
  5. Uddin, M.H.; Ara, J.M.; Rahman, M.H.; Yang, S.H. A Study of Real-Time Physical Activity Recognition from Motion Sensors via Smartphone Using Deep Neural Network. In Proceedings of the 2021 5th International Conference on Electrical Information and Communication Technology (EICT), Khulna, Bangladesh, 17–19 December 2021; pp. 17–19. [Google Scholar]
  6. Li, X.; He, Y.; Jing, X. A survey of deep learning-based human activity recognition in radar. Remote Sens. 2019, 11, 1068. [Google Scholar] [CrossRef]
  7. Qi, W.; Wang, N.; Su, H. DCNN based human activity recognition framework with depth vision guiding. Neurocomputing 2022, 486, 261–271. [Google Scholar] [CrossRef]
  8. Janardhanan, J.; Umamaheswari, S. Vision based Human Activity Recognition using Deep Neural Network Framework. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 3117–3128. [Google Scholar] [CrossRef]
  9. Aggarwal, K.; Arora, A. An Approach to Control the PC with Hand Gesture Recognition using Computer Vision Technique. In Proceedings of the 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 23–25 March 2022; pp. 760–764. [Google Scholar]
  10. Mujahid, A.; Aslam, M.; Khan, M.U.G. Multi-Class Confidence Detection Using Deep Learning Approach. Appl. Sci. 2023, 13, 5567. [Google Scholar] [CrossRef]
  11. Adama, D.A.; Lotfi, A.; Ranson, R. A Survey of Vision-Based Transfer Learning in Human Activity Recognition. Electronics 2021, 10, 2412. [Google Scholar] [CrossRef]
  12. Lee, H. Developing a wearable human activity recognition (WHAR) system for an outdoor jacket. Int. J. Cloth. Sci. Technol. 2023, 35, 177–196. [Google Scholar] [CrossRef]
  13. Gao, C.; Chen, Y.; Jiang, X. Bi-STAN: Bilinear spatial-temporal attention network for wearable human activity recognition. Int. J. Mach. Learn. Cybern. 2023, 14, 2545–2561. [Google Scholar] [CrossRef]
  14. Sun, B.; Liu, M.; Zheng, R. Attention-based LSTM Network for Wearable Human Activity Recognition. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8677–8682. [Google Scholar]
  15. Rosati, S.; Balestra, G.; Knaflitz, M. Comparison of Different Sets of Features for Human Activity Recognition by Wearable Sensors. Sensors 2018, 18, 4189. [Google Scholar] [CrossRef]
  16. Janarthanan, S. Optimized unsupervised deep learning assisted reconstructed coder in the on-nodule wearable sensor for human activity recognition. Measurement 2020, 164, 108050. [Google Scholar] [CrossRef]
  17. Yao, Y.; Liu, W.; Zhang, G. Radar-Based Human Activity Recognition Using Hyperdimensional Computing. IEEE Trans. Microw. Theory Tech. 2022, 70, 1605–1619. [Google Scholar] [CrossRef]
  18. Cao, Z.; Li, Z.; Guo, X. Towards Cross-Environment Human Activity Recognition Based on Radar Without Source Data. IEEE Trans. Veh. Technol. 2021, 70, 11843–11854. [Google Scholar] [CrossRef]
  19. Chen, H.; Ding, C.; Zhang, L. Human Activity Recognition using Temporal 3DCNN based on FMCW Radar. In Proceedings of the 2022 IEEE MTT-S International Microwave Biomedical Conference (IMBioC), Suzhou, China, 16–18 May 2022; pp. 245–247. [Google Scholar]
  20. Radhityo, D.; Suratman, F.; Istiqomah. Human Motion Change Detection Based on FMCW Radar. In Proceedings of the 2022 IEEE Asia Pacific Conference on Wireless and Mobile (APWiMob), Bandung, Indonesia, 9–10 December 2022; pp. 1–6. [Google Scholar]
  21. Janardhana, R.; Chinni, K.M. Performing Object and Activity Recognition Based on Data from a Camera and a Radar Sensor. US Patent US11361554B2, 14 June 2022. [Google Scholar]
  22. Shafiqul, I.M.; Jannat, M.K.; Kim, J.W.; Lee, S.W.; Yang, S.H. HHI-AttentionNet: An Enhanced Human-Human Interaction Recognition Method Based on a Lightweight Deep Learning Model with Attention Network from CSI. Sensors 2022, 22, 6018. [Google Scholar] [CrossRef]
  23. Kabir, M.H.; Rahman, M.H.; Shin, W. CSI-IANet: An Inception Attention Network for Human-Human Interaction Recognition Based on CSI Signal. IEEE Access 2021, 9, 166624–166638. [Google Scholar] [CrossRef]
  24. Su, J.; Liao, Z.; Sheng, Z.; Liu, A.X.; Singh, D.; Lee, H.N. Human activity recognition using self-powered sensors based on multilayer bi-directional long short-term memory networks. IEEE Sens. J. 2022, 1–9. [Google Scholar] [CrossRef]
  25. Li, H.; He, X.; Chen, X.; Fang, Y.; Fang, Q. Wi-motion: A robust human activity recognition using WiFi signals. IEEE Access 2019, 7, 153287–153299. [Google Scholar] [CrossRef]
  26. Hoang, M.T.; Yuen, B.; Dong, X.; Lu, T.; Westendorp, R.; Reddy, K. Recurrent neural networks for accurate RSSI indoor localization. IEEE Internet Things J. 2019, 6, 10639–10651. [Google Scholar] [CrossRef]
  27. Ren, Y.; Wang, Z.; Wang, Y.; Tan, S.; Chen, Y.; Yang, J. 3D Human Pose Estimation Using WiFi Signals. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, Coimbra, Portugal, 15–17 November 2021; pp. 363–364. [Google Scholar]
  28. Guo, Z.; Xiao, F.; Sheng, B.; Fei, H.; Yu, S. WiReader: Adaptive air handwriting recognition based on commercial WiFi signal. IEEE Internet Things J. 2020, 7, 10483–10494. [Google Scholar] [CrossRef]
  29. Hao, Z.; Kang, Y.; Dang, X. Wi-Exercise: An Indoor Human Movement Detection Method Based on Bidirectional LSTM Attention. Mob. Inf. Syst. 2022, 2022, 1–14. [Google Scholar] [CrossRef]
  30. Cheng, K.; Xu, J.; Zhang, L. Human behavior detection and recognition method based on Wi-Fi signals. In Proceedings of the 2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 17–19 June 2022; pp. 1065–1070. [Google Scholar]
  31. Alsaify, B.A.; Almazari, M.M.; Alazrai, R.; Alouneh, S.; Daoud, M.I. A CSI-Based Multi-Environment Human Activity Recognition Framework. Appl. Sci. 2022, 12, 930. [Google Scholar] [CrossRef]
  32. Fard Moshiri, P.; Shahbazian, R.; Nabati, M.; Ghorashi, S.A. A CSI-Based Human Activity Recognition Using Deep Learning. Sensors 2021, 21, 7225. [Google Scholar] [CrossRef]
  33. Showmik, I.A.; Sanam, T.F.; Imtiaz, H. Human Activity Recognition from Wi-Fi CSI Data Using Principal Component-Based Wavelet CNN. Digit. Signal Process. 2023, 138, 104056. [Google Scholar] [CrossRef]
  34. Yousefi, S.; Narui, H.; Dayal, S. A Survey on Behavior Recognition Using WiFi Channel State Information. IEEE Commun. Mag. 2017, 55, 98–104. [Google Scholar] [CrossRef]
  35. Chen, Z.; Zhang, L.; Jiang, C. WiFi CSI Based Passive Human Activity Recognition Using Attention Based BLSTM. IEEE Trans. Mob. Comput. 2019, 18, 2714–2724. [Google Scholar] [CrossRef]
  36. Yadav, S.K.; Sai, S.; Gundewar, A. CSITime: Privacy-preserving human activity recognition using WiFi channel state information. Neural Netw. 2022, 146, 11–21. [Google Scholar] [CrossRef]
  37. Salehinejad, H.; Valaee, S. LiteHAR: Lightweight Human Activity Recognition from WiFi Signals with Random Convolution Kernels. In Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 22–27 May 2022; pp. 4068–4072. [Google Scholar]
Figure 1. Schematic of TX and RX at different distances.
Figure 2. Changing the body position with the TX–RX distance fixed.
Figure 3. Data acquisition: (a) LOS simulation image; (b) LOS status image; (c) NLOS simulation image; (d) NLOS status image.
Figure 4. System architecture diagram.
Figure 5. Outlier removal: (a) before outlier processing; (b) after outlier processing.
Figure 6. Noise removal: (a) before denoising; (b) after denoising.
Figure 7. Model structure diagram.
Figure 8. Convolution comparison: (a) standard CNN convolution; (b) dilated convolution.
Figure 9. Activity accuracy results for TX and RX at different distances.
Figure 10. Recognition accuracy for different human body positions with the TX–RX distance fixed.
Figure 11. Confusion matrix in the LOS case.
Figure 12. Confusion matrix in the NLOS case.
Figure 13. Model comparison: (a) accuracy; (b) loss.
Figure 14. Confusion matrix for the StanWiFi dataset.
Table 1. Indicators under LOS (%).

Metric       1st       2nd      3rd      4th       Average
Accuracy     100.00    99.82    99.89    100.00    99.93
Precision    100.00    99.84    99.85    100.00    99.93
Recall       100.00    99.82    99.81    100.00    99.91
F1-score     100.00    99.82    99.83    100.00    99.91
Table 2. Indicators under NLOS (%).

Metric       1st      2nd      3rd      4th      Average
Accuracy     97.57    97.76    98.28    96.63    97.56
Precision    97.44    97.65    98.19    96.45    97.43
Recall       97.45    97.55    98.21    96.45    97.42
F1-score     97.43    97.60    98.20    96.45    97.42
Table 3. Comparison with other studies (metrics in %).

Study                      Method (Year)      Accuracy    Precision    Recall    F1-Score
Yousefi et al. [34]        LSTM (2017)        90.05       –            –         –
Chen et al. [35]           ABLSTM (2018)      97.30       –            –         –
Yadav et al. [36]          CSITime (2022)     98.00       99.16        98.87     99.01
Salehinejad et al. [37]    LiteHAR (2022)     93.00       –            –         –
Proposed                   SM-TCNNET          99.80       99.81        99.80     99.80

