1. Introduction
The upper limbs are an important part of the human body. Able-bodied upper limbs can complete tasks such as grasping, carrying, and kneading according to human intentions [1]. Due to the loss of hand function, patients with upper limb disabilities have difficulty completing daily actions such as drinking, dressing, and putting on glasses, which causes inconvenience in life, study, and work and seriously affects their quality of life. Some hand functions can be restored by wearing cosmetic or myoelectric prostheses, but most of these cannot follow human intentions and cannot meet the needs of daily life. Therefore, the study of assistive manipulators for people with disabilities has always been one of the important topics in robotics [2,3]. However, the key to realizing the assistive function of the manipulator is whether it can be coordinated with the user's action intention to help patients with upper limb disabilities smoothly perform actions such as drinking and dressing and, finally, achieve the reconstruction of hand function.
The control of prosthetic hands depends on the recognition of upper limb action intention, and the collection and analysis of upper limb movement information is the first step in recognizing it. Generally, the information sources containing the upper limb action intention are divided into two categories: bioelectrical information, such as EMG and EEG signals, and general physical information, such as posture, visual, and force signals [4]. Myoelectric control is currently the most widespread method for controlling disability-assisted prostheses; it controls the device by collecting the electrical impulses generated by muscles through sensors attached to the limb [5]. Song et al. recognized seven common human lower limb movement patterns in daily life based on a multilayer perceptron (MLP) and a long short-term memory (LSTM) network and achieved good recognition results [6]. Chai et al. proposed a closed-loop model based on sEMG signals that is composed of a long short-term memory (LSTM) network and a discrete-time zeroing neural network (ZNN); experiments showed that the model has high accuracy in recognizing the intention of simple joint motions [7]. However, myoelectric control also has certain limitations. Some patients have a high degree of upper limb amputation and few remaining limb muscles, resulting in few sources of EMG signals. Moreover, the EMG signal is susceptible to interference from non-ideal factors such as electrode shift, individual differences, and muscle fatigue, which can seriously degrade the performance of an EMG-controlled prosthetic hand.
Research on brain–computer interfaces has mainly focused on decoding brain neural activity during human thought [8,9]. This activity mainly includes EEG signals generated by physical movement, motor imagery, and sensory perception, among which motor imagery EEG signals have been applied to upper limb action intention recognition. He and Sun et al. asked subjects to imagine left-hand and right-hand movements and used different feature extraction methods to classify the two types of EEG signals to judge the upper limb action intention; the experiments all achieved a certain accuracy [10,11]. However, the low signal-to-noise ratio of EEG signals and the non-obvious relationship between EEG signals and action intention still make it challenging to extract effective information and interpret the signals [12,13]. It can be seen that bioelectrical signals usually express the upper limb action intention more directly, but obtaining stable bioelectrical signals and decoding them accurately remains challenging.
In recent years, owing to the remarkable success of deep learning in the field of computer vision [14], research on action intention recognition and manipulator control based on computer vision has gradually emerged. Ghazaei et al. applied visual signals to prosthetic hand control, training a convolutional neural network (CNN) on more than 500 images of graspable objects, which realized basic object grasping and movement functions. However, relying only on visual signals, it is difficult to continuously follow human intention to complete coherent and complex actions such as drinking and dressing [15]. Shi et al. used a convolutional neural network (CNN) to classify images of daily objects according to different grasping patterns and applied this vision-based pattern recognition method to dexterous prosthetic hand control. The prosthetic hand performed well in the task of "reaching out and grasping"; compared with the traditional EMG control method, the control effect of the prosthetic hand based on computer vision was significantly improved [16]. Visual signals are usually obtained by computers capturing, interpreting, and processing visually perceptible objects. When the manipulator moves to the position of the target object, the control system can obtain key information such as the category, shape, and purpose of the target object from the visual signal; this provides a basis for judging when the manipulator should open in the next action. However, visual signals cannot determine the motion state of the upper limbs, so it is difficult to accurately identify upper limb intention by relying on visual signals alone.
Inertial sensors are widely used in applications such as portable mobile devices, rehabilitation monitoring, and motion recognition due to their low power consumption, low cost, small size, and high accuracy [17,18,19,20]. Cui et al. placed inertial sensors on the upper limbs and proposed an arm motion recognition method based on a sub-motion characteristic matrix and a dynamic time warping (DTW) algorithm, with a recognition rate of 99.4% [21]. Xuan et al. fixed inertial sensors on the foot, outer calf, and outer thigh to collect acceleration signals during lower limb movements and to recognize lower limb motion intentions. The experiments achieved a 97% recognition rate in five steady-state modes (walking on flat ground, going upstairs, going downstairs, going uphill, and going downhill) and achieved smooth and stable control of a lower limb prosthesis [22]. The angular velocity and attitude angle of a limb can be obtained by wearing an inertial sensor on it. However, upper limb movements are often more complex and contain richer action intentions, so upper limb action intentions cannot be recognized based only on the motion information obtained by inertial sensors. Nevertheless, the inertial sensor data represent the motion state and posture of the upper limb, which exactly complement the visual information. Therefore, this paper proposes a new method of upper limb intention recognition based on the fusion of posture information and visual information and applies it to the field of prosthetic hand control.
2. Materials and Methods
2.1. Data Platform and Acquisition
We used a data glove, model WISEGLOVE7F+, produced by Beijing Xintian Vision Technology Co., Ltd., Beijing, China, to collect upper limb posture angle data. The data glove consists of three inertial measurement units, each of which includes a three-axis accelerometer and a three-axis gyroscope. The sensors collect data at a frequency of 50 Hz, with an accuracy of 0.2 degrees. Image data were collected by a miniature camera, model HD810, produced by Shenzhen Weidafei Technology Co. The micro camera has a 2-megapixel resolution (1920 × 1080) and a focal length of 20–60 mm. The installation positions of the two types of sensors are shown in Figure 1. The three inertial measurement units of the data glove are worn on the upper arm, forearm, and opisthenar, and the micro camera is fixed at the finger sleeve of the middle finger of the data glove. The glove collects the posture angle data during the movement of the upper limb, and the data are directly transmitted to the computer through the serial port. All data analysis is done on the computer. Through the analysis of the posture angle, a control signal is sent to the camera at the appropriate time so that the miniature camera captures an image of the possible target object in front of the finger.
2.2. Sliding Window Method
The upper limb movement data collected by the sensors form a time series, and it is not accurate to judge the upper limb state only from the data of the current sampling point; hence, this paper uses the sliding window method to extract data. Because intention recognition is closely tied to the control of the prosthetic hand, the real-time requirement is high, and the delay caused by the window length needs to be minimized. If the sliding window is too long, it causes serious delay; if it is too short, it is detrimental to the accuracy of the data. Hence, this paper processed the data with window lengths of 5, 10, 15, and 20, respectively. The results show that a window length of 10 has the best effect, so a forward sliding window of length N = 10 is adopted in this paper, where the window length refers to the number of sampling points. Each time the sliding window slides back one point, the first datum in the window is removed, and the value of the sliding window is used as the result of the last sampling point in the window. The calculation is as follows:

y_i = (1/N) Σ_{j = i − N + 1}^{i} x_j

where i represents the serial number of the current sampling point, x_{i − N + 1}, …, x_i are the data in the sliding window, and y_i is the datum extracted by the sliding window.
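The forward sliding window above amounts to a moving average assigned to the last point in each window. A minimal Python sketch (the function and variable names are our own illustration, not the paper's code):

```python
def sliding_window_average(samples, window=10):
    """Smooth a 1-D time series with a forward sliding window.

    The window value is assigned to the last sampling point in the
    window, so the first (window - 1) points produce no output.
    """
    results = []
    for i in range(window - 1, len(samples)):
        window_data = samples[i - window + 1 : i + 1]  # x_{i-N+1} .. x_i
        results.append(sum(window_data) / window)      # y_i
    return results

# Example: a constant signal is unchanged by the moving average.
smoothed = sliding_window_average([2.0] * 15, window=10)
```

Each new sample shifts the window forward by one point, dropping the oldest datum, exactly as described in the text.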
2.3. Upper Limb Kinematics Modeling
From a mechanical point of view, the upper limb structure is similar to a linkage mechanism. The three parts of the upper arm, forearm, and opisthenar can be regarded as the components of the mechanism, while the shoulder joint, elbow joint, and wrist joint are the motion pairs connecting various motion components. Therefore, this paper simplifies the upper limb into a linkage model, which can be easily analyzed and described by mathematical theory.
After determining a simplified model of the upper limb, it is necessary to establish the corresponding D-H coordinate systems, which are rectangular coordinate systems built according to the D-H convention [23], so that positions can be calculated along the kinematic chain. The upper limb spatial position is a relative position relationship, and the upper limb kinematic model is established to calculate the position of the end of the upper limb relative to the body. Hence, we establish the reference coordinate system at the chest, level with the shoulders. Based on the above criteria, the complete mathematical model of the upper limb established in this paper is shown in Figure 2. The five points O0, O1, O2, O3, and O4 represent the positions of the chest, shoulder, elbow, wrist, and opisthenar, respectively. {S0}, {S1}, {S2}, {S3}, and {S4} denote the base coordinate system at the chest and the shoulder, elbow, wrist, and opisthenar coordinate systems, respectively. L1, L2, L3, and L4 denote the dimensions of each component, that is, the half shoulder width, upper arm length, forearm length, and opisthenar length.
The chest and shoulder are not equipped with inertial sensors, so {S0} and {S1} are fixed coordinate systems. The elbow coordinate system is parallel to the upper arm inertial measurement unit coordinate system, so the change in the attitude angle of the upper arm inertial measurement unit coordinate system is the change in the attitude angle of the elbow coordinate system. Similarly, the changes in the forearm and opisthenar inertial measurement unit coordinate systems represent the changes in the posture angles of the wrist coordinate system and the opisthenar coordinate system, respectively.
To solve the transformation relationship between the opisthenar coordinate system and the upper limb base coordinate system, take the transformation between the shoulder coordinate system and the base coordinate system as an example. The position of the shoulder point O1 in the {S0} coordinate system is solved as

^0P_1 = ^0T_1 · ^1P_1

where ^0P_1 is the vector coordinate of O1 with respect to the {S0} coordinate system, ^1P_1 is its coordinate in {S1}, and ^0T_1 is the homogeneous transformation matrix from {S1} to {S0}, which represents the translation and rotation relationship between the two coordinate systems.
The homogeneous transformation matrix has the form

^(i−1)T_i = [ R_i  p_i ]
            [ 0    1  ]

where R_i describes the rotation from coordinate system {S_i} to coordinate system {S_(i−1)} and is determined by the rotation angles between the coordinate systems. Since the shoulder coordinate system is fixed, its R_i is constant; for the elbow, wrist, and opisthenar coordinate systems, R_i is determined by the attitude angles collected by the inertial sensors. The vector p_i describes the translation between the coordinate systems, i.e., the length of the corresponding link.
From {S0} to {S4}, there are four coordinate system transformations. According to the forward kinematics theory of mechanical arms, the forward kinematics equation of the upper limb can be obtained by computing the four homogeneous transformation matrices and multiplying them successively:

^0P_4 = ^0T_1 · ^1T_2 · ^2T_3 · ^3T_4 · ^4P_4

where ^0P_4 denotes the vector coordinate of the opisthenar relative to the base coordinate system {S0}, that is, the position of the end of the upper limb relative to the human chest.
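The successive multiplication of homogeneous transforms can be sketched in pure Python. This is our own illustration rather than the paper's implementation: for brevity it assumes a planar chain whose joints all rotate about the z-axis, with hypothetical link lengths.

```python
import math

def rot_z(theta):
    """4x4 homogeneous transform: rotation about z by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0],
            [s,  c, 0, 0],
            [0,  0, 1, 0],
            [0,  0, 0, 1]]

def trans_x(length):
    """4x4 homogeneous transform: translation along x by a link length."""
    return [[1, 0, 0, length],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def end_position(link_lengths, joint_angles):
    """Multiply the per-joint transforms successively (0T1 * 1T2 * ...)
    and return the end point expressed in the base frame."""
    t = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for length, angle in zip(link_lengths, joint_angles):
        t = mat_mul(t, mat_mul(rot_z(angle), trans_x(length)))
    # The end point is the origin of the last frame: the translation column.
    return (t[0][3], t[1][3], t[2][3])

# Two links of 0.3 m with both joints at 0 rad: the end lies 0.6 m along x.
pos = end_position([0.3, 0.3], [0.0, 0.0])
```

In the paper's full model, each R_i would be built from the three attitude angles reported by the corresponding inertial measurement unit rather than a single z-rotation.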
2.4. Upper Limb Position Prediction Based on Multilayer Perceptron
During continuous movement of the upper limb, the end position changes constantly, and we only need to focus on the positions where the manipulator may open or close. Usually, we need to control the opening and closing of the manipulator when reaching in front of the torso to grasp or put down objects and when putting on or taking off objects near the body. Therefore, this paper divides the upper limb end positions into three categories, torso front, upper body nearby, and the initial position, as shown in Figure 3. Among them, torso front refers to the space in the direction of the face, far from the torso; upper body nearby refers to the shoulders and the area of the body above them; the initial position is the area the hand can reach when the arm hangs naturally.
In this paper, a multilayer perceptron is used to classify the upper limb end position, which is described as follows:

y = W_2 · f(W_1 x + b_1) + b_2

where x is a set of input sample data, W_1 and W_2 are the weight matrices acting between the input layer and the hidden layer and between the hidden layer and the output layer, respectively, b_1 and b_2 are bias terms, and f is the rectified linear unit used as the activation function in this model:

f(z) = max(0, z)
The upper limb end positions were calculated in Section 2.3. The position data are a group of three-dimensional coordinates, so the number of input neurons of the multilayer perceptron is set to 3. The model has 2 hidden layers, with 8 and 6 neurons, respectively. The number of neurons in the output layer corresponds to the number of categories and is set to 3.
Figure 4 shows the structure of the multilayer perceptron model. Finally, the model output is normalized using the Softmax function:

p_k = exp(z_k) / Σ_{j=1}^{3} exp(z_j)

where p_k is the probability that the input sample is classified into the k-th class, and z_k is the output value of the corresponding output-layer neuron. The cross-entropy function is chosen as the loss function, and the multilayer perceptron model is trained by minimizing it using the stochastic gradient descent algorithm.
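The forward pass and Softmax normalization above can be illustrated with a toy network in plain Python. The weights and layer sizes below are made up for demonstration; the paper's trained model uses two hidden layers of 8 and 6 neurons.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def affine(w, x, b):
    """Compute W x + b for a weight matrix W (one row per output neuron)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def softmax(z):
    m = max(z)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def mlp_forward(x, w1, b1, w2, b2):
    """y = W2 * f(W1 x + b1) + b2, normalized with Softmax."""
    hidden = relu(affine(w1, x, b1))
    return softmax(affine(w2, hidden, b2))

# Toy 3-input / 2-hidden / 3-output network with made-up weights.
w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
w2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b2 = [0.0, 0.0, 0.0]
probs = mlp_forward([0.2, -0.1, 0.4], w1, b1, w2, b2)
```

The Softmax output forms a probability distribution over the three position classes, which is what the cross-entropy loss is computed against during training.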
2.5. Object Detection Based on YOLOv5
The implementation of the target detection function in our system is important, and the target information captured by the camera fixed to the data glove is one of the important types of feedback for controlling the movement of the manipulator. We focus on upper limb intention recognition for people with upper limb disabilities and work on applying this approach to prosthetic systems. The selected target detection and recognition algorithm should meet the requirements of real-time operation and high robustness in order to improve the user experience in practical applications. Traditional target detection and recognition methods are often executed in multiple steps: region selection, feature extraction, and classification [24]. Region selection localizes the target, and this step often traverses the entire image using sliding windows of multiple sizes. This is a reliable but inefficient method whose time complexity is too high to meet our real-time requirements. Additionally, commonly used handcrafted feature extraction methods, such as HOG [25], are not very robust to changes in target diversity. Moreover, two-stage neural-network-based methods [26] also have difficulty meeting the real-time requirements, so an end-to-end, one-stage deep learning scheme is considered necessary.
The current state-of-the-art one-stage solution is considered to be the YOLO series [27]. The YOLO models are characterized by both high accuracy and fast recognition speed. As the latest version of this series, YOLOv5 has four submodels: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, which differ in their depth and width settings. Among them, YOLOv5l is the recommended standard version. Although the class-average accuracy (mAP@0.5) of the YOLOv5m model is 1.4% lower than that of YOLOv5l, its single-image inference speed is much faster; it is therefore the model whose balance of detection speed and accuracy best meets our needs.
We use the YOLOv5m model to process key frames captured by the camera rather than processing the entire video stream continuously. This approach is sufficient for intent recognition while improving resource efficiency for deployment on wearable devices. The capture of key frames is controlled by gating signals sent by the MCU, which is responsible for controlling the camera and communicating with the computer. The motion state of the upper limb is determined by analyzing the sensor data captured by the inertial measurement units (described in Section 2.6). When the upper limb changes from motion to rest, the MCU is commanded to send a low-level signal to the camera, which triggers it to capture the current image. This image is then fed to the YOLOv5m model for processing.
2.6. Information Fusion Decision-Making Method
Firstly, the upper limb action curves of a healthy person are used as the research object to analyze the upper limb action intention. Figure 5 shows the angular velocity curves of a healthy person performing the drinking action and the glasses-wearing action, respectively. The angular velocity values were obtained by differentiating the posture angle data. The angular velocity fluctuates greatly when the upper limb moves and is close to 0 when the upper limb stops moving and is at rest. In Figure 5a, the two marked segments are the periods when the human hand picks up the cup and puts it down. In Figure 5b, the two marked segments are the periods when the human hand picks up the glasses and puts them on. During these four periods, the upper limb is in a static state. For stable control, the hand usually grasps an object only after the upper limb is at rest. Therefore, the dynamic or static state of the upper limb is one of the important bases for recognizing the intention of the upper limb movement.
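Differentiating the posture angles and thresholding the result gives a simple static/dynamic detector. The sketch below is our own illustration: the 1.0 deg/s threshold is an assumed value, not one reported in the paper, and the 0.02 s step matches the 50 Hz sampling rate.

```python
def angular_velocity(angles, dt=0.02):
    """Differentiate posture angles (degrees) sampled at 50 Hz (dt = 0.02 s)."""
    return [(a2 - a1) / dt for a1, a2 in zip(angles, angles[1:])]

def is_static(velocities, threshold=1.0):
    """Treat the limb as static when all angular velocities (deg/s) stay
    below a small threshold; 1.0 deg/s is an assumed value for illustration."""
    return all(abs(v) < threshold for v in velocities)

# A nearly flat angle trace yields near-zero angular velocity -> static.
static = is_static(angular_velocity([30.0, 30.0, 30.01, 30.0]))
```

In practice the check would run over the sliding-window-smoothed angles of all three inertial measurement units rather than a single trace.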
The actions that human upper limbs perform on common objects can be broken down into two stages: grasping and the subsequent process. The type of target usually determines the intention of the upper limb action [28]. In this paper, the target objects are divided into two categories according to their uses: wearable and non-wearable objects. Wearable objects can be worn on or removed from the human body. When the target object is wearable and the hand is positioned close to the body, the manipulator needs to open or close to put on or take off the object. When the target object is non-wearable and the hand is positioned close to the body, the current process is only a transition phase of the whole upper limb action, and the manipulator does not need to operate. When the hand is positioned in front of the torso and the upper limb is stationary, it is generally necessary to grasp or put down an object, at which time the manipulator should execute an open or close operation. According to the above description, this paper uses the upper limb motion state, the target object type, and the upper limb end position to jointly decide the intention of the upper limb action; the specific logic is shown in Figure 6.
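This three-way fusion rule can be written as a small decision function. The function and category names below are our own illustration of the logic, not the paper's implementation:

```python
def decide_action(limb_static, end_position, object_type):
    """Fuse motion state, end position, and object type into a command.

    limb_static: True when the angular velocity is near zero.
    end_position: 'torso_front', 'upper_body', or 'initial'.
    object_type: 'wearable' or 'non_wearable'.
    Returns 'open_or_close' when the prosthetic hand should act,
    otherwise 'hold' (no operation).
    """
    if not limb_static:
        return 'hold'                      # act only when the limb is at rest
    if end_position == 'torso_front':
        return 'open_or_close'             # grasp or put down an object
    if end_position == 'upper_body':
        # Put on / take off applies only to wearable objects; a non-wearable
        # object near the body is just a transition phase of the action.
        return 'open_or_close' if object_type == 'wearable' else 'hold'
    return 'hold'                          # initial position: no operation

cmd = decide_action(True, 'upper_body', 'wearable')
```

Encoding the rules this way makes each branch of Figure 6 individually testable.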
Figure 7 shows a complete description of the prosthetic hand control system based on upper limb action intention recognition. The data glove is worn on the human upper limb to collect attitude angle data during movement. The pre-processed attitude angle data are differentiated to obtain angular velocity values for judging the state of the upper limb and also serve as the input of the upper limb kinematics model to calculate the position of the upper limb end. The end position is classified into three categories by the multilayer perceptron model. The miniature camera captures the target object image, which is classified into one of two categories after its specific type is detected by the YOLOv5 model. Then, the action intention of the upper limb is determined by the combination of the upper limb motion state, the target object type, and the upper limb end position to achieve manipulator control.
3. Results and Discussion
3.1. Dataset
Two datasets are used in this paper: a coordinate dataset used to train a multilayer perceptron to classify upper limb end positions and a picture dataset used to train the YOLOv5 model.
Twenty volunteers were invited to participate in the experiment, including ten females and ten males, ranging in age from 18 to 40 years old. All of them were healthy individuals who could accurately control their body movements. Each volunteer wore the data glove on the right (dominant) upper limb and performed the actions of grasping or putting down objects in front of the torso and putting on or taking off objects near the upper body; each action was repeated 20 times. When the hand opens or closes, the upper limb remains stationary for a moment. At this time, the attitude angles of the upper limb are transmitted to the computer and converted into position information through the upper limb kinematics model, that is, the three-dimensional coordinates of the end of the upper limb in the chest coordinate system. The final coordinate dataset consisted of 22,500 data points and was divided into training and test sets at a 9:1 ratio.
A 2-megapixel camera was used to collect 3000 pictures of each item under different lighting conditions and angles as the image dataset. XML files recording the category and location information were generated by annotating the images with the LabelImg tool (an open-source image annotation tool), and the images were divided into training and test sets at an 8:2 ratio.
3.2. Experimental Equipment
The experimental equipment is shown in Figure 8 and includes four parts: a data glove, a miniature camera, an MCU (microcontroller unit), and a mechanical prosthetic hand. The data glove collects the posture angles of the upper arm, forearm, and opisthenar during upper limb movement; the miniature camera is fixed at the finger of the manipulator and connected to the MCU, which triggers the camera to capture images with a low-level signal. The mechanical prosthetic hand is also connected to the MCU. After the computer analyzes the data, it sends instructions to the MCU to control the opening or closing of the manipulator.
3.3. Upper Limb Model Validation
In this section, experiments are designed to verify the correctness of the upper limb kinematics model. The experimental method is to preset four fixed trajectories, then let the upper limb end move smoothly along the preset trajectories, and, finally, compare the actual motion trajectories with the preset trajectories to complete the verification. The four preset trajectories and the experimental process are shown in Figure 9.
Each trajectory is described separately below:
- (1) Horizontal trajectory: The end of the upper limb draws a horizontal line. In the Y-Z plane, the end of the upper limb takes the shoulder as the origin and moves 25 cm in the direction of increasing Y-axis coordinate values.
- (2) Vertical trajectory: The end of the upper limb draws a vertical line. In the Y-Z plane, the end of the upper limb takes the shoulder as the origin and moves 25 cm in the direction of increasing Z-axis coordinate values.
- (3) 45° oblique trajectory: The end of the upper limb draws a 45° oblique line. In the Y-Z plane, the end of the upper limb takes the shoulder as the origin and moves 25 cm in the 45° direction.
- (4) Half-circle trajectory: The end of the upper limb draws a half circle in the Y-Z plane with the shoulder as the center and a radius of 65 cm; the start point is (0 cm, −65 cm), and the end point is (0 cm, 65 cm).
We invited ten volunteers to perform each of the above four trajectories three times; the comparison between the preset trajectories and the trajectories solved by the method in this paper is shown in Figure 10.
From Figure 10, it can be observed that the solved results of the three experiments for each trajectory differ slightly from the preset values, but the trend of the solved trajectory is basically consistent with the preset trajectory, which confirms the correctness of the upper limb model and the position solution method in this paper. In terms of accuracy, this paper places more emphasis on the accuracy of the solution at key position points. The error analysis of the start and end points of the above four trajectories is used to verify the accuracy of the position solution, and the error formula is:

E = sqrt((x_s − x_p)² + (y_s − y_p)² + (z_s − z_p)²)

where (x_s, y_s, z_s) are the coordinates of the solved position and (x_p, y_p, z_p) are the coordinates of the preset trajectory. The formula is essentially the linear distance between the solved position and the preset position, used as the representative value of the error. The error values of the four trajectories performed by each of the ten volunteers were averaged, and the results are shown in Table 1.
As can be observed from Table 1, the error values of the experimental solutions are all within 7 cm, which is within an acceptable range; this shows that the accuracy of the upper limb position solution is good.
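The error metric above is the straight-line (Euclidean) distance between the solved and preset positions; a one-function sketch (our own naming):

```python
import math

def position_error(solved, preset):
    """Linear (Euclidean) distance between solved and preset 3-D positions."""
    return math.sqrt(sum((s - p) ** 2 for s, p in zip(solved, preset)))

# A 3-4-5 right triangle gives a distance of exactly 5.
err = position_error((3.0, 4.0, 0.0), (0.0, 0.0, 0.0))
```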
3.4. Multilayer Perceptron Model Analysis
In the training stage of the multilayer perceptron model, the learning rate was set to 0.005 and the learning rate decay rate to 0.99. The batch size was set to 200, and a total of 100 epochs were trained. L2 regularization with a coefficient of 0.1 was used to reduce overfitting. To further reduce the risk of overfitting, a dropout layer was added after each of the two hidden layers, randomly deactivating neurons with a probability of 0.2. The trained model was evaluated on the pre-divided coordinate test set, and the prediction results are presented as a confusion matrix in Figure 11. Three evaluation indicators, namely accuracy, F1 score (macro), and the kappa coefficient, are used to evaluate the multi-class classification performance. The indicator scores are shown in Table 2, verifying that the model has good multi-class classification performance.
3.5. YOLOv5 Model Training Analysis
mAP (mean average precision) is the mean of the AP (average precision) over all categories and is the main evaluation indicator for target detection algorithms. The higher the mAP value, the better the detection effect of the target detection model on the dataset. It is calculated as:

mAP = (1/K) Σ_{k=1}^{K} AP_k,   AP = Σ_{n=1}^{N} P(n) ΔR(n)

where K is the number of categories in the detection task, N is the total number of samples in the test set, P(n) is the precision when n samples are detected, and ΔR(n) is the change in recall when the number of detected samples goes from n − 1 to n.
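The AP/mAP computation can be sketched directly from the formula; the precision/recall values below are toy numbers for illustration, not results from the paper.

```python
def average_precision(precisions, recalls):
    """AP as the sum of precision times the recall increment,
    matching the formula above; recalls must be non-decreasing."""
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)   # P(n) * delta R(n)
        prev_recall = r
    return ap

def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Toy precision/recall points for one class: AP = 1.0*0.5 + 0.5*0.5 = 0.75.
ap = average_precision([1.0, 0.5], [0.5, 1.0])
```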
In this paper, the YOLOv5 learning platform was built in a Linux environment. The test hardware was an NVIDIA GeForce RTX 3080 GPU with 16 GB of memory; the CUDA version was 11.3, and the cuDNN version was 8.2.1. The YOLOv5m model was adopted, with a batch size of 30, trained for 100 epochs. Finally, the mAP of the model reached 0.9601, indicating that the detection effect of the YOLOv5m model is good.
3.6. Upper Limb Action Intention Recognition Method Validation
In this section, experiments are designed to verify the feasibility of the upper limb action intention recognition method. We invited 10 volunteers, including 5 men and 5 women, aged 18 to 40 years. The experiment included seven daily-life actions, and each volunteer performed each action 20 times. The volunteers wore the data glove and held the manipulator to simulate the situation of patients with upper limb disabilities wearing prosthetic hands.
The experimental process is as follows: First, the user moves the manipulator toward the target object, and the miniature camera on the manipulator collects an image of the target object. After the image is collected successfully, the upper limb action intention recognition method on the PC is triggered, and the PC sends control instructions to the manipulator through the MCU. In each experiment, the upper limb end position recognition and target object recognition results were recorded on the PC.
As shown in Figure 12, rows 1 to 7 correspond to the action flows of drinking water, combing hair, answering the phone, putting on a hat, putting on glasses, taking off glasses, and moving a cup, respectively. The first column corresponds to the action preparation state, and the second column corresponds to the pictures collected by the micro camera on the hand of the manipulator.
The criterion for success is that the position of the end of the upper limb and the type of the target object are both identified correctly during the experiment and the experimental process is completed in full. The experimental results are shown in Table 3.
Each action was performed 200 times. The recognition accuracy of the upper limb end position reached 100%. Due to the uncertainty of the hand position during image acquisition, the field of view of the camera installed on the hand may contain only a small part of the target object, resulting in occasional target object recognition failures; the success rate of target object detection was 92.4%. Consequently, the overall success rate of the upper limb action intention recognition experiment was 92.4%, which verifies the feasibility and generality of the proposed method.
4. Conclusions
In this paper, we proposed an upper limb action intention recognition method based on the fusion of posture information and visual information to solve the problem of action intention recognition from a new perspective. We collected attitude angle data during upper limb movement to determine the motion state of the upper limb. We used forward kinematics theory to build an easy-to-analyze mathematical model of the human upper limb to obtain the end position of the upper limb and designed experiments to verify the correctness of the model. We then used a multilayer perceptron model to classify the end positions into three categories, with a classification accuracy of 95.78%. In addition, we mounted a miniature camera on the hand to obtain image information of the target object and used the YOLOv5 model to classify the object; the trained YOLOv5 model had good recognition performance, with an mAP of 0.9601. According to the purpose of the target object, the objects recognized by the YOLOv5 model were further classified into wearable and non-wearable objects. Finally, the operation of the mechanical prosthetic hand was jointly decided by the upper limb motion state, the upper limb end position, and the target object type. The intention recognition method was applied to the control of a mechanical prosthetic hand, and experiments on seven actions were completed, including drinking water, combing hair, answering the phone, putting on a hat, putting on glasses, taking off glasses, and moving a cup. The success rate of the experiments reached 92.4%, which shows that the intention recognition method proposed in this paper is feasible and performs well.
Although the method designed in this paper has good results for upper limb action intention recognition and mechanical prosthetic hand control, there are still some problems that need further improvement and analysis. In this paper, the micro camera is installed on the finger of the mechanical prosthetic hand to obtain the image of the target object, and the position of the prosthetic hand, when grasping the object, has higher requirements to ensure that the camera has a good field of vision. There is a burden on the user to ensure that the object is in the center of the camera’s field of view. Therefore, future research will consider the use of a global camera that can ensure that the target object is in the camera’s field of view in the first-person perspective and can obtain more abundant information about the environment, the state of the prosthetic hand, the state of the upper limb, and so on.