Article

Detecting and Visualizing Stops in Dance Training by Neural Network Based on Velocity and Acceleration

Division of Information and Electronic Engineering, Muroran Institute of Technology, Muroran 050-8585, Japan
*
Author to whom correspondence should be addressed.
Sensors 2022, 22(14), 5402; https://doi.org/10.3390/s22145402
Submission received: 23 June 2022 / Revised: 13 July 2022 / Accepted: 15 July 2022 / Published: 20 July 2022
(This article belongs to the Special Issue Sensor Technology for Sports Monitoring)

Abstract

Various genres of dance, such as Yosakoi Soran, have contributed to the health of many people and to their sense of belonging to a community. However, because of COVID-19, various face-to-face activities have been restricted and group dance practice has become difficult. Hence, there is a need to facilitate remote dance practice. In this paper, we propose a system for detecting and visualizing the very important dance motions known as stops. We measure dance movements by motion capture and calculate features of each movement based on velocity and acceleration. Using a neural network to learn these motion features, the system detects stops and visualizes them with a human-like 3D model. In an experiment using dance data, the proposed method obtained highly accurate stop detection results and demonstrated its effectiveness as information and communication technology support for remote group dance practice.

1. Introduction

There are various genres of dance around the world, such as ballet, hip hop, and Japanese dance. Like sports, dance is popular for its strong exercise benefit [1,2,3,4,5,6,7]. Dance also has artistic aspects linked to the culture of a region and country, and it is effective for revitalizing local communities [8,9]. To perfect dance techniques, it is essential for a performer to repeatedly practice choreography composed specifically for a song. In general, the performer practices by getting advice on choreography from an instructor and watching sample videos of the choreography [10,11]. However, face-to-face practice has become challenging because of the spread of COVID-19 in recent years. In addition, both the number of instructors and the time available for practice are limited. Fortunately, information and communication technology (ICT) and artificial intelligence (AI) can now be used to develop techniques that improve the efficiency of dance practice.
The Yosakoi Soran Festival is a typical Japanese dance event held in Hokkaido. This annual event has an economic effect of more than 20 billion yen (around $173 million USD) because many tourists visit it (https://app.yosakoi-soran.jp/news/view/324, accessed on 16 March 2022). Since Yosakoi Soran [12] is a group dance in which ordinary citizens participate, it is important not only to choreograph individual dances but also to synchronize the troupe members’ movements. In particular, abrupt stops during a dance are a key element of the choreography, and it is especially important for troupe members to synchronize them with each other. Abrupt stops are an important component not only of Yosakoi Soran but also of traditional Asian dances in China, India, and elsewhere. Figure 1 shows how a stop is performed. Although synchronization of stops among troupe members is key to the successful performance of Yosakoi Soran, it is difficult for performers to notice slight differences in timing between themselves and the other performers. Therefore, it is necessary to construct a dance practice system that detects the timing of stops in a remote environment or in an on-demand format. Our proposal supports individual practice of stops and thus can contribute to improving dance skills.
Various researchers have recently proposed analyses of dance movements using motion capture (MoCap) for teaching and training [13,14,15,16,17,18,19,20]. Those analyses have focused on the performer’s pose during a dance. In [13,14], a dance practice support system using MoCap was developed. MoCap-based systems obtain the dance movements of experts and nonexpert performers, calculate the differences between the two, and feed those differences back to the latter. These studies show the effectiveness of using MoCap to support dance practice. In addition, it has been reported that performers prefer dance practices based on an interactive system rather than conventional practice methods such as watching choreography videos and direct instruction by instructors [21]. For Yosakoi Soran in particular, an interactive practice system must help performers synchronize their starts and stops with those of the other troupe members. The importance of synchronized stops during a group dance is shown in [22]. Therefore, it is necessary to analyze the velocities of the joints of a performer during a dance. Moreover, since the choreography of a stop involves the sharpness of movement, analysis focusing on acceleration is also important.
This paper proposes a new stop detection method based on a neural network that uses the velocity and acceleration of a joint position measured by a motion sensor. An overview of the proposed method is shown in Figure 2. First, the motion data of a Yosakoi Soran performer are obtained by using MoCap. Next, motion features based on velocity and acceleration are calculated from the time-series data of the 3D coordinates of the right fist. Then, a stop is detected by a feedforward neural network (NN) [23] using those features. Finally, the detected stops are visualized with a 3D humanoid model. The proposed method contributes to the automatic evaluation of Yosakoi Soran movements because it detects the choreographed timing of stops. In addition to streamlining individual repetition practice for Yosakoi Soran, this method is expected to be adaptable to various other dance forms that use stops.
This paper is organized as follows. In Section 2, the measurement method of the dance movement by MoCap is explained. In Section 3, the proposed method is explained. Specifically, the method for calculating features based on velocity and acceleration from dance motions and the method for visualizing detected stops are described. In Section 4, the experiments are described. Specifically, the effectiveness of the proposed method is shown by comparing the accuracy of the detection and the visualization of stops between the proposed method and methods for comparison. In Section 5, the conclusions of this paper and future work are described.

2. Recording of Dance Motion Data by MoCap

In this section, we explain how a performer’s motion data are recorded by MoCap to detect stops in Yosakoi Soran. We use Perception Neuron 2.0 (https://www.noitom.com/, accessed on 16 March 2022) (PN), NOITOM’s motion capture system shown in Figure 3; the performer’s dance motion is recorded while the performer wears the MoCap. The MoCap consists of multiple small inertial sensors, such as gyroscopes and accelerometers. Based on the data acquired from these sensors, the posture and position of the performer are estimated.
There are two main advantages of PN. First, although PN is less accurate than optical MoCaps in estimating posture and position, it costs only about 1/100 to 1/1000 as much as an optical MoCap, which makes practical application of the proposed system realistic. The other advantage is that PN has fewer restrictions on its use. For example, an optical MoCap is highly accurate but requires a dedicated studio with multiple cameras and can measure motion only within a limited area. In contrast, PN can be used anywhere as long as nearby metal or magnetic objects are kept away. On the other hand, PN has some disadvantages: over long periods, errors gradually accumulate due to limitations of its posture- and position-estimation methods, and the accuracy of position estimation is low. However, if the PN is calibrated periodically and used in a large space where electromagnetic interference is minimized, the error can be kept small. In practice, Yosakoi musical pieces are short (about 3–10 min each), so the effect of errors caused by continuous use is considered small. Indeed, PN has been used to analyze various movements, such as surgical simulation [24] and sports movements [25]. Against this background, we adopt PN for our analysis of dance movement.
In dance, the part of the body that has the widest range of movement is the hand. Therefore, in this study, as shown in Figure 3, the MoCap is attached to the performer, and the dance movement, including stops, is recorded. This study describes a method of detecting stops using the 3D movement trajectory of the back of the right hand extracted from the recorded data.

3. Detection and Visualization of Stops by NN Based on Velocity and Acceleration

This section describes the proposed method of detecting stops based on an NN and the visualization of the detected stops. First, motion features consisting of velocity and acceleration are calculated using the movement trajectory data of the back of the right hand recorded by the MoCap. Section 3.1 describes the method of calculating the motion features of a stop. Then, the detection model is constructed based on the NN using those features in Section 3.2, and Section 3.3 describes the method of visualizing the detected stops. The details are shown below.

3.1. Calculation of Motion Features

Since stopping is an operation that pauses choreographic movement for a moment, the characteristics of the velocity and acceleration of the hand are calculated. First, we define the movement trajectory data $f_{i,j}$ $(i = 1, 2, \ldots, N;\ j \in \{x, y, z\})$ of the back of the right hand obtained from the MoCap, where $N$ is the number of frames in the movement trajectory.

3.1.1. Calculation of Velocity

Since the sensors mounted on the MoCap are extremely sensitive, the obtained movement trajectory data contain high-frequency components such as noise. It is desirable to smooth the movement trajectory data in advance to remove noise, because the calculation of velocity and acceleration requires differentiating the trajectory. Therefore, moving average processing [26] is applied to $f_{i,j}$, as shown in Equation (1).
$$f_{i,j}^{\mathrm{ma}} = \frac{1}{2n+1} \sum_{k=i-n}^{i+n} f_{k,j}, \qquad (1)$$
where $n$ is an arbitrary natural number. Note that $n$ and the sampling rate of $f_{k,j}$ were empirically set to 5 (i.e., a window of $2n+1 = 11$ frames) and 60 Hz, respectively, based on prior equipment preparation for data acquisition. Then, the velocity $f'_{i,j}$ is calculated by the 5-point approximation formula of the derivative in Equation (2).
$$f'_{i,j} = \frac{f_{i-2h,j}^{\mathrm{ma}} - 8 f_{i-h,j}^{\mathrm{ma}} + 8 f_{i+h,j}^{\mathrm{ma}} - f_{i+2h,j}^{\mathrm{ma}}}{12h}, \qquad (2)$$
where $h$ is the step width of the differentiation; $h = 1$ is set in this paper to obtain the displacement for each frame. High-frequency components also remain in the calculated velocity $f'_{i,j}$, and because $f'_{i,j}$ is used in the subsequent calculation of the acceleration, its transition needs to be smoothed. However, the velocity $f'_{i,j}$ may change drastically, and if moving average processing were applied, the magnitude characteristics would be impaired. Therefore, the root mean square $f_{i,j}^{\mathrm{rms}}$ is calculated from the velocity $f'_{i,j}$ as shown in Equation (3).
$$f_{i,j}^{\mathrm{rms}} = \sqrt{\frac{1}{2n+1} \sum_{k=i-n}^{i+n} \left(f'_{k,j}\right)^2}. \qquad (3)$$
Then, the one-dimensional velocity $f_i^{\mathrm{abs}}$ is calculated from the three-dimensional velocity as shown in Equation (4).
$$f_i^{\mathrm{abs}} = \sqrt{\left(f_{i,x}^{\mathrm{rms}}\right)^2 + \left(f_{i,y}^{\mathrm{rms}}\right)^2 + \left(f_{i,z}^{\mathrm{rms}}\right)^2}. \qquad (4)$$
Finally, the change in velocity $f_i^{\mathrm{abs}}$ is smoothed by reapplying Equation (2) to the result obtained in Equation (4), and the velocity $v_i^{\mathrm{norm}}$ is obtained by normalization to the range of 0 to 1. The minimum and maximum values used in the normalization are determined independently for each set of motion data acquired from each subject. An example of the calculated velocity $v_i^{\mathrm{norm}}$ is shown in Figure 4. The green, red, and blue circles indicate frames that were visually confirmed to be stops. It can be confirmed that some of the frames whose velocities are locally minimal or approach 0 are stops (stop possibility frames, i.e., frames of possible stopped motion). The green, red, and blue circles are defined as short, normal, and long stops, respectively. Among the stop possibility frames, a frame that shows no stop is defined as no stop. Table 1 shows the characteristics of each stop. Since the characteristics of short, normal, and long stops differ, they are labeled separately to improve the NN’s stop detection accuracy below.
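As a concrete illustration, the following Python/NumPy sketch implements the velocity pipeline of Equations (1)–(4) and the final normalization under stated assumptions: boundary frames are handled with "same"-mode convolution, the final smoothing pass is approximated by reusing the moving-average window, and all function and variable names are ours rather than from the authors’ implementation.

```python
import numpy as np

def velocity_feature(f, n=5, h=1):
    """Normalized velocity v_i^norm from a right-hand trajectory (Eqs. (1)-(4)).

    f : (N, 3) array of 3D coordinates sampled at 60 Hz.
    n : half-width of the moving-average / RMS windows (n = 5 in the paper).
    h : frame step of the 5-point derivative (h = 1 in the paper).
    """
    kernel = np.ones(2 * n + 1) / (2 * n + 1)

    def moving_average(x):                       # Eq. (1)
        return np.convolve(x, kernel, mode="same")

    def five_point_derivative(x):                # Eq. (2)
        d = np.zeros_like(x)
        d[2*h:-2*h] = (x[:-4*h] - 8*x[h:-3*h]
                       + 8*x[3*h:-h] - x[4*h:]) / (12 * h)
        return d

    f_ma = np.column_stack([moving_average(f[:, j]) for j in range(3)])
    v = np.column_stack([five_point_derivative(f_ma[:, j]) for j in range(3)])

    # Eq. (3): per-axis RMS over the same window, smoothing without
    # flattening the magnitude of fast movements.
    v_rms = np.sqrt(np.column_stack(
        [moving_average(v[:, j] ** 2) for j in range(3)]))

    # Eq. (4): one-dimensional velocity magnitude.
    v_abs = np.sqrt((v_rms ** 2).sum(axis=1))

    # Final smoothing pass (approximated here with the same moving average)
    # and per-recording normalization to the range [0, 1].
    v_abs = moving_average(v_abs)
    return (v_abs - v_abs.min()) / (v_abs.max() - v_abs.min())
```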

3.1.2. Calculation of Acceleration

To detect stops, we calculate the acceleration, which is strongly related to the sharpness of movements. Specifically, using the calculated velocity $v_i^{\mathrm{norm}}$, its first derivative $a_i$ is calculated by the 5-point approximation formula, as shown in Equation (5).
$$a_i = \frac{v_{i-2h}^{\mathrm{norm}} - 8 v_{i-h}^{\mathrm{norm}} + 8 v_{i+h}^{\mathrm{norm}} - v_{i+2h}^{\mathrm{norm}}}{12h}, \qquad (5)$$
where $h = 1$ is set in this paper to obtain the displacement for each frame. Then, $a_i^{\mathrm{norm}}$ is calculated by normalizing the result of Equation (5) to the range of $-1$ to $1$; the minimum and maximum values are determined in the same way as for $v_i^{\mathrm{norm}}$. The frame at which $a_i^{\mathrm{norm}}$ changes from negative to zero or positive is the frame at which the velocity $v_i^{\mathrm{norm}}$ reaches a local minimum. However, with this criterion, the detected frame is slightly delayed relative to the moment when the velocity $v_i^{\mathrm{norm}}$ approaches zero. Therefore, the acceleration $a_i^{\mathrm{fil}}$ is calculated from $a_i^{\mathrm{norm}}$ as shown in Equation (6).
$$a_i^{\mathrm{fil}} = \begin{cases} 0 & \text{if } -0.05 \le a_i^{\mathrm{norm}} \le 0.05, \\ a_i^{\mathrm{norm}} & \text{otherwise}. \end{cases} \qquad (6)$$
Note that the threshold ($=0.05$) is set to suppress minute fluctuations and was determined empirically from test data from multiple experiments. By calculating $a_i^{\mathrm{fil}}$, it is possible to find all stop possibility frames. An example of the acceleration $a_i^{\mathrm{fil}}$ is shown in Figure 5. The green, red, and blue circles mark frames whose acceleration changes from negative to 0 and correspond to the stop or stop possibility frames in Figure 4.
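The acceleration processing of Equations (5) and (6) and the search for stop possibility frames can be sketched in the same style; the min–max mapping to the range $-1$ to $1$ is our assumption about the exact normalization, and the names are again ours.

```python
import numpy as np

def acceleration_feature(v_norm, h=1, threshold=0.05):
    """Filtered acceleration a_i^fil and stop possibility frames (Eqs. (5)-(6))."""
    # Eq. (5): 5-point derivative of the normalized velocity.
    a = np.zeros_like(v_norm)
    a[2*h:-2*h] = (v_norm[:-4*h] - 8*v_norm[h:-3*h]
                   + 8*v_norm[3*h:-h] - v_norm[4*h:]) / (12 * h)

    # Per-recording min-max normalization to [-1, 1] (assumed mapping).
    a_norm = 2 * (a - a.min()) / (a.max() - a.min()) - 1

    # Eq. (6): clamp small values to zero to suppress minute fluctuations.
    a_fil = np.where(np.abs(a_norm) <= threshold, 0.0, a_norm)

    # Stop possibility frames: the acceleration changes from negative to
    # zero or positive, i.e., the velocity reaches a local minimum.
    t_candidates = np.where((a_fil[:-1] < 0) & (a_fil[1:] >= 0))[0] + 1
    return a_fil, t_candidates
```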

3.1.3. Construction of Training Data Set for Stop Detection

The NN’s training data are constructed using velocity and acceleration. The frame number of each stop possibility frame in Figure 4 and Figure 5 is denoted $t$. The stop time of a motion tended to be about 25 to 50 frames. Therefore, features describing the intervals before and after a stop possibility frame $t$ are calculated from Equations (7)–(10) using $v_i^{\mathrm{norm}}$ and $a_i^{\mathrm{fil}}$. As a result, a four-element training sample $\mathbf{r}_t = [V_{\mathrm{back}}(t), V_{\mathrm{forward}}(t), A_{\mathrm{back}}(t), A_{\mathrm{forward}}(t)]^{\mathrm{T}}$ is obtained for each stop possibility frame. These are trapezoidal approximations of the areas under the curves of Figure 4 and Figure 5 in the 25 frames before and after frame $t$.
$$V_{\mathrm{back}}(t) = \frac{1}{2} \sum_{k=t-25}^{t-1} \left(v_k^{\mathrm{norm}} + v_{k+1}^{\mathrm{norm}}\right), \qquad (7)$$
$$V_{\mathrm{forward}}(t) = \frac{1}{2} \sum_{k=t}^{t+24} \left(v_k^{\mathrm{norm}} + v_{k+1}^{\mathrm{norm}}\right), \qquad (8)$$
$$A_{\mathrm{back}}(t) = \frac{1}{2} \sum_{k=t-25}^{t} \left(a_k^{\mathrm{fil}} + a_{k+1}^{\mathrm{fil}}\right), \qquad (9)$$
$$A_{\mathrm{forward}}(t) = \frac{1}{2} \sum_{k=t}^{t+25} \left(a_k^{\mathrm{fil}} + a_{k+1}^{\mathrm{fil}}\right). \qquad (10)$$
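A minimal sketch of this feature construction is shown below; it assumes that the stop possibility frame $t$ lies at least 25 frames from the start and 27 frames from the end of the recording, and the function name is ours.

```python
import numpy as np

def stop_features(v_norm, a_fil, t):
    """Four-element feature vector r_t for a stop possibility frame t (Eqs. (7)-(10)).

    Each element is a trapezoidal approximation of the area under the
    velocity / acceleration curves in the 25 frames before and after t.
    Requires 25 <= t <= len(v_norm) - 27.
    """
    V_back    = 0.5 * np.sum(v_norm[t-25:t]  + v_norm[t-24:t+1])   # Eq. (7)
    V_forward = 0.5 * np.sum(v_norm[t:t+25]  + v_norm[t+1:t+26])   # Eq. (8)
    A_back    = 0.5 * np.sum(a_fil[t-25:t+1] + a_fil[t-24:t+2])    # Eq. (9)
    A_forward = 0.5 * np.sum(a_fil[t:t+26]   + a_fil[t+1:t+27])    # Eq. (10)
    return np.array([V_back, V_forward, A_back, A_forward])
```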

3.2. Construction of a Stop Detection Model of the NN

This section describes how to detect a stop with a feedforward NN using the motion features calculated in the previous section. The NN used in the proposed method consists of an input layer, $N$ hidden layers, and an output layer. The motion feature vector $\mathbf{r}_t$ of each frame calculated in the previous section is used as the input data. The output is associated with the class values $c = 1, \ldots, C$ indicating the detection result. Let the numbers of nodes in the input, hidden, and output layers be $M$, $L$, and $C$, respectively, and let the bias be $x_0 = 1$. The elements of the motion feature vector $\mathbf{r}_t$ correspond to $(x_1, \ldots, x_M)$. The following propagation equation is obtained at node $l$ of the hidden layer, as shown in Equation (11).
$$h_l(\mathbf{r}_t) = h\!\left( \sum_{m=0}^{M} w_{l,m}^{1} x_m \right), \qquad (11)$$
where $w_{l,m}^{1}$ is the weight between the input layer and the hidden layer. Note that the rectified linear unit (ReLU) function [27] is used as the activation function $h(\cdot)$. The linear sum $y_c(\mathbf{r}_t)$ of the outputs of the hidden nodes, including the bias and the weight $w_{c,l}^{2}$ between the hidden and output layers, is obtained as shown in Equation (12); $y_c(\mathbf{r}_t)$ is the $c$-th output value in the output layer.
$$y_c(\mathbf{r}_t) = \sum_{l=0}^{L} w_{c,l}^{2}\, h_l(\mathbf{r}_t). \qquad (12)$$
Moreover, the following $s_c$ is obtained as the probability value indicating the class by the conversion in Equation (13) (the softmax function [28]) using all the outputs of the output layer.
$$s_c = \frac{\exp\{ y_c(\mathbf{r}_t) \}}{\sum_{c'=1}^{C} \exp\{ y_{c'}(\mathbf{r}_t) \}}. \qquad (13)$$
In the NN’s training, the weights and biases of each layer are determined so as to minimize the cross-entropy error [29], using the training data set created in the previous section. Finally, short, normal, long, and no stops are classified by inputting test data of the same format as the training data to the trained NN.
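For illustration, the following sketch builds an equivalent classifier in PyTorch using the architecture reported in Table 4 (4 inputs, one hidden layer of 16 ReLU units, 4 outputs) and the learning rate of 0.001 given in Section 4.1.1; the choice of the Adam optimizer is our assumption, since the paper does not state which optimizer was used.

```python
import torch
import torch.nn as nn

# Four-class classifier (short, normal, long, no stop): 4 inputs,
# one hidden layer of 16 ReLU units, 4 outputs (Table 4).
model = nn.Sequential(
    nn.Linear(4, 16),   # Eq. (11): weighted sum followed by ReLU activation
    nn.ReLU(),
    nn.Linear(16, 4),   # Eq. (12): linear output layer y_c(r_t)
)

# Softmax (Eq. (13)) is folded into the cross-entropy loss.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # optimizer is assumed

def train_step(features, labels):
    """features: (batch, 4) float tensor of r_t; labels: (batch,) class indices."""
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```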

3.3. Visualization of a Stop by Human-like 3D Model in Virtual Reality

This section describes the visualization of a stop by the proposed system. The system described below was developed using Unity (https://unity.com, accessed on 16 March 2022), a virtual reality (VR) development environment. Unity has been used to visualize motion analysis research [20,30] and is also useful for actual application development, so we adopted this engine for the virtual environment. For effective practice, it is important for the system to visualize the differences in stop timing between the expert and the performer. The proposed system allows the detection results for the expert and the performer to be confirmed easily, side by side. In the following, we describe the functions of the visualization system in detail.
First, using the stop detection model constructed in the previous section, we obtain the stop detection results from the motion data for testing. Next, we prepare the two humanoid 3D models shown in Figure 6 in the VR space: one shows the dance movement of the performer, and the other shows the dance movement of the expert. Clicking the Load button below a 3D model applies the recorded dance motion to that model. The model then moves in the same way as the dance motion recorded by motion capture, and its color changes to red only while a stop is performed during the dance movement, as shown in Figure 7. The details of the various user interfaces (UI) of the visualization system are shown in Table 2. Specifically, the UI allows the user to fast-forward and rewind the motion data at 0.5×, 1×, and 2× speeds. The upper-right corner of the screen displays the current frame number, giving the performer an idea of the timing of the choreography he or she wants to check. This allows the performer to visually learn the differences in posture and stop timing between the performer and the expert.

4. Experimental Section

This section verifies the effectiveness of stop detection by the proposed method. It also verifies the visualization accuracy of detected stops.

4.1. Verification of Stop Detection Accuracy

In this section, the accuracy of the proposed method in detecting stops is verified by using comparative methods.

4.1.1. Verification of the Effectiveness of the Proposed Method

In this experiment, the stop detection accuracy is verified. First, the data used in this experiment are described. To construct a high-quality dataset of Yosakoi Soran motion for the first 90 s of the song "Yochore", a total of five dance performances by three experienced dancers were recorded and used in the experiment. The details of each dance performance and of the three expert dancers (Subject A–Subject C) are shown in Table 3. In the experiment, we confirmed that stable measurement was achieved for data acquired multiple times by calibrating the PN in a space free of electromagnetic interference. The choreography of “Yochore” used in the experiment is available on YouTube (https://www.youtube.com/watch?v=FRMOpCPw2xA&t=0s, accessed on 16 March 2022). The number of expert performers was set with reference to the motion analysis studies [13,14,15,17,20,31]. The choreography of a Yosakoi dance is composed so that ordinary people can learn it while reproducing the characteristics of each Yosakoi team, and many different dance sequences have been created for the festival. It is therefore not easy to collect many samples for any particular sequence of a dance. Given this situation of the teams at the Yosakoi festival, 679 samples were prepared for the detection of stops in the dance sequence.
To evaluate the stop detection accuracy, six methods (Comp. 1–Comp. 6) were compared with the proposed model (PM). The outline of each method is shown in Table 4. Note that the batch size in the proposed method is set to 581 and the learning rate is set to 0.001. The parameters of each comparative method were determined experimentally. Since long short-term memory (LSTM) [32] generally analyzes series data, the input data shapes in Comp. 2 and Comp. 3 are changed. Specifically, the backward and forward velocity and acceleration features calculated from Equations (7)–(10) are combined in the time direction. Moreover, the same preprocessing is performed on the time-series data. For this reason, the temporal window size of the LSTM is 2, and the number of input nodes is half that of the NN. The effectiveness of the feature data is also examined by using time-series data; specifically, the effectiveness of the NN at detecting stops is shown by comparison with LSTM, which is said to be effective for time-series analysis. The effectiveness of the PM is shown as follows.
  • The effectiveness of the feature data is confirmed by comparing the detection accuracy of the stops by the PM and Comp. 1.
  • The effectiveness of stop detection by the NN is confirmed by comparing the detection accuracy of stops by the PM and Comp. 2.
  • The effectiveness of combining feature data and NN in detecting stops is confirmed by comparing the accuracy of the PM and Comp. 3.
  • We verify that the PM is more effective than traditional supervised learning-based methods by comparing the detection accuracy of stops by the PM and Comp. 4–Comp. 6.
Based on the previous section, 679 feature samples are constructed from these dance motion data and used as training data for the PM, Comp. 2, and Comp. 4–Comp. 6. Moreover, for the velocity $v_i^{\mathrm{norm}}$ and acceleration $a_i^{\mathrm{fil}}$ of the previous section, let $t$ be a stop possibility frame in Figure 4 and Figure 5. In this case, 679 time-series samples consisting of $v_i^{\mathrm{norm}}$ and $a_i^{\mathrm{fil}}$ over the interval $t-25 \le i \le t+25$ are used as training data for Comp. 1 and Comp. 3. Moreover, the data are divided into training, verification, and test data as shown in Table 5.

4.1.2. Explanation of Evaluation Index

Next, the evaluation index is explained. In stop detection, it is important to detect as many correct stops as possible, so recall must be evaluated. On the other hand, it is also important to reduce false positives, so precision must be evaluated as well. For this reason, the F-measure, which is the harmonic mean of precision and recall, is used as the evaluation index for stop detection. The goal of the experiments is to detect stops, so whether short, normal, and long stops are classified accurately does not matter. Therefore, short, normal, and long stops are collectively referred to as stops. The F-measure is calculated by Equations (14)–(16).
$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad (14)$$
$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad (15)$$
$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad (16)$$
where true positive (TP) counts cases in which the prediction is a stop and the correct answer is a stop; false positive (FP) counts cases in which the prediction is a stop and the correct answer is no stop; and false negative (FN) counts cases in which the prediction is no stop and the correct answer is a stop. Although four stop classes are prepared in our NN-based detection method, in accordance with our basic aim of detecting stops in Yosakoi dance, the evaluation is based on two classes: stop and no stop.
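A small sketch of this two-class evaluation is given below; the encoding of the "no stop" class as label 0 is a placeholder assumption, and the three stop classes are collapsed into a single "stop" class as described above.

```python
def stop_detection_scores(y_true, y_pred, no_stop_label=0):
    """Precision, recall, and F-measure (Eqs. (14)-(16)) after collapsing
    short, normal, and long stops into a single 'stop' class.

    y_true, y_pred : sequences of class labels; no_stop_label marks 'no stop'.
    """
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        true_stop, pred_stop = (t != no_stop_label), (p != no_stop_label)
        tp += true_stop and pred_stop
        fp += (not true_stop) and pred_stop
        fn += true_stop and (not pred_stop)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return precision, recall, f_measure
```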

4.1.3. Results and Discussion

The stop detection results of all methods are shown in Table 6. The accuracy is the value on the test data at the point when the loss on the validation data stops decreasing. The transition of the loss function for the training and verification data in the NN-based methods is shown in Figure 8. Note that in Comp. 3 all data were recognized as no stop, so its precision and F-measure are not calculated.
First, Table 6 shows that the F-measure of the PM is higher than those of Comp. 1–Comp. 6, which indicates that the PM is effective in stop detection. Specifically, the PM improves the recall and F-measure compared with Comp. 1, which shows the effectiveness of introducing the feature data. Next, compared with Comp. 2, the PM improves the recall, precision, and F-measure, which indicates the effectiveness of introducing the NN in stop detection. Compared with Comp. 3, the PM also improves the recall, precision, and F-measure, which indicates the effectiveness of combining the feature data and the NN. Moreover, compared with Comp. 4–Comp. 6, the PM and Comp. 2 improve the precision and F-measure, which indicates the effectiveness of the NN-based methods. From the above, the effectiveness of the PM was shown. From Figure 8, it can be seen that training stopped before the loss function of the verification data began to increase, so overfitting was successfully suppressed. Furthermore, Table 6 shows that when the feature data are used with the same machine learning method, the precision, recall, and F-measure tend to be higher than when the time-series data are used; that is, the feature data express the difference between stops and no stops better than the time-series data. From the PM results, it can be concluded that the most accurate stop detection model among the tested methods has been constructed.

4.2. Confirmation of Visualization Timing of Stops

In this section, the visualization timing of the detected stops is confirmed.

4.2.1. Confirmation Method

The method by which we confirm the visualization accuracy of stops is explained below. The visualization accuracy is evaluated by applying the various methods used in the previous section to actual dance motion data. The dance motion data used in this experiment are 90 s of dance motion performed by another skilled dancer with 15 years of dance experience. This dance motion includes 32 stops (14 short, 15 normal, and 3 long stops). For these dance motion data, each model visualizes only the stops performed with the right fist.
Here, the number of times a correct stop is visualized is the number of detections, the number of times a no stop is visualized is the number of false positives, and the number of correct stops that are not detected is the number of nondetections. The visualization accuracy of stops is quantitatively evaluated based on these three numbers.
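The counting can be sketched as follows; the matching rule (a detected stop is counted as correct if it falls within a fixed frame tolerance of a ground-truth stop) is our assumption for illustration, as the paper does not specify the exact matching criterion.

```python
def count_visualization_results(true_stop_frames, detected_stop_frames, tol=25):
    """Count detections, nondetections, and false positives (Table 7-style).

    A detection within `tol` frames of a ground-truth stop is counted as
    correct; each ground-truth stop can be matched at most once.
    """
    matched = set()
    false_positives = 0
    for d in detected_stop_frames:
        candidates = [t for t in true_stop_frames if abs(t - d) <= tol]
        if candidates:
            matched.add(min(candidates, key=lambda t: abs(t - d)))
        else:
            false_positives += 1
    n_detected = len(matched)
    n_undetected = len(true_stop_frames) - n_detected
    return n_detected, n_undetected, false_positives
```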

4.2.2. Results and Discussion

The visualization results of the stops are shown in Figure 9. In the figure, the green parts are visualized short stops, the red parts are visualized normal stops, the blue parts are visualized long stops, the black parts are visualized no stops, and the gray parts are points in time where neither a stop nor a no stop was visualized. The numbers of stop detections, nondetections, and false positives are shown in Table 7.
First, Figure 9 and Table 7 show that the PM visualized more stops than Comp. 1–Comp. 4 and Comp. 6, and the number of false positives in the PM is smaller than that in Comp. 5. From this, we can judge that the PM is the most effective in stop visualization. Next, the characteristics of the stops visualized by the PM are discussed. Figure 9 shows that a normal stop tends to be visualized more easily than a short or long stop. A normal stop has a longer stopping time than a short stop, which suggests that the duration of a stop is a major factor in the criteria for judging a stop. In addition, the low visualization accuracy for long stops is considered to be due to the insufficient number of samples compared with the other labels.
Next, the false-positive behavior of the PM is discussed. When we checked the falsely detected motions, we found that folding motions such as waving tended to be falsely detected as stops. For example, consider a choreographed movement in which the hands are waving: the motion appears to stop for a short time at the moment the direction of the wave reverses. In particular, such turning movements tended to resemble short stops, so it is considered that folding motions were mistakenly detected as short stops. Considering the practical application of the stop detection model, the future challenge is to increase recall while maintaining high precision. This problem is expected to be solved by increasing the amount of high-quality data from people with dance experience.
Finally, we compare different dance analysis research approaches. Table 8 lists the analysis and visualization functions in related studies. As the table shows, the proposed method and the methods in [15,17,20] perform motion analysis based on features characteristic of each dance, while the proposed method and the methods in [13,14] provide both motion analysis and training applications. This suggests that it is important to link motion analysis and visualization technologies in order to develop fundamental analysis into practical applications such as dance practice and stage use.

5. Conclusions

In group dance, stops are important to improve the synchronization of the troupe’s movements. However, due to the effects of COVID-19 and the constraints of school, work, and so on, it is difficult to practice dancing in groups. This study proposed a dance practice support system based on detecting stops in a remote environment. The most accurate stop detection among the tested machine learning methods was achieved by training the NN with velocity and acceleration features, and experiments showed the effectiveness of the proposed method. In addition, the detected stops were visualized using a humanoid 3D model. Together, these components form the basis of a future remote dance practice environment.
In the experiment, the NN detected stops more accurately than the comparative methods, although the comparative methods also showed relatively high detection results. From this point of view, stops can be detected using various models, and it is expected that a system with a direct effect on dance practice can be developed at low cost. In addition, because stops are an essential technique not only in Yosakoi Soran but also in a wide range of dance genres, the proposed system can easily be applied to other forms of dance by obtaining training data sets from other experienced dancers.
Estimating 3D human pose, including depth, with high accuracy from video images is difficult, which is why MoCap was used in this study; however, the proposed method can be applied whenever 3D coordinate information, including depth, can be obtained. If video-based 3D human pose estimation technology including depth [36,37,38] can be put to practical use, an even better training environment can be constructed.

Author Contributions

Conceptualization, Y.J., G.S. and H.S.; Data curation, Y.J.; Funding acquisition, G.S. and H.S.; Methodology, Y.J., G.S. and H.S.; Software, Y.J.; Validation, Y.J., G.S. and H.S.; Visualization, Y.J.; Writing—original draft, Y.J.; Writing—review & editing, G.S. and H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the ethics committee of Muroran Institute of Technology (16000018).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

ICT: Information and Communication Technology
AI: Artificial Intelligence
MoCap: Motion capture
NN: Neural network
PN: Perception Neuron
ReLU: Rectified Linear Unit
VR: Virtual Reality
UI: User Interface
PM: Proposed method
LSTM: Long Short-Term Memory
TP: True Positive
FP: False Positive
FN: False Negative

References

  1. Lindqvist, G. The relationship between play and dance. Res. Danc. Educ. 2001, 2, 41–52. [Google Scholar] [CrossRef]
  2. Alpert, P.T. The health benefits of dance. Home Health Care Manag. Pract. 2011, 23, 155–157. [Google Scholar] [CrossRef]
  3. Ward, S.A. Health and the power of dance. J. Phys. Educ. Recreat. Danc. 2008, 79, 33–36. [Google Scholar] [CrossRef]
  4. Huddy, A.; Stevens, K. The teaching artist: A model for university dance teacher training. Res. Danc. Educ. 2011, 12, 157–171. [Google Scholar] [CrossRef]
  5. Pedro, R.; Stevens, K.; Scheu, C. Creating a cultural dance community of practice: Building authentic Latin American dance experiences. Res. Danc. Educ. 2018, 19, 199–215. [Google Scholar] [CrossRef]
  6. Green, J. Power, service, and reflexivity in a community dance project. Res. Danc. Educ. 2000, 1, 53–67. [Google Scholar] [CrossRef]
  7. Olvera, A.E. Cultural dance and health: A review of the literature. Am. J. Health Educ. 2008, 39, 353–359. [Google Scholar] [CrossRef]
  8. Hast, D.E. Performance, transformation, and community: Contra dance in New England. Danc. Res. J. 1993, 25, 21–32. [Google Scholar] [CrossRef]
  9. Jackson, J.B. The opposite of powwow: Ignoring and incorporating the intertribal war dance in the Oklahoma stomp dance community. Plains Anthropol. 2003, 48, 237–253. [Google Scholar] [CrossRef]
  10. Van Rossum, J.H. The dance teacher: The ideal case and daily reality. J. Educ. Gift. 2004, 28, 36–55. [Google Scholar] [CrossRef] [Green Version]
  11. Hong, J.C.; Chen, M.L.; Ye, J.H. Acceptance of YouTube applied to dance learning. Int. J. Inf. Educ. Technol. 2020, 10, 7–13. [Google Scholar] [CrossRef] [Green Version]
  12. Akiba, F. YOSAKOI SORAN as a site of re-localization and its relationship to Japanese pop culture. In Proceedings of the 13th World Congress of the International Association for Semiotic Studies (IASS/AIS), Kaunas, Lithuania, 26–30 June 2017; pp. 653–661. [Google Scholar]
  13. Chan, J.C.; Leung, H.; Tang, J.K.; Komura, T. A virtual reality dance training system using motion capture technology. IEEE Trans. Learn. Technol. 2011, 4, 187–195. [Google Scholar] [CrossRef]
  14. Hachimura, K.; Kato, H.; Tamura, H. A prototype dance training support system with motion capture and mixed reality technologies. In Proceedings of the 13th IEEE International Workshop on Robot and Human Interactive Communication, Kurashiki, Japan, 22–24 September 2004; pp. 217–222. [Google Scholar]
  15. Shiratori, T.; Nakazawa, A.; Ikeuchi, K. Detecting dance motion structure using motion capture and musical information. In Proceedings of the 10th International Conference on Virtual Systems and Multimedia, Ogaki, Japan, 17–19 November 2004; Volume 4, pp. 1287–1296. [Google Scholar]
  16. Nakazawa, A.; Nakaoka, S.; Ikeuchi, K.; Yokoi, K. Imitating human dance motions through motion structure analysis. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, 30 September–4 October 2002; Volume 3, pp. 2539–2544. [Google Scholar]
  17. Yoshimura, M.; Murasato, H.; Kai, T.; Kuromiya, A.; Yokoyama, K.; Hachimura, K. Analysis of Japanese dance movements using motion capture system. Syst. Comput. Jpn. 2006, 37, 71–82. [Google Scholar] [CrossRef]
  18. Solberg, R.T.; Jensenius, A.R. Optical or inertial? Evaluation of two motion capture systems for studies of dancing to electronic dance music. In Proceedings of the Systems, Man, and Cybernetics Conferences, Budapest, Hungary, 9–12 October 2016; pp. 469–474. [Google Scholar]
  19. Camurri, A.; El Raheb, K.; Even-Zohar, O.; Ioannidis, Y.; Markatzi, A.; Matos, J.M.; Morley-Fletcher, E.; Palacio, P.; Romero, M.; Sarti, A.; et al. WhoLoDancE: Towards a methodology for selecting motion capture data across different dance learning practice. In Proceedings of the 3rd International Symposium on Movement and Computing, Thessaloniki, Greece, 5–6 July 2016; pp. 1–2. [Google Scholar]
  20. Aristidou, A.; Stavrakis, E.; Charalambous, P.; Chrysanthou, Y.; Himona, S.L. Folk dance evaluation using laban movement analysis. ACM J. Comput. Cult. Herit. 2015, 8, 1–19. [Google Scholar] [CrossRef]
  21. Wang, Z. Modern social dance teaching approaches: Studying creative and communicative components. Think. Ski. Creat. 2022, 43, 100974. [Google Scholar] [CrossRef]
  22. Patrona, F.; Chatzitofis, A.; Zarpalas, D.; Daras, P. Motion analysis: Action detection, recognition and evaluation based on motion capture data. Pattern Recognit. 2018, 76, 612–622. [Google Scholar] [CrossRef]
  23. Zhang, Z.; Li, J.; Manikopoulos, C.; Jorgenson, J.; Ucles, J. HIDE: A hierarchical network intrusion detection system using statistical preprocessing and neural network classification. In Proceedings of the IEEE Workshop on Information Assurance and Security, St. Petersburg, Russia, 21–23 May 2001; Volume 85, p. 90. [Google Scholar]
  24. Kim, H.S.; Hong, N.; Kim, M.; Yoon, S.G.; Yu, H.W.; Kong, H.J.; Kim, S.J.; Chai, Y.J.; Choi, H.J.; Choi, J.Y.; et al. Application of a perception neuron® system in simulation-based surgical training. J. Clin. Med. 2019, 8, 124. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Akamatsu, Y.; Maeda, K.; Ogawa, T.; Haseyama, M. Classification of expert-novice level using eye tracking and motion data via conditional multimodal variational autoencoder. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 1360–1364. [Google Scholar]
  26. Box, G.E.; Pierce, D.A. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J. Am. Stat. Assoc. 1970, 65, 1509–1526. [Google Scholar] [CrossRef]
  27. Agarap, A.F. Deep learning using rectified linear units (ReLU). arXiv 2018, arXiv:1803.08375. [Google Scholar]
  28. Memisevic, R.; Zach, C.; Pollefeys, M.; Hinton, G.E. Gated softmax classification. Adv. Neural Inf. Process. Syst. 2010, 23, 1603–1611. [Google Scholar]
  29. Kline, D.M.; Berardi, V.L. Revisiting squared-error and cross-entropy functions for training neural network classifiers. Neural Comput. Appl. 2005, 14, 310–318. [Google Scholar] [CrossRef]
  30. Lin, C.Y.; Yang, Z.H.; Zhou, H.W.; Yang, T.N.; Chen, H.N.; Shih, T.K. Combining leap motion with unity for virtual glove puppets. In Proceedings of the IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), Taichung, Taiwan, 10–12 December 2018; pp. 251–255. [Google Scholar]
  31. Guo, H.; Sung, Y. Movement estimation using soft sensors based on Bi-LSTM and two-layer LSTM for human motion capture. Sensors 2020, 20, 1801. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  32. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. A comparison of ARIMA and LSTM in forecasting time series. In Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 1394–1401. [Google Scholar]
  33. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  34. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef] [Green Version]
  35. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  36. Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.E.; Sheikh, Y. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 172–186. [Google Scholar] [CrossRef] [Green Version]
  37. Chen, C.H.; Ramanan, D. 3D human pose estimation = 2D pose estimation + matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7035–7043. [Google Scholar]
  38. Wang, L.; Chen, Y.; Guo, Z.; Qian, K.; Lin, M.; Li, H.; Ren, J.S. Generalizing monocular 3d human pose estimation in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Seoul, Korea, 27 October–2 November 2019. [Google Scholar]
Figure 1. Demonstration of the flow of motion in the Yosakoi Soran dance. From left to right in the first line: no stop, stop, no stop. The stop is incorporated into the continuous dance sequence as an intermediate phase.
Figure 2. Overview of the proposed system. The dance motion data are recorded by MoCap (Section 2). The proposed system consists of three phases (Section 3). First, the motion features based on velocity and acceleration are calculated (Section 3.1). Stops are detected by a neural network model (Section 3.2). Stops are visualized using a humanoid 3D model via virtual reality spaces (Section 3.3).
Figure 3. Attachment of the PN to the performer: 18 small inertial sensors (gyroscopes and accelerometers) are placed on the hands, arms, shoulders, legs, head, and waist. The relative positions among the sensors are measured, and the 3D positions of the sensors are obtained.
Figure 4. Example of velocity transition.
Figure 5. Example of acceleration transition.
Figure 6. 3D model for stop visualization system.
Figure 7. Motion data playback.
Figure 8. Transition of loss function.
Figure 9. Comparative example of visualization timing of stops.
Table 1. Stops characteristics.

Types of Stops | Details
Short stop | Operation stop time is about 25 frames.
Normal stop | Operation stop time is about 50 frames.
Long stop | Operation is stopped for a while.
Table 2. Details of each UI in visualization system.

UI | Details
Now Frame | Display the current frame.
Load (Student) | Read the dance movement data of the expert dancer.
Load (Teacher) | Read the performer’s dance movement data.
Start | Start playback of dance movement data.
Stop | Pause playback of dance movement data.
Finish | End playback of dance movement data.
0.5× | Play back dance movement data at 0.5× speed.
−0.5× | Rewind dance movement data at 0.5× speed.
1× | Play back dance movement data at 1× speed.
−1× | Rewind dance movement data at 1× speed.
2× | Play back dance movement data at 2× speed.
−2× | Rewind dance movement data at 2× speed.
Table 3. Details of each dance movement.

Subject | Gender | Age | Height | Dance Experience | No. of Samples
A | Male | 23 years old | 164 cm | 9 years | 128
B1 | Male | 23 years old | 168 cm | 15 years | 135
B2 | Male | 23 years old | 168 cm | 15 years | 138
B3 | Male | 23 years old | 168 cm | 15 years | 135
C | Female | 36 years old | 164 cm | 19 years | 143
Table 4. Overview of each method.

Method | Model | Data | Input Size | No. of Hidden Layers | No. of Hidden Nodes | Output Size
PM | NN | Feature data | 4 | 1 | 16 | 4
Comp. 1 | NN | Time-series data | 102 | 1 | 128 | 4
Comp. 2 | LSTM [32] | Feature data | 2 × 2 | 1 | 16 | 4
Comp. 3 | LSTM [32] | Time-series data | 51 × 2 | 1 | 128 | 4

Method | Model | Data | Input Size | Kernel | Output Size
Comp. 4 | Nonlinear Support Vector Machine [33] | Feature data | 4 | Radial basis function | 4

Method | Model | Data | Input Size | No. of Neighbors | Output Size
Comp. 5 | k-Nearest Neighbor [34] | Feature data | 4 | 5 | 1

Method | Model | Data | Input Size | No. of Trees in the Forest | Output Size
Comp. 6 | Random Forest [35] | Feature data | 4 | 115 | 4
Table 5. Breakdown of the number of data sets for each training.

Class | Training | Verification | Testing
Short stop | 50 | 10 | 10
Normal stop | 51 | 12 | 12
Long stop | 9 | 3 | 3
No stop | 363 | 78 | 78
Total | 473 | 103 | 103
Table 6. Stop detection accuracy of each method.

Method | Precision | Recall | F-Measure
PM | 0.938 | 0.600 | 0.732
Comp. 1 | 1.000 | 0.400 | 0.571
Comp. 2 | 0.813 | 0.520 | 0.634
Comp. 3 | - | 0 | -
Comp. 4 | 0.698 | 0.539 | 0.546
Comp. 5 | 0.715 | 0.472 | 0.536
Comp. 6 | 0.641 | 0.575 | 0.558
Table 7. Quantitative evaluation of visualization accuracy.

Method | No. of Stop Detections | No. of Nondetections | No. of False Positives
PM | 11 | 21 | 4
Comp. 1 | 6 | 26 | 1
Comp. 2 | 10 | 22 | 3
Comp. 3 | 0 | 32 | 0
Comp. 4 | 2 | 30 | 3
Comp. 5 | 11 | 21 | 12
Comp. 6 | 8 | 24 | 12
Table 8. List of analyses and visualization features in related studies.

Literature | No. of Subjects / Dance Genre | Analysis Examples | Application for Motion Visualization
PM | 5 / Yosakoi (JPN) | NN-based stop detection | Highlighting a teacher’s and a student’s stops with VR
Chan et al. [13] | 6 / Hip-hop and a-go-go (USA) | Motion matching from motion database | Highlighting incorrect movement joints with VR
Hachimura et al. [14] | 5 / Street dance (USA) | - | Overlay of the computer graphics characteristics of a trainer with AR
Shiratori et al. [15] | 2 / Aizu-bandaisan (JPN) | Segmentation of motion sequence based on the music rhythm | -
Yoshimura et al. [17] | 5 / Fuji Musume (JPN) | Proposal of coordinate system considering local moving for motion tracking | -
Aristidou et al. [20] | 3 / Bachata dance (DMA) | Proposal of Laban Movement Analysis motion features for Laban | Only playback of tracked motion with VR
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
