Detecting Parkinson’s Disease through Gait Measures Using Machine Learning

Li, Alex; Li, Chenyu

doi:10.3390/diagnostics12102404

by Alex Li ¹ and Chenyu Li ²,*

¹ Stanford Center for Professional Development, Stanford University, Stanford, CA 94305, USA
² Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
* Author to whom correspondence should be addressed.

Diagnostics 2022, 12(10), 2404; https://doi.org/10.3390/diagnostics12102404

Submission received: 19 July 2022 / Revised: 29 September 2022 / Accepted: 1 October 2022 / Published: 3 October 2022

(This article belongs to the Special Issue Artificial Intelligence and Pattern Recognition Methods for the Automatic Detection and Evaluation of Neurological Disorders)

Abstract

Parkinson’s disease (PD) is one of the most common long-term degenerative movement disorders affecting the motor system. This progressive nervous system disorder affects nearly one million Americans, and more than 20,000 new cases are diagnosed each year. PD is a chronic, progressive, and painful neurological disorder, and people with PD usually live 10 to 20 years after diagnosis. PD is diagnosed by identifying the motor signs of bradykinesia, rigidity, tremor, and postural instability. Although several attempts have been made to develop explicit diagnostic criteria, such criteria remain largely unestablished. In this manuscript, we aim to build a classifier from the gait data of PD patients and healthy controls using machine learning methods. Such a classifier could help make diagnosis more accurate and cost-effective. The input to our algorithm is the Gait in Parkinson’s Disease dataset published on PhysioNet, which contains force sensor data as the measurement of gait from 92 healthy subjects and 214 patients with idiopathic Parkinson’s disease. Different machine learning methods, including logistic regression, SVM, decision tree, and KNN, were tested to output a predicted classification of PD patients and healthy controls. Baseline models, including the frequency domain method, can reach similar performance and may be another good approach for PD diagnostics.

Keywords: Parkinson’s disease; machine learning; gait measures

1. Introduction

Parkinson’s disease is a brain disorder that causes uncontrollable movements. As the disease progresses, patients can have difficulty walking and talking, along with mental and behavioral changes. In patients with Parkinson’s disease, nerve cells in the basal ganglia become impaired and/or die, which reduces the production of dopamine, a key chemical messenger that the nervous system uses to control various functions of the body [1,2,3]. Parkinson’s patients show four main symptoms: tremor in the arms, legs, head and hands; prolonged muscle contraction; slowness of movement; and impaired balance, sometimes with falls. Among every 100,000 people aged 80 or older, 1900 have Parkinson’s disease. Although Parkinson’s disease is most common among older people, there are no specific tests to diagnose it [1,2,3]. The current state-of-the-art method is based on a patient’s medical history and a review of signs and symptoms, together with a neurological and physical examination by a neurologist [4,5,6]. Although a dopamine transporter scan (DaTscan) can serve as a supportive test of the integrity of the striatal dopaminergic system, the final diagnosis still rests on symptoms and neurologic examination [7,8,9,10]. Diagnostic accuracy is still not optimal with clinical features, diagnostic tests or biomarkers [11]. Early detection of PD remains a largely unmet need, even though therapies have a better chance of success in the early stage [12,13].

The gait dataset was used in a previous study by Hausdorff JM’s group [14], who collected the data from sensors and described the gait speed characteristics of PD patients and controls using statistical methods. Another study from the same group [15] compared the gait speed of PD patients and controls on level ground and on a treadmill, but relied mostly on descriptive analysis. Previous studies also used a statistical approach to test the effect of rhythmic auditory stimulation (RAS) on PD gait traces and concluded that RAS has potential as an intervention to improve mobility and reduce the fall risk of PD patients [16,17]. Yogev G and colleagues [18] investigated the associations between executive function and gait variability and suggested a decline in executive function in PD.

One recent publication [19] showed a successful approach that used a random forest classification method to identify familial hypercholesterolemia from electronic health record data. This points to the potentially important application of machine learning methods in disease diagnosis and high-risk patient identification. Machine learning has also been tried on Parkinson’s disease: some previous approaches focused on magnetic resonance imaging [20], motor features [21] and non-motor features [22], and machine learning methods have been applied to PD datasets including handwritten patterns [23], neuroimaging [24], cerebrospinal fluid [16], serum [25] and voice [26]. Jeon et al. [27] applied Principal Component Analysis (PCA) to the Spatial-Temporal Image of Plantar Pressure and used a Support Vector Machine (SVM) to classify Parkinson gait and normal gait, but the sophisticated foot pressure system may be less accessible. Zhao et al. [28] and El Maachi et al. [29] proposed Convolutional Neural Network (CNN) approaches to classifying PD patients and healthy controls using the same dataset as ours. While favorable results were achieved, the complexity of the deep neural networks is high and the associated training process can be expensive. Our study may reveal a possible use of baseline machine learning classifiers for PD diagnosis that is cost-effective, and may provide insights into novel criteria for PD.

2. Materials and Methods

2.1. Dataset and Features

The Gait in Parkinson’s Disease dataset [30,31], published by Jeffrey Hausdorff on PhysioNet, is a database that contains measures of gait from 306 subjects, of whom 214 are PD patients and 92 are healthy controls. The records include the vertical ground reaction force of subjects as they walked at their usual, self-selected pace for approximately 2 min on level ground. Gait is measured via 8 sensors underneath each foot, each recording the vertical ground reaction force at a rate of 100 samples per second. Therefore, each subject is represented as a multi-dimensional time series in the form of a k × 16 matrix, where k is the number of force samples collected at 0.01 s granularity for that subject.

For the baseline model, we used the full dataset. For the frequency domain, decision tree, and KNN methods, since the gait of each patient was measured for a slightly different amount of time, we filtered the dataset by keeping a common time series length (8000 data points, or 80 s) and dropping subjects with fewer data points, resulting in a smaller dataset of 270 subjects, 186 of them PD patients and 84 healthy controls. In order to tune hyperparameters, the training set is further divided into training and validation sets using an 80:20 split with the same strategy. The study was repeated 10 times with different random selections of PD patients and healthy controls to obtain the mean and standard deviation of all metrics.
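The length filter described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the function name `filter_fixed_length`, the label encoding (1 = PD, 0 = healthy control), and the choice to keep the first 8000 samples of each recording are all assumptions, since the text does not say which 8000 samples were retained.

```python
import numpy as np

N_SAMPLES = 8000  # 80 s at 100 samples per second, as stated in the text


def filter_fixed_length(recordings, labels, n_samples=N_SAMPLES):
    """Drop recordings shorter than n_samples and truncate the rest.

    Assumption: truncation keeps the FIRST n_samples rows of each recording.
    """
    kept = [(rec[:n_samples], lab)
            for rec, lab in zip(recordings, labels)
            if rec.shape[0] >= n_samples]
    X = np.stack([rec for rec, _ in kept])   # shape (n_kept, n_samples, 16)
    y = np.array([lab for _, lab in kept])
    return X, y


# Synthetic stand-in data: three subjects with k x 16 recordings of varying k.
rng = np.random.default_rng(0)
recordings = [rng.random((k, 16)) for k in (12000, 7500, 8000)]
labels = [1, 0, 1]  # hypothetical encoding: 1 = PD, 0 = healthy control

X, y = filter_fixed_length(recordings, labels)
print(X.shape, y)  # the 7500-sample subject is dropped
```

Truncating to a common length is what makes the later flattening (8000 × 16 values per subject) and the fixed-size Fourier transform possible.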

Figure 1 shows a subset of the time domain and frequency domain data from one sensor of a healthy control and a PD patient. In the spectral analysis, the peaks in the frequency domain indicate a dominant frequency of approximately one cycle per second, which matches the time domain data and is within our expectation of normal walking speed [32]. The plot for the PD patient shows greater uncertainty and noise in the frequency domain, likely indicative of motor system-related issues [33].

2.2. Baseline Model

Our baseline models draw inspiration from the prior work of Yogev G. et al. The Coefficient of Variation (CV) is a measure of variability, defined as the ratio of the standard deviation to the mean, and motor-system-related symptoms are thus likely to be reflected in such metrics [34]. Since it measures the dispersion of the data with respect to its mean, it works well on data series obtained from different patients. The force data are preprocessed to obtain the Coefficient of Variation

CV_{j}^{(i)} = \frac{\sigma_{j}^{(i)}}{\mu_{j}^{(i)}}

for each sensor j of subject i, where σ is the standard deviation and μ is the mean of the data. Thus, each subject is represented by a one-dimensional vector x^{(i)} = CV^{(i)} \in \mathbb{R}^{16}. The CV metrics are then used to train Logistic Regression and a Support Vector Machine with the Radial Basis Function (RBF) kernel, both with L2 regularization. The former minimizes the loss function

-\sum_{i} \left[ y^{(i)} \log h(x^{(i)}) + (1 - y^{(i)}) \log (1 - h(x^{(i)})) \right] + \lambda \sum_{j} \theta_{j}^{2}

where h(x) = g(\theta^{T} x) = 1 / (1 + \exp(-\theta^{T} x)) is the sigmoid function and θ are the learned parameters. The regularization strength λ is a hyperparameter to be optimized.

The latter solves the following optimization problem for w and b in the hypothesis g(w^{T} x + b), where g is the sign function:

\min_{w, b, \xi} \frac{1}{2} \|w\|^{2} + C \sum_{i} \xi_{i}^{2} \quad \text{s.t.} \quad y^{(i)} (w^{T} x^{(i)} + b) \geq 1 - \xi_{i}, \; \xi_{i} \geq 0, \; i = 1, \dots, n

The regularization strength C and the hyperparameter γ of the RBF kernel are optimized using the validation set.
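As a rough illustration of the baseline pipeline, the sketch below computes per-sensor CV features and fits the two classifiers with scikit-learn. It runs on synthetic data and is not the authors' implementation: the helper `coefficient_of_variation`, the synthetic recordings, and the label encoding are assumptions. Note also that scikit-learn parameterizes Logistic Regression by C, the inverse of the regularization strength, so C = 1 below only loosely corresponds to λ = 1, and that `SVC` uses the standard soft-margin formulation, which may differ in detail from the objective written above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


def coefficient_of_variation(recording):
    """Per-sensor CV = std / mean for one (k, 16) force recording."""
    return recording.std(axis=0) / recording.mean(axis=0)


# Synthetic stand-in data: 60 subjects, 500 samples, 16 sensors each.
# Label 1 recordings are given a larger spread purely for illustration.
rng = np.random.default_rng(0)
n_subjects = 60
y = np.array([0, 1] * (n_subjects // 2))  # hypothetical: 1 = PD, 0 = control
recordings = [
    rng.normal(loc=100.0, scale=10.0 + 10.0 * label, size=(500, 16))
    for label in y
]

# Each subject becomes a 16-dimensional CV vector.
X = np.stack([coefficient_of_variation(rec) for rec in recordings])

# L2-regularized logistic regression (L2 is scikit-learn's default penalty).
log_reg = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)

# RBF-kernel SVM with the C and gamma values reported for the baseline model.
svm = SVC(kernel="rbf", C=0.1, gamma=10).fit(X, y)

print(X.shape, log_reg.score(X, y), svm.score(X, y))
```

In practice the hyperparameters would be chosen on a held-out validation split rather than fixed in advance, as the text describes.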

2.3. Frequency Domain

Walking can be seen as a periodic motion, so each patient’s data can be treated as a finite sequence of equally spaced samples of some function [35]. Therefore, the vertical reaction force time series can be approximated using the following model and turned into its frequency domain using the discrete Fourier transform

X_{k} = \sum_{n = 0}^{N - 1} x_{n} \left( \cos \frac{2 \pi k n}{N} - i \sin \frac{2 \pi k n}{N} \right), \quad k = 0, 1, \dots, N - 1

where N = 8000 is the length of the time series and i is the imaginary unit. Since the force data are real-valued, we use only the modulus of X_{k} for k = 0, 1, …, 3999. Therefore, the frequency domain representation of each patient has 16 × 4000 real non-negative values, which are evaluated using Logistic Regression and SVM.
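The transform above can be sketched with NumPy's real-input FFT. This is a minimal illustration under stated assumptions: the function name `frequency_features`, the sensor-major flattening order, and the synthetic sinusoidal recording are not from the paper.

```python
import numpy as np

N = 8000         # time series length per sensor, as in the text
N_BINS = N // 2  # keep k = 0, ..., 3999


def frequency_features(recording):
    """Map one (N, 16) real-valued recording to 16 * 4000 DFT magnitudes."""
    spectrum = np.fft.rfft(recording, axis=0)  # shape (N // 2 + 1, 16)
    magnitudes = np.abs(spectrum[:N_BINS])     # drop the last bin, k = N / 2
    return magnitudes.T.reshape(-1)            # sensor-major order (a choice)


# Synthetic check: a 1 cycle/s oscillation sampled at 100 samples per second,
# copied to all 16 sensors.
fs = 100
t = np.arange(N) / fs
signal = 1.0 + np.sin(2 * np.pi * 1.0 * t)
recording = np.tile(signal[:, None], (1, 16))

features = frequency_features(recording)
# Locate the strongest non-constant component of the first sensor.
peak_bin = int(np.argmax(features[1:N_BINS])) + 1
print(features.shape, peak_bin)
```

Using `rfft` avoids computing the redundant second half of the spectrum for real input; the magnitudes are non-negative by construction.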

2.4. Decision Tree, K-Nearest Neighbors, Convolutional Neural Networks

Decision Tree is a non-parametric supervised learning method that learns decision rules, which can be visualized in a tree-like structure for binary classification. The filtered dataset of 8000 data points for each of the 16 sensors was used to train this model. The input for each subject is a one-dimensional vector of length 8000 × 16. The scikit-learn DecisionTreeClassifier was applied, and its parameters were tuned to get the best model performance, including criterion, splitter, max depth, min samples split, random state and max features [36]. Decision trees tend to be unstable with small datasets [37], which was confirmed in our experiments.

K-Nearest Neighbors (KNN) is another non-parametric supervised learning method suitable for binary classification; it relies on the assumption that similar data points lie close to each other [38]. The KNN model used the same dataset, with 8000 time points for each of the 16 sensors. The input for each subject is a one-dimensional vector of length 8000 × 16. The scikit-learn KNeighborsClassifier was used, and its parameters were tuned to get the best model performance (n neighbors, weights, algorithm, leaf size, metric and n jobs). Compared with the Decision Tree method, KNN reduces the overfitting issue and improves accuracy.
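A minimal sketch of the two non-parametric models on flattened recordings is shown below. The data are synthetic and shortened (200 samples instead of 8000) to keep the example fast, and the label encoding is an assumption. The hyperparameter values are the ones reported in the Results section (entropy criterion, random splitter, max depth 43; 5 neighbors with distance weighting); `random_state=0` is an arbitrary choice.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 40 subjects, 200 samples x 16 sensors each.
rng = np.random.default_rng(0)
n_subjects, n_samples, n_sensors = 40, 200, 16
y = np.array([0, 1] * (n_subjects // 2))  # hypothetical: 1 = PD, 0 = control
recordings = rng.normal(size=(n_subjects, n_samples, n_sensors)) + y[:, None, None]

# Flatten each subject's time series into one long vector (n_samples * 16).
X = recordings.reshape(n_subjects, -1)

tree = DecisionTreeClassifier(
    criterion="entropy", splitter="random", max_depth=43, random_state=0
).fit(X, y)

knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)

# Training scores only; both models can memorize a small training set,
# which is one way the overfitting discussed in the text shows up.
print(X.shape, tree.score(X, y), knn.score(X, y))
```

With thousands of features and only a few hundred subjects, a held-out test set is needed to get a meaningful estimate of either model's performance.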

2.5. Limited Dataset

As an extension, we also experimented with the case where there is only a subset of sensors available. If the models can still reliably predict Parkinson gaits versus regular gaits, it could help facilitate even simpler equipment in collecting vertical reaction force data and diagnosing PD. Specifically, the following two schemes were chosen (Figure 2).

Sides-Only: all except sensors 1, 8, 9, 16
Exclude-Diagonals: sensors 1, 4, 5, 8, 9, 12, 13, 16

The intuition behind these choices is the possible correlation between the forces exerted by each region of the foot. These two subsets of the original data were evaluated using Logistic Regression and SVM.
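Restricting the features to a subset of sensors amounts to selecting columns. The sketch below is illustrative only and rests on two assumptions that the text does not settle: that column j − 1 of the feature matrix corresponds to sensor j, and that the Exclude-Diagonals list names the sensors that are retained. The helper `select_sensors` is hypothetical.

```python
import numpy as np

ALL_SENSORS = list(range(1, 17))  # sensors numbered 1..16, as in the text

# Sides-Only: all except sensors 1, 8, 9, 16.
SIDES_ONLY = [s for s in ALL_SENSORS if s not in (1, 8, 9, 16)]

# Exclude-Diagonals: sensors 1, 4, 5, 8, 9, 12, 13, 16.
# Assumption: these are the sensors kept, not the ones removed.
EXCLUDE_DIAGONALS = [1, 4, 5, 8, 9, 12, 13, 16]


def select_sensors(features, sensors):
    """Pick columns of an (n_subjects, 16) matrix by 1-based sensor number."""
    idx = np.array(sensors) - 1
    return features[:, idx]


# Toy feature matrix: 2 subjects x 16 per-sensor values (e.g. CV features).
features = np.arange(32).reshape(2, 16)
sides = select_sensors(features, SIDES_ONLY)
diag = select_sensors(features, EXCLUDE_DIAGONALS)
print(sides.shape, diag.shape)
```

The reduced matrices can then be passed to the same classifiers as the full 16-sensor features.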

Figure 2. Relative placement of 16 under-foot sensors (scale is arbitrary).

3. Results

3.1. Baseline Model

With the healthy controls as the negative samples and PD patients as the positive samples, we evaluate the predictions for this classification task with

True Negative (TN): healthy control correctly identified
False Positive (FP): healthy control incorrectly identified as PD
True Positive (TP): PD patient correctly identified
False Negative (FN): PD patient incorrectly identified as healthy control

and the metrics

Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
False Positive Rate = FP/(FP + TN).
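These definitions translate directly into code. The following sketch uses made-up labels and predictions (1 = PD, 0 = healthy control, an assumed encoding) and a hypothetical helper `classification_metrics`; it assumes every denominator is non-zero.

```python
def classification_metrics(y_true, y_pred):
    """Compute the confusion counts and derived metrics for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


# Made-up example: 4 PD patients and 4 healthy controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here one PD patient is missed and one healthy control is flagged, so precision and recall are both 3/4 and the false positive rate is 1/4.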

The SVM method on the Coefficient of Variation achieved better training and test accuracy than Logistic Regression (Table 1, Supplementary Table S2). Both models had a close to zero false negative rate, but Logistic Regression had more trouble with the negative examples than SVM, splitting them about 50:50 between true negatives and false positives. The hyperparameters λ = 1 for Logistic Regression and C = 0.1, γ = 10 for SVM were found via grid search (Table 1, Supplementary Table S1).

Since the dataset has unbalanced numbers of positive and negative examples, it is important to analyze beyond just accuracy. The near-perfect recall score indicates good performance on the positive examples.

It is also valuable to look at its discriminatory ability, i.e., how well it can separate the positive and negative classes. The True Positive Rate (TPR) is the same as the Recall value, and the False Positive Rate (FPR) is higher than desired, which indicates less than optimal performance over the negative examples. By varying the decision thresholds, we can visualize the trade-off between TPR and FPR, the proportion of PD patients correctly classified and the proportion of healthy controls incorrectly classified, in the Receiver Operating Characteristic (ROC) curve. Area Under Receiver Operating Characteristic (AUROC) indicates the degree of separability; the higher the value, the more certain the model is in predicting each group. SVM’s higher AUROC indicates a superior performance in its ability to distinguish the two classes here (Figure 3).
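The threshold sweep behind the ROC curve can be illustrated with scikit-learn. The labels and scores below are invented for the example and do not come from the study. When no scores are tied, AUROC equals the fraction of (PD patient, healthy control) pairs in which the PD patient receives the higher score, which the sketch verifies directly.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Invented example: 3 healthy controls (0) and 5 PD patients (1),
# with classifier scores where higher means "more likely PD".
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
scores = np.array([0.10, 0.40, 0.80, 0.35, 0.60, 0.70, 0.90, 0.95])

# One (FPR, TPR) point per decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, scores)
auroc = roc_auc_score(y_true, scores)

# Pairwise interpretation: how often a PD patient outranks a control.
pos = scores[y_true == 1]
neg = scores[y_true == 0]
pairwise = float(np.mean(pos[:, None] > neg[None, :]))

print(auroc, pairwise)
```

Plotting `tpr` against `fpr` gives the curve itself; the area under it summarizes separability independently of any single threshold.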

3.2. Frequency Domain

The time series of 8000 data points (80 s) for each sensor of each patient was transformed into its frequency domain using the discrete Fourier transform. Since the force data were sampled once every 0.01 s, the highest frequency detected is 1 cycle/0.02 s, or 50 cycles/s, which should give great insight into the subject’s gait. Since the time domain data are real, we only need the first half of the frequency domain, because the rest is symmetrically redundant in magnitude [39]. Therefore, we obtained 4000 frequency bins for every sensor, spaced at the reciprocal of the time series length, 1/8000 (in cycles per sample), ready to be used for training. The hyperparameters λ = 1 for Logistic Regression and C = 0.1, γ = 1 × 10⁻⁶ for SVM were found via grid search (Table 1).
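The sampling arithmetic in this paragraph can be written out explicitly. The symbols f_s (sampling rate), f_max (highest resolvable frequency), Δf (bin spacing) and f_k (frequency of bin k) are notation introduced here for illustration and do not appear in the original text.

```latex
f_s = \frac{1}{0.01\ \text{s}} = 100\ \text{Hz}, \qquad
f_{\max} = \frac{f_s}{2} = 50\ \text{Hz}, \qquad
\Delta f = \frac{f_s}{N} = \frac{100\ \text{Hz}}{8000} = 0.0125\ \text{Hz}, \qquad
f_k = k \, \Delta f, \quad k = 0, 1, \dots, 3999
```

Under this reading, the spacing of 1/8000 cycles per sample corresponds to 0.0125 Hz, and a gait rhythm of about one cycle per second would fall near bin k = 1 / 0.0125 = 80.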

Logistic Regression achieved perfect accuracy on the training set and test accuracy comparable to that of the CV method in the baseline, whereas SVM with the RBF kernel performed better in test accuracy and recall. In terms of the ability to separate the two classes, AUROC still indicates slightly better performance for SVM (Table 1, Supplementary Table S2 and Figure 4).

3.3. Decision Tree

A decision tree builds qualitative classification models fitting the underlying distribution of the data. The 8000 data points for each sensor were used to train the decision tree classifier. We tuned the parameters, and the best evaluation metrics were obtained with a max depth of 43 (mean 43.1 with standard deviation 2.85), entropy as the criterion and the random splitter. The model reached 0.60 mean accuracy with 0.71 mean precision and 0.70 mean recall (Table 1, Supplementary Table S2). The decision tree did not improve on the performance of the baseline methods, which indicates that logistic regression performs better on small datasets.

3.4. K-Nearest Neighbors

The other non-parametric classification method used was KNN. Different numbers of neighbors from 1 to 10, different weights, and other parameters were evaluated. Five neighbors (mean 4.6 with standard deviation 0.70) with distance weighting provided the best performance: accuracy 0.70 (standard deviation 0.05), precision 0.73 (standard deviation 0.03) and recall 0.91 (standard deviation 0.05) (Table 1, Supplementary Table S2). KNN performs better than the decision tree but does not exceed logistic regression.

3.5. Limited Dataset

In both the Sides-Only and Exclude-Diagonals schemes, Logistic Regression and SVM achieved slightly lower performance, which is expected because some features were removed, but still higher performance than the decision tree and KNN. Logistic Regression and SVM both maintained high recall (Table 2).

4. Discussion

In our study, the baseline models with the Coefficient of Variation, using simple Logistic Regression and a Support Vector Machine with the Radial Basis Function kernel, already performed very well in the PD patient population, achieving a near-perfect recall score. The more complex models we used were not able to improve on the false positive rate and precision. Even though techniques such as regularization and early stopping were applied to our more complex models, the training and test scores still suggest some overfitting due to the small dataset size. While the performance of our classifiers did not surpass that of the CNN models in previous studies [28,29], our baseline models provide additional insights into PD diagnosis, and their simplicity makes them more accessible.

All of our models have shown high recall scores, especially the baseline models, indicating their excellent performance in identifying the PD group. This makes Parkinson’s disease detection through gait measures an effective first line of screening, and its non-invasive nature could expedite its acceptance in the field. Taken together with previous studies in the field, this indicates that different machine learning methods can be good approaches to PD diagnosis with clear criteria for patients.

Moreover, the limited data exploration suggests that having fewer sensors would not significantly decrease the performance of the tested models. This could facilitate simpler and lower cost setups than the 16-sensor equipment used to obtain the dataset here and make the technology more accessible. With the limited dataset, the overall performance is comparatively worse in the healthy subject population, suggesting some degree of variation in the gait profile of healthy adults, as investigated in some studies [39,40].

Some limitations of this study include the small number of PD patients and healthy controls who came from similar demographics. In future work, possible directions in improving the performance of diagnosing PD via gait measures would include obtaining data from more healthy controls and PD patients so that deep learning methods can be better evaluated, as well as applying experimental deep network models that have improved capabilities with small datasets. Healthy gait measures can be further analyzed to understand their variations and help develop methods that can perform better without lowering the high recall scores in this paper. Also, external datasets with more even class distribution are required for evaluation to reduce bias in the current study. There may also be interesting properties in the frequency domain of the gait measure that can be learned by both non-deep and deep learning approaches. With further approaches, machine learning classifiers for PD patient diagnosis can be cost-effective with potential novel criteria.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics12102404/s1, Table S1. Optimal parameters in logistic regression. Table S2. Performance metrics in the training process.

Author Contributions

A.L. and C.L. contributed to conceptualization, methodology, analysis and writing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data set used in the current study will be made available on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Gelb, D.J.; Oliver, E.; Gilman, S. Diagnostic Criteria for Parkinson Disease. Arch. Neurol. 1999, 56, 33–39.
2. França, C.; Duarte, K.P.; Cury, R.G. Dynamic Tremor in a Patient With Parkinson Disease. JAMA Neurol. 2021, 78, 1015.
3. Parkinson’s Foundation. Better Lives. Together. Available online: https://www.parkinson.org/ (accessed on 20 June 2022).
4. Jankovic, J. Parkinson’s disease: Clinical features and diagnosis. J. Neurol. Neurosurg. Psychiatry 2008, 79, 368–376.
5. Tolosa, E.; Garrido, A.; Scholz, S.W.; Poewe, W. Challenges in the diagnosis of Parkinson’s disease. Lancet Neurol. 2021, 20, 385–397.
6. Tolosa, E.; Wenning, G.; Poewe, W. The diagnosis of Parkinson’s disease. Lancet Neurol. 2006, 5, 75–86.
7. Pagano, G.; Niccolini, F.; Politis, M. Imaging in Parkinson’s disease. Clin. Med. Lond. Engl. 2016, 16, 371–375.
8. Zhou, Y.; Tagare, H.D. Self-normalized Classification of Parkinson’s Disease DaTscan Images. In Proceedings of the 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Houston, TX, USA, 9–12 December 2021; pp. 1205–1212.
9. Isaacson, J.R.; Brillman, S.; Chhabria, N.; Isaacson, S.H. Impact of DaTscan Imaging on Clinical Decision Making in Clinically Uncertain Parkinson’s Disease. J. Park. Dis. 2021, 11, 885–889.
10. Sadasivan, S.; Friedman, J.H. Experience with DaTscan at a tertiary referral center. Parkinsonism Relat. Disord. 2015, 21, 42–45.
11. Rajput, A.H.; Rajput, A. Accuracy of Parkinson disease diagnosis unchanged in 2 decades. Neurology 2014, 83, 386–387.
12. Tolosa, E.; Gaig, C.; Santamaría, J.; Compta, Y. Diagnosis and the premotor phase of Parkinson disease. Neurology 2009, 72, S12–S20.
13. Mahlknecht, P.; Seppi, K.; Poewe, W. The Concept of Prodromal Parkinson’s Disease. J. Park. Dis. 2015, 5, 681–697.
14. Hausdorff, J.M.; Lowenthal, J.; Herman, T.; Gruendlinger, L.; Peretz, C.; Giladi, N. Rhythmic auditory stimulation modulates gait variability in Parkinson’s disease. Eur. J. Neurosci. 2007, 26, 2369–2375.
15. Frenkel-Toledo, S.; Giladi, N.; Peretz, C.; Herman, T.; Gruendlinger, L.; Hausdorff, J.M. Effect of gait speed on gait rhythmicity in Parkinson’s disease: Variability of stride time and swing time respond differently. J. Neuroeng. Rehabil. 2005, 2, 23.
16. Bailey, C.A.; Corona, F.; Murgia, M.; Pili, R.; Pau, M.; Côté, J.N. Electromyographical Gait Characteristics in Parkinson’s Disease: Effects of Combined Physical Therapy and Rhythmic Auditory Stimulation. Front. Neurol. 2018, 9, 211.
17. McIntosh, G.C.; Brown, S.H.; Rice, R.R.; Thaut, M.H. Rhythmic auditory-motor facilitation of gait patterns in patients with Parkinson’s disease. J. Neurol. Neurosurg. Psychiatry 1997, 62, 22–26.
18. Yogev, G.; Giladi, N.; Peretz, C.; Springer, S.; Simon, E.S.; Hausdorff, J.M. Dual tasking, gait rhythmicity, and Parkinson’s disease: Which aspects of gait are attention demanding? Eur. J. Neurosci. 2005, 22, 1248–1256.
19. Banda, J.M.; Sarraju, A.; Abbasi, F.; Parizo, J.; Pariani, M.; Ison, H.; Briskin, E.; Wand, H.; Dubois, S.; Jung, K.; et al. Finding missed cases of familial hypercholesterolemia in health systems using machine learning. NPJ Digit. Med. 2019, 2, 23.
20. Boutet, A.; Madhavan, R.; Elias, G.J.B.; Joel, S.E.; Gramer, R.; Ranjan, M.; Paramanandam, V.; Xu, D.; Germann, J.; Loh, A.; et al. Predicting optimal deep brain stimulation parameters for Parkinson’s disease using functional MRI and machine learning. Nat. Commun. 2021, 12, 3043.
21. Landolfi, A.; Ricciardi, C.; Donisi, L.; Cesarelli, G.; Troisi, J.; Vitale, C.; Barone, P.; Amboni, M. Machine Learning Approaches in Parkinson’s Disease. Curr. Med. Chem. 2021, 28, 6548–6568.
22. Karapinar Senturk, Z. Early diagnosis of Parkinson’s disease using machine learning algorithms. Med. Hypotheses 2020, 138, 109603.
23. Drotár, P.; Mekyska, J.; Rektorová, I.; Masarová, L.; Smékal, Z.; Faundez-Zanuy, M. Decision support framework for Parkinson’s disease based on novel handwriting markers. IEEE Trans. Neural Syst. Rehabil. Eng. Publ. IEEE Eng. Med. Biol. Soc. 2015, 23, 508–516.
24. Segovia, F.; Górriz, J.M.; Ramírez, J.; Martínez-Murcia, F.J.; Castillo-Barnes, D. Assisted Diagnosis of Parkinsonism Based on the Striatal Morphology. Int. J. Neural Syst. 2019, 29, 1950011.
25. Váradi, C.; Nehéz, K.; Hornyák, O.; Viskolcz, B.; Bones, J. Serum N-Glycosylation in Parkinson’s Disease: A Novel Approach for Potential Alterations. Mol. Basel Switz. 2019, 24, 2220.
26. Ma, C.; Ouyang, J.; Chen, H.-L.; Zhao, X.-H. An efficient diagnosis system for Parkinson’s disease using kernel-based extreme learning machine with subtractive clustering features weighting approach. Comput. Math. Methods Med. 2014, 2014, 985789.
27. Jeon, H.-S.; Han, J.; Yi, W.-J.; Jeon, B.; Park, K.S. Classification of Parkinson gait and normal gait using Spatial-Temporal Image of Plantar pressure. In Proceedings of the 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, BC, Canada, 20–25 August 2008; pp. 4672–4675.
28. Zhao, A.; Qi, L.; Li, J.; Dong, J.; Yu, H. A hybrid spatio-temporal model for detection and severity rating of Parkinson’s disease from gait data. Neurocomputing 2018, 315, 1–8.
29. El Maachi, I.; Bilodeau, G.-A.; Bouachir, W. Deep 1D-Convnet for accurate Parkinson disease detection and severity prediction from gait. Expert Syst. Appl. 2020, 143, 113075.
30. Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, E215–E220.
31. Frenkel-Toledo, S.; Giladi, N.; Peretz, C.; Herman, T.; Gruendlinger, L.; Hausdorff, J.M. Treadmill walking as an external pacemaker to improve gait rhythm and stability in Parkinson’s disease. Mov. Disord. Off. J. Mov. Disord. Soc. 2005, 20, 1109–1114.
32. Tudor-Locke, C.; Aguiar, E.J.; Han, H.; Ducharme, S.W.; Schuna, J.M.; Barreira, T.V.; Moore, C.C.; Busa, M.A.; Lim, J.; Sirard, J.R.; et al. Walking cadence (steps/min) and intensity in 21–40 year olds: CADENCE-adults. Int. J. Behav. Nutr. Phys. Act. 2019, 16, 8.
33. Snijders, A.H.; van de Warrenburg, B.P.; Giladi, N.; Bloem, B.R. Neurological gait disorders in elderly people: Clinical approach and classification. Lancet Neurol. 2007, 6, 63–74.
34. Hausdorff, J.M. Gait variability: Methods, modeling and meaning. J. Neuroeng. Rehabil. 2005, 2, 19.
35. Alexander, R.M.; Jayes, A.S. Fourier analysis of forces exerted in walking and running. J. Biomech. 1980, 13, 383–390.
36. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
37. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: New York, NY, USA, 2017; ISBN 978-1-315-13947-0.
38. Zhang, Z. Introduction to machine learning: K-nearest neighbors. Ann. Transl. Med. 2016, 4, 218.
39. The Discrete Fourier Transform |2| The Transform and Data Compressio. Available online: https://www.taylorfrancis.com/chapters/mono/10.1201/9781315220529-2/discrete-fourier-transform-kamisetty-ramam-rao-patrick-yip?context=ubx&refId=1951c981-b380-41a1-872b-ead013f6f1be (accessed on 17 July 2022).
40. Beauchet, O.; Allali, G.; Annweiler, C.; Bridenbaugh, S.; Assal, F.; Kressig, R.W.; Herrmann, F.R. Gait variability among healthy adults: Low and high stride-to-stride variability are both a reflection of gait stability. Gerontology 2009, 55, 702–706.

Figure 1. Time domain and frequency domain data from one sensor of a healthy control and a PD patient.

Figure 3. Presented Logistic Regression and SVM AUROC using the baseline model.

Figure 4. Presented Logistic Regression and SVM AUROC using Frequency Domain.

Table 1. Performance metrics of the test set.

Approach	Classifier	Accuracy Mean (SD) n = 10	Precision Mean (SD) n = 10	Recall Mean (SD) n = 10	False Positive Rate Mean (SD) n = 10
Baseline model	LR	0.81 (0.03)	0.81 (0.03)	0.95 (0.03)	0.52 (0.09)
Baseline model	SVM	0.85 (0.03)	0.86 (0.03)	0.94 (0.04)	0.35 (0.09)
Frequency domain	LR	0.80 (0.04)	0.84 (0.03)	0.88 (0.05)	0.37 (0.09)
Frequency domain	SVM	0.84 (0.03)	0.84 (0.03)	0.95 (0.05)	0.41 (0.10)
	DT	0.60 (0.06)	0.71 (0.04)	0.70 (0.08)	0.64 (0.12)
	KNN	0.70 (0.05)	0.73 (0.03)	0.91 (0.05)	0.75 (0.10)

Tuned hyperparameters for baseline model LR: λ = 1; baseline model SVM: C = 0.1, γ = 10; frequency domain LR: λ = 1; frequency domain SVM: C = 0.1, γ = 1 × 10⁻⁶; DT: max depth = 43; KNN: neighbors = 5.

Table 2. Performance metrics of the test set using limited dataset.

Scheme	Approach	Classifier	Accuracy Mean (SD) n = 10	Precision Mean (SD) n = 10	Recall Mean (SD) n = 10	False Positive Rate Mean (SD) n = 10
Sides-Only	Baseline model	LR	0.80 (0.03)	0.79 (0.03)	0.97 (0.02)	0.60 (0.10)
Sides-Only	Baseline model	SVM	0.84 (0.06)	0.87 (0.04)	0.90 (0.06)	0.30 (0.10)
Sides-Only		DT	0.59 (0.08)	0.70 (0.06)	0.69 (0.10)	0.65 (0.15)
Sides-Only		KNN	0.62 (0.06)	0.73 (0.04)	0.71 (0.08)	0.57 (0.09)
Exclude-Diagonals	Baseline model	LR	0.78 (0.04)	0.78 (0.03)	0.94 (0.05)	0.59 (0.09)
Exclude-Diagonals	Baseline model	SVM	0.85 (0.03)	0.87 (0.02)	0.92 (0.03)	0.32 (0.06)
Exclude-Diagonals		DT	0.60 (0.05)	0.71 (0.04)	0.73 (0.06)	0.68 (0.12)
Exclude-Diagonals		KNN	0.68 (0.03)	0.69 (0.01)	0.98 (0.03)	0.98 (0.04)

Tuned hyperparameters for baseline model LR: λ = 1; baseline model SVM: C = 0.1, γ = 10; frequency domain LR: λ = 1; frequency domain SVM: C = 0.1, γ = 1 × 10⁻⁶; DT: max depth = 43; KNN: neighbors = 5.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
