Article

Detection of Interaction Effects in a Nonparametric Concurrent Regression Model

Rui Pan, Zhanfeng Wang and Yaohua Wu
1 School of Data Science, University of Science and Technology of China, Hefei 230026, China
2 Department of Statistics and Finance, Management School, University of Science and Technology of China, Hefei 230026, China
* Author to whom correspondence should be addressed.
Entropy 2023, 25(9), 1327; https://doi.org/10.3390/e25091327
Submission received: 1 June 2023 / Revised: 5 August 2023 / Accepted: 8 September 2023 / Published: 12 September 2023
(This article belongs to the Special Issue Statistical Methods for Modeling High-Dimensional and Complex Data)

Abstract

Many methods have been developed to study nonparametric function-on-function regression models. Nevertheless, there is a lack of model selection approaches for a regression function that is a functional with functional covariate inputs. To study interaction effects among these functional covariates, in this article, we first construct a tensor product space of reproducing kernel Hilbert spaces and build an analysis of variance (ANOVA) decomposition of the tensor product space. We then use a model selection method with the L_1 criterion to estimate the functional with functional covariate inputs and to detect interaction effects among the functional covariates. The proposed method is evaluated using simulations and stroke rehabilitation data.

1. Introduction

Functional data can be found in various fields, such as biology, economics, engineering, geology, medicine and psychology. Recently, statistical methods and theories for functional data have been widely studied ([1,2,3,4,5,6]). Functional data sometimes have more complicated structures. For example, the motivating data of this paper, the stroke rehabilitation data, come from a collection of 3D video games known as Circus Challenge used to enhance upper limb function in stroke patients ([7,8,9]). The patients were scheduled to play the movement game at specified times over three months. At each visit time t, the level of impairment of stroke subject i was measured using the CAHAI (Chedoke Arm and Hand Activity Inventory) score, denoted as y_i(t), and movements of the patients' upper limbs, such as the forward circle movement and the sawing movement, were also recorded. The movement data at time t and frequency s from the ith patient are denoted as x_i(t, s). Determining how to model the relationship between y_i(t) and the functional covariate x_i(t, ·) is key to studying whether the movements are helpful to the rehabilitation level of a stroke patient. Furthermore, there is the question of whether there are interaction effects among the movements on the patient's rehabilitation. Zhai et al. [9] developed a nonparametric concurrent regression model to study the relationship between the functional movements and the CAHAI score. However, they did not consider interaction effects of the functional movements on the CAHAI score. In this paper, we aim to examine the interaction effects of the movements on the CAHAI scores and to predict the rehabilitation level of stroke patients.
In this paper, we apply the following nonparametric concurrent regression model (NCRM) to model stroke rehabilitation data:
\[ y_i(t) = f(t, x_i(t,\cdot)) + \epsilon_i(t), \quad i = 1, \ldots, n, \tag{1} \]
where f is a bivariate functional to be estimated nonparametrically, the response y_i(t) is a function of t, the covariate x_i(t, ·) is a vector of functions of length q, and ε_i(t) is a random error. To explore the interaction effects among the components of the covariate x_i(t, ·), we use the smoothing spline analysis of variance (SS ANOVA) method [10,11] to decompose the regression function f.
A multivariate function can be decomposed into main effects and interaction effects via the SS ANOVA method ([10,11]). When the dimension q of the covariates is large, the decomposed model contains a large number of interaction effects. Even if only the main effects and second-order interaction terms are considered, the number of decomposition terms is of order O(q^2), which leads to a highly complicated model. For the stroke rehabilitation data with q = 3, there are 22 such terms, including the main effects and interaction effects. This challenges the estimation method for the NCRM model. To avoid this difficulty, Zhai et al. [9] took all functional covariates as a whole and did not consider interaction effects among them. Following [12], this paper develops a model selection method for the NCRM model with all main effects and interaction effects, in which the regression function is estimated and the significant components of the decomposition are selected simultaneously.
Model selection is a crucial step in building statistical models that accurately capture the relationships between variables ([13,14]). It chooses the most suitable model from a set of candidate models based on criteria such as goodness-of-fit, predictive performance, and interpretability. Within the SS ANOVA approach, model selection is crucial for determining the contribution of each component of the decomposition to the overall variance of the response variable. Several methods have been proposed for selecting models with SS ANOVA, including forward selection, backward elimination, and stepwise regression ([15,16,17,18,19,20]). However, these methods are limited in their ability to handle high-dimensional data and identify complex interactions among variables. Hence, regularization methods such as the L_1 penalty, which allow for the selection of sparse and robust models, have gained popularity in recent years ([12,21,22,23,24,25]). For example, Zhang et al. [23] developed a nonparametric penalized likelihood method with the likelihood basis pursuit and used it for variable selection and model construction. Lin and Zhang [22] proposed a component selection and smoothing method for multivariate nonparametric regression models by penalizing the sum of the component norms of the SS ANOVA decomposition. Furthermore, Wang et al. [12] developed a unified framework for estimation and model selection in nonparametric function-on-function regression, which performs well when using L_1 penalty methods for model selection. Dong and Wang [24] proposed a nonparametric method for learning the conditional dependence structure in graphical models by applying L_1 regularization to detect the neighborhoods of edges, where an SS ANOVA decomposition is used to describe the interaction effects of edges in the graphical model. In this paper, we borrow the L_1 regularization idea and build model selection by penalizing the sum of the norms of the ANOVA decomposition components of the NCRM model. In addition, Bayesian methods can also be used to study interaction effects; for example, Ren et al. [26] proposed a novel semiparametric Bayesian variable selection model for investigating linear and nonlinear gene-environment interactions simultaneously, allowing for structural identification.
This paper proposes an estimation and model selection approach for the NCRM model (1). Following [12,22], the SS ANOVA decomposition of the tensor product space of reproducing kernel Hilbert spaces (RKHS) is constructed, and an L_1 penalty on the components of the decomposition is implemented. We use estimation procedures under either an L_1 penalty or a joint L_1 and L_2 penalty to fit the NCRM model. We study the interaction effects of the covariate x_i(t, ·) in model (1) via an ANOVA decomposition of the regression function, where the tensor product RKHS is built from Gaussian kernels. The decomposition is different from that of Zhai et al. [9], who took the covariate as a whole variable and did not consider interaction effects among its components. Based on this decomposition, model selection in the tensor product RKHS is conducted using the L_1 penalty method. Because of the form of the covariate x_i(t, ·), the models of Wang et al. [12] are not suitable for analyzing the stroke data. In this paper, we apply the proposed method to the stroke rehabilitation data and study the relationship between the movements and the patients' CAHAI scores. Besides the main effects, an interaction effect of the movements is also detected.
The remainder of the article is organized as follows. In Section 2, we present the tensor product RKHS with the Gaussian kernel and the SS ANOVA decomposition of the regression function. In Section 3, we describe the model selection and estimation procedures, and Section 4 establishes statistical properties of the estimator. The simulation study and the application to the stroke rehabilitation data are presented in Section 5 and Section 6, respectively. We conclude in Section 7.

2. Nonparametric Concurrent Regression Model

For the NCRM model (1), we consider x_i(t, ·) = (x_{i1}(t, ·), …, x_{iq}(t, ·)), where x_{ij}(t, s): S → R for any fixed time t ∈ T is a function of s in a space denoted by X_j, j ∈ {1, …, q}. Generally, t and s can be transformed into [0, 1]. For simplicity, we let T = [0, 1] and S = [0, 1], and let X_j ⊂ L_2[0, 1], j = 1, …, q, which are independent of t. Furthermore, we assume that y_i(t) ∈ Y ⊂ L_2[0, 1] and that ε_i(t), i = 1, …, n, are independently and identically distributed in L_2[0, 1] with mean zero and ∫_0^1 E[ε_i(t)^2] dt < ∞. The regression function f is thus a functional with a functional covariate input x_i(t, ·). To estimate f nonparametrically, the SS ANOVA decomposition method is used to construct a tensor product RKHS to which f belongs.
When f is treated as a function of its first argument t ∈ T, we consider the Sobolev space [10],
\[ H^{(1)} = \left\{ f : f \text{ and } f' \text{ absolutely continuous}, \ \int_0^1 (f'')^2\, dt < \infty \right\}, \]
where H^{(1)} can be rewritten as
\[ H^{(1)} = \{1\} \oplus \{t\} \oplus H_2^{(1)}, \]
where {1} is the constant space, {t} is the linear function space with t as an independent variable, and H_2^{(1)} is a smooth function space orthogonal to the constant and linear function spaces. The reproducing kernels (RKs) of these three subspaces are K_0^{(1)}(t, t') = 1, K_1^{(1)}(t, t') = k_1(t)k_1(t'), and K_2^{(1)}(t, t') = k_2(t)k_2(t') - k_4(|t - t'|), where k_1, k_2 and k_4 are defined as
\[ k_1(x) = x - 0.5, \quad k_2(x) = \frac{1}{2}\left( k_1^2(x) - \frac{1}{12} \right), \quad k_4(x) = \frac{1}{24}\left( k_1^4(x) - \frac{1}{2} k_1^2(x) + \frac{7}{240} \right). \]
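To make these kernels concrete, the following Python snippet (an illustrative sketch, not code from the paper; all function names are ours) evaluates k_1, k_2, k_4 and the cubic spline RK K_2^{(1)}(t, t') = k_2(t)k_2(t') - k_4(|t - t'|) on a grid of time points.

```python
import numpy as np

def k1(x):
    return x - 0.5

def k2(x):
    # scaled Bernoulli polynomial B_2(x) / 2!
    return 0.5 * (k1(x) ** 2 - 1.0 / 12.0)

def k4(x):
    # scaled Bernoulli polynomial B_4(x) / 4!
    return (k1(x) ** 4 - 0.5 * k1(x) ** 2 + 7.0 / 240.0) / 24.0

def K2_1(t, t_prime):
    """Reproducing kernel of the smooth subspace H_2^(1) on [0, 1]."""
    return k2(t) * k2(t_prime) - k4(np.abs(t - t_prime))

# example: kernel matrix on a grid of time points
t_grid = np.linspace(0.0, 1.0, 5)
K = K2_1(t_grid[:, None], t_grid[None, :])
print(np.round(K, 4))
```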
For the functional arguments x(t, ·), the RK and its corresponding RKHS for f as a function of functions in X = X_1 × ⋯ × X_q are constructed as follows. For any u_j, u_j' ∈ X_j, we construct a Gaussian kernel as
\[ K_{2,j}^{(2)}(u_j, u_j') = \exp\left( -\frac{\| u_j - u_j' \|^2}{2} \right), \]
where ‖u_j‖^2 = ∫_0^1 u_j^2(s) ds. We can show that when the space X_j is complete, K_{2,j}^{(2)} is symmetric and strictly positive definite. The unique RKHS H_{2,j}^{(2)} derived from K_{2,j}^{(2)} is separable and does not contain any non-zero constants. To construct an SS ANOVA decomposition, we let H_j^{(2)} = {1} ⊕ H_{2,j}^{(2)}. Then, the tensor product space in this paper is H^{(2)} = H_1^{(2)} ⊗ ⋯ ⊗ H_q^{(2)} with the following decomposition:
\[ H^{(2)} = H_1^{(2)} \otimes \cdots \otimes H_q^{(2)} = \{1\} \oplus H_{2,1}^{(2)} \oplus \cdots \oplus H_{2,q}^{(2)} \oplus H_{2,1}^{(2)} \otimes H_{2,2}^{(2)} \oplus \cdots \oplus H_{2,q-1}^{(2)} \otimes H_{2,q}^{(2)} \oplus \cdots \oplus H_{2,1}^{(2)} \otimes \cdots \otimes H_{2,q}^{(2)}. \tag{4} \]
Decomposition (4) is different from that of Zhai et al. [9], where H^{(2)} is decomposed into the constant space {1} and another RKHS that does not consider interactions among the components of x(t, ·).
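For intuition, the short Python sketch below (a hypothetical helper, not the authors' implementation) evaluates the Gaussian kernel K_{2,j}^{(2)} for two covariate curves observed on a common grid, approximating the L_2 distance by the trapezoidal rule.

```python
import numpy as np

def gaussian_functional_kernel(u, u_prime, s_grid):
    """Gaussian kernel exp(-||u - u'||^2 / 2) for two curves sampled on s_grid."""
    sq_dist = np.trapz((u - u_prime) ** 2, s_grid)  # approximates the squared L2 distance
    return np.exp(-0.5 * sq_dist)

# example with two toy covariate curves on [0, 1]
s_grid = np.linspace(0.0, 1.0, 101)
u1 = np.sin(2 * np.pi * s_grid)
u2 = np.cos(2 * np.pi * s_grid)
print(gaussian_functional_kernel(u1, u2, s_grid))
```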
Next, we consider the tensor product space H = H^{(1)} ⊗ H^{(2)}, which has the following decomposition:
\[
\begin{aligned}
H ={} & \left[ \{1\} \oplus \{t\} \oplus H_2^{(1)} \right] \otimes \left[ \{1\} \oplus \Big\{ \bigoplus_{j=1}^{q} H_{2,j}^{(2)} \Big\} \oplus \cdots \oplus H_{2,1}^{(2)} \otimes \cdots \otimes H_{2,q}^{(2)} \right] \\
={} & \{1\} \oplus \{t\} \oplus H_2^{(1)} \oplus \Big\{ \bigoplus_{j=1}^{q} H_{2,j}^{(2)} \Big\} \oplus \Big\{ \bigoplus_{j=1}^{q} \{t\} \otimes H_{2,j}^{(2)} \Big\} \oplus \Big\{ \bigoplus_{j=1}^{q} H_2^{(1)} \otimes H_{2,j}^{(2)} \Big\} \oplus \cdots \\
& \oplus H_{2,1}^{(2)} \otimes \cdots \otimes H_{2,q}^{(2)} \oplus \{t\} \otimes H_{2,1}^{(2)} \otimes \cdots \otimes H_{2,q}^{(2)} \oplus H_2^{(1)} \otimes H_{2,1}^{(2)} \otimes \cdots \otimes H_{2,q}^{(2)}.
\end{aligned}
\]
Here, the null space {1} ⊕ {t} represents the parametric main effect of t, H_2^{(1)} is the nonparametric main effect of t, H_{2,j}^{(2)} is the nonparametric main effect of u_j, {t} ⊗ H_{2,j}^{(2)} is the linear-nonparametric interaction between t and u_j, H_2^{(1)} ⊗ H_{2,j}^{(2)} is the nonparametric-nonparametric interaction between t and u_j, and so on; H_2^{(1)} ⊗ H_{2,1}^{(2)} ⊗ ⋯ ⊗ H_{2,q}^{(2)} is the nonparametric-nonparametric interaction between t and u, where u = (u_1, …, u_q). We denote φ_1(t, u) = 1 and φ_2(t, u) = k_1(t) as the basis functions of the null space H_0. For example, with q = 3, the RKs corresponding to the above sub-RKHSs are
\[
\begin{aligned}
&H_0 := \{1\} \oplus \{t\}, && K_0((t,u),(t',u')) = 1 + k_1(t)k_1(t'),\\
&H_1 := H_2^{(1)}, && K_1((t,u),(t',u')) = K_2^{(1)}(t,t'),\\
&H_{1+j} := H_{2,j}^{(2)}, && K_{1+j}((t,u),(t',u')) = K_{2,j}^{(2)}(u_j,u_j'),\\
&H_{4+j} := \{t\} \otimes H_{2,j}^{(2)}, && K_{4+j}((t,u),(t',u')) = k_1(t)k_1(t')\,K_{2,j}^{(2)}(u_j,u_j'),\\
&H_{7+j} := H_2^{(1)} \otimes H_{2,j}^{(2)}, && K_{7+j}((t,u),(t',u')) = K_2^{(1)}(t,t')\,K_{2,j}^{(2)}(u_j,u_j'),\\
&H_{8+j+l} := H_{2,j}^{(2)} \otimes H_{2,l}^{(2)}, && K_{8+j+l}((t,u),(t',u')) = K_{2,j}^{(2)}(u_j,u_j')\,K_{2,l}^{(2)}(u_l,u_l'),\\
&H_{11+j+l} := \{t\} \otimes H_{2,j}^{(2)} \otimes H_{2,l}^{(2)}, && K_{11+j+l}((t,u),(t',u')) = k_1(t)k_1(t')\,K_{2,j}^{(2)}(u_j,u_j')\,K_{2,l}^{(2)}(u_l,u_l'),\\
&H_{14+j+l} := H_2^{(1)} \otimes H_{2,j}^{(2)} \otimes H_{2,l}^{(2)}, && K_{14+j+l}((t,u),(t',u')) = K_2^{(1)}(t,t')\,K_{2,j}^{(2)}(u_j,u_j')\,K_{2,l}^{(2)}(u_l,u_l'),\\
&H_{20} := H_{2,1}^{(2)} \otimes H_{2,2}^{(2)} \otimes H_{2,3}^{(2)}, && K_{20}((t,u),(t',u')) = \prod_{j=1}^{3} K_{2,j}^{(2)}(u_j,u_j'),\\
&H_{21} := \{t\} \otimes H_{2,1}^{(2)} \otimes H_{2,2}^{(2)} \otimes H_{2,3}^{(2)}, && K_{21}((t,u),(t',u')) = k_1(t)k_1(t')\prod_{j=1}^{3} K_{2,j}^{(2)}(u_j,u_j'),\\
&H_{22} := H_2^{(1)} \otimes H_{2,1}^{(2)} \otimes H_{2,2}^{(2)} \otimes H_{2,3}^{(2)}, && K_{22}((t,u),(t',u')) = K_2^{(1)}(t,t')\prod_{j=1}^{3} K_{2,j}^{(2)}(u_j,u_j'),
\end{aligned}
\]
for j, l = 1, 2, 3 and j < l, where the left and right columns give the tensor product spaces and their corresponding RKs, respectively.
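The following Python sketch (again hypothetical; the names and grid choices are ours) assembles the 23 sub-kernels K_0, …, K_22 for q = 3 as products of the time kernels and the functional Gaussian kernels, mirroring the list above.

```python
import numpy as np

def k1(x): return x - 0.5
def k2(x): return 0.5 * (k1(x) ** 2 - 1.0 / 12.0)
def k4(x): return (k1(x) ** 4 - 0.5 * k1(x) ** 2 + 7.0 / 240.0) / 24.0
def K2_1(t, tp): return k2(t) * k2(tp) - k4(abs(t - tp))           # smooth time kernel
def K0(t, tp): return 1.0 + k1(t) * k1(tp)                          # null-space kernel

def gauss_K(u, up, s_grid):
    return np.exp(-0.5 * np.trapz((u - up) ** 2, s_grid))           # functional Gaussian kernel

def sub_kernels(t, tp, u, up, s_grid):
    """Return the list [K_0, ..., K_22] for q = 3, following the ANOVA decomposition."""
    g = [gauss_K(u[j], up[j], s_grid) for j in range(3)]            # K_{2,j}^{(2)}, j = 1, 2, 3
    pairs = [(0, 1), (0, 2), (1, 2)]
    K = [K0(t, tp), K2_1(t, tp)]                                    # K_0, K_1
    K += g                                                          # K_2..K_4: main effects of u_j
    K += [k1(t) * k1(tp) * gj for gj in g]                          # K_5..K_7: {t} x u_j
    K += [K2_1(t, tp) * gj for gj in g]                             # K_8..K_10: H_2^(1) x u_j
    K += [g[j] * g[l] for j, l in pairs]                            # K_11..K_13: u_j x u_l
    K += [k1(t) * k1(tp) * g[j] * g[l] for j, l in pairs]           # K_14..K_16
    K += [K2_1(t, tp) * g[j] * g[l] for j, l in pairs]              # K_17..K_19
    prod3 = g[0] * g[1] * g[2]
    K += [prod3, k1(t) * k1(tp) * prod3, K2_1(t, tp) * prod3]       # K_20..K_22
    return K

# toy evaluation at two design points
s = np.linspace(0, 1, 101)
u = [np.sin(2 * np.pi * s), np.cos(2 * np.pi * s), s ** 2]
up = [np.cos(np.pi * s), s, np.sqrt(s)]
print(len(sub_kernels(0.3, 0.7, u, up, s)))   # 23 sub-kernels
```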

3. Model Selection and Estimation

We let the projection of f onto H_0 be ∑_{k=1}^{2} d_k φ_k(t, u), with u = (u_1, …, u_q), and let {H_1, …, H_Q} be the sub-RKHSs generated by the tensor product construction in Section 2, where Q is the number of sub-RKHSs. L_1 penalties are applied to the coefficients d_k of the space H_0 and to the components of the decomposition of f (the projections of f onto H_v, v = 1, …, Q). We estimate f by minimizing the following penalized least squares (PLS):
\[ \frac{1}{n} \sum_{i=1}^{n} \int_0^1 \big( y_i(t) - f(t, x_i(t,\cdot)) \big)^2 dt + \lambda_1 \sum_{k=1}^{2} w_{1k} |d_k| + \lambda_2 \sum_{v=1}^{Q} w_{2,v} \| P_v f \|_H, \tag{5} \]
where f ∈ H, P_v is the projection operator onto H_v, ‖·‖_H is the norm induced by H, λ_1 and λ_2 are tuning parameters, and 0 ≤ w_{1k}, w_{2,v} < ∞ are pre-specified weights. We may set w_{11} = 0 when φ_1 = 1 to avoid penalizing the constant function.
Since the response function is a stochastic process in the L_2[0, 1] space, there exists a set of orthonormal basis functions {η_k(t), k = 1, 2, …} in L_2[0, 1], where {η_k(t), k = 1, …, n} are the empirical functional principal components (EFPCs) of {y_1(t), …, y_n(t)} ([27]). We let ν_{ik} = ⟨y_i(t), η_k(t)⟩ and L_{ik} f = ∫_0^1 f(t, x_i(t, ·)) η_k(t) dt for i = 1, …, n and k = 1, …, n, and we assume that the {L_{ik}} are bounded linear functionals. With EFPCs, functional data can be transformed into scalar data so that modeling and analysis can be conducted using traditional statistical methods. It can be shown that the PLS (5) based on the functional data y_i(t) reduces to the following PLS based on the scalar data {ν_{ik}}:
\[ \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{n} ( \nu_{ik} - L_{ik} f )^2 + \lambda_1 \sum_{k=1}^{2} w_{1k} |d_k| + \lambda_2 \sum_{v=1}^{Q} w_{2,v} \| P_v f \|_H. \tag{6} \]
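As an illustration of how the scores ν_{ik} can be obtained in practice, here is a minimal Python sketch (assuming curves observed on a common uniform grid; the helper names are ours) that computes the EFPCs from the empirical covariance and the corresponding inner products.

```python
import numpy as np

def efpc_scores(Y, t_grid):
    """Empirical functional principal components (EFPC) and scores.

    Y is an (n, m) array holding curves y_i(t) sampled on a uniform t_grid.
    Returns the EFPC basis eta (m, n) and scores nu[i, k] = <y_i, eta_k>.
    """
    n, m = Y.shape
    dt = t_grid[1] - t_grid[0]                     # uniform grid spacing
    Yc = Y - Y.mean(axis=0)                        # centered curves
    C = Yc.T @ Yc * dt / n                         # discretized covariance operator (symmetric)
    evals, evecs = np.linalg.eigh(C)
    eta = evecs[:, ::-1][:, :n]                    # leading n eigenfunctions (columns)
    eta /= np.sqrt(np.sum(eta ** 2, axis=0) * dt)  # normalize to unit L2[0, 1] norm
    nu = Y @ eta * dt                              # nu_{ik} = int_0^1 y_i(t) eta_k(t) dt
    return eta, nu

# toy usage: five noisy sine curves on [0, 1]
t_grid = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(0)
Y = np.sin(2 * np.pi * t_grid)[None, :] + 0.1 * rng.standard_normal((5, 50))
eta, nu = efpc_scores(Y, t_grid)
print(nu.shape)   # (5, 5)
```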
By Lemma 3.1 in Wang et al. [12], minimizing the PLS (6) is equivalent to minimizing the following PLS:
\[ \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{n} ( \nu_{ik} - L_{ik} f )^2 + \lambda_1 \sum_{k=1}^{2} w_{1k} |d_k| + \tau_0 \sum_{v=1}^{Q} w_{2,v} \theta_v^{-1} \| P_v f \|_H^2 + \tau_1 \sum_{v=1}^{Q} w_{2,v} \theta_v, \tag{7} \]
subject to θ_v ≥ 0 for 1 ≤ v ≤ Q, where λ_1, τ_0 and τ_1 are tuning parameters.
We let H^* = H_1 ⊕ ⋯ ⊕ H_Q. To equip H^* with an RK that is a linear combination of the RKs of its subspaces, we define a new inner product on H^*,
\[ \langle f, g \rangle_* = \sum_{v=1}^{Q} w_{2,v} \theta_v^{-1} \langle P_v f, P_v g \rangle, \]
where ⟨·, ·⟩ is the inner product in H. Under the new inner product, the RK of H^* is
\[ K^*\big( (t,u), (t',u') \big) = \sum_{v=1}^{Q} w_{2,v}^{-1} \theta_v K_v\big( (t,u), (t',u') \big), \]
where the coefficient θ_v measures the contribution of the corresponding component of the decomposition to the model. Next, we use the reproducing property of the kernel function to transform the infinite-dimensional optimization problem (7) into a finite-dimensional one. We let H_{1n} = span{ ∫_0^1 K^*((t, x(t, ·)), (t', x_i(t', ·))) η_k(t') dt', i = 1, …, n, k = 1, …, n }, which is a subspace of H^*. Then, any f ∈ H can be decomposed as
\[ f = f_0 + f_{1n} + \rho, \]
where f_0 ∈ H_0, f_{1n} ∈ H_{1n}, and ρ ∈ H^* ⊖ H_{1n}. We denote
\[ K^*_{(t, x_i(t,\cdot))}\big( t', x(t',\cdot) \big) = K^*\big( (t', x(t',\cdot)), (t, x_i(t,\cdot)) \big) \]
as the evaluation function of the point (t, x_i(t, ·)), and we let f_1 = f_{1n} + ρ. Then, we can rewrite the PLS (7) as
\[
\begin{aligned}
& \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{n} \Big( \nu_{ik} - u_{ik} - \big\langle f_1(t, x_i(t,\cdot)), \eta_k(t) \big\rangle \Big)^2 + \tau_0 \sum_{v=1}^{Q} w_{2,v} \theta_v^{-1} \| P_v f \|_{H^*}^2 + \lambda_1 \sum_{k=1}^{2} w_{1k} |d_k| + \tau_1 \sum_{v=1}^{Q} w_{2,v} \theta_v \\
={} & \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{n} \Big( \nu_{ik} - u_{ik} - \big\langle \langle f_1, K^*_{(t, x_i(t,\cdot))} \rangle_{H^*}, \eta_k(t) \big\rangle \Big)^2 + \tau_0 \sum_{v=1}^{Q} w_{2,v} \theta_v^{-1} \| P_v f \|_{H^*}^2 + \lambda_1 \sum_{k=1}^{2} w_{1k} |d_k| + \tau_1 \sum_{v=1}^{Q} w_{2,v} \theta_v \\
={} & \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{n} \Big( \nu_{ik} - u_{ik} - \big\langle f_1, \int_0^1 K^*_{(t, x_i(t,\cdot))} \eta_k(t)\, dt \big\rangle_{H^*} \Big)^2 + \tau_0 \sum_{v=1}^{Q} w_{2,v} \theta_v^{-1} \| P_v f \|_{H^*}^2 + \lambda_1 \sum_{k=1}^{2} w_{1k} |d_k| + \tau_1 \sum_{v=1}^{Q} w_{2,v} \theta_v \\
={} & \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{n} \Big( \nu_{ik} - u_{ik} - \big\langle f_{1n}, \int_0^1 K^*_{(t, x_i(t,\cdot))} \eta_k(t)\, dt \big\rangle_{H^*} \Big)^2 + \tau_0 \sum_{v=1}^{Q} w_{2,v} \theta_v^{-1} \| P_v f_{1n} \|_{H^*}^2 + \tau_0 \sum_{v=1}^{Q} w_{2,v} \theta_v^{-1} \| P_v \rho \|_{H^*}^2 \\
& + \lambda_1 \sum_{k=1}^{2} w_{1k} |d_k| + \tau_1 \sum_{v=1}^{Q} w_{2,v} \theta_v,
\end{aligned} \tag{9}
\]
where u_{ik} = ∫_0^1 f_0(t, x_i(t, ·)) η_k(t) dt. The first equality uses the reproducing property, and the third equality uses the fact that ρ is orthogonal to H_{1n}. The minimizer of (9) must have ρ = 0, and we obtain the following representer theorem:
Theorem 1
(Representer Theorem). The solution to PLS (9) is
\[ f(t, x(t,\cdot)) = \sum_{j=1}^{2} d_j \varphi_j(t) + \sum_{v=1}^{Q} w_{2,v}^{-1} \theta_v \sum_{i=1}^{n} \sum_{k=1}^{n} c_{ik}\, \xi_{ik}^{v}(t, x(t,\cdot)), \tag{10} \]
where φ_1(t) = 1, φ_2(t) = k_1(t), and ξ_{ik}^{v}(t, x(t, ·)) = ∫_0^1 K_v((t, x(t, ·)), (t', x_i(t', ·))) η_k(t') dt'.
From this representer theorem, the PLS (9) reduces to
\[ \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{n} \Big( \nu_{ik} - \sum_{j=1}^{2} a_{ikj} d_j - \sum_{v=1}^{Q} w_{2,v}^{-1} \theta_v \sum_{j=1}^{n} \sum_{l=1}^{n} c_{jl} b_{ikjl}^{v} \Big)^2 + \lambda_1 \sum_{k=1}^{2} w_{1k} |d_k| + \tau_0 \sum_{v=1}^{Q} w_{2,v}^{-1} \theta_v \sum_{i=1}^{n} \sum_{k=1}^{n} \sum_{j=1}^{n} \sum_{l=1}^{n} c_{ik} b_{ikjl}^{v} c_{jl} + \tau_1 \sum_{v=1}^{Q} w_{2,v} \theta_v, \tag{11} \]
where a_{ikj} = ∫_0^1 φ_j(t) η_k(t) dt, b_{ikjl} = ∑_{v=1}^{Q} w_{2,v}^{-1} θ_v b_{ikjl}^{v}, and b_{ikjl}^{v} = ∫_0^1 ξ_{jl}^{v}(t, x_i(t, ·)) η_k(t) dt. We let Σ_v be the n² × n² matrix whose (i + (k-1)n, j + (l-1)n)th element is b_{ikjl}^{v}, and Σ = ∑_{v=1}^{Q} w_{2,v}^{-1} θ_v Σ_v, so that the (i + (k-1)n, j + (l-1)n)th element of Σ is b_{ikjl}. We let Y_k = (ν_{1k}, …, ν_{nk})ᵀ, Y = (Y_1ᵀ, …, Y_nᵀ)ᵀ, c = (c_{11}, c_{21}, …, c_{nn})ᵀ, d = (d_1, d_2)ᵀ, w_2 = (w_{2,1}, …, w_{2,Q})ᵀ, and T be the n² × 2 matrix with a_{ikj} as its (i + (k-1)n, j)th element. Then, the PLS (11) reduces to
\[ \frac{1}{n} \| Y - T d - \Sigma c \|^2 + \lambda_1 \sum_{k=1}^{2} w_{1k} |d_k| + \tau_0\, c^\top \Sigma c + \tau_1\, w_2^\top \theta, \tag{12} \]
subject to θ_v ≥ 0, v = 1, …, Q.
The backfitting algorithm in Wang et al. [12] is applied to solve the PLS (12) as follows (Algorithm 1):
Algorithm 1 Model Selection Algorithm
  Set initial values d = d_0, θ = θ_0.
  repeat
      Update c by minimizing (1/n)‖Y - Td - Σc‖² + τ_0 cᵀΣc.
      Calculate Y* = Y - Rθ, where R is the n² × Q matrix whose v-th column is w_{2,v}^{-1} Σ_v c.
      Update d by minimizing (1/n)‖Y* - Td‖² + λ_1 ∑_{k=1}^{2} w_{1k}|d_k|.
      Select the tuning parameter M by k-fold cross-validation or the BIC method.
      Update θ by minimizing (1/n)‖Y - Td - Rθ‖² + τ_0 cᵀRθ subject to θ_v ≥ 0 for 1 ≤ v ≤ Q and w_2ᵀθ ≤ M.
  until c, d and θ converge
  return c, d and θ
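To make the updates concrete, the sketch below gives one possible Python implementation of Algorithm 1 (not the authors' code): the function signature, the small ridge term added purely for numerical stability, and the use of an off-the-shelf SLSQP solver for the constrained θ-step are all our assumptions, and the bound M is taken as given.

```python
import numpy as np
from scipy.optimize import minimize

def soft_threshold(z, a):
    return np.sign(z) * max(abs(z) - a, 0.0)

def backfit(Y, T, Sigma_list, n, w1, w2, lam1, tau0, M, n_iter=20):
    """One possible implementation of the backfitting updates in Algorithm 1.

    Y: score vector of length N = n^2; T: N x 2 null-space design matrix;
    Sigma_list: list of Q matrices Sigma_v (N x N); w1, w2: penalty weights;
    lam1, tau0: tuning parameters; M: bound on w2' theta (selected by CV/BIC elsewhere).
    """
    N, Q = Y.shape[0], len(Sigma_list)
    d, theta, c = np.zeros(2), np.ones(Q), np.zeros(N)
    for _ in range(n_iter):
        # c-step: minimize (1/n)||Y - T d - Sigma c||^2 + tau0 c' Sigma c
        Sigma = sum(theta[v] / w2[v] * Sigma_list[v] for v in range(Q))
        r = Y - T @ d
        A = Sigma @ Sigma / n + tau0 * Sigma + 1e-8 * np.eye(N)  # tiny ridge for stability
        c = np.linalg.solve(A, Sigma @ r / n)
        # express the kernel part as R theta and form the partial residual
        R = np.column_stack([Sigma_list[v] @ c / w2[v] for v in range(Q)])
        Y_star = Y - R @ theta
        # d-step: two-coefficient lasso via coordinate-wise soft thresholding
        for j in range(2):
            rj = Y_star - np.delete(T, j, axis=1) @ np.delete(d, j)
            d[j] = soft_threshold(T[:, j] @ rj, n * lam1 * w1[j] / 2.0) / (T[:, j] @ T[:, j])
        # theta-step: nonnegative update subject to w2' theta <= M
        def obj(th):
            res = Y - T @ d - R @ th
            return res @ res / n + tau0 * c @ (R @ th)
        cons = [{"type": "ineq", "fun": lambda th: M - w2 @ th}]
        theta = minimize(obj, theta, method="SLSQP",
                         bounds=[(0.0, None)] * Q, constraints=cons).x
    return c, d, theta
```

In practice, one would wrap this routine in an outer loop over candidate values of M (and of λ_1, τ_0), as in the cross-validation or BIC step of Algorithm 1.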

4. Statistical Properties

In this section, we assume that X and Y are complete measurable spaces. We let P be a probability measure on X^q × L_2(T) and M = T × X^q. Without loss of generality, we combine the penalty terms λ_1 ∑_{k=1}^{2} w_{1k}|d_k| and λ_2 ∑_{v=1}^{Q} w_{2,v}‖P_v f‖_H in (6) into a single term λ‖f‖_H.
We define a loss function,
\[ L(f; x, y) = \int_0^1 \big( y(t) - f(t, x(t,\cdot)) \big)^2 dt, \]
where y(t) ∈ Y and x ∈ X^q. The corresponding L-risk function (Steinwart and Christmann [28]) is
\[ R_{L,P}(f) = E_P\big[ L(f; x, y) \big]. \]
We let f^* = argmin_{f ∈ H} R_{L,P}(f), R^*_{L,P,H} = R_{L,P}(f^*), and
\[ f_{P,\lambda} = \arg\min_{f \in H} \big\{ R_{L,P}(f) + \lambda \| f \|_H \big\}. \]
Obviously, f̂ = f_{D,λ}, where D denotes the empirical distribution of the observed data. We state the convergence property in the following theorem and give its proof in Appendix B.
Theorem 2.
Assume that f : M → R is measurable for any f ∈ H, that M is a complete measurable space, and that |P|_2 = ∫_{X^q × L_2(T)} ‖y‖_2^2 dP(x, y) < ∞. When λ → 0 and λ^6 n → ∞ as n → ∞, we have
\[ \big| R_{L,P}(\hat f) - R^*_{L,P,H} \big| = O_p(\lambda). \]
Theorem 2 states that, as λ tends to 0 and λ^6 n tends to infinity with n, the estimate f̂ is L-risk consistent (Steinwart and Christmann [28]).

5. Simulation

In this section, numerical experiments are conducted to evaluate the performance of the proposed model selection approach. The functional covariate is x_i(t, ·) = (x_{i1}(t, ·), x_{i2}(t, ·)), where x_{ij}(t, ·) = cos(2π x*_{ij}(t, ·)) and x*_{ij}(t, ·) follows a Gaussian process with mean function μ(t) = t. The kernel function of the Gaussian process is the RBF kernel k_g(s_1, s_2) = exp(-(s_1 - s_2)²/2) for j = 1 and the rational quadratic kernel k_l(s_1, s_2) = 1/(1 + (s_1 - s_2)²) for j = 2. Three choices of f(t, x(t, ·)) are considered: for t ∈ [0, 1],
\[
\begin{aligned}
M1 &: \; f(t, x(t,\cdot)) = 1 + \frac{5\cos(2\pi t)}{3}, \\
M2 &: \; f(t, x(t,\cdot)) = 1 + 0.5t + 10\int_0^1 x_1^3(t,s)\, ds + 5\int_0^1 x_2^3(t,s)\, ds + 10\int_0^1 x_1^3(t,s)\, ds \int_0^1 x_2^3(t,s)\, ds, \\
M3 &: \; f(t, x(t,\cdot)) = 1 + 5\cos(2\pi t) + 10\int_0^1 x_1(t,s)\, x_2(t,s)\, ds.
\end{aligned}
\]
We see that M1 has only the main effect of t; M2 consists of three main effects and the nonparametric-nonparametric interaction of x_1 and x_2; and M3 consists of the main effect of t and the nonparametric-nonparametric interaction of x_1 and x_2. The random error ε_i(t) follows N(0, 0.2²) or N(0, 0.5²). All simulations are repeated 200 times.
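As an illustration of the covariate-generating mechanism, the following Python sketch (hypothetical; the grid size, jitter, and random seed are our choices) draws x*_{ij}(t, ·) as Gaussian process paths in s for a fixed t, with the two kernels above, and transforms them into the covariates.

```python
import numpy as np

def gp_sample(mean, cov, s_grid, rng):
    """Draw one Gaussian-process path on s_grid with the given mean and covariance functions."""
    S1, S2 = np.meshgrid(s_grid, s_grid, indexing="ij")
    K = cov(S1, S2) + 1e-8 * np.eye(len(s_grid))   # jitter for numerical stability
    return rng.multivariate_normal(mean(s_grid), K)

rbf = lambda s1, s2: np.exp(-((s1 - s2) ** 2) / 2.0)        # kernel for x_1*
rq = lambda s1, s2: 1.0 / (1.0 + (s1 - s2) ** 2)            # rational quadratic kernel for x_2*
mu = lambda s: s                                            # mean function mu(t) = t

rng = np.random.default_rng(1)
s_grid = np.linspace(0.0, 1.0, 50)
x1_star = gp_sample(mu, rbf, s_grid, rng)
x2_star = gp_sample(mu, rq, s_grid, rng)
x1, x2 = np.cos(2 * np.pi * x1_star), np.cos(2 * np.pi * x2_star)   # x_ij = cos(2*pi*x_ij*)
print(x1.shape, x2.shape)
```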
We generate n samples {y_i(t), x_i(t, ·): i = 1, …, n} as training data and n_t = 50 samples {ỹ_i(t), x̃_i(t, ·): i = 1, …, n_t} as test data. For comparison, we evaluate performance using the following root mean squared error (RMSE) on the test data:
\[ \mathrm{RMSE} = \sqrt{ \frac{1}{n_t} \sum_{i=1}^{n_t} \big\| f(t, \tilde{x}_i(t,\cdot)) - \hat{f}(t, \tilde{x}_i(t,\cdot)) \big\|_2^2 }, \]
where ‖·‖_2 is the norm of L_2(T).
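A possible Python helper for this criterion (hypothetical; it assumes the true and fitted surfaces are evaluated on a common time grid) is:

```python
import numpy as np

def rmse(f_true, f_hat, t_grid):
    """RMSE between true and fitted regression functions evaluated on the test curves.

    f_true, f_hat: (n_t, m) arrays of f(t, x_i(t, .)) and f_hat(t, x_i(t, .)) on t_grid.
    """
    sq_l2 = np.trapz((f_true - f_hat) ** 2, t_grid, axis=1)   # ||f - f_hat||_2^2 per test curve
    return np.sqrt(np.mean(sq_l2))

# toy check: identical surfaces give RMSE 0
t_grid = np.linspace(0, 1, 20)
F = np.outer(np.ones(5), np.sin(t_grid))
print(rmse(F, F, t_grid))   # 0.0
```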
The proposed model selection method, used to train the model and predict the test data, is denoted by L_1. As a comparison without model selection, we estimate the NCRM model with the L_2 penalty method, denoted by L_2, which minimizes the following objective function:
\[ \frac{1}{n} \sum_{i=1}^{n} \int_0^1 \big( y_i(t) - f(t, x_i(t,\cdot)) \big)^2 dt + \lambda \sum_{v=1}^{Q} \| P_v f \|_H^2, \]
where λ is the tuning parameter. After model selection, the selected model is re-estimated with the L_2 penalty method; this procedure is denoted by L_1 + L_2. Table 1 reports the average RMSEs and their standard deviations (in parentheses) for the three estimation methods L_1, L_2 and L_1 + L_2. Under models M1 and M3, L_1 + L_2 has the smallest RMSE among the three methods. Under model M2, L_1 performs better than L_2 and is comparable to L_1 + L_2. In addition, for all three methods, prediction performance improves as σ decreases and as the training sample size increases.
To evaluate the performance of model selection using the L_1 penalty method, we adopt three measures from Wang et al. [12]: specificity (SPE), sensitivity (SEN) and the F_1 score,
\[ \mathrm{SPE} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}, \quad \mathrm{SEN} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \quad F_1 = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FN} + \mathrm{FP}}, \]
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. The non-zero components of the decomposition of the regression function are treated as positive cases; for a component with θ_v > 0, an estimated value larger than 0 counts as a true positive.
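A small Python sketch of these selection measures (hypothetical helper; components with estimated θ_v above a tolerance are declared selected) is:

```python
import numpy as np

def selection_scores(theta_true, theta_hat, tol=0.0):
    """Specificity, sensitivity and F1 for component selection based on theta."""
    pos_true = theta_true > 0
    pos_hat = theta_hat > tol
    tp = np.sum(pos_true & pos_hat)
    tn = np.sum(~pos_true & ~pos_hat)
    fp = np.sum(~pos_true & pos_hat)
    fn = np.sum(pos_true & ~pos_hat)
    spe = tn / (tn + fp)
    sen = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fn + fp)
    return spe, sen, f1

# toy example: 5 components, two truly active
print(selection_scores(np.array([1.0, 0, 0, 2.0, 0]),
                       np.array([0.8, 0, 0.1, 1.5, 0])))
```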
Table 2 shows the sensitivities, specificities, and F1 scores. Overall, the L_1 penalty method performs well across the different simulation settings. In addition, model selection improves as σ decreases and the training sample size increases.

6. Application

The proposed model selection approach is applied to analyze stroke rehabilitation data with 70 stroke survivors ([7]).
The data consist of 34 acute patients whose stroke occurred less than a month before enrollment and 36 chronic patients whose stroke occurred more than six months before enrollment. To improve upper limb function for stroke patients, a convenient home-based rehabilitation system using action video games with 3D-position movement behaviors has been developed [7,8]. The patients played the movement game at scheduled times. At each visit time t, the impairment level of subject i was assessed using the CAHAI (Chedoke Arm and Hand Activity Inventory) score, denoted as y_i(t), and movements such as the forward circular movement and the sawing movement were recorded. In this paper, three movements, the forward circular movement of the paretic limb in the x-axis direction (x_{i1} = LA05.lx), the sawing movement of the paretic limb in the y-axis direction (x_{i2} = LA09.ly), and the movement of the non-paretic limb in the x-axis direction (x_{i3} = LA28.rqx), are taken as functional covariates. For the purpose of illustrating the proposed method, we use the data from the acute patients. During the three-month study period, each acute patient received up to seven assessments, resulting in 173 observations. The CAHAI scores were normalized before analysis.
In this paper, we focus on interaction effects up to the second order and take the following decomposition:
\[
\begin{aligned}
K_0\big( (t,u), (t',u') \big) &= 1 + k_1(t) k_1(t'), \\
K_1\big( (t,u), (t',u') \big) &= K_2^{(1)}(t,t'), \\
K_{1+j}\big( (t,u), (t',u') \big) &= K_{2,j}^{(2)}(u_j, u_j'), \\
K_{4+j}\big( (t,u), (t',u') \big) &= k_1(t) k_1(t') K_{2,j}^{(2)}(u_j, u_j') + K_2^{(1)}(t,t') K_{2,j}^{(2)}(u_j, u_j'), \\
K_{5+j+l}\big( (t,u), (t',u') \big) &= K_{2,j}^{(2)}(u_j, u_j') K_{2,l}^{(2)}(u_l, u_l'),
\end{aligned}
\]
for j, l = 1, 2, 3 and j < l. Readers can also choose other SS ANOVA decompositions by merging kernel functions according to their own needs. From Section 3, we have
\[ K^*\big( (t,u), (t',u') \big) = \sum_{v=1}^{10} w_{2,v}^{-1} \theta_v K_v\big( (t,u), (t',u') \big), \]
where the coefficient θ_v for the kernel function K_v gives the level of contribution of K_v to the overall model.
The L_1-regularized penalty method for model selection is applied to the stroke rehabilitation data. The parameters {θ_v} are computed, and those with values larger than 0 are θ_2 = 4.157, θ_3 = 0.819, θ_4 = 0.636, θ_7 = 0.592 and θ_10 = 0.741. This shows that the main effects of x_{i1}(t, ·), x_{i2}(t, ·) and x_{i3}(t, ·), the linear-nonparametric interaction of t and x_3(t, ·), and the nonparametric-nonparametric interaction of x_2(t, ·) and x_3(t, ·) have nonzero contributions to the CAHAI score. Thus, the three movements, the forward circular movement of the paretic limb and the sawing movements of the paretic and non-paretic limbs, may be helpful to the recovery of stroke patients. In addition, the interaction of the sawing movements of the paretic and non-paretic limbs may contribute to the level of dependence in daily life or the impairment of upper limb function. Figure 1 plots the estimates of the nonparametric regression functions for four stroke patients. The fitted regression function in the NCRM model follows the same trend as the CAHAI scores and, on the whole, both show an increasing trend with fluctuations, suggesting that the movements may improve upper limb function for stroke patients.
The prediction performance of the proposed method is evaluated using tenfold cross-validation,
\[ \mathrm{RPE} = \frac{1}{10} \sum_{j=1}^{10} \frac{1}{n_j} \sum_{i \in j\text{th fold}} \big\| Y_i(t) - \hat{Y}_i^{(j)}(t) \big\|_2^2, \]
where Ŷ_i^{(j)}(t) is the predicted value of Y_i(t) based on the selected model fitted with the L_1 + L_2 penalty to the data excluding the jth fold, and n_j is the number of subjects in the jth fold. The RPE for the stroke data is 1.0690, which is smaller than the 1.1700 obtained from the method of Zhai et al. [9].
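A possible sketch of this tenfold procedure in Python (hypothetical; it assumes scikit-learn for the fold splits and a user-supplied fit-and-predict routine) is:

```python
import numpy as np
from sklearn.model_selection import KFold

def rpe(Y, t_grid, fit_predict, n_splits=10, seed=0):
    """Tenfold cross-validated prediction error for functional responses.

    Y: (n, m) array of response curves on t_grid; fit_predict(train_idx, test_idx)
    is assumed to return the predicted curves for the test indices.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_errors = []
    for train_idx, test_idx in kf.split(Y):
        Y_hat = fit_predict(train_idx, test_idx)
        sq_l2 = np.trapz((Y[test_idx] - Y_hat) ** 2, t_grid, axis=1)
        fold_errors.append(np.mean(sq_l2))
    return np.mean(fold_errors)

# toy usage with a "predict the training mean" baseline
t_grid = np.linspace(0, 1, 30)
Y = np.sin(2 * np.pi * t_grid)[None, :] + 0.1 * np.random.default_rng(2).standard_normal((40, 30))
baseline = lambda tr, te: np.tile(Y[tr].mean(axis=0), (len(te), 1))
print(rpe(Y, t_grid, baseline))
```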

7. Conclusions

For functional data with functional covariate inputs, this paper uses a Gaussian kernel to construct a tensor product RKHS for the regression function, which leads to a nonparametric concurrent regression model. The L_1 penalty method is used to detect the components of the SS ANOVA decomposition of the regression function that make nonzero contributions to the model fit, and a backfitting algorithm is developed for estimation and model selection. The proposed method is applied to stroke rehabilitation data, and the results show that, besides the main effects, there are interaction effects of the movements on the CAHAI score. This indicates that the movements may help improve the level of daily life dependence or the impairment of upper limb function of a stroke patient.

Author Contributions

Conceptualization, Z.W. and Y.W.; methodology, Z.W. and R.P.; data curation, R.P.; writing draft preparation, R.P.; writing and editing, Z.W. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant numbers 11971457 and 12201601) and the Anhui Provincial Natural Science Foundation (grant number 2208085).

Data Availability Statement

The data presented in this study are available on reasonable request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SS ANOVA    Smoothing spline analysis of variance
RKHS        Reproducing kernel Hilbert space
NCRM        Nonparametric concurrent regression model

Appendix A. Tensor Product and Reproducing Kernel Hilbert Space

In this section, we provide a brief description of tensor product space, reproducing kernel Hilbert space (RKHS) and SS ANOVA.
Tensor Product Space. A tensor product space is formed from multiple vector spaces. For two vector spaces, denoted by V and W, the tensor product space V ⊗ W is the space spanned by the elements v ⊗ w with v ∈ V and w ∈ W. For details, please see Lin [29].
Reproducing Kernel Hilbert Space (RKHS). A reproducing kernel is a kernel function with the reproducing property, and a reproducing kernel Hilbert space is a Hilbert space that possesses a reproducing kernel. Mathematically, for a reproducing kernel K and its induced RKHS H, the reproducing property is f(x) = ⟨f, K(x, ·)⟩, where f is in H and x is an input variable. An RKHS provides an effective tool for modeling nonlinear relationships and handling high-dimensional data. In the context of regression, an RKHS is utilized as the foundation for model selection and estimation. For details, please see Wainwright [30].
Smoothing spline analysis of variance. Smoothing spline analysis of variance (SS ANOVA) is a powerful tool that combines the strengths of smoothing splines and analysis of variance to facilitate the simultaneous exploration of main effects and interactions among variables. SS ANOVA is an important and useful method for modeling nonlinear relationships within the regression framework [31,32,33,34,35,36]. For example, Wahba [31] presented theory and applications of smoothing spline models, with a special focus on function estimation from noisy functional data, including univariate smoothing splines, multidimensional thin-plate splines, splines on the sphere, additive splines, and interaction splines. Furthermore, Wahba et al. [32] extended the SS ANOVA model to exponential family distributions and used the developed method to estimate the risk of diabetic retinopathy progression. In addition, Gao et al. [34] combined an SS ANOVA model with a log-linear model to fit multivariate Bernoulli data.
To illustrate the SS ANOVA approach, we consider a nonparametric model represented as follows:
\[ y = f(x_1, \ldots, x_p) + \epsilon, \tag{A1} \]
where f is an unknown smooth function and ε is an error term. Applying SS ANOVA to model (A1), we decompose f as follows:
\[ f(x_1, \ldots, x_p) = \mu + \sum_{i=1}^{p} f_i(x_i) + \sum_{i<j} f_{i,j}(x_i, x_j) + \cdots + f_{1,\ldots,p}(x_1, \ldots, x_p). \tag{A2} \]
In this decomposition, μ denotes the overall mean; the functions f_1(x_1), f_2(x_2), …, f_p(x_p) capture the main effects in model (A1); the functions f_{i,j}(x_i, x_j) describe the interactions between the variables x_i and x_j; and so forth. One way to model these functions is to use smoothing splines, such as cubic splines.
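For instance (an illustrative special case, with side conditions stated here in one common form rather than taken from the paper), with p = 2 the decomposition reads
\[ f(x_1, x_2) = \mu + f_1(x_1) + f_2(x_2) + f_{1,2}(x_1, x_2), \]
where uniqueness of the components is typically ensured by side conditions such as requiring each f_i, and each marginal average of f_{1,2}, to average to zero.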

Appendix B. Proof of Theorem 2

From the triangle inequality, we have
\[ \big| R_{L,P}(\hat f) - R^*_{L,P,H} \big| \le \big| R_{L,P}(\hat f) - R_{L,P}(f_{P,\lambda}) \big| + \big| R_{L,P}(f_{P,\lambda}) - R^*_{L,P,H} \big|. \]
Hence, we separately derive the convergence rates of |R_{L,P}(f̂) - R_{L,P}(f_{P,λ})| and |R_{L,P}(f_{P,λ}) - R^*_{L,P,H}|.
Since |P|_2 and ‖K_H‖_H are bounded, without loss of generality we assume that q = 1, |P|_2 = 1, and ‖K_H‖_H = 1, where K_H is the RK of H; then R_{L,P}(0) ≤ |P|_2 = 1. From the proof of Theorem 3 in Zhai et al. [9], we have
\[ \big| R_{L,P}(\hat f) - R_{L,P}(f_{P,\lambda}) \big| \le c \big( 2 + \| \hat f \|_\infty + \| f_{P,\lambda} \|_\infty \big) \big\| \hat f - f_{P,\lambda} \big\|_\infty, \]
where c is an indeterminate constant depending on L.
We know that
\[ \lambda \| f_{P,\lambda} \|_H \le \inf_{f \in H} \big\{ R_{L,P}(f) + \lambda \| f \|_H \big\} \le R_{L,P}(0) \le 1. \]
Hence, when ‖f_{P,λ} - f̂‖_H ≤ 1, we have ‖f_{P,λ}‖_∞ ≤ ‖K_H‖_H ‖f_{P,λ}‖_H ≤ cλ^{-1} and ‖f̂‖_∞ ≤ ‖f_{P,λ}‖_∞ + ‖f_{P,λ} - f̂‖_∞ ≤ cλ^{-1} + 1. Therefore, we have
\[ \big| R_{L,P}(\hat f) - R_{L,P}(f_{P,\lambda}) \big| \le c \lambda^{-1} \big\| f_{P,\lambda} - \hat f \big\|_H. \]
Meanwhile,
\[ \lambda \| f_{P,\lambda} \|_H + R_{L,P}(f_{P,\lambda}) - R^*_{L,P,H} = \inf_{f \in H} \big\{ \lambda \| f \|_H + R_{L,P}(f) - R^*_{L,P,H} \big\} \le \lambda \| f^* \|_H + R_{L,P}(f^*) - R^*_{L,P,H}, \]
which shows that
\[ R_{L,P}(f_{P,\lambda}) - R^*_{L,P,H} \le \lambda \big( \| f^* \|_H - \| f_{P,\lambda} \|_H \big) \le c_2 \lambda, \]
where c_2 > 0 is a constant. Taking the Fréchet derivative of R_{L,P}(f) + λ‖f‖_H with respect to f and setting it to zero, we have
\[ \lambda f_{P,\lambda} - E_P\big[ ( y - f_{P,\lambda}(x) )\, \Phi \big] = 0, \]
where Φ((t, x(t, ·))) = K_H(·, (t, x(t, ·))) is the canonical feature map. We let h(x, y) = 2(y - f_{P,λ}(x)). Following the proof of Theorem 5.9 in [28], we can show that
\[ \big\langle f_{\bar{P},\lambda} - f_{P,\lambda},\, E_{\bar{P}}[h\Phi] - E_P[h\Phi] \big\rangle + \lambda \big\| f_{P,\lambda} - f_{\bar{P},\lambda} \big\|_H^2 \le 0, \tag{A3} \]
where P̄ is any distribution defined on X^q × L_2(T). According to (A3), we know that
\[ \lambda \big\| f_{P,\lambda} - f_{\bar{P},\lambda} \big\|_H^2 \le \big\langle f_{P,\lambda} - f_{\bar{P},\lambda},\, E_{\bar{P}}[h\Phi] - E_P[h\Phi] \big\rangle \le \big\| f_{P,\lambda} - f_{\bar{P},\lambda} \big\|_H \cdot \big\| E_{\bar{P}}[h\Phi] - E_P[h\Phi] \big\|_H, \]
which indicates that
\[ \big\| f_{P,\lambda} - f_{\bar{P},\lambda} \big\|_H \le \frac{1}{\lambda} \big\| E_{\bar{P}}[h\Phi] - E_P[h\Phi] \big\|_H. \]
Letting P̄ = D and using Lemma 9.2 of Steinwart and Christmann [28], we have
\[ P\Big( R_{L,P}(\hat f) - R^*_{L,P,H} \ge \epsilon \Big) \le P\Big( \frac{c}{\lambda^2} \big\| E_P[h\Phi] - E_D[h\Phi] \big\|_H + c_2 \lambda > \epsilon \Big) \le O\big( n^{-1} \lambda^{-6} \big), \]
with ε = O(λ). Hence, we obtain that R_{L,P}(f̂) - R^*_{L,P,H} = O_p(λ).

References

  1. Aue, A.; Rice, G.; Sönmez, O. Detecting and dating structural breaks in functional data without dimension reduction. J. R. Stat. Soc. Ser. B Stat. Methodol. 2018, 80, 509–529. [Google Scholar] [CrossRef]
  2. Aristizabal, J.P.; Giraldo, R.; Mateu, J. Analysis of variance for spatially correlated functional data: Application to brain data. Spat. Stat. 2019, 32, 100381. [Google Scholar] [CrossRef]
  3. Slaoui, Y. Recursive nonparametric regression estimation for independent functional data. Stat. Sin. 2020, 30, 417–437. [Google Scholar] [CrossRef]
  4. Yao, F.; Yang, Y. Online estimation for functional data. J. Am. Stat. Assoc. 2021, 1–15. [Google Scholar]
  5. Smida, Z.; Cucala, L.; Gannoun, A.; Durif, G. A median test for functional data. J. Nonparametric Stat. 2022, 34, 520–553. [Google Scholar] [CrossRef]
  6. De Silva, J.; Abeysundara, S. Functional Data Analysis on Global COVID-19 Data. Asian J. Probab. Stat. 2023, 21, 12–28. [Google Scholar] [CrossRef]
  7. Serradilla, J.; Shi, J.; Cheng, Y.; Morgan, G.; Lambden, C.; Eyre, J.A. Automatic assessment of upper limb function during play of the action video game, circus challenge: Validity and sensitivity to change. In Proceedings of the 2014 IEEE 3rd International Conference on Serious Games and Applications for Health (SeGAH), Rio de Janeiro, Brazil, 14–16 May 2014; pp. 1–7. [Google Scholar]
  8. Shi, J.Q.; Cheng, Y.; Serradilla, J.; Morgan, G.; Lambden, C.; Ford, G.A.; Price, C.; Rodgers, H.; Cassidy, T.; Rochester, L.; et al. Evaluating functional ability of upper limbs after stroke using video game data. In Proceedings of the Brain and Health Informatics: International Conference, BHI 2013, Maebashi, Japan, 29–31 October 2013; pp. 181–192. [Google Scholar]
  9. Zhai, Y.; Wang, Z.; Wang, Y. A nonparametric concurrent regression model with multivariate functional inputs. Stat. Its Interface 2023, to appear. [Google Scholar]
  10. Wang, Y. Smoothing Splines: Methods and Applications; Chapman and Hall: New York, NY, USA, 2011. [Google Scholar]
  11. Gu, C. Smoothing Spline ANOVA Models, 2nd ed.; Springer: New York, NY, USA, 2013. [Google Scholar]
  12. Wang, Z.; Dong, H.; Ma, P.; Wang, Y. Estimation and model selection for nonparametric function-on-function regression. J. Comput. Graph. Stat. 2022, 31, 835–845. [Google Scholar] [CrossRef]
  13. Vapnik, V.; Izmailov, R. Rethinking statistical learning theory: Learning using statistical invariants. Mach. Learn. 2019, 108, 381–423. [Google Scholar] [CrossRef]
  14. Hsu, H.L.; Ing, C.K.; Tong, H. On model selection from a finite family of possibly misspecified time series models. Ann. Stat. 2019, 47, 1061–1087. [Google Scholar] [CrossRef]
  15. Guo, W. Inference in smoothing spline analysis of variance. J. R. Stat. Soc. Ser. B Stat. Methodol. 2002, 64, 887–898. [Google Scholar] [CrossRef]
  16. Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 2006, 68, 49–67. [Google Scholar] [CrossRef]
  17. Olusegun, A.M.; Dikko, H.G.; Gulumbe, S.U. Identifying the limitation of stepwise selection for variable selection in regression analysis. Am. J. Theor. Appl. Stat. 2015, 4, 414–419. [Google Scholar] [CrossRef]
  18. Malik, N.A.M.; Jamshaid, F.; Yasir, M.; Hussain, A.A.N. Time series Model selection via stepwise regression to predict GDP Growth of Pakistan. Indian J. Econ. Bus. 2021, 20, 1881–1894. [Google Scholar]
  19. Untadi, A.; Li, L.D.; Li, M.; Dodd, R. Modeling Socioeconomic Determinants of Building Fires through Backward Elimination by Robust Final Prediction Error Criterion. Axioms 2023, 12, 524. [Google Scholar] [CrossRef]
  20. Radman, M.; Chaibakhsh, A.; Nariman-zadeh, N.; He, H. Generalized sequential forward selection method for channel selection in EEG signals for classification of left or right hand movement in BCI. In Proceedings of the 2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 24–25 October 2019; pp. 137–142. [Google Scholar]
  21. Storlie, C.B.; Bondell, H.D.; Reich, B.J.; Zhang, H.H. Surface estimation, variable selection, and the nonparametric oracle property. Stat. Sin. 2011, 21, 679–705. [Google Scholar] [CrossRef]
  22. Lin, Y.; Zhang, H.H. Component selection and smoothing in multivariate nonparametric regression. Ann. Stat. 2006, 34, 2272–2297. [Google Scholar] [CrossRef]
  23. Zhang, H.H.; Wahba, G.; Lin, Y.; Voelker, M.; Ferris, M.; Klein, R.; Klein, B. Variable selection and model building via likelihood basis pursuit. J. Am. Stat. Assoc. 2004, 99, 659–672. [Google Scholar] [CrossRef]
  24. Dong, H.; Wang, Y. Nonparametric Neighborhood Selection in Graphical Models. J. Mach. Learn. Res. 2022, 23, 1–36. [Google Scholar]
  25. Dong, H. Nonparametric Learning Methods for Graphical Models; University of California: Santa Barbara, CA, USA, 2022. [Google Scholar]
  26. Ren, J.; Zhou, F.; Li, X.; Chen, Q.; Zhang, H.; Ma, S.; Jiang, Y.; Wu, C. Semiparametric Bayesian variable selection for gene-environment interactions. Stat. Med. 2020, 39, 617–638. [Google Scholar] [CrossRef]
  27. Hsing, T.; Eubank, R. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators; John Wiley & Sons: Hoboken, NJ, USA, 2015; p. 997. [Google Scholar]
  28. Steinwart, I.; Christmann, A. Support Vector Machines; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
  29. Lin, Y. Tensor product space ANOVA models. Ann. Stat. 2000, 28, 734–755. [Google Scholar] [CrossRef]
  30. Wainwright, M.J. High-Dimensional Statistics: A Non-Asymptotic Viewpoint; Cambridge University Press: Cambridge, UK, 2019; Volume 48. [Google Scholar]
  31. Wahba, G. Spline Models for Observational Data; SIAM: Philadelphia, PA, USA, 1990. [Google Scholar]
  32. Wahba, G.; Wang, Y.; Gu, C.; Klein, R.; Klein, B. Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy: The 1994 Neyman Memorial Lecture. Ann. Stat. 1995, 23, 1865–1895. [Google Scholar] [CrossRef]
  33. Gu, C.; Wahba, G. Smoothing spline ANOVA with component-wise Bayesian “confidence intervals”. J. Comput. Graph. Stat. 1993, 2, 97–117. [Google Scholar]
  34. Gao, F.; Wahba, G.; Klein, R.; Klein, B. Smoothing spline ANOVA for multivariate Bernoulli observations with application to ophthalmology data. J. Am. Stat. Assoc. 2001, 96, 127–160. [Google Scholar] [CrossRef]
  35. Guo, W.; Dai, M.; Ombao, H.C.; Von Sachs, R. Smoothing spline ANOVA for time-dependent spectral analysis. J. Am. Stat. Assoc. 2003, 98, 643–652. [Google Scholar] [CrossRef]
  36. Chiu, C.Y.; Liu, A.; Wang, Y. Smoothing spline mixed-effects density models for clustered data. Stat. Sin. 2020, 30, 397–416. [Google Scholar] [CrossRef]
Figure 1. CAHAI scores (red line) and their corresponding fitted values (blue line) for 4 patients.
Table 1. Average values and standard deviations of RMSEs. (For each setting, the best-performing method is the one with the smallest average RMSE.)
n    σ    Model   L_1               L_2               L_1 + L_2
20   0.2  M1      0.2906 (0.1386)   0.3010 (0.0287)   0.0981 (0.0420)
20   0.2  M2      0.8876 (0.1147)   0.9577 (0.0782)   0.9493 (0.0707)
20   0.2  M3      1.2065 (0.8556)   0.9423 (0.1713)   0.7108 (0.6317)
20   0.5  M1      0.3027 (0.1323)   0.4238 (0.0654)   0.1205 (0.0591)
20   0.5  M2      0.9486 (0.1166)   1.0039 (0.0760)   0.9996 (0.0794)
20   0.5  M3      1.3906 (0.9134)   1.0044 (0.1952)   0.8008 (0.5486)
40   0.2  M1      0.1655 (0.0127)   0.2390 (0.0416)   0.0792 (0.0113)
40   0.2  M2      0.7722 (0.0517)   0.8088 (0.0877)   0.7928 (0.0630)
40   0.2  M3      0.6664 (0.2255)   0.5773 (0.0456)   0.3968 (0.0423)
40   0.5  M1      0.1913 (0.0944)   0.3257 (0.0860)   0.0913 (0.0179)
40   0.5  M2      0.8605 (0.0678)   0.8918 (0.0856)   0.8819 (0.0746)
40   0.5  M3      0.7824 (0.2060)   0.6869 (0.0420)   0.4952 (0.0764)
80   0.2  M1      0.1358 (0.0860)   0.1416 (0.0238)   0.0737 (0.0076)
80   0.2  M2      0.6040 (0.0620)   0.6599 (0.0894)   0.6595 (0.0708)
80   0.2  M3      0.3227 (0.0218)   0.3793 (0.0237)   0.2635 (0.0248)
80   0.5  M1      0.1635 (0.1412)   0.2419 (0.0704)   0.0844 (0.0206)
80   0.5  M2      0.7338 (0.0552)   0.7589 (0.0696)   0.7578 (0.0605)
80   0.5  M3      0.4080 (0.1488)   0.5511 (0.0475)   0.3566 (0.0470)
Table 2. Average values and standard deviations of SPE, SEN, F1 scores under models M1, M2, M3.
n    σ    Model   SPE               SEN               F1
20   0.2  M1      0.9956 (0.0291)   0.9800 (0.1404)   0.9783 (0.1421)
20   0.2  M2      0.9700 (0.0601)   0.7017 (0.1472)   0.7883 (0.1166)
20   0.2  M3      0.9906 (0.0366)   0.9850 (0.1219)   0.9592 (0.1527)
20   0.5  M1      0.9961 (0.0233)   0.9850 (0.1219)   0.9800 (0.1279)
20   0.5  M2      0.9786 (0.0511)   0.6917 (0.1529)   0.7881 (0.1201)
20   0.5  M3      0.9889 (0.0386)   0.9850 (0.1219)   0.9553 (0.1543)
40   0.2  M1      1.0000 (0.0000)   1.0000 (0.0000)   1.0000 (0.0000)
40   0.2  M2      0.9614 (0.0667)   0.9817 (0.0762)   0.9512 (0.0872)
40   0.2  M3      0.9833 (0.0428)   1.0000 (0.0000)   0.9517 (0.1212)
40   0.5  M1      0.9989 (0.0111)   0.9950 (0.0707)   0.9933 (0.0744)
40   0.5  M2      0.9407 (0.0720)   0.9600 (0.1086)   0.9177 (0.1060)
40   0.5  M3      0.9844 (0.0417)   1.0000 (0.0000)   0.9550 (0.1178)
80   0.2  M1      0.9983 (0.0175)   1.0000 (0.0000)   0.9958 (0.0424)
80   0.2  M2      0.9743 (0.0737)   1.0000 (0.0000)   0.9635 (0.0701)
80   0.2  M3      1.0000 (0.0000)   1.0000 (0.0000)   1.0000 (0.0000)
80   0.5  M1      0.9956 (0.0291)   0.9850 (0.1219)   0.9817 (0.1259)
80   0.5  M2      0.9546 (0.0853)   1.0000 (0.0000)   0.9317 (0.1031)
80   0.5  M3      0.9983 (0.0135)   1.0000 (0.0000)   0.9950 (0.0406)