Article

The Curve Estimation of Combined Truncated Spline and Fourier Series Estimators for Multiresponse Nonparametric Regression

by Helida Nurcahayani 1,2, I Nyoman Budiantara 1,* and Ismaini Zain 1
1 Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
2 BPS—Statistics of Daerah Istimewa Yogyakarta Province, Bantul 55183, Indonesia
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(10), 1141; https://doi.org/10.3390/math9101141
Submission received: 11 April 2021 / Revised: 14 May 2021 / Accepted: 17 May 2021 / Published: 18 May 2021
(This article belongs to the Section Probability and Statistics)

Abstract

Nonparametric regression is a potential solution when the parametric assumption that the regression curve follows a known functional form is too restrictive. In multivariable nonparametric regression, the pattern of each predictor variable’s relationship with the response variable is not always the same; thus, a combined estimator is recommended. In addition, regression modeling sometimes involves more than one response, i.e., multiresponse situations. Therefore, we propose a new method of estimating multiresponse nonparametric regression with a combined estimator. The objective is to estimate the regression curve using combined truncated spline and Fourier series estimators for multiresponse nonparametric regression. The regression curve estimate of the proposed model is obtained via two-stage estimation: (1) penalized weighted least square and (2) weighted least square. Simulated data with varying sample sizes and error variances were examined, and the best model was obtained for a large sample with small variance. Additionally, applying the regression curve estimator to a real dataset of human development index indicators in East Java Province, Indonesia, showed that the proposed model performed better than uncombined estimators. Moreover, the adequate coefficient of determination of the best model indicated that the proposed model successfully explained the data variation.

1. Introduction

As one of the renowned methods of regression analysis, parametric regression has been used for many years in various scientific fields. However, some parametric regression assumptions are too restrictive, such as identifying a regression curve’s shape with some prespecified functional form (e.g., linear, quadratic, or cubic) [1]. In real datasets, not all regression curves have a visible pattern, owing to the absence of information about the relationship between the response and predictor variables. Therefore, in such scenarios, nonparametric regression analysis is recommended [2]. Nonparametric regression is able to reduce the risk of misspecification because the data themselves reveal the real shape of the regression curve, without interference from the researcher’s subjectivity [2].
To date, researchers have investigated various functions for estimating the regression curve in nonparametric regression, such as spline [3,4,5,6], Fourier series [7,8,9], kernel [10,11,12], polynomial [13,14,15], and wavelet [16,17,18] functions. The nonparametric regression approach has also been explored for analyzing spatial data, i.e., the use of the nonparametric truncated spline function in geographically weighted regression [19,20]. In nonparametric regression with more than one predictor variable (multivariable regression), the pattern of the relationship between each predictor variable and the response variable is not always the same; therefore, in this study, we employ a nonparametric regression model with a combined estimator. Following the concept of semiparametric regression, some studies have proposed combined estimator models to estimate the regression curve, such as spline and kernel functions [21,22,23]. Similarly, in [24], the researchers developed a combined estimator of kernel and Fourier series, whereas in [25,26,27], the researchers presented nonparametric regression with a mixed estimator of truncated spline and Fourier series. Recently, researchers have also given attention to a mixed model for longitudinal data, as carried out in [28].
So far, studies on nonparametric regression with combined estimators have not dealt with more than one response, i.e., multiresponse nonparametric regression. Numerous studies on nonparametric regression have been applied to many scientific fields, such as sociodemographic analysis [23,24,25,26], finance [5,29], climatology [4], and economics [30], and some cases involve real data with two or more correlated response variables. Therefore, this study makes a major contribution to the research on multiresponse nonparametric regression by using a combined estimator. Of the several types of regression functions for nonparametric regression, the truncated spline function has the major advantages of high flexibility, good visual interpretation, and the ability to handle smooth functions and data whose behavior changes at certain subintervals [2]. On the other hand, the Fourier series function is used to estimate the regression curve if the data are smooth and follow a repeating pattern at specific intervals [9]. The Fourier series function is also a trigonometric polynomial that can adjust effectively to the local nature of the data. In this study, we adopt a cosine Fourier series function [9], which follows a trend line, as the regression curve estimator. A Fourier series with a cosine function was employed because it is an even function, so its second derivative generally yields a nonzero scalar value; consequently, the penalty in the penalized least square method is well defined. Considering the advantages of these two functions, as outlined above, this study highlights the combined use of truncated spline and Fourier series estimators for multiresponse nonparametric regression.
One of the most well-known tools for regression estimation is the ordinary least square (OLS) method. However, the OLS method cannot be used directly in nonparametric regression because the curve shape is unknown. By utilizing smooth, continuous, and differentiable properties, as well as other components, the OLS method was modified with a constrained optimization to create the penalized least square (PLS) method [31]. The combined use of estimators for multiresponse nonparametric regression leads to the use of an error variance–covariance matrix as a weighting matrix. Therefore, a weight is added to the PLS optimization; accordingly, the proposed model was obtained through penalized weighted least square (PWLS) optimization. To evaluate the performance of the proposed model, we carried out a simulation study with three sample sizes and three error variances and applied the proposed model to a real dataset.
Given the above, the objective of this study is to obtain a curve estimate using combined truncated spline and Fourier series estimators for multiresponse nonparametric regression. This aim is followed by estimating an error variance–covariance matrix as a weighting matrix. To estimate the proposed model, a two-stage estimation method was used. Stage 1 estimates the Fourier series component using PWLS optimization. The second stage estimates the truncated spline component using weighted least square (WLS) optimization. In addition to curve estimation, we simulated and applied the proposed model to a dataset of human development index (HDI) indicators in East Java Province, Indonesia. The paper is organized as follows: Section 1 provides an overview of the topic, identifies the knowledge gap, and states the aims of the research. Section 2 describes the truncated spline function and the Fourier series function, along with the PWLS method. Section 3 presents the regression curve estimation via the combined use of truncated spline and Fourier series estimators for multiresponse nonparametric regression, along with the estimation of the variance–covariance matrix, the smoothing parameter selection, and a simulation study followed by the application to a real dataset. Section 4 gives a brief conclusion and recommendations for future work.

2. Materials and Methods

2.1. Multiresponse Nonparametric Regression, Truncated Spline Function, and Fourier Series Function

Given paired data $(y_{1i}, y_{2i}, \ldots, y_{ri}, x_{1i}, x_{2i}, \ldots, x_{pi}, z_{1i}, z_{2i}, \ldots, z_{qi})$, the relationship between the response variables $(y_{1i}, y_{2i}, \ldots, y_{ri})$ and the predictor variables $(x_{1i}, x_{2i}, \ldots, x_{pi}, z_{1i}, z_{2i}, \ldots, z_{qi})$ is assumed to follow the multiresponse nonparametric regression model
$$y_{hi} = \mu_{hi}(x_{1i}, x_{2i}, \ldots, x_{pi}, z_{1i}, z_{2i}, \ldots, z_{qi}) + \varepsilon_{hi}, \quad \varepsilon_{hi} \sim N(0, \sigma_h^2), \qquad (1)$$
where $h = 1, 2, \ldots, r$ and $i = 1, 2, \ldots, n$. In this study, $r$ is defined as the number of response variables, and $n$ refers to the number of observations for each response and predictor variable. For convenience, Equation (1) can be rewritten in the following matrix form:
$$\mathbf{y} = \boldsymbol{\mu}(\mathbf{x}, \mathbf{z}) + \boldsymbol{\varepsilon}. \qquad (2)$$
The regression curve in Equation (1) can be assumed to be an additive model:
$$\mu_{hi}(x_{1i}, x_{2i}, \ldots, x_{pi}, z_{1i}, z_{2i}, \ldots, z_{qi}) = \sum_{j=1}^{p} f_{hj}(x_{ji}) + \sum_{k=1}^{q} g_{hk}(z_{ki}). \qquad (3)$$
In Equation (3), the regression curve $f_{hj}(x_{ji})$, $j = 1, 2, \ldots, p$, is assumed to be smooth and is approached by a truncated spline function. Meanwhile, the regression curve $g_{hk}(z_{ki})$, $k = 1, 2, \ldots, q$, is approached by a Fourier series function, which assumes that the regression curve is unknown and is contained in the continuous space $C[0, \pi]$. Hence, the multiresponse nonparametric regression in Equation (3) can be written as
$$y_{hi} = \sum_{j=1}^{p} f_{hj}(x_{ji}) + \sum_{k=1}^{q} g_{hk}(z_{ki}) + \varepsilon_{hi}, \quad \varepsilon_{hi} \sim N(0, \sigma_h^2). \qquad (4)$$
Under the assumption $\varepsilon_{hi} \sim N(0, \sigma_h^2)$, as described in [32], $\mathrm{corr}(\varepsilon_{hi}, \varepsilon_{\ell i}) = \rho$ for $h \neq \ell$; $h, \ell = 1, 2, \ldots, r$. This term refers to situations in which the response variables $y_{hi}$ and $y_{\ell i}$ are a pair, such that the correlation between the $h$-th response and the $\ell$-th response yields $\mathrm{corr}(\varepsilon_{hi}, \varepsilon_{\ell i}) = \rho$, and zero otherwise. The error correlation between responses is the same for every pair of responses, defined as $\rho = \mathrm{cov}(\varepsilon_{hi}, \varepsilon_{\ell i}) / \sqrt{\sigma_{hh}\sigma_{\ell\ell}}$; $i = 1, 2, \ldots, n$; $h \neq \ell$; $h, \ell = 1, 2, \ldots, r$.
Thus, the regression curve $f_{hj}(x_{ji})$ in Equation (4) is approached by a linear truncated spline function with knots $K_{hj1}, K_{hj2}, \ldots, K_{hju}$, as follows:
$$f_{hj}(x_{ji}) = \alpha_{hj} x_{ji} + \sum_{s=1}^{u} \beta_{hjs}\,(x_{ji} - K_{hjs})_+, \qquad (5)$$
with the truncated function
$$(x_{ji} - K_{hjs})_+ = \begin{cases} x_{ji} - K_{hjs}, & x_{ji} \geq K_{hjs} \\ 0, & x_{ji} < K_{hjs}. \end{cases}$$
Adopting the Fourier series function in [9], $g_{hk}(z_{ki})$ in Equation (4) is approached by a cosine Fourier series function following the trend line ($b_{hk} z_{ki}$), as given in Equation (6):
$$g_{hk}(z_{ki}) = b_{hk} z_{ki} + \tfrac{1}{2} a_{0hk} + \sum_{t=1}^{T} a_{thk} \cos(t z_{ki}). \qquad (6)$$
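To make the two component functions concrete, the following minimal sketch (our own illustration, not part of the original paper; the function names and the example knot and oscillation values are ours) evaluates the linear truncated spline basis of Equation (5) and the cosine Fourier series basis of Equation (6) for a single predictor.

```python
import numpy as np

def truncated_spline_basis(x, knots):
    """Columns [x, (x - K_1)_+, ..., (x - K_u)_+] of the linear truncated spline in Eq. (5)."""
    x = np.asarray(x, dtype=float)
    cols = [x] + [np.maximum(x - K, 0.0) for K in knots]
    return np.column_stack(cols)

def fourier_basis(z, T):
    """Columns [z, 1/2, cos(1*z), ..., cos(T*z)] of the cosine Fourier series in Eq. (6)."""
    z = np.asarray(z, dtype=float)
    cols = [z, np.full_like(z, 0.5)] + [np.cos(t * z) for t in range(1, T + 1)]
    return np.column_stack(cols)

# Example: n = 5 observations, 2 knots, T = 3 oscillations (illustrative values only).
x = np.linspace(0.0, 1.0, 5)
z = np.linspace(0.0, np.pi, 5)
print(truncated_spline_basis(x, knots=[0.3, 0.7]).shape)  # (5, 3)
print(fourier_basis(z, T=3).shape)                        # (5, 5)
```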

2.2. Penalized Weighted Least Square

Suppose that the dataset $(y_{1i}, y_{2i}, \ldots, y_{ri}, x_{1i}, x_{2i}, \ldots, x_{pi}, z_{1i}, z_{2i}, \ldots, z_{qi})$ follows the multiresponse nonparametric regression in Equation (1), rewritten in matrix form as Equation (2). The random error $\boldsymbol{\varepsilon}$ in Equation (2) is normally distributed with mean $\mathbf{0}$ and error variance–covariance matrix $\mathbf{W}$, such that it can be written as $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{W})$. In particular, $\mathbf{W}$ as a weighting matrix plays an important role in accommodating the correlation between responses. If there is a correlation between responses, the correlation is defined as $\rho = \sigma_{h\ell} / \sqrt{\sigma_{hh}\sigma_{\ell\ell}}$, such that $\sigma_{h\ell} = \rho\sqrt{\sigma_{hh}\sigma_{\ell\ell}}$. Note that $\mathbf{I}$ is the identity matrix. When all observations come in pairs, $\mathbf{W}$ has elements $\sigma_{h\ell}$, as described below:
$$\mathbf{W} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1r} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1r} & \sigma_{2r} & \cdots & \sigma_{rr} \end{pmatrix} \otimes \mathbf{I} = \boldsymbol{\Sigma} \otimes \mathbf{I}.$$
The regression curve of the combined truncated spline and Fourier series estimators for multiresponse nonparametric regression was estimated by carrying out the following PWLS optimization:
$$\min_{g_k \in C[0,\pi]} \left\{ N^{-1} \sum_{h=1}^{r} \sum_{i=1}^{n} w_{hi} \left( y_{hi} - \sum_{j=1}^{p} f_{hj}(x_{ji}) - \sum_{k=1}^{q} g_{hk}(z_{ki}) \right)^{2} + \sum_{k=1}^{q} \lambda_k \int_{0}^{\pi} \frac{2}{\pi} \left( g_k''(z_k) \right)^{2} dz_k \right\}, \qquad (7)$$
where $N$ refers to the total number of observations for all response variables, alternatively written as $N = \sum_{h=1}^{r} n_h$. The first component in Equation (7) is a function that measures the goodness-of-fit (GoF), while the second component is the penalty. In addition, $w_{hi}$ is a weight, and $\lambda_k$ serves as a positive smoothing parameter that controls the balance between the GoF and the penalty. In this study, during the process of regression curve estimation, the parameters $w_{hi}$ and $\lambda_k$ are given. Note that the superscript $T$ denotes the transpose and a double prime ($''$) denotes the second derivative of a function.
As stated in the Introduction, we estimated the regression curve of the proposed model in two stages. The first stage completes the estimation of the regression curve $g_{hk}(z_{ki})$ through PWLS optimization, which results in Theorem 1 (see Section 3.1). Subsequently, the second stage completes the estimation of the regression curve $f_{hj}(x_{ji})$ using WLS optimization, which results in Theorem 2 (see Section 3.1).

3. Results

3.1. Curve Estimation of Combined Truncated Spline and Fourier Series Estimators for Multiresponse Nonparametric Regression

As previously stated, PWLS and WLS optimization were used to obtain the combined truncated spline and Fourier series estimators for multiresponse nonparametric regression. Therefore, some lemmas and theorems are required. Lemma 1 presents the GoF, while Lemma 2 presents the penalty of the PWLS optimization. Using the results of Lemma 1 and Lemma 2, the first-stage estimate is obtained, which is formulated in Theorem 1. The second stage estimates the regression curve using WLS optimization, as presented in Lemma 3, with the main result presented in Theorem 2. All the proofs of the lemmas and theorems are presented in Appendix A, Appendix B, Appendix C, Appendix D and Appendix E.
Lemma 1.
If the multiresponse nonparametric regression model is written as Equation (4) and the regression curve $g_{hk}(z_{ki})$ is as given in Equation (6), the GoF can be formulated as follows:
$$R(g_1, \ldots, g_r) = N^{-1} (\mathbf{v} - \mathbf{Z}\mathbf{a})^{T} \mathbf{W} (\mathbf{v} - \mathbf{Z}\mathbf{a}),$$
where $\mathbf{W}$ is the weighting matrix,
$$\mathbf{v} = \begin{bmatrix} \mathbf{v}_1^T & \mathbf{v}_2^T & \cdots & \mathbf{v}_r^T \end{bmatrix}^{T}; \quad \mathbf{v}_h = \begin{bmatrix} v_{h1} & v_{h2} & \cdots & v_{hn} \end{bmatrix}^{T}; \quad v_{hi} = y_{hi} - \sum_{j=1}^{p} f_{hj}(x_{ji}),$$
$$\mathbf{Z} = \begin{bmatrix} \mathbf{Z}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{Z}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{Z}_r \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} \mathbf{a}_1^T & \mathbf{a}_2^T & \cdots & \mathbf{a}_r^T \end{bmatrix}^{T},$$
$$\mathbf{Z}_h = \begin{bmatrix} z_{11} & 1/2 & \cos z_{11} & \cos 2z_{11} & \cdots & \cos Tz_{11} & \cdots & z_{q1} & 1/2 & \cos z_{q1} & \cos 2z_{q1} & \cdots & \cos Tz_{q1} \\ z_{12} & 1/2 & \cos z_{12} & \cos 2z_{12} & \cdots & \cos Tz_{12} & \cdots & z_{q2} & 1/2 & \cos z_{q2} & \cos 2z_{q2} & \cdots & \cos Tz_{q2} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ z_{1n} & 1/2 & \cos z_{1n} & \cos 2z_{1n} & \cdots & \cos Tz_{1n} & \cdots & z_{qn} & 1/2 & \cos z_{qn} & \cos 2z_{qn} & \cdots & \cos Tz_{qn} \end{bmatrix},$$
$$\mathbf{a}_h = \begin{bmatrix} b_{1h} & a_{01h} & a_{11h} & a_{21h} & \cdots & a_{T1h} & \cdots & b_{qh} & a_{0qh} & a_{1qh} & a_{2qh} & \cdots & a_{Tqh} \end{bmatrix}^{T}.$$
The proof of Lemma 1 is given in Appendix A.
Lemma 2.
If the multiresponse nonparametric regression model is written as Equation (4) and the regression curve $g_{hk}(z_{ki})$ is as presented in Equation (6), the penalty component is as follows:
$$P(\lambda_1, \lambda_2, \ldots, \lambda_q) = \mathbf{a}^{T} \mathbf{D}(\lambda) \mathbf{a},$$
where
$$\mathbf{D}(\lambda) = \mathrm{diag}\left(0,\ 0,\ \lambda_1 1^4,\ \lambda_1 2^4,\ \ldots,\ \lambda_1 T^4,\ \ldots,\ 0,\ 0,\ \lambda_q 1^4,\ \lambda_q 2^4,\ \ldots,\ \lambda_q T^4\right).$$
A complete description of the proof of Lemma 2 can be found in Appendix B.
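As an illustration of Lemma 2 (our own sketch, not code from the paper), the diagonal penalty matrix $\mathbf{D}(\lambda)$ for one response can be assembled directly from its definition: two zero entries for $b_{hk}$ and $a_{0hk}$, followed by $\lambda_k t^4$ for $t = 1, \ldots, T$, for each of the $q$ Fourier series predictors.

```python
import numpy as np

def penalty_matrix(lambdas, T):
    """D(lambda) for one response: diag(0, 0, l_k*1^4, ..., l_k*T^4) repeated over the q predictors."""
    blocks = []
    for lam in lambdas:                      # lambdas has length q
        d = np.zeros(2 + T)                  # entries for b_hk and a_0hk remain unpenalized
        d[2:] = lam * np.arange(1, T + 1) ** 4
        blocks.append(d)
    return np.diag(np.concatenate(blocks))

D = penalty_matrix(lambdas=[0.1, 0.5], T=3)  # q = 2 predictors, T = 3 (illustrative values)
print(D.shape)                               # (10, 10)
```

For the multiresponse case, the same block structure is repeated for each of the $r$ responses so that it matches the stacked coefficient vector $\mathbf{a}$.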
Having discussed how to construct the GoF and the penalty, Theorem 1 addresses potential solutions to the first-stage estimation using PWLS optimization in Equation (7).
Theorem 1.
If the GoF and the penalty of the model are given in Lemma 1 and Lemma 2, the curve estimation for multiresponse nonparametric regression can be attained from PWLS optimization, as follows:
$$\hat{\mathbf{g}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \mathbf{Z}\left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}\,\mathbf{v}.$$
The proof of Theorem 1 is given in Appendix C. The second stage involves estimating the regression curve using WLS optimization, as described in Lemma 3, with the result in Theorem 2.
Lemma 3.
If the regression curve $f_{hj}(x_{ji})$ is as described in Equation (5), the WLS optimization can be written as
$$\left[(\mathbf{I} - \mathbf{H})(\mathbf{y} - \mathbf{J}\boldsymbol{\gamma})\right]^{T}\mathbf{W}\left[(\mathbf{I} - \mathbf{H})(\mathbf{y} - \mathbf{J}\boldsymbol{\gamma})\right],$$
where $\mathbf{H} = \mathbf{Z}\left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}$,
$$\mathbf{S}_i = \begin{bmatrix} (x_{11} - K_{h11})_+ & \cdots & (x_{11} - K_{h1u})_+ & \cdots & (x_{p1} - K_{hp1})_+ & \cdots & (x_{p1} - K_{hpu})_+ \\ (x_{12} - K_{h11})_+ & \cdots & (x_{12} - K_{h1u})_+ & \cdots & (x_{p2} - K_{hp1})_+ & \cdots & (x_{p2} - K_{hpu})_+ \\ \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots \\ (x_{1n} - K_{h11})_+ & \cdots & (x_{1n} - K_{h1u})_+ & \cdots & (x_{pn} - K_{hp1})_+ & \cdots & (x_{pn} - K_{hpu})_+ \end{bmatrix}, \quad \boldsymbol{\gamma} = \begin{bmatrix} \boldsymbol{\alpha}_1 \\ \vdots \\ \boldsymbol{\alpha}_r \\ \boldsymbol{\beta}_1 \\ \vdots \\ \boldsymbol{\beta}_r \end{bmatrix},$$
$$\boldsymbol{\alpha}_h = \begin{bmatrix} \alpha_{1h} & \alpha_{2h} & \cdots & \alpha_{ph} \end{bmatrix}^{T}, \quad \boldsymbol{\beta}_h = \begin{bmatrix} \beta_{h11} & \beta_{h12} & \cdots & \beta_{h1u} & \cdots & \beta_{hp1} & \beta_{hp2} & \cdots & \beta_{hpu} \end{bmatrix}^{T}.$$
As a note, the proof of Lemma 3 is provided in Appendix D.
Theorem 2.
If WLS optimization is given by Lemma 3, the regression curve estimation for multiresponse nonparametric regression can be attained from WLS optimization such that
$$\hat{\mathbf{f}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \mathbf{J}\left[\mathbf{J}^{T}(\mathbf{I} - 2\mathbf{H}^{T})\mathbf{W}\mathbf{J} + \mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{J}\right]^{-1}\left[\mathbf{J}^{T}(\mathbf{H}^{T} - \mathbf{I})\mathbf{W}(\mathbf{H} - \mathbf{I})\right]\mathbf{y} = \mathbf{J}\mathbf{K}^{-1}\mathbf{L}\mathbf{y},$$
where $\mathbf{K} = \mathbf{J}^{T}(\mathbf{I} - 2\mathbf{H}^{T})\mathbf{W}\mathbf{J} + \mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{J}$ and $\mathbf{L} = \mathbf{J}^{T}(\mathbf{H}^{T} - \mathbf{I})\mathbf{W}(\mathbf{H} - \mathbf{I})$.
The proof of Theorem 2 is given in Appendix E.
After obtaining f ^ ( K , λ , T ) ( x , z ) in Theorem 2, the result of g ^ ( K , λ , T ) ( x , z ) in Theorem 1 can be rewritten as Equation (12). First, substituting Equation (9) into Equation (A9) gives Equation (10):
$$\hat{\mathbf{a}} = \left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}\mathbf{v} = \left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}(\mathbf{y} - \mathbf{f}) = \left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}(\mathbf{y} - \mathbf{J}\mathbf{K}^{-1}\mathbf{L}\mathbf{y}) = \left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}(\mathbf{I} - \mathbf{J}\mathbf{K}^{-1}\mathbf{L})\mathbf{y}.$$
If the result of Equation (10) is substituted into Equation (A10), the regression curve estimation of the Fourier series can be rewritten as Equation (12). As a note, Equation (A9) and Equation (A10) can be found in Appendix C.
$$\hat{\mathbf{g}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \mathbf{Z}\hat{\mathbf{a}} = \mathbf{Z}\left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}(\mathbf{I} - \mathbf{J}\mathbf{K}^{-1}\mathbf{L})\mathbf{y} = \mathbf{H}(\mathbf{I} - \mathbf{J}\mathbf{K}^{-1}\mathbf{L})\mathbf{y}.$$
Another main finding of this study is the regression curve of the combined truncated spline and Fourier series estimators for multiresponse nonparametric regression, as presented in Corollary 1.
Corollary 1.
According to the main result in Equation (8) and Equation (11), the regression curve estimation of the combined truncated spline and Fourier series estimators for the additive model is presented below:
$$\hat{\boldsymbol{\mu}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \left[\mathbf{J}\mathbf{K}^{-1}\mathbf{L} + \mathbf{H}(\mathbf{I} - \mathbf{J}\mathbf{K}^{-1}\mathbf{L})\right]\mathbf{y}.$$
Proof. 
By using an additive model in Equation (3), the regression curve estimation of the combined estimator for multiresponse nonparametric regression can be written in the following matrix form:
$$\hat{\boldsymbol{\mu}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \hat{\mathbf{f}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) + \hat{\mathbf{g}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}).$$
Based on the results for $\hat{\mathbf{f}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z})$ in Equation (8) and $\hat{\mathbf{g}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z})$ in Equation (11), $\hat{\boldsymbol{\mu}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z})$ can be written as
$$\hat{\boldsymbol{\mu}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \mathbf{J}\left[\mathbf{J}^{T}(\mathbf{I} - 2\mathbf{H}^{T})\mathbf{W}\mathbf{J} + \mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{J}\right]^{-1}\left[\mathbf{J}^{T}(\mathbf{H}^{T} - \mathbf{I})\mathbf{W}(\mathbf{H} - \mathbf{I})\right]\mathbf{y} + \mathbf{Z}\left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}(\mathbf{I} - \mathbf{J}\mathbf{K}^{-1}\mathbf{L})\mathbf{y} = \mathbf{C}(K,\lambda,T)\,\mathbf{y}. \qquad (13)$$
For simplification, by substituting Equation (9) and Equation (12) into Equation (13), $\hat{\boldsymbol{\mu}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z})$ can also be expressed as
$$\hat{\boldsymbol{\mu}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \left[\mathbf{J}\mathbf{K}^{-1}\mathbf{L} + \mathbf{H}(\mathbf{I} - \mathbf{J}\mathbf{K}^{-1}\mathbf{L})\right]\mathbf{y}.$$
 □
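The two-stage estimation can also be traced numerically. The sketch below is our own illustration of Theorem 1, Theorem 2, and Corollary 1; it assumes the stacked design matrices $\mathbf{J}$ and $\mathbf{Z}$, the weighting matrix $\mathbf{W}$, and the penalty $\mathbf{D}(\lambda)$ have already been assembled, and simply composes the matrix expressions above.

```python
import numpy as np

def combined_estimator(y, J, Z, W, D, N):
    """Compose H, K, L and the fitted values of Corollary 1 from the stacked matrices."""
    # Stage 1 (Theorem 1): hat matrix of the Fourier series component, H = Z (Z'WZ + N D)^{-1} Z'W.
    H = Z @ np.linalg.solve(Z.T @ W @ Z + N * D, Z.T @ W)
    I = np.eye(len(y))
    # Stage 2 (Theorem 2): K and L of the truncated spline component, f_hat = J K^{-1} L y.
    K = J.T @ (I - 2 * H.T) @ W @ J + J.T @ H.T @ W @ H @ J
    L = J.T @ (H.T - I) @ W @ (H - I)
    f_hat = J @ np.linalg.solve(K, L @ y)
    # Fourier series fit with f_hat substituted back: g_hat = H (I - J K^{-1} L) y = H (y - f_hat).
    g_hat = H @ (y - f_hat)
    # Corollary 1: mu_hat = f_hat + g_hat = [J K^{-1} L + H (I - J K^{-1} L)] y.
    return f_hat + g_hat
```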

3.2. Estimation of Error Variance–Covariance Matrix

The result in Equation (13) shows that curve estimation of the proposed model leads to the use of an error variance–covariance matrix as a weighting matrix. Hence, estimation of the error variance–covariance matrix is presented in Theorem 3.
Theorem 3.
If the regression curve estimation is given by Corollary 1 and the random error $\boldsymbol{\varepsilon}$ is normally distributed with mean $\mathbf{0}$ and error variance–covariance matrix $\mathbf{W}$, such that it can be written as $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{W})$, the error variance–covariance matrix estimate is as follows:
$$\hat{\mathbf{W}} = \begin{pmatrix} \hat{\sigma}_{11} & \hat{\sigma}_{12} & \cdots & \hat{\sigma}_{1r} \\ \hat{\sigma}_{12} & \hat{\sigma}_{22} & \cdots & \hat{\sigma}_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\sigma}_{1r} & \hat{\sigma}_{2r} & \cdots & \hat{\sigma}_{rr} \end{pmatrix} \otimes \mathbf{I},$$
where
$$\hat{\sigma}_{hh} = \frac{\left[\mathbf{y}_h - (\mathbf{J}\mathbf{F} + \mathbf{Z}\mathbf{G})\mathbf{y}_h\right]^{T}\left[\mathbf{y}_h - (\mathbf{J}\mathbf{F} + \mathbf{Z}\mathbf{G})\mathbf{y}_h\right]}{n}, \quad h = 1, 2, \ldots, r,$$
$$\hat{\sigma}_{h\ell} = \frac{\left[\mathbf{y}_h - (\mathbf{J}\mathbf{F} + \mathbf{Z}\mathbf{G})\mathbf{y}_h\right]^{T}\left[\mathbf{y}_\ell - (\mathbf{J}\mathbf{F} + \mathbf{Z}\mathbf{G})\mathbf{y}_\ell\right]}{n}, \quad h \neq \ell;\ h, \ell = 1, 2, \ldots, r,$$
$$\mathbf{F} = \left[\mathbf{I} - (\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T}\mathbf{Z}(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\mathbf{J}\right]^{-1}\left[(\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T} - (\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T}\mathbf{Z}(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\right],$$
$$\mathbf{G} = \left[\mathbf{I} - (\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\mathbf{J}(\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T}\mathbf{Z}\right]^{-1}\left[(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T} - (\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\mathbf{J}(\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T}\right].$$
The proof of Theorem 3 is given in Appendix F.
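For illustration only (our own sketch under one reading of the theorem's notation, in which $(\mathbf{J}\mathbf{F} + \mathbf{Z}\mathbf{G})\mathbf{y}$ is the initial fit of the full stacked response and $\mathbf{y}_h$ is its $h$-th block of length $n$), the estimate $\hat{\mathbf{W}} = \hat{\boldsymbol{\Sigma}} \otimes \mathbf{I}_n$ can be computed as follows.

```python
import numpy as np

def estimate_W(y, J, Z, r, n):
    """Initial error variance-covariance estimate of Theorem 3: W_hat = Sigma_hat (Kronecker) I_n."""
    JtJ_inv_Jt = np.linalg.solve(J.T @ J, J.T)          # (J'J)^{-1} J'
    ZtZ_inv_Zt = np.linalg.solve(Z.T @ Z, Z.T)          # (Z'Z)^{-1} Z'
    Ij, Iz = np.eye(J.shape[1]), np.eye(Z.shape[1])
    F = np.linalg.solve(Ij - JtJ_inv_Jt @ Z @ ZtZ_inv_Zt @ J,
                        JtJ_inv_Jt - JtJ_inv_Jt @ Z @ ZtZ_inv_Zt)
    G = np.linalg.solve(Iz - ZtZ_inv_Zt @ J @ JtJ_inv_Jt @ Z,
                        ZtZ_inv_Zt - ZtZ_inv_Zt @ J @ JtJ_inv_Jt)
    fitted = (J @ F + Z @ G) @ y                        # initial fit (J F + Z G) y
    resid = (y - fitted).reshape(r, n)                  # one residual block per response
    Sigma_hat = resid @ resid.T / n                     # sigma_hat_{h,l} = e_h' e_l / n
    return np.kron(Sigma_hat, np.eye(n))                # W_hat = Sigma_hat (x) I_n
```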

3.3. Smoothing Parameter Selection

Another critical step in nonparametric regression modeling is selecting the optimal knots, oscillation parameter, and smoothing parameter. A large smoothing parameter produces a smoother estimated function but limits its ability to follow the data. In contrast, if the smoothing parameter is too small, the estimated function becomes rougher [33]. Hence, a suitable method is required to determine the optimal smoothing parameters. The smoothing parameter selection in this study was based on the generalized cross-validation (GCV) method, as in [33]; the detailed procedure and the optimality properties of this method are given in [34]. The modified GCV for the combined truncated spline and Fourier series estimators in the multiresponse nonparametric regression model is as follows:
$$GCV(K, \lambda, T) = \frac{MSE(K, \lambda, T)}{\left[N^{-1}\,\mathrm{trace}\left(\mathbf{I} - \mathbf{C}(K, \lambda, T)\right)\right]^{2}}, \qquad (14)$$
where
$$MSE(K, \lambda, T) = N^{-1}(\mathbf{y} - \hat{\boldsymbol{\mu}})^{T}(\mathbf{y} - \hat{\boldsymbol{\mu}}) = N^{-1}\left(\mathbf{y} - \mathbf{C}(K, \lambda, T)\mathbf{y}\right)^{T}\left(\mathbf{y} - \mathbf{C}(K, \lambda, T)\mathbf{y}\right) = N^{-1}\left\|\left(\mathbf{I} - \mathbf{C}(K, \lambda, T)\right)\mathbf{y}\right\|^{2}. \qquad (15)$$
By substituting Equation (15) into Equation (14), the optimal smoothing parameters are obtained by taking the minimum of the modified GCV for the proposed model, as presented below:
$$GCV(K_{opt}, \lambda_{opt}, T_{opt}) = \min_{K, \lambda, T}\left\{\frac{N^{-1}\left\|\left(\mathbf{I} - \mathbf{C}(K, \lambda, T)\right)\mathbf{y}\right\|^{2}}{\left[N^{-1}\,\mathrm{trace}\left(\mathbf{I} - \mathbf{C}(K, \lambda, T)\right)\right]^{2}}\right\},$$
where $\mathbf{C}(K, \lambda, T)$ is from the regression curve estimate of the proposed model, as in Equation (13).
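A minimal sketch (ours, with illustrative grids) of how the modified GCV in Equation (14) could be minimized by exhaustive search over the number of knots, the oscillation parameter, and the smoothing parameters; it assumes a hypothetical helper `build_C(K, lam, T)` that returns the matrix $\mathbf{C}(K, \lambda, T)$ of Equation (13) for a given candidate combination.

```python
import numpy as np
from itertools import product

def gcv_score(y, C):
    """Modified GCV of Eq. (14): N^{-1} ||(I - C) y||^2 / [N^{-1} trace(I - C)]^2."""
    N = len(y)
    R = np.eye(N) - C
    mse = np.sum((R @ y) ** 2) / N
    return mse / (np.trace(R) / N) ** 2

def select_parameters(y, build_C, knot_grid, lambda_grid, T_grid):
    """Exhaustive search for the (K, lambda, T) combination with the smallest GCV."""
    best = None
    for K, lam, T in product(knot_grid, lambda_grid, T_grid):
        score = gcv_score(y, build_C(K, lam, T))
        if best is None or score < best[0]:
            best = (score, K, lam, T)
    return best  # (minimum GCV, K_opt, lambda_opt, T_opt)
```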

3.4. Simulation Study

Simulations with 100 replications were carried out to verify and validate the theoretical results for the proposed model. Using three sample sizes ($n$ = 20, 50, and 100) and three different error variances ($\sigma^2$ = 0.1, 0.5, and 1), random data were generated for multiresponse nonparametric regression. As shown in Figure 1, each predictor variable describes a different function, i.e., a polynomial function represents the truncated spline ($x_i$), while a trigonometric function with a trend represents the Fourier series function ($z_i$). As a note, Figure 1 only presents a partial scatterplot of the numerical example for $n$ = 100 and $\sigma^2$ = 0.1. The simulation study followed the model in Equation (4), so the polynomial and trigonometric functions used in this numerical example were obtained from the following functions:
$$y_{1i} = 15(x_i - 1)(1 - x_i)^2 - 10.5 z_i - 4\cos(2\pi z_i) + \varepsilon_{1i},$$
$$y_{2i} = 14(x_i - 1)(1 - x_i)^2 - 7.5 z_i - 4\cos(2\pi z_i) + \varepsilon_{2i},$$
$$y_{3i} = 13(x_i - 1)(1 - x_i)^2 - 4.5 z_i - 4\cos(2\pi z_i) + \varepsilon_{3i}.$$
In particular, the predictor variables $x_i$ and $z_i$ were generated from a Uniform(0,1) distribution, while the random errors $\varepsilon_{hi}$ were generated from a multivariate normal distribution. In addition, $\varepsilon_{1i}$, $\varepsilon_{2i}$, and $\varepsilon_{3i}$ are correlated with $\mathrm{corr}(\varepsilon_{1i}, \varepsilon_{2i}) = \mathrm{corr}(\varepsilon_{1i}, \varepsilon_{3i}) = \mathrm{corr}(\varepsilon_{2i}, \varepsilon_{3i}) = \rho = 0.9$. For ease of computation, and based on the partial scatterplot identification in Figure 1, the simulation study was carried out with combinations of three numbers of knots (K = 1, 2, 3) and three oscillation parameters (T = 1, 2, 3). The minimum GCV criterion was used to determine the optimal smoothing parameter.
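The data-generating step of this simulation can be written as a short script. The following is our own sketch (the sign convention of the test functions follows the reconstruction above, and the equicorrelation construction of the error covariance is our reading of the stated design):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, rho = 100, 0.1, 0.9          # one of the simulated scenarios

x = rng.uniform(0.0, 1.0, n)            # truncated spline predictor
z = rng.uniform(0.0, 1.0, n)            # Fourier series predictor

# Correlated errors: equicorrelation rho between the three responses, common variance sigma2.
Sigma = sigma2 * (rho * np.ones((3, 3)) + (1 - rho) * np.eye(3))
eps = rng.multivariate_normal(np.zeros(3), Sigma, size=n)

spline_part = (x - 1) * (1 - x) ** 2
y1 = 15 * spline_part - 10.5 * z - 4 * np.cos(2 * np.pi * z) + eps[:, 0]
y2 = 14 * spline_part - 7.5 * z - 4 * np.cos(2 * np.pi * z) + eps[:, 1]
y3 = 13 * spline_part - 4.5 * z - 4 * np.cos(2 * np.pi * z) + eps[:, 2]
```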
Table 1 provides a comparison of the statistical results for $n$ = 100 across the error variances, the number of knots, and the oscillation parameters. A complete summary of the statistical results for $n$ = 20 and 50 is provided in Appendix G. Table 1 shows that the smallest GCV (6.098) occurs for variance $\sigma^2$ = 0.1 with the three knots–three oscillations combination. This model yields adequate results, with a coefficient of determination (R2) of 93.202% and a mean square error (MSE) of 5.675. The same pattern is seen for the other sample sizes ($n$ = 20 and 50), where the smallest GCV is also obtained for variance $\sigma^2$ = 0.1. In addition, a comparison between sample sizes shows that the smallest GCV is obtained from the simulation with the largest sample size ($n$ = 100). Thus, a smaller variance with a larger sample size produces a smaller minimum GCV, which implies that the regression curve is estimated better as well. These findings are consistent with [23], who developed mixed estimators in nonparametric regression for cross-sectional data.

3.5. Data Application

This section presents the application of the proposed model to a real dataset. The proposed model was applied to four indicators of HDI data in 38 regencies/cities across East Java Province, Indonesia, in 2018. To measure human development across regions, the United Nations Development Programme (UNDP) introduced the HDI in 1990. In the 1990 Human Development Report, three basic dimensions of human development were defined in the HDI: income (economy), health, and education [35]. From these dimensions, four indicators of HDI data were derived: life expectancy rate, expected years of schooling, mean years of schooling, and adjusted per capita expenditure [35]. In this study, these four indicators are the response variables $y_1$, $y_2$, $y_3$, and $y_4$, respectively.
The issue of the HDI has received considerable attention in every province/regency in Indonesia, as it is one of the determining components of the general allocation fund (DAU) [36]. As one of the largest provinces in Indonesia, East Java Province contributes significantly to Indonesia’s HDI. East Java Province’s HDI in 2018 was 70.77 [37], slightly lower than the national achievement. In addition, it was lower than those of the five other provinces on Java island over the last three years. Therefore, East Java Province’s HDI merits further study.
To determine whether the correlation matrix of the responses is an identity matrix (homogeneity of variances), we used Bartlett’s test of sphericity with significance level $\alpha$ = 0.05. The results showed a statistic of $\chi^2$ = 121.993 with p-value = 0.00, so the null hypothesis was rejected because the p-value < $\alpha$. According to this result, we can infer that there is a significant correlation among the response variables; hence, multiresponse nonparametric regression analysis can be used.
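As an illustration (our own sketch, not the authors' code), Bartlett's test of sphericity can be computed from the sample correlation matrix of the response variables; here `Y` is assumed to be the 38 × 4 matrix whose columns are the four HDI indicators.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(Y):
    """Bartlett's test that the correlation matrix of the columns of Y is an identity matrix."""
    n, p = Y.shape
    R = np.corrcoef(Y, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = stats.chi2.sf(chi2, df)
    return chi2, p_value

# Example usage with Y as the 38 x 4 response matrix (rejecting H0 supports a multiresponse model):
# chi2, p = bartlett_sphericity(Y)
```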
Several studies have highlighted the potential factors associated with Indonesia’s HDI, such as the percentage of people living in poverty [25,38], the unemployment rate [25,38,39], population density [25,38], and the per capita gross regional domestic product (GRDP) [25,39,40]. Based on these, the predictor variables used in this study were population density and the percentage of people living in poverty. In addition, the relationship between the response variables and each predictor variable was identified using partial scatterplots, as shown in Figure 2. The partial scatterplot between the four indicators of the HDI and population density ($x$) showed changes in behavior at particular subintervals, which fits the truncated spline function. Meanwhile, the partial scatterplot between the response variables and the percentage of people living in poverty ($z$) followed a pattern that repeats at certain intervals with a particular trend; thus, this variable was approached by the Fourier series function.
As shown in Table 2, several scenarios using a combined estimator (Model 1) and uncombined estimators (Model 2 and Model 3) were evaluated to compare the effectiveness of the proposed model. Using the modified GCV formula in Section 3.3, we selected the best model according to the minimum GCV criterion. A comparison of these three models revealed that the proposed model (Model 1) had the smallest GCV among the estimators for multiresponse nonparametric regression. In summary, these results indicate that the proposed model is recommended for modeling the 2018 HDI data of East Java Province.
Another important result from Table 2 is the comparison of the knot and oscillation combinations within the proposed model. As in the simulation studies, we used a combination of three knots and three oscillation parameters. According to the minimum GCV criterion, the leading model was the three knots–three oscillations model, with a GCV of 1.39189 and an MSE of 1.08377. In addition, this model yields a coefficient of determination (R2) of 99.84%. Given this satisfactory R2, the proposed model describes the variance of the response variables through the predictor variables exceptionally well. Interestingly, the knot and oscillation combination of the best model in the simulation study was in line with that of the data application.
Appendix H presents the results of parameter estimation from the best model obtained from the minimum GCV in Table 2. As such, we obtained a three knots–three oscillations model. Based on the parameter estimation results for each response variable for the 2018 HDI data of East Java Province, Indonesia, the multiresponse nonparametric regression model with a combined truncated spline and Fourier series estimator can be written as
$$\hat{y}_1 = -0.932 + 0.869 x_{1i} - 3057.409\,(x_{1i} - 3.718)_+ + 6107.433\,(x_{1i} - 3.933)_+ - 3050.878\,(x_{1i} - 4.148)_+ - 0.677 z_{1i} + \tfrac{1}{2}(1355.724) - 935.001\cos z_{1i} + 423.443\cos 2z_{1i} - 92.348\cos 3z_{1i}$$
$$= 676.929 + 0.869 x_{1i} - 3057.409\,(x_{1i} - 3.718)_+ + 6107.433\,(x_{1i} - 3.933)_+ - 3050.878\,(x_{1i} - 4.148)_+ - 0.677 z_{1i} - 935.001\cos z_{1i} + 423.443\cos 2z_{1i} - 92.348\cos 3z_{1i},$$
$$\hat{y}_2 = -0.747 + 0.393 x_{1i} - 619.153\,(x_{1i} - 3.718)_+ + 1235.608\,(x_{1i} - 3.933)_+ - 616.769\,(x_{1i} - 4.148)_+ - 0.180 z_{1i} + \tfrac{1}{2}(274.146) - 189.489\cos z_{1i} + 85.224\cos 2z_{1i} - 18.760\cos 3z_{1i}$$
$$= 136.326 + 0.393 x_{1i} - 619.153\,(x_{1i} - 3.718)_+ + 1235.608\,(x_{1i} - 3.933)_+ - 616.769\,(x_{1i} - 4.148)_+ - 0.180 z_{1i} - 189.489\cos z_{1i} + 85.224\cos 2z_{1i} - 18.760\cos 3z_{1i},$$
$$\hat{y}_3 = -0.102 + 0.865 x_{1i} - 510.292\,(x_{1i} - 3.718)_+ + 1015.208\,(x_{1i} - 3.933)_+ - 505.678\,(x_{1i} - 4.148)_+ - 0.411 z_{1i} + \tfrac{1}{2}(224.955) - 156.440\cos z_{1i} + 69.027\cos 2z_{1i} - 15.588\cos 3z_{1i}$$
$$= 112.375 + 0.865 x_{1i} - 510.292\,(x_{1i} - 3.718)_+ + 1015.208\,(x_{1i} - 3.933)_+ - 505.678\,(x_{1i} - 4.148)_+ - 0.411 z_{1i} - 156.440\cos z_{1i} + 69.027\cos 2z_{1i} - 15.588\cos 3z_{1i},$$
$$\hat{y}_4 = -0.482 + 0.969 x_{1i} - 684.712\,(x_{1i} - 3.718)_+ + 1363.397\,(x_{1i} - 3.933)_+ - 679.163\,(x_{1i} - 4.148)_+ - 0.456 z_{1i} + \tfrac{1}{2}(302.146) - 209.894\cos z_{1i} + 92.927\cos 2z_{1i} - 20.891\cos 3z_{1i}$$
$$= 150.591 + 0.969 x_{1i} - 684.712\,(x_{1i} - 3.718)_+ + 1363.397\,(x_{1i} - 3.933)_+ - 679.163\,(x_{1i} - 4.148)_+ - 0.456 z_{1i} - 209.894\cos z_{1i} + 92.927\cos 2z_{1i} - 20.891\cos 3z_{1i}.$$
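For illustration (our own helper, built from the estimated coefficients printed above), the fitted equation for $\hat{y}_1$ can be evaluated at given values of population density ($x_{1i}$) and the percentage of people living in poverty ($z_{1i}$):

```python
import numpy as np

def y1_hat(x, z):
    """Fitted first response (life expectancy rate) from the simplified form of the equation above."""
    tr = lambda v, K: np.maximum(v - K, 0.0)   # truncated spline term (v - K)_+
    return (676.929 + 0.869 * x
            - 3057.409 * tr(x, 3.718) + 6107.433 * tr(x, 3.933) - 3050.878 * tr(x, 4.148)
            - 0.677 * z
            - 935.001 * np.cos(z) + 423.443 * np.cos(2 * z) - 92.348 * np.cos(3 * z))
```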
Figure 3 presents a comparison between the response variables and the fitted values from the proposed model. The graph shows that most fitted values closely follow the pattern of the real data, with only a few deviations. Thus, the proposed model can be used for prediction.

4. Discussion and Conclusions

This study presents the major findings from regression curve estimation with a combined truncated spline and Fourier series estimator for an additive model in multiresponse nonparametric regression using PWLS and WLS optimization, as shown below:
$$\hat{\mathbf{f}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \mathbf{J}\left[\mathbf{J}^{T}(\mathbf{I} - 2\mathbf{H}^{T})\mathbf{W}\mathbf{J} + \mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{J}\right]^{-1}\left[\mathbf{J}^{T}(\mathbf{H}^{T} - \mathbf{I})\mathbf{W}(\mathbf{H} - \mathbf{I})\right]\mathbf{y},$$
$$\hat{\mathbf{g}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \mathbf{Z}\left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}(\mathbf{I} - \mathbf{J}\mathbf{K}^{-1}\mathbf{L})\mathbf{y},$$
$$\hat{\boldsymbol{\mu}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \left[\mathbf{J}\mathbf{K}^{-1}\mathbf{L} + \mathbf{H}(\mathbf{I} - \mathbf{J}\mathbf{K}^{-1}\mathbf{L})\right]\mathbf{y}.$$
Furthermore, the result of error variance–covariance matrix estimation is as follows:
$$\hat{\mathbf{W}} = \begin{pmatrix} \hat{\sigma}_{11} & \hat{\sigma}_{12} & \cdots & \hat{\sigma}_{1r} \\ \hat{\sigma}_{12} & \hat{\sigma}_{22} & \cdots & \hat{\sigma}_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\sigma}_{1r} & \hat{\sigma}_{2r} & \cdots & \hat{\sigma}_{rr} \end{pmatrix} \otimes \mathbf{I}.$$
From the simulation study, the minimum GCV was obtained for a large sample size with small error variance. In addition, application to a real dataset of the 2018 HDI in East Java revealed that the proposed model performed better than the uncombined models. An adequate coefficient of determination (R2) for the best model indicates that the proposed model can explain the data variation remarkably well. Interestingly, the number of knots and oscillations of the best model is consistent between the simulation study and the data application, namely a combination of three knots and three oscillations. In summary, these findings have significant implications for the understanding of regression curve estimation when using combined estimators for multiresponse nonparametric regression. Although the focus of the research was on combined truncated spline and Fourier series estimators, the procedure in this study can be applied to other combinations of estimators for multiresponse nonparametric regression.
The scope of this study was limited to curve estimation and estimation of the variance–covariance matrix; thus, its major limitation is the absence of hypothesis testing and confidence interval estimation. Considering their necessity for good statistical practice, goodness-of-fit testing (model checking) remains to be carried out. Therefore, further research will be conducted on hypothesis testing and confidence interval estimation. Another possible direction is a further simulation study using other functions, such as exponential or logarithmic functions, to validate the capability of the proposed model. Additional variation of $\rho$ in a simulation study could also be used to gain more insight into the performance of the proposed model.

Author Contributions

Conceptualization, H.N., I.N.B., and I.Z.; methodology, H.N. and I.N.B.; writing—original draft preparation, H.N.; writing—review and editing, H.N., I.N.B., and I.Z. All authors have read and agreed to the published version of the manuscript.

Funding

H.N. thanks BPS—Statistics Indonesia, which has bestowed a doctoral scholarship through APBN BPS. This research was funded by Deputi Bidang Penguatan Riset dan Pengembangan, Ministry of Research and Technology/National Research and Innovation Agency (Kemenristek or RISTEK-BRIN), the Republic of Indonesia, via grant Penelitian Disertasi Doktor (PDD) in 2021.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: BPS—Statistics of Jawa Timur Province, Indonesia, https://jatim.bps.go.id, accessed on 1 March 2021.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

GoF	goodness-of-fit
GCV	generalized cross-validation
MSE	mean square error
MLE	maximum likelihood estimation
OLS	ordinary least square
PLS	penalized least square
PWLS	penalized weighted least square
WLS	weighted least square

Appendix A

Lemma 1 can be proven by completing $R\left(g_{hk}(z_{k1}), g_{hk}(z_{k2}), \ldots, g_{hk}(z_{kn})\right)$ as the GoF of the PWLS optimization in Equation (7), as shown below:
$$R\left(g_{hk}(z_{k1}), g_{hk}(z_{k2}), \ldots, g_{hk}(z_{kn})\right) = N^{-1}\sum_{h=1}^{r}\sum_{i=1}^{n} w_{hi}\left(y_{hi} - \sum_{j=1}^{p} f_{hj}(x_{ji}) - \sum_{k=1}^{q} g_{hk}(z_{ki})\right)^{2}. \qquad (A1)$$
If $v_{hi} = y_{hi} - \sum_{j=1}^{p} f_{hj}(x_{ji})$, then Equation (A1) can be written as
$$R\left(g_{hk}(z_{k1}), g_{hk}(z_{k2}), \ldots, g_{hk}(z_{kn})\right) = N^{-1}\sum_{h=1}^{r}\sum_{i=1}^{n} w_{hi}\left(v_{hi} - \sum_{k=1}^{q} g_{hk}(z_{ki})\right)^{2},$$
where $g_{hk}(z_{ki})$ is the Fourier series function, as shown in Equation (6). Furthermore, the function $g_{hk}(z_{ki})$ with one predictor variable, indexed by $k$, can be written in the following matrix form:
$$\mathbf{g}_h = \begin{bmatrix} g_{hk}(z_{k1}) \\ g_{hk}(z_{k2}) \\ \vdots \\ g_{hk}(z_{kn}) \end{bmatrix} = \begin{bmatrix} b_{hk}z_{k1} + \tfrac{1}{2}a_{0hk} + a_{1hk}\cos z_{k1} + a_{2hk}\cos 2z_{k1} + \cdots + a_{Thk}\cos Tz_{k1} \\ b_{hk}z_{k2} + \tfrac{1}{2}a_{0hk} + a_{1hk}\cos z_{k2} + a_{2hk}\cos 2z_{k2} + \cdots + a_{Thk}\cos Tz_{k2} \\ \vdots \\ b_{hk}z_{kn} + \tfrac{1}{2}a_{0hk} + a_{1hk}\cos z_{kn} + a_{2hk}\cos 2z_{kn} + \cdots + a_{Thk}\cos Tz_{kn} \end{bmatrix}$$
$$= \begin{bmatrix} z_{k1} & 1/2 & \cos z_{k1} & \cos 2z_{k1} & \cdots & \cos Tz_{k1} \\ z_{k2} & 1/2 & \cos z_{k2} & \cos 2z_{k2} & \cdots & \cos Tz_{k2} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ z_{kn} & 1/2 & \cos z_{kn} & \cos 2z_{kn} & \cdots & \cos Tz_{kn} \end{bmatrix}\begin{bmatrix} b_{hk} \\ a_{0hk} \\ a_{1hk} \\ a_{2hk} \\ \vdots \\ a_{Thk} \end{bmatrix} = \mathbf{Z}_k\mathbf{a}_{hk}.$$
Similarly, for $k = 1, 2, \ldots, q$ predictors, the Fourier series function for the multiresponse nonparametric regression presented in Equation (A3) can be written as
$$\mathbf{g}_h = \mathbf{Z}_1\mathbf{a}_{h1} + \mathbf{Z}_2\mathbf{a}_{h2} + \cdots + \mathbf{Z}_q\mathbf{a}_{hq} = \sum_{k=1}^{q}\mathbf{Z}_k\mathbf{a}_{hk} = \mathbf{Z}_h\mathbf{a}_h,$$
such that
$$\mathbf{g} = \begin{bmatrix} \mathbf{g}_1^T & \mathbf{g}_2^T & \cdots & \mathbf{g}_r^T \end{bmatrix}^{T} = \begin{bmatrix} \mathbf{Z}_1\mathbf{a}_1 \\ \mathbf{Z}_2\mathbf{a}_2 \\ \vdots \\ \mathbf{Z}_r\mathbf{a}_r \end{bmatrix} = \begin{bmatrix} \mathbf{Z}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{Z}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{Z}_r \end{bmatrix}\begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_r \end{bmatrix} = \mathbf{Z}\mathbf{a}.$$
Thus, the GoF in Equation (7), where $g_{hk}(z_{ki})$ is the Fourier series function, can be expressed as
$$R\left(g_{hk}(z_{k1}), g_{hk}(z_{k2}), \ldots, g_{hk}(z_{kn})\right) = N^{-1}\sum_{h=1}^{r}\sum_{i=1}^{n} w_{hi}\left(v_{hi} - \sum_{k=1}^{q} g_{hk}(z_{ki})\right)^{2} = N^{-1}\sum_{h=1}^{r}\sum_{i=1}^{n} w_{hi}\left(v_{hi} - \sum_{k=1}^{q}\left(b_{hk}z_{ki} + \tfrac{1}{2}a_{0hk} + \sum_{t=1}^{T}a_{thk}\cos tz_{ki}\right)\right)^{2}.$$
As a result, the GoF component in Equation (A5) can be written in the following matrix form:
$$R(g_1, \ldots, g_r) = N^{-1}\left\|\mathbf{W}^{1/2}(\mathbf{v} - \mathbf{Z}\mathbf{a})\right\|^{2} = N^{-1}(\mathbf{v} - \mathbf{Z}\mathbf{a})^{T}\mathbf{W}(\mathbf{v} - \mathbf{Z}\mathbf{a}).$$

Appendix B

The penalty component of the PWLS optimization in Equation (7) is as follows:
$$\sum_{k=1}^{q}\lambda_k\int_0^{\pi}\frac{2}{\pi}\left(g_k''(z_k)\right)^{2}dz_k,$$
where $g_{hk}(z_{ki})$ is presented in Equation (6). To prove Lemma 2, let us begin by solving the second derivative in Equation (A6), as follows:
$$\int_0^{\pi}\frac{2}{\pi}\left(g_k''(z_k)\right)^{2}dz_k = \int_0^{\pi}\frac{2}{\pi}\left(\frac{d}{dz_k}\left[\frac{d}{dz_k}\left(b_{hk}z_{ki} + \tfrac{1}{2}a_{0hk} + \sum_{t=1}^{T}a_{thk}\cos tz_{ki}\right)\right]\right)^{2}dz_k$$
$$= \frac{2}{\pi}\int_0^{\pi}\left(\sum_{t=1}^{T}t^{2}a_{thk}\cos tz_{ki}\right)^{2}dz_k$$
$$= \frac{2}{\pi}\int_0^{\pi}\left\{\sum_{t=1}^{T}\left(t^{2}a_{thk}\cos tz_{ki}\right)^{2} + 2\sum_{t<u}^{T}\left(t^{2}a_{thk}\cos tz_{ki}\right)\left(u^{2}a_{uhk}\cos uz_{ki}\right)\right\}dz_k$$
$$= \frac{2}{\pi}\sum_{t=1}^{T}\int_0^{\pi}t^{4}a_{thk}^{2}\cos^{2}tz_{ki}\,dz_k + \frac{4}{\pi}\int_0^{\pi}\sum_{t<u}^{T}(tu)^{2}a_{thk}\cos tz_{ki}\,a_{uhk}\cos uz_{ki}\,dz_k$$
$$= \sum_{t=1}^{T}t^{4}a_{thk}^{2}.$$
After obtaining the result in Equation (A7), the solution to the penalty component in Equation (A6) can be written as
$$P(\lambda_1, \lambda_2, \ldots, \lambda_q) = \sum_{k=1}^{q}\lambda_k\int_0^{\pi}\frac{2}{\pi}\left(g_k''(z_k)\right)^{2}dz_k = \sum_{k=1}^{q}\left(\lambda_k\sum_{t=1}^{T}t^{4}a_{thk}^{2}\right) = \mathbf{a}^{T}\mathbf{D}(\lambda)\mathbf{a}.$$

Appendix C

According to the result of the GoF in Lemma 1 and the penalty in Lemma 2, the PWLS optimization in Equation (7) can be written as
$$\min_{g_k\in C[0,\pi]}\left\{N^{-1}\sum_{h=1}^{r}\sum_{i=1}^{n}w_{hi}\left(y_{hi} - \sum_{j=1}^{p}f_{hj}(x_{ji}) - \sum_{k=1}^{q}g_{hk}(z_{ki})\right)^{2} + \sum_{k=1}^{q}\lambda_k\int_0^{\pi}\frac{2}{\pi}\left(g_k''(z_k)\right)^{2}dz_k\right\}$$
$$= \min_{\mathbf{a}}\left\{N^{-1}(\mathbf{v} - \mathbf{Z}\mathbf{a})^{T}\mathbf{W}(\mathbf{v} - \mathbf{Z}\mathbf{a}) + \mathbf{a}^{T}\mathbf{D}(\lambda)\mathbf{a}\right\}.$$
If $F(\mathbf{a}) = N^{-1}(\mathbf{v} - \mathbf{Z}\mathbf{a})^{T}\mathbf{W}(\mathbf{v} - \mathbf{Z}\mathbf{a}) + \mathbf{a}^{T}\mathbf{D}(\lambda)\mathbf{a}$, then
$$F(\mathbf{a}) = N^{-1}\mathbf{v}^{T}\mathbf{W}\mathbf{v} - 2N^{-1}\mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{W}\mathbf{v} + N^{-1}\mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{W}\mathbf{Z}\mathbf{a} + \mathbf{a}^{T}\mathbf{D}(\lambda)\mathbf{a},$$
such that
$$\min_{\mathbf{a}}\left\{N^{-1}(\mathbf{v} - \mathbf{Z}\mathbf{a})^{T}\mathbf{W}(\mathbf{v} - \mathbf{Z}\mathbf{a}) + \mathbf{a}^{T}\mathbf{D}(\lambda)\mathbf{a}\right\} = \min_{\mathbf{a}}\left\{N^{-1}\mathbf{v}^{T}\mathbf{W}\mathbf{v} - 2N^{-1}\mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{W}\mathbf{v} + N^{-1}\mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{W}\mathbf{Z}\mathbf{a} + \mathbf{a}^{T}\mathbf{D}(\lambda)\mathbf{a}\right\}.$$
To solve the optimization in Equation (A8), the estimator $\hat{\mathbf{a}}$ can be obtained by taking the partial derivative of Equation (A8) with respect to $\mathbf{a}$ and setting the result equal to $\mathbf{0}$, as follows:
$$\frac{\partial F(\mathbf{a})}{\partial\mathbf{a}} = \frac{\partial}{\partial\mathbf{a}}\left[N^{-1}\mathbf{v}^{T}\mathbf{W}\mathbf{v} - 2N^{-1}\mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{W}\mathbf{v} + N^{-1}\mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{W}\mathbf{Z}\mathbf{a} + \mathbf{a}^{T}\mathbf{D}(\lambda)\mathbf{a}\right] = \mathbf{0},$$
$$\hat{\mathbf{a}} = \left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}\mathbf{v}. \qquad (A9)$$
The subsequent step is to substitute $\hat{\mathbf{a}}$ in Equation (A9) into Equation (A4), such that the estimator of the Fourier series function can be written as
$$\hat{\mathbf{g}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \mathbf{Z}\hat{\mathbf{a}} = \mathbf{Z}\left[\mathbf{Z}^{T}\mathbf{W}\mathbf{Z} + N\mathbf{D}(\lambda)\right]^{-1}\mathbf{Z}^{T}\mathbf{W}\mathbf{v} = \mathbf{H}\mathbf{v}. \qquad (A10)$$

Appendix D

The solution in Theorem 1 still contains an $\mathbf{f}$ component, where $\mathbf{f}$ is a linear truncated spline function, as described in Equation (5). Hence, to complete the PWLS optimization in Equation (7), it is necessary to estimate the regression curve $f_{hj}(x_{ji})$. First, the multiresponse nonparametric regression curve of the truncated spline component in Equation (5), which involves only one predictor, is written in the following matrix form:
$$\mathbf{f}_h = \begin{bmatrix} f_{hj}(x_{j1}) \\ f_{hj}(x_{j2}) \\ \vdots \\ f_{hj}(x_{jn}) \end{bmatrix} = \begin{bmatrix} \alpha_{hj}x_{j1} + \sum_{s=1}^{u}\beta_{hjs}(x_{j1} - K_{hjs})_+ \\ \alpha_{hj}x_{j2} + \sum_{s=1}^{u}\beta_{hjs}(x_{j2} - K_{hjs})_+ \\ \vdots \\ \alpha_{hj}x_{jn} + \sum_{s=1}^{u}\beta_{hjs}(x_{jn} - K_{hjs})_+ \end{bmatrix}$$
$$= \begin{bmatrix} x_{j1} \\ x_{j2} \\ \vdots \\ x_{jn} \end{bmatrix}\alpha_{hj} + \begin{bmatrix} (x_{j1} - K_{hj1})_+ & (x_{j1} - K_{hj2})_+ & \cdots & (x_{j1} - K_{hju})_+ \\ (x_{j2} - K_{hj1})_+ & (x_{j2} - K_{hj2})_+ & \cdots & (x_{j2} - K_{hju})_+ \\ \vdots & \vdots & \ddots & \vdots \\ (x_{jn} - K_{hj1})_+ & (x_{jn} - K_{hj2})_+ & \cdots & (x_{jn} - K_{hju})_+ \end{bmatrix}\begin{bmatrix} \beta_{hj1} \\ \beta_{hj2} \\ \vdots \\ \beta_{hju} \end{bmatrix} = \mathbf{x}_{ji}\alpha_{hj} + \mathbf{S}_{ji}\boldsymbol{\beta}_{hj}.$$
Consequently, $\mathbf{f}_h$ with $j = 1, 2, \ldots, p$ predictors can be written as follows:
$$\mathbf{f}_h = \mathbf{x}_{1i}\alpha_{1h} + \mathbf{S}_{1i}\boldsymbol{\beta}_{1h} + \mathbf{x}_{2i}\alpha_{2h} + \mathbf{S}_{2i}\boldsymbol{\beta}_{2h} + \cdots + \mathbf{x}_{pi}\alpha_{ph} + \mathbf{S}_{pi}\boldsymbol{\beta}_{ph} = \mathbf{X}_i\boldsymbol{\alpha}_h + \mathbf{S}_i\boldsymbol{\beta}_h.$$
According to Equation (A13), the truncated spline function for multiresponse nonparametric regression can be expressed in matrix form, as follows:
$$\mathbf{f} = \begin{bmatrix} \mathbf{f}_1^T & \mathbf{f}_2^T & \cdots & \mathbf{f}_r^T \end{bmatrix}^{T} = \begin{bmatrix} \mathbf{X}_1\boldsymbol{\alpha}_1 \\ \mathbf{X}_2\boldsymbol{\alpha}_2 \\ \vdots \\ \mathbf{X}_r\boldsymbol{\alpha}_r \end{bmatrix} + \begin{bmatrix} \mathbf{S}_1\boldsymbol{\beta}_1 \\ \mathbf{S}_2\boldsymbol{\beta}_2 \\ \vdots \\ \mathbf{S}_r\boldsymbol{\beta}_r \end{bmatrix} = \begin{bmatrix} \mathbf{X}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{X}_r \end{bmatrix}\begin{bmatrix} \boldsymbol{\alpha}_1 \\ \boldsymbol{\alpha}_2 \\ \vdots \\ \boldsymbol{\alpha}_r \end{bmatrix} + \begin{bmatrix} \mathbf{S}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{S}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{S}_r \end{bmatrix}\begin{bmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \\ \vdots \\ \boldsymbol{\beta}_r \end{bmatrix}$$
$$= \left[\mathbf{X}\,\middle|\,\mathbf{S}\right]\begin{bmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{bmatrix} = \mathbf{J}\boldsymbol{\gamma}.$$
To solve the regression curve estimation of the truncated spline and Fourier series estimators for multiresponse nonparametric regression with PWLS optimization, substitute Equations (A11) and (A14) into Equation (4), as follows:
$$\mathbf{y} = \mathbf{f} + \mathbf{g} + \boldsymbol{\varepsilon}; \quad \mathbf{g} = \mathbf{H}\mathbf{v},$$
$$\boldsymbol{\varepsilon} = \mathbf{y} - \mathbf{f} - \mathbf{H}\mathbf{v}; \quad \mathbf{v} = \mathbf{y} - \mathbf{f}$$
$$= \mathbf{y} - \mathbf{f} - \mathbf{H}(\mathbf{y} - \mathbf{f}); \quad \mathbf{f} = \mathbf{J}\boldsymbol{\gamma}$$
$$= (\mathbf{I} - \mathbf{H})(\mathbf{y} - \mathbf{J}\boldsymbol{\gamma}).$$
The subsequent step is to solve Equation (A15) with WLS optimization. Therefore, the estimator can be obtained by completing the following WLS:
$$\boldsymbol{\varepsilon}^{T}\mathbf{W}\boldsymbol{\varepsilon} = \left[(\mathbf{I} - \mathbf{H})(\mathbf{y} - \mathbf{J}\boldsymbol{\gamma})\right]^{T}\mathbf{W}\left[(\mathbf{I} - \mathbf{H})(\mathbf{y} - \mathbf{J}\boldsymbol{\gamma})\right].$$

Appendix E

The second stage of the proposed method is performed using WLS optimization, as follows:
$$\min_{\boldsymbol{\gamma}}\left\{\left[(\mathbf{I} - \mathbf{H})(\mathbf{y} - \mathbf{J}\boldsymbol{\gamma})\right]^{T}\mathbf{W}\left[(\mathbf{I} - \mathbf{H})(\mathbf{y} - \mathbf{J}\boldsymbol{\gamma})\right]\right\}.$$
If $T(\boldsymbol{\gamma}) = \left[\mathbf{y} - \mathbf{H}\mathbf{y} - \mathbf{J}\boldsymbol{\gamma} + \mathbf{H}\mathbf{J}\boldsymbol{\gamma}\right]^{T}\mathbf{W}\left[\mathbf{y} - \mathbf{H}\mathbf{y} - \mathbf{J}\boldsymbol{\gamma} + \mathbf{H}\mathbf{J}\boldsymbol{\gamma}\right]$, then performing the multiplication in the brackets gives
$$T(\boldsymbol{\gamma}) = \mathbf{y}^{T}\mathbf{W}\mathbf{y} + \mathbf{y}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{y} + \boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{W}\mathbf{J}\boldsymbol{\gamma} + \boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{J}\boldsymbol{\gamma} - 2\mathbf{y}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{y} - 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{W}\mathbf{y} + 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{y} + 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{W}\mathbf{H}\mathbf{y} - 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{y} - 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{J}\boldsymbol{\gamma}.$$
WLS optimization is solved by taking the partial derivative of $T(\boldsymbol{\gamma})$ with respect to $\boldsymbol{\gamma}$. Setting this expression equal to $\mathbf{0}$, the result is Equation (A16):
$$\frac{\partial T(\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}} = \frac{\partial}{\partial\boldsymbol{\gamma}}\left(\mathbf{y}^{T}\mathbf{W}\mathbf{y} + \mathbf{y}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{y} + \boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{W}\mathbf{J}\boldsymbol{\gamma} + \boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{J}\boldsymbol{\gamma} - 2\mathbf{y}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{y} - 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{W}\mathbf{y} + 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{y} + 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{W}\mathbf{H}\mathbf{y} - 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{y} - 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{J}\boldsymbol{\gamma}\right) = \mathbf{0},$$
$$\hat{\boldsymbol{\gamma}} = \left[\mathbf{J}^{T}(\mathbf{I} - 2\mathbf{H}^{T})\mathbf{W}\mathbf{J} + \mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{J}\right]^{-1}\left[\mathbf{J}^{T}(\mathbf{H}^{T} - \mathbf{I})\mathbf{W}(\mathbf{H} - \mathbf{I})\right]\mathbf{y}. \qquad (A16)$$
Substituting Equation (A16) into Equation (A14) yields
$$\hat{\mathbf{f}}_{(K,\lambda,T)}(\mathbf{x}, \mathbf{z}) = \mathbf{J}\hat{\boldsymbol{\gamma}} = \mathbf{J}\left[\mathbf{J}^{T}(\mathbf{I} - 2\mathbf{H}^{T})\mathbf{W}\mathbf{J} + \mathbf{J}^{T}\mathbf{H}^{T}\mathbf{W}\mathbf{H}\mathbf{J}\right]^{-1}\left[\mathbf{J}^{T}(\mathbf{H}^{T} - \mathbf{I})\mathbf{W}(\mathbf{H} - \mathbf{I})\right]\mathbf{y} = \mathbf{J}\mathbf{K}^{-1}\mathbf{L}\mathbf{y}.$$

Appendix F

Assume random error $\boldsymbol{\varepsilon}$ in Equation (2) is normally distributed with mean $\mathbf{0}$ and error variance–covariance matrix $\mathbf{W}$, such that it can be written as $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{W})$. Thus, matrix $\mathbf{W}$ is outlined as follows:
$$\mathbf{W} = \mathrm{Var}(\boldsymbol{\varepsilon}) = E\left[(\boldsymbol{\varepsilon} - E(\boldsymbol{\varepsilon}))(\boldsymbol{\varepsilon} - E(\boldsymbol{\varepsilon}))^{T}\right] = E\left[\begin{pmatrix} \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1n} & \varepsilon_{21} & \cdots & \varepsilon_{2n} & \cdots & \varepsilon_{r1} & \cdots & \varepsilon_{rn} \end{pmatrix}^{T}\begin{pmatrix} \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1n} & \varepsilon_{21} & \cdots & \varepsilon_{2n} & \cdots & \varepsilon_{r1} & \cdots & \varepsilon_{rn} \end{pmatrix}\right]$$
$$= E\begin{pmatrix} \varepsilon_{11}^{2} & \varepsilon_{11}\varepsilon_{12} & \cdots & \varepsilon_{11}\varepsilon_{1n} & \varepsilon_{11}\varepsilon_{21} & \cdots & \varepsilon_{11}\varepsilon_{rn} \\ \varepsilon_{12}\varepsilon_{11} & \varepsilon_{12}^{2} & \cdots & \varepsilon_{12}\varepsilon_{1n} & \varepsilon_{12}\varepsilon_{21} & \cdots & \varepsilon_{12}\varepsilon_{rn} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \varepsilon_{rn}\varepsilon_{11} & \varepsilon_{rn}\varepsilon_{12} & \cdots & \varepsilon_{rn}\varepsilon_{1n} & \varepsilon_{rn}\varepsilon_{21} & \cdots & \varepsilon_{rn}^{2} \end{pmatrix}.$$
The correlation between response variables, defined as $\mathrm{corr}(\varepsilon_{hi}, \varepsilon_{\ell i})$, is assumed to be constant and equal to $\rho$ for all pairs of response variables, together with the following assumptions:
$$E(\varepsilon_{hi}\varepsilon_{hi'}) = \begin{cases} \sigma_{hh}, & \text{if } i = i' \\ 0, & \text{if } i \neq i' \end{cases} \quad \text{where } h = 1, 2, \ldots, r; \qquad E(\varepsilon_{hi}\varepsilon_{\ell i'}) = \begin{cases} \sigma_{h\ell}, & \text{if } i = i' \\ 0, & \text{if } i \neq i' \end{cases} \quad \text{with } \sigma_{h\ell} = \rho\sqrt{\sigma_{hh}\sigma_{\ell\ell}}.$$
According to the above assumption, Equation (A19) can be written as
$$\mathbf{W} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1r} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1r} & \sigma_{2r} & \cdots & \sigma_{rr} \end{pmatrix} \otimes \mathbf{I} = \boldsymbol{\Sigma} \otimes \mathbf{I}.$$
The estimation of matrix W is obtained by the maximum likelihood estimation (MLE) method, as in Equation (A20). Furthermore, substitute Equation (A10) and Equation (A17) into μ ( x , z ) ; then, the likelihood presented in Equation (A22) is as follows:
$$L(\mathbf{W}\mid\boldsymbol{\mu},\mathbf{y}) = (2\pi)^{-rn/2}\,|\mathbf{W}|^{-n/2}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\mathbf{y} - \boldsymbol{\mu}(\mathbf{x},\mathbf{z})\right)^{T}\mathbf{W}^{-1}\left(\mathbf{y} - \boldsymbol{\mu}(\mathbf{x},\mathbf{z})\right)\right]$$
$$= (2\pi)^{-rn/2}\,|\mathbf{W}|^{-n/2}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\mathbf{y} - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})\right)^{T}\mathbf{W}^{-1}\left(\mathbf{y} - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})\right)\right].$$
By taking the natural logarithm from the likelihood function and substituting W , as in Equation (A20), the result is as follows:
$$\ln L(\boldsymbol{\Sigma}) = -\frac{rn}{2}\ln(2\pi) - \frac{n}{2}\ln|\boldsymbol{\Sigma}\otimes\mathbf{I}| - \frac{1}{2}\left[\left(\mathbf{y} - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})\right)^{T}(\boldsymbol{\Sigma}\otimes\mathbf{I})^{-1}\left(\mathbf{y} - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})\right)\right].$$
Arranging $\left(\mathbf{y} - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})\right)$ into the form $\mathrm{vec}(\mathbf{C})$ gives
$$\left(\mathbf{y} - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})\right)^{T}(\boldsymbol{\Sigma}\otimes\mathbf{I})^{-1}\left(\mathbf{y} - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})\right) = (\mathrm{vec}(\mathbf{C}))^{T}(\boldsymbol{\Sigma}\otimes\mathbf{I})^{-1}(\mathrm{vec}(\mathbf{C})) = \mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{C}^{T}\mathbf{C}),$$
such that Equation (A23) can be written as
$$\ln L(\boldsymbol{\Sigma}\mid\boldsymbol{\gamma},\mathbf{a},\mathbf{y}) = -\frac{rn}{2}\ln(2\pi) - \frac{n}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{C}^{T}\mathbf{C}).$$
The maximum of the likelihood function is obtained by taking the partial derivative of Equation (A24) with respect to $\sigma_{h\ell}$ and setting the result equal to 0, as follows:
$$\frac{\partial}{\partial\sigma_{h\ell}}\ln L(\hat{\boldsymbol{\Sigma}}\mid\boldsymbol{\gamma},\mathbf{a},\mathbf{y}) = -\frac{n}{2}\left(\frac{\partial}{\partial\sigma_{h\ell}}\ln|\hat{\boldsymbol{\Sigma}}|\right) - \frac{1}{2}\frac{\partial}{\partial\sigma_{h\ell}}\mathrm{tr}(\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{C}^{T}\mathbf{C}) = 0,$$
such that
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\left(\mathbf{C}^{T}\mathbf{C}\right) = \begin{pmatrix} \frac{[\mathbf{y}_1 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]^{T}[\mathbf{y}_1 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]}{n} & \frac{[\mathbf{y}_1 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]^{T}[\mathbf{y}_2 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]}{n} & \cdots & \frac{[\mathbf{y}_1 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]^{T}[\mathbf{y}_r - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]}{n} \\ \frac{[\mathbf{y}_1 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]^{T}[\mathbf{y}_2 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]}{n} & \frac{[\mathbf{y}_2 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]^{T}[\mathbf{y}_2 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]}{n} & \cdots & \frac{[\mathbf{y}_2 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]^{T}[\mathbf{y}_r - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]}{n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{[\mathbf{y}_1 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]^{T}[\mathbf{y}_r - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]}{n} & \frac{[\mathbf{y}_2 - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]^{T}[\mathbf{y}_r - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]}{n} & \cdots & \frac{[\mathbf{y}_r - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]^{T}[\mathbf{y}_r - (\mathbf{J}\boldsymbol{\gamma} + \mathbf{Z}\mathbf{a})]}{n} \end{pmatrix}.$$
The subsequent step is to estimate γ and a using the ordinary least square (OLS) method, with the result as follows:
$$\frac{\partial}{\partial\boldsymbol{\gamma}}\left[\mathbf{y}^{T}\mathbf{y} - 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{y} - 2\mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{y} + 2\mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{J}\boldsymbol{\gamma} + \boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{J}\boldsymbol{\gamma} + \mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{Z}\mathbf{a}\right] = \mathbf{0},$$
$$\hat{\boldsymbol{\gamma}}_s = \left[\mathbf{I} - (\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T}\mathbf{Z}(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\mathbf{J}\right]^{-1}\left[(\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T} - (\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T}\mathbf{Z}(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\right]\mathbf{y} = \mathbf{F}\mathbf{y},$$
$$\frac{\partial}{\partial\mathbf{a}}\left[\mathbf{y}^{T}\mathbf{y} - 2\boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{y} - 2\mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{y} + 2\mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{J}\boldsymbol{\gamma} + \boldsymbol{\gamma}^{T}\mathbf{J}^{T}\mathbf{J}\boldsymbol{\gamma} + \mathbf{a}^{T}\mathbf{Z}^{T}\mathbf{Z}\mathbf{a}\right] = \mathbf{0},$$
$$\hat{\mathbf{a}}_s = \left[\mathbf{I} - (\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\mathbf{J}(\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T}\mathbf{Z}\right]^{-1}\left[(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T} - (\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\mathbf{J}(\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T}\right]\mathbf{y} = \mathbf{G}\mathbf{y}.$$
Substituting the results for $\hat{\boldsymbol{\gamma}}_s$ and $\hat{\mathbf{a}}_s$ above into Equation (A25), the error variance–covariance matrix $\mathbf{W}$ in Equation (A20) can be written as follows:
$$\hat{\mathbf{W}} = \hat{\boldsymbol{\Sigma}} \otimes \mathbf{I},$$
where
$$\hat{\sigma}_{hh} = \frac{\left[\mathbf{y}_h - (\mathbf{J}\mathbf{F} + \mathbf{Z}\mathbf{G})\mathbf{y}_h\right]^{T}\left[\mathbf{y}_h - (\mathbf{J}\mathbf{F} + \mathbf{Z}\mathbf{G})\mathbf{y}_h\right]}{n}, \quad h = 1, 2, \ldots, r,$$
$$\hat{\sigma}_{h\ell} = \frac{\left[\mathbf{y}_h - (\mathbf{J}\mathbf{F} + \mathbf{Z}\mathbf{G})\mathbf{y}_h\right]^{T}\left[\mathbf{y}_\ell - (\mathbf{J}\mathbf{F} + \mathbf{Z}\mathbf{G})\mathbf{y}_\ell\right]}{n}, \quad h \neq \ell;\ h, \ell = 1, 2, \ldots, r.$$

Appendix G

Table A1. Summary of statistical results for n = 20 and 50 with error variances, the number of knots, and oscillation parameters.

Variance  Oscillations  Knots   GCV (n=20)  R2 (n=20)  MSE (n=20)   GCV (n=50)  R2 (n=50)  MSE (n=50)
0.1       1             1       7.023       88.665     4.527        7.207       90.170     6.735
          1             2       6.859       89.866     4.080        6.848       90.607     6.312
          1             3       6.470       91.672     3.336        6.685 *     91.172     6.075
          2             1       6.777       87.649     5.036        7.074       91.133     5.960
          2             2       6.386 *     88.288     4.712        6.972       91.609     5.638
          2             3       6.388       89.881     4.085        7.165       91.835     5.489
          3             1       6.535       86.398     5.294        7.074       91.133     5.959
          3             2       6.601       88.112     5.151        6.972       91.610     5.638
          3             3       6.617       87.929     4.970        7.166       91.833     5.489
0.5       1             1       9.185       93.280     5.667        9.217       90.952     7.783
          1             2       10.010      93.303     5.655        7.141       93.293     5.769
          1             3       10.139      93.824     5.219        7.265       93.384     5.692
          2             1       7.970       92.470     6.455        9.215       90.933     7.797
          2             2       8.193       92.538     6.393        7.133       93.284     5.775
          2             3       8.012       92.896     6.018        7.293       93.436     5.645
          3             1       10.772      92.210     6.599        9.215       90.930     7.799
          3             2       9.263       93.328     5.655        7.132       93.283     5.776
          3             3       8.396       92.557     6.306        7.293       93.433     5.647
1.0       1             1       9.256       92.935     5.719        7.559       94.734     7.064
          1             2       10.136      93.218     5.491        7.319       94.972     6.746
          1             3       10.241      93.751     5.058        6.890       95.332     6.262
          2             1       8.019       91.988     6.495        7.585       95.227     6.401
          2             2       7.838       92.518     6.116        7.461       95.508     6.024
          2             3       7.330       93.445     5.506        7.380       95.753     5.698
          3             1       10.593      91.666     6.763        7.490       94.862     6.903
          3             2       7.668       92.629     5.983        7.175       95.155     6.521
          3             3       8.697       93.833     5.013        6.968       95.351     6.245
* The smallest value of GCV.

Appendix H

Table A2. Results of parameter estimation of the best model.

Response  Parameter  Estimate      Response  Parameter  Estimate
y1        α_01       −0.932        y3        α_03       −0.102
          α_11        0.869                  α_31        0.865
          β_111      −3057.409               β_311      −510.292
          β_112       6107.433               β_312       1015.208
          β_113      −3050.878               β_313      −505.678
          b_11       −0.677                  b_31       −0.411
          a_011       1355.724               a_031       224.955
          a_111      −935.001                a_311      −156.440
          a_112       423.443                a_312       69.027
          a_113      −92.348                 a_313      −15.588
y2        α_02       −0.747        y4        α_04       −0.482
          α_21        0.393                  α_41        0.969
          β_211      −619.153                β_411      −684.712
          β_212       1235.608               β_412       1363.397
          β_213      −616.769                β_413      −679.163
          b_21       −0.180                  b_41       −0.456
          a_021       274.146                a_041       302.146
          a_211      −189.489                a_411      −209.894
          a_212       85.224                 a_412       92.927
          a_213      −18.760                 a_413      −20.891

References

  1. Härdle, W. Applied Nonparametric Regression; Cambridge University Press: Cambridge, UK, 1990. [Google Scholar]
  2. Eubank, R.L. Nonparametric Regression and Spline Smoothing, 2nd ed.; Marcel Dekker, Inc.: New York, NY, USA, 1999. [Google Scholar]
  3. Budakçı, G.; Oruç, H. Further Properties of Quantum Spline Spaces. Mathematics 2020, 8, 1770. [Google Scholar] [CrossRef]
  4. Du, R.; Yamada, H. Principle of Duality in Cubic Smoothing Spline. Mathematics 2020, 8, 1839. [Google Scholar] [CrossRef]
  5. Lapshin, V. A nonparametric approach to bond portfolio immunization. Mathematics 2019, 7, 1121. [Google Scholar] [CrossRef] [Green Version]
  6. Nurcahayani, H.; Budiantara, I.N.; Zain, I. Nonparametric Truncated Spline Regression on Modelling Mean Years Schooling of Regencies in Java. AIP Conf. Proc. 2019, 2194, 020073. [Google Scholar] [CrossRef]
  7. Yu, W.; Yong, Y.; Guan, G.; Huang, Y.; Su, W.; Cui, C. Valuing guaranteed minimum death benefits by cosine series expansion. Mathematics 2019, 7, 835. [Google Scholar] [CrossRef] [Green Version]
  8. Kim, J.; Hart, J.D. A Change-Point Estimator Using Local Fourier series. J. Nonparametr. Stat. 2011, 23, 83–98. [Google Scholar] [CrossRef]
  9. Bilodeau, M. Fourier Smoother and Additive Models. Can. J. Stat. 1992, 20, 257–269. [Google Scholar] [CrossRef]
  10. Yang, Y.; Pilanci, M.; Wainwright, M.J. Randomized Sketches for Kernels: Fast and Optimal Nonparametric Regression. Ann. Stat. 2017, 45, 991–1023. [Google Scholar] [CrossRef] [Green Version]
  11. Zhao, G.; Ma, Y. Robust Nonparametric Kernel Regression Estimator. Stat. Probab. Lett. 2016, 116, 72–79. [Google Scholar] [CrossRef]
  12. Kayri, M.; Zırhlıoğlu, G. Kernel Smoothing Function and Choosing Bandwidth for Non-Parametric Regression Methods. Ozean J. Appl. Sci. 2009, 2, 49–54. [Google Scholar]
  13. Syengo, C.K.; Pyeye, S.; Orwa, G.O.; Odhiambo, R.O. Local Polynomial Regression Estimator of the Finite Population Total under Stratified Random Sampling: A Model-Based Approach. Open J. Stat. 2016, 6, 1085–1097. [Google Scholar] [CrossRef] [Green Version]
  14. Chamidah, N.; Budiantara, I.N.; Sunaryo, S.; Zain, I. Designing of Child Growth Chart Based on Multi-Response Local Polynomial Modeling. J. Math. Stat. 2012, 8, 342–347. [Google Scholar]
  15. Opsomer, J.D.; Ruppert, D. Fitting a Bivariate Additive Model by Local Polynomial Regression. Ann. Stat. 1997, 25, 186–211. [Google Scholar] [CrossRef]
  16. Maronge, J.M.; Zhai, Y.; Wiens, D.P.; Fang, Z. Optimal designs for spline wavelet regression models. J. Stat. Plan. Inference 2017, 184, 94–104. [Google Scholar] [CrossRef] [Green Version]
  17. Antoniadis, A.; Bigot, J.; Sapatinas, T. Wavelet estimators in nonparametric regression: A comparative simulation study. J. Stat. Softw. 2001, 6, 1–83. [Google Scholar] [CrossRef] [Green Version]
  18. Antoniadis, A.; Leblanc, F. Nonparametric Wavelet Regression for Binary Response. Statistics 2000, 34, 183–213. [Google Scholar] [CrossRef]
  19. Sifriyani; Budiantara, I.N.; Kartiko, S.H.; Gunardi. Evaluation of Factors Affecting Increased Unemployment in East Java Using NGWR-TS Method. Int. J. Sci. Basic Appl. Res. 2019, 46, 123–142. [Google Scholar]
  20. Sifriyani; Kartiko, S.H.; Budiantara, I.N.; Gunardi. Development of Nonparametric Geographically Weighted Regression using Truncated Spline Approach. Songklanakarin J. Sci. Technol. 2018, 40, 909–920. [Google Scholar]
  21. Budiantara, I.N.; Ratnasari, V.; Ratna, M.; Zain, I. The Combination of Spline and Kernel Estimator for Nonparametric Regression and Its Properties. Appl. Math. Sci. 2015, 9, 6083–6094. [Google Scholar] [CrossRef]
  22. Ratnasari, V.; Budiantara, I.N.; Ratna, M.; Zain, I. Estimation of Nonparametric Regression Curve using Mixed Estimator of Multivariable Truncated Spline and Multivariable Kernel. Glob. J. Pure Appl. Math. 2016, 12, 5047–5057. [Google Scholar]
  23. Hidayat, R.; Budiantara, I.N.; Otok, B.W.; Ratnasari, V. The regression curve estimation by using mixed smoothing spline and kernel (MsS-K) model. Commun. Stat. Theory Methods 2020. [Google Scholar] [CrossRef]
  24. Budiantara, I.N.; Ratnasari, V.; Ratna, M.; Wibowo, W.; Afifah, N.; Rahmawati, D.P.; Octavanny, M.A.D. Modeling Percentage of Poor People In Indonesia Using Kernel And Fourier Series Mixed Estimator In Nonparametric Regression. Rev. Investig. Operacional. 2019, 40, 538–550. [Google Scholar]
  25. Nurcahayani, H.; Budiantara, I.N.; Zain, I. The Semiparametric Regression Curve Estimation by Using Mixed Truncated Spline and Fourier Series Model. AIP Conf. Proc. 2021, 2329, 060025. [Google Scholar] [CrossRef]
  26. Mariati, N.P.A.M.; Budiantara, I.N.; Ratnasari, V. Combination Estimation of Smoothing Spline and Fourier Series in Nonparametric Regression. J. Math. 2020, 2020, 1–10. [Google Scholar] [CrossRef]
  27. Sudiarsa, I.W.; Budiantara, I.N.; Suhartono; Purnami, S.W. Combined Estimator Fourier Series and Spline Truncated in Multivariable Nonparametric Regression. Appl. Math. Sci. 2015, 9, 4997–5010. [Google Scholar] [CrossRef]
  28. Octavanny, M.A.D.; Budiantara, I.N.; Kuswanto, H.; Rahmawati, D.P. Nonparametric Regression Model for Longitudinal Data with Mixed Truncated Spline and Fourier Series. Abstr. Appl. Anal. 2020, 2020, 1–11. [Google Scholar] [CrossRef]
  29. Caldeira, J.F.; Gupta, R.; Torrent, H.S. Forecasting U.S. Aggregate Stock Market Excess Return: Do Functional Data Analysis Add Economic Value? Mathematics 2020, 8, 2042. [Google Scholar] [CrossRef]
  30. Correa-Quezada, R.; Cueva-Rodríguez, L.; Álvarez-García, J.; del Río-Rama, M.d.l.C. Application of the Kernel Density Function for the Analysis of Regional Growth and Convergence in the Service Sector through Productivity. Mathematics 2020, 8, 1234. [Google Scholar] [CrossRef]
  31. Green, P.J.; Silverman, B.W. Nonparametric Regression and Generalized Linear Models, 1st ed.; Chapman and Hall/CRC: New York, NY, USA, 1993. [Google Scholar]
  32. Wang, Y.; Guo, W.; Brown, M.B. Spline smoothing for bivariate data with applications to association between hormones. Stat. Sin. 2000, 10, 377–397. [Google Scholar]
  33. Wahba, G. Spline Models for Observational Data; SIAM Society For Industrial And Applied Mathemathics: Philadelphia, PA, USA, 1990. [Google Scholar]
  34. Craven, P.; Wahba, G. Smoothing noisy data with spline functions—Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 1978, 31, 377–403. [Google Scholar] [CrossRef]
  35. United Nations Development Programme. Human Development Report 1990; Oxford University Press: Jericho, NY, USA, 1990. [Google Scholar]
  36. BPS—Statistics Indonesia. Indeks Pembangunan Manusia 2018; BPS—Statistics Indonesia: Jakarta, Indonesia, 2019. [Google Scholar]
  37. BPS—Statistics of Jawa Timur Province. Provinsi Jawa Timur Dalam Angka 2019; BPS—Statistics of Jawa Timur Province: Jawa Timur, Indonesia, 2019. [Google Scholar]
  38. Rahayu, A.; Purhadi; Sutikno; Prastyo, D.D. Multivariate Gamma Regression: Parameter Estimation, Hypothesis Testing, and Its Application. Symmetry 2020, 12, 813. [Google Scholar] [CrossRef]
  39. Sofilda, E.; Hermiyanti, P.; Hamzah, M.Z. Determinant Variable Analysis of Human Development Index in Indonesia (Case For High And Low Index At Period 2004–2013). OIDA Int. J. Sustain. Dev. 2015, 8, 11–28. [Google Scholar]
  40. Dewanti, P.; Budiantara, I.N.; Rumiati, A.T. Modelling of SDG’s Achievement in East Java Using Bi-responses Nonparametric Regression with Mixed Estimator Spline Truncated and Kernel. J. Phys. Conf. Ser. 2020, 1562, 012016. [Google Scholar] [CrossRef]
Figure 1. Partial scatterplot between the response variable and each predictor variable, which represents (a) a truncated spline function and (b) a Fourier series function.
Figure 2. Partial scatterplot between the four indicators of the Human Development Index (HDI) with two predictor variables: (a) population density and (b) percentage of people living in poverty.
Figure 3. Comparison between actual and fitted values.
Table 1. Summary of statistical results for n = 100 with error variances, the number of knots, and oscillation parameters.

Variance  Oscillations  Knots   GCV       R2      MSE
0.1       1             1       7.050     91.866  6.817
          1             2       6.996     92.100  6.719
          1             3       6.525     92.991  5.853
          2             1       6.413     92.964  5.875
          2             2       6.391     93.072  5.784
          2             3       6.386     93.167  5.704
          3             1       7.017     91.876  6.785
          3             2       6.435     93.066  5.789
          3             3       6.098 *   93.202  5.675
0.5       1             1       6.915     92.454  6.686
          1             2       6.759     92.716  6.492
          1             3       6.634     93.023  6.328
          2             1       8.183     91.121  7.913
          2             2       6.720     92.757  6.454
          2             3       6.383     93.671  5.640
          3             1       6.145     93.645  5.663
          3             2       6.260     93.660  5.650
          3             3       6.381     93.673  5.638
1.0       1             1       6.408     94.856  5.905
          1             2       6.479     94.906  5.847
          1             3       6.582     94.933  5.816
          2             1       6.358     94.895  5.860
          2             2       6.417     94.955  5.791
          2             3       6.490     95.004  5.734
          3             1       6.358     94.895  5.860
          3             2       6.417     94.955  5.791
          3             3       6.490     95.004  5.734
* The smallest value of GCV.
Table 2. Summary of statistical results for the case study.

Model 1: Combined Truncated Spline and Fourier Series Estimator for Multiresponse Nonparametric Regression
Oscillations  Knots   GCV        MSE
1             1       1.43056    1.16177
1             2       1.45686    1.14882
1             3       1.41989    1.07056
2             1       1.43464    1.11886
2             2       1.46877    1.07960
2             3       1.43883    1.07192
3             1       2.10749    1.91785
3             2       1.44040    1.13633
3             3       1.39189 *  1.08377

Model 2: Truncated Spline Estimator for Multiresponse Nonparametric Regression
Knots   GCV        MSE
1       1.478036   1.114668
2       1.522833   1.013464
3       1.586664   1.055945

Model 3: Fourier Series Estimator for Multiresponse Nonparametric Regression
Oscillations   GCV        MSE
1              1.524463   1.149681
2              1.536081   1.089298
3              1.578724   1.05066
* The smallest value of GCV.