Article

Ratio Test for Mean Changes in Time Series with Heavy-Tailed AR(p) Noise Based on Multiple Sampling Methods

1 School of Mathematical Science, Huaibei Normal University, Huaibei 235099, China
2 Department of Applied Mathematics, Northwestern Polytechnical University, Xi’an 710060, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(18), 3988; https://doi.org/10.3390/math11183988
Submission received: 22 August 2023 / Revised: 14 September 2023 / Accepted: 16 September 2023 / Published: 20 September 2023
(This article belongs to the Special Issue Current Developments in Theoretical and Applied Statistics)

Abstract: This paper discusses the problem of mean changes in time series with heavy-tailed AR(p) noise. First, a modified ratio-type test statistic is proposed; we show that, under the null hypothesis of no mean change, the asymptotic distribution of the modified statistic is a functional of Lévy processes, and we establish its consistency under the alternative hypothesis. However, the asymptotic distribution involves a heavy-tailed index that is difficult to estimate. This paper therefore uses bootstrap sampling, jackknife sampling, and subsampling to approximate the distribution under the null hypothesis and obtain more accurate critical values and empirical power. In addition, results from a small simulation study and a practical example illustrate the finite-sample behavior of the proposed statistic.
MSC:
62E20; 62D05; 62P05; 62P20

1. Introduction and Statistical Framework

The problem of detecting change points in real data series has attracted attention due to the heterogeneity of such series. The aim of change point detection and estimation is to partition data sequences into multiple homogeneous segments, and the theory has been applied in a variety of fields such as finance [1], medicine [2], and the environment [3]. Traditionally, change point problems are formulated as hypothesis testing problems. Most scholars are interested in testing the null hypothesis that all observations are samples from distributions with equal means and then deriving the distributions of the corresponding statistics (see [4,5,6,7,8]). Similar reasoning holds for other change point problems, such as variance change points [9].
In the past 40 years, new models have continuously been established for more precise descriptions of financial data. Many empirical studies have shown that heavy tails frequently occur in economic and financial sequences, but most previous studies have been based on finite variance (see Guillaume et al. [10] and Mittnik and Rachev [11]), and there have been few simulations using infinite-variance processes. In the infinite-variance case, there are more factors to consider and the problem is more complex. Qin et al. [12] proposed a modified ratio test statistic that can effectively detect changes in the second half of the observed values and established its asymptotic properties under the null and alternative hypotheses. Jin et al. [13] also proposed a modified ratio statistic to test for possible changes in the trend term of a sequence and, based on the subsampling method, obtained more accurate critical values.
In the change point problems of heavy-tailed sequences, due to the influence of extreme values in heavy-tailed sequences, it is generally not possible to directly locate the critical value of the test statistic. Therefore, this paper proposes using multiple resampling methods to approximate the distribution of the original statistic and obtain accurate critical values and power, then comparing and analyzing the performance of these sampling methods.
As for the error terms in some models, many scholars assume by default that the errors are independently and identically distributed. As Aue and Horváth [14] commented in a review paper, many methods were initially developed for independent observations. However, the assumption of independent observations is very restrictive, and most real sequences do not satisfy it. Therefore, we consider more general weakly dependent cases, such as the autoregressive model. Recent research and applications can be found in [15,16,17]; this classic model is not introduced in detail here.
So, this paper discusses the problem of mean changes in time series with heavy-tailed AR(p) errors. More specifically, we consider observations $X_1, X_2, \ldots, X_n$ that conform to the following structure:
$$X_t = \mu + e_t, \quad t = p+1, p+2, \ldots, n,$$
$$e_t = \rho_1 e_{t-1} + \rho_2 e_{t-2} + \cdots + \rho_p e_{t-p} + \eta_t,$$
where $\mu$ is a constant, $\rho_1, \rho_2, \ldots, \rho_p$ are coefficients, and $\eta_t$ is a heavy-tailed sequence.
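To make the setting concrete, the following Python sketch simulates this model; it assumes, purely for illustration, that $\eta_t$ is drawn exactly from a symmetric $\kappa$-stable law via scipy's levy_stable (the function name simulate_heavy_ar and all parameter choices are ours, not from the paper):

```python
import numpy as np
from scipy.stats import levy_stable

def simulate_heavy_ar(n, rho, kappa=1.5, mu=0.0, burn=200, seed=0):
    """Simulate X_t = mu + e_t with AR(p) noise e_t and symmetric
    kappa-stable innovations eta_t (so E eta_t = 0 for kappa > 1)."""
    rng = np.random.default_rng(seed)
    p = len(rho)
    m = n + p + burn
    eta = levy_stable.rvs(alpha=kappa, beta=0.0, size=m, random_state=rng)
    e = np.zeros(m)
    for t in range(p, m):
        # e_t = rho_1 e_{t-1} + ... + rho_p e_{t-p} + eta_t
        e[t] = np.dot(rho, e[t - p:t][::-1]) + eta[t]
    return mu + e[-n:]               # drop the burn-in

# Example: 500 observations with AR(2) noise and kappa = 1.5.
x = simulate_heavy_ar(500, rho=[0.4, 0.2], kappa=1.5)
```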
Remark 1.
A heavy-tailed distribution is a special type of distribution in statistics whose tail probability (i.e., the probability of extreme cases) is greater than that of a normal distribution. This means that, in actual observational data, the distribution is more likely to exhibit extreme values far from the average than a normal distribution.
There are two main characteristics of heavy-tailed sequences: first, the probability in the tail decays slowly; second, the variance may be infinite. Common heavy-tailed distributions include the Pareto distribution, the Cauchy distribution, the t-distribution, etc. These characteristics are reflected in the assumptions in the next section.
The rest of the paper is arranged as follows: The main ideas for constructing the test statistic are detailed in Section 2. The main results are presented in Section 3. A small simulation study under different parameters is provided in Section 4. A real example is provided in Section 5. Section 6 contains the conclusions and outlooks.

2. Main Ideas

Assumption 1.
All the characteristic roots of $\rho(z) = 1 - \rho_1 z - \rho_2 z^2 - \cdots - \rho_p z^p = 0$ lie outside the unit circle.
Assumption 2.
$\eta_t$ lies in the domain of attraction of a stable law with heavy-tailed index $\kappa \in (1, 2)$, and $E\eta_t = 0$.
Remark 2.
Assumptions 1 and 2 guarantee that the heavy-tailed AR(p) sequence is stationary and has infinite variance; they are necessary underlying assumptions.
In addition, Assumption 2 implies
$$nP(|\eta_t| > a_n x) \to x^{-\kappa},$$
where $a_n = \inf\{x : P(|\eta_t| > x) \le n^{-1}\}$ and
$$\lim_{x \to \infty} \frac{P(\eta_t > x)}{P(|\eta_t| > x)} = q \in (0, 1).$$
It is not difficult to find that, by combining Equations (3) and (4), there is a constant $b_n$ such that
$$a_n^{-1} \sum_{t=1}^{n} (\eta_t - b_n) \xrightarrow{d} S_\kappa,$$
where $S_\kappa$ is a stable random variable. In addition, it can be verified that in the special cases of $\kappa = 2$ and $\kappa = 1$, $S_\kappa$ is Gaussian and Cauchy, respectively. Here, the assumption $b_n = 0$ corresponds to $E\eta_t = 0$ in Assumption 2. As a special case, if $\eta_t$ is an independent and identically distributed sequence, Kokoszka and Wolf [18] obtained the following:
$$\left( a_n^{-1} \sum_{t=1}^{[nr]} \eta_t,\; a_n^{-2} \sum_{t=1}^{[nr]} \eta_t^2 \right) \xrightarrow{d} \left( L_1(r), L_2(r) \right),$$
where $L_1(r)$ and $L_2(r)$ are $\kappa$-stable and $\kappa/2$-stable Lévy processes in the space $D[0,1]$ with the Skorohod topology, respectively. It is worth noting that $a_n$ can be rewritten as $a_n = n^{1/\kappa} L(n)$, where $L(\cdot)$ is a slowly varying function.
Since the characteristic roots lie outside the unit circle, the B–N (Beveridge–Nelson) decomposition can be used to rewrite the partial sum of $e_t$ as
$$a_n^{-1} \sum_{t=1}^{[nr]} e_t = D(1)\, a_n^{-1} \sum_{t=1}^{[nr]} \eta_t + a_n^{-1} (\tilde\eta_0 - \tilde\eta_{[nr]}),$$
where $D(L) = (1 - \rho_1 L - \rho_2 L^2 - \cdots - \rho_p L^p)^{-1} := \sum_{j=0}^{\infty} d_j L^j$ and $\tilde\eta_t := \sum_{j=0}^{\infty} \left( \sum_{s=j+1}^{\infty} d_s \right) \eta_{t-j}$.
Remark 3.
$L_1(\cdot)$ is a process with stationary increments that can be expressed as:
$$L_1(v) = \begin{cases} \sum_{j=1}^{\infty} \delta_j \Gamma_j^{-1/\kappa} I(U_j \le v), & \kappa \in (1,2), \\ W(v), & \kappa = 2, \end{cases}$$
where $W(v)$ is a standard Brownian motion, $\{U_j\}$ is a sequence of independent random variables uniformly distributed on the interval $[0,1]$, and $\{\delta_j\}$ is an independently and identically distributed random sequence with $P(\delta_j = 1) = p$ and $P(\delta_j = -1) = 1 - p$. $\Gamma_1, \Gamma_2, \ldots$ are the arrival times of the Poisson process with the Lebesgue measure, and $\{U_j\}$, $\{\delta_j\}$, and $\{\Gamma_j\}$ are independent of each other.
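For intuition, a truncated version of this LePage series can be simulated directly; the sketch below is ours (the function name, the truncation level, and the symmetric choice of sign probabilities are illustrative assumptions), and it approximates one path of $L_1(v)$ on a grid:

```python
import numpy as np

def lepage_path(kappa, grid, n_terms=2000, prob_pos=0.5, rng=None):
    """Approximate L1(v) = sum_j delta_j * Gamma_j^(-1/kappa) * I(U_j <= v)
    by truncating the series at n_terms terms."""
    rng = rng or np.random.default_rng(0)
    gamma = np.cumsum(rng.exponential(1.0, n_terms))   # Poisson arrival times
    delta = rng.choice([-1.0, 1.0], n_terms, p=[1 - prob_pos, prob_pos])
    u = rng.uniform(0.0, 1.0, n_terms)
    terms = delta * gamma ** (-1.0 / kappa)
    return np.array([terms[u <= v].sum() for v in grid])

# One approximate path of a 1.5-stable Levy process on [0, 1]:
path = lepage_path(1.5, np.linspace(0.0, 1.0, 101))
```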
Observation $X_t$ can be rewritten as
$$X_t = \mu + \Delta_1 I(k^* < t < n) + e_t,$$
where $e_t$ is defined by (2), $\mu$ is the mean, $k^*$ is the time point of the abrupt change, and $I(\cdot)$ is the indicator function. The null hypothesis for testing the change is
$$H_0: k^* = n,$$
against the alternative hypothesis
$$H_1: k^* < n.$$
Shao [19] proposed a ratio test to detect the mean change point.
$$\Xi = \max_{[nv_1] \le k \le [nv_2]} R(k),$$
$$R(k) = \frac{n \left| \bar X_{1,k} - \bar X_{k+1,n} \right|}{n^{-1/2} \left\{ \sum_{i=1}^{k} \left( \sum_{j=1}^{i} (X_j - \bar X_{1,k}) \right)^2 + \sum_{i=k+1}^{n} \left( \sum_{j=k+1}^{i} (X_j - \bar X_{k+1,n}) \right)^2 \right\}^{1/2}},$$
where $\bar X_{m,n} = (n - m + 1)^{-1} \sum_{t=m}^{n} X_t$.
When dealing with the mean change point problem for heavy-tailed sequences, it is common to restrict attention to a truncated portion of the candidate set. It is desirable to retain a large portion of the data while effectively minimizing the effect of extreme values, so the truncation parameters are set to $(v_1, v_2) = (0.2, 0.8)$.
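As a reference point, a direct implementation of Shao's statistic with this trimming might look as follows; this is a sketch under our reading of the flattened formula (in particular, the direction of the second-segment CUSUM is our reconstruction):

```python
import numpy as np

def shao_ratio_statistic(x, v1=0.2, v2=0.8):
    """Self-normalized ratio statistic Xi = max_k R(k) over the trimmed range."""
    n = len(x)
    best = -np.inf
    for k in range(int(n * v1), int(n * v2) + 1):
        m1, m2 = x[:k].mean(), x[k:].mean()
        num = n * abs(m1 - m2)
        c1 = np.cumsum(x[:k] - m1)        # CUSUM within the first segment
        c2 = np.cumsum(x[k:] - m2)        # CUSUM within the second segment
        den = n ** (-0.5) * np.sqrt(np.sum(c1 ** 2) + np.sum(c2 ** 2))
        best = max(best, num / den)
    return best
```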
This statistic cannot be substituted directly into the change point model presented in this paper, because the limiting distribution of $a_n^{-1} \sum_{t=1}^{[nr]} e_t$ is not available under $H_0$. To overcome this obstacle, we suggest using $\eta_t$ instead of $X_t$ and re-establishing the test statistic. Because of the heavy-tailed feature of $\eta_t$, the new test statistic is based on the residuals $\hat\eta_t$. The improved ratio test is as follows:
$$\Xi_1 = \max_{[nv_1] \le k \le [nv_2]} R_1(k),$$
$$R_1(k) = \frac{\left| \sum_{t=p+1}^{k} (\hat\eta_{0,t} - \bar{\hat\eta}_{0,(p+1,n)}) \right|}{n^{-1/2} \left\{ \sum_{i=p+1}^{k} \left( \sum_{t=p+1}^{i} (\hat\eta_{1,t} - \bar{\hat\eta}_{1,(p+1,k)}) \right)^2 + \sum_{i=k+p+1}^{n} \left( \sum_{t=k+p+1}^{i} (\hat\eta_{2,t} - \bar{\hat\eta}_{2,(k+p+1,n)}) \right)^2 \right\}^{1/2}},$$
where $\bar{\hat\eta}_{i,(m,n)} = (n - m + 1)^{-1} \sum_{t=m}^{n} \hat\eta_{i,t}$. The process of obtaining $\hat\eta_{0,t}$ is not difficult. First, use the regression of $X_t$ on the intercept to calculate the ordinary least squares residuals $\hat e_{0,t}$. Then, in the same way, calculate the residuals $\hat\eta_{0,t}$ from the regression of $\hat e_{0,t}$ on $\hat e_{0,t-j}$, where $j = 1, \ldots, p$ and $t = p+1, \ldots, n$. Similarly, $\hat\eta_{1,t}$ and $\hat\eta_{2,t}$ can be obtained from $\{X_t\}_{t=p+1}^{k}$ and $\{X_t\}_{t=k+1}^{n}$, respectively.
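The two-stage residual construction described above can be sketched as follows (a hypothetical helper of ours, with numpy's lstsq standing in for the OLS regressions); applying the same routine to the full sample and to the two segments yields $\hat\eta_{0,t}$, $\hat\eta_{1,t}$, and $\hat\eta_{2,t}$:

```python
import numpy as np

def ar_residuals(x, p):
    """eta-hat: OLS residuals from regressing the demeaned series on its
    first p lags (the demeaning is the regression on an intercept)."""
    e = x - x.mean()                 # residuals of X_t on an intercept
    y = e[p:]                        # e_t, t = p+1, ..., n
    Z = np.column_stack([e[p - j:len(e) - j] for j in range(1, p + 1)])
    rho_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
    return y - Z @ rho_hat

# eta0 from the full sample, eta1 / eta2 from the two segments split at k:
# eta0 = ar_residuals(x, p); eta1 = ar_residuals(x[:k], p); eta2 = ar_residuals(x[k:], p)
```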

3. Asymptotic Results

This section gives the limiting distribution of the test statistic under the null hypothesis and its consistency under the alternative hypothesis. Let $v = k/n$, $s = i/n$, and $v^* = k^*/n$; then, we obtain Theorem 1:
Theorem 1.
Let $X_t$ be defined by (9). If the null hypothesis is true and $X_t$ satisfies Assumptions 1 and 2, then, as $n \to \infty$,
$$\Xi_1 \xrightarrow{d} \sup_{v_1 \le v \le v_2} \frac{|V(v; 0, 1)|}{\left\{ \int_0^v V^2(s; 0, v)\,ds + \int_v^1 V^2(s; v, 1)\,ds \right\}^{1/2}},$$
where $V(a; b, c) = L_1(a) - L_1(b) - (a - b)(c - b)^{-1} (L_1(c) - L_1(b))$, with $0 \le b \le a \le c \le 1$.
Proof of Theorem 1.
We first establish a preliminary claim about the first term in the denominator. Using ordinary least squares (OLS), calculate the residuals $\hat e_{1,t} = X_t - k^{-1} \sum_{t=1}^{k} X_t = e_t - k^{-1} \sum_{t=1}^{k} e_t := e_t - \dot e$ according to the expression of $X_t$ in (9). Therefore, we can rewrite $\hat e_{1,t}$ as
$$\hat e_{1,t} = e_t - \dot e = \rho_1 \hat e_{1,t-1} + \cdots + \rho_p \hat e_{1,t-p} + \hat\xi_t,$$
where $\hat\xi_t = (\rho_1 + \cdots + \rho_p - 1) \dot e + \eta_t$, $t = p+1, \ldots, k$.
Let $\theta = (\rho_1, \rho_2, \ldots, \rho_p)'$, $\hat\xi = (\hat\xi_{p+1}, \ldots, \hat\xi_k)'$, and
$$\hat e = \begin{pmatrix} \hat e_{1,p} & \hat e_{1,p-1} & \hat e_{1,p-2} & \cdots & \hat e_{1,1} \\ \hat e_{1,p+1} & \hat e_{1,p} & \hat e_{1,p-1} & \cdots & \hat e_{1,2} \\ \hat e_{1,p+2} & \hat e_{1,p+1} & \hat e_{1,p} & \cdots & \hat e_{1,3} \\ \vdots & \vdots & \vdots & & \vdots \\ \hat e_{1,k-1} & \hat e_{1,k-2} & \hat e_{1,k-3} & \cdots & \hat e_{1,k-p} \end{pmatrix}.$$
At this point, we let $G = (\hat e_{1,p+1}, \hat e_{1,p+2}, \ldots, \hat e_{1,k})'$. Then, combined with (10), we can obtain
$$G = \hat e \theta + \hat\xi.$$
Therefore, using OLS again, we can obtain
$$\hat\theta - \theta = (\hat e' \hat e)^{-1} \hat e' \hat\xi,$$
where $\hat e' \hat\xi = \left( \sum_{t=p}^{k-1} \hat e_{1,t} \hat\xi_{t+1}, \sum_{t=p-1}^{k-2} \hat e_{1,t} \hat\xi_{t+2}, \ldots, \sum_{t=1}^{k-p} \hat e_{1,t} \hat\xi_{t+p} \right)'$ and
$$\hat e' \hat e = \begin{pmatrix} \sum_{t=p}^{k-1} \hat e_{1,t}^2 & \sum_{t=p}^{k-1} \hat e_{1,t} \hat e_{1,t-1} & \cdots & \sum_{t=p}^{k-1} \hat e_{1,t} \hat e_{1,t-p+1} \\ \sum_{t=p-1}^{k-2} \hat e_{1,t} \hat e_{1,t+1} & \sum_{t=p-1}^{k-2} \hat e_{1,t}^2 & \cdots & \sum_{t=p-1}^{k-2} \hat e_{1,t} \hat e_{1,t-p+2} \\ \vdots & \vdots & & \vdots \\ \sum_{t=1}^{k-p} \hat e_{1,t} \hat e_{1,t+p-1} & \sum_{t=1}^{k-p} \hat e_{1,t} \hat e_{1,t+p-2} & \cdots & \sum_{t=1}^{k-p} \hat e_{1,t}^2 \end{pmatrix}.$$
Phillips and Solo proved the following fact in Theorem 3.19 of [20]: for all $j$, $a_n^{-1} \sum_{t=1}^{[n\cdot]} e_t = O_p(1)$ and $a_n^{-2} \sum_{t=1}^{[n\cdot]} e_t e_{t-j} = O_p(1)$. After basic calculation, we can obtain
$$\dot e = O_p(a_n n^{-1})$$
and
$$\sum_{t=p}^{k-1} \hat e_{1,t} \hat e_{1,t-j} = \sum_{t=p}^{k-1} (e_t - \dot e)(e_{t-j} - \dot e) = O_p(a_n^2) + O_p(a_n^2 n^{-1}) = O_p(a_n^2), \quad j = 0, \ldots, p-1.$$
The other elements of the matrix $\hat e' \hat e$ can be treated similarly, and their convergence rate is $O_p(a_n^2)$; therefore, we only consider the convergence rate of the first element of $\hat e' \hat\xi$. Using the B–N decomposition, $\hat e_{1,t}$ can be rewritten as $\hat e_{1,t} = \sum_{s=0}^{\infty} d_s \hat\xi_{t-s}$, where $\sum_{s=0}^{\infty} d_s < \infty$. Combining this with $\hat\xi_t = (\rho_1 + \cdots + \rho_p - 1) \dot e + \eta_t$, we can obtain
$$\sum_{t=p}^{k-1} \hat e_{1,t} \hat\xi_{t+1} = \sum_{t=p}^{k-1} \sum_{s=0}^{\infty} d_s \hat\xi_{t-s} \hat\xi_{t+1} = \sum_{s=0}^{\infty} d_s \sum_{t=p}^{k-1} \eta_{t-s} \eta_{t+1} + O_p(a_n^2 n^{-1}) = O_p(a_n).$$
Owing to $\eta_{t+1}$ being independent of $\eta_{t-s}$, $\eta_{t+1}\eta_{t-s} \in D(\kappa)$, i.e., $\sum_{t=p}^{k-1} \eta_{t+1} \eta_{t-s} = O_p(a_n)$. Let
$$A = \operatorname{diag}(a_n^{-1}, a_n^{-1}, \ldots, a_n^{-1}).$$
Combining (12) and (13),
$$\hat\theta - \theta = A (A \hat e' \hat e A)^{-1} A \hat e' \hat\xi = \left[ O_p(a_n^{-1}), O_p(a_n^{-1}), \ldots, O_p(a_n^{-1}) \right]'.$$
Then, for $t = p+1, \ldots, k$,
$$\hat\eta_{1,t} = \hat e_{1,t} - (\hat\rho_1 \hat e_{1,t-1} + \hat\rho_2 \hat e_{1,t-2} + \cdots + \hat\rho_p \hat e_{1,t-p}) = \sum_{l=1}^{p} (\rho_l - \hat\rho_l) \hat e_{1,t-l} + (\rho_1 + \rho_2 + \cdots + \rho_p - 1) \dot e + \eta_t.$$
It is not difficult to see that
$$\sum_{t=1}^{[n\cdot]} \hat e_{1,t} = \sum_{t=1}^{[n\cdot]} (e_t - \dot e) = O_p(a_n).$$
Let $i = [ns]$ and $k = [nv]$, where $0 < s, v < 1$. Due to the independence of $\eta_t$ and Assumption 2, combining (15), (16) and (17), we have
$$a_n^{-1} \sum_{t=p+1}^{i} (\hat\eta_{1,t} - \bar{\hat\eta}_{1,(p+1,k)}) = a_n^{-1} \sum_{t=p+1}^{i} \eta_t - (i - p)(k - p)^{-1} a_n^{-1} \sum_{t=p+1}^{k} \eta_t + o_p(1) \xrightarrow{d} L_1(s) - s v^{-1} L_1(v).$$
Next, we consider the numerator of $R_1(k)$. Letting $k = n$ and using the same method of proof, the following fact can be obtained:
$$a_n^{-1} \sum_{t=p+1}^{i} (\hat\eta_{0,t} - \bar{\hat\eta}_{0,(p+1,n)}) \xrightarrow{d} L_1(s) - s L_1(1).$$
Finally, we deal with the second term in the denominator of $R_1(k)$. After simple processing, it can be obtained that
$$\hat\eta_{2,t} = \sum_{s=1}^{p} (\rho_s - \hat\rho_s) \hat e_{2,t-s} + (\rho_1 + \rho_2 + \cdots + \rho_p - 1) \ddot e + \eta_t,$$
where $\ddot e = (n - k)^{-1} \sum_{t=k+1}^{n} e_t$ and $\hat e_{2,t-s} = e_{t-s} - \ddot e$. It is not difficult to see that the limiting behavior of (15) remains valid from $X_{k+1}$ to $X_n$. Since $\sum_{t=1}^{[n\cdot]} \hat e_{2,t} = \sum_{t=1}^{[n\cdot]} (e_t - \ddot e) = O_p(a_n)$, we obtain
$$\hat\eta_{2,t} - \bar{\hat\eta}_{2,(k+p+1,n)} = \eta_t - (n - k)^{-1} \sum_{j=k+p+1}^{n} \eta_j + O_p(a_n^{-1}).$$
At this time, for $i > k + p + 1$, we can easily obtain
$$a_n^{-1} \sum_{t=k+p+1}^{i} (\hat\eta_{2,t} - \bar{\hat\eta}_{2,(k+p+1,n)}) = a_n^{-1} \sum_{t=1}^{i} (\hat\eta_{2,t} - \bar{\hat\eta}_{2,(k+p+1,n)}) - a_n^{-1} \sum_{t=1}^{k+p} (\hat\eta_{2,t} - \bar{\hat\eta}_{2,(k+p+1,n)}) + O_p(n a_n^{-2}) \xrightarrow{d} L_1(s) - L_1(v) - (s - v)(1 - v)^{-1} [L_1(1) - L_1(v)].$$
So, combining (18)–(22), it can be obtained that
$$\Xi_1 \xrightarrow{d} \sup_{v_1 \le v \le v_2} \frac{|V(v; 0, 1)|}{\left\{ \int_0^v V^2(s; 0, v)\,ds + \int_v^1 V^2(s; v, 1)\,ds \right\}^{1/2}};$$
the proof of Theorem 1 is completed. □
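Since the limit in Theorem 1 still depends on $\kappa$, one way to use it directly when $\kappa$ is known is to simulate the functional by Monte Carlo. The sketch below combines the truncated LePage series of Remark 3 with a grid approximation of the integrals; taking equal sign probabilities (symmetric innovations), the grid size, and the truncation level are all illustrative assumptions of ours:

```python
import numpy as np

def limit_critical_value(kappa, alpha=0.05, v1=0.2, v2=0.8,
                         n_grid=200, n_rep=1000, n_terms=2000, seed=0):
    """Upper-alpha quantile of the limiting functional in Theorem 1,
    approximated by simulating L1 via a truncated LePage series."""
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, n_grid + 1)
    ds = 1.0 / n_grid
    sups = np.empty(n_rep)
    for r in range(n_rep):
        gamma = np.cumsum(rng.exponential(1.0, n_terms))
        delta = rng.choice([-1.0, 1.0], n_terms)          # symmetric signs assumed
        u = rng.uniform(0.0, 1.0, n_terms)
        terms = delta * gamma ** (-1.0 / kappa)
        L = np.array([terms[u <= v].sum() for v in s])    # path of L1 on the grid
        best = -np.inf
        for k in range(int(n_grid * v1), int(n_grid * v2) + 1):
            v = s[k]
            num = abs(L[k] - v * L[-1])                   # |V(v; 0, 1)|
            V1 = L[:k + 1] - (s[:k + 1] / v) * L[k]       # V(s; 0, v), s <= v
            V2 = L[k:] - L[k] - (s[k:] - v) / (1.0 - v) * (L[-1] - L[k])
            den = np.sqrt(ds * (np.sum(V1 ** 2) + np.sum(V2 ** 2)))
            best = max(best, num / den)
        sups[r] = best
    return np.quantile(sups, 1.0 - alpha)
```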
Theorem 2.
Let $X_t$ be defined by (9). If the alternative hypothesis is true and $X_t$ satisfies Assumptions 1 and 2, then $\Xi_1 = R_1(k^*)$ and, as $n \to \infty$,
$$n^{-1} a_n \Xi_1 \xrightarrow{d} \frac{v^* (1 - v^*) \left| \rho_1 + \cdots + \rho_p - 1 \right| \left| \Delta_1 \right|}{\left\{ \int_0^{v^*} V^2(s; 0, v^*)\,ds + \int_{v^*}^{1} V^2(s; v^*, 1)\,ds \right\}^{1/2}}.$$
Proof of Theorem 2.
We consider the case of $k^* < k$. The residuals $\hat e_{1,t}$ can be written as
$$\hat e_{1,t} = X_t - \frac{1}{k} \sum_{t=1}^{k} X_t = \Delta_1 \left( I(t > k^*) - \frac{k - k^*}{k} \right) + e_t - \dot e, \quad t = 1, 2, \ldots, k.$$
We can define $S_{1,t} = \Delta_1 \left( I(t > k^*) - \frac{k - k^*}{k} \right)$ and $S_{2,t} = e_t - \dot e$. Since $\sum_{t=l}^{k+l-p-1} S_{1,t} S_{1,t-j} = O_p(n)$, combining this with the proof of Theorem 1, for all $j$,
$$\sum_{t=l}^{k+l-p-1} \hat e_{1,t} \hat e_{1,t-j} = \sum_{t=l}^{k+l-p-1} (S_{1,t} + S_{2,t})(S_{1,t-j} + S_{2,t-j}) = O_p(a_n^2), \quad l = 1, 2, \ldots, p;
$$
then, we substitute (2) into (23) and obtain
$$\hat e_{1,t} = \rho_1 \hat e_{1,t-1} + \cdots + \rho_p \hat e_{1,t-p} + \hat\phi_t,$$
where $\hat\phi_t = (\rho_1 + \cdots + \rho_p - 1) \dot e + (S_{1,t} - \rho_1 S_{1,t-1} - \cdots - \rho_p S_{1,t-p}) + \eta_t$. Since the second term in $\hat\phi_t$ is not a random variable, the following result can be obtained by using the B–N decomposition again:
$$\sum_{t=p}^{k-1} \hat e_{1,t} \hat\phi_{t+1} = \sum_{t=p}^{k-1} \sum_{q=0}^{\infty} d_q \hat\phi_{t+1} \hat\phi_{t-q} = \sum_{q=0}^{\infty} d_q \sum_{t=p}^{k-1} (S_{1,t+1} - \rho_1 S_{1,t} - \cdots - \rho_p S_{1,t+1-p})(S_{1,t-q} - \rho_1 S_{1,t-1-q} - \cdots - \rho_p S_{1,t-q-p}) + O_p(a_n) = O_p(n).$$
Let
$$N = \operatorname{diag}(n, n, \ldots, n).$$
According to the proof of Theorem 1, (24), and (25), we can obtain
$$\hat\theta - \theta = N (N \hat e' \hat e N)^{-1} N \hat e' \hat\phi = \left[ O_p(a_n^{-2} n), O_p(a_n^{-2} n), \ldots, O_p(a_n^{-2} n) \right]'.$$
So, for $t = p+1, p+2, \ldots, k$,
$$\hat\eta_{1,t} = \hat e_{1,t} - (\hat\rho_1 \hat e_{1,t-1} + \cdots + \hat\rho_p \hat e_{1,t-p}) = \sum_{l=1}^{p} (\rho_l - \hat\rho_l) \hat e_{1,t-l} + (\rho_1 + \cdots + \rho_p - 1) \dot e + (S_{1,t} - \rho_1 S_{1,t-1} - \cdots - \rho_p S_{1,t-p}) + \eta_t.$$
Then, for $p + 1 \le i \le k$,
$$\sum_{t=p+1}^{i} (\hat\eta_{1,t} - \bar{\hat\eta}_{1,(p+1,k)}) = \sum_{t=p+1}^{i} (S_{1,t} - \rho_1 S_{1,t-1} - \cdots - \rho_p S_{1,t-p}) - \frac{i - p}{k - p} \sum_{t=p+1}^{k} (S_{1,t} - \rho_1 S_{1,t-1} - \cdots - \rho_p S_{1,t-p}) + O_p(a_n) = O_p(n).$$
Then,
$$\sum_{t=p+1}^{k} (\hat\eta_{0,t} - \bar{\hat\eta}_{0,(p+1,n)}) = O_p(n)$$
and
$$\sum_{t=k+p+1}^{i} (\hat\eta_{2,t} - \bar{\hat\eta}_{2,(k+p+1,n)}) = O_p(a_n)$$
can be proved in the same way. (28)–(30) imply $\Xi_1 = O_p(1)$. If $k < k^*$, $\Xi_1 = O_p(1)$ can be concluded in the same manner. Finally, we let $k = k^*$, multiply the denominator of $R_1(k^*)$ by $a_n^{-1}$, and use the proof of Theorem 1 again; thus, we obtain
$$a_n^{-1} n^{-1/2} \left\{ \sum_{i=p+1}^{k^*} \left( \sum_{t=p+1}^{i} (\hat\eta_{1,t} - \bar{\hat\eta}_{1,(p+1,k^*)}) \right)^2 + \sum_{i=k^*+p+1}^{n} \left( \sum_{t=k^*+p+1}^{i} (\hat\eta_{2,t} - \bar{\hat\eta}_{2,(k^*+p+1,n)}) \right)^2 \right\}^{1/2} \xrightarrow{d} \left\{ \int_0^{v^*} V^2(s; 0, v^*)\,ds + \int_{v^*}^{1} V^2(s; v^*, 1)\,ds \right\}^{1/2}.$$
It is not difficult to find that (28) and (30) imply the limiting behavior of the numerator:
$$n^{-1} \sum_{t=p+1}^{k^*} (\hat\eta_{0,t} - \bar{\hat\eta}_{0,(p+1,n)}) \xrightarrow{d} v^* (1 - v^*) \Delta_1 (\rho_1 + \cdots + \rho_p - 1).$$
Hence, (31) and (32) imply Theorem 2. □

4. Simulations

In this section, we conduct simulations to verify the effectiveness of our ratio test. Given the representativeness of first-order autoregressive models, we consider the model
$$X_t = \begin{cases} \mu + e_t, & 1 < t \le k_1^*, \\ \mu + \Delta_1 + e_t, & k_1^* + 1 \le t < n, \end{cases}$$
where $e_t = \rho e_{t-1} + \eta_t$. Without loss of generality, we set the significance level $\alpha = 0.05$ and $\mu = 0$. The results are as follows:
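A data-generating sketch for this design (our helper, with scipy's levy_stable supplying the $\kappa$-stable innovations; all names and defaults are illustrative):

```python
import numpy as np
from scipy.stats import levy_stable

def simulate_change_model(n, rho, kappa, delta1, k_star, mu=0.0, seed=None):
    """AR(1) noise with kappa-stable innovations plus a mean shift of size
    delta1 after observation k_star (k_star = n gives the null model)."""
    rng = np.random.default_rng(seed)
    eta = levy_stable.rvs(alpha=kappa, beta=0.0, size=n, random_state=rng)
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + eta[t]
    x = mu + e
    x[k_star:] += delta1          # empty slice when k_star = n (H0)
    return x

# e.g. a sample of size 500 with kappa = 1.5 and a shift at 0.7 n:
x = simulate_change_model(500, rho=0.5, kappa=1.5, delta1=2.0, k_star=350)
```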
Comparing Figure 1, Figure 2, Figure 3 and Figure 4, it is not difficult to find that $\Xi_1$ has a relatively stable empirical size and high empirical power, but when the change point appears in the latter half of the observation sequence, the empirical power is not satisfactory.
Note that the heavy-tailed index $\kappa$ appears in the limiting distribution. Although some scholars have proposed a series of methods to estimate $\kappa$ (see [21,22]), their effectiveness is not satisfactory. Therefore, in order to avoid estimating $\kappa$, we recommend using bootstrap sampling to approximate the limiting distribution of the statistic under the null hypothesis, so as to obtain an accurate critical value. The specific steps are as follows:
Step 1. Use the regression of $X_t$ on the intercept to calculate the ordinary least squares (OLS) residuals $\hat e_j$, and then calculate the OLS residuals $\hat\eta_j$ from the regression of $\hat e_j$ on $\hat e_{j-1}$.
Step 2. Compute the centered residuals
$$\eta_t^0 = \hat\eta_{t+1} - (n-1)^{-1} \sum_{j=2}^{n} \hat\eta_j,$$
$$\eta_t^1 = \hat\eta_{t+1} - (k-1)^{-1} \sum_{j=2}^{k} \hat\eta_j,$$
and
$$\eta_t^2 = \hat\eta_{t+1} - (n-k-1)^{-1} \sum_{j=k+2}^{n} \hat\eta_j.$$
Step 3. For a fixed number $m < n$, we extract bootstrap samples $\check\eta_2, \ldots, \check\eta_m$ from $\eta_2^0, \ldots, \eta_n^0$; $\tilde\eta_2, \ldots, \tilde\eta_k$ from $\eta_2^1, \ldots, \eta_k^1$; and $\tilde\eta_{k+2}, \ldots, \tilde\eta_m$ from $\eta_{k+2}^2, \eta_{k+3}^2, \ldots, \eta_n^2$.
Step 4. Construct the bootstrap statistic:
$$\tilde\Xi_1^* = \max_{[mv_1] \le k \le [mv_2]} \tilde R_1(k),$$
$$\tilde R_1(k) = \frac{\left| \sum_{j=2}^{k} (\check\eta_j - \bar{\check\eta}_m) \right|}{m^{-1/2} \left\{ \sum_{i=2}^{k} \left( \sum_{j=2}^{i} (\tilde\eta_j - \bar{\tilde\eta}_k) \right)^2 + \sum_{i=k+2}^{m} \left( \sum_{j=k+2}^{i} (\tilde\eta_j - \tilde{\tilde\eta}_k) \right)^2 \right\}^{1/2}},$$
where $\bar{\check\eta}_m = (m-1)^{-1} \sum_{j=2}^{m} \check\eta_j$, $\bar{\tilde\eta}_k = (k-1)^{-1} \sum_{j=2}^{k} \tilde\eta_j$, and $\tilde{\tilde\eta}_k = (m-k-1)^{-1} \sum_{j=k+2}^{m} \tilde\eta_j$.
Step 5. Repeat Steps 3 and 4 1000 times to obtain a set of statistics $\{\tilde\Xi_1^{*1}, \ldots, \tilde\Xi_1^{*1000}\}$.
Step 6. Calculate the upper $\alpha$-quantile $\tilde\Xi_1^*(\alpha)$ of $\{\tilde\Xi_1^{*1}, \ldots, \tilde\Xi_1^{*1000}\}$. If $\Xi_1 > \tilde\Xi_1^*(\alpha)$, then reject the null hypothesis.
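A compact implementation of Steps 1–6 for the AR(1) case might look as follows. This is a sketch under a literal reading of Steps 3 and 4 (fresh segment resamples are drawn for each candidate k, and k_hat denotes a preliminary change-point estimate used to split the residuals; both are our assumptions):

```python
import numpy as np

def bootstrap_critical_value(x, k_hat, m, alpha=0.05, v1=0.2, v2=0.8,
                             n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: OLS residuals from X_t on an intercept, then e_t on e_{t-1}.
    e = x - x.mean()
    rho = np.dot(e[1:], e[:-1]) / np.dot(e[:-1], e[:-1])
    eta = e[1:] - rho * e[:-1]                  # eta-hat_j, j = 2, ..., n
    # Step 2: centered residual groups (full sample / first k_hat / rest).
    eta0 = eta - eta.mean()
    eta1 = eta[:k_hat - 1] - eta[:k_hat - 1].mean()
    eta2 = eta[k_hat:] - eta[k_hat:].mean()
    stats = np.empty(n_boot)
    for r in range(n_boot):                     # Steps 3-5
        check = rng.choice(eta0, size=m - 1, replace=True)   # numerator draw
        best = -np.inf
        for k in range(int(m * v1), int(m * v2) + 1):
            num = abs(np.sum(check[:k - 1] - check.mean()))
            t1 = rng.choice(eta1, size=k - 1, replace=True)
            t2 = rng.choice(eta2, size=m - k - 1, replace=True)
            c1 = np.cumsum(t1 - t1.mean())
            c2 = np.cumsum(t2 - t2.mean())
            den = m ** (-0.5) * np.sqrt(np.sum(c1 ** 2) + np.sum(c2 ** 2))
            best = max(best, num / den)
        stats[r] = best
    return np.quantile(stats, 1.0 - alpha)      # Step 6: upper-alpha quantile
```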
Remark 4.
Choosing a suitable $m$ is very difficult, but McMurry and Politis [23] provided an ideal choice for controlling the empirical power: $m = [4n \log n]$. Therefore, we use this value in the simulation experiments.
As we expected, Figure 5 shows a satisfactory result, especially for the case of $\rho = 0.5$, where the increase in empirical power is significant; this indicates that the position of the change point has a small impact on the newly constructed statistic.
To expand the comparison, the jackknife method is used, the steps of which are as follows.
Step 1. Use the regression of $X_t$ on the intercept to calculate the ordinary least squares (OLS) residuals $\hat e_j$, and then calculate the OLS residuals $\hat\eta_j$ from the regression of $\hat e_j$ on $\hat e_{j-1}$.
Step 2. Compute the centered residuals
$$\eta_t^0 = \hat\eta_{t+1} - (n-1)^{-1} \sum_{j=2}^{n} \hat\eta_j,$$
$$\eta_t^1 = \hat\eta_{t+1} - (k-1)^{-1} \sum_{j=2}^{k} \hat\eta_j,$$
and
$$\eta_t^2 = \hat\eta_{t+1} - (n-k-1)^{-1} \sum_{j=k+2}^{n} \hat\eta_j.$$
Step 3. We extract jackknife samples $\dot\eta_2, \ldots, \dot\eta_{n-1}$ from $\eta_2^0, \ldots, \eta_n^0$; $\ddot\eta_2, \ldots, \ddot\eta_{k-1}$ from $\eta_2^1, \ldots, \eta_k^1$; and $\dddot\eta_{k+2}, \ldots, \dddot\eta_{n-1}$ from $\eta_{k+2}^2, \eta_{k+3}^2, \ldots, \eta_n^2$.
Step 4. Construct the jackknife statistic:
$$\tilde\Xi_2 = \max_{[nv_1] \le k \le [nv_2]} R_2(k),$$
$$R_2(k) = \frac{\left| \sum_{j=2}^{k} (\dot\eta_j - \bar{\dot\eta}_{n-1}) \right|}{n^{-1/2} \left\{ \sum_{i=2}^{k} \left( \sum_{j=2}^{i} (\ddot\eta_j - \bar{\ddot\eta}_k) \right)^2 + \sum_{i=k+2}^{n-1} \left( \sum_{j=k+2}^{i} (\dddot\eta_j - \bar{\dddot\eta}_k) \right)^2 \right\}^{1/2}},$$
where $\bar{\dot\eta}_{n-1} = (n-2)^{-1} \sum_{j=2}^{n-1} \dot\eta_j$, $\bar{\ddot\eta}_k = (k-2)^{-1} \sum_{j=2}^{k-1} \ddot\eta_j$, and $\bar{\dddot\eta}_k = (n-k-2)^{-1} \sum_{j=k+2}^{n-1} \dddot\eta_j$.
Step 5. Repeat Steps 3 and 4 1000 times to obtain a set of statistics $\{\tilde\Xi_2^1, \ldots, \tilde\Xi_2^{1000}\}$.
Step 6. Calculate the upper $\alpha$-quantile $\tilde\Xi_2(\alpha)$ of $\{\tilde\Xi_2^1, \ldots, \tilde\Xi_2^{1000}\}$. If $\Xi_1 > \tilde\Xi_2(\alpha)$, then reject the null hypothesis.
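The corresponding jackknife sketch differs from the bootstrap one only in Step 3. The steps above do not spell out how the samples are extracted, so the helper below reads "extract jackknife samples" as deleting one randomly chosen residual from each group per replication; that reading, like the helper names, is our assumption:

```python
import numpy as np

def jackknife_critical_value(x, k_hat, alpha=0.05, v1=0.2, v2=0.8,
                             n_rep=1000, seed=0):
    rng = np.random.default_rng(seed)
    e = x - x.mean()
    rho = np.dot(e[1:], e[:-1]) / np.dot(e[:-1], e[:-1])
    eta = e[1:] - rho * e[:-1]
    n = len(x)
    eta0 = eta - eta.mean()
    eta1 = eta[:k_hat - 1] - eta[:k_hat - 1].mean()
    eta2 = eta[k_hat:] - eta[k_hat:].mean()
    stats = np.empty(n_rep)
    for r in range(n_rep):
        dot = np.delete(eta0, rng.integers(len(eta0)))   # leave one out
        best = -np.inf
        for k in range(int(n * v1), int(n * v2) + 1):
            num = abs(np.sum(dot[:k - 1] - dot.mean()))
            dd = np.delete(eta1, rng.integers(len(eta1)))
            td = np.delete(eta2, rng.integers(len(eta2)))
            c1 = np.cumsum(dd - dd.mean())
            c2 = np.cumsum(td - td.mean())
            den = n ** (-0.5) * np.sqrt(np.sum(c1 ** 2) + np.sum(c2 ** 2))
            best = max(best, num / den)
        stats[r] = best
    return np.quantile(stats, 1.0 - alpha)
```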
Similarly, the power of the jackknife statistic can be obtained as shown in Figure 6.
As a classical resampling method, subsampling can also be used to approximate the limiting distribution of the statistic under the null hypothesis and obtain more accurate critical values and power, as follows:
Step 1. Use the regression of $X_t$ on the intercept to calculate the ordinary least squares (OLS) residuals $\hat e_j$, and then calculate the OLS residuals $\hat\eta_j$ from the regression of $\hat e_j$ on $\hat e_{j-1}$.
Step 2. Compute the centered residuals $\eta_t^0 = \hat\eta_{t+1} - (n-1)^{-1} \sum_{j=2}^{n} \hat\eta_j$.
Step 3. For a fixed number $b < n$, we extract $n + 1 - b$ processes of length $b$, each of which satisfies the null hypothesis. For $l = 1, 2, \ldots, n + 1 - b$, the $l$-th process is determined by $\eta_l^0, \eta_{l+1}^0, \ldots, \eta_{l+b-1}^0$.
Step 4. We compute $\Xi_{b,l}$ based on $\eta_l^0, \eta_{l+1}^0, \ldots, \eta_{l+b-1}^0$:
$$\Xi_{b,l} = \max_{0 \le v \le 1} R_3(v),$$
$$R_3(v) = \frac{b^{1/2} \left| \sum_{j=l}^{l+[(b-1)v]} (\eta_j^0 - \bar\eta_{l,l+b-1}^0) \right|}{\left\{ \sum_{i=l}^{l+[(b-1)v]} \left( \sum_{j=l}^{i} (\eta_j^0 - \bar\eta_{l,l+[(b-1)v]}^0) \right)^2 + \sum_{i=l+[(b-1)v]+1}^{l+b-1} \left( \sum_{j=l+[(b-1)v]+1}^{i} (\eta_j^0 - \bar\eta_{l+[(b-1)v]+1,l+b-1}^0) \right)^2 \right\}^{1/2}},$$
where $\bar\eta_{m,n}^0 = (n - m + 1)^{-1} \sum_{j=m}^{n} \eta_j^0$.
Step 5. $\Xi_b(\alpha)$ is the upper $\alpha$-quantile of the empirical distribution of the $n + 1 - b$ values $\Xi_{b,l}$. When $\Xi_1 > \Xi_b(\alpha)$, we reject the null hypothesis.
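A sketch of these subsampling steps for the AR(1) case (the grid over v and the use of the residual series of length n−1 in place of n are our simplifications):

```python
import numpy as np

def subsampling_critical_value(x, b, alpha=0.05, n_vgrid=50):
    e = x - x.mean()
    rho = np.dot(e[1:], e[:-1]) / np.dot(e[:-1], e[:-1])
    eta = e[1:] - rho * e[:-1]
    eta0 = eta - eta.mean()                    # Step 2: centered residuals
    stats = []
    for l in range(len(eta0) - b + 1):         # Step 3: all length-b windows
        w = eta0[l:l + b]
        best = -np.inf
        for v in np.linspace(0.0, 1.0, n_vgrid + 1)[1:-1]:   # Step 4
            k = int((b - 1) * v)
            seg1, seg2 = w[:k + 1], w[k + 1:]
            num = b ** 0.5 * abs(np.sum(seg1 - w.mean()))
            c1 = np.cumsum(seg1 - seg1.mean())
            c2 = np.cumsum(seg2 - seg2.mean())
            den = np.sqrt(np.sum(c1 ** 2) + np.sum(c2 ** 2))
            best = max(best, num / den)
        stats.append(best)
    return np.quantile(stats, 1.0 - alpha)     # Step 5: upper-alpha quantile
```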
Combining Figure 5, Figure 6 and Figure 7, it can be seen that reconstructing the bootstrap, jackknife, and subsampling statistics from the original yields a large increase in power. The comparisons also show that the bootstrap method produces the most significant increase in power and the jackknife method the least. A possible reason is that bootstrap sampling is a nonparametric statistical method whose basic idea is to obtain more reliable estimates of the statistical properties of the original sample by drawing a large number of subsamples from it and making statistical inferences on these subsamples. Since the original series may contain extreme values that affect statistical inference, and such values may occur less frequently in the drawn subsamples, the bootstrap method is effective in reducing their impact.
Since jackknife sampling excludes only one observation per iteration, it may not effectively reduce the impact of extreme values for heavy-tailed data. In contrast, bootstrap sampling can better simulate the distribution of the original data by generating many random subsets of the original size (via sampling with replacement), especially for non-normally distributed data. The bootstrap method can also be applied to any form of statistic, giving it greater flexibility in dealing with complex statistical problems.

5. Application

We consider the closing price data of British Petroleum (BP) between February 2019 and July 2021 (783 observations). Upon inspection, there appears to be a mean change point in this observed sequence, but it is unscientific to rely on visual perception alone. Therefore, the test statistic is needed to verify that a change point exists.
According to the description in the previous section, the bootstrap method has the best performance, so we consider using the bootstrap method for this problem.
Based on Nolan [24], Jin et al. [13] determined that the data can be fitted as a heavy-tailed sequence with tail index $\kappa = 1.732$. Under the premise that the process has infinite variance, we fit a causal AR model: using the Bayesian information criterion, the sample partial autocorrelation function is almost zero after a first-order lag, so an AR(1) model is selected, with autoregressive coefficient 0.5788. Jin et al. [25] proposed a mean change point estimator for heavy-tailed sequences and proved its consistency. Based on this, we obtain the change position $k^* = 444$ (see Figure 8). The entire sequence is then divided into two parts, (1, 444) and (445, 783), with mean values $\hat\mu_1 = 41.23$ and $\hat\mu_2 = 23.34$, respectively. A mean-corrected sequence is based on $\mu_t$, where $\mu_t = X_t - \hat\mu_1$ for $t = 1, \ldots, 444$ and $\mu_t = X_t - \hat\mu_2$ for $t = 445, \ldots, 783$. It is not difficult to find that the heavy-tailed character of the original data is unchanged in the corrected sequence.
For the statistic $\Xi_1$, the critical value based on the limiting distribution is 5.9551 (see Table 1), and the critical value based on the bootstrap test is 3.0627. Evaluating the statistic on the original data gives $\Xi_1 = 5.4446$. As expected, $\Xi_1$ is greater than the bootstrap critical value, so we reject the null hypothesis and conclude that there is a change point. However, $\Xi_1 < 5.9551$; that is, with the original statistic, we cannot conclude that the sequence has a change point. This reflects the superiority of the bootstrap method and is sufficient to demonstrate the good performance of our proposed approach.

6. Conclusions

This paper proposes a modified ratio test based on a heavy-tailed AR(p) model. We show that the asymptotic distribution of the modified statistic is a functional of Lévy processes and that the test is consistent under the alternative hypothesis. The sampling methods provide asymptotically more accurate critical values, and the simulation results show that the ratio tests based on these sampling methods have good empirical power. Meanwhile, the analysis of the BP closing price further highlights the rationality and superiority of our proposed methods.
The disadvantages of the statistic are also clear. For example, the performance of our statistic is not satisfactory enough when the change point lies late in the sample. Moreover, if there are multiple change points in the sequence, our statistic may not be applicable. Future research should not only focus on mean changes but also consider more general situations.

Author Contributions

Methodology, T.X.; Software, T.X.; Writing—original draft, T.X.; Supervision, Y.W.; Funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (61806155), the Natural Science Foundation of Anhui Province (1908085MF186), and the Natural Science Foundation of Anhui Higher Education Institutions of China (KJ2020A0024).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors are very grateful to the editor and anonymous reviewers for their comments, which enabled the authors to greatly improve this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pepelyshev, A.; Polunchenko, A. Real-time financial surveillance via quickest change-point detection methods. Stat. Interface 2017, 10, 93–106. [Google Scholar] [CrossRef]
  2. Ghosh, P.; Vaida, F. Random change point modelling of HIV immunologic responses. Stat. Med. 2007, 26, 2074–2087. [Google Scholar] [CrossRef] [PubMed]
  3. Punt, A.E.; Szuwalski, C.S.; Stockhausen, W. An evaluation of stock-recruitment proxies and environmental change points for implementing the US Sustainable Fisheries Act. Fish Res. 2014, 157, 28–40. [Google Scholar] [CrossRef]
  4. Csörgo, M.; Horváth, L. Limit Theorems in Change-Point Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1997. [Google Scholar]
  5. Horváth, L.; Hušková, M. Change-point detection in panel data. J. Time Ser. Anal. 2012, 33, 631–648. [Google Scholar] [CrossRef]
  6. Horváth, L.; Rice, G. Extensions of some classical methods in change point analysis. Test-Spain 2014, 23, 219–255. [Google Scholar] [CrossRef]
  7. Gao, M.; Ding, S.S.; Wu, S.P.; Yang, W. The asymptotic distribution of CUSUM estimator based on α-mixing sequences. Commun. Stat.-Simul. Comput. 2020, 51, 6101–6113. [Google Scholar] [CrossRef]
  8. Ding, S.; Fang, H.; Dong, X.; Yang, W. The CUSUM statistics of change-point models based on dependent sequences. J. Appl. Stat. 2021, 49, 2593–2611. [Google Scholar] [CrossRef]
  9. Xu, M.; Wu, Y.; Jin, B. Detection of a change-point in variance by a weighted sum of powers of variances test. J. Appl. Stat. 2019, 46, 664–679. [Google Scholar] [CrossRef]
  10. Guillaume, D.M.; Dacorogna, M.M.; Davé, R.R.; Müller, U.A.; Olsen, R.B.; Pictet, O.V. From the bird’s eye to the microscope: A survey of new stylized facts of the intra-daily foreign exchange markets. Financ Stoch. 1997, 1, 95–129. [Google Scholar] [CrossRef]
  11. Mittnik, S.; Rachev, S. Stable Paretian Models in Finance; Wiley: New York, NY, USA, 2000. [Google Scholar]
  12. Qin, R.B.; Yang, X.Q.; Chen, Z.S. Ratio detections for change point in heavy tailed observations. Commun. Stat.-Simul. Comput. 2019, 51, 2487–2510. [Google Scholar] [CrossRef]
  13. Jin, H.; Wang, A.; Zhang, S.; Liu, J. Subsampling ratio tests for structural changes in time series with heavy-tailed AR(p) errors. Commun. Stat.-Simul. Comput. 2022, 1–27. [Google Scholar] [CrossRef]
  14. Aue, A.; Horváth, L. Structural breaks in time series. J. Time Ser. Anal. 2013, 34, 1–16. [Google Scholar] [CrossRef]
  15. Moon, J.; Hossain, M.B.; Chon, K.H. AR and ARMA model order selection for time-series modeling with ImageNet classification. Signal Process. 2021, 183, 108026. [Google Scholar] [CrossRef]
  16. Jiang, H.; Duan, S.; Huang, L.; Han, Y.; Yang, H.; Ma, Q. Scale effects in AR model real-time ship motion prediction. Ocean Eng. 2020, 203, 107202. [Google Scholar] [CrossRef]
  17. Movahed, T.M.; Bidgoly, H.J.; Manesh, M.H.K.; Mirzaei, H.R. Predicting cancer cells progression via entropy generation based on AR and ARMA models. Int. Commun. Heat Mass. 2021, 127, 105565. [Google Scholar] [CrossRef]
  18. Kokoszka, P.; Wolf, M. Subsampling the mean of heavy-tailed dependent observations. J. Time Ser. Anal. 2004, 32, 217–234. [Google Scholar] [CrossRef]
  19. Shao, X.F. A simple test of changes in mean in the possible presence of long-range dependence. J. Time Ser. Anal. 2011, 32, 598–606. [Google Scholar] [CrossRef]
  20. Phillips, P.C.B.; Solo, V. Asymptotics for linear processes. Ann. Stat. 1992, 20, 971–1001. [Google Scholar] [CrossRef]
  21. Koedijk, K.; Schafgans, M.; de Vries, C. The tail index of exchange rate returns. J. Int. Econ. 1990, 29, 93–108. [Google Scholar] [CrossRef]
  22. Resnick, S.I. Heavy-Tail Phenomena Probabilistic and Statistical Modeling; Springer: New York, NY, USA, 2007. [Google Scholar]
  23. Mcmurry, T.L.; Politis, D.N. Banded and tapered estimates for autocovariance matrices and the linear process bootstrap. J. Time Ser. Anal. 2010, 31, 471–482. [Google Scholar] [CrossRef]
  24. Nolan, J.P. Numerical calculation of stable densities and distribution functions. Commun. Stat. Stoch. Models 1997, 13, 759–774. [Google Scholar]
  25. Jin, H.; Tian, Z.; Qin, R.B. Bootstrap tests for structural change with infinite variance observation. Stat. Probab. Lett. 2009, 79, 1985–1995. [Google Scholar] [CrossRef]
Figure 1. Empirical sizes of $\Xi_1$ with $n = 200, 500, 800$.
Figure 2. Empirical power of $\Xi_1$ with $\Delta_1 = 2$, $k^* = 0.3n$, and $n = 200, 500, 800$.
Figure 3. Empirical power of $\Xi_1$ with $\Delta_1 = 4$, $k^* = 0.3n$, and $n = 200, 500, 800$.
Figure 4. Empirical power of $\Xi_1$ with $\Delta_1 = 2$, $k^* = 0.7n$, and $n = 200, 500, 800$.
Figure 5. Empirical power of $\tilde\Xi_1^*$ with $\Delta_1 = 2$, $k^* = 0.7n$, and $n = 200, 500, 800$.
Figure 6. Empirical power of $\tilde\Xi_2$ with $\Delta_1 = 2$, $k^* = 0.7n$, and $n = 200, 500, 800$.
Figure 7. Empirical power of $\Xi_{b,l}$ with $\Delta_1 = 2$, $k^* = 0.7n$, and $n = 200, 500, 800$.
Figure 8. Closing price of BP.
Table 1. A series of values related to the BP closing price.

κ        k*     Critical value of Ξ1    Ξ1        Bootstrap critical value
1.732    444    5.9551                  5.4446    3.0627
