Study on the Complexity Reduction of Observed Sequences Based on Different Sampling Methods: A Case of Wind Speed Data

Huai, Xiaowei; Yan, Pengcheng; Li, Li; Cai, Zelin; Xu, Xunjian; Hu, Xiaohui

doi:10.3390/atmos13111746

by Xiaowei Huai ^1,2,3, Pengcheng Yan ^4,*, Li Li ^1,3, Zelin Cai ^1,3, Xunjian Xu ^1,3 and Xiaohui Hu ^5

¹ State Key Laboratory of Disaster Prevention and Reduction for Power Grid Transmission and Distribution Equipment, Changsha 410000, China

² Hunan Disaster Prevention Technology Co., Ltd., Changsha 410000, China

³ Disaster Prevention and Reduction Center, State Grid Hunan Electric Power Company Limited, Changsha 410000, China

⁴ Institute of Arid Meteorology, China Meteorological Administration, Lanzhou 730020, China

⁵ Zhangye Meteorological Bureau of Gansu Province, Zhangye 734000, China

* Author to whom correspondence should be addressed.

Atmosphere 2022, 13(11), 1746; https://doi.org/10.3390/atmos13111746

Submission received: 27 July 2022 / Revised: 16 October 2022 / Accepted: 19 October 2022 / Published: 23 October 2022

(This article belongs to the Special Issue Multi-Scale Climate Change: Recent Trends, Current Progress and Future Directions)

Abstract

Many studies have confirmed that the complexity of a time sequence is closely related to its predictability, but few have proposed methods to reduce that complexity, which is the key to improving predictability. This study analyzes complexity reduction for observed time sequences, using wind speed data as a case. Five sampling methods, namely the random method, average method, sequential method, max method and min method, are used to obtain a new low-resolution time sequence from a high-resolution one. Both ideal time sequences constructed by mathematical functions and observed wind speed time sequences are studied, with complexity expressed by the approximate entropy (ApEn) exponent. The results show that the complexity of ideal periodic, chaotic and random sequences increases in turn. Furthermore, the complexity of the observed wind speed is closer to that of a random sequence, which indicates that the wind speed sequence is not easy to predict. In addition, the complexity of the sub-time series changes with the sampling method. The complexity of the sub-time series obtained by the average method is the lowest, which indicates that the average method can effectively reduce the complexity of sub-time series sampled from high-resolution wind speed data. A method that substantially reduces the complexity of wind speed data will help in choosing appropriate wind speed data and thus in improving predictability.

Keywords: complexity reduction; observed data; multi-scale; sampling methods; sampling interval; ApEn

1. Introduction

As the complexity of the observed sequence is closely related to its predictability, it is vital to reduce the complexity of the observed dataset [1]. If the system is not so complex, it is easy to predict, such as a train moving at a constant speed or the simple pendulum with a periodic variation. However, if the time series is completely random or chaotic, it is not easy to predict, like the tossing of a coin or the three-body motion in the Poincare section [2]. In a weather or climate system, most motions are difficult to predict as they are randomly and chaotically forced [1,3,4]. Therefore, many researchers have focused on the complexity of a time sequence.

The complexity of an ideal sequence constructed by mathematical functions usually remains unchanged when the length of the sub-sequence is changed. Hou et al. [5,6] studied the logistic model and Lorenz model using the Lempel-Ziv complexity algorithm, and found that the complexities of time series with different lengths were basically the same. In contrast, the complexity of real (observed) data has been verified to be changeable, especially for atmospheric and climatic systems, which are difficult to predict. Chou [7] considered that the uncertainty of initial conditions, boundary conditions and physical laws is the main reason for the low predictability of the atmospheric system. In addition, the nonlinear effect causes predictability to decrease, and the forcing and dissipative effects cause predictability to increase, which means that the predictability (complexity) of a system is changeable. Jin and He et al. [8,9,10,11,12] analyzed a large number of observed time series and found that the complexity of observed sequences differs between periods, and that the complexity can be used to identify changes in dynamic structure.

The key to accurate predictions is to reduce data complexity. Some dimensionality reduction techniques have been proposed to reduce the complexity of observation data. They use a few modes to represent the main information of complex data to reduce data complexity, such as the empirical mode decomposition method [13], the k-means clustering method, the empirical orthogonal functions (EOF) method [14] and its improved methods, including the extended EOF (EEOF) [15], the rotated EOF (REOF) and the complex EOF (CEOF) [16]. In addition, some sampling methods have also been used to reduce the complexity of data sequences. Zhang et al. [17] resampled the data for radar quantitative precipitation estimation by transforming the data into the wavelet domain to reduce its complexity. The sampling interval has also been found to have some impact on the predictability of time series [18]. For the observed time series, there are different sampling intervals and different sampling methods when collecting data, due to the accuracy of observation instruments, the method for obtaining data and the purpose of the research. However, the complexity of the sequence obtained by these sampling methods, and whether or not it affects the predictability of the sequence, are still inconclusive. Thus, it is necessary to figure out their impacts on the complexity and predictability of the data sequence.

Among different meteorological elements, the predictability of wind speed is relatively low [19]. The wind speed time series has chaotic characteristics at different time scales because wind speed observation is significantly affected by microtopography, seasonal variation [20,21] and many other factors [22], including altitude, temperature, humidity and cloud thickness. These studies indicate that wind speed is difficult to predict. In recent years, some machine learning techniques have been used to predict wind speed, and presented good performance [23]. However, these prediction methods mainly focused on the prediction results. If the complexity of observation data itself can be reduced, the effectiveness of these forecasting methods will be improved. A couple of sampling methods have been used to analyze how to reduce data complexity.

Many exponents have been used to measure the complexity of time series, such as the information entropy [24], approximate entropy (ApEn) [8,9,10,25,26,27], complexity [28], Lyapunov exponent [29,30] and Hurst exponent [31,32]. These methods detect the complexity of time series based on different parameters, showing great potential in identifying abrupt changes and studying predictability. The ApEn exponent has been widely used in many fields to represent the complexity of sequences due to its strong anti-noise ability [8]. In addition, the exponent requires a shorter sequence length and has strong robustness [9].

In this study, the ApEn exponent is taken to express the complexity of wind speed sequence, and five different sampling methods are used to sample a new time series with a low time resolution from the high time resolution wind speed data. The remainder of this paper is organized as follows. Section 2 introduces the data and methods. The complexity of the ideal time series and observed wind speed is analyzed in Section 3. Section 4 provides the conclusion and discussion.

2. Data and Methods

2.1. A Description Method of Complexity—The Approximate Entropy

The ApEn is defined as the probability of generating a new pattern when the sequence dimension changes [25,26,27]. The larger the probability of generating a new pattern, the more complex the sequence and the larger the ApEn. The simple definition is as follows:

Step 1. Mark the time sequence as x₁, x₂, …, x_n.

Step 2. Reconstruct sub-sequences from the sequence x₁, x₂, …, x_n, each with a length of m, as shown in Equation (1), where k = n − m + 1.

$$\begin{bmatrix} x_{1} & x_{2} & \dots & x_{m} \\ x_{2} & x_{3} & \dots & x_{m+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n-m+1} & \dots & \dots & x_{n} \end{bmatrix} \Rightarrow \begin{bmatrix} X_{11} & X_{12} & \dots & X_{1m} \\ X_{i1} & X_{i2} & \dots & X_{im} \\ \vdots & \vdots & \ddots & \vdots \\ X_{k1} & \dots & \dots & X_{km} \end{bmatrix} = \begin{bmatrix} X_{1} \\ X_{i} \\ \vdots \\ X_{k} \end{bmatrix} \tag{1}$$

Step 3. For each i ∈ [1, k], find the number of X_j, j ∈ [1, k], that satisfy d[X_i, X_j] < r. The quantity C_i^m(r) is defined as Equation (2):

$$C_{i}^{m}(r) = \frac{\text{number of } X_{j} \text{ such that } d[X_{i}, X_{j}] < r}{k} \tag{2}$$

where d[X_i, X_j] expresses the maximum distance between X_i and X_j, as follows.

$$d[X_{i}, X_{j}] = \max_{1 \le p \le m} \left| X_{ip} - X_{jp} \right| \tag{3}$$

Step 4. Define the function

$$\Phi^{m}(r) = \frac{1}{k} \sum_{i=1}^{k} \log \left( C_{i}^{m}(r) \right)$$

Step 5. Define the ApEn as Equation (4).

$$\mathrm{ApEn} = \Phi^{m}(r) - \Phi^{m+1}(r) \tag{4}$$

Generally, the parameter m = 2 or m = 3, and the parameter r = 0.2 × std, where std represents the standard deviation of the original time series.
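Steps 1 to 5 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `approximate_entropy`, the parameter name `r_factor` and the use of the population standard deviation are assumptions, and the pairwise loop is quadratic in the series length, so it suits short series only.

```python
import math
import random
import statistics


def approximate_entropy(x, m=2, r_factor=0.2):
    """ApEn following Steps 1-5, with r = r_factor * std of the series."""
    n = len(x)
    # Assumption: population standard deviation; the text only says "std".
    r = r_factor * statistics.pstdev(x)
    if r == 0:
        raise ValueError("constant series: r = 0, so no pair satisfies d < r")

    def phi(dim):
        k = n - dim + 1                          # number of sub-sequences
        subs = [x[i:i + dim] for i in range(k)]  # rows X_1..X_k of Eq. (1)
        total = 0.0
        for xi in subs:
            # Eq. (3): maximum component-wise distance between X_i and X_j.
            # Eq. (2): share of sub-sequences closer than r (X_i itself counts).
            count = sum(
                1 for xj in subs
                if max(abs(p - q) for p, q in zip(xi, xj)) < r
            )
            total += math.log(count / k)
        return total / k                         # Step 4

    return phi(m) - phi(m + 1)                   # Eq. (4)


# Illustrative inputs (hypothetical): a periodic wave and uniform noise.
rng = random.Random(0)
wave = [math.sin(math.pi * i / 25) for i in range(300)]
noise = [rng.random() for _ in range(300)]
print(approximate_entropy(wave), approximate_entropy(noise))
```

The periodic input should give the smaller value of the two, in line with the interpretation that a larger ApEn indicates a more complex sequence.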

2.2. Sampling Methods

The continuous variable in the actual atmosphere needs to be sampled when it is recorded by the sensor. Different sampling methods and sampling intervals differently represent the data characteristics. The daily/monthly/yearly time series indicate the daily/monthly/yearly variation of meteorological elements. For the long-term data, the daily time series and the monthly-averaged time series are featured by the annual cycle, but if the annual-averaged data is used, this cycle vanishes. Five different sampling methods are introduced as follows. The schematic diagram of Figure 1 is used for the simple introduction. The entire time series is marked as Qn, where n represents the length of the time series. A fragment marked as Sm near point i is obtained from the entire time series, where m represents the fragment’s length. The meteorological element value at point i can be represented by different sampling methods.

(1) The random method. For the observation instrument, the observation should be carried out at the set time point. However, due to random error, the time of data collection is around that time point. Thus, a random observation value from Sm is selected as the value at point i. This is called the random method.

(2) The average method. The value at point i is represented by the average of the Sm sequence, such as the daily average, monthly average or annual average of temperature. This is called the average method.

(3) The sequential method. The value at point j (where j could be 1, 2, 3… or n) in sequence Sm is sequentially selected to represent the value at point i. This is called the sequential method.

(4) The max method and the min method. The maximum or minimum value in the sequence may also be taken as the value at point i, such as the daily maximum and minimum temperature. These two methods are called the max method and the min method.

The above five sampling methods, respectively marked as random, average, sequential, max and min, are applied to obtain sub-sequences from the original time sequence with a high resolution.
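The five sampling methods can be illustrated with a short Python sketch. It assumes that the fragments Sm are consecutive and non-overlapping and that an incomplete fragment at the end is dropped; the text does not specify either point. The function name `resample` and its parameters are hypothetical.

```python
import random
import statistics


def resample(series, m, method, j=0, rng=None):
    """Replace every length-m fragment of `series` by a single value."""
    rng = rng or random.Random()
    # Assumption: consecutive, non-overlapping fragments; incomplete tail dropped.
    fragments = [series[s:s + m] for s in range(0, len(series) - m + 1, m)]
    pick = {
        "random": lambda frag: rng.choice(frag),     # any point of the fragment
        "average": lambda frag: statistics.fmean(frag),
        "sequential": lambda frag: frag[j],          # j-th point (0-based here)
        "max": max,
        "min": min,
    }[method]
    return [pick(frag) for frag in fragments]


# Example: reduce a 6-point series to 3 points with fragments of length 2.
data = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
for name in ("random", "average", "sequential", "max", "min"):
    print(name, resample(data, 2, name, rng=random.Random(1)))
```

For 10 Hz data sampled down to 1 s, `m` would be 10, and the sequential method can be repeated for each offset `j` from 0 to 9.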

2.3. Ideal Time Series and the Observed Wind Speed

The data used as the ideal time series are constructed by mathematical equations. The complexity of time series is expressed by the value of ApEn. Three ideal time series marked as x₁, x₂ and x₃ are constructed with Equation (5). The ideal time series x₁ represents the wave function, which is ordered. The ideal time series x₂ is the numerical solution of the logistic model. The values can be calculated by rewriting the logistic model as its difference form x(n) = x(n − 1) + τk[x(n − 1) − μ][ν − x(n − 1)], where the values of parameters are ν = −1.0, μ = 1.0, k = 135, τ = 0.01 and x(0) = 0.0005. The ideal time series x₃ is created with random numbers. According to previous studies [33,34], the ideal time series x₂ and x₃ are chaotic.

$$\begin{cases} x_{1}(i) = \sin \left( \frac{\pi \cdot i}{a} \right) + \cos \left( \frac{\pi \cdot i}{b} \right) \\ x_{2}(i) \Rightarrow \frac{dx}{dt} = k (x - \mu)(\nu - x) \\ x_{3}(i) = \mathrm{random}(i) \end{cases} \tag{5}$$

Based on these three ideal time series, two mixed ideal time series are constructed by mixing x₁ and the other two time series, as shown in Equation (6).

$$\begin{cases} y_{1}(i) = \alpha_{1} x_{1}(i) + \beta_{1} x_{2}(i) \\ y_{2}(i) = \alpha_{2} x_{1}(i) + \beta_{2} x_{3}(i) \end{cases} \tag{6}$$

The ideal time series y₁ is created by combining x₁ and x₂, where the parameters α₁ and β₁ are mixing coefficients, and α₁ + β₁ = 1. The ideal time series y₂ is created with x₁ and x₃, and the sum of mixing coefficients α₂ and β₂ is also 1. Furthermore, it is noticed that the ideal time series x₁, x₂ and x₃ are normalized before they are combined into y₁ and y₂.
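The construction in Equations (5) and (6) might be sketched as follows. The logistic parameters are those given in the text. The wave parameters `a` and `b`, the Gaussian random numbers and the z-score normalization are assumptions made for illustration, since the text does not specify them.

```python
import math
import random
import statistics


def wave_series(n, a=50.0, b=20.0):
    # a and b are hypothetical values; the text does not state them.
    return [math.sin(math.pi * i / a) + math.cos(math.pi * i / b)
            for i in range(n)]


def logistic_series(n, k=135.0, tau=0.01, mu=1.0, nu=-1.0, x0=0.0005):
    # Difference form: x(n) = x(n-1) + tau*k*[x(n-1) - mu]*[nu - x(n-1)]
    xs = [x0]
    for _ in range(n - 1):
        x = xs[-1]
        xs.append(x + tau * k * (x - mu) * (nu - x))
    return xs


def random_series(n, seed=0):
    # Assumption: Gaussian random numbers; the text only says "random numbers".
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def normalize(xs):
    # Assumption: z-score normalization (the text does not name the method).
    mean, std = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]


def mix(ordered, other, alpha):
    """Equation (6): alpha * x1 + beta * x2 (or x3), with alpha + beta = 1."""
    beta = 1.0 - alpha
    xo, xc = normalize(ordered), normalize(other)
    return [alpha * p + beta * q for p, q in zip(xo, xc)]


x1, x2, x3 = wave_series(2000), logistic_series(2000), random_series(2000)
y1 = mix(x1, x2, alpha=0.5)
y2 = mix(x1, x3, alpha=0.5)
```

With the stated parameters the difference form reduces to x(n) = x(n − 1) + 1.35[1 − x(n − 1)²], so the iteration stays bounded.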

The observed wind speed with high time resolution was measured using a Gill Instruments Windmaster Pro 3D installed at the Zhangye National Climate Observatory. The observatory (100°17′ E, 39°05′ N, 1456 m) is located in the desert area of northwestern China, and it is one of the first five national climate observatory demonstration stations established by the China Meteorological Administration, focusing on the observation of the desert land boundary layer and the ecological environment. The observed wind speed data are regionally representative. The time resolution is 0.1 s (10 Hz), and more instrument parameters are shown in Table 1. The observation period was from 25 November 2017 to 4 December 2017. Ten days were included for a total of 8,640,000 time points, and they are marked as d1, d2, …, d10. For each point, the wind speeds in the U and V directions at a height of 50 m were observed, and the total wind speed was studied.
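As a quick check of the figures above, the number of time points follows directly from the 10 Hz sampling rate and the ten-day period. The sketch below also assumes that the total wind speed is the magnitude of the horizontal (U, V) vector, which the text does not state explicitly; the function name is hypothetical.

```python
import math

SAMPLE_RATE_HZ = 10                        # 0.1 s time resolution
DAYS = 10                                  # d1 .. d10

points_per_day = 24 * 3600 * SAMPLE_RATE_HZ
total_points = DAYS * points_per_day       # matches the 8,640,000 points in the text


def total_wind_speed(u, v):
    # Assumption: total wind speed = sqrt(U^2 + V^2).
    return math.hypot(u, v)


print(points_per_day, total_points)
```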

The ideal sequences were used to verify the description of sequence complexity by the ApEn. The complexity of observed data was compared with that of the ideal sequence to determine the complexity of wind speed, and then the complexity of sub-sequence was studied by using the sampling methods.

3. Results

3.1. The Complexity of Ideal Time Series

The three ideal time series constructed by Equation (5) are shown in Figure 2. The curve (x₁) in Figure 2a is an ideal sequence constructed by the wave function, which periodically changes. However, the other two curves (x₂ and x₃), constructed with the logistic model and random numbers in Figure 2b,c, are both chaotic. In order to study the symmetrical characteristics of these ideal sequences, the quartile values were analyzed (Table 2). The probability distribution of time series x₁ is completely symmetric about the value of 0. For the time series x₃, the distribution is almost symmetric about 0, which is also a composite normal distribution. However, the distribution of time series x₂ is skewed. The ApEn values for x₁, x₂ and x₃ are 0.0001, 0.3086 and 0.5396, respectively. The result shows that the complexity of ordered time sequences constructed by the wave function is almost 0. The complexity of the sequences constructed by complex functions like the logistic model is higher than that of the ordered sequence. The complexity of the completely random sequence is the highest.

To study the influence of the time-series length on series complexity, the entire time series were divided into 10 sub-series with a length of 1000 and 20 sub-series with a length of 500. The ApEn values for each sub-series are shown in Figure 3. The values of each sub-series intercepted from x₁ are all 0.0001, indicating that the complexity of time series x₁ does not vary with length. For the sub-series from time series x₂, there is no significant difference between the ApEn values of sub-series with lengths of 1000 and 500. Their average ApEn values are 0.3422 and 0.3475, respectively. Compared with the original ideal time series, the bias rates of the ApEn values of the length-1000 and length-500 sub-series are 10.87% and 12.61%, respectively. The average ApEn values of the length-1000 and length-500 sub-series intercepted from x₃ are 0.5884 and 0.5840, respectively, and the bias rates are 9.04% and 8.22%, respectively. The ApEn values of the sub-series from x₂ and x₃ also indicate that their complexities hardly change with the time-series length. To further study the relationship between the time-series length and the complexity, ideal time series of different lengths were selected and the average complexities were calculated, as shown in Figure 4. The ApEn values are still 0.0001 for all sub-series from x₁. For x₂ and x₃, the ApEn values of the sub-series are around 0.3363 and 0.5676, respectively, when the length is less than 5000. If the sub-series from x₂ and x₃ are longer than 5000, the ApEn values of each sub-series remain unchanged at 0.3086 and 0.5396, respectively. Therefore, for a time series constructed by a mathematical function, the complexity of a sub-series is consistent with that of the original time series, which agrees with the conclusion of Hou et al. [5,6].
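The bias rates quoted above can be reproduced approximately from the rounded ApEn values. The sketch assumes that the bias rate is the relative difference between the sub-series average and the ApEn of the full series, expressed in percent; the text does not give the formula, and the function name is hypothetical. Because the inputs are rounded to four decimals, the last digit can differ slightly from the reported percentages.

```python
def bias_rate(sub_mean, full_value):
    """Relative deviation of a sub-series ApEn from the full-series ApEn, in %."""
    return abs(sub_mean - full_value) / full_value * 100.0


# Values taken from the text: (average sub-series ApEn, full-series ApEn).
cases = {
    "x2, length 1000": (0.3422, 0.3086),
    "x2, length 500": (0.3475, 0.3086),
    "x3, length 1000": (0.5884, 0.5396),
    "x3, length 500": (0.5840, 0.5396),
}
for label, (sub_mean, full_value) in cases.items():
    print(f"{label}: {bias_rate(sub_mean, full_value):.2f}%")
```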

By mixing the ideal sequences, more complex sequences were obtained. In this way, we can study the change in the complexity of these new sequences. The complexities of mixed ideal time series y₁ and y₂ with the ideal time series x₁, x₂ and x₃ were studied, and the ApEn values of 20 sub-series with the length of 1000 are shown in Figure 5, with different colors representing different sub-series. The box plots of ApEn values are shown on the right side. The horizontal ordinate represents the mixing-rate parameter α. When α = 1.0, the mixed time series y₁ and y₂ respectively represent the original ideal time series x₂ and x₃; when α = 0.0, both y₁ and y₂ represent the ideal time series x₁. In addition, the ApEn values of sub-series with the length of 1000, which are intercepted from ideal time series x₁, are marked as red stars. Specifically, the ApEn values of sub-series from y₁ are between 0.2394 and 0.7669, and the ApEn values of sub-series from y₂ are between 0.3949 and 0.7796. By comparing the values of box plots and red stars, it is found that the complexity of the mixed time series is remarkably greater than that of the wave function no matter the value of α, and their ApEn values are closer to the chaotic components. It is verified that the complexity of the time series mainly depends on the chaotic component. In other words, the sequence complexity represents the chaotic characteristics of the sequence.

3.2. The Complexity of the Observed Wind Speed

The observed wind speed for 10 days (marked as d1, d2, …, d10) and the complexity of the daily sequence changing with time are shown in Figure 6. The wind speed changes are almost disorderly, but the daily-averaged wind speed is slightly different. The maximum daily-averaged wind speed is about 4.97 m·s⁻¹ on d2. The minimum daily-averaged wind speed is about 2.12 m s⁻¹ on d10. The ApEn values are shown in Figure 6b, and the daily complexity does not seem to be related to the average wind speed. The wind-speed complexity is the lowest on d1, with the ApEn value being 0.2401. The complexity of the wind speed on d4 is the highest, with the ApEn value being 0.8251. It shows that the complexities of the wind speed at the interval of 0.1 s are completely different on each day, which is closer to the ApEn values of the random sequence. It verifies that the chaotic characteristics of the wind speed time series are more complex than the ordered time series, and that the wind speed time series is difficult to predict. The variation of weak wind speed might be very complicated, while that of large wind speed may be orderly. According to the ApEn value, the wind speed on d1 and d4 were selected to be further studied.

The time series were processed into an interval of 1 s by applying the different sampling methods to the original wind speed time series with an interval of 0.1 s. On d1, there was no significant difference between the ApEn values of the time series obtained by the sequential method and the random method, with both methods being carried out ten times, as shown in Figure 7a. For the sequential method, the maximum ApEn value over the ten runs is 0.7982, the minimum is 0.6865, and the range is 0.1117. For the random method, the maximum and minimum ApEn values are 0.8571 and 0.7048, respectively, giving a range of 0.1523, which is wider than that of the sequential method. The ApEn values of the other three methods are shown in Figure 7b. As can be seen, the average method has the lowest complexity. This result means that, for the wind speed on d1 with a time interval of 1 s, neither the sequential method nor the random method can reduce the complexity of the wind speed time series, while the max and min methods can increase it. For the time series of d2, by contrast, the ApEn value with the min method (0.6853) is the lowest, and the average ApEn value with the random method is 0.6894, which is almost equal to that with the min method. The ApEn value with the max method (0.8025) is the largest. The comparison of the ApEn values between the two days shows that there is little difference among the complexities obtained by different sampling methods when the time interval is 1 s.

We further increased the sampling interval and studied the complexity. The sampling time interval was separately increased to 1 s, 2 s, 3 s, 4 s, 5 s, 6 s, 10 s, 12 s, 15 s, 20 s, 30 s, 60 s, 120 s, 180 s, 240 s and 300 s. Then, the ApEn values of different methods were calculated, as shown in Figure 8, and the ApEn values by the sequential method and the random method were the average ApEn values. On average, the ApEn values by the average method are lower than those by the other four methods, whether on d1 or d4. When the time interval is less than 6 s, the ApEn value by the average method on d1 is lower than those by other methods, while it is different on d2. When the time interval is greater than 60 s, the situation is similar on d1 and d4. The comparison between Figure 8c,d shows that the ApEn values by the average method are remarkably lower than those by other methods both on d1 and d4, with the time intervals of 6–60 s. The above results show that the average method can reduce the sequence complexity.
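For orientation, each sampling interval corresponds to a fragment length in raw 10 Hz samples and to a number of points left in one day of data. The sketch below is only this arithmetic; the constant and function names are hypothetical.

```python
SAMPLE_RATE_HZ = 10                              # raw data at 0.1 s resolution
POINTS_PER_DAY = 24 * 3600 * SAMPLE_RATE_HZ      # 864,000 raw points per day
INTERVALS_S = [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60, 120, 180, 240, 300]


def fragment_length(interval_s):
    """Number of raw samples that one sampled value represents."""
    return interval_s * SAMPLE_RATE_HZ


def points_after_sampling(interval_s):
    """Length of the daily series after sampling at the given interval."""
    return POINTS_PER_DAY // fragment_length(interval_s)


for s in INTERVALS_S:
    print(f"{s:>3} s: fragment of {fragment_length(s)} samples, "
          f"{points_after_sampling(s)} points per day")
```

All sixteen intervals divide one day of 10 Hz data evenly, so no fragment is left incomplete; at the longest interval only a few hundred points per day remain for the ApEn calculation.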

When the time intervals were 6–60 s, the ApEn values on all 10 days were calculated, as shown in Figure 9. As the ApEn values by the sequential method and the random method were still greater than those by other methods after several calculations, the two methods were not used. In Figure 9a, the 10-day-averaged ApEn values by the average method are less than those by the max method and the min method for all different time intervals of 6–60 s, except for 20 s. The average ApEn values for different time intervals by the average method, max method and min method are 0.6906, 0.7190 and 0.7376, respectively. For daily wind speed, the ApEn values by the average method are less than those by the other two methods, except for d7. The ApEn values on d4 and d5 are lower than those on the other days, and the ApEn value on d7 is the highest. Figure 9 shows that, in most cases, the complexity of the wind speed time series obtained by the average method is lower than those obtained by the other methods. The above analysis shows that the average method can obtain sequences with lower complexity.

In conclusion, because the wind speed data used in this study is 10 Hz, the complexity of the sequence after reducing the resolution (sampling intervals from 1 s to 60 s) shows that the average method can reduce the complexity of new sequences. For the daily or hourly observation data, using the average method to reduce the resolution of the sequence may obtain the sequence with lower complexity, which is easier to predict. This will be carried out in future research.

4. Conclusion and Discussion

Most of the research on wind speed prediction focuses on how to improve the prediction method, and there has been little research on reducing the complexity of the observation sequence itself. In this study, we used five methods to obtain a new time series from the observed time series with a high resolution and studied their complexities, which are expressed by the ApEn exponent. The main conclusions are as follows.

Based on three ideal time series constructed by the wave function, logistic model and random numbers, the ApEn values are confirmed to be useful to represent the complexity of time series. The more chaotic the time series, the larger the ApEn value. The ApEn value of any part intercepted from the ideal time series is the same as that of the original series, indicating that the complexity is an attribute feature of the time series, especially for the system with a certain complexity. The complexity is related to the chaotic term rather than the ordered term after the time series is mixed with time series with different complexities. The complexity of the time series mainly depends on its chaotic component. This also means that it is vital to separate or reduce chaotic components from the original time series for prediction. In fact, in the theory of short-term climate predictions proposed by Chou et al. [35], it is the linear separation technology that separates the time series into predictable stable components and unpredictable chaotic components. The theory has been successfully applied to the study of short-term climate predictions [36].

The ApEn value of the wind-speed observation data is closer to that of the ideal series constructed by random numbers, indicating that the observed wind speed has a greater complexity. Many methods have been used to predict wind speed and have achieved good performance, such as mesoscale numerical models, statistical methods and some machine learning methods. However, if the sequence complexity is reduced before the prediction, the prediction of wind speed can be further improved, which is an approach completely different from improving the prediction method. The ApEn values obtained by different sampling methods and with different lengths are almost the same, which means that the wind-speed time series is complex and difficult to predict. The wind speed is sampled by five methods, namely the sequential method, average method, random method, max method and min method. The ApEn shows that, for the time series with intervals of 6–60 s, the complexity of the wind speed obtained by the average method is the lowest. This indicates that the average wind speed is easier to predict, while the maximum and minimum wind speeds are as difficult to predict as the wind speed itself. However, the instantaneous maximum wind speed is often the most harmful. In crop production, bridge construction, wind power generation and even rocket launches, it is necessary to predict the maximum wind speed, but the research results show that this kind of prediction is very difficult. This problem is common in the prediction of wind farms with a time resolution of 15 min. However, if the time interval is shorter (e.g., 1 min) and the complexity is lower, the maximum wind speed can be predicted more accurately.

The complexity of observed wind-speed time series is related to its variation, and it is closer to the complexity of the ideal sequence of random numbers. This means that the observed wind-speed is difficult to predict. By studying different sampling methods, we find that the complexity of time series with low time resolution obtained by the average method from the high-resolution series is lower than those obtained by other methods. Therefore, it is necessary to observe the wind speed with high time resolution first, and then use the average method to obtain the wind speed with low time resolution. Thus, the complexity can be effectively reduced. For example, the observation interval used in this study was 0.1 s, but such high accuracy is often not required in practical applications. For wind power generation, a time interval of 15 min can meet demands. In this way, the low-resolution data obtained from high-resolution observation data can bring more accurate wind speed predictions. In addition, with the high predictability of low resolution data, we can also make a prediction profile as the high-resolution adjoint function.

It is noted that only the complexity reduction of wind speed under the complex terrain in northwest China has been studied here, while the complexity reduction method for other meteorological elements also needs to be studied in future.

Author Contributions

X.H. (Xiaowei Huai), L.L., Z.C. and X.X. contributed to the conception of the study. P.Y. performed the data analyses and wrote the manuscript. X.H. (Xiaohui Hu) helped perform the analysis through constructive discussions and collected the observed data. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the State Grid Project Research on numerical prediction technology of grid galloping in micro-topographical area (No. 5216A019007J), which was jointly funded by Xiaowei Huai, Zelin Cai and Xunjian Xu.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code, data and material used in this study are available from the corresponding author (Yan).

Acknowledgments

We thank the Northwest Regional Numerical Forecasting Innovation Team (GSQXCXTD-2020-02) and Zhangye National Climate Observatory.

Conflicts of Interest

The authors declare no conflict of interest.

References

Rind, D. Complexity and Climate. Science 1999, 284, 105–107.
Nesvorny, D.; Morbidelli, A. Three-Body Mean Motion Resonances and the Chaotic Structure of the Asteroid Belt. Astron. J. 1998, 116, 3029–3037.
Lorenz, E.N. Deterministic nonperiodic flow. J. Atmos. Sci. 1963, 20, 130–141.
Lorenz, E.N. The predictability of a flow which possesses many scales of motion. Tellus 1969, 21, 289–308.
Hou, W.; Feng, G.L.; Gao, X.Q.; Chou, J.F. Investigation on the time series of ice core and stalagmite based on the analysis of complexity. Acta Phys. Sin. 2005, 54, 2441–2447.
Hou, W.; Feng, G.L.; Dong, W.J. Investigation about the Lorenz model and logistic equation based on the complexity. Acta Phys. Sin. 2005, 54, 3940–3946.
Chou, J.F. Predictability of the atmosphere. Adv. Atmos. Sci. 1989, 6, 335–346.
Jin, H.M.; He, W.P.; Zhang, W.; Feng, A.X.; Hou, W. Effects of noises on moving cut data-approximate entropy. Acta Phys. Sin. 2012, 61, 129202.
Jin, H.M.; He, W.P.; Hou, W.; Zhang, D.Q. Effects of different trends on moving cut data-approximate entropy. Acta Phys. Sin. 2012, 61, 069201.
Jin, H.M.; He, W.P.; Liu, Q.Q.; Wang, J.S.; Feng, G.L. The applicability of research on moving cut data-approximate entropy on abrupt climate change detection. Theor. Appl. Climatol. 2016, 124, 475–486.
He, W.P.; Wu, Q.; Zhang, W.; Wang, Q.G.; Zhang, Y. Comparison of characteristics of moving detrended fluctuation analysis with that of approximate entropy method in detecting abrupt dynamic change. Acta Phys. Sin. 2009, 58, 2862–2871.
He, W.P.; He, T.; Chen, H.Y.; Zhang, W.; Wu, Q. A new method to detect abrupt change based on approximate entropy. Acta Phys. Sin. 2011, 60, 813–821.
Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. A Math. Phys. Eng. Sci. Lond. 1998, 454, 903–995.
Pearson, K. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1901, 2, 559–572.
Weare, B.C.; Nasstrom, T.S. Examples of extended Empirical Orthogonal Function analyses. Mon. Weather. Rev. 1982, 110, 481–485.
Eugene, M.R.; Phillip, A.A.; Chen, W.Y.; John, B.J. Biennial Variations in Surface Temperature over the United States as Revealed by Singular Decomposition. Mon. Weather. Rev. 1981, 109, 587–598.
Zhang, C.J.; Wang, H.Y.; Zeng, J.; Ma, L.M.; Guan, L. Short-Term Dynamic Radar Quantitative Precipitation Estimation Based on Wavelet Transform and Support Vector Machine. J. Meteorol. Res. 2020, 34, 413–426.
Shi, Z.; Ding, R.Q.; Li, J.P.; Wang, Z.G. Impacts of sampling interval and interpolation on the estimation of the predictability of chaotic systems. Mar. Forecast. 2015, 32, 66–73.
Ding, R.Q.; Li, J.P. The temporal-spatial distributions of weather predictability of different variables. Acta Meteorol. Sin. 2009, 67, 343–354.
Tian, Z.D. Preliminary research of chaotic characteristics and prediction of short-term wind speed time series. Int. J. Bifurc. Chaos 2020, 30, 2050176.
Neeraj, D.B.; Zaher, M.Y.; Gorm, B.A. ForecastTB—An R Package as a Test-Bench for Time Series Forecasting—Application of Wind Speed and Solar Radiation Modeling. Energies 2020, 13, 2578.
Mi, X.W.; Zhao, S. Wind speed prediction based on singular spectrum analysis and neural network structural learning. Energy Convers. Manag. 2020, 216, 112956.
Vikram, B.; Ajay, K.; Satyam, G. Deep Learning based Wind Speed Forecasting—A Review. In Proceedings of the 2019 9th International Conference on Cloud Computing, Noida, India, 10–11 January 2019.
Feng, A.X.; Gong, Z.Q.; Huang, Y.; Wang, Q.G. Spatiotemporal analysis of information entropy of the global temperature. Acta Phys. Sin. 2011, 60, 1358–1364.
Pincus, S.M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA 1991, 88, 2297–2301.
Pincus, S.M.; Goldberger, A.L. Physiological time-series analysis, what does regularity quantify? Am. J. Physiol. 1994, 266, 1643–1656.
Pincus, S.M.; Viscarello, R.R. Approximate entropy, a regularity measure for heart rate analysis. Obstet. Gynecol. 1992, 79, 249–255.
Pawelzik, K.; Schuster, H.G. Generalized dimensions and entropies from a measured time series. Phys. Rev. A 1987, 35, 481–484.
Wolf, A.; Swift, J.B.; Swinney, H.L.; Vastano, J.A. Determining Lyapunov exponents from a time series. Phys. D Nonlinear Phenom. 1985, 16, 285–317.
Gottwald, G.A.; Melbourne, I. A New Test for Chaos. Physics 2002, 460, 603–611.
Carr, J.R. Statistical self-affinity, fractal dimension, and geological interpretation. Eng. Geol. 1987, 48, 269–282.
Rao, A.R.; Bhattachary, D. Hypothesis testing for long-term memory in hydrologic series. J. Hydrol. 1999, 216, 183–196.
Yan, P.C.; Feng, G.L.; Hou, W. A method for predicting the uncompleted climate transition. Nonlinear Process. Geophys. 2020, 27, 489–500.
Yan, P.C.; Hou, W.; Feng, G.L. Transition process of abrupt climate change based on global sea surface temperature over the past century. Nonlinear Process. Geophys. 2016, 23, 115–126.
Chou, J.F.; Zheng, Z.H.; Sun, S.P. The think about 10–30 d extended-range numerical weather prediction strategy—Facing the atmosphere chaos. Sci. Meteorol. Sin. 2010, 5, 569–573.
Gong, Z.Q.; Zhao, J.H.; Feng, G.L.; Chou, J.F. Dynamic-statistics combined forecast scheme based on the abrupt decadal change component of summer precipitation in East Asia. Sci. China Earth Sci. 2015, 58, 404–419.

Figure 1. Schematic diagram of sampling methods for time series.

Figure 2. Ideal time series constructed by (a) wave function (x₁), (b) logistic model (x₂) and (c) random number (x₃).

Figure 3. The ApEn values of sub-series with lengths of 10,000, 1000 and 500. The value of parameter L represents the length of sub-series.

Figure 4. The ApEn values of sub-series with different lengths.

Figure 5. The ApEn values of two mixed ideal series with different mixing ratios. (a) Time series y₁ mixed by x₁ and x₂; (b) time series y₂ mixed by x₁ and x₃.

Figure 6. (a) The observed wind speed and (b) the ApEn values of the daily sequences over time. Note: The wind speed is displayed at a time interval of 1 s, while the ApEn values were calculated from the original data at a time interval of 0.1 s.

Figure 7. The ApEn values of the 1-s observed wind speed obtained by the sequential method and the random method (ten times) from the 0.1-s time sequences for (a) d1 and (c) d2. The average and box plot of the ApEn values of the 1-s observed wind speed obtained by different sampling methods for (b) d1 and (d) d2.

Figure 8. The ApEn values of observed wind speeds with different time intervals obtained by different sampling methods for (a) d1 and (b) d2; the average ApEn values for d1 and d4 with (c) different time intervals and (d) with the time interval of 6–60 s.

Figure 9. The average ApEn values (a) with different time intervals and (b) by different methods of the average, maximum and minimum.

Table 1. Instrument parameters.

Parameter	Specification	Value
Wind Direction (°)	Resolution	0.1
Wind Direction (°)	Range	0~359
Wind Speed (m·s⁻¹)	Resolution	0.01
Wind Speed (m·s⁻¹)	Range	0~65

Table 2. The quartile values for ideal series.

Quartile	x₁	x₂	x₃
0%	−1.35	−2.03	−1.72
25%	−0.83	−0.67	−0.88
50%	0.00	0.26	0.00
75%	0.83	0.78	0.86
100%	1.35	1.27	1.73

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
