Article

A New Spatial Filtering Algorithm for Noisy and Missing GNSS Position Time Series Using Weighted Expectation Maximization Principal Component Analysis: A Case Study for Regional GNSS Network in Xinjiang Province

1
School of Geodesy and Geomatics, Wuhan University, 129 Luoyu Road, Wuhan 430079, China
2
GNSS Research Center, Wuhan University, Wuhan 430079, China
3
Land Satellite Remote Sensing Application Center, Ministry of Natural Resources, Beijing 100048, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(5), 1295; https://doi.org/10.3390/rs14051295
Submission received: 18 December 2021 / Revised: 20 February 2022 / Accepted: 6 March 2022 / Published: 7 March 2022

Abstract

Common Mode Error (CME) is a spatially correlated error that is widespread in regional Global Navigation Satellite System (GNSS) networks and should be eliminated during postprocessing of GNSS position time series. Several spatiotemporal filtering methods have been developed to mitigate the effects of CME. However, such methods become inappropriate when the data are noisy and contain missing values. In this research, we introduce a novel spatial filtering algorithm called Weighted Expectation Maximization Principal Component Analysis (WEMPCA) for detecting and removing CME from noisy GNSS position time series with missing values, in which the formal errors of daily GNSS solutions are used to weight the input data. Simulation experiments demonstrate that the new WEMPCA algorithm consistently outperforms both traditional PCA and its own special case, EMPCA. The WEMPCA algorithm was then successfully used to extract the CME from real noisy and missing GNSS position time series in Xinjiang province. Our results show that only the first principal component exhibits significant spatial response, with average values of 70.11%, 66.53%, and 52.45% for the North, East, and Up (NEU) components, respectively, indicating that it represents the CME of this region. After removing CME, the canonical correlation coefficients and root mean square error of the GNSS residual time series, as well as the amplitudes of power-law noise (PLN), decrease in all three directions. However, the white noise (WN) amplitudes diminish only in the North and East components, not in the Up component. Moreover, the average velocity differences before and after filtering CME are 0.19 mm/year, 0.03 mm/year, and −0.56 mm/year for the NEU components, respectively, indicating that CME influences GNSS station velocity estimation. The velocity uncertainty is also reduced by 43.51%, 38.64%, and 40.39% on average for the NEU components, respectively, implying that the velocity estimates are more reliable and accurate after removing CME. Therefore, we conclude that the new WEMPCA approach provides an efficient solution for detecting and mitigating CME in noisy and missing GNSS position time series.

1. Introduction

With the expanding number of GNSS stations and the accumulation of GNSS observations, GNSS technology has provided abundant fundamental data for geophysical studies, such as crustal deformation monitoring [1,2], earthquakes [3,4,5], geocentric motion [6,7,8], and glacial isostatic adjustment [9,10]. According to previous research [11,12,13], a spatially correlated error known as common mode error (CME) exists in regional GNSS networks and has a significant effect on the accuracy and reliability of the GNSS velocity field. The primary drivers of CME include satellite orbit errors, reference frame realization, large-scale environmental loading, and other factors [14,15]. Along with spatially correlated errors, each GNSS coordinate time series also contains time-related noise [16,17], e.g., white noise (WN), flicker noise (FN), and random walk noise (RWN), as well as missing data due to instrument failure or equipment damage. Thus, the ability to promptly and effectively identify CME in noisy and missing GNSS coordinate time series is critical for increasing the signal-to-noise ratio and refining the GNSS velocity field.
At present, a variety of approaches have been proposed for detecting and mitigating the effect of CME. Wdowinski et al. [14] first applied regional stacking filtering to deduct CME from the GNSS network in Southern California in order to detect and analyze transient signals caused by the 1992 Landers earthquake. Since then, weights based on formal error [18], between-station correlation coefficients [19], interstation distance [20], or any combination of them [21] have been introduced into stacking filtering to increase the signal-to-noise ratio. However, abnormal sites in the regional network can significantly affect the extraction of CME in such stacking methods. To extract CME more objectively and accurately, the statistical signal decomposition techniques widely utilized in geodetic research [22,23] were introduced to suppress CME. Dong et al. [24] proposed the Principal Component Analysis (PCA) approach for identifying and removing the CME for the southern California integrated GPS network. This method has been widely used to eliminate the impact of CME in GNSS networks because it successfully suppresses the effect of abnormal sites and improves the detection of CME [25,26,27]. Other filtering methods, such as Multi-channel Singular Spectrum Analysis [28] and Independent Component Analysis [29,30], have also been demonstrated to be effective at filtering CME from regional GNSS networks. These approaches, however, are incapable of processing incomplete GNSS coordinate time series, and interpolation must be performed beforehand. If the selected GNSS time series contain an excessive amount of missing data, the derived CME may be distorted.
To address the drawbacks of the above methods, improved PCA [31], Probabilistic PCA [15], and Variational Bayesian PCA [32] were developed to extract CME directly from incomplete GNSS time series, with satisfactory results. Nevertheless, since measurement precision varies from epoch to epoch, it is not reasonable to assign the same weight to every GNSS daily solution when extracting CME. Li et al. [33] then introduced the formal errors of GNSS time series as weighting factors and successfully extracted CME from 25 GNSS stations in Yunnan Province, China. This method, however, cannot be employed when any two GNSS time series share no or only a few common observation epochs. Furthermore, because they involve large matrix operations, approaches like PCA and IPCA have a high computational complexity, which is particularly noticeable when dealing with very large datasets. In addition, a large amount of computer memory is occupied since all eigenvectors must be calculated regardless of whether they are needed.
To compensate for the above shortcomings, we present a new computationally efficient and flexible method, called Weighted Expectation Maximization PCA (WEMPCA) [34], to filter out CME from noisy and missing GNSS position time series. In comparison to previous methods, the WEMPCA algorithm (1) uses an iterative procedure to search the subspace, which yields a few eigenvectors and eigenvalues without explicit calculation of the covariance matrix; (2) can easily handle missing values at arbitrary locations in the observation datasets by simply setting their weights to zero; (3) optimizes the principal components to describe the true CME without being unduly affected by measurement noise; and (4) is computationally efficient and places only a light burden on computers. Since the CME in a regional network is usually represented by the first PC, it is more convenient and faster to extract CME with the WEMPCA approach, which does not need to solve for all eigenvectors.
The remainder of this paper is arranged as follows: Section 2 describes the processing of GNSS data and the WEMPCA algorithm. Section 3 investigates the performance of the WEMPCA approach and the influence of CME on GNSS stations using simulated and real GNSS position time series. Section 4 discusses the results. Finally, Section 5 summarizes the conclusions.

2. Data and Methodology

2.1. GNSS Position Time Series

Data from 28 GNSS stations located in Xinjiang, China, were collected from the Tectonic and Environmental Observation Network of Mainland China (CMONOC II). The dataset is provided by the China Earthquake Networks Center (http://data.earthquake.cn, last accessed 10 February 2021). All the GNSS position time series were processed with the GAMIT/GLOBK software (version 10.4) and aligned to the ITRF 2000 reference frame. The observation time span is about 10 years, from 2010.4918 to 2019.9986. Any epoch with a formal error larger than 10, 10, or 20 mm for the North, East, or Up (NEU) component, respectively, was tagged as an outlier and excluded. Then, following Bevis and Brown [35], we modeled the linear trend, annual and semiannual terms, offsets, and post-seismic deformation of the NEU components and subtracted them from the raw position time series to obtain the residuals of each component. Residuals that exceeded three times the interquartile range (IQR) were also treated as outliers and discarded. Finally, we obtained the GNSS residual time series used for CME extraction, with an average deletion ratio of 9.6%. The maximum missing rate is 19.3% at both stations XJYT and XJBY, whereas the minimum missing rate is 3.5% at station XJBL. The geographical locations and percentages of missing data for all 28 stations are presented in Figure 1.
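
The two screening steps can be sketched in Python as follows. This is a minimal illustration rather than the processing code used in this study: the function name `screen_outliers`, the use of NaN to flag rejected epochs, and the reading of the IQR rule as "more than three IQRs away from the median" are assumptions. For brevity, the sketch applies both tests to an already detrended residual series, whereas the text applies the formal-error test before the trajectory model is fitted.

```python
import numpy as np


def screen_outliers(residual, formal_error, max_formal_error, iqr_factor=3.0):
    """Return a copy of `residual` with rejected epochs set to NaN.

    residual, formal_error: 1-D arrays (mm) for one station and one component.
    max_formal_error: threshold in mm (10, 10, 20 for N, E, U in the text).
    """
    out = np.asarray(residual, dtype=float).copy()
    sigma = np.asarray(formal_error, dtype=float)
    # Step 1: reject epochs whose formal error exceeds the threshold.
    out[sigma > max_formal_error] = np.nan
    # Step 2 (assumed form of the IQR rule): reject residuals that lie
    # more than iqr_factor * IQR away from the median.
    q1, med, q3 = np.nanpercentile(out, [25, 50, 75])
    out[np.abs(out - med) > iqr_factor * (q3 - q1)] = np.nan
    return out


# Hypothetical North residuals (mm) and formal errors (mm) for one station.
res = np.array([0.0, 1.0, -1.0, 0.5, -0.5, 50.0, 0.2])
sig = np.array([1.0, 1.0, 1.0, 30.0, 1.0, 1.0, 1.0])
clean = screen_outliers(res, sig, max_formal_error=10.0)
missing_rate = np.mean(np.isnan(clean))  # fraction of deleted epochs
```

The fraction of NaN entries per station then corresponds to the missing rate reported for each site.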

2.2. The Proposed Weighted Expectation Maximization PCA Algorithm

The traditional PCA method has been widely used to filter CME for regional GNSS networks. However, its standard algorithm [36,37] can neither handle missing data nor make use of noise estimates or weights for the observed data. In addition, the covariance matrix and eigenvalue decomposition steps in the standard algorithm are computationally expensive, especially for large datasets. To solve these problems, this research introduces a Weighted Expectation Maximization algorithm for PCA (WEMPCA) to extract CME directly from noisy and missing GNSS position time series, which allows efficient extraction of a few leading principal components (PCs) from a large dataset. It can also naturally incorporate estimates of measurement noise to weight the input data while solving for the eigenvectors and coefficients, making the PCs less susceptible to measurement noise and more sensitive to the true underlying signals [34].
For a regional GNSS network, the observed data matrix $X$, with $d$ rows of variables and $n$ columns of observations, can be decomposed as:
$$X_{d \times n} = [x_1, x_2, \ldots, x_n] = U_{d \times k} V_{k \times n} + \varepsilon_{d \times n} \tag{1}$$
where $x_i$ ($i = 1, 2, \ldots, n$) is the vector formed by column $i$ of matrix $X$, $U$ denotes a matrix whose columns are the PCs to solve, $V$ is the reconstruction coefficients matrix, matrix $\varepsilon_{d \times n}$ represents the measurement noise, while $d$, $n$, and $k$ are the number of epochs, GNSS stations, and PCs, respectively. After solving $U$ and $V$ of Equation (1) with the weighted Expectation Maximization algorithm, the CME can be reconstituted by using significant PCs whose spatial responses are usually identical [24].
The Weighted Expectation Maximization algorithm is an iterative technique that consists of two main steps [38]: the Expectation (E) step, which calculates the expected value of the latent variables $V$ given the current model parameters $U$, and the Maximization (M) step, which updates the parameters $U$ to $U_{new}$ by maximizing the likelihood function based on the $V$ derived from the E-step. The E-step and M-step are repeated until convergence, at which point the optimal global solution is determined. Most importantly, both steps can properly weight the observed data $X$ when solving for $U$ and $V$. Here, we use $W$ as the weight matrix, which has the same dimensions as $X$. Implementation details of the WEMPCA algorithm are as follows:
  • (Initialization): set columns of $U$ to random orthogonal vectors.
  • (Repeat): for $t = 1, 2, \ldots$, until convergence:
    • E-step:
      for $j = 1, \ldots, n$:
        $c = \mathrm{diag}(w_j)$
        $v_j = (U^T c U)^{-1} U^T c \, x_j$
        $V = [v_1, v_2, \ldots, v_j]$
    • M-step:
      for $p = 1, \ldots, k$:
        for $i = 1, \ldots, d$:
          $u_{i,p} = \dfrac{\sum_{j=1}^{n} w_{i,j} \, x_{i,j} \, v_{p,j}}{\sum_{j=1}^{n} w_{i,j} \, v_{p,j} \, v_{p,j}}$
        $X_{new} = X - u_p v_p^{row}$
        $U_{new} = [u_1, u_2, \ldots, u_p]$
      Orthogonalize: $U$
    • (Until): $\max |U_{new} - U| < 10^{-6}$, stop
  • (Output): $U$, $V$
where $d$, $n$, and $k$ are defined in the same way as in Equation (1); $w_j$, $v_j$, and $x_j$ are the vectors formed by the $j$-th columns of the matrices $W$, $V$, and $X$, respectively; $\mathrm{diag}(\cdot)$ defines a diagonal matrix; $c$ is the diagonal matrix formed by $w_j$; $w_{i,j}$ represents the scalar element at row $i$ and column $j$ of matrix $W$, and likewise for $x_{i,j}$, $u_{i,p}$, and $v_{p,j}$; the vectors $u_p$ are the $p$-th columns of the matrix $U$; the vectors $v_p^{row}$ are the $p$-th rows of the matrix $V$; and $X_{new}$ and $U_{new}$ stand for the updated matrices.
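
The iteration above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation used in this study: the names `wempca` and `_e_step`, the `max_iter` and `seed` parameters, the sign convention applied after the QR orthogonalization, and the final extra E-step are choices made here for illustration. The sketch also assumes that every station has more than $k$ observed epochs, so that $U^T c U$ is invertible.

```python
import numpy as np


def _e_step(X, W, U):
    """Weighted least-squares coefficients v_j for every column (station) j."""
    k, n = U.shape[1], X.shape[1]
    V = np.zeros((k, n))
    for j in range(n):
        UtC = U.T * W[:, j]                      # U^T c, with c = diag(w_j)
        V[:, j] = np.linalg.solve(UtC @ U, UtC @ X[:, j])
    return V


def wempca(X, W, k=1, max_iter=500, tol=1e-6, seed=0):
    """Sketch of weighted EM PCA for X (d epochs x n stations).

    W has the same shape as X; a weight of 0 marks a missing value.
    Returns U (d x k, orthonormal columns) and V (k x n).
    """
    W = np.asarray(W, dtype=float)
    # Neutralise placeholders (NaN, 9999.0, ...) wherever the weight is zero.
    X = np.where(W > 0, np.asarray(X, dtype=float), 0.0)
    d, n = X.shape
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # random orthonormal start
    for _ in range(max_iter):
        V = _e_step(X, W, U)
        # M-step: solve one eigenvector at a time, then deflate the data.
        R = X.copy()
        U_raw = np.zeros((d, k))
        for p in range(k):
            num = (W * R) @ V[p]                 # sum_j w_ij x_ij v_pj
            den = W @ (V[p] ** 2)                # sum_j w_ij v_pj^2
            U_raw[:, p] = np.divide(num, den, out=np.zeros(d), where=den > 0)
            R = R - np.outer(U_raw[:, p], V[p])  # X_new = X - u_p v_p^row
        # Orthogonalize; fix the QR signs so each column keeps its direction.
        Q, Rq = np.linalg.qr(U_raw)
        U_new = Q * np.where(np.diag(Rq) < 0, -1.0, 1.0)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, _e_step(X, W, U)                   # V consistent with the final U
```

With `k=1`, a CME estimate for all stations would then be `np.outer(U[:, 0], V[0])`. Because only the leading `k` vectors are iterated, no covariance matrix or full eigendecomposition is formed.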

2.3. Weight Determination for WEMPCA

The widely used PCA method treats all GNSS data identically when calculating the principal components, which may not be appropriate for extracting CME from a regional GNSS network. Previous studies [39,40] have established that the accuracy of individual GNSS stations changes from epoch to epoch. The formal errors of daily solutions are higher in summer than in winter (e.g., the formal errors of station XJKC in the North component, shown in Figure 2). If all GNSS data are given identical weight, the solution for $U$ and $V$ can be unduly influenced, and the CME cannot be optimally extracted. Thus, in this application, GNSS observations with a high signal-to-noise ratio are given more weight than those with a low signal-to-noise ratio. Considering that the formal errors are related to GNSS observation errors, we assigned the weight $w_{i,j}$ of each station at each epoch as follows:
$$w_{i,j} = \frac{E_j}{\sigma_{i,j}^2 \left( \dfrac{1}{\sigma_{1,j}^2} + \dfrac{1}{\sigma_{2,j}^2} + \cdots + \dfrac{1}{\sigma_{d,j}^2} \right)}, \quad (i = 1, 2, \ldots, d; \; j = 1, 2, \ldots, n)$$
where $w_{i,j}$ ($i = 1, 2, \ldots, d$; $j = 1, 2, \ldots, n$) is the scalar element at row $i$ and column $j$ of the weight matrix $W$, $d$ and $n$ have been defined in Equation (1), $\sigma_{i,j}$ denotes the formal error of station $j$ at epoch $i$, and $E_j$ indicates the total number of epochs for the $j$-th station.
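
As a sketch of how this weighting could be computed, the NumPy function below builds $W$ from a matrix of formal errors. The function name `formal_error_weights` and the NaN convention for missing epochs are assumptions made here; the sketch also assumes that the sum in the denominator and the count $E_j$ both run over the observed epochs of station $j$ only, so that missing epochs receive zero weight.

```python
import numpy as np


def formal_error_weights(sigma):
    """Weights w_ij from formal errors sigma (d epochs x n stations).

    NaN (or non-positive) entries mark missing epochs and get weight 0.
    """
    sigma = np.asarray(sigma, dtype=float)
    observed = np.isfinite(sigma) & (sigma > 0)
    inv_var = np.zeros_like(sigma)
    inv_var[observed] = 1.0 / sigma[observed] ** 2
    E = observed.sum(axis=0)       # number of observed epochs per station
    total = inv_var.sum(axis=0)    # sum_i 1 / sigma_ij^2 per station
    return np.divide(E * inv_var, total,
                     out=np.zeros_like(sigma), where=total > 0)


# Hypothetical formal errors (mm): 3 epochs x 2 stations, one missing value.
sigma = np.array([[1.0, 2.0],
                  [2.0, 2.0],
                  [np.nan, 2.0]])
W = formal_error_weights(sigma)
```

Under these assumptions the weights of each station sum to $E_j$, so a station with constant formal errors receives a weight of one at every observed epoch, which matches the EMPCA special case.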
If all elements of W are set to 1, the WEMPCA algorithm will treat all data in the matrix X identically and hence fail to account for measurement noise appropriately. To investigate the influence of noise on CME detection and extraction, we defined a special case called EMPCA (assign weight zero to missing elements and one to others) and compared it to the WEMPCA and PCA methods. If there is no missing data in the GNSS position time series, the EMPCA and traditional PCA will produce identical results.

2.4. Handling Missing Data

When GNSS measurement data is incomplete or missing, which is frequently the case in practice, traditional approaches typically handle missing data by explicitly recovering them. Although short-term missing data can be filled well by interpolation, long-term missing data are difficult to recover. Moreover, most data interpolation methods focus exclusively on single-point time series, disregarding the spatial correlation among points in a region [41], which may result in incorrect extraction of CME.
The WEMPCA method presented here treats missing values in GNSS observations directly as the limiting case of zero weight and optimizes the principal components by properly accounting for measurement noise. To crosscheck whether missing values are handled correctly, we set the missing data in the selected time series to 9999.0, a value that would considerably distort the extracted CME if the WEMPCA technique did not correctly ignore these data. Additionally, once all PCs have been calculated, missing data can be easily recovered using $UV$.
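
A small numerical check illustrates why a zero weight makes the placeholder value harmless. The sketch below applies the E-step formula $v_j = (U^T c U)^{-1} U^T c \, x_j$ to a single hypothetical station; the helper name `weighted_coefficients` and all numbers are illustrative assumptions.

```python
import numpy as np


def weighted_coefficients(U, x, w):
    """E-step for one station: v = (U^T c U)^{-1} U^T c x, with c = diag(w)."""
    UtC = U.T * w
    return np.linalg.solve(UtC @ U, UtC @ x)


U = np.array([[1.0], [2.0], [3.0], [4.0]])   # one PC sampled at four epochs
x_true = 0.5 * U[:, 0]                       # station responds with factor 0.5
x_obs = x_true.copy()
x_obs[2] = 9999.0                            # placeholder for a missing epoch
w = np.array([1.0, 1.0, 0.0, 1.0])           # zero weight at the missing epoch

v = weighted_coefficients(U, x_obs, w)       # v[0] is 0.5 despite the 9999.0
recovered = (U @ v)[2]                       # 1.5, the value that was missing
```

The placeholder never enters the solution because it is multiplied by its zero weight, and the product of $U$ and $v$ returns the value expected at the missing epoch.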

3. Results

3.1. Simulation Experiments and Analysis

Simulation experiments were first designed to evaluate the performance of the WEMPCA method in extracting CME from GNSS position time series with both missing elements and noise. For this purpose, we randomly picked 14 stations from the GNSS network of Xinjiang province and then used the classic PCA approach to extract the CME of these selected stations as the true signal (CMEtrue). We then used the Hector software [42] to generate noise $\varepsilon$ based on the formal errors of the associated GNSS stations. The noise model “White + GGM (Generalised Gauss Markov)” was employed in the simulation, with the fraction of white noise $\phi = 0.35$ and spectral index $\varphi = 0.6$. Finally, the simulated dataset matrix $X_{sim}$ was constructed as $X_{sim} = CME_{true} + \varepsilon$. For a quantitative comparison of the WEMPCA, EMPCA, and PCA methods, we repeated the same experiment 100 times, with different simulated datasets $X_{sim}$ but the same initialization scheme.
To evaluate the influence of noise amplitude on CME extraction, we scaled the formal errors by factors from 0.2 to 1.0 in increments of 0.2 to generate the noise $\varepsilon$, and no missing data were introduced into the dataset $X_{sim}$ in this simulation experiment. Table 1 shows the RMS of the difference between the extracted CME (CMEextract) and the true CME (CMEtrue), hereinafter referred to as RMSdiff, for the North component under varying scaled formal errors using different filtering methods. It reveals that the RMSdiff values of all three methods become larger with increasing magnitude of formal errors, and that WEMPCA outperforms the other two approaches, namely EMPCA and PCA, in all 100 simulation experiments. The mean relative improvements of WEMPCA over EMPCA and PCA are 5.20%, 6.13%, 7.39%, 9.68%, and 12.89% for formal errors scaled by 0.2, 0.4, 0.6, 0.8, and 1.0, respectively. Notably, the results of EMPCA and PCA are identical since no missing values were introduced into the dataset $X_{sim}$. Similar results are obtained for the East and Up components. Thus, we may conclude that the magnitude of noise can significantly affect CME extraction, and that the WEMPCA method can be used to remove CME from noisy GNSS position time series with high accuracy.
To illustrate the impact of missing data on the precision of CME recognition, Figure 3 shows the RMSdiff of the North component with varying percentages of missing data using the PCA, EMPCA, and WEMPCA methods. In each of the 100 simulation experiments, no noise $\varepsilon$ is added, and 4% to 20% of the elements, in 4% increments, are removed from CMEtrue to simulate missing data. In this case, WEMPCA and EMPCA have the same weight matrix $W$, so the two methods give identical results.
From Figure 3, we can see that the WEMPCA/EMPCA methods are much more robust to missing data, and the extracted results are closer to the original CME. The RMSdiff of the PCA method rises considerably with the increasing amount of missing data, whereas the WEMPCA/EMPCA results change very little, even when 20% of the observations are deleted from the time series. Therefore, we conclude that the WEMPCA/EMPCA method is effective at extracting CME while minimizing the impact of missing data.
Finally, to explicitly show the performance improvement of WEMPCA in extracting CME from GNSS position time series that contain both missing values and noise, we take the simulated $X_{sim}$ corresponding to the 0.6 scale factor in Table 1 as the initial dataset and then mark 4% to 20% of its elements, in 4% increments, as missing by assigning them a constant value of 9999.0 to crosscheck the WEMPCA algorithm. Figure 4 shows the RMSdiff and the relative improvement percentage obtained by applying WEMPCA, EMPCA, and PCA to the simulated $X_{sim}$ of the North component. The RMSdiff values of all three methods rise with increasing missing data, and the WEMPCA algorithm consistently outperforms the other two methods. As the data deletion rate increases from 4% to 20%, the relative improvement of WEMPCA over PCA ranges from 5.80% to 1.83%, and that over EMPCA from 6.96% to 7.81%. It is therefore critical to account for formal errors when filtering GNSS position time series, and our approach is confirmed to be an effective alternative for obtaining CME from noisy and missing GNSS position time series.
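
For concreteness, the two comparison metrics can be written as short NumPy helpers. The function names are hypothetical, and the definition of relative improvement as the percentage reduction of RMSdiff with respect to a reference method is an assumption, since the text does not state the formula explicitly.

```python
import numpy as np


def rms_diff(cme_extract, cme_true):
    """RMS of the difference between extracted and true CME (NaNs ignored)."""
    diff = np.asarray(cme_extract, dtype=float) - np.asarray(cme_true, dtype=float)
    return float(np.sqrt(np.nanmean(diff ** 2)))


def relative_improvement(rms_new, rms_ref):
    """Assumed definition: percentage reduction of RMSdiff relative to rms_ref."""
    return 100.0 * (rms_ref - rms_new) / rms_ref


# Illustrative numbers only, not values from Table 1.
example_rms = rms_diff([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # sqrt(4/3)
example_gain = relative_improvement(0.9, 1.0)              # 10 percent
```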

3.2. WEMPCA Filtering of Real GNSS Position Time Series

Figure 5 shows the first three scaled PCs (top panels) and their corresponding spatial responses (bottom panels) in the North, East, and Up directions for the 28 Xinjiang GNSS stations obtained using the WEMPCA approach. To make the comparison more intuitive, we slightly modify the quantitative criteria defined in Dong et al. [21] by scaling all the eigenvectors using the maximum absolute value of the first eigenvector. We can see that only the first scaled eigenvectors have the same sign and exhibit a significant positive spatial response, with between-station differences, for all three components. The mean spatial responses are 70.11%, 66.53%, and 52.45% for the NEU components, respectively, whereas the minimum spatial responses are 43.24% for the North (XJJJ), 35.77% for the East (XJJJ), and 34.99% for the Up (XJKL).
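
The scaling described here can be sketched as follows, under the assumption that the per-station responses of the $p$-th PC are stored in row $p$ of the coefficient matrix $V$ from Equation (1). The function names and the sample numbers are hypothetical.

```python
import numpy as np


def scaled_spatial_response(V):
    """Scale all rows of V (k PCs x n stations) by max |V[0]|, in percent."""
    V = np.asarray(V, dtype=float)
    return 100.0 * V / np.max(np.abs(V[0]))


def has_uniform_sign(row):
    """True if every station responds with the same sign."""
    row = np.asarray(row, dtype=float)
    return bool(np.all(row > 0) or np.all(row < 0))


# Hypothetical coefficients for two PCs at three stations.
V_demo = np.array([[2.0, 1.0, 0.5],
                   [1.0, -1.0, 0.2]])
resp = scaled_spatial_response(V_demo)
# resp[0] is [100, 50, 25]; resp[1] is [50, -50, 10]
```

A PC whose scaled responses share one sign and are large at most stations is a candidate for the CME, whereas mixed signs point to local or regional patterns.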
The second and third PCs, on the other hand, have entirely distinct properties. Both show the uneven geographical distribution and oscillate interactively in spatial patterns. Additionally, for the second and third PCs, some stations, such as station XJYN in the North and Up components and station XJBL in the East component, exhibit significantly larger spatial responses than the other stations, suggesting that they reflect mostly local effects. Hence, we will only use the first PC and its corresponding eigenvectors to reconstruct CME for the following analysis.
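
In the notation of Equation (1), reconstructing the CME from the first PC alone can be written as the rank-one product below. The symbol $X^{filt}$ for the filtered residuals is introduced here for illustration only.

```latex
\mathrm{CME}_{d \times n} = u_1 \, v_1^{row},
\qquad
X^{filt}_{d \times n} = X_{d \times n} - \mathrm{CME}_{d \times n}
```

Here $u_1$ is the first column of $U$ and $v_1^{row}$ is the first row of $V$, so column $j$ of $\mathrm{CME}$ is the common mode time series attributed to station $j$.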

3.3. Analysis of CME Time Series

Figure 6 presents the average CME (right panels) of the 28 GNSS stations in the North, East, and Up components, as well as the associated Lomb-Scargle power spectral density (left panels). It reveals that the amplitude of CME in the Up direction is larger than that in the North and East. The standard deviation of CME is 1.28 mm, 1.22 mm, and 4.08 mm for the North, East, and Up components, respectively. In addition, the draconitic signal [43], which has a fundamental frequency of 1.04 cycles per year (cpy) plus higher harmonics, shows significant peaks in all three CME Lomb-Scargle power spectral densities, and we illustrate the first eight draconitic harmonics using brown dashed lines in the right panel of Figure 6. These signals are thought to be primarily related to the GPS constellation’s repetition period in inertial space with respect to the Sun. Thus, we can conclude that the draconitic signal also contributes to the regional CME. Furthermore, the slopes of the CME spectra in the N and U components are close to −1, implying that the CME noise characteristics in these directions are close to flicker noise. However, the spectrum of CME in the E direction shows slightly different characteristics, with flicker noise present at low frequencies and white noise emerging at high frequencies.
The noise characteristics of the average CME time series were then analyzed using the Hector software. The result shows that the fractions of flicker noise in the average CME are 0.54, 0.76, and 1.00 for North, East, and Up, respectively, and the corresponding fractions of white noise are 0.46, 0.24, and 0.00. Hence, we deduce that CME primarily consists of flicker noise, with white noise present in some directions.
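
The spectral analysis can be illustrated with a compact NumPy version of the classical, unnormalized Lomb-Scargle periodogram, which handles the gaps in the CME series without interpolation. This is a sketch only: the function names, the frequency grid, and the synthetic test signal are assumptions, and it is not the software used to produce Figure 6.

```python
import numpy as np

DRACONITIC_CPY = 1.04  # fundamental frequency quoted in the text (cycles/year)


def draconitic_harmonics(count=8):
    """Frequencies (cpy) of the first `count` draconitic harmonics."""
    return DRACONITIC_CPY * np.arange(1, count + 1)


def lomb_scargle(t, y, freqs_cpy):
    """Classical Lomb-Scargle power of y(t); t in years, freqs in cycles/year."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = np.isfinite(y)                  # skip missing epochs
    t = t[ok] - t[ok].min()
    y = y[ok] - y[ok].mean()
    power = np.empty(len(freqs_cpy))
    for m, f in enumerate(freqs_cpy):
        omega = 2.0 * np.pi * f
        # Time offset tau that decouples the sine and cosine terms.
        tau = np.arctan2(np.sin(2 * omega * t).sum(),
                         np.cos(2 * omega * t).sum()) / (2 * omega)
        c = np.cos(omega * (t - tau))
        s = np.sin(omega * (t - tau))
        power[m] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power


# Synthetic, unevenly sampled series with a pure draconitic-year sinusoid.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(2010.5, 2020.0, 800))
y = 2.0 * np.sin(2.0 * np.pi * DRACONITIC_CPY * t)
freqs = np.arange(0.1, 6.0, 0.02)
peak_frequency = freqs[np.argmax(lomb_scargle(t, y, freqs))]
```

For the synthetic series the strongest peak falls at the grid frequency nearest 1.04 cpy; with real CME series, the harmonic frequencies from `draconitic_harmonics` indicate where peaks would be expected.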

3.4. CME Impact on the GNSS Position Time Series

Figure 7 illustrates the canonical correlation coefficients of the 28 GNSS residual time series, as well as their corresponding histograms, before and after filtering the CME. Here, we utilized the Canonical Correlation Analysis [44] (CCA) instead of traditional Pearson’s correlation to calculate the correlation coefficients of two GNSS stations. CCA is a multivariate statistical technique that allows us to identify and explore the relationship among multidimensional variables. It can be used to model correlations between two multivariate datasets by finding linear combinations of variables that maximally correlate. Since the CME are regionally correlated, it is more appropriate to evaluate the influence of CME by examining the overall correlation changes in all three directions rather than individual changes. Our CCA result indicates that the interstation correlations for unfiltered residual time series exhibit strong associations among the 28 GNSS stations, with an average value of 0.64 and a maximum correlation of 0.88 between station XJKC and XJZS (about 240 km apart).
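
As an illustration of how a canonical correlation between two stations can be computed, the sketch below treats each station as a three-column (N, E, U) residual matrix and returns the largest canonical correlation via QR and SVD. Using the first canonical correlation as the interstation coefficient, keeping only epochs observed at both stations, and the function name are assumptions; the text does not specify these details.

```python
import numpy as np


def first_canonical_correlation(A, B):
    """Largest canonical correlation between A and B (epochs x components)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Keep only epochs observed at both stations.
    ok = np.isfinite(A).all(axis=1) & np.isfinite(B).all(axis=1)
    A = A[ok] - A[ok].mean(axis=0)
    B = B[ok] - B[ok].mean(axis=0)
    # Orthonormal bases of the two column spaces.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the canonical correlations.
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float(min(s[0], 1.0))


# Synthetic check: a station that is a linear mix of another correlates fully.
rng = np.random.default_rng(0)
neu_a = rng.standard_normal((5000, 3))
mix = np.array([[1.0, 2.0, 0.0],
                [0.0, 1.0, 1.0],
                [1.0, 0.0, 1.0]])
rho_same = first_canonical_correlation(neu_a, neu_a @ mix)
rho_indep = first_canonical_correlation(neu_a, rng.standard_normal((5000, 3)))
```

Unlike a component-by-component Pearson coefficient, this measure is unchanged by any invertible linear mixing of the three components, which is why it summarizes the correlation of all directions at once.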
In addition, histograms of unfiltered residual time series reveal that the correlation coefficients vary between 0.40 and 0.90, with most values (about 77.25%) centered around 0.55~0.70. These findings confirm that there do exist significant spatial correlation errors in the regional GNSS network, which must be removed before scientific research and application. After filtering the CME by WEMPCA, the canonical correlation coefficients were significantly reduced to 0.29, representing an average reduction of 54.7%. Its histograms similarly reveal a significant drop, with majority values (79.90%) clustering around 0.15~0.35. Therefore, we conclude that our approach can efficiently reconstruct CME and decrease the spatial correlation error for the regional GNSS network.
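
As a quick consistency check on the reported figures, the relative reduction computed from the two rounded mean coefficients is:

```latex
\frac{0.64 - 0.29}{0.64} = \frac{0.35}{0.64} \approx 0.547 = 54.7\%
```

This agrees with the stated average reduction, although the published value may have been computed from unrounded coefficients or averaged over station pairs.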
Figure 8 reveals the RMS reduction ratios of NEU directions after removing CME from the 28 GNSS residual time series. It can be noticed that the RMS clearly decreases for all three components after filtering CME, with average decline ratios of 34.88%, 34.09%, and 33.73% for the NEU components, respectively. The maximal RMS reduction ratios are 50.58% at station XJZS for the North component, 48.29% at station XJSH for the East, and 47.08% at station XJML for the Up component. By comparison, the minimal RMS reduction ratios are 15.56%, 15.62% and 9.10% at station XJKL for all three components. Particularly, GNSS stations (e.g., XJSH, XJBY, and XJDS) in the Tien-Shan area exhibit greater RMS reduction than others, which is likely due to the remarkable hydrological changes [45] in this region.
Table 2 lists the standard deviations of PLN and WN amplitudes for the 28 GNSS position time series in the NEU directions before and after filtering CME with WEMPCA. The average standard deviations of PLN and WN are presented in the last row of Table 2, and the maximum amplitude changes in each direction are marked in bold. The PLN amplitudes decrease significantly after filtering CME in all three directions, with average reduction ratios of 45.12%, 42.00%, and 43.41% for the North, East, and Up components, respectively. The maximum PLN amplitude reduction ratio is found at station XJZS (66.76%) for the North component, station XJSH (63.17%) for the East, and station XJSH (58.20%) for the Up. However, the WN amplitudes behave differently: they are reduced in the North and East components but increase in the Up direction. Specifically, the average WN reduction ratios in the horizontal directions are 9.19% (North) and 21.54% (East), while the vertical WN amplitude rises from 0.01 to 1.78 at station XJFY. This is most likely because the white noise is caused by site-specific errors that CME filtering cannot eliminate.
Figure 9 shows the velocity differences (filtered minus unfiltered) of the 28 GNSS stations, estimated by the Hector software using the PLN plus WN noise model. The horizontal velocity differences are smaller than those in the vertical component, with averages of 0.19 mm/year (North) and 0.03 mm/year (East) for the former and −0.56 mm/year (Up) for the latter. The maximum differences are 0.85 mm/year at station XJKL for the North component, 0.21 mm/year at station XJWU for the East component, and −2.67 mm/year at station XJKL for the Up component. Overall, our results indicate that CME has an influence on the velocity estimation of GNSS stations and must be addressed appropriately.
Correspondingly, Figure 10 presents the velocity uncertainties in the NEU directions before and after removing CME from the 28 GNSS stations with the WEMPCA method. Clear uncertainty reductions can be seen for most stations, with average reductions of 43.51%, 38.64%, and 40.39% for the NEU components, respectively. The maximum reduction ratios for the North, East, and Up directions are 74.87% at station XJKC, 72.90% at station XJSH, and 61.29% at station XJBE, respectively. Hence, we conclude that CME affects the precision of GNSS positions, velocities, and related uncertainties, and that our method is effective for filtering out CME.

4. Discussion

Spatially correlated errors, specifically CME, exist in regional GNSS networks and can have a significant effect on the precision of GNSS position, velocity, and associated uncertainties [31,32]. Thus, it is necessary to accurately filter out CME from noisy and missing GNSS time series. Traditional filtering methods, such as stacking filtering, PCA, and ICA, can eliminate the impact of CME in GNSS networks. However, as previously indicated, these algorithms have limitations. The WEMPCA algorithm described here provides a number of improvements over previous methods. A primary advantage of WEMPCA is that it can naturally incorporate weights on the input data, making the output eigenvectors less susceptible to measurement noise and more sensitive to the true underlying signals [34]. Nevertheless, this weighting strategy may cause unexpected results when GNSS stations have strong local effects but small formal errors [15]. Thus, the WEMPCA method is more appropriate for GNSS stations with weak local effects, and GNSS stations with strong local effects must be identified and removed prior to CME extraction [24,33]. Moreover, compared with traditional PCA, the WEMPCA algorithm enables iterative extraction of only the first few PCs from noisy and missing GNSS time series, avoiding the need to keep the entire dataset at each step and therefore consuming less memory [34,38]. As a result, WEMPCA is a robust and efficient alternative method for extracting CME from noisy and missing GNSS time series, particularly for large datasets.

5. Conclusions

In this research, the WEMPCA algorithm, which can properly incorporate measurement noise, was proposed to extract CME from the noisy and missing GNSS position time series. The simulation experiments reveal that the WEMPCA algorithm always outperforms the others, namely, traditional PCA and EMPCA. The relative improvements of WEMPCA over PCA and EMPCA are 1.83% and 7.81%, respectively, even when the simulated dataset X sim contains 20% missing data and 0.6 scaled factor noise.
The WEMPCA method was then successfully used to process the 28 real noisy and missing GNSS position time series. Our results demonstrate that only the first PC exhibits significant positive spatial response, with average values of 70.11%, 66.53%, and 52.45% for North, East, and Up (NEU) components, respectively, indicating that they represent the CME. The analysis of CME also reveals that draconitic signal is definitely evident in lower harmonics, and CME predominantly consists of flicker noise with some white noise present in certain directions.
After removing CME, the canonical correlation coefficients drop significantly from 0.64 to 0.29, an average reduction ratio of 54.7%. The RMS values of the residual time series are also clearly reduced for all three components, with average reduction ratios of 34.88%, 34.09%, and 33.73% for the NEU components, respectively. In addition, the PLN and WN amplitudes behave differently after CME filtering: the former decline significantly in all three directions, while the latter fall only in the North and East components, not in the Up component.
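As a quick arithmetic check of these figures, the following Python sketch computes percentage reductions from before/after values. The text does not state whether the average reduction ratio is the ratio of the means or the mean of the per-station ratios; the sketch assumes the former, and the helper name `reduction_ratio` is hypothetical. The input values are the mean canonical correlation coefficients quoted above and the Mean row of Table 2.

```python
def reduction_ratio(before, after):
    """Percentage reduction from `before` to `after` (negative means an increase)."""
    return (before - after) / before * 100.0

# Mean canonical correlation coefficient before and after filtering (from the text)
print(round(reduction_ratio(0.64, 0.29), 1))  # 54.7

# Mean amplitudes from the last row of Table 2: (unfiltered, filtered)
table2_mean = {
    ("N", "PLN"): (4.98, 2.68), ("N", "WN"): (0.63, 0.57),
    ("E", "PLN"): (4.26, 2.47), ("E", "WN"): (0.76, 0.60),
    ("U", "PLN"): (14.34, 8.24), ("U", "WN"): (1.44, 1.92),
}
ratios = {key: reduction_ratio(*pair) for key, pair in table2_mean.items()}
for (component, noise), ratio in ratios.items():
    print(f"{component} {noise}: {ratio:+.1f}%")
```

With the ratio-of-means assumption, the PLN ratios are positive in all three components, whereas the Up WN ratio is negative, which matches the statement that WN amplitudes do not decrease in the Up component.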
Meanwhile, the average velocity differences before and after filtering CME reach 0.19 mm/year for the North, 0.03 mm/year for the East, and −0.56 mm/year for the Up component, indicating that CME affects GNSS station velocity estimation. After filtering the CME with WEMPCA, the velocity uncertainty decreases by 43.51%, 38.64%, and 40.39% on average for the NEU components, respectively. Therefore, we conclude that CME influences the precision of GNSS positions, velocities, and their uncertainties, and that our strategy is an effective method for mitigating its effects.

Author Contributions

Conceptualization, W.L. and Z.L.; Data curation, J.W.; Formal analysis, G.Z.; Funding acquisition, Z.L., W.J., Q.C. and G.Z.; Investigation, W.L.; Methodology, W.L.; Project administration, W.J.; Resources, Z.L.; Software, G.Z.; Supervision, W.J.; Validation, Q.C.; Visualization, J.W.; Writing – original draft, W.L.; Writing – review & editing, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (grant number 2018YFC1503600) and the National Natural Science Foundation of China (grant numbers 42174030, 42004017, and 42174041).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the Tectonic and Environmental Observation Network of Mainland China (CMONOC II) for providing the GNSS data. We also thank Machiel Bos, Rui Fernandes and Luisa Bastos for providing the Hector software (version 1.72, http://segal.ubi.pt/hector/, last accessed 10 September 2021). Thanks to anonymous reviewers and editors for their valuable feedback on the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Geng, J.; Pan, Y.; Li, X.; Guo, J.; Liu, J.; Chen, X.; Zhang, Y. Noise Characteristics of High-Rate Multi-GNSS for Subdaily Crustal Deformation Monitoring. J. Geophys. Res. Solid Earth 2018, 123, 1987–2002.
  2. Zheng, K.; Zhang, X.; Li, X.; Li, P.; Chang, X.; Sang, J.; Ge, M.; Schuh, H. Mitigation of Unmodeled Error to Improve the Accuracy of Multi-GNSS PPP for Crustal Deformation Monitoring. Remote Sens. 2019, 11, 2232.
  3. Geng, T.; Xie, X.; Fang, R.; Su, X.; Zhao, Q.; Liu, G.; Li, H.; Shi, C.; Liu, J. Real-time capture of seismic waves using high-rate multi-GNSS observations: Application to the 2015 Mw 7.8 Nepal earthquake. Geophys. Res. Lett. 2016, 43, 161–167.
  4. Caporali, A.; Floris, M.; Chen, X.; Nurce, B.; Bertocco, M.; Zurutuza, J. The November 2019 Seismic Sequence in Albania: Geodetic Constraints and Fault Interaction. Remote Sens. 2020, 12, 846.
  5. Guns, K.A.; Pollitz, F.F.; Lay, T.; Yue, H. Exploring GPS Observations of Postseismic Deformation Following the 2012 MW7.8 Haida Gwaii and 2013 MW7.5 Craig, Alaska Earthquakes: Implications for Viscoelastic Earth Structure. J. Geophys. Res. Solid Earth 2021, 126, e2021JB021891.
  6. Kang, Z.; Tapley, B.; Chen, J.; Ries, J.; Bettadpur, S. Geocenter motion time series derived from GRACE GPS and LAGEOS observations. J. Geod. 2019, 93, 1931–1942.
  7. Meindl, M.; Beutler, G.; Thaller, D.; Dach, R.; Jäggi, A. Geocenter coordinates estimated from GNSS data as viewed by perturbation theory. Adv. Space Res. 2013, 51, 1047–1064.
  8. Lavallée, D.A.; van Dam, T.; Blewitt, G.; Clarke, P.J. Geocenter motions from GPS: A unified observation model. J. Geophys. Res. Solid Earth 2006, 111, B05405.
  9. Thomas, I.D.; King, M.A.; Bentley, M.J.; Whitehouse, P.L.; Penna, N.T.; Williams, S.D.P.; Riva, R.E.M.; Lavallee, D.A.; Clarke, P.J.; King, E.C.; et al. Widespread low rates of Antarctic glacial isostatic adjustment revealed by GPS observations. Geophys. Res. Lett. 2011, 38, L22302.
  10. Simon, K.M.; Riva, R.E.M.; Vermeersen, L.L.A. Constraint of glacial isostatic adjustment in the North Sea with geological relative sea level and GNSS vertical land motion data. Geophys. J. Int. 2021, 227, 1168–1180.
  11. Yuan, P.; Jiang, W.; Wang, K.; Sneeuw, N. Effects of Spatiotemporal Filtering on the Periodic Signals and Noise in the GPS Position Time Series of the Crustal Movement Observation Network of China. Remote Sens. 2018, 10, 1472.
  12. He, X.; Montillet, J.-P.; Fernandes, R.; Bos, M.; Yu, K.; Hua, X.; Jiang, W. Review of current GPS methodologies for producing accurate time series and their error sources. J. Geodyn. 2017, 106, 12–29.
  13. Kreemer, C.; Blewitt, G. Robust estimation of spatially varying common-mode components in GPS time-series. J. Geod. 2021, 95, 13.
  14. Wdowinski, S.; Bock, Y.; Zhang, J.; Fang, P.; Genrich, J. Southern California permanent GPS geodetic array: Spatial filtering of daily positions for estimating coseismic and postseismic displacements induced by the 1992 Landers earthquake. J. Geophys. Res. Solid Earth 1997, 102, 18057–18070.
  15. Gruszczynski, M.; Klos, A.; Bogusz, J. A Filtering of Incomplete GNSS Position Time Series with Probabilistic Principal Component Analysis. Pure Appl. Geophys. 2018, 175, 1841–1867.
  16. He, X.; Bos, M.S.; Montillet, J.P.; Fernandes, R.M.S. Investigation of the noise properties at low frequencies in long GNSS time series. J. Geod. 2019, 93, 1271–1282.
  17. He, X.; Bos, M.S.; Montillet, J.-P.; Fernandes, R.; Melbourne, T.; Jiang, W.; Li, W. Spatial Variations of Stochastic Noise Properties in GPS Time Series. Remote Sens. 2021, 13, 4534.
  18. Nikolaidis, R. Observation of Geodetic and Seismic Deformation with the Global Positioning System; University of California: San Diego, CA, USA, 2002.
  19. Tian, Y.; Shen, Z. Correlation weighted stacking filtering of common-mode component in GPS observation network. Acta Seismol. Sin 2011, 33, 198–208.
  20. Márquez-Azúa, B.; DeMets, C. Crustal velocity field of Mexico from continuous GPS measurements, 1993 to June 2001: Implications for the neotectonics of Mexico. J. Geophys. Res. Solid Earth 2003, 108, 2450.
  21. Tian, Y.; Shen, Z.-K. Extracting the regional common-mode component of GPS station position time series from dense continuous network. J. Geophys. Res. Solid Earth 2016, 121, 1080–1096.
  22. Forootan, E. Statistical Signal Decomposition Techniques for Analyzing Time-Variable Satellite Gravimetry Data. Ph.D. Thesis, University of Bonn, Bonn, Germany, 2014.
  23. Cheng, P.; Cheng, Y.; Wang, X.; Wu, S.; Xu, Y. Realization of an Optimal Dynamic Geodetic Reference Frame in China: Methodology and Applications. Engineering 2020, 6, 879–897.
  24. Dong, D.; Fang, P.; Bock, Y.; Webb, F.; Prawirodirdjo, L.; Kedar, S.; Jamason, P. Spatiotemporal filtering using principal component analysis and Karhunen-Loeve expansion approaches for regional GPS network analysis. J. Geophys. Res. Solid Earth 2006, 111, B03405.
  25. Ji, K.H.; Herring, T.A. Transient signal detection using GPS measurements: Transient inflation at Akutan volcano, Alaska, during early 2008. Geophys. Res. Lett. 2011, 38, L06307.
  26. He, X.; Hua, X.; Yu, K.; Xuan, W.; Lu, T.; Zhang, W.; Chen, X. Accuracy enhancement of GPS time series using principal component analysis and block spatial filtering. Adv. Space Res. 2015, 55, 1316–1327.
  27. Ma, X.; Liu, B.; Dai, W.; Kuang, C.; Xing, X. Potential Contributors to Common Mode Error in Array GPS Displacement Fields in Taiwan Island. Remote Sens. 2021, 13, 4221.
  28. Zhou, M.; Guo, J.; Shen, Y.; Kong, Q.; Yuan, J. Extraction of common mode errors of GNSS coordinate time series based on multi-channel singular spectrum analysis. Chin. J. Geophys. 2018, 61, 4383–4395.
  29. Ming, F.; Yang, Y.; Zeng, A.; Zhao, B. Spatiotemporal filtering for regional GPS network in China using independent component analysis. J. Geod. 2017, 91, 419–440.
  30. Li, W.; Li, F.; Zhang, S.; Lei, J.; Zhang, Q.; Yuan, L. Spatiotemporal Filtering and Noise Analysis for Regional GNSS Network in Antarctica Using Independent Component Analysis. Remote Sens. 2019, 11, 386.
  31. Shen, Y.; Li, W.; Xu, G.; Li, B. Spatiotemporal filtering of regional GNSS network’s position time series with missing data using principle component analysis. J. Geod. 2014, 88, 1–12.
  32. Li, W.; Jiang, W.; Li, Z.; Chen, H.; Chen, Q.; Wang, J.; Zhu, G. Extracting Common Mode Errors of Regional GNSS Position Time Series in the Presence of Missing Data by Variational Bayesian Principal Component Analysis. Sensors 2020, 20, 2298.
  33. Li, W.; Shen, Y. The Consideration of Formal Errors in Spatiotemporal Filtering Using Principal Component Analysis for Regional GNSS Position Time Series. Remote Sens. 2018, 10, 534.
  34. Bailey, S. Principal component analysis with noisy and/or missing data. Publ. Astron. Soc. Pac. 2012, 124, 1015.
  35. Bevis, M.; Brown, A. Trajectory models and reference frames for crustal motion geodesy. J. Geod. 2014, 88, 283–311.
  36. Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202.
  37. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 1989.
  38. Roweis, S. EM algorithms for PCA and SPCA. Adv. Neural Inf. Processing Syst. 1998, 626–632.
  39. Owari, T.; Kasahara, H.; Oikawa, N.; Fukuoka, S. Seasonal variation of global positioning system (GPS) accuracy within the Tokyo University Forest in Hokkaido. Bull. Tokyo Univ. 2009, 120, 19–28.
  40. Bogusz, J.; Gruszczynski, M.; Figurski, M.; Klos, A. Spatio-temporal filtering for determination of common mode error in regional GNSS networks. Open Geosci. 2015, 7, 140–148.
  41. Liu, N.; Dai, W.; Santerre, R.; Kuang, C. A MATLAB-based Kriged Kalman Filter software for interpolating missing data in GNSS coordinate time series. GPS Solut. 2017, 22, 25.
  42. Bos, M.; Fernandes, R.; Williams, S.; Bastos, L. Fast error analysis of continuous GNSS observations with missing data. J. Geod. 2013, 87, 351–360.
  43. Amiri-Simkooei, A. On the nature of GPS draconitic year periodic pattern in multivariate position time series. J. Geophys. Res. Solid Earth 2013, 118, 2500–2511.
  44. Hardoon, D.R.; Szedmak, S.; Shawe-Taylor, J. Canonical correlation analysis: An overview with application to learning methods. Neural Comput. 2004, 16, 2639–2664.
  45. Li, W.; Guo, J.; Chang, X.; Zhu, G.; Kong, Q. Terrestrial Water Storage Changes in the Tianshan Mountains of Xinjiang Measured by GRACE During 2003~2013. Geomat. Inf. Sci. Wuhan Univ. 2017, 42, 1021–1026.
Figure 1. The geographical location and percentage of missing data for all 28 GNSS stations in Xinjiang province of China.
Figure 2. Formal error of station XJKC in the North component.
Figure 3. RMS of difference between extracted CME and real CME of North component with varying percentages of missing data by WEMPCA, EMPCA, and PCA filtering methods.
Figure 4. RMS of difference between extracted CME and real CME of the North component with missing and noisy data by WEMPCA, EMPCA and PCA filtering methods.
Figure 5. The first three scaled PCs and their associated spatial responses for the North (upper), East (middle), and Up (lower) components of the 28 Xinjiang GNSS stations computed using the WEMPCA approach.
Figure 6. Average CME (left panels) in the North, East, and Up components for the 28 Xinjiang GNSS stations, as well as the associated Lomb-Scargle power spectral density (right panels). Brown dashed lines indicate the draconitic oscillations (1.04 cpy) and their harmonics up to the 8th.
Figure 7. Canonical correlation coefficients of the 28 GNSS residual time series, as well as their corresponding histograms, before and after filtering CME.
Figure 8. RMS reduction ratios for the NEU component after removing CME from the 28 GNSS residual time series.
Figure 9. Velocity differences between filtered and unfiltered time series for the North (left panel), East (middle panel), and Up (right panel) components of the 28 GNSS stations estimated by Hector software with PLN plus WN noise models.
Figure 10. Velocity uncertainty for the NEU directions before and after removing CME from the 28 GNSS stations by the WEMPCA method.
Table 1. The RMS of difference between extracted CME and true CME in the North component under varying scaled formal errors using WEMPCA, EMPCA and PCA filtering methods.

| Scaled | WEMPCA Max | WEMPCA Min | WEMPCA Mean | EMPCA Max | EMPCA Min | EMPCA Mean | PCA Max | PCA Min | PCA Mean | Improvement (WEMPCA relative to PCA) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 0.19 | 0.18 | 0.18 | 0.20 | 0.19 | 0.19 | 0.20 | 0.19 | 0.19 | [graphic] |
| 0.4 | 0.38 | 0.35 | 0.36 | 0.41 | 0.38 | 0.39 | 0.41 | 0.38 | 0.39 | [graphic] |
| 0.6 | 0.56 | 0.53 | 0.55 | 0.61 | 0.57 | 0.59 | 0.61 | 0.57 | 0.59 | [graphic] |
| 0.8 | 0.75 | 0.71 | 0.73 | 0.85 | 0.76 | 0.81 | 0.85 | 0.76 | 0.81 | [graphic] |
| 1.0 | 0.93 | 0.88 | 0.91 | 1.09 | 1.00 | 1.05 | 1.09 | 1.00 | 1.05 | [graphic] |
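Table 2. Standard deviations of PLN and WN amplitudes for the 28 GNSS position time series in NEU directions before and after filtering CME with WEMPCA.

| No. | Station | N unfiltered PLN | N unfiltered WN | N filtered PLN | N filtered WN | E unfiltered PLN | E unfiltered WN | E filtered PLN | E filtered WN | U unfiltered PLN | U unfiltered WN | U filtered PLN | U filtered WN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | XJAL | 6.25 | 0.65 | 4.60 | 0.53 | 4.53 | 0.68 | 3.05 | 0.50 | 20.80 | 0.12 | 17.03 | 1.57 |
| 2 | XJBC | 5.17 | 0.66 | 2.29 | 0.58 | 4.84 | 0.95 | 2.46 | 0.78 | 14.39 | 2.14 | 7.69 | 2.10 |
| 3 | XJBE | 5.30 | 0.76 | 2.79 | 0.63 | 4.32 | 0.65 | 2.34 | 0.57 | 15.89 | 0.07 | 8.01 | 1.50 |
| 4 | XJBL | 6.47 | 0.68 | 3.22 | 0.66 | 6.18 | 1.15 | 4.06 | 0.92 | 16.25 | 2.27 | 9.33 | 1.89 |
| 5 | XJBY | 4.69 | 0.63 | 2.22 | 0.57 | 4.04 | 0.74 | 2.07 | 0.62 | 12.72 | 2.03 | 5.32 | 1.94 |
| 6 | XJDS | 5.09 | 0.63 | 2.00 | 0.59 | 3.91 | 0.74 | 1.53 | 0.59 | 12.89 | 0.02 | 5.47 | 1.78 |
| 7 | XJFY | 4.33 | 0.64 | 2.54 | 0.59 | 3.63 | 0.57 | 2.22 | 0.56 | 13.07 | 0.01 | 7.35 | 1.78 |
| 8 | XJHT | 5.07 | 0.61 | 3.01 | 0.59 | 4.58 | 0.86 | 2.86 | 0.69 | 12.43 | 2.04 | 8.08 | 1.82 |
| 9 | XJJJ | 2.88 | 0.48 | 2.84 | 0.42 | 2.42 | 0.55 | 2.39 | 0.55 | 9.17 | 1.33 | 6.65 | 1.97 |
| 10 | XJKC | 4.51 | 0.54 | 1.60 | 0.49 | 3.82 | 0.70 | 1.44 | 0.58 | 12.96 | 1.65 | 5.96 | 1.97 |
| 11 | XJKE | 4.44 | 0.64 | 2.70 | 0.63 | 3.90 | 0.80 | 2.53 | 0.66 | 12.81 | 2.05 | 6.52 | 2.43 |
| 12 | XJKL | 6.71 | 0.89 | 4.68 | 0.80 | 5.58 | 0.87 | 4.15 | 0.67 | 18.41 | 2.79 | 13.95 | 3.08 |
| 13 | XJML | 4.17 | 0.54 | 2.35 | 0.47 | 3.13 | 0.56 | 2.03 | 0.39 | 11.64 | 0.26 | 5.48 | 1.45 |
| 14 | XJQH | 4.26 | 0.65 | 2.70 | 0.58 | 3.58 | 0.33 | 2.15 | 0.16 | 13.07 | 0.02 | 6.38 | 1.32 |
| 15 | XJQM | 3.72 | 0.45 | 2.68 | 0.43 | 3.35 | 0.63 | 2.16 | 0.43 | 11.37 | 1.44 | 7.93 | 1.49 |
| 16 | XJRQ | 3.32 | 0.43 | 2.36 | 0.39 | 2.71 | 0.55 | 1.83 | 0.41 | 10.01 | 0.66 | 6.33 | 0.58 |
| 17 | XJSH | 4.92 | 0.63 | 2.32 | 0.53 | 3.71 | 0.69 | 1.37 | 0.48 | 13.46 | 1.28 | 5.63 | 1.93 |
| 18 | XJSS | 3.51 | 0.45 | 2.36 | 0.47 | 2.61 | 0.55 | 1.91 | 0.48 | 10.60 | 0.01 | 6.05 | 1.68 |
| 19 | XJTC | 5.49 | 0.75 | 2.27 | 0.67 | 4.95 | 0.99 | 2.48 | 0.91 | 15.65 | 1.70 | 8.48 | 2.16 |
| 20 | XJTZ | 4.14 | 0.54 | 2.35 | 0.47 | 3.67 | 0.73 | 1.92 | 0.57 | 11.63 | 1.67 | 7.11 | 1.53 |
| 21 | XJWL | 4.58 | 0.55 | 1.95 | 0.54 | 3.30 | 0.69 | 1.46 | 0.51 | 11.97 | 1.16 | 5.35 | 1.73 |
| 22 | XJWQ | 6.21 | 0.87 | 2.42 | 0.77 | 4.99 | 0.89 | 2.26 | 0.66 | 16.78 | 1.64 | 7.70 | 2.30 |
| 23 | XJWU | 6.68 | 0.88 | 3.21 | 0.75 | 6.92 | 1.21 | 4.66 | 0.97 | 19.18 | 2.96 | 12.19 | 2.67 |
| 24 | XJXY | 5.11 | 0.63 | 1.78 | 0.66 | 3.98 | 0.69 | 1.67 | 0.56 | 14.97 | 1.87 | 7.77 | 2.09 |
| 25 | XJYC | 5.56 | 0.63 | 2.84 | 0.56 | 5.69 | 0.96 | 3.31 | 0.71 | 17.53 | 2.56 | 10.99 | 2.25 |
| 26 | XJYN | 6.95 | 0.74 | 4.60 | 0.78 | 6.30 | 1.04 | 4.47 | 0.86 | 24.46 | 3.45 | 17.09 | 3.50 |
| 27 | XJYT | 4.53 | 0.54 | 2.55 | 0.34 | 4.28 | 0.78 | 2.42 | 0.48 | 13.59 | 1.68 | 8.94 | 1.34 |
| 28 | XJZS | 5.40 | 0.61 | 1.80 | 0.60 | 4.45 | 0.81 | 1.85 | 0.63 | 13.80 | 1.48 | 5.97 | 1.97 |
| Mean | | 4.98 | 0.63 | 2.68 | 0.57 | 4.26 | 0.76 | 2.47 | 0.60 | 14.34 | 1.44 | 8.24 | 1.92 |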
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Li, W.; Li, Z.; Jiang, W.; Chen, Q.; Zhu, G.; Wang, J. A New Spatial Filtering Algorithm for Noisy and Missing GNSS Position Time Series Using Weighted Expectation Maximization Principal Component Analysis: A Case Study for Regional GNSS Network in Xinjiang Province. Remote Sens. 2022, 14, 1295. https://doi.org/10.3390/rs14051295
