Article

Variable Selection for Spatial Logistic Autoregressive Models

Jiaxuan Liang, Yi Cheng, Yuqi Su, Shuyue Xiao and Yunquan Song
School of Science, China University of Petroleum, Qingdao 266580, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(17), 3095; https://doi.org/10.3390/math10173095
Submission received: 21 June 2022 / Revised: 30 July 2022 / Accepted: 22 August 2022 / Published: 29 August 2022
(This article belongs to the Special Issue Mathematical and Computational Statistics and Their Applications)

Abstract

When the spatial response variables are discrete, the spatial logistic autoregressive model adds a network structure to the ordinary logistic regression model to improve the classification accuracy. With the emergence of high-dimensional data in various fields, sparse spatial logistic regression models have attracted a great deal of interest from researchers. In this paper, we propose a variable selection method for the high-dimensional spatial logistic autoregressive model. To identify important variables and make predictions, an efficient algorithm is employed to solve the penalized likelihood function. Simulations and a real example show that our method performs well in limited samples.
MSC:
62F12; 62G08; 62G20; 62J07

1. Introduction

As a branch of modern econometrics, spatial econometrics has been widely used in many traditional economic fields, such as regional economics, real estate, demand analysis, and labor economics, as well as in ecology, epidemiology, and other disciplines. Many modeling approaches are available for spatial econometrics; commonly used models include the Spatial Autoregressive Model (SAR), the Spatial Lag of X Model (SLX), the Spatial Error Model (SEM), the Spatial Autoregressive Combined Model (SAC), and the Spatial Durbin Model (SDM). Among them, the spatial autoregressive (SAR) model proposed by Ord (1975) [1] has been especially popular. In the framework of spatial autoregression, Anselin (1980) [2] discussed the estimation of parameters, and Cliff and Ord (1981) [3] investigated maximum likelihood estimation based on this work. Lee (2004) [4] applied maximum likelihood and quasi-maximum likelihood estimation to the spatial econometric model and rigorously derived the asymptotic distribution of the estimated parameters. SAR models have been applied in many fields, including the social sciences (Ma (2020) [5], Darmofal (2015) [6]), real estate (Osland (2010) [7]), crime incidents (Ahmar et al. (2018) [8]), poverty analysis (Islamy et al. (2021) [9]), and ecological analysis (Ver Hoef et al. (2018) [10]). According to the SAR model, the factors affecting the dependent variable can be viewed as a natural combination of the independent variables and the spatial spillover effects of the dependent variable. Hence, the model can conveniently handle traditional covariates and network dependence simultaneously.
The existing spatial econometrics literature focuses mainly on statistical inference for the spatial mean regression model, whose dependent variable is continuous and which only reflects the location of the conditional distribution of the explained variable. Since many dependent variables in practical problems are discrete, the spatial logistic autoregressive model has attracted the attention of theoretical econometricians and applied researchers. The spatial logistic regression model studies the influence of covariates on spatially correlated discrete responses. Modeling spatial data with classification techniques of this kind is a new area of spatial econometrics, and the related research is still limited.
In recent years, several methods have been proposed to analyze spatial autoregressive models with high-dimensional data. Penalization techniques and their variants, which shrink inactive coefficients to 0, have attracted much attention in high-dimensional data analysis; examples include the LASSO (Tibshirani (1996) [11]), the SCAD (Fan and Li (2001) [12]), and the MCP (Zhang (2010) [13]) for mean regression. For high-dimensional spatial data, Han et al. (2017) [14] proposed estimation and model selection for higher-order spatial autoregressive models through an efficient Bayesian approach; they developed an algorithm based on the exchange algorithm to handle the computation of the Jacobian determinant in the likelihood function when the number of cross-sectional spatial units is large. Liu et al. (2018) [15] developed a penalized quasi-maximum likelihood method for simultaneous model selection and parameter estimation in the spatial autoregressive model with independent and identically distributed errors. Pfarrhofer and Piribauer (2019) [16] proposed two global-local shrinkage priors for high-dimensional matrix exponential spatial specifications; both simulations and real-data results reveal that these priors perform particularly well in high-dimensional settings, especially when the number of parameters to be estimated surpasses the number of observations. Song et al. (2021) [17] proposed a class of penalized robust regression estimators based on the exponential squared loss with independent and identically distributed errors for general spatial autoregressive models; numerical studies demonstrate that their method is especially robust and applicable when there are outliers or intensive noise in the observations, or when the estimated spatial weight matrix is imprecise. To alleviate the computational burden of Bayesian model averaging for spatial autoregressive models, Leach et al. (2022) [18] proposed a novel approach that combines the spike-and-slab prior with the elastic net when predictors display spatial structure; the elastic net may outperform the LASSO when the number of predictors far exceeds the sample size and the predictors exhibit strong correlations. Gonella et al. (2022) [19] studied variable selection for spatial regression models with locations on irregular lattices and errors following Conditional or Simultaneous Autoregressive (CAR or SAR) models; their strategy is to whiten the residuals by estimating their spatial covariance matrix and then perform the standard L1-penalized LASSO regression for independent data on the transformed model. The above studies only address spatial autoregressive models with continuous response variables. To the best of our knowledge, there are still no studies on variable selection for spatial logistic autoregression with high-dimensional spatial data.
In this paper, we put forward a class of penalized regression estimators based on the quasi-maximum likelihood with independent and identically distributed errors for general spatial logistic autoregressive models. Consider estimating β = (β1, …, βp)ᵀ by solving the following optimization problem:

$\min_{(\beta,\rho)}\; -\ln[L(\beta,\rho)] + 2n\sum_{j=1}^{p} p_{\lambda}(|\beta_j|).$
In this work, we present a variable selection method for spatial logistic autoregressive models based on the quasi-maximum likelihood loss function and the SCAD penalty. The method is capable of selecting significant predictors while estimating the regression coefficients. The main contributions of this work are the following.
  • We construct a variable selection method for the high-dimensional spatial logistic regression model.
  • We propose a new optimization algorithm to solve the penalized spatial logistic regression model and construct a model selection criterion to select the optimal tuning parameter.
  • We conduct numerical studies that verify the effectiveness of the proposed method in selecting significant variables. The studies indicate that the proposed method far outperforms the comparative methods in terms of the number of correctly identified zero coefficients, the number of incorrectly identified nonzero coefficients, and the model error (ME).
The outline of the remainder of this paper is as follows. Section 2 discusses the general specification of the spatial autoregressive model and the spatial logistic regression model. Section 3 proposes a penalized spatial logistic autoregressive model and an optimization algorithm to solve it. Section 4 performs a simulation study to evaluate the effect of variable selection in spatial logistic regression. Section 5 applies the model to a real example, and Section 6 concludes.

2. Materials and Methods

2.1. Spatial Autoregressive Model (SAR)

Consider a network with n nodes. We describe the structure of the network by an adjacency matrix A ∈ ℝⁿˣⁿ, where a_ij = 1 if node i follows node j and a_ij = 0 otherwise. Given an n × 1 vector of observations on the dependent variable Y and an n × d matrix of regressors X, we can establish the following SAR model:
$Y = \rho W Y + X\beta + \varepsilon,$  (1)
where ρ ∈ ℝ is the network autocorrelation coefficient and β = (β1, …, βd)ᵀ ∈ ℝᵈ is the regression coefficient vector. W is the row-normalized version of A, with $w_{ij} = a_{ij}/\sum_{j=1}^{n} a_{ij}$. Let θ = (ρ, βᵀ)ᵀ ∈ ℝᵈ⁺¹ denote the parameter vector and ε = (ε1, …, εn)ᵀ the error vector, which we assume to be i.i.d. with zero mean and finite variance σ².
Denote G = I − ρW and S = Y − ρWY − Xβ; then the log-likelihood function of the SAR model is

$\ln L(\theta,\sigma^{2}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^{2} + \ln|G| - \frac{1}{2\sigma^{2}}S^{T}S.$  (2)
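For concreteness, the following base-R sketch evaluates this log-likelihood for given parameters. It is a didactic implementation written for this exposition, not the authors' released code.

```r
# Minimal base-R sketch of the SAR log-likelihood (2).
# Y: n-vector, X: n x d matrix, W: row-normalized weight matrix.
sar_loglik <- function(rho, beta, sigma2, Y, X, W) {
  n <- length(Y)
  G <- diag(n) - rho * W                        # G = I - rho * W
  S <- Y - rho * (W %*% Y) - X %*% beta         # S = Y - rho*W*Y - X*beta
  ldetG <- as.numeric(determinant(G, logarithm = TRUE)$modulus)  # ln|G|
  -n / 2 * log(2 * pi) - n / 2 * log(sigma2) + ldetG -
    as.numeric(crossprod(S)) / (2 * sigma2)
}
```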

2.2. Spatial Logistic Regression Model

The spatial logistic regression model combines the spatial autoregressive model with the logistic regression model. Although the response variable of a logistic regression model can be binary or multi-class, we consider only binary responses here.
The model (1) can be written as:
$y^{*} = (I-\rho W)^{-1}(X\beta+\varepsilon) = (I-\rho W)^{-1}X\beta + (I-\rho W)^{-1}\varepsilon = HX\beta + e, \qquad e \sim MVN(0,\Omega),$  (3)
where, to distinguish the latent variable from the binary response, we write y* instead of Y; H = (I − ρW)⁻¹ is an n × n matrix, [HXβ]_i denotes the i-th component of HXβ, and e = (I − ρW)⁻¹ε is an n × 1 vector. The latent variable y* determines the binary variable y:

$y_i = \begin{cases} 1, & y_i^{*} > 0, \\ 0, & y_i^{*} \le 0. \end{cases}$  (4)
Therefore, the probabilities P(y_i = 1) and P(y_i = 0) are:

$P(y_i = 1 \mid X_i) = P(y_i^{*} > 0) = P([HX\beta]_i + e_i > 0) = P(e_i \ge -[HX\beta]_i) = \frac{1}{1+\exp(-[HX\beta]_i)},$  (5)

$P(y_i = 0 \mid X_i) = P(y_i^{*} \le 0) = P(e_i < -[HX\beta]_i) = 1 - P(e_i \ge -[HX\beta]_i) = 1 - \frac{1}{1+\exp(-[HX\beta]_i)}.$  (6)
When we account for the fact that e has mean 0 and covariance matrix Ω, we get

$P(y_i = 1) = \frac{1}{1+\exp\left(-[HX\beta]_i/\sqrt{\Omega_{ii}}\right)},$  (7)

where Ω_ii is the i-th diagonal element of Ω = [(I − ρW)ᵀ(I − ρW)]⁻¹, and P(y_i = 0) is obtained in the same way.
The spatial logistic regression parameters can be estimated by maximum likelihood estimation (MLE). The parameters are estimated by maximizing the likelihood function of the random variables y_i, which follow Bernoulli distributions:

$L(\beta,\rho) = \prod_{i=1}^{n}\left[\frac{1}{1+\exp(-[HX\beta]_i/\sqrt{\Omega_{ii}})}\right]^{y_i}\left[1-\frac{1}{1+\exp(-[HX\beta]_i/\sqrt{\Omega_{ii}})}\right]^{1-y_i}.$  (8)

The likelihood function is then transformed by the natural logarithm (ln) as follows:

$\ln[L(\beta,\rho)] = \sum_{i=1}^{n} y_i \ln\left[\frac{1}{1+\exp(-[HX\beta]_i/\sqrt{\Omega_{ii}})}\right] + \sum_{i=1}^{n}(1-y_i)\ln\left[1-\frac{1}{1+\exp(-[HX\beta]_i/\sqrt{\Omega_{ii}})}\right].$  (9)

To estimate β, we maximize Formula (9) and define β̂ = arg max ln[L(β, ρ)].
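The log-likelihood (9) is straightforward to evaluate numerically. The base-R sketch below does so directly; the standardization by √Ω_ii follows the reconstruction above, and the dense solve() calls are only suitable for moderate n.

```r
# Base-R sketch of the spatial logistic log-likelihood (9); standardization of
# [H X beta]_i by sqrt(Omega_ii) is assumed as in the text above.
slogit_loglik <- function(beta, rho, y, X, W) {
  n <- length(y)
  A <- diag(n) - rho * W
  H <- solve(A)                         # H = (I - rho*W)^{-1}
  Omega <- solve(crossprod(A))          # Omega = [(I - rho*W)^T (I - rho*W)]^{-1}
  eta <- as.numeric(H %*% X %*% beta) / sqrt(diag(Omega))
  p <- 1 / (1 + exp(-eta))              # P(y_i = 1 | X_i), Formula (7)
  sum(y * log(p) + (1 - y) * log(1 - p))
}
```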

3. Results

3.1. Variable Selection with Linear Constraints

In this section, we consider variable selection for the high-dimensional spatial logistic regression model. The objective function consists of the negative log-likelihood of the spatial logistic regression model, −ln[L(β, ρ)], plus a penalty function. By minimizing the objective function, we obtain the estimated parameters.
We will study variable selection for the high-dimensional spatial logistic regression model:

$(\hat{\beta},\hat{\rho}) = \arg\min_{(\beta,\rho)}\; -\ln[L(\beta,\rho)] + 2n\sum_{j=1}^{p} p_{\lambda}(|\beta_j|),$  (10)
where p_λ(·) is the penalty function and the degree of shrinkage is determined by the tuning parameter λ in the penalty term. Some possible choices include:
(1) the LASSO penalty, with p_λ(t) = λ|t|;
(2) the SCAD penalty, with $p_{\lambda}(t) = \lambda\int_{0}^{|t|}\min\{1,(a-x/\lambda)_{+}/(a-1)\}\,dx$, a > 2, where v₊ denotes the positive part of v, that is, v·I(v ≥ 0);
(3) the MCP, with $p_{\lambda}(t) = \lambda\int_{0}^{|t|}(1-x/(\lambda a))_{+}\,dx$, a > 1.
Fan and Li (2001) [12] used unbiasedness, sparsity, and continuity to evaluate penalty functions. The LASSO is not unbiased, and the MCP is relatively complex to compute. Fan and Li (2001) [12] also pointed out that the LASSO does not possess the oracle properties, whereas the SCAD does. Compared with ridge regression, the SCAD method reduces the prediction variance of the model; compared with the LASSO, it reduces the bias of the parameter estimates. It has therefore received extensive attention, and we choose the SCAD penalty here.
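For reference, the SCAD penalty and its derivative (used later by the LQA step) can be coded in a few lines of base R; the closed-form pieces anticipate Formula (12) in Section 3.3.

```r
# SCAD penalty and its derivative (cf. Fan and Li (2001) [12]), with a = 3.7.
scad_pen <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  ifelse(t < lambda, lambda * t,
         ifelse(t < a * lambda,
                -(t^2 - 2 * a * lambda * t + lambda^2) / (2 * a - 2),
                (a + 1) * lambda^2 / 2))
}
scad_deriv <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  lambda * (t <= lambda) + pmax(a * lambda - t, 0) / (a - 1) * (t > lambda)
}
```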

3.2. A Feasible Algorithm

This is a nonconcave optimization problem that minimizes the penalized likelihood function Q(θ) = −ln[L(β, ρ)] + 2n Σⱼ p_λ(|βⱼ|). For classical linear regression models, several algorithms have been developed to find local solutions of nonconcave penalized functions, such as the local quadratic approximation (LQA) algorithm (Fan and Li, 2001 [12]), the local linear approximation (LLA) algorithm, and the coordinate descent algorithm. Unfortunately, owing to the spatial correlation in the model, the algorithms mentioned above cannot directly compute the minimum of the nonconcave penalized likelihood function Q(θ). Therefore, we propose a new iterative algorithm:
Step 1. Initialize θ^(0) = (σ^(0), ρ^(0), β^(0)).
Step 2. Update σ^(m+1) = arg min_{σ∈(0,∞)} l1(σ) = −ln Ln(σ, ρ^(m), β^(m)).
Step 3. Update ρ^(m+1) = arg min_{ρ∈(−1,1)} l2(ρ) = Q(σ^(m+1), ρ, β^(m)).
Step 4. Update β^(m+1) = arg min_{β∈ℝᵏ} l3(β) = Q(σ^(m+1), ρ^(m+1), β).
Step 5. Iterate Steps 2 to 4 until convergence; denote the final estimators of σ², ρ, and β as σ̂², ρ̂, and β̂, and set θ̂ = (σ̂², ρ̂, β̂ᵀ)ᵀ.
Steps 2 and 3 are one-dimensional nonlinear optimization problems, so they can be solved by the Brent method (Press et al.). In Step 4, with ρ^(m+1) fixed, model (1) can be written as the following linear model:

$Y_n^{*} = X_n\beta + E_n,$  (11)

where Y_n* = Y_n − ρW_nY_n. Hence, we can apply the LQA algorithm for the classical linear regression model to accomplish this step. We also need to determine the tuning parameters a and λ in the SCAD function. Here, we follow the suggestion of Fan and Li (2001) [12] and set a = 3.7.
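The following R sketch illustrates one way to implement the iteration. It is a simplified, illustrative version: it omits the σ step, starts β at an unpenalized logistic fit (assuming p < n), and uses numerical derivatives from the numDeriv package in place of the analytic expressions; the tolerance and zero-threshold are illustrative choices. It reuses slogit_loglik(), scad_pen(), and scad_deriv() from the sketches above.

```r
# Simplified sketch of the alternating algorithm (Steps 3 and 4).
fit_pslr <- function(y, X, W, lambda, a = 3.7, tol = 1e-5, maxit = 50) {
  n <- length(y); p <- ncol(X)
  beta <- glm.fit(X, y, family = binomial())$coefficients  # unpenalized start
  rho <- 0
  negll <- function(b, r) -slogit_loglik(b, r, y, X, W)
  Q <- function(r, b) negll(b, r) + 2 * n * sum(scad_pen(b, lambda, a))  # (10)
  for (m in seq_len(maxit)) {
    beta_old <- beta; rho_old <- rho
    # Step 3: one-dimensional update of rho on (-1, 1) by Brent's method
    rho <- optim(rho, function(r) Q(r, beta), method = "Brent",
                 lower = -0.99, upper = 0.99)$par
    # Step 4: one LQA (local quadratic approximation) Newton step for beta
    w <- scad_deriv(beta, lambda, a) / pmax(abs(beta), 1e-6)
    g <- numDeriv::grad(negll, beta, r = rho)
    H <- numDeriv::hessian(negll, beta, r = rho)
    beta <- beta - solve(H + 2 * n * diag(w, nrow = p), g + 2 * n * w * beta)
    if (max(abs(c(beta - beta_old, rho - rho_old))) < tol) break
  }
  beta[abs(beta) < 1e-4] <- 0   # threshold tiny coefficients to exact zeros
  list(rho = rho, beta = as.numeric(beta))
}
```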

3.3. The Selection of Tuning Parameter

Based on the above, we choose the SCAD penalty. The penalty function is defined as

$p_{\lambda}(|\beta_j|) = \begin{cases} \lambda|\beta_j|, & 0 \le |\beta_j| < \lambda,\\ -(|\beta_j|^{2} - 2a\lambda|\beta_j| + \lambda^{2})/(2a-2), & \lambda \le |\beta_j| < a\lambda,\\ (a+1)\lambda^{2}/2, & |\beta_j| \ge a\lambda, \end{cases}$  (12)

where λ ≥ 0 and a > 2 are tuning parameters. Fan and Li (2001) [12] suggested setting a to 3.7, while λ determines the shrinkage strength of the parameter estimates. In this paper, λ is determined by the Bayesian information criterion (BIC).
The selection of the tuning parameter λ is an important application of degrees of freedom. We use the Bayesian information criterion (BIC) (Schwarz (1978)) as the model selection criterion. To determine the value of λ, we minimize

$\mathrm{BIC}(\lambda) = -2\ln L_n(\hat{\theta}) + \alpha(\lambda)\log n,$  (13)

where $\alpha(\lambda) = \sum_{j=1}^{k+2} I(\hat{\theta}_j \ne 0)$. The chosen value is λ̂ = arg min_λ BIC(λ).
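A λ grid search under this criterion might look as follows; the sketch reuses fit_pslr() and slogit_loglik() from above, and the grid itself is an illustrative assumption.

```r
# Sketch of BIC-based selection of lambda (13) over a grid.
select_lambda <- function(y, X, W,
                          lambdas = exp(seq(log(0.01), log(1), length.out = 20))) {
  n <- length(y)
  bic <- sapply(lambdas, function(lam) {
    fit <- fit_pslr(y, X, W, lambda = lam)
    df  <- sum(abs(c(fit$rho, fit$beta)) > 1e-8)  # alpha(lambda): nonzero count
    -2 * slogit_loglik(fit$beta, fit$rho, y, X, W) + df * log(n)
  })
  lambdas[which.min(bic)]
}
```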

4. Simulation Studies

In the preceding sections, we put forward a penalized spatial logistic regression model; here we use R code to carry out Monte Carlo simulations that assess its variable selection performance. The data for the simulation experiments are generated from Formula (1), in which the covariates follow a (q + 3)-dimensional normal distribution with zero mean and covariance matrix (σ_ij) with σ_ij = 0.5^|i−j|. We set the sample size n ∈ {60, 90, 120} and the number of inessential covariates q ∈ {5, 10, 35, 85} in the subsequent simulation studies. Consequently, X is an n × (q + 3) matrix.
For the SAR model, the network autocorrelation coefficient ρ is drawn from a uniform distribution centered at ρ1, where ρ1 can be 0.2, 0.5, or 0.8, representing spatial effects of different intensities. To compare model performance, we also consider ρ = 0, which implies no spatial dependence; in that case the model of Section 2.1 reduces to a classic linear model.
Additionally, let the spatial weight matrix be W = I_R ⊗ B_m, where B_m = (1/(m − 1))(1_m·1_mᵀ − I_m), "⊗" denotes the Kronecker product, and 1_m is an m-dimensional column vector of ones. We take m = 3 and several values of R, namely R ∈ {10, 20, 30, 40}.
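This weight matrix is easy to build in base R with the Kronecker product operator %x%; the sketch below is illustrative.

```r
# W = I_R (Kronecker product) B_m; rows of B_m sum to one, so W is row-normalized.
make_W <- function(R, m) {
  Bm <- (matrix(1, m, m) - diag(m)) / (m - 1)  # B_m = (1/(m-1)) (1_m 1_m^T - I_m)
  diag(R) %x% Bm                               # %x% is the Kronecker product
}
W <- make_W(R = 20, m = 3)                     # n = 60 units in 20 blocks of size 3
```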
The regression coefficients are set to β = (3, 2, 1.6, 0_qᵀ)ᵀ, where (β1, β2, β3) is generated from a 3-dimensional normal distribution with mean vector (3, 2, 1.6) and covariance matrix 0.001·I3, and 0_q is a q-dimensional zero vector. The latent response y* is given by

$y^{*} = (I_n - \rho W)^{-1}(X\beta + \varepsilon_n).$  (14)

We then turn the latent response into a categorical variable through

$Y_i = \begin{cases} 1, & y_i^{*} > 0, \\ 0, & y_i^{*} \le 0. \end{cases}$  (15)
Thus, the binary response variable Y is obtained. To verify the robustness of the model, we consider two error distributions: εn ~ N(0, σ²In), denoted ε0, and the mixed Gaussian distribution εn ~ 0.5N(−1, 2.5²In) + 0.5N(1, 0.5²In), denoted ε1. σ² is generated from the uniform distribution on the interval [σ1 − 0.1, σ1 + 0.1], where σ1 ∈ {1, 2}. In the second case, E(ε) = 0 and Mode(ε) = 1.
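Putting the pieces together, one replication of the data-generating process can be sketched as follows. mvrnorm() comes from the MASS package and make_W() from the sketch above; the half-width 0.1 of the uniform law for ρ is an assumption that mirrors the stated σ² setting.

```r
# One simulated dataset (n = 60, q = 5, rho_1 = 0.5, errors eps_0 with sigma_1 = 1).
library(MASS)
set.seed(1)
n <- 60; q <- 5; m <- 3
W <- make_W(R = n / m, m = m)
Sigma <- 0.5^abs(outer(1:(q + 3), 1:(q + 3), "-"))   # sigma_ij = 0.5^|i-j|
X <- mvrnorm(n, mu = rep(0, q + 3), Sigma = Sigma)
beta <- c(mvrnorm(1, mu = c(3, 2, 1.6), Sigma = 0.001 * diag(3)), rep(0, q))
rho <- runif(1, 0.5 - 0.1, 0.5 + 0.1)                # centered at rho_1 = 0.5
sigma2 <- runif(1, 1 - 0.1, 1 + 0.1)                 # sigma_1 = 1
eps <- rnorm(n, mean = 0, sd = sqrt(sigma2))         # eps_0; for eps_1 use instead:
# k <- rbinom(n, 1, 0.5)
# eps <- rnorm(n, ifelse(k == 1, -1, 1), ifelse(k == 1, 2.5, 0.5))
ystar <- solve(diag(n) - rho * W, X %*% beta + eps)  # latent y* from (14)
Y <- as.numeric(ystar > 0)                           # binary response (15)
```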

4.1. Simulation Indicators

For each case, we repeat the simulation 100 times. To evaluate the variable selection ability of the model, we define the following indicators (a computational sketch is given after the list):
  • Correct: the average number of true zero coefficients correctly set to zero;
  • Incorrect: the average number of true nonzero coefficients incorrectly set to zero;
  • ME: the mean error between the true parameter and its estimates, defined as
    $\frac{1}{100}\sum_{i=1}^{100}\lVert \theta - \hat{\theta}_i \rVert;$
  • MAD: the median absolute deviations of parameter estimation;
  • MEAN: the means of parameter estimation;
  • SD: the standard deviations of parameter estimation.
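A sketch of how the first three indicators can be computed from a matrix of replicated estimates is given below; the toy inputs are purely illustrative.

```r
# est (100 x length(theta)) collects the estimates over replications;
# theta is the true coefficient vector. Toy inputs for illustration:
theta <- c(3, 2, 1.6, rep(0, 5))
est <- matrix(rnorm(100 * length(theta), rep(theta, each = 100), 0.1), nrow = 100)
est[abs(est) < 0.15] <- 0                     # mimic thresholded estimates
truth0 <- theta == 0
Correct   <- mean(rowSums(est[, truth0, drop = FALSE] == 0))   # true zeros kept at 0
Incorrect <- mean(rowSums(est[, !truth0, drop = FALSE] == 0))  # true nonzeros set to 0
theta_mat <- matrix(theta, nrow(est), length(theta), byrow = TRUE)
ME <- mean(sqrt(rowSums((est - theta_mat)^2)))                 # mean Euclidean error
```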

4.2. Simulation Results

For each case, the results below are based on 100 replications. To facilitate comparison, the variable selection results produced by our algorithm for the SAR model are labeled SLR, while LLA denotes the variable selection results of the LLA algorithm applied to samples simulated from the classical regression model. Furthermore, to contrast the effects of different penalty functions, the LASSO penalty p_λ(δ) = λ|δ| is also used for variable selection. Table 1 and Table 2 show clearly that, as the sample size n increases, the accuracy of variable selection in both models gradually improves. When the sample size n is 120, the performance of the SLR model reaches an ideal state: the model attains a higher "Correct" and a lower "Incorrect" and ME, which is consistent with our expectations. When the spatial effect is weak or moderate (ρ1 = 0.2 or 0.5) or absent (ρ = 0), the ME of the SLR model is significantly lower than that of the LLA model, and the number of correctly selected variables is evidently higher than that of the LLA. This shows that under low- to medium-intensity spatial effects, the SLR model has high accuracy and the influence of the spatial effect on the model is weakened. When the error is ε1, both models exhibit good robustness. When the spatial effect is strong (ρ1 = 0.8), the estimates are less accurate, suggesting that neglecting a strong spatial effect will seriously bias the estimates.
Moreover, by comparing the ME between the two penalty functions, we find that the SCAD penalty performs better than the LASSO penalty under the SLR algorithm, although the two get closer as the sample size increases. In terms of Correct and Incorrect, although the LASSO has a lower error rate, the accuracy of the SCAD is always at a high level and its estimation error is significantly lower. In Table 1, the number of correctly selected zero coefficients is close to the true value, and as the sample size increases, the average number of incorrectly selected zero coefficients approaches zero. All simulation results are consistent with the theoretical analysis. However, under the same conditions, the number of correctly selected zero coefficients for models with the LASSO penalty is less than half of the true value. For the same number of observations, the average number of correctly selected zero coefficients under the LASSO penalty is remarkably lower than under the SCAD penalty, which may imply that the SCAD penalty tends to give smaller models than the LASSO penalty. These results are in accordance with those obtained by Fan and Li (2001) [12].
In terms of Correct and Incorrect, Table 1 and Table 2 present the results with fixed q = 5, a low-dimensional case. To explore the performance of the models in high-dimensional situations, we set q = 10 and q = 35. Table 3 and Table 4 show that in the high-dimensional cases the SLR model is significantly better than the LLA model. Because of the inferior performance of the LLA model, we present only the SLR model in Table 5 in order to save space. On the whole, the variable selection performance of the model decreases, and the number of correctly selected zero coefficients differs slightly from the true value. However, it is not difficult to see that as the sample size increases, the penalized maximum likelihood with the SCAD and LASSO penalties effectively reduces the complexity of the model, and the SCAD penalty performs better than the LASSO penalty.
To measure the robustness of the model, we not only set up two error distributions but also calculated the median absolute deviations (MAD) of the parameter estimates. The MAD is a robust statistic that is more resilient to outliers in a dataset than the standard deviation, and it is a good measure of model robustness. Based on the above conclusions, we compare the performance of the SLR model under the two penalty functions in Table 6 and Table 7, using means, standard deviations, and median absolute deviations to measure the models' performance. In the simulation study, we find that the means and variances of the errors for the different study combinations do not differ significantly, so they are not shown in the tables.
According to Table 6 and Table 7, the estimates of the nonzero parameters (β1, β2, β3) gradually tend to the true values as the sample size n increases, indicating that our model has high accuracy. In Table 6, we are surprised to find that the model performs better when the error is ε1, reflected in lower MAD values, lower variances, and parameter estimates closer to the true values. In contrast to Table 6, the estimates of the nonzero coefficients in Table 7 are generally lower than the true values under the LASSO penalty, indicating that this penalty controls the degree of shrinkage less effectively than the SCAD penalty. Analyzing the estimates of the nonzero parameters also shows that, as the spatial effect increases, the estimate of the standard deviation becomes less precise.

5. Data Example

In this section, we provide a real-world example to demonstrate the performance of the variable selection procedure proposed in this paper for spatial logistic regression models.

5.1. The Sample Data

The dataset is a collection of different types of land-area data (recorded every five years) from 1954 to 2012 for 48 states of the United States; we use the spatial logistic regression model to analyze these land utilization data. The dependent variable is binary ("1" indicates that the land utilization rate is low, that is, most of the land has not been effectively developed; "0" indicates that the land utilization rate is high, that is, most of the land has been developed and utilized). The independent variables comprise eight attributes: Cropland used for crops, Cropland used for pasture, Cropland idled, Grassland pasture and range, Forest-use land grazed, Land in rural transportation facilities, Land in urban areas, and Other idle land (shown in Table 8).
Spatial weight matrices are generally constructed according to a few basic principles, including rook and queen contiguity, binary distance bands, inverse distance, k-nearest neighbors, and kernel weights (Yrigoyen 2013 [20]). We generate the spatial weight matrix by the common-boundary (queen) criterion, i.e., a contiguity-based spatial weights matrix: if two regions share a common boundary, the weight is 1; otherwise, the weight is 0. For sensitivity and robustness analysis, we also estimate our model with a different, distance-based spatial weight matrix (shown in Appendix A). Furthermore, spatial weight matrices are usually row-normalized in practice.
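With polygon data for the states at hand, such a matrix can be built, for example, with the spdep package. In the sketch below, 'states' is an assumed sf object containing the 48 state polygons; it is not supplied with this paper.

```r
# Hedged sketch: queen-contiguity, row-standardized weights via spdep.
library(spdep)
nb <- poly2nb(states, queen = TRUE)  # neighbours share at least one boundary point
W  <- nb2mat(nb, style = "W")        # style = "W" gives row-standardized weights
```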

5.2. Model Selection and Estimation

Table 9 and Table 10 present the penalized maximum likelihood estimates via the SCAD and LASSO penalties under the spatial logistic regression model. For this model, the penalized maximum likelihood with the SCAD penalty shows that Forest-use land grazed, Cropland used for crops, and Land in rural transportation facilities are unimportant. These three variables have a small effect on the land utilization rate (the absolute values of their coefficients are less than 0.0001) and can be ignored. In Table 9, it is notable that Cropland idled, Grassland pasture and range, and Other idle land are significant, with Cropland idled and Other idle land being the most significant. The results of this experiment are in line with our expectations.
Comparing the effects of the two penalty functions in practice, the SCAD penalty shrinks the model more decisively. This is mainly because the model with the SCAD penalty is able to select most of the important variables while discarding the less important ones as much as possible. The LASSO penalty does not perform well in this respect: Table 10 contains many coefficients with small estimates which, because of the limited shrinkage of the LASSO penalty, are not estimated as 0 (here, we judge coefficient estimates with absolute value less than 0.0001 to be 0).
From the empirical results, we find that the model is less stable in terms of parameter estimation. For example, in Table 9, the estimate for the variable Cropland used for pasture is 0.423 using the 1954 land utilization data, whereas with the 1992 data the estimate for this variable is 2.82. A possible reason is that the sample size of the dataset is too small and that there are some anomalies or errors in the data.

6. Conclusions

In this paper, a spatial autoregressive model is combined with a logistic regression model to generate a spatial logistic model. Because of the potential endogeneity of the SAR model, we apply the penalized maximum likelihood method to select significant covariates and simultaneously estimate the unknown parameters. Owing to the complexity of the penalized likelihood function, we have put forward a more suitable iterative algorithm to optimize the objective function. Both the simulation experiments and the real example illustrate that our method performs well in limited samples. Additionally, we have contrasted the variable selection procedures under the SCAD and LASSO penalties; compared with the LASSO, the SCAD is more effective in nearly all instances in which our algorithm is used. It remains unclear whether the proposed method yields similar results for other, more flexible spatial models, including parametric, nonparametric, and semiparametric spatial regression models. Moreover, it would be worth considering whether constraints could be introduced into this model. Nonetheless, the basic framework and a substantial foundation have been established, and we will continue to work on these subjects.

Author Contributions

Conceptualization, Y.S. (Yunquan Song) and J.L.; software, J.L.; resources, Y.S. (Yunquan Song) and J.L.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, J.L., Y.C., Y.S. (Yuqi Su) and S.X.; project administration, Y.S. (Yunquan Song); funding acquisition, Y.S. (Yunquan Song). All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the Fundamental Research Funds for the Central Universities (No. 20CX05012A), the NSF project of Shandong Province of China (ZR2019MA016), and the Statistical Research Project of Shandong Province of China (KT028).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the source code could be freely downloaded from https://github.com/Lmargery/SAR-Project (accessed on 20 June 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In the data example, we used the queen criterion to generate the spatial weight matrix. For sensitivity analysis, and to avoid possible biases in the model results due to the choice of a single spatial weight matrix, we also estimated our model with a different spatial weight matrix, the minimum-distance matrix (a distance-based spatial weight matrix). The results are shown in Table A1 and Table A2. The model can still select the significant variables, which verifies its robustness.
Table A1. Parameter estimates using penalty estimates via SCAD under a spatial logistic model. Columns, in order: CLand_C, CLand_P, CLand_I, Grass_P, Land_G, Land_T, Land_U, Land_I; each row lists its reported estimates in that order (cells left blank in the source are omitted).

| Year | Reported estimates |
|---|---|
| 1954 | 0.0041, 1.1816, −1.6001, 0.1438, 1.5291 |
| 1959 | 1.6730, −2.2342, 1.6182 |
| 1964 | 1.4018, −1.7498, −0.7698, 1.4987 |
| 1969 | 1.4567, −2.0923, 0.0028, 0.8858 |
| 1974 | −0.1548, 1.4706, −1.9466, −0.0278, 1.4508 |
| 1978 | −0.0434, 1.3910, −1.4376, 0.8428 |
| 1982 | 0.0248, −0.0056, 0.7797, −0.0050, −0.0055, −0.0761, −0.0314, 0.0037 |
| 1987 | −0.0119, −0.0013, 1.9232, −1.8952, 0.0100, −0.0021, −0.7012, 0.8860 |
| 1992 | 1.5005, −1.6278, −0.8751, 1.3734 |
| 1997 | 1.0053, −1.7434, 1.0357 |
| 2002 | 0.8515, −0.8234, −0.6124, 1.4318 |
| 2007 | 1.6466, −1.3704, −1.2806, 1.0864 |
| 2012 | 0.0010, 0.8936, −0.8096, 0.0047, −0.0026, 0.8552 |
Table A2. Parameter estimates using penalty estimates via LASSO under a spatial logistic model. Columns, in order: CLand_C, CLand_P, CLand_I, Grass_P, Land_G, Land_T, Land_U, Land_I; each row lists its reported estimates in that order (cells left blank in the source are omitted).

| Year | Reported estimates |
|---|---|
| 1954 | −0.2383, 0.5704, −0.9573, −0.0104, 1.4060 |
| 1959 | −0.3976, 1.0474, −0.6412, −0.0475, 0.8030 |
| 1964 | 0.0086, 0.5267, −0.0309, −0.0976, −1.2473, 0.9454 |
| 1969 | −0.6377, 1.2268, −0.7067, 0.0019, 0.9814 |
| 1974 | −0.1366, −0.0079, 0.3802, −0.0814, −0.7113, 0.9834 |
| 1978 | −0.0578, −0.1920, 0.4416, −0.0011, −0.1774, −0.0066, −0.9952, 1.1208 |
| 1982 | −0.3635, −0.2077, 0.6977, 0.0027, −0.7201, 0.9518 |
| 1987 | −0.1601, −1.0231, 0.5397, 0.6990 |
| 1992 | −0.0132, −0.5343, 0.2966, −0.0582, −0.0428, −0.1322, 1.5956 |
| 1997 | −0.2571, −0.3083, 0.5104, −0.0752, −0.6651, 1.0377 |
| 2002 | −0.0231, −0.8781, 0.2262, −0.3388, −0.1083, 1.0491 |
| 2007 | −0.1737, 0.4287, −0.6469, −0.4591, 1.2079 |
| 2012 | 0.2314, 0.4552, −0.3429, −0.0710, 0.7424 |

References

1. Ord, J. Estimation methods for models of spatial interaction. J. Am. Stat. Assoc. 1975, 70, 120–126.
2. Anselin, L. Estimation Methods for Spatial Autoregressive Structures: A Study in Spatial Econometrics; Cornell University: New York, NY, USA, 1980.
3. Cliff, A.D.; Ord, J.K. Spatial Processes: Models & Applications; Taylor & Francis: Abingdon, UK, 1981.
4. Lee, L.F. Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autoregressive Models. Econometrica 2004, 72, 1899–1925.
5. Ma, Y.; Pan, R.; Zou, T. A Naive Least Squares Method for Spatial Autoregression with Covariates. Stat. Sin. 2020, 30, 653–672.
6. Darmofal, D. Spatial Analysis for the Social Sciences; Cambridge University Press: New York, NY, USA, 2015.
7. Osland, L. An application of spatial econometrics in relation to hedonic house price modelling. J. Real Estate Res. 2010, 32, 289–320.
8. Ahmar, A.S.; Aidid, M.K. Crime Modeling using Spatial Regression Approach. J. Phys. Conf. Ser. 2018, 954, 012013.
9. Islamy, U.; Novianti, A.; Hidayat, F.P.; Kurniawan, M.H.S. Application of the Spatial Autoregressive (SAR) Method in Analyzing Poverty in Indonesia and the Self Organizing Map (SOM) Method in Grouping Provinces Based on Factors Affecting Poverty. Enthusiastic Int. J. Appl. Stat. Data Sci. 2021, 1, 76–83.
10. Ver Hoef, J.M.; Peterson, E.E.; Hooten, M.B.; Hanks, E.M.; Fortin, M.J. Spatial autoregressive models for statistical inference from ecological data. Ecol. Monogr. 2018, 88, 36–59.
11. Tibshirani, R.J. Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
12. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
13. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
14. Han, X.; Hsieh, C.S.; Lee, L.F. Estimation and model selection of higher-order spatial autoregressive model: An efficient Bayesian approach. Reg. Sci. Urban Econ. 2017, 63, 97–120.
15. Liu, X.; Chen, J.; Cheng, S. A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat. Stat. 2018, 25, 86–104.
16. Pfarrhofer, M.; Piribauer, P. Flexible shrinkage in high-dimensional Bayesian spatial autoregressive models. Spat. Stat. 2019, 29, 109–128.
17. Song, Y.; Liang, X.; Zhu, Y.; Lin, L. Robust variable selection with exponential squared loss for the spatial autoregressive model. Comput. Stat. Data Anal. 2021, 155, 107094.
18. Leach, J.M.; Aban, I.; Yi, N. Incorporating spatial structure into inclusion probabilities for Bayesian variable selection in generalized linear models with the spike-and-slab elastic net. J. Stat. Plan. Inference 2022, 217, 141–152.
19. Gonella, R.; Bourel, M.; Bel, L. Facing spatial massive data in science and society: Variable selection for spatial models. Spat. Stat. 2022, in press.
20. Yrigoyen, C.C. GeoDaSpace: A resource for teaching spatial regression models. Rect@ 2013, 4, 119–144.
Table 1. Simulation results of variable selection via SCAD penalty function (q = 5).

| Method | | n = 60, q = 5 | | n = 90, q = 5 | | n = 120, q = 5 | |
|---|---|---|---|---|---|---|---|
| | | SLR | LLA | SLR | LLA | SLR | LLA |
| ρ = 0.0, ε0 | Correct | 4.7500 | 3.0600 | 4.7500 | 2.8700 | 4.8000 | 3.0900 |
| | Incorrect | 0.0300 | 0.0300 | 0.0900 | 0.0300 | 0.0300 | 0.0000 |
| | ME | 0.5443 | 39.710 | 0.5667 | 27.523 | 0.4189 | 7.7846 |
| ρ = 0.0, ε1 | Correct | 4.7700 | 2.6600 | 4.8300 | 2.7800 | 4.8400 | 3.6100 |
| | Incorrect | 0.0300 | 0.0300 | 0.0500 | 0.0200 | 0.0000 | 0.0000 |
| | ME | 0.5267 | 26.015 | 0.5089 | 16.697 | 0.4398 | 5.2516 |
| ρ1 = 0.2, ε0 | Correct | 4.7700 | 2.5400 | 4.7400 | 3.0500 | 4.8300 | 3.5100 |
| | Incorrect | 0.0200 | 0.0200 | 0.0500 | 0.0200 | 0.0100 | 0.0000 |
| | ME | 0.6629 | 26.553 | 0.5714 | 13.238 | 0.5226 | 4.7464 |
| ρ1 = 0.2, ε1 | Correct | 4.7300 | 2.6800 | 4.7500 | 3.2400 | 4.8500 | 3.7800 |
| | Incorrect | 0.0500 | 0.0300 | 0.0700 | 0.0000 | 0.0200 | 0.0000 |
| | ME | 0.6204 | 20.937 | 0.5862 | 5.3993 | 0.5180 | 2.5285 |
| ρ1 = 0.5, ε0 | Correct | 4.4700 | 2.6500 | 4.7500 | 3.3900 | 4.7600 | 4.1400 |
| | Incorrect | 0.0300 | 0.0300 | 0.0200 | 0.0300 | 0.0100 | 0.0000 |
| | ME | 1.4812 | 10.299 | 1.1671 | 1.8735 | 1.2150 | 1.2959 |
| ρ1 = 0.5, ε1 | Correct | 4.5400 | 2.8800 | 4.7700 | 3.6800 | 4.7400 | 4.3900 |
| | Incorrect | 0.0700 | 0.0300 | 0.0400 | 0.0300 | 0.0100 | 0.0100 |
| | ME | 1.3525 | 6.7171 | 1.1989 | 1.6030 | 1.2130 | 1.2991 |
| ρ1 = 0.8, ε0 | Correct | 2.0400 | 3.4000 | 3.6900 | 4.2100 | 3.9000 | 4.7600 |
| | Incorrect | 0.0400 | 0.3200 | 0.0100 | 0.2700 | 0.0100 | 0.2300 |
| | ME | 6.4407 | 2.2473 | 5.6574 | 2.1468 | 5.7883 | 2.1607 |
| ρ1 = 0.8, ε1 | Correct | 3.2800 | 3.4200 | 3.3600 | 4.3500 | 3.8000 | 4.7100 |
| | Incorrect | 0.0300 | 0.3300 | 0.0100 | 0.3600 | 0.0200 | 0.2600 |
| | ME | 6.4877 | 2.2987 | 5.9324 | 2.2557 | 5.7459 | 2.1878 |
Table 2. Simulation results of variable selection via LASSO penalty function (q = 5).

| Method | | n = 60, q = 5 | | n = 90, q = 5 | | n = 120, q = 5 | |
|---|---|---|---|---|---|---|---|
| | | SLR | LLA | SLR | LLA | SLR | LLA |
| ρ = 0.0, ε0 | Correct | 2.2800 | 0.8000 | 2.4000 | 1.0700 | 2.5800 | 1.4000 |
| | Incorrect | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | ME | 0.9315 | 20.399 | 0.6583 | 13.610 | 0.4613 | 7.1323 |
| ρ = 0.0, ε1 | Correct | 2.2700 | 0.7800 | 2.0900 | 1.1600 | 2.1600 | 1.8800 |
| | Incorrect | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | ME | 1.0551 | 15.274 | 0.6850 | 7.6718 | 0.6105 | 3.4521 |
| ρ1 = 0.2, ε0 | Correct | 2.3000 | 0.8000 | 2.5200 | 1.1900 | 2.5000 | 1.7200 |
| | Incorrect | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | ME | 0.9735 | 16.272 | 0.6776 | 8.5473 | 0.4623 | 4.6528 |
| ρ1 = 0.2, ε1 | Correct | 2.3400 | 0.8700 | 2.2600 | 1.6100 | 2.1500 | 1.9100 |
| | Incorrect | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | ME | 1.0599 | 13.332 | 0.6879 | 4.5913 | 0.5618 | 2.4616 |
| ρ1 = 0.5, ε0 | Correct | 2.5200 | 1.0800 | 2.6300 | 1.7300 | 2.5600 | 2.2600 |
| | Incorrect | 0.0100 | 0.0100 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | ME | 1.0038 | 5.6760 | 0.7466 | 1.8178 | 0.7139 | 1.2749 |
| ρ1 = 0.5, ε1 | Correct | 2.3500 | 1.2000 | 2.5100 | 1.8100 | 2.4800 | 2.3700 |
| | Incorrect | 0.0000 | 0.0100 | 0.0000 | 0.0000 | 0.0000 | 0.0100 |
| | ME | 1.1404 | 5.1790 | 0.7768 | 1.5665 | 0.7014 | 1.2904 |
| ρ1 = 0.8, ε0 | Correct | 2.2700 | 1.4700 | 2.2600 | 2.2000 | 2.3100 | 2.8300 |
| | Incorrect | 0.1800 | 0.0900 | 0.0900 | 0.0600 | 0.0300 | 0.0200 |
| | ME | 2.4780 | 2.2362 | 2.4000 | 2.1520 | 2.4363 | 2.1664 |
| ρ1 = 0.8, ε1 | Correct | 2.2400 | 1.5900 | 2.3200 | 2.3900 | 2.4000 | 2.9700 |
| | Incorrect | 0.1900 | 0.0500 | 0.0900 | 0.0600 | 0.0400 | 0.0500 |
| | ME | 2.4328 | 2.2905 | 2.2099 | 2.2588 | 2.3256 | 2.1934 |
Table 3. Simulation results of variable selection via SCAD penalty function (q = 35).

| Method | | n = 60, q = 35 | | n = 90, q = 35 | | n = 120, q = 35 | |
|---|---|---|---|---|---|---|---|
| | | SLR | LLA | SLR | LLA | SLR | LLA |
| ρ = 0.0, ε0 | Correct | 29.3500 | 22.0400 | 32.1000 | 28.6700 | 32.8500 | 32.0000 |
| | Incorrect | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | ME | 0.9802 | 1.3659 | 0.6939 | 0.9054 | 0.6388 | 0.7051 |
| ρ = 0.0, ε1 | Correct | 27.2200 | 18.2900 | 28.1300 | 24.9300 | 30.3100 | 29.1100 |
| | Incorrect | 0.0300 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | ME | 1.1501 | 1.7382 | 0.9268 | 1.1563 | 0.7181 | 0.8876 |
| ρ1 = 0.2, ε0 | Correct | 28.0500 | 19.7400 | 30.7000 | 26.3700 | 31.6500 | 30.1600 |
| | Incorrect | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | ME | 1.3621 | 1.6003 | 0.8765 | 1.0687 | 0.7747 | 0.8413 |
| ρ1 = 0.2, ε1 | Correct | 27.5200 | 17.0400 | 28.2200 | 23.0000 | 30.4600 | 27.3400 |
| | Incorrect | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0200 | 0.0000 |
| | ME | 1.5050 | 1.9408 | 1.0588 | 1.2988 | 0.9202 | 1.0093 |
| ρ1 = 0.5, ε0 | Correct | 22.9000 | 10.7400 | 27.0000 | 15.2200 | 33.8000 | 18.4600 |
| | Incorrect | 0.0000 | 0.0100 | 0.0000 | 0.0000 | 0.0100 | 0.0000 |
| | ME | 3.2759 | 3.4503 | 2.8857 | 2.3748 | 1.8003 | 1.9243 |
| ρ1 = 0.5, ε1 | Correct | 21.1000 | 9.9700 | 27.1500 | 14.1000 | 30.1700 | 17.3200 |
| | Incorrect | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | ME | 3.2655 | 3.7152 | 2.3088 | 2.5436 | 2.0182 | 2.0512 |
| ρ1 = 0.8, ε0 | Correct | 12.8300 | 3.3600 | 15.6700 | 4.9000 | 22.5000 | 6.2600 |
| | Incorrect | 0.1600 | 0.0100 | 0.0000 | 0.0100 | 0.0100 | 0.0000 |
| | ME | 11.8324 | 12.7776 | 11.1446 | 9.2003 | 7.3722 | 7.8126 |
| ρ1 = 0.8, ε1 | Correct | 12.6500 | 3.1700 | 17.4500 | 4.8400 | 21.1100 | 6.2300 |
| | Incorrect | 0.0500 | 0.0300 | 0.0100 | 0.0200 | 0.1100 | 0.0000 |
| | ME | 11.5900 | 13.1540 | 9.4916 | 9.4230 | 7.0805 | 7.9880 |
Table 4. Simulation results of variable selection via LASSO penalty function (q = 35).

| Method | | n = 60, q = 35 | | n = 90, q = 35 | | n = 120, q = 35 | |
|---|---|---|---|---|---|---|---|
| | | SLR | LLA | SLR | LLA | SLR | LLA |
| ρ = 0.0, ε0 | Correct | 26.5900 | 21.2400 | 30.2800 | 16.7200 | 31.4000 | 14.4300 |
| | Incorrect | 0.1100 | 0.0000 | 0.0700 | 0.0000 | 0.0100 | 0.0000 |
| | ME | 3.4254 | 6.4100 | 3.4241 | 10.1052 | 3.4143 | 13.2199 |
| ρ = 0.0, ε1 | Correct | 26.2600 | 19.7400 | 30.1700 | 14.3700 | 31.2600 | 11.3600 |
| | Incorrect | 0.1400 | 0.0000 | 0.0500 | 0.0000 | 0.0600 | 0.0000 |
| | ME | 3.4459 | 6.5985 | 3.4320 | 11.5742 | 3.4218 | 16.0966 |
| ρ1 = 0.2, ε0 | Correct | 26.1800 | 20.4600 | 30.1800 | 15.0000 | 31.4000 | 12.0700 |
| | Incorrect | 0.0800 | 0.0000 | 0.1200 | 0.0000 | 0.0300 | 0.0000 |
| | ME | 3.4386 | 6.5198 | 3.4394 | 10.8455 | 3.4181 | 14.9343 |
| ρ1 = 0.2, ε1 | Correct | 25.8600 | 19.2600 | 29.9200 | 13.0900 | 31.6100 | 9.9100 |
| | Incorrect | 0.1200 | 0.0100 | 0.0800 | 0.0000 | 0.0200 | 0.0000 |
| | ME | 3.4500 | 6.6266 | 3.4472 | 11.8648 | 3.4280 | 17.3861 |
| ρ1 = 0.5, ε0 | Correct | 25.1900 | 16.1500 | 29.3600 | 9.6100 | 31.0000 | 5.7400 |
| | Incorrect | 0.2700 | 0.0500 | 0.1700 | 0.0000 | 0.0700 | 0.0000 |
| | ME | 3.5079 | 8.2365 | 3.4832 | 17.2251 | 3.4817 | 24.5215 |
| ρ1 = 0.5, ε1 | Correct | 25.3100 | 15.4600 | 29.1600 | 8.8100 | 30.5300 | 5.4500 |
| | Incorrect | 0.2200 | 0.0800 | 0.1600 | 0.0200 | 0.0800 | 0.0000 |
| | ME | 3.5032 | 8.3032 | 3.4932 | 18.5977 | 3.4925 | 24.0086 |
| ρ1 = 0.8, ε0 | Correct | 24.8300 | 11.5900 | 28.6400 | 4.2000 | 30.3300 | 7.5600 |
| | Incorrect | 0.5400 | 0.2300 | 0.3800 | 0.0200 | 0.3000 | 0.0100 |
| | ME | 3.6149 | 12.5654 | 3.5891 | 27.8645 | 3.5918 | 12.1578 |
| ρ1 = 0.8, ε1 | Correct | 24.8000 | 11.7400 | 28.5800 | 3.2500 | 30.2600 | 7.9300 |
| | Incorrect | 0.5100 | 0.2400 | 0.4300 | 0.0200 | 0.2800 | 0.0200 |
| | ME | 3.6146 | 13.1417 | 3.6000 | 28.7948 | 3.5946 | 11.8813 |
Table 5. Simulation results of variable selection when the number of components of β is 10.

| Method | | n = 60, q = 10 | | n = 90, q = 10 | | n = 120, q = 10 | |
|---|---|---|---|---|---|---|---|
| | | SCAD | LASSO | SCAD | LASSO | SCAD | LASSO |
| ρ = 0.0, ε0 | Correct | 7.6600 | 5.1800 | 7.6000 | 5.3900 | 7.4900 | 5.2300 |
| | Incorrect | 0.0100 | 0.0000 | 0.0300 | 0.0000 | 0.0200 | 0.0000 |
| | ME | 0.5453 | 0.8422 | 0.5249 | 0.5510 | 0.4791 | 0.4343 |
| ρ = 0.0, ε1 | Correct | 7.1200 | 4.5200 | 7.4100 | 4.7400 | 7.4800 | 4.6100 |
| | Incorrect | 0.0500 | 0.0000 | 0.0500 | 0.0000 | 0.0500 | 0.0000 |
| | ME | 0.6408 | 1.0402 | 0.5247 | 0.7471 | 0.5418 | 0.5791 |
| ρ1 = 0.2, ε0 | Correct | 7.3100 | 5.1000 | 7.3700 | 5.2400 | 7.4800 | 5.2300 |
| | Incorrect | 0.0300 | 0.0000 | 0.0600 | 0.0000 | 0.0100 | 0.0000 |
| | ME | 0.5960 | 0.8643 | 0.5856 | 0.5680 | 0.5250 | 0.4817 |
| ρ1 = 0.2, ε1 | Correct | 6.9100 | 4.1600 | 7.1900 | 4.6200 | 7.4500 | 4.6500 |
| | Incorrect | 0.1000 | 0.0000 | 0.0700 | 0.0000 | 0.0400 | 0.0000 |
| | ME | 0.8297 | 1.0863 | 0.5988 | 0.7330 | 0.5598 | 0.5998 |
| ρ1 = 0.5, ε0 | Correct | 6.7600 | 4.4800 | 7.0800 | 4.9400 | 7.0000 | 5.1300 |
| | Incorrect | 0.0300 | 0.0100 | 0.0000 | 0.0000 | 0.0100 | 0.0000 |
| | ME | 1.3827 | 1.0727 | 1.1509 | 0.8280 | 1.2536 | 0.8056 |
| ρ1 = 0.5, ε1 | Correct | 6.5000 | 4.7100 | 7.0400 | 4.7000 | 7.2100 | 4.6100 |
| | Incorrect | 0.0500 | 0.0100 | 0.0100 | 0.0000 | 0.0200 | 0.0000 |
| | ME | 1.5347 | 1.1668 | 1.1242 | 0.8354 | 1.2782 | 0.8670 |
| ρ1 = 0.8, ε0 | Correct | 5.1700 | 3.3200 | 5.3400 | 4.0700 | 5.3400 | 4.0700 |
| | Incorrect | 0.0000 | 0.1300 | 0.0000 | 0.0500 | 0.0400 | 0.0300 |
| | ME | 6.9527 | 3.2176 | 6.0424 | 2.9194 | 5.7429 | 2.9715 |
| ρ1 = 0.8, ε1 | Correct | 4.8400 | 3.4100 | 5.2900 | 3.9900 | 5.4200 | 3.8900 |
| | Incorrect | 0.0400 | 0.1700 | 0.0100 | 0.0500 | 0.0100 | 0.0300 |
| | ME | 7.0023 | 3.3527 | 6.0356 | 2.8764 | 5.9172 | 3.1053 |
Table 6. Standard deviations and means of estimators of the nonzero regression coefficients.

| Method | | n = 60, q = 5 | | | n = 90, q = 5 | | | n = 120, q = 5 | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | MAD | MEAN | SD | MAD | MEAN | SD | MAD | MEAN | SD |
| SCAD, ρ1 = 0.0, ε0 | β1 | 0.2455 | 3.1868 | 0.3518 | 0.2636 | 3.2012 | 0.3604 | 0.2059 | 3.1560 | 0.2301 |
| | β2 | 0.2596 | 2.1391 | 0.3367 | 0.1700 | 2.1222 | 0.3150 | 0.2003 | 2.1257 | 0.2507 |
| | β3 | 0.2545 | 1.6710 | 0.4275 | 0.3215 | 1.5604 | 0.5628 | 0.1856 | 1.6082 | 0.3618 |
| | σ | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| SCAD, ρ1 = 0.0, ε1 | β1 | 0.3278 | 3.1061 | 0.3140 | 0.2193 | 3.1143 | 0.2818 | 0.2097 | 3.1453 | 0.3095 |
| | β2 | 0.2980 | 2.0998 | 0.3197 | 0.2506 | 2.1097 | 0.3547 | 0.2600 | 2.1116 | 0.2416 |
| | β3 | 0.3092 | 1.6416 | 0.3999 | 0.2528 | 1.5930 | 0.4137 | 0.2382 | 1.6805 | 0.2548 |
| | σ1 | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| | σ2 | | 0.0602 | | | 0.0605 | | | 0.0848 | |
| SCAD, ρ1 = 0.2, ε0 | β1 | 0.3405 | 3.3175 | 0.4782 | 0.2345 | 3.2538 | 0.2998 | 0.2424 | 3.2824 | 0.4111 |
| | β2 | 0.3215 | 2.2035 | 0.3670 | 0.2928 | 2.1281 | 0.3042 | 0.1988 | 2.2157 | 0.3676 |
| | β3 | 0.2729 | 1.7262 | 0.4074 | 0.2850 | 1.6297 | 0.4549 | 0.2329 | 1.7127 | 0.3269 |
| | σ | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| SCAD, ρ1 = 0.2, ε1 | β1 | 0.3066 | 3.1977 | 0.3551 | 0.3634 | 3.2155 | 0.3451 | 0.2703 | 3.2236 | 0.3035 |
| | β2 | 0.3218 | 2.1285 | 0.3340 | 0.2726 | 2.1541 | 0.3093 | 0.2447 | 2.1187 | 0.2909 |
| | β3 | 0.2900 | 1.6381 | 0.4931 | 0.2387 | 1.5575 | 0.4864 | 0.2433 | 1.6999 | 0.3470 |
| | σ1 | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| | σ2 | | 0.0602 | | | 0.0605 | | | 0.0848 | |
| SCAD, ρ1 = 0.5, ε0 | β1 | 0.5887 | 3.8562 | 0.9985 | 0.4864 | 3.7980 | 0.5011 | 0.4398 | 3.7895 | 0.5038 |
| | β2 | 0.5312 | 2.5957 | 0.6553 | 0.4291 | 2.4581 | 0.4526 | 0.4217 | 2.5749 | 0.4995 |
| | β3 | 0.4764 | 2.0705 | 0.7608 | 0.4020 | 1.9487 | 0.4694 | 0.3718 | 1.9845 | 0.4893 |
| | σ | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| SCAD, ρ1 = 0.5, ε1 | β1 | 0.6080 | 3.7618 | 0.6917 | 0.5305 | 3.7769 | 0.5967 | 0.4413 | 3.8122 | 0.4579 |
| | β2 | 0.4864 | 2.5355 | 0.5607 | 0.4128 | 2.3989 | 0.5558 | 0.4526 | 2.5171 | 0.4376 |
| | β3 | 0.4859 | 1.8453 | 0.6894 | 0.4985 | 1.8754 | 0.5335 | 0.3339 | 2.0543 | 0.4598 |
| | σ1 | | 0.0697 | | | 0.0697 | | | 0.0771 | |
| | σ2 | | 0.0602 | | | 0.0602 | | | 0.0848 | |
| SCAD, ρ1 = 0.8, ε0 | β1 | 2.0179 | 7.0442 | 2.3790 | 1.5072 | 7.0772 | 2.3910 | 1.7351 | 7.1679 | 2.1987 |
| | β2 | 1.9750 | 4.9501 | 2.2412 | 1.3111 | 4.2876 | 1.5123 | 1.1471 | 4.6002 | 1.8087 |
| | β3 | 1.6558 | 3.7185 | 2.2376 | 1.1559 | 3.6419 | 1.5232 | 1.1865 | 3.8710 | 1.5290 |
| | σ | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| SCAD, ρ1 = 0.8, ε1 | β1 | 2.1170 | 7.2101 | 2.5888 | 1.9399 | 7.1463 | 2.4596 | 1.8336 | 7.1832 | 2.1051 |
| | β2 | 1.8740 | 4.9594 | 2.1258 | 1.4001 | 4.5172 | 1.6663 | 1.2885 | 4.6123 | 1.7462 |
| | β3 | 1.7771 | 3.6942 | 2.0929 | 1.3543 | 3.6027 | 1.5921 | 1.2216 | 3.7060 | 1.4169 |
| | σ1 | | 0.0697 | | | 0.0697 | | | 0.0771 | |
| | σ2 | | 0.0602 | | | 0.0602 | | | 0.0848 | |
Table 7. Standard deviations and means of estimators of the nonzero regression coefficients.

| Method | | n = 60, q = 5 | | | n = 90, q = 5 | | | n = 120, q = 5 | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | MAD | MEAN | SD | MAD | MEAN | SD | MAD | MEAN | SD |
| LASSO, ρ1 = 0.0, ε0 | β1 | 0.4058 | 2.4638 | 0.3234 | 0.3058 | 2.8603 | 0.3533 | 0.2666 | 2.9453 | 0.2548 |
| | β2 | 0.4386 | 1.6110 | 0.3471 | 0.2665 | 1.7970 | 0.2771 | 0.1943 | 1.9427 | 0.2309 |
| | β3 | 0.3268 | 1.2491 | 0.3228 | 0.2853 | 1.3814 | 0.3217 | 0.2121 | 1.5010 | 0.2249 |
| | σ | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| LASSO, ρ1 = 0.0, ε1 | β1 | 0.3199 | 2.4047 | 0.3250 | 0.3159 | 2.7176 | 0.2929 | 0.3114 | 2.7647 | 0.3266 |
| | β2 | 0.3726 | 1.5360 | 0.3728 | 0.3192 | 1.7868 | 0.2897 | 0.2994 | 1.8135 | 0.2906 |
| | β3 | 0.3604 | 1.1914 | 0.3440 | 0.2847 | 1.3243 | 0.3018 | 0.2531 | 1.4356 | 0.2365 |
| | σ1 | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| | σ2 | | 0.0602 | | | 0.0605 | | | 0.0848 | |
| LASSO, ρ1 = 0.2, ε0 | β1 | 0.3247 | 2.4977 | 0.3587 | 0.3301 | 2.8398 | 0.2992 | 0.2347 | 3.0282 | 0.2689 |
| | β2 | 0.3406 | 1.5942 | 0.3675 | 0.3332 | 1.7930 | 0.3526 | 0.2108 | 1.9934 | 0.2039 |
| | β3 | 0.2805 | 1.2211 | 0.3387 | 0.3017 | 1.3784 | 0.3036 | 0.2691 | 1.5115 | 0.2332 |
| | σ | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| LASSO, ρ1 = 0.2, ε1 | β1 | 0.3436 | 2.4367 | 0.3680 | 0.3421 | 2.7136 | 0.3099 | 0.3185 | 2.8873 | 0.3071 |
| | β2 | 0.3795 | 1.5498 | 0.3694 | 0.3200 | 1.7739 | 0.2954 | 0.2543 | 1.8448 | 0.2794 |
| | β3 | 0.3847 | 1.1731 | 0.3862 | 0.2699 | 1.3189 | 0.2642 | 0.2747 | 1.4865 | 0.2548 |
| | σ1 | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| | σ2 | | 0.0602 | | | 0.0605 | | | 0.0848 | |
| LASSO, ρ1 = 0.5, ε0 | β1 | 0.4018 | 2.7972 | 0.5157 | 0.3895 | 3.1733 | 0.3737 | 0.3060 | 3.3872 | 0.3614 |
| | β2 | 0.5035 | 1.7161 | 0.4279 | 0.4342 | 1.9538 | 0.4156 | 0.2779 | 2.1410 | 0.3373 |
| | β3 | 0.5346 | 1.2690 | 0.4965 | 0.3798 | 1.4383 | 0.3857 | 0.2761 | 1.6648 | 0.2872 |
| | σ | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| LASSO, ρ1 = 0.5, ε1 | β1 | 0.4514 | 2.6579 | 0.5536 | 0.4769 | 3.0253 | 0.4331 | 0.3343 | 3.2222 | 0.3860 |
| | β2 | 0.4932 | 1.6128 | 0.5226 | 0.3960 | 1.8848 | 0.3754 | 0.3472 | 2.0364 | 0.3742 |
| | β3 | 0.4378 | 1.1378 | 0.4317 | 0.3315 | 1.3910 | 0.3836 | 0.3460 | 1.6178 | 0.3657 |
| | σ1 | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| | σ2 | | 0.0602 | | | 0.0605 | | | 0.0848 | |
| LASSO, ρ1 = 0.8, ε0 | β1 | 1.1758 | 3.2846 | 1.3361 | 1.1406 | 4.3800 | 1.8818 | 0.7971 | 4.6060 | 1.8911 |
| | β2 | 1.0994 | 1.9483 | 1.3684 | 1.0162 | 2.1504 | 1.1529 | 0.8062 | 2.4770 | 1.2061 |
| | β3 | 1.0369 | 1.2004 | 1.3797 | 1.0446 | 1.3117 | 0.9479 | 0.7605 | 1.7002 | 0.8070 |
| | σ | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| LASSO, ρ1 = 0.8, ε1 | β1 | 0.9996 | 3.4737 | 1.5004 | 1.0690 | 4.2106 | 1.4199 | 1.1367 | 4.5226 | 1.7562 |
| | β2 | 1.1422 | 1.7939 | 1.4225 | 1.1369 | 2.0650 | 1.0878 | 0.8946 | 2.3502 | 1.0537 |
| | β3 | 0.9546 | 0.9301 | 1.0104 | 0.9934 | 1.2820 | 0.9865 | 0.6433 | 1.8759 | 0.9692 |
| | σ1 | | 0.0697 | | | 0.0739 | | | 0.0771 | |
| | σ2 | | 0.0602 | | | 0.0605 | | | 0.0848 | |
Table 8. Summary of variables.

| Variable Name | Description |
|---|---|
| CLand_C | Cropland used for crops |
| CLand_P | Cropland used for pasture |
| CLand_I | Cropland idled |
| Grass_P | Grassland pasture and range |
| Land_G | Forest-use land grazed |
| Land_T | Land in rural transportation facilities |
| Land_U | Land in urban areas |
| Land_I | Other idle land |
Table 9. Parameter estimates using penalty estimates via SCAD under a spatial logistic model. Columns, in order: CLand_C, CLand_P, CLand_I, Grass_P, Land_G, Land_T, Land_U, Land_I; each row lists its reported estimates in that order (cells left blank in the source are omitted).

| Year | Reported estimates |
|---|---|
| 1954 | 0.9820, 0.4230, 0.6020 |
| 1959 | 1.7000, −1.7400, 0.3050, 1.2700 |
| 1964 | 1.2800, −0.6150, 0.8920 |
| 1969 | 1.5500, 0.7150, −1.4900, −0.3850, 1.0600 |
| 1974 | 0.8540, 1.5200, 1.3900, −2.2600, −0.5960, −0.6510, 1.5200 |
| 1978 | 0.6630, 1.9000, −1.7200, −0.2490, −0.2870, 0.6140, 1.2600 |
| 1982 | 0.3020, 1.4500, 1.1200, −1.1200, −0.9840, 1.1400 |
| 1987 | 1.1500, 0.6640, −1.1100, 1.2500 |
| 1992 | 0.9810, 0.1740, 2.8200, −0.9490, −2.4100, 1.8000 |
| 1997 | 0.5680, 0.2620, 2.7500, −0.8770, −1.9300, 1.8700 |
| 2002 | 2.4200, 1.6100, −1.2100, −1.0000, 2.3900 |
| 2007 | 1.3000, 0.8520, 1.7600, −1.2000, −0.3830, 2.4700 |
| 2012 | 2.5800, −0.4680, −0.7990, 1.0200 |
Table 10. Parameter estimates using penalty estimates via LASSO under a spatial logistic model. Columns, in order: CLand_C, CLand_P, CLand_I, Grass_P, Land_G, Land_T, Land_U, Land_I; each row lists its reported estimates in that order (cells left blank in the source are omitted).

| Year | Reported estimates |
|---|---|
| 1954 | 0.7070, 0.6400, −0.4510, 0.2930, 0.1510 |
| 1959 | 0.0607, 0.8950, 0.0101, −0.3660, 0.1550, 0.0085, 0.4150 |
| 1964 | −0.0019, 0.0007, 1.3279, −0.2782, −0.0156, −0.0010, 0.3816 |
| 1969 | 0.0010, 0.1430, 0.3649, 0.0333, 0.0797, 0.0035, 0.8547 |
| 1974 | −0.2485, 0.6534, 0.2852, −0.0021, −0.0021, −0.0047, 0.7925 |
| 1978 | 0.0797, 0.2419, 0.2834, 1.3200 |
| 1982 | −0.1152, −0.0899, 0.2129, 0.0859, −0.0549, 0.0592, −0.1792, 1.9704 |
| 1987 | 0.0403, 1.0158, 0.0306, 0.0148, 0.0374, 1.4981 |
| 1992 | 0.3098, 0.5501, 0.0806, 0.4132, −0.1816, 0.8606 |
| 1997 | −0.0018, 0.1444, 1.0449, −0.0427, −0.1067, 0.6831 |
| 2002 | −0.0816, 1.2938, −0.0104, 0.0248, 0.0180, 0.6495 |
| 2007 | 0.2985, −0.0101, −0.0109, 0.4834, 0.0063, 0.3495, 0.0231, 0.6077 |
| 2012 | 0.0132, 0.0014, 0.5038, 0.0145, 0.0420, 0.0051, 1.0461 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
