# g.ridge: An R Package for Generalized Ridge Regression for Sparse and High-Dimensional Linear Models

## Abstract

## 1. Introduction

## 2. Ridge Regression and Generalized Ridge Regression

#### 2.1. Linear Regression

#### 2.2. Ridge Regression

#### 2.3. Generalized Ridge Regression

#### 2.4. Significance Test

## 3. R Package: g.ridge

#### 3.1. Generating Data

#### 3.2. Performing Regression

#### 3.3. Technical Remarks on Centering and Standardization

## 4. Simulations

#### 4.1. Simulation Settings

(**I**) $b=d=5$; (**II**) $b=d=10$; (**III**) $b=5$, $d=-5$; (**IV**) $b=10$, $d=-10$. Errors $\mathit{\epsilon}$ were generated independently from the normal distribution or the skew-normal distribution [37]; both distributions had a mean of zero and a standard deviation of one, and the skew-normal distribution had a slant parameter of ten (alpha = 10 in the R function “rsn(.)”). Figure 3 shows the marked difference between the two distributions. The skew-normal distribution was not examined in the simulation setting of Yang and Emura [15].
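One way to obtain skew-normal errors with mean zero and standard deviation one is to solve for the location and scale from the slant parameter. The sketch below is written in Python (standard library only) rather than R and does not call “rsn(.)”. The function names `skew_normal_params` and `sample_skew_normal` are hypothetical, and the paper may have standardized its errors in a different way.

```python
import math
import random


def skew_normal_params(alpha):
    """Return (xi, omega, delta) so that SN(xi, omega, alpha) has mean 0 and SD 1."""
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    # Var = omega^2 * (1 - 2 * delta^2 / pi) = 1
    omega = 1.0 / math.sqrt(1.0 - 2.0 * delta ** 2 / math.pi)
    # Mean = xi + omega * delta * sqrt(2 / pi) = 0
    xi = -omega * delta * math.sqrt(2.0 / math.pi)
    return xi, omega, delta


def sample_skew_normal(n, alpha, rng):
    """Draw n standardized skew-normal errors.

    Uses the representation Z = delta * |U0| + sqrt(1 - delta^2) * U1,
    where U0 and U1 are independent standard normal variables.
    """
    xi, omega, delta = skew_normal_params(alpha)
    root = math.sqrt(1.0 - delta ** 2)
    draws = []
    for _ in range(n):
        u0 = abs(rng.gauss(0.0, 1.0))
        u1 = rng.gauss(0.0, 1.0)
        draws.append(xi + omega * (delta * u0 + root * u1))
    return draws


rng = random.Random(1)  # fixed seed, chosen arbitrarily for reproducibility
errors = sample_skew_normal(20000, 10.0, rng)
sample_mean = sum(errors) / len(errors)
sample_sd = math.sqrt(sum((e - sample_mean) ** 2 for e in errors) / (len(errors) - 1))
print(sample_mean, sample_sd)  # both should be close to 0 and 1, respectively
```

With a slant parameter of ten the distribution is strongly right-skewed, which is why its histogram differs so visibly from the normal one even though the first two moments agree.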

#### 4.2. Simulation Results

(**I**)–(**IV**) and two error distributions (normal and skew-normal). In conclusion, the generalized ridge estimator in the proposed R package appears to be the recommended estimator for sparse and high-dimensional data.

## 5. Data Analysis

^{−3}/µL; C-reactive protein, mg/dL). The responses and regressors are centered and standardized before fitting the linear model, as explained in Section 2.1 and Section 3.3.
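The centering and standardization step can be sketched as follows, in Python rather than R. The helper names `center` and `standardize` are hypothetical, and the scaling convention (sample standard deviation with an $n-1$ denominator) is an assumption; Section 2.1 and Section 3.3 define the convention actually used. The toy numbers are illustrative and are not taken from the dataset.

```python
import math


def center(values):
    """Subtract the sample mean from every entry."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def standardize(values):
    """Center a column, then divide by its sample SD (n - 1 denominator, assumed)."""
    centered = center(values)
    sd = math.sqrt(sum(c * c for c in centered) / (len(values) - 1))
    return [c / sd for c in centered]


# Illustrative toy data: one response and two regressors on different scales.
y = [3.1, 4.7, 2.2, 5.9]
x_columns = [
    [120.0, 135.0, 110.0, 150.0],
    [0.4, 1.2, 0.3, 2.5],
]

y_centered = center(y)
x_standardized = [standardize(col) for col in x_columns]
```

Putting the regressors on a common scale matters for ridge-type estimators because the penalty acts on the coefficients, whose magnitudes depend on the units of each regressor.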

## 6. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. GCV Function

**Figure 1.** Examples for generating design matrices using “X.mat(.)” [15].

**Figure 2.** The R code and output for calculating the ridge estimator using “g.ridge(.)”. The red circle in the graph shows the minimum value at $\widehat{\lambda}=31.66314$.

**Figure 3.** The histogram of the normal and skew-normal distributions (the slant parameter was ten; alpha = 10 in the R function “rsn(.)”). Both distributions have a mean of 0 and standard deviation of 1.

**Figure 4.** Centered responses $\mathit{y}-({\sum}_{i=1}^{n}{y}_{i}/n)\mathbf{1}$ against the predictors $X\widehat{\mathit{\beta}}$ based on the ridge estimator and generalized ridge estimator applied to the intracerebral hemorrhage dataset. The red lines were obtained by least squares estimation.

**Figure 5.** The residuals $\mathit{y}-({\sum}_{i=1}^{n}{y}_{i}/n)\mathbf{1}-X\widehat{\mathit{\beta}}$ for the generalized ridge estimator applied to a dataset on patients with intracerebral hemorrhage.

**Table 1.** The total mean squared error (TMSE) of the three estimators: (i) the ridge by “g.ridge(.)”, (ii) the generalized (g-) ridge by “g.ridge(.)”, and (iii) the ridge by “glmnet(.)”. The TMSE is computed by a Monte Carlo average over 500 replications.

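Error Distribution | Regression Coefficients | $\mathit{p}$ | (i) ridge | (ii) g-ridge | (iii) glmnet |
---|---|---|---|---|---|
Normal | (I) $b=d=5$ | 50 | 0.463 | 0.385 | 0.306 |
Normal | (I) $b=d=5$ | 100 | 0.950 | 0.682 | 2.182 |
Normal | (I) $b=d=5$ | 150 | 1.146 | 0.658 | 1.996 |
Normal | (I) $b=d=5$ | 200 | 1.520 | 0.920 | 2.199 |
Normal | (II) $b=d=10$ | 50 | 0.855 | 0.681 | 0.545 |
Normal | (II) $b=d=10$ | 100 | 2.151 | 1.562 | 8.688 |
Normal | (II) $b=d=10$ | 150 | 3.008 | 1.482 | 7.904 |
Normal | (II) $b=d=10$ | 200 | 4.929 | 2.687 | 8.691 |
Normal | (III) $b=5$ and $d=-5$ | 50 | 0.602 | 0.539 | 0.388 |
Normal | (III) $b=5$ and $d=-5$ | 100 | 0.990 | 0.628 | 2.025 |
Normal | (III) $b=5$ and $d=-5$ | 150 | 1.219 | 0.703 | 2.132 |
Normal | (III) $b=5$ and $d=-5$ | 200 | 1.589 | 0.953 | 2.226 |
Normal | (IV) $b=10$ and $d=-10$ | 50 | 1.541 | 1.290 | 0.737 |
Normal | (IV) $b=10$ and $d=-10$ | 100 | 2.398 | 1.580 | 8.046 |
Normal | (IV) $b=10$ and $d=-10$ | 150 | 3.231 | 1.614 | 8.434 |
Normal | (IV) $b=10$ and $d=-10$ | 200 | 4.651 | 2.770 | 8.804 |
Skew-normal | (I) $b=d=5$ | 50 | 0.440 | 0.361 | 0.294 |
Skew-normal | (I) $b=d=5$ | 100 | 0.957 | 0.670 | 2.182 |
Skew-normal | (I) $b=d=5$ | 150 | 1.162 | 0.678 | 2.000 |
Skew-normal | (I) $b=d=5$ | 200 | 1.500 | 0.910 | 2.197 |
Skew-normal | (II) $b=d=10$ | 50 | 0.821 | 0.655 | 0.527 |
Skew-normal | (II) $b=d=10$ | 100 | 2.285 | 1.705 | 8.691 |
Skew-normal | (II) $b=d=10$ | 150 | 3.021 | 1.509 | 7.905 |
Skew-normal | (II) $b=d=10$ | 200 | 4.883 | 2.673 | 8.686 |
Skew-normal | (III) $b=5$ and $d=-5$ | 50 | 0.576 | 0.519 | 0.376 |
Skew-normal | (III) $b=5$ and $d=-5$ | 100 | 0.974 | 0.622 | 2.029 |
Skew-normal | (III) $b=5$ and $d=-5$ | 150 | 1.233 | 0.721 | 2.137 |
Skew-normal | (III) $b=5$ and $d=-5$ | 200 | 1.582 | 0.949 | 2.243 |
Skew-normal | (IV) $b=10$ and $d=-10$ | 50 | 1.504 | 1.273 | 0.720 |
Skew-normal | (IV) $b=10$ and $d=-10$ | 100 | 2.449 | 1.508 | 8.054 |
Skew-normal | (IV) $b=10$ and $d=-10$ | 150 | 3.224 | 1.616 | 8.453 |
Skew-normal | (IV) $b=10$ and $d=-10$ | 200 | 4.618 | 2.731 | 8.860 |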
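The Monte Carlo computation behind a TMSE entry can be sketched as below, assuming that TMSE denotes the expected squared Euclidean distance $E\|\widehat{\mathit{\beta}}-\mathit{\beta}\|^{2}$. The sketch is in Python rather than R and does not use “g.ridge(.)” or “glmnet(.)”. To stay short it assumes an orthonormal design ($X^{\top}X=I$), under which the ridge estimator is the least squares estimator divided by $1+\lambda$; this is not the design used in the simulations. The function names, the coefficient vector, and the value of $\lambda$ are hypothetical.

```python
import random


def tmse(estimates, beta):
    """Average over replications of the squared distance ||beta_hat - beta||^2."""
    total = 0.0
    for beta_hat in estimates:
        total += sum((bh - b) ** 2 for bh, b in zip(beta_hat, beta))
    return total / len(estimates)


def simulate_ridge_orthonormal(beta, lam, sigma, n_rep, rng):
    """Ridge estimates under an orthonormal design (an assumption of this sketch).

    With X'X = I the least squares estimator is beta plus N(0, sigma^2) noise,
    and the ridge estimator shrinks it by the factor 1 / (1 + lam).
    """
    estimates = []
    for _ in range(n_rep):
        ols = [b + rng.gauss(0.0, sigma) for b in beta]
        estimates.append([v / (1.0 + lam) for v in ols])
    return estimates


rng = random.Random(2024)          # arbitrary seed for reproducibility
beta = [5.0, 5.0] + [0.0] * 8      # hypothetical sparse coefficient vector
lam, sigma, n_rep = 0.1, 1.0, 500  # 500 replications, as in Table 1

mc_tmse = tmse(simulate_ridge_orthonormal(beta, lam, sigma, n_rep, rng), beta)

# Closed form under the same assumption: variance term plus squared bias term.
exact_tmse = (len(beta) * sigma ** 2 + lam ** 2 * sum(b * b for b in beta)) / (1.0 + lam) ** 2
print(mc_tmse, exact_tmse)
```

Comparing the Monte Carlo average with the closed form gives a quick check that 500 replications are enough to stabilize the estimate in this simplified setting.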

**Table 2.** Estimated regression coefficients with p-value < 0.05, sorted by p-value, from the ridge and generalized ridge fits to a dataset on patients with intracerebral hemorrhage.

Regressor | Ridge ${\widehat{\mathit{\beta}}}_{\mathit{j}}$ | Ridge SE | Ridge p-Value | Generalized Ridge ${\widehat{\mathit{\beta}}}_{\mathit{j}}$ | Generalized Ridge SE | Generalized Ridge p-Value |
---|---|---|---|---|---|---|
Lactate dehydrogenase | 0.122 | 0.047 | 0.008 | 0.145 | 0.055 | 0.008 |
Gamma-GT | 0.116 | 0.048 | 0.016 | 0.143 | 0.056 | 0.010 |
Respiratory rate | −0.120 | 0.052 | 0.020 | −0.140 | 0.059 | 0.018 |
Prothrombin time | 0.077 | 0.036 | 0.031 | 0.083 | 0.040 | 0.038 |
Blood platelet count | −0.100 | 0.049 | 0.040 | −0.114 | 0.056 | 0.044 |
C-reactive protein | - | - | >0.05 | 0.112 | 0.057 | 0.049 |
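The rounded entries of Table 2 are roughly consistent with a two-sided Wald test of ${\beta}_{j}=0$ that refers ${\widehat{\mathit{\beta}}}_{\mathit{j}}/\mathrm{SE}$ to a standard normal distribution. This is an observation from the rounded values only, not a statement of the package's method; the test actually used is the one defined in Section 2.4 and may rely on a different reference distribution. The Python function name `wald_p_value` is hypothetical.

```python
import math


def wald_p_value(beta_hat, se):
    """Two-sided p-value for H0: beta_j = 0 under a standard normal reference.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = beta_hat / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Coefficient and SE pairs copied from Table 2 (rounded to three decimals there).
rows = {
    "Gamma-GT (ridge)": (0.116, 0.048),
    "C-reactive protein (generalized ridge)": (0.112, 0.057),
}

p_values = {name: wald_p_value(b, s) for name, (b, s) in rows.items()}
for name, p in p_values.items():
    print(name, round(p, 3))
```

Because the table reports coefficients and standard errors to three decimals, small discrepancies from the printed p-values are expected.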

