# Optimizing the Predictive Ability of Machine Learning Methods for Landslide Susceptibility Mapping Using SMOTE for Lishui City in Zhejiang Province, China

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Study Area

^{2}, which is composed of mountainous areas (88.42%); cultivated land (5.25%); and streams, roads, and villages (collectively 6.06%). Lishui City is located within a typical subtropical monsoonal region with warm summers and cold winters. This city has a mean annual temperature of 17.8 °C, while the historical maximum and minimum temperatures are 43.2 °C and −10.7 °C, respectively. The average annual precipitation is 1568.4 mm, which generally decreases from south to north and ranges between 1350 mm and 2200 mm. Eighty percent of the annual rainfall occurs from March to September.


#### 2.2. Datasets

#### 2.2.1. Landslide Inventory

^{2}, while the largest landslide area is 40,000 m^{2}, and the average area is 8605 m^{2}. For small landslides, representing each landslide by a single point has been shown to be effective in landslide susceptibility mapping [19,39]; therefore, every landslide in this study is represented by a single point.

#### 2.2.2. Landslide-Causing Factors

#### 2.3. Methodology

#### 2.3.1. Selecting the Landslide Conditioning Factors

U = {x_{1}, x_{2}, …, x_{n}} is a finite set of objects called the universe, A is the set of condition attributes, and D is the decision attribute.

**Definition 1.**

**Definition 2.**

**Definition 3.**

X_{1}, X_{2}, …, X_{N} by a decision D, B ⊆ A generates a neighborhood relation N_{B} over U, and the lower and upper approximations of D with respect to the attributes B are defined as

_{i} in feature space B, and it can be computed using a distance function. More details on the NRS method can be found in Hu [50].
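As an illustration of how a neighborhood relation yields lower and upper approximations, the following is a minimal sketch under the usual neighborhood rough set reading: a sample belongs to the lower approximation of D when every sample in its neighborhood shares one decision class, and to the upper approximation when its neighborhood intersects at least one class. The Euclidean distance, the radius `delta`, the dependency ratio, and all function names are assumptions made for this example, not the settings used in the study.

```python
import math


def neighborhood(samples, i, attrs, delta):
    """Indices of samples within Euclidean distance delta of sample i,
    measured only on the attribute subset attrs (the feature space B)."""
    xi = samples[i]
    return [
        j for j, xj in enumerate(samples)
        if math.sqrt(sum((xi[a] - xj[a]) ** 2 for a in attrs)) <= delta
    ]


def approximations(samples, labels, attrs, delta):
    """Lower and upper approximations of the decision D with respect to attrs."""
    lower, upper = set(), set()
    for cls in set(labels):
        members = {i for i, y in enumerate(labels) if y == cls}
        for i in range(len(samples)):
            nb = set(neighborhood(samples, i, attrs, delta))
            if nb <= members:   # neighborhood lies entirely inside the class
                lower.add(i)
            if nb & members:    # neighborhood touches the class
                upper.add(i)
    return lower, upper


def dependency(samples, labels, attrs, delta):
    """Share of samples in the lower approximation (assumed significance measure)."""
    lower, _ = approximations(samples, labels, attrs, delta)
    return len(lower) / len(samples)


# Hypothetical toy data: two factors per sample, binary decision.
samples = [(0.10, 0.9), (0.15, 0.2), (0.80, 0.8), (0.85, 0.1)]
labels = [0, 0, 1, 1]
print(dependency(samples, labels, [0], 0.10))  # factor 0 separates the classes
print(dependency(samples, labels, [1], 0.15))  # factor 1 does not
```

In this toy setting, a factor whose removal leaves the dependency unchanged would be a candidate for elimination, which is the intuition behind attribute reduction with NRS.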

#### 2.3.2. Preparation of the Training and Validation Datasets

x_{0}, its K nearest neighbors are identified by the smallest Euclidean distance to the original sample in the feature space, and one of them is randomly chosen (x_{r}), where K is a user-specified hyperparameter. The new synthetic SMOTE sample is defined as
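In the standard SMOTE formulation, which is assumed here, the synthetic sample is x_{new} = x_{0} + λ (x_{r} − x_{0}), where λ is drawn uniformly from [0, 1], so the new point lies on the segment between the original sample and the chosen neighbor. The following is a minimal sketch of that interpolation; the function name, the toy coordinates, and the fixed random seed are hypothetical.

```python
import math
import random


def smote_sample(x0, minority, k, rng):
    """Create one synthetic sample from x0 and one of its k nearest
    minority-class neighbors (Euclidean distance in feature space)."""
    others = [x for x in minority if x is not x0]
    others.sort(key=lambda x: math.dist(x0, x))
    xr = rng.choice(others[:k])   # randomly chosen neighbor x_r
    lam = rng.random()            # interpolation weight in [0, 1)
    return tuple(a + lam * (b - a) for a, b in zip(x0, xr))


# Hypothetical minority (landslide) samples with two normalized factors.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
rng = random.Random(42)
print(smote_sample(minority[0], minority, k=2, rng=rng))
```

With k = 2, the distant sample (5.0, 5.0) is never selected as a neighbor of the first sample, so every synthetic point stays close to the local cluster.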

#### 2.3.3. Slope Unit Delineation based on Terrain Curvature

^{2}, while the size of the largest SU is 538,892 m^{2}; the average area is 18,780 m^{2}. These SUs are small enough to capture the spatial characteristics of landslides and large enough to reduce the computational complexity. Subsequently, the values of all landslide conditioning factors were calculated for each SU from the raster layers: each continuous factor takes the mean over all grid cells in the SU, and each categorical factor takes the mode.
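The per-SU aggregation can be sketched as a zonal summary: grid cells are grouped by the SU they fall in, and each group is reduced to a mean or a mode. This is only an illustrative sketch; the function name and the toy cell values are hypothetical, and a real workflow would read the SU identifiers and factor values from the raster layers.

```python
from collections import Counter
from statistics import mean


def aggregate_by_su(su_ids, values, categorical):
    """Reduce per-cell factor values to one value per slope unit:
    the mode for categorical factors, the mean for continuous ones."""
    groups = {}
    for su, value in zip(su_ids, values):
        groups.setdefault(su, []).append(value)
    if categorical:
        return {su: Counter(vs).most_common(1)[0][0] for su, vs in groups.items()}
    return {su: mean(vs) for su, vs in groups.items()}


# Hypothetical cells: the SU each cell belongs to, its slope, and its land use.
su_ids = [1, 1, 1, 2, 2]
slope = [10.0, 20.0, 30.0, 5.0, 15.0]
land_use = ["Forest and grass", "Forest and grass", "Roads", "Water", "Water"]

print(aggregate_by_su(su_ids, slope, categorical=False))
print(aggregate_by_su(su_ids, land_use, categorical=True))
```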

#### 2.3.4. Support Vector Machine (SVM)

#### 2.3.5. Logistic Regression (LR)

x_{i} is the i-th explanatory variable, β_{0} is a constant, β_{i} is the i-th regression coefficient, and e is the error. The probability (p) of the occurrence of y is
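In the standard logistic form, which is assumed here, the linear predictor is y = β_{0} + β_{1}x_{1} + … + β_{n}x_{n} + e and the probability is p = 1 / (1 + e^{−y}). The following minimal sketch evaluates that expression for a fitted model, where the error term is omitted because it is not known at prediction time; the coefficient values and function names are hypothetical.

```python
import math


def linear_predictor(x, beta0, betas):
    """y = beta0 + sum_i beta_i * x_i (error term omitted for prediction)."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def landslide_probability(x, beta0, betas):
    """Logistic transform of the linear predictor: p = 1 / (1 + exp(-y))."""
    y = linear_predictor(x, beta0, betas)
    return 1.0 / (1.0 + math.exp(-y))


# Hypothetical coefficients for two normalized conditioning factors.
beta0, betas = -1.0, [2.0, 0.5]
print(landslide_probability([0.8, 0.3], beta0, betas))
```

A linear predictor of zero maps to p = 0.5, and larger predictors push the probability toward 1, which is what allows the output to be read as a susceptibility score.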

#### 2.3.6. Artificial Neural Network (ANN)

#### 2.3.7. Random Forest (RF)

#### 2.3.8. Evaluation and Comparison of Landslide Susceptibility Models

p_{o} is the relative observed agreement and p_{e} is the hypothetical probability of chance agreement.
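With these two quantities, Cohen's kappa takes its standard form κ = (p_{o} − p_{e}) / (1 − p_{e}). The following is a minimal sketch for a binary landslide / non-landslide confusion matrix; the function name and the example counts are hypothetical.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa from the four cells of a binary confusion matrix."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # relative observed agreement
    # Chance agreement: product of the marginal proportions, summed over classes.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical validation counts.
print(cohen_kappa(tp=40, fp=10, fn=20, tn=30))
```

A value of 1 indicates perfect agreement between predictions and observations, while a value of 0 indicates agreement no better than chance.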

_{i} and B_{i} are the number of landslide SUs and the total number of SUs, respectively, in the i-th landslide susceptibility zone.
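As a worked illustration, class-specific accuracy can be read as the share of landslide SUs among all SUs in a zone, which is how the PLS row of Table 3 is captioned. The sketch below recomputes the SVM row from the NOL and NOS values reported in that table; the function name is hypothetical, and treating NOL as the landslide-SU count is an assumption.

```python
def class_specific_accuracy(landslide_sus, total_sus):
    """Percentage of landslide SUs among all SUs, per susceptibility zone."""
    return [100.0 * a / b for a, b in zip(landslide_sus, total_sus)]


# SVM rows of Table 3, ordered from very low to very high susceptibility.
svm_nol = [20, 15, 46, 91, 116]
svm_nos = [534060, 123388, 97210, 94599, 65218]

pls = class_specific_accuracy(svm_nol, svm_nos)
print([round(v, 4) for v in pls])  # [0.0037, 0.0122, 0.0473, 0.0962, 0.1779]
```

The values increase monotonically from the very low to the very high zone, which is the behavior expected of a well-ordered susceptibility map.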

## 3. Results

#### 3.1. Elimination of Landslide Affecting Factors

#### 3.2. Performances of the Landslide Models

#### 3.3. Development of Landslide Susceptibility Maps

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

**Figure 3.** Thematic maps of the landslide-causing factors: (**a**) slope; (**b**) elevation; (**c**) aspect; (**d**) curvature; (**e**) plan curvature; (**f**) profile curvature; (**g**) distance to faults; (**h**) distance to rivers; (**i**) distance to roads; (**j**) earthquake influence; (**k**) annual precipitation in the wet season; (**l**) annual precipitation in the dry season; (**m**) annual precipitation; (**n**) annual torrential rain days; (**o**) land use; (**p**) engineering geological type; (**q**) NDVI; (**r**) TWI; (**s**) TRI; and (**t**) TST.

**Figure 5.** Comparison of the effects of slope units (SUs) using two methods in a certain section of Lishui City: (**a**) improved method and (**b**) hydrology analysis-based method.

**Figure 6.** Explanation of the support vector machine (SVM) principles: (**a**) the kernel function and (**b**) the optimal hyperplane.

**Figure 7.** Pearson’s correlation coefficients (PCCs) of the twenty initial conditioning factors. A1: slope; A2: aspect; A3: elevation; A4: curvature; A5: profile curvature; A6: plan curvature; A7: distance to faults; A8: distance to rivers; A9: distance to roads; A10: land use; A11: NDVI; A12: engineering geological type; A13: TST; A14: TRI; A15: TWI; A16: annual torrential rain days; A17: annual precipitation in the dry season; A18: annual precipitation in the wet season; A19: annual precipitation; and A20: earthquake influence.

**Figure 9.** Fitting performances: (**a**) accuracy of each model with different training datasets; (**b**) kappa index of each model with different training datasets; (**c**) area under the curve (AUC) of each model with different training datasets; and (**d**) percentage of improvement (POI) of each model between the first and 30th training datasets.

**Figure 10.** Predictive performances: (**a**) accuracy of each model with different training datasets; (**b**) kappa index of each model with different training datasets; (**c**) AUC of each model with different training datasets; and (**d**) POI of each model between the first and 30th training datasets.

**Figure 11.** Landslide susceptibility maps: (**a**) SVM model; (**b**) logistic regression (LR) model; (**c**) artificial neural network (ANN) model; and (**d**) random forest (RF) model.

**Figure 12.** The very high susceptibility class areas of the landslide susceptibility maps: (**a**) SVM model; (**b**) LR model; (**c**) ANN model; and (**d**) RF model.

**Figure 13.** Different segmentation comparisons for several models using validation with different datasets (_pre means the previous segmentation and _1 and _2 mean the two different segmentations): (**a**) SVM, (**b**) LR, (**c**) ANN, and (**d**) RF.

| Category | Conditioning Factors | Type | Range |
|---|---|---|---|
| Predisposing factors | Slope (°) | Continuous | (0, 80.39) |
| | Elevation (km) | Continuous | (0, 1.92) |
| | Aspect | Categorical | Flat, North, West, South, Southeast, East, Northwest, Southwest, Northeast |
| | Curvature | Continuous | (−26.54, 39.78) |
| | Plan curvature | Continuous | (−17.76, 16.51) |
| | Profile curvature | Continuous | (−25.50, 19.98) |
| | Distance to faults (km) | Continuous | (0, 1.85) |
| | Distance to rivers (km) | Continuous | (0, 3.22) |
| | Distance to roads (km) | Continuous | (0, 5.95) |
| | Land use | Categorical | Roads, Structures, Water, Planting Land, Desert and bare land, Forest and grass, Artificial heap, House building |
| | Engineering geological type | Categorical | Group 1, Group 2, Group 3, Group 4, Group 5, Group 6, Group 7, Group 8, Group 9, Group 10 |
| | NDVI | Continuous | (−1, 1) |
| | TWI | Continuous | (0.74, 46.81) |
| | TRI | Continuous | (0, 67.52) |
| | TST | Continuous | (0, 100) |
| Triggering factors | Earthquake influence | Continuous | (0, 0.44) |
| | Annual precipitation in the wet season (mm) | Continuous | (1020.45, 1417.50) |
| | Annual precipitation in the dry season (mm) | Continuous | (428.81, 588.96) |
| | Annual precipitation (mm) | Continuous | (1459.57, 1990.30) |
| | Annual torrential rain days (day) | Continuous | (13.23, 20.12) |

| Index | PC1 | PC2 | PC3 | PC4 |
|---|---|---|---|---|
| Explained variance (%) | 90.434 | 5.784 | 3.780 | 0.002 |
| Cumulative explained variance (%) | 90.434 | 96.219 | 99.998 | 100.000 |
| Eigenvalues | 3.617 | 0.231 | 0.151 | 6.110 × 10^{−5} |

**Table 3.** The class-specific accuracies of different models (NOL: number of landslides; NOS: number of SUs; PLS: percentage of landslides to SUs (class-specific accuracy); PSS: percentage of SUs to all SUs in the study area).

| Model | Index | Very Low | Low | Moderate | High | Very High |
|---|---|---|---|---|---|---|
| SVM | NOL | 20 | 15 | 46 | 91 | 116 |
| | NOS | 534,060 | 123,388 | 97,210 | 94,599 | 65,218 |
| | PLS (%) | 0.0037 | 0.0122 | 0.0473 | 0.0962 | 0.1779 |
| | PSS (%) | 58.4 | 13.49 | 10.63 | 10.34 | 7.14 |
| LR | NOL | 17 | 24 | 81 | 94 | 72 |
| | NOS | 294,866 | 210,885 | 197,711 | 158,891 | 52,122 |
| | PLS (%) | 0.0058 | 0.0114 | 0.041 | 0.0592 | 0.1381 |
| | PSS (%) | 32.24 | 23.06 | 21.62 | 17.38 | 5.7 |
| ANN | NOL | 3 | 3 | 9 | 36 | 237 |
| | NOS | 607,082 | 146,554 | 72,170 | 41,402 | 47,267 |
| | PLS (%) | 0.0005 | 0.002 | 0.0125 | 0.087 | 0.5014 |
| | PSS (%) | 66.39 | 16.03 | 7.89 | 4.53 | 5.16 |
| RF | NOL | 1 | 22 | 27 | 69 | 169 |
| | NOS | 542,796 | 193,917 | 99,872 | 42,435 | 35,455 |
| | PLS (%) | 0.0002 | 0.0113 | 0.027 | 0.1626 | 0.4767 |
| | PSS (%) | 59.36 | 21.21 | 10.92 | 4.64 | 3.87 |

| Model | SVM | LR | ANN | RF |
|---|---|---|---|---|
| SVM | 1 | 0.54 | 0.57 | 0.55 |
| LR | 0.54 | 1 | 0.43 | 0.49 |
| ANN | 0.57 | 0.43 | 1 | 0.7 |
| RF | 0.55 | 0.49 | 0.7 | 1 |

| Reduced Factor | SVM | LR | ANN | RF |
|---|---|---|---|---|
| None | 0.79 | 0.77 | 0.82 | 0.77 |
| NDVI | 0.63 | 0.63 | 0.70 | 0.62 |
| Slope | 0.67 | 0.68 | 0.72 | 0.68 |
| Land use | 0.68 | 0.68 | 0.72 | 0.67 |
| PCI | 0.68 | 0.69 | 0.72 | 0.66 |
| Elevation | 0.68 | 0.68 | 0.74 | 0.65 |
| Distance to Rivers | 0.68 | 0.68 | 0.72 | 0.68 |
| Aspect | 0.69 | 0.66 | 0.71 | 0.66 |
| Distance to Faults | 0.69 | 0.67 | 0.72 | 0.68 |
| Engineering geological type | 0.69 | 0.68 | 0.74 | 0.67 |
| Distance to Roads | 0.69 | 0.69 | 0.76 | 0.66 |
| Profile curvature | 0.70 | 0.67 | 0.72 | 0.66 |
| TST | 0.71 | 0.70 | 0.75 | 0.66 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

