# A Novel Hybrid Method for Landslide Susceptibility Mapping-Based GeoDetector and Machine Learning Cluster: A Case of Xiaojin County, China

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Study Area and Data

#### 2.1.1. Study Area

^{2}, and its terrain is high in the northeast and low in the southwest. The mountain ridges average about 4500 m in elevation, and Siguniang Mountain in the east rises to 6250 m. The valley area is more than 3000 m and the vertical distance is 1500–2500 m.

^{3}/s and 103 m^{3}/s, 2.9 billion m^{3} and 1.2 billion m^{3}, respectively. It is worth mentioning that the drop of these two rivers is very large, reaching 1960 m and 2340 m, respectively.

#### 2.1.2. Landslide Inventory Map

#### 2.1.3. Conditional Factors

- (i) Morphological factors
- (ii) Geological factors
- (iii) Land cover factors
- (iv) Hydrological factors
- (v) Anthropogenic factors

#### 2.2. Methods

#### 2.2.1. Conditional Factor Selection

$N_{h}$ and $N$ are the numbers of units in stratum $h$ and in the entire area, respectively; $\sigma_{h}^{2}$ and $\sigma^{2}$ are the variances of the dependent variable $Y$ in stratum $h$ and in the entire area, respectively.
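
As a minimal sketch of how these quantities combine, the standard GeoDetector factor-detector statistic is $q = 1 - \frac{\sum_{h} N_{h}\sigma_{h}^{2}}{N\sigma^{2}}$. The pure-Python function below evaluates it for a list of observations and their stratum labels. The name `q_statistic`, its arguments, and the toy data are illustrative assumptions rather than the authors' implementation, and population variance is assumed.

```python
from collections import defaultdict
from statistics import pvariance


def q_statistic(y, strata):
    """Factor-detector q = 1 - sum_h(N_h * var_h) / (N * var).

    y      : numeric values of the dependent variable Y, one per mapping unit
    strata : stratum label of each unit (a classified conditional factor)
    """
    if len(y) != len(strata):
        raise ValueError("y and strata must have the same length")
    total_var = pvariance(y)  # population variance of Y over the whole area
    if total_var == 0:
        raise ValueError("Y is constant, so q is undefined")

    # Group the Y values by stratum label.
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)

    # Within-strata sum: N_h * sigma_h^2 accumulated over all strata.
    within = sum(len(vals) * pvariance(vals) for vals in groups.values())
    return 1.0 - within / (len(y) * total_var)


# Toy data (hypothetical): 1 = landslide, 0 = no landslide.
y = [0, 0, 1, 1]
q_perfect = q_statistic(y, ["low", "low", "high", "high"])  # strata separate Y fully
q_none = q_statistic(y, ["a", "b", "a", "b"])               # strata carry no information
```

In this toy case `q_perfect` is 1 and `q_none` is 0, the two ends of the range: a factor whose strata fully separate landslide from non-landslide units explains all of the variance, and one whose strata mix them evenly explains none.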

The interaction detector assesses whether two factors $X_{1}$ and $X_{2}$ change the explanatory power for the dependent variable $Y$ when they work together, or whether their influences on $Y$ are independent. First, the q values of $X_{1}$ and $X_{2}$ for $Y$, $q(Y \mid X_{1})$ and $q(Y \mid X_{2})$, are calculated separately. Then, $X_{1}$ and $X_{2}$ are overlaid to form new strata, and the q value of $X_{1} \cap X_{2}$ for $Y$, $q(Y \mid X_{1} \cap X_{2})$, is calculated. Finally, $q(Y \mid X_{1})$, $q(Y \mid X_{2})$, and $q(Y \mid X_{1} \cap X_{2})$ are compared to judge the interaction.
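
The comparison step can be sketched as follows. The overlay is modelled by pairing the two stratum labels of each mapping unit, and the five interaction categories follow the usual GeoDetector convention, which is not spelled out in this text and is therefore an assumption here. The function names, the tolerance `tol`, and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def q_statistic(y, strata):
    """q = 1 - sum_h(N_h * var_h) / (N * var), using population variance."""
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    within = sum(len(vals) * pvariance(vals) for vals in groups.values())
    return 1.0 - within / (len(y) * pvariance(y))


def interaction(y, x1, x2, tol=1e-9):
    """Compare q(Y|X1), q(Y|X2), and q(Y|X1 ∩ X2) and name the interaction."""
    q1 = q_statistic(y, x1)
    q2 = q_statistic(y, x2)
    # Overlaying X1 and X2: each distinct (x1, x2) pair becomes a new stratum.
    q12 = q_statistic(y, list(zip(x1, x2)))

    low, high = min(q1, q2), max(q1, q2)
    if abs(q12 - (q1 + q2)) <= tol:
        kind = "independent"
    elif q12 > q1 + q2:
        kind = "nonlinear enhance"
    elif q12 > high:
        kind = "bi-variable enhance"
    elif q12 > low:
        kind = "uni-variable weaken"
    else:
        kind = "nonlinear weaken"
    return q1, q2, q12, kind


# Toy data (hypothetical): Y depends on X1 and X2 only in combination.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
xor_result = interaction([0, 1, 1, 0], x1, x2)
# Toy data (hypothetical): Y is the plain sum of X1 and X2.
additive_result = interaction([0, 1, 1, 2], x1, x2)
```

In the first toy case neither factor explains Y alone (both q values are 0) but the overlay explains it fully, so the pair is classed as "nonlinear enhance". In the second, the overlay q equals the sum of the single-factor q values, so the pair is classed as "independent".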

#### 2.2.2. Machine Learning Cluster

- (i) Artificial neural network (ANN)
- (ii) Bayesian network (BN)
- (iii) Logistic regression (LR)

_{i} value ranges from 0 to 1, where 0 means that the probability of a landslide in mapping unit i is 0, and 1 means that the probability of a landslide in mapping unit i is 1.
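
To illustrate why the logistic regression output always falls between 0 and 1, the sketch below applies the standard logistic function to a weighted sum of factor values for one mapping unit. The function name, weights, bias, and factor values are hypothetical placeholders, not fitted coefficients from this study.

```python
import math


def landslide_probability(factors, weights, bias):
    """Logistic regression output for one mapping unit.

    factors : conditional-factor values of the unit (hypothetical)
    weights : regression coefficients, one per factor (hypothetical)
    bias    : intercept term (hypothetical)
    """
    z = bias + sum(w * x for w, x in zip(weights, factors))
    # Numerically stable logistic function: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# A zero linear score maps to a probability of 0.5; large scores approach 0 or 1.
p_mid = landslide_probability([1.0, 2.0], [0.5, -0.25], 0.0)
p_high = landslide_probability([10.0], [3.0], 0.0)
p_low = landslide_probability([10.0], [-3.0], 0.0)
```

Whatever the linear score, the result stays within the interval from 0 to 1, which is what allows it to be read as a landslide probability for the mapping unit.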

- (iv) Support vector machine (SVM)

#### 2.2.3. Verification

## 3. Results

#### 3.1. Results of Conditional Factor Selection

#### 3.2. Accuracy Assessment of the Machine Learning Cluster

#### 3.3. Landslide Susceptibility Mapping

## 4. Discussion

#### 4.1. Factor-Detector and Interaction-Detector

#### 4.2. Machine Learning Cluster Performance

#### 4.3. New Contributions and Prospect of Model

## 5. Conclusions

## Appendix A. List of Acronyms

| Acronym | Description |
|---|---|
| ANN | Artificial neural network |
| AUC | Area under the ROC curve |
| BN | Bayesian network |
| DEM | Digital elevation model |
| GIS | Geographic Information System |
| HAILS | Human activity intensity of land surface |
| LR | Logistic regression |
| LSM | Landslide susceptibility mapping |
| MAE | Mean absolute error |
| ML | Machine learning |
| NDVI | Normalized Difference Vegetation Index |
| ROC | Receiver operating characteristic |
| RS | Remote Sensing |
| SAGA | System for Automated Geoscientific Analyses |
| SPI | Stream power index |
| SVM | Support vector machine |
| TPI | Topographic position index |
| TWI | Topographic wetness index |

**Figure 1.** (**a**) The location of the study area. (**b**) Remote sensing images of the study area and the location of the three landslide cases. (**c**) Landslide inventory map and elevation map of Xiaojin County. (**d**–**f**) The landslide cases in the study area: (d) flow, (e) fall, and (f) slide.

**Figure 2.** Thematic maps of conditional factors. (**a**) Elevation, (**b**) slope, (**c**) aspect, (**d**) TPI, (**e**) lithology, (**f**) seismic density, (**g**) fault (distance from the mapping unit to the Longmen Shan Fault), (**h**) land use, (**i**) NDVI, (**j**) soil erosion, (**k**) HAILS, (**l**) settlement (distance from the mapping unit to the nearest settlement).

**Figure 5.** The q-statistic indices calculated by the factor detector: a graphical representation of the relative contributions of potential factors to landslide formation (a larger q value means a greater contribution).

**Figure 6.** The interaction indices calculated by the interaction detector (a larger value means a stronger interaction), where a, b, …, s are settlement, elevation, aspect, fault, HAILS, land use, lithology, seismic density, fault, NDVI, plan curve, precipitation, river, road, slope, profile curve, soil erosion, TPI, and SPI.

**Figure 7.** (**a**) The prediction accuracy of the machine learning cluster with training data and testing data. (**b**) The receiver operating characteristic (ROC) curves of the machine learning cluster; AUC is the area under the ROC curve.

**Figure 8.** Landslide susceptibility map of the study area; the bottom right corner of the picture shows the landslide inventory map.

| Cluster | Name | Data Description |
|---|---|---|
| Morphological | Elevation | Height above sea level |
| | Slope | Slope angle |
| | Aspect | Slope aspect |
| | Profile curve | Curvature along the slope |
| | Plan curve | Curvature perpendicular to slope |
| | TPI | Topographic position index |
| Geological | Lithology | Rock feature |
| | Seismic intensity | Magnitude of the earthquake |
| | Fault | Distance to fault zone |
| Land cover | Land use | Land use |
| | NDVI | Normalized Difference Vegetation Index |
| | Soil erosion | Hydraulic erosion and freeze-thaw erosion |
| Hydrological | Precipitation | Mean annual rainfall (1980–2010) |
| | River | Distance to river |
| | SPI | Stream power index |
| | TWI | Topographic wetness index, calculated by SAGA |
| Anthropogenic | HAILS | Human activity intensity of land surface |
| | Settlement | Distance to residential area |
| | Road | Distance to road |

| Model | Class | Pixel Number | Area (%) | Number of Landslides | Landslides (%) | SCAI |
|---|---|---|---|---|---|---|
| ANN | High | 140,711 | 8.23 | 317 | 51.46 | 0.16 |
| | Moderate | 728,149 | 42.59 | 228 | 37.01 | 1.15 |
| | Low | 840,820 | 49.18 | 71 | 11.52 | 4.27 |
| BN | High | 193,365 | 11.31 | 258 | 41.88 | 0.27 |
| | Moderate | 661,817 | 38.71 | 263 | 42.69 | 0.91 |
| | Low | 854,498 | 49.98 | 95 | 15.42 | 3.24 |
| LR | High | 135,236 | 7.91 | 325 | 52.76 | 0.15 |
| | Moderate | 689,856 | 40.35 | 224 | 36.36 | 1.11 |
| | Low | 884,588 | 51.74 | 67 | 10.88 | 4.75 |
| SVM | High | 103,094 | 6.03 | 375 | 60.87 | 0.09 |
| | Moderate | 641,472 | 37.52 | 197 | 31.98 | 1.17 |
| | Low | 965,114 | 56.45 | 44 | 7.14 | 7.91 |
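
The SCAI column is not defined in this excerpt. The tabulated values are, however, consistent with the ratio of area percentage to landslide percentage, which is how the seed cell area index is commonly computed. The short check below recomputes that ratio from the table; the ratio definition is therefore an assumption inferred from the numbers, and the helper name `scai` is hypothetical.

```python
# (model, class, area %, landslides %, tabulated SCAI), copied from the table above.
rows = [
    ("ANN", "High", 8.23, 51.46, 0.16),
    ("ANN", "Moderate", 42.59, 37.01, 1.15),
    ("ANN", "Low", 49.18, 11.52, 4.27),
    ("BN", "High", 11.31, 41.88, 0.27),
    ("BN", "Moderate", 38.71, 42.69, 0.91),
    ("BN", "Low", 49.98, 15.42, 3.24),
    ("LR", "High", 7.91, 52.76, 0.15),
    ("LR", "Moderate", 40.35, 36.36, 1.11),
    ("LR", "Low", 51.74, 10.88, 4.75),
    ("SVM", "High", 6.03, 60.87, 0.09),
    ("SVM", "Moderate", 37.52, 31.98, 1.17),
    ("SVM", "Low", 56.45, 7.14, 7.91),
]


def scai(area_pct, landslide_pct):
    """Assumed definition: share of area divided by share of landslides."""
    return area_pct / landslide_pct


# Largest gap between the recomputed ratio and the tabulated SCAI value.
max_deviation = max(abs(scai(a, l) - tabulated) for _, _, a, l, tabulated in rows)

for model, cls, a, l, tabulated in rows:
    print(f"{model:3s} {cls:8s} ratio={scai(a, l):.3f} table={tabulated:.2f}")
```

Under this reading, a low SCAI in the high class together with a high SCAI in the low class means that landslides are concentrated in a small high-susceptibility area. Among the four models, SVM has both the lowest high-class and the highest low-class SCAI in the table.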

