# Evaluating Landslide Susceptibility Using Sampling Methodology and Multiple Machine Learning Models

## Abstract

## 1. Introduction

## 2. Study Area and Data

## 3. Methodology

#### 3.1. Model Training Algorithm

#### 3.2. Over-Sampling and Under-Sampling

#### 3.3. Model Evaluation

#### 3.4. Data Processing

## 4. Results

#### 4.1. Results of Landslide Susceptibility Prediction

#### 4.2. Validation and Comparison of Models

## 5. Discussion

#### 5.1. Limitations or Shortcomings of This Study

- Only one imbalanced landslide dataset was used in this study. No additional high-quality imbalanced datasets were collected for the experiments, which may limit the generalizability of the results.
- In this study, we trained three models on the imbalanced dataset and six models on balanced datasets (three on the under-sampled data and three on the over-sampled data). Several metrics allow a direct visual comparison of the strengths and weaknesses of the models trained on these two kinds of data. However, we did not find a suitable comprehensive metric for comparing the two groups of models on an equal footing, just as different things cannot be judged by the same set of rules. This is a limitation of this study, and future research should explore more comprehensive metrics for evaluating model performance.
- Models trained on the imbalanced dataset and models trained on the under-sampled balanced dataset achieved similar values on several evaluation metrics (for example, RF and U_RF, and LightGBM and U_LGBM, obtained the same AUC in Table 5). The relationship between these two groups of models therefore warrants deeper investigation.
- All three algorithms chosen for this study are classical machine learning algorithms, selected because they are more interpretable than neural-network-based algorithms such as the Deep Residual Shrinkage Network [57] and the Squeeze-and-Excitation Network (SENet) [58]. Neural network training is also highly variable and hard to reproduce: even with the same environment and parameters, successive training runs often yield very different models. Because the central idea of this study is to use the controlled-variable method to isolate the influence of the dataset on the model, neural-network-based algorithms were not suitable here. Nevertheless, future research could explore neural network algorithms in similar studies, which would broaden the range of algorithm choices and could improve model performance.
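
To make the balancing step concrete, the over-sampling and under-sampling named in Section 3.2 can be sketched with their simplest variants: random under-sampling drops majority-class (non-landslide) samples, and random over-sampling duplicates minority-class (landslide) samples, until both classes are the same size. This is a minimal NumPy sketch under stated assumptions, not the study's implementation: the function names are hypothetical, labels are assumed to be binary with 1 = landslide and 0 = non-landslide, and the study may have used a different method (for example, SMOTE [47] synthesizes new minority samples instead of copying existing ones). A fixed random seed keeps each resampled set reproducible, which matters for a controlled-variable design.

```python
import numpy as np


def random_undersample(X, y, seed=0):
    """Hypothetical helper: randomly drop rows of the larger class until both classes match.

    Assumes binary labels y (1 = landslide, 0 = non-landslide).
    """
    rng = np.random.default_rng(seed)  # fixed seed -> same subset on every run
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n = min(len(pos), len(neg))
    keep = np.concatenate([
        rng.choice(pos, size=n, replace=False),
        rng.choice(neg, size=n, replace=False),
    ])
    rng.shuffle(keep)  # mix the two classes in the output order
    return X[keep], y[keep]


def random_oversample(X, y, seed=0):
    """Hypothetical helper: randomly duplicate rows of the smaller class until both classes match."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Draw with replacement: existing minority rows are copied, no new rows are synthesized
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    keep = np.concatenate([majority, minority, extra])
    rng.shuffle(keep)
    return X[keep], y[keep]


if __name__ == "__main__":
    # Hypothetical toy data: 12 factors, 50 landslide and 950 non-landslide samples
    rng = np.random.default_rng(42)
    X = rng.normal(size=(1000, 12))
    y = np.zeros(1000, dtype=int)
    y[:50] = 1

    Xu, yu = random_undersample(X, y)
    Xo, yo = random_oversample(X, y)
    print(np.bincount(yu), np.bincount(yo))  # [50 50] [950 950]
```

Because only the training data change while the algorithm and seed stay fixed, each resampled set can be paired with the same classifier, which is the kind of controlled comparison reflected in the nine models of Table 5.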

#### 5.2. Future Research Directions

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Merghadi, A.; Yunus, A.P.; Dou, J.; Whiteley, J.; ThaiPham, B.; Bui, D.T.; Avtar, R.; Abderrahmane, B. Machine learning methods for landslide susceptibility studies: A comparative overview of algorithm performance. Earth-Sci. Rev. **2020**, 207, 103225.
- Azarafza, M.; Azarafza, M.; Akgün, H.; Atkinson, P.M.; Derakhshani, R. Deep learning-based landslide susceptibility mapping. Sci. Rep. **2021**, 11, 24112.
- Nikoobakht, S.; Azarafza, M.; Akgün, H.; Derakhshani, R. Landslide susceptibility assessment by using convolutional neural network. Appl. Sci. **2022**, 12, 5992.
- Ahmed, N.; Firoze, A.; Rahman, R.M. Machine learning for predicting landslide risk of Rohingya refugee camp infrastructure. J. Inf. Telecommun. **2020**, 4, 175–198.
- Shirzadi, A.; Shahabi, H.; Chapi, K.; Bui, D.T.; Pham, B.T.; Shahedi, K.; Ahmad, B.B. A comparative study between popular statistical and machine learning methods for simulating volume of landslides. Catena **2017**, 157, 213–226.
- Tehrani, F.S.; Calvello, M.; Liu, Z.; Zhang, L.; Lacasse, S. Machine learning and landslide studies: Recent advances and applications. Nat. Hazards **2022**, 114, 1197–1245.
- Xu, S.; Song, Y.; Hao, X. A Comparative Study of Shallow Machine Learning Models and Deep Learning Models for Landslide Susceptibility Assessment Based on Imbalanced Data. Forests **2022**, 13, 1908.
- Wang, L.J.; Sawada, K.; Moriguchi, S. Landslide susceptibility analysis with logistic regression model based on FCM sampling strategy. Comput. Geosci. **2013**, 57, 81–92.
- Nanehkaran, Y.A.; Mao, Y.; Azarafza, M.; Kockar, M.K.; Zhu, H.H. Fuzzy-based multiple decision method for landslide susceptibility and hazard assessment: A case study of Tabriz, Iran. Geomech. Eng. **2021**, 24, 407–418.
- Azarafza, M.; Ghazifard, A.; Akgün, H.; Asghari-Kaljahi, E. Landslide susceptibility assessment of South Pars Special Zone, southwest Iran. Environ. Earth Sci. **2018**, 77, 1–29.
- Sharma, A.; Prakash, C.; Manivasagam, V. Entropy-based hybrid integration of random forest and support vector machine for landslide susceptibility analysis. Geomatics **2021**, 1, 399–416.
- Zhang, S.; Wang, Y.; Wu, G. Earthquake-Induced Landslide Susceptibility Assessment Using a Novel Model Based on Gradient Boosting Machine Learning and Class Balancing Methods. Remote Sens. **2022**, 14, 5945.
- Gupta, S.K.; Shukla, D.P. Handling data imbalance in machine learning based landslide susceptibility mapping: A case study of Mandakini River Basin, North-Western Himalayas. Landslides **2022**, 20, 933–949.
- Wang, Y.; Wu, X.; Chen, Z.; Ren, F.; Feng, L.; Du, Q. Optimizing the predictive ability of machine learning methods for landslide susceptibility mapping using SMOTE for Lishui City in Zhejiang Province, China. Int. J. Environ. Res. Public Health **2019**, 16, 368.
- Shamsudin, H.; Yusof, U.K.; Jayalakshmi, A.; Khalid, M.N.A. Combining oversampling and undersampling techniques for imbalanced classification: A comparative study using credit card fraudulent transaction dataset. In Proceedings of the 2020 IEEE 16th International Conference on Control & Automation (ICCA), Sapporo, Japan, 9–11 October 2020; pp. 803–808.
- Yap, B.W.; Abd Rani, K.; Abd Rahman, H.A.; Fong, S.; Khairudin, Z.; Abdullah, N.N. An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets. In Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013), Kuala Lumpur, Malaysia, 3–5 June 2014; Springer: Singapore, 2014; pp. 13–22.
- Junsomboon, N.; Phienthrakul, T. Combining over-sampling and under-sampling techniques for imbalance dataset. In Proceedings of the 9th International Conference on Machine Learning and Computing, Singapore, 24–26 February 2017; pp. 243–247.
- Dou, J.; Yunus, A.P.; Merghadi, A.; Shirzadi, A.; Nguyen, H.; Hussain, Y.; Avtar, R.; Chen, Y.; Pham, B.T.; Yamagishi, H. Different sampling strategies for predicting landslide susceptibilities are deemed less consequential with deep learning. Sci. Total Environ. **2020**, 720, 137320.
- Zhang, H.; Song, Y.; Xu, S.; He, Y.; Li, Z.; Yu, X.; Liang, Y.; Wu, W.; Wang, Y. Combining a class-weighted algorithm and machine learning models in landslide susceptibility mapping: A case study of Wanzhou section of the Three Gorges Reservoir, China. Comput. Geosci. **2022**, 158, 104966.
- Lee, S.; Ryu, J.H.; Won, J.S.; Park, H.J. Determination and application of the weights for landslide susceptibility mapping using an artificial neural network. Eng. Geol. **2004**, 71, 289–302.
- Armaş, I. Weights of evidence method for landslide susceptibility mapping. Prahova Subcarpathians, Romania. Nat. Hazards **2012**, 60, 937–950.
- Buckland, M.; Gey, F. The relationship between recall and precision. J. Am. Soc. Inf. Sci. **1994**, 45, 12–19.
- Ri, J.; Kim, H. G-mean based extreme learning machine for imbalance learning. Digit. Signal Process. **2020**, 98, 102637.
- Wang, M.; Qiao, J.p. Reservoir-landslide hazard assessment based on GIS: A case study in Wanzhou section of the Three Gorges Reservoir. J. Mt. Sci. **2013**, 10, 1085–1096.
- Huang, C.; Zhou, Q.; Zhou, L.; Cao, Y. Ancient landslide in Wanzhou District analysis from 2015 to 2018 based on ALOS-2 data by QPS-InSAR. Nat. Hazards **2021**, 109, 1777–1800.
- Wu, S.; Shi, L.; Wang, R.; Tan, C.; Hu, D.; Mei, Y.; Xu, R. Zonation of the landslide hazards in the forereservoir region of the Three Gorges Project on the Yangtze River. Eng. Geol. **2001**, 59, 51–58.
- Li, S.; Li, X.; Dong, Y. Study on Ji’an landslide characters and origin in Wanzhou Chongqing. Chin. J. Rock Mech. Eng. **2005**, 24, 3159–3164.
- Song, Y.; Niu, R.; Xu, S.; Ye, R.; Peng, L.; Guo, T.; Li, S.; Chen, T. Landslide susceptibility mapping based on weighted gradient boosting decision tree in Wanzhou section of the Three Gorges Reservoir Area (China). ISPRS Int. J. Geo-Inf. **2018**, 8, 4.
- Zhou, C.; Yin, K.; Cao, Y.; Ahmed, B.; Li, Y.; Catani, F.; Pourghasemi, H.R. Landslide susceptibility modeling applying machine learning methods: A case study from Longju in the Three Gorges Reservoir area, China. Comput. Geosci. **2018**, 112, 23–37.
- Pourghasemi, H.R.; Pradhan, B.; Gokceoglu, C. Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat. Hazards **2012**, 63, 965–996.
- Moayedi, H.; Mehrabi, M.; Mosallanezhad, M.; Rashid, A.S.A.; Pradhan, B. Modification of landslide susceptibility mapping using optimized PSO-ANN technique. Eng. Comput. **2019**, 35, 967–984.
- Montgomery, D.R.; Dietrich, W.E. A physically based model for the topographic control on shallow landsliding. Water Resour. Res. **1994**, 30, 1153–1171.
- Kalantar, B.; Pradhan, B.; Naghibi, S.A.; Motevalli, A.; Mansor, S. Assessment of the effects of training data selection on the landslide susceptibility mapping: A comparison between support vector machine (SVM), logistic regression (LR) and artificial neural networks (ANN). Geomat. Nat. Hazards Risk **2018**, 9, 49–69.
- Jaafari, A.; Najafi, A.; Pourghasemi, H.; Rezaeian, J.; Sattarian, A. GIS-based frequency ratio and index of entropy models for landslide susceptibility assessment in the Caspian forest, northern Iran. Int. J. Environ. Sci. Technol. **2014**, 11, 909–926.
- Hua, Y.; Wang, X.; Li, Y.; Xu, P.; Xia, W. Dynamic development of landslide susceptibility based on slope unit and deep neural networks. Landslides **2021**, 18, 281–302.
- Hong, H.; Liu, J.; Bui, D.T.; Pradhan, B.; Acharya, T.D.; Pham, B.T.; Zhu, A.X.; Chen, W.; Ahmad, B.B. Landslide susceptibility mapping using J48 Decision Tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang area (China). Catena **2018**, 163, 399–413.
- Roodposhti, M.S.; Rahimi, S.; Beglou, M.J. PROMETHEE II and fuzzy AHP: An enhanced GIS-based landslide susceptibility mapping. Nat. Hazards **2014**, 73, 77–95.
- Gong, W.; Hu, M.; Zhang, Y.; Tang, H.; Liu, D.; Song, Q. GIS-Based Landslide Susceptibility Mapping using Ensemble Methods for Fengjie County in the Three Gorges Reservoir Region, China; Springer: Berlin/Heidelberg, Germany, 2021.
- Van Westen, C.; Rengers, N.; Soeters, R. Use of geomorphological information in indirect landslide susceptibility assessment. Nat. Hazards **2003**, 30, 399–419.
- Cheng, J.; Dai, X.; Wang, Z.; Li, J.; Qu, G.; Li, W.; She, J.; Wang, Y. Landslide susceptibility assessment model construction using typical machine learning for the Three Gorges Reservoir Area in China. Remote Sens. **2022**, 14, 2257.
- Wang, D.; Zhang, Y.; Zhao, Y. LightGBM: An effective miRNA classification method in breast cancer patients. In Proceedings of the 2017 International Conference on Computational Biology and Bioinformatics, Newark, NJ, USA, 18–20 October 2017; pp. 7–11.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. **2017**, 30.
- Freedman, D.A. Bootstrapping regression models. Ann. Stat. **1981**, 9, 1218–1228.
- Wu, W.; Zhang, Q.; Singh, V.P.; Wang, G.; Zhao, J.; Shen, Z.; Sun, S. A Data-Driven Model on Google Earth Engine for Landslide Susceptibility Assessment in the Hengduan Mountains, the Qinghai–Tibetan Plateau. Remote Sens. **2022**, 14, 4662.
- Castellanos, F.J.; Valero-Mas, J.J.; Calvo-Zaragoza, J.; Rico-Juan, J.R. Oversampling imbalanced data in the string space. Pattern Recognit. Lett. **2018**, 103, 32–38.
- Liu, A.; Ghosh, J.; Martin, C. Generative Oversampling for Mining Imbalanced Datasets. In Proceedings of the DMIN, Prague, Czech Republic, 23–30 June 2007; pp. 66–72.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. **2002**, 16, 321–357.
- Fernández, A.; Garcia, S.; Herrera, F.; Chawla, N.V. SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. **2018**, 61, 863–905.
- Liu, X.Y.; Wu, J.; Zhou, Z.H. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man, Cybern. Part B **2008**, 39, 539–550.
- Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. **2006**, 27, 861–874.
- Brodersen, K.H.; Ong, C.S.; Stephan, K.E.; Buhmann, J.M. The balanced accuracy and its posterior distribution. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 3121–3124.
- Huang, J.; Ling, C.X. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. **2005**, 17, 299–310.
- Ri, J.H.; Tian, G.; Liu, Y.; Xu, W.h.; Lou, J.g. Extreme learning machine with hybrid cost function of G-mean and probability for imbalance learning. Int. J. Mach. Learn. Cybern. **2020**, 11, 2007–2020.
- Lobo, J.M.; Jiménez-Valverde, A.; Real, R. AUC: A misleading measure of the performance of predictive distribution models. Glob. Ecol. Biogeogr. **2008**, 17, 145–151.
- Townsend, J.T. Theoretical analysis of an alphabetic confusion matrix. Percept. Psychophys. **1971**, 9, 40–50.
- Visa, S.; Ramsay, B.; Ralescu, A.L.; Van Der Knaap, E. Confusion matrix-based feature selection. Maics **2011**, 710, 120–127.
- Zhao, M.; Zhong, S.; Fu, X.; Tang, B.; Pecht, M. Deep residual shrinkage networks for fault diagnosis. IEEE Trans. Ind. Inform. **2019**, 16, 4681–4690.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.

**Figure 1.** Location map of the study area. Part A shows a remote sensing image of the study area, and Part B shows its geographical location.

| Raw Data | Type | Source |
|---|---|---|
| Historical landslides | Vector | Geological survey and remote sensing images |
| DEM | Raster | ASTER GDEM (https://earthdata.nasa.gov/) |
| Landsat 8 OLI | Raster | USGS (https://earthexplorer.usgs.gov/) |
| Lithology | Vector | Local Land and Resources Bureau |
| Meteorological data | Vector | Meteorological Bureau (http://www.cma.gov.cn/) |
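
NDVI and NDWI, two of the land-cover factors, are normalized-difference indices computed from Landsat 8 OLI reflectance. The following is a minimal NumPy sketch, assuming co-registered surface-reflectance arrays for OLI Band 3 (green), Band 4 (red) and Band 5 (near-infrared). The NDWI shown is McFeeters' green/NIR form, which is an assumption: some studies use Gao's NIR/SWIR form instead, and the variant used here is not stated. The helper names are hypothetical.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN wherever the denominator is zero (e.g. no-data pixels)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(red, nir):
    # Landsat 8 OLI: red = Band 4, near-infrared = Band 5
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # Assumed McFeeters form: green = Band 3, near-infrared = Band 5
    return normalized_difference(green, nir)


if __name__ == "__main__":
    # Hypothetical 2 x 2 reflectance tiles: vegetation, bare soil, water, no-data
    green = np.array([[0.08, 0.12], [0.10, 0.0]])
    red = np.array([[0.05, 0.15], [0.06, 0.0]])
    nir = np.array([[0.40, 0.20], [0.02, 0.0]])
    print(np.round(ndvi(red, nir), 3))
    print(np.round(ndwi(green, nir), 3))
```

Vegetated pixels give high NDVI, while open water gives positive NDWI and negative NDVI, which is why the two indices carry complementary land-cover information.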

| Variables | Name | Variable Type | Classification |
|---|---|---|---|
| Y | Landslide | Binary | Landslide |
| X1 | Elevation | Continuous | Topography |
| X2 | Slope | Continuous | Topography |
| X3 | Aspect | Discrete | Topography |
| X4 | Curvature | Continuous | Topography |
| X5 | Distance to river | Continuous | Hydrology |
| X6 | NDVI | Continuous | Land cover |
| X7 | NDWI | Continuous | Land cover |
| X8 | Rainfall | Discrete | Triggering |
| X9 | Seismic intensity | Discrete | Triggering |
| X10 | Land use | Discrete | Triggering |
| X11 | TRI | Continuous | Topography |
| X12 | Lithology | Continuous | Topography |
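
Slope (X2) and TRI (X11) are derived from the DEM. A minimal NumPy sketch follows, assuming the DEM has been projected to a grid of square cells whose size is known in metres. Slope uses central differences via `numpy.gradient`, a simplification of the 3 × 3 Horn kernel that many GIS packages apply; TRI follows the Riley-style definition (square root of the summed squared elevation differences between a cell and its eight neighbours). Both choices are assumptions, since the exact algorithms used in the study are not given here, and the function names are hypothetical.

```python
import numpy as np


def slope_degrees(dem, cell_size):
    """Slope in degrees from central differences (simplified; not Horn's 3x3 kernel)."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_drow, dz_dcol)))


def terrain_ruggedness_index(dem):
    """Riley-style TRI: sqrt of summed squared differences to the 8 neighbours.

    Edges are padded by replication, so border values are only approximate.
    """
    z = np.asarray(dem, dtype=float)
    padded = np.pad(z, 1, mode="edge")
    rows, cols = z.shape
    sq_sum = np.zeros_like(z)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the centre cell itself
            neighbour = padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
            sq_sum += (neighbour - z) ** 2
    return np.sqrt(sq_sum)


if __name__ == "__main__":
    # Hypothetical 4 x 4 DEM on a 30 m grid, rising 10 m per column
    dem = np.tile(np.arange(4) * 10.0, (4, 1))
    print(slope_degrees(dem, cell_size=30.0))
    print(terrain_ruggedness_index(dem))
```

On a plane rising 10 m per 30 m cell, the sketch returns a uniform slope of arctan(10/30) ≈ 18.4° and an interior TRI of √600 ≈ 24.5 m; edge cells of the TRI are only approximate because of the replicated padding.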

| Algorithm | Advantages | Disadvantages |
|---|---|---|
| LightGBM | High accuracy and efficiency | Prone to overfitting with small datasets |
| | Handles large datasets well | Requires careful tuning of hyperparameters |
| | Can handle missing values | Less interpretable compared to simpler models |
| Random Forest | High accuracy and robustness | Can be computationally expensive with large datasets |
| | Handles high-dimensional data well | Limited interpretability compared to simpler models |
| | Can handle missing values and categorical features | May not perform well with imbalanced datasets |
| Logistic Regression | Simple and interpretable | May not perform well with nonlinear relationships |
| | Fast and efficient | |
| | Performs well with small datasets | Can be sensitive to outliers and influential observations |

| Variables | Name | Variance Inflation Factor (VIF) |
|---|---|---|
| X1 | Elevation | 2.4049 |
| X2 | Slope | 9.6671 |
| X3 | Aspect | 1.0288 |
| X4 | TRI | 9.3899 |
| X5 | Curvature | 1.0156 |
| X6 | Lithology | 1.6362 |
| X7 | River | 1.9169 |
| X8 | NDVI | 2.1679 |
| X9 | NDWI | 1.4324 |
| X10 | Rainfall | 1.4463 |
| X11 | Earthquake | 1.6907 |
| X12 | Land_use | 1.6336 |
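
The VIF of a factor is 1 / (1 − R²), where R² is the coefficient of determination obtained by regressing that factor on all the other factors; a VIF of 1 means the factor is linearly unrelated to the rest. The following minimal NumPy sketch computes it with ordinary least squares and an intercept. The function name is hypothetical, and this is not necessarily the tool used in the study. A commonly cited rule of thumb (not necessarily the threshold applied here) flags VIF > 10 as serious multicollinearity; by that yardstick, Slope (9.6671) and TRI (9.3899) sit just below the threshold, which is unsurprising because both are computed from local elevation differences in the DEM.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns plus an intercept.

    Assumes no column is constant (otherwise the total sum of squares is zero).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + remaining factors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = residual @ residual
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


if __name__ == "__main__":
    # Hypothetical factors: a and b independent, c nearly a copy of a
    rng = np.random.default_rng(0)
    a = rng.normal(size=300)
    b = rng.normal(size=300)
    c = a + 0.3 * rng.normal(size=300)
    print(variance_inflation_factors(np.column_stack([a, b, c])))
```

The independent factor stays close to 1, while the two near-duplicate factors inflate each other, which mirrors how related terrain derivatives can push each other's VIF upward.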

**Table 5.** Mean average precision (mAP), G-mean, recall, accuracy, F1 score, precision, and area under the ROC curve (AUC) of the nine models.

| Models | mAP | G-Mean | Recall | Accuracy | F1 Score | Precision | AUC |
|---|---|---|---|---|---|---|---|
| LR | 0.5 | 0 | 0 | 0.95 | 0 | 0 | 0.801 |
| U_LR | 0.765 | 0.764 | 0.815 | 0.72 | 0.227 | 0.132 | 0.824 |
| O_LR | 0.772 | 0.771 | 0.803 | 0.744 | 0.24 | 0.141 | 0.835 |
| RF | 0.5 | 0.016 | 0 | 0.95 | 0.001 | 0.158 | 0.79 |
| U_RF | 0.682 | 0.666 | 0.537 | 0.813 | 0.224 | 0.142 | 0.79 |
| O_RF | 0.811 | 0.804 | 0.705 | 0.907 | 0.432 | 0.311 | 0.932 |
| LGBM | 0.501 | 0.046 | 0.002 | 0.949 | 0.004 | 0.169 | 0.797 |
| U_LGBM | 0.729 | 0.729 | 0.71 | 0.746 | 0.219 | 0.13 | 0.797 |
| O_LGBM | 0.81 | 0.81 | 0.826 | 0.796 | 0.29 | 0.176 | 0.882 |
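
Apart from AUC, which needs the predicted probabilities, every column of Table 5 can be derived from the binary confusion matrix (landslide = positive class). The plain-Python sketch below assumes the standard definitions: recall = TP/(TP + FN), specificity = TN/(TN + FP), precision = TP/(TP + FP), F1 = the harmonic mean of precision and recall, and G-mean = √(recall × specificity). The function names and confusion-matrix counts are hypothetical. It also runs a consistency check on the published values: recovering specificity as G-mean² / recall and averaging it with recall reproduces the mAP column to within rounding for the six resampled models, which suggests that this column reports the mean of per-class recall (balanced accuracy [51]).

```python
import math


def threshold_metrics(tp, fp, tn, fn):
    """Binary metrics from confusion-matrix counts (positive class = landslide)."""
    recall = tp / (tp + fn) if tp + fn else 0.0        # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0     # 0 when nothing is predicted positive
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# (recall, G-mean, mAP) as printed in Table 5 for the six resampled models
TABLE5 = {
    "U_LR": (0.815, 0.764, 0.765),
    "O_LR": (0.803, 0.771, 0.772),
    "U_RF": (0.537, 0.666, 0.682),
    "O_RF": (0.705, 0.804, 0.811),
    "U_LGBM": (0.710, 0.729, 0.729),
    "O_LGBM": (0.826, 0.810, 0.810),
}


def implied_balanced_accuracy(recall, g_mean):
    """Recover specificity = G^2 / recall (requires recall > 0), then average it with recall."""
    specificity = g_mean ** 2 / recall
    return (recall + specificity) / 2


if __name__ == "__main__":
    # Hypothetical counts: 50 landslide and 950 non-landslide test samples
    print(threshold_metrics(tp=40, fp=190, tn=760, fn=10))
    for name, (rec, gm, reported) in TABLE5.items():
        print(name, round(implied_balanced_accuracy(rec, gm), 3), reported)
```

The all-negative case `threshold_metrics(tp=0, fp=0, tn=950, fn=50)` gives recall 0, G-mean 0, accuracy 0.95 and balanced accuracy 0.5, the same pattern as the LR baseline row. This is consistent with roughly 5% of the test samples being landslides and illustrates why accuracy alone is misleading on imbalanced data.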

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Song, Y.; Yang, D.; Wu, W.; Zhang, X.; Zhou, J.; Tian, Z.; Wang, C.; Song, Y.
Evaluating Landslide Susceptibility Using Sampling Methodology and Multiple Machine Learning Models. *ISPRS Int. J. Geo-Inf.* **2023**, *12*, 197.
https://doi.org/10.3390/ijgi12050197

**AMA Style**

Song Y, Yang D, Wu W, Zhang X, Zhou J, Tian Z, Wang C, Song Y.
Evaluating Landslide Susceptibility Using Sampling Methodology and Multiple Machine Learning Models. *ISPRS International Journal of Geo-Information*. 2023; 12(5):197.
https://doi.org/10.3390/ijgi12050197

**Chicago/Turabian Style**

Song, Yingze, Degang Yang, Weicheng Wu, Xin Zhang, Jie Zhou, Zhaoxu Tian, Chencan Wang, and Yingxu Song.
2023. "Evaluating Landslide Susceptibility Using Sampling Methodology and Multiple Machine Learning Models" *ISPRS International Journal of Geo-Information* 12, no. 5: 197.
https://doi.org/10.3390/ijgi12050197