Article

Debris Flow Scale Prediction Based on Correlation Analysis and Improved Support Vector Machine

1 Civil Engineering College, Chongqing Three Gorges University, Wanzhou, Chongqing 404100, China
2 Architecture and Engineering College, Sichuan Institute of Industrial Technology, Deyang 618500, China
* Author to whom correspondence should be addressed.
Water 2023, 15(23), 4161; https://doi.org/10.3390/w15234161
Submission received: 2 October 2023 / Revised: 22 November 2023 / Accepted: 27 November 2023 / Published: 30 November 2023
(This article belongs to the Special Issue Effects of Groundwater and Surface Water on the Natural Geo-Hazards)

Abstract

Debris flows pose a significant threat to human lives and property, and the debris flow scale is a crucial parameter for assessing the losses they cause. Currently, the commonly used methods for estimating debris flow runoff rely on fitting techniques, which often yield low prediction accuracy and limited data representation capabilities. To address these challenges, this study proposes a support vector machine prediction model optimized by an improved grey wolf algorithm. The model's effectiveness is validated using data from 72 debris flow events in Beichuan County. The results show a prediction accuracy of 95.9%, indicating strong predictive capability for debris flow scale. Additionally, the basin area, the basin relative relief, and the main channel length are found to be the key factors influencing debris flow scale in Beichuan County.

1. Introduction

The scale of a debris flow refers to the amount of loose solid material discharged by the debris flow from its formation through its movement. Usually, the scale of a debris flow is expressed as this volume [1,2,3]. By volume, debris flows can be divided into four categories:
  • Small debris flow: the volume of loose solid material discharged is less than 10,000 cubic meters.
  • Medium-sized debris flow: the volume is between 10,000 and 100,000 cubic meters.
  • Large debris flow: the volume is between 100,000 and 1 million cubic meters.
  • Giant debris flow: the volume is more than 1 million cubic meters.
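The four classes amount to a threshold rule on the discharged volume. A minimal Python sketch follows. The function name `classify_debris_flow_scale` is hypothetical. The class boundaries are not assigned explicitly above, so the sketch assumes that a volume lying exactly on a boundary belongs to the larger class.

```python
def classify_debris_flow_scale(volume_m3: float) -> str:
    """Map a discharged solid volume (m^3) to one of the four scale classes.

    Assumption: a volume exactly on a class boundary is assigned to the
    larger class, since boundary handling is not specified in the source.
    """
    if volume_m3 < 0:
        raise ValueError("volume must be non-negative")
    if volume_m3 < 1e4:
        return "small"
    if volume_m3 < 1e5:
        return "medium"
    if volume_m3 < 1e6:
        return "large"
    return "giant"


# Example: 35,000 m^3 of discharged material is a medium-sized debris flow.
print(classify_debris_flow_scale(35_000))  # prints medium
```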
Generally speaking, the larger the debris flow, the more serious the disaster, which may cause greater damage and losses to human society and the natural environment. Therefore, accurate assessment and prediction of the scale of debris flow is crucial to taking effective disaster prevention and mitigation measures.
De Haas et al. [4] designed a debris flow volume prediction model based on the area of the debris flow fan. However, their research did not examine the impact of lithology and climate on debris flow volume in the study area, and estimating volume from fan area alone, without such factors, may introduce errors. Ma et al. [5] used only the volume of loose material to establish a statistical relationship with debris flow volume. Although the correlation coefficient is as high as 0.928, other factors may also be significantly related to debris flow volume. Gartner et al. [6] used multiple linear regression to predict debris flow volumes at seven different sites, choosing different influencing factors at each location. However, a linear regression model developed for one study site may not be suitable for another, so estimating debris flow volumes in other areas would require numerous new studies, which is an arduous task. Chang et al. [7] identified factors that affect debris flow volume, including watershed area, landslide area, stream length, average stream and watershed slope, form factor, and geological index. The empirical formula model they constructed performed well, particularly in areas with heavy rainfall. However, it has also been shown that predicting debris flow volume is challenging and that any empirical model should be supplemented with additional approaches. Analysis of these empirical formulae shows that debris flow volume prediction models are statistically based and obtained through mathematical fitting. The choice of functional form in these models is highly subjective: different functions can produce very different fitting results, and the models' ability to represent the data is limited.
Furthermore, considering the regional characteristics of mudslides, the factors influencing the size of mudslides vary greatly across different regions. Therefore, the same formula cannot be applied uniformly across all regions without considering their unique features and conditions.
With the development of machine learning, more and more intelligent algorithms are used in disaster prevention and prediction [8]. Tang et al. [9] used an artificial neural network (ANN) to predict debris flow volume, and the model achieved an accuracy of 78.33%. They also noted that the more samples and meteorological data the data set contains, the higher the prediction accuracy, and that prediction accuracy is higher for medium- and large-scale debris flows than for small-scale ones. Lee et al. [10] also used an ANN to predict debris flow volume under extreme weather in Korea and compared the model with three regression equations. Their model achieved an R² of 0.822 and an MSE of 0.022, whereas the three regression equations had R² values of 0.703, 0.703, and 0.691, respectively; none fitted as well as the ANN model. This supports our argument that a single empirical equation does not transfer well across regions. Huang et al. [11] employed an AdaBoost algorithm integrating an extreme learning machine and particle swarm optimization to forecast debris flow volume. The model showed high statistical validity and accuracy, with a MAPE below 0.1, and validation in other study areas yielded MAPE values between 0.11 and 0.16. These examples show that machine learning algorithms can overcome the limitations of empirical formulae and have great potential in practical applications.
The support vector machine (SVM) is an algorithm based on statistical learning theory and the principle of structural risk minimization [12]. Such intelligent algorithms have better generalization capabilities and can overcome the shortcomings of traditional statistical methods. Therefore, researchers widely use them in text recognition and classification, medicine and health, vehicle traffic, failure mode recognition, and other fields [13,14,15,16]. Nonetheless, SVM performance depends strongly on how well its internal parameters are chosen [17].
Therefore, optimization algorithms are essential for tuning the SVM. Swarm intelligence (SI) algorithms are commonly used for this purpose and have demonstrated remarkable efficiency in solving complex problems within reasonable timeframes [18]. The Grey Wolf Optimization algorithm (GWO) is a relatively new SI algorithm inspired by the social hierarchy and hunting behavior of grey wolves [19]. In landslide displacement prediction, GWO has been used to identify the optimal parameters of an extreme learning machine (ELM), and the resulting GWO-ELM model showed superior generalization capability and higher prediction accuracy. In a shale gas geosteering discriminant model, GWO was used to find the globally optimal SVM parameters; compared with the original model, the GWO-SVM model improved the average crossover rate and prediction accuracy by 5.38% and 7.74%, respectively [20,21]. In addition, GWO is highly competitive with other optimization algorithms such as PSO, GSA, DE, EP, and ES [19].
These examples highlight the considerable benefits of the grey wolf algorithm in determining global parameters: it improves both the accuracy of the base algorithm and the generalization ability of the model. Consequently, this study introduces the grey wolf algorithm to determine the internal parameters of the SVM model and improve its performance.
However, in the final stage of the GWO algorithm, all grey wolves in the population converge towards the α wolf, which represents the optimal solution. This results in a loss of population diversity, local convergence, and premature convergence of the algorithm. Levy flights have been shown to locate good solutions effectively through random search, and they have been used to improve the Grey Wolf algorithm and address local and premature convergence in its later stages [22]. Inspired by this, an improved GWO algorithm (IGWO) based on Levy flights is proposed. IGWO is then used to optimize the internal parameters of the SVM algorithm, improving its performance. Finally, a debris flow volume prediction model is established based on the IGWO-SVM algorithm.
In this study, three input factors are selected using correlation analysis and fed into the IGWO-SVR algorithm to predict the volume of 72 debris flows in Beichuan County. Section 2 details the correlation analysis, the model construction process, and the comparison models. Section 3 presents the study area and the model results. Section 4 discusses the contents of this paper and future work that needs further improvement, and Section 5 presents the conclusions on the IGWO-SVR prediction model for the study area.

2. Method

2.1. Spearman Correlation Analysis

The Spearman correlation coefficient [23], also called the rank correlation coefficient, applies linear correlation analysis to the ranks of two variables to measure whether they are monotonically related. For two n-dimensional random variables $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$, the coefficient ρ is defined as the Pearson correlation coefficient between their ranks:
$$\rho = \frac{\sum_{i=1}^{n}(r_i - \bar{r})(s_i - \bar{s})}{\sqrt{\sum_{i=1}^{n}(r_i - \bar{r})^2 \sum_{i=1}^{n}(s_i - \bar{s})^2}} \tag{1}$$
where $r_i$ and $s_i$ are the ranks of $x_i$ and $y_i$, respectively, for i = 1, 2, ..., n. The value of ρ lies in [−1, 1]. When there is no monotonic correlation between the two variables, ρ is equal or close to 0; when one variable increases strictly monotonically with the other, ρ = 1, and when it decreases strictly monotonically, ρ = −1.
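Since ρ is the Pearson correlation of the ranks, it can be computed directly from this definition. The NumPy sketch below does so. The helper names are illustrative. Giving tied values the average of their ranks is a common convention assumed here, not a detail stated in this study. The coefficient is undefined when either variable is constant.

```python
import numpy as np

def average_ranks(x):
    """Rank values 1..n; tied values share the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        # 0-based positions i..j share the average 1-based rank
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation coefficient of the ranks of x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    r, s = average_ranks(x), average_ranks(y)
    dr, ds = r - r.mean(), s - s.mean()
    return float(np.sum(dr * ds) / np.sqrt(np.sum(dr ** 2) * np.sum(ds ** 2)))
```

For example, `spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])` is 1 even though the relationship is nonlinear. Only the ordering matters.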

2.2. Grey Wolf Optimization Algorithm

The Grey Wolf Optimization algorithm (GWO) was proposed by Mirjalili et al. in 2014. It imitates the social hierarchy of grey wolves, dividing the pack into four levels: α, β, δ, and ω wolves. These levels correspond to the best solution, the second-best solution, the third-best solution, and the remaining candidate solutions of the optimization problem, respectively [24]. The optimization is guided by α, β, and δ. Treating the prey position as the optimal solution, they lead the ω wolves to encircle the prey, and the optimum is approached through continuous iteration. The process can be divided into three stages: encirclement, pursuit, and attack. The specific steps are as follows:
During hunting, the encircling behavior of grey wolves is defined as follows:
$$D = \left| C \cdot X_p(t) - X(t) \right| \tag{2}$$
$$X(t+1) = X_p(t) - A \cdot D \tag{3}$$
Equations (2) and (3) give the distance between a wolf and the prey and the update of the wolf's position, respectively. Here, $X(t)$ is the position vector of the grey wolf (a potential solution), $X_p(t)$ is the position vector of the prey (the global optimal solution), $t$ is the current iteration, and $A$ and $C$ are coefficient vectors calculated as follows:
$$A = 2a \cdot r_1 - a \tag{4}$$
$$C = 2 \cdot r_2 \tag{5}$$
where $a$ is the convergence factor, which decreases linearly from 2 to 0 over the iterations, and $r_1$ and $r_2$ are random vectors with components in [0, 1].
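The encircling stage can be sketched in NumPy as follows. It covers the distance D, the position update, and the coefficient vectors A and C. The function names are hypothetical. The schedule a = 2 − 2t/T, with T the maximum number of iterations, is one assumed way of making a decrease linearly from 2 to 0.

```python
import numpy as np

def convergence_factor(t: int, max_iter: int) -> float:
    """Linear schedule: a = 2 at t = 0, decreasing to 0 at t = max_iter."""
    return 2.0 - 2.0 * t / max_iter

def coefficient_vectors(a: float, dim: int, rng: np.random.Generator):
    """A = 2a*r1 - a and C = 2*r2, with r1, r2 uniform on [0, 1)."""
    r1 = rng.random(dim)
    r2 = rng.random(dim)
    return 2.0 * a * r1 - a, 2.0 * r2

def encircle(x, x_prey, a, rng):
    """One encircling move: D = |C*X_p - X|, then X(t+1) = X_p - A*D."""
    x = np.asarray(x, dtype=float)
    x_prey = np.asarray(x_prey, dtype=float)
    A, C = coefficient_vectors(a, x.size, rng)
    D = np.abs(C * x_prey - x)
    return x_prey - A * D
```

Because A = 2a·r1 − a lies in [−a, a], setting a = 0 places every wolf exactly on the prey. This is the limiting case of the attack stage.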
In the decision space of the optimization problem, the search for the prey is guided by α, β, and δ. The other grey wolves (the ω wolves) update their positions according to these leading wolves and gradually approach the prey. The mechanism by which individual wolves track the prey's location is shown in Figure 1.
The model by which an individual grey wolf tracks the prey position is as follows:
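$$\begin{cases} D_\alpha = \left| C_1 \cdot X_\alpha - X \right| \\ D_\beta = \left| C_2 \cdot X_\beta - X \right| \\ D_\delta = \left| C_3 \cdot X_\delta - X \right| \end{cases} \tag{6}$$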
where $D_\alpha$, $D_\beta$, and $D_\delta$ are the distances between the current wolf and α, β, and δ, respectively; $X_\alpha$, $X_\beta$, and $X_\delta$ are the current positions of α, β, and δ; $C_1$, $C_2$, and $C_3$ are random coefficient vectors computed in the same way as $C$; and $X$ is the current position of the grey wolf.
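$$\begin{cases} X_1 = X_\alpha - A_1 \cdot D_\alpha \\ X_2 = X_\beta - A_2 \cdot D_\beta \\ X_3 = X_\delta - A_3 \cdot D_\delta \end{cases} \tag{7}$$
$$X(t+1) = \frac{X_1 + X_2 + X_3}{3} \tag{8}$$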
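The leader-guided update averages three encircling moves, one toward each of α, β, and δ. The sketch below is an illustrative NumPy version. Following the standard GWO, it draws an independent A_k and C_k for each leader. The function name is hypothetical.

```python
import numpy as np

def gwo_position_update(x, x_alpha, x_beta, x_delta, a, rng):
    """Move one wolf using the alpha, beta and delta leaders, then average."""
    x = np.asarray(x, dtype=float)
    candidates = []
    for leader in (x_alpha, x_beta, x_delta):
        leader = np.asarray(leader, dtype=float)
        # Independent coefficient vectors A_k and C_k for each leader
        A = 2.0 * a * rng.random(x.size) - a
        C = 2.0 * rng.random(x.size)
        D = np.abs(C * leader - x)          # D_k = |C_k * X_k - X|
        candidates.append(leader - A * D)   # X_k = X_leader - A_k * D_k
    return sum(candidates) / 3.0            # X(t+1) = (X1 + X2 + X3) / 3
```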
Attacking means catching the prey, that is, finding the optimal solution. Approaching the prey is simulated mainly by gradually decreasing the convergence factor a. As a decreases linearly from 2 to 0, A takes values in the interval [−a, a]. When |A| < 1, the wolves concentrate their attack on the prey. When |A| > 1, they disperse from the prey's position to search for other, possibly better, solutions. In the later iterations, however, a is small, so |A| < 1 for almost all wolves and the pack contracts around the current leaders. This can trap the grey wolf algorithm in a local optimum.

2.3. Levy Flight Improved Grey Wolf Optimization Algorithm

In the Grey Wolf Optimization algorithm, the position of α represents the optimal solution. In the later stages, all grey wolves in the population approach the α wolf, causing a loss of population diversity and hence local and premature convergence. To address these shortcomings, this paper applies Levy flight to the α wolf of the population to perform a global search. Levy flight is a random walk that can expand the search range; its step lengths follow a heavy-tailed stable distribution [19]. The position of the new-generation α wolf, updated by Levy flight, is calculated as follows:
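$$X_\alpha(t+1) = X_\alpha(t) + \alpha \cdot \mathrm{Levy}(\beta) \tag{9}$$
$$\mathrm{Levy}(\beta) = 0.01 \cdot \frac{u}{|v|^{1/\beta}} \cdot \left( X_\alpha(t) - X_\alpha^{\mathrm{best}} \right) \tag{10}$$
$$u \sim N(0, \sigma_u^2), \quad v \sim N(0, \sigma_v^2) \tag{11}$$
$$\sigma_u = \left\{ \frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\left[(1+\beta)/2\right]\,\beta\,2^{(\beta-1)/2}} \right\}^{1/\beta}, \quad \sigma_v = 1 \tag{12}$$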
where α is the step-size factor (distinct from the α wolf), Γ is the gamma function, and the parameter β is a random number in [0, 3].
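The α-wolf update can be sketched with Mantegna's algorithm for generating Levy-distributed steps. This is a minimal illustration under several assumptions. The operator joining X_α(t) and the step is taken to be addition, and all products are entrywise. `step_scale` stands for the step-size factor α and `x_alpha_best` for X_α^best. All names are hypothetical. The expression for σ_u is real and positive for 0 < β < 2, so the sketch restricts β to that interval and fixes it per call rather than drawing it at random.

```python
import math
import numpy as np

def mantegna_sigma_u(beta: float) -> float:
    """Standard deviation of u in Mantegna's algorithm (sigma_v = 1)."""
    if not 0.0 < beta < 2.0:
        # sigma_u is real and positive for 0 < beta < 2
        raise ValueError("this sketch requires 0 < beta < 2")
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)

def levy_alpha_update(x_alpha, x_alpha_best, step_scale, beta, rng):
    """X_alpha(t+1) = X_alpha(t) + step_scale * Levy(beta), entrywise."""
    x_alpha = np.asarray(x_alpha, dtype=float)
    x_alpha_best = np.asarray(x_alpha_best, dtype=float)
    u = rng.normal(0.0, mantegna_sigma_u(beta), size=x_alpha.shape)
    v = rng.normal(0.0, 1.0, size=x_alpha.shape)
    # Heavy-tailed step, scaled by 0.01 and by the offset from the best alpha
    levy = 0.01 * u / np.abs(v) ** (1.0 / beta) * (x_alpha - x_alpha_best)
    return x_alpha + step_scale * levy
```

As written in the step expression, the move vanishes when X_α(t) equals X_α^best. The Levy step perturbs the α wolf only while it differs from that reference position.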

2.4. Debris Flow Outburst Scale Prediction Model Based on IGWO-SVM

Support vector machines offer great advantages for small-sample, nonlinear, and high-dimensional problems, so this paper chooses the SVM as the basic prediction model. Its core parameters are the penalty factor (c) and the kernel function parameter (g). Using default parameters may lead to overfitting or underfitting, so the proposed IGWO algorithm is employed to optimize these two parameters, yielding a debris flow scale prediction model based on IGWO-SVM. The process of the IGWO-SVM debris flow scale model is shown in Figure 2.
Specific steps are as follows:
  • Step 1: Set the parameters of the IGWO and SVM algorithms and initialize the grey wolf population.
  • Step 2: Use the minimum prediction error of the SVM on the training samples as the fitness function, calculate the fitness of every individual in the population, and rank the individuals by fitness to determine the top three grey wolves (α, β, and δ).
  • Step 3: Update the position of each grey wolf according to Equations (6)–(8).
  • Step 4: Update the convergence factor a, and update the coefficient vectors A and C according to Equations (4) and (5).
  • Step 5: Apply the Levy flight strategy to the grey wolf population according to Equations (9)–(12) and adjust the positions of the grey wolves.
  • Step 6: Check whether the maximum number of iterations has been reached. If so, return the position of the α wolf as the optimal SVM parameters; otherwise, return to Step 2.
  • Step 7: Train the SVM on the training samples with the optimal penalty factor c and kernel parameter g to obtain the IGWO-SVM debris flow scale prediction model.
  • Step 8: Input the test samples into the trained IGWO-SVM model to predict the debris flow outburst scale.
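Steps 1–6 can be combined into a compact optimizer. The NumPy sketch below is illustrative and is not the authors' implementation. It minimizes a generic `fitness` function over box bounds and applies the Levy move only to the current α wolf, relative to the best position found so far. Several further choices are assumptions: the Levy move is kept only if it improves the fitness, β is fixed at 1.5, and positions are clipped to the bounds.

```python
import math
import numpy as np

def igwo_minimize(fitness, lower, upper, n_wolves=20, max_iter=100,
                  beta=1.5, step_scale=1.0, seed=0):
    """Minimize `fitness` with GWO plus a Levy-flight move on the alpha wolf.

    Illustrative assumptions: fixed beta, greedy acceptance of the Levy move,
    positions clipped to [lower, upper]. Requires n_wolves >= 3.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Random initial population inside the bounds
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    # Mantegna's sigma_u for the Levy step (sigma_v = 1)
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    best_x, best_f = None, math.inf
    for t in range(max_iter):
        # Evaluate and rank; the top three wolves are alpha, beta and delta
        scores = np.array([fitness(w) for w in wolves])
        order = np.argsort(scores)
        alpha, beta_w, delta = (wolves[order[k]].copy() for k in range(3))
        if scores[order[0]] < best_f:
            best_f, best_x = float(scores[order[0]]), alpha.copy()
        # Convergence factor decreasing linearly from 2 toward 0
        a = 2.0 - 2.0 * t / max_iter
        # Move every wolf toward the three leaders and average the moves
        new = np.empty_like(wolves)
        for i, x in enumerate(wolves):
            parts = []
            for leader in (alpha, beta_w, delta):
                A = 2.0 * a * rng.random(dim) - a
                C = 2.0 * rng.random(dim)
                parts.append(leader - A * np.abs(C * leader - x))
            new[i] = np.clip(sum(parts) / 3.0, lower, upper)
        # Levy flight on the current alpha, relative to the best so far
        u = rng.normal(0.0, sigma_u, dim)
        v = rng.normal(0.0, 1.0, dim)
        step = 0.01 * u / np.abs(v) ** (1 / beta) * (alpha - best_x)
        candidate = np.clip(alpha + step_scale * step, lower, upper)
        f_cand = fitness(candidate)
        if f_cand < scores[order[0]]:
            new[order[0]] = candidate
        if f_cand < best_f:
            best_f, best_x = float(f_cand), candidate.copy()
        wolves = new
    # Include the final population before reporting the best position
    for w in wolves:
        f = fitness(w)
        if f < best_f:
            best_f, best_x = float(f), w.copy()
    return best_x, best_f
```

For IGWO-SVM, each wolf would be a candidate pair (c, g), and `fitness` would return the SVM's error on the training samples. With scikit-learn, for instance, a candidate could be evaluated by fitting `sklearn.svm.SVR(C=c, gamma=g)` and computing its training or cross-validation error. That wiring is not shown here and is likewise an assumption.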
First, Spearman correlation analysis is used to select the input factors, eliminating poorly correlated factors to improve model accuracy. After the correlation analysis, 50 samples are randomly selected as the training set, and the remaining 22 are used as the prediction set. Next, Levy flights are used to improve the GWO algorithm, yielding the improved GWO algorithm (IGWO), which is then used to optimize the SVM to obtain the final prediction model for debris flow volume. The final model is trained on the training set and validated on the prediction set. Figure 3 illustrates the complete workflow.
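The random 50/22 split of the 72 samples can be sketched as follows. The function name and the fixed seed are assumptions added for illustration and reproducibility.

```python
import numpy as np

def random_split(n_samples: int, n_train: int, seed: int = 0):
    """Randomly partition sample indices into a training and a prediction set."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])

train_idx, test_idx = random_split(72, 50)
print(len(train_idx), len(test_idx))  # prints 50 22
```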

2.5. Back Propagation Neural Network

The Back Propagation Neural Network (BPNN) [25,26], one of the most widely applied and mature neural network models, is used in many civil engineering domains. The network comprises an input layer, a hidden layer, and an output layer. The weights between layers are obtained through forward propagation of the signal and backpropagation of the error, yielding the prediction model. BPNN serves as a comparison model in this study, allowing its performance to be compared with the SVM, GWO-SVM, and IGWO-SVM models.
In this paper, the training error goal is set to 0.001, the maximum number of training epochs to 1000, and the learning rate to 0.1.
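A minimal BP network with the stated settings (error goal 0.001, at most 1000 epochs, learning rate 0.1) can be sketched in NumPy. Several details are not given in this study and are assumptions: the number of hidden neurons, the sigmoid hidden activation, the linear output, full-batch gradient descent, and the weight initialization. Inputs and outputs are assumed to be normalized beforehand.

```python
import numpy as np

def train_bpnn(X, y, n_hidden=8, lr=0.1, max_epochs=1000, goal=1e-3, seed=0):
    """Train a one-hidden-layer BP network by full-batch gradient descent.

    Stops when the training MSE reaches `goal` or after `max_epochs` epochs.
    Returns a prediction function and the per-epoch training MSE history.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    history = []
    for _ in range(max_epochs):
        # Forward propagation: sigmoid hidden layer, linear output layer
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
        err = H @ W2 + b2 - y
        mse = float(np.mean(err ** 2))
        history.append(mse)
        if mse <= goal:
            break
        # Error backpropagation: gradients of 0.5 * MSE
        g_out = err / n
        g_W2 = H.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_H = (g_out @ W2.T) * H * (1.0 - H)
        g_W1 = X.T @ g_H
        g_b1 = g_H.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def predict(X_new):
        H_new = 1.0 / (1.0 + np.exp(-(np.asarray(X_new, dtype=float) @ W1 + b1)))
        return (H_new @ W2 + b2).ravel()

    return predict, history
```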

3. Application Research and Method Comparison

3.1. Introduction to Geology and Hydrology of Study Area

Beichuan County is predominantly hilly, with high terrain in the northwest, moderate slopes in the center, and lower mountains in the southeast. The topography results primarily from erosion and dissolution. The county lies on the southeastern margin of a tectonic-erosional middle-mountain area and at the junction of the mountains of the Longmen Mountain geological region, a range that trends northeast–southwest. Relief varies substantially: the terrain is high in the northwest and low in the southeast, with an altitude difference exceeding 1000 m. Gully slopes usually exceed 25°, and some reach 40–50° or are even steeper. Paleozoic strata of the Cambrian, Silurian, Devonian, and Carboniferous systems, together with loose Quaternary (Cenozoic) deposits, are present in the study area. Figure 4 shows the geology of Beichuan County.
Beichuan County has plentiful water resources, mainly from the Wai, Subao, Pingtong, and Duba rivers. The Waijiang River is the main river of the county and a first-order tributary of the Fuling River. It originates in the mountains of the northwestern part of the county, flows across it, leaves at the southeast corner, and eventually empties into the Fuling River. The Waijiang River has a length of 47.9 km and flows through Beichuan Qiang Autonomous County, with a watershed area of 455.80 km². It has a natural drop of 203 m and an average gradient of 4.2‰. The multi-year average discharge is 102.7 m³/s, and the average annual runoff is 3.257 billion m³. The river transports 4–5 million tons of sand annually. The study area also has ample groundwater resources.
The hydrogeological conditions in Beichuan County are intricate, being controlled by stratigraphic lithology, topography, and tectonics. Groundwater in the region is classified into pore water in loose rock, pore-fissure water in clastic rock, fissure-cave water in carbonate rock, and bedrock fissure water. The storage patterns of these groundwater types differ according to topography, lithology, tectonic position, and the spatial combination of tectonic structures. Pore water in loose rock is stored mainly in the sand, pebble, and gravel layers of the Quaternary and is distributed mainly in the floodplains and low terraces of the Waijiang, Baicao, and Qingpian rivers and their tributaries. On the floodplains and first terraces of these three rivers, the water table lies 1–6 m below the surface, and the water yield is high. Pore-fissure water in clastic rock occurs in the fine sandstone, quartz sandstone, mud shale, dark grey and grey-green fine sandstone, and muddy sandstone of the Qingping Formation of the Lower Cambrian and of the Lower Devonian. The argillaceous sandstone of these formations is relatively impermeable, the beds are typically thick to extremely thick, and fissures are poorly developed, so groundwater is limited. Fissure-cave water in carbonate rock occurs in the eastern part of the study area, with the water-bearing zone trending mainly northeast. Fissures and caves in the Middle Devonian, Carboniferous, and Permian carbonate rocks are concentrated on the two limbs and in the core of the anticline, and the area features surface dissolution depressions, sinkholes, funnels, caves, and even underground rivers. Bedrock fissure water, which includes tectonic fissure water and metamorphic fissure water, is widely distributed in the western part of the study area within the Silurian Maoxian Group (Smx) strata. Because the tectonic fissure water is located in high mountains, few springs are exposed at the surface and outflows are limited.

3.2. Parameter Selection

After the 5.12 Wenchuan earthquake (12 May 2008), the hillslopes produced large amounts of loose sediment, which provided favorable conditions for the development and occurrence of debris flows. After the torrential rain on 24 September, 214 geological disasters occurred, including 72 debris flows, whose distribution is shown in Figure 5. These events posed great challenges to the resettlement and reconstruction of residents in the disaster area. This article is based on a survey of debris flow information for 72 valleys in Beichuan County, Sichuan Province [27]. As listed in Table 1, five factors that comprehensively reflect the material and energy sources of debris flows are selected as factors influencing the debris flow outburst scale: basin area, main channel length, basin relative relief, drainage density, and shifting bed proportion.

3.3. Data Presentation and Evaluation

The training data come from Wang's thesis on debris flows in Beichuan County after the 5.12 Wenchuan earthquake. This study analyses data from 72 debris flow samples recorded after the 2008 earthquake. Each sample includes six parameters: loose source material reserves, basin area, drainage density, basin relative relief, shifting bed proportion, and main channel length. These variables are the most common factors affecting the debris flow scale, which is the output parameter. The statistics of the parameters are given in Table 2, and Figure 6 shows the frequency distribution of each parameter. In Figure 6, n is the frequency count, i.e., the number of samples in each sub-interval of the variable. The ratio of each sub-interval's count to the total count is the relative frequency. F is the cumulative frequency, obtained by successively adding the relative frequencies of the sub-intervals.
The correlation between each factor and the debris flow scale is shown in Table 3 and Figure 7. The single-factor correlation analysis shows that basin area, basin relative relief, and main channel length are highly correlated with debris flow scale. These three factors are therefore selected to construct the debris flow scale prediction model.

3.4. Forecast of the Debris Flow Scale

Fifty debris flow samples are randomly selected for model training, and the remaining samples are used to test the prediction of debris flow scale. As Figure 8 shows, the model achieves good accuracy in both training and prediction.

3.5. Model Performance Evaluation

To further evaluate the predictive performance of the method, this paper uses two traditional fitting methods (linear fitting and power function fitting) and three intelligent algorithms (SVM, GWO-SVM, and BPNN) for comparison.

3.5.1. Linear Regression Fitting

With the basin area (S), the basin relative relief (H), and the main channel length (L) as independent variables and the debris flow scale (V) as the dependent variable, linear regression gives:
$$V = 14.818 + 10.334S + 39.329H - 21.377L \tag{13}$$
With R² = 0.904, the model's accuracy alone is good. However, in this result the main channel length has a negative relationship with debris flow scale; that is, the longer the main channel, the smaller the debris flow. This contradicts the results of the correlation analysis. The collinearity diagnostics of the model are shown in Table 4. The variance inflation factors (VIF) of basin area and main channel length are 11.354 and 11.396, respectively, both greater than 10, indicating clear collinearity. This is the main reason for the negative coefficient of main channel length in the linear fit.
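The linear fit and the VIF check can both be reproduced with ordinary least squares. Here VIF_j = 1/(1 − R_j²), where R_j² comes from regressing factor j on the other factors. The 72-sample data are not listed here, so the NumPy sketch below is meant to be run on the study data or, as in its usage, on synthetic data. The function names are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns coefficients and R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the remaining columns."""
    X = np.asarray(X, dtype=float)
    values = []
    for j in range(X.shape[1]):
        _, r2 = ols_fit(np.delete(X, j, axis=1), X[:, j])
        values.append(1.0 / (1.0 - r2))
    return np.array(values)
```

With columns ordered as (S, H, L), a VIF above 10 for two columns would reproduce the collinearity pattern reported in Table 4.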

3.5.2. Power Function Fitting

For the power function fit, the exponents of the three factors are constrained to be positive so that each factor shows the correct (positive) relationship with debris flow scale. The parameters are fitted by least-squares regression, giving:
$$V = a \times S^{b} \times H^{c} \times L^{d} \tag{14}$$
In the above equation a = 29.275040548; b = 0.416174002; c = 0.382748483; and d = 0.000000029.
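One way to fit the power function is to take logarithms, which turns it into a linear model: ln V = ln a + b ln S + c ln H + d ln L. The sketch below uses this log-linear least squares as a swapped-in technique. It is not necessarily the procedure used in this study. It minimizes error in log space rather than in V itself, requires all values to be positive, and does not enforce positive exponents. The function name is hypothetical.

```python
import numpy as np

def fit_power_law(S, H, L, V):
    """Fit V = a * S**b * H**c * L**d by least squares on log-transformed data."""
    S, H, L, V = (np.asarray(v, dtype=float) for v in (S, H, L, V))
    design = np.column_stack([np.ones_like(S), np.log(S), np.log(H), np.log(L)])
    (log_a, b, c, d), *_ = np.linalg.lstsq(design, np.log(V), rcond=None)
    return float(np.exp(log_a)), float(b), float(c), float(d)
```

Positive exponents could be enforced with a bounded linear least-squares solver such as `scipy.optimize.lsq_linear`, which accepts `bounds=(lb, ub)`.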
The results show R² = 0.823, so the power function can predict the debris flow scale. However, this formula can easily mislead the analysis of influencing factors: because the exponent d is almost zero, the fit suggests that the main channel length hardly matters for debris flow scale, whereas the correlation analysis shows that the main channel length is highly correlated with it.
The two traditional fitting methods above fit the debris flow scale with good accuracy, but they can easily lead to misunderstandings about the factors that determine debris flow magnitude and make it difficult to identify the key factors. Thus, their supposed advantage over intelligent algorithms, namely that they reveal the influencing factors intuitively, does not hold here. This paper therefore uses correlation analysis to determine the main factors influencing debris flow scale and then builds a debris flow scale prediction model with a support vector machine.

3.5.3. Comparison with Other Common Optimization Algorithms

To highlight the advantages of this method, this paper selects three intelligent algorithms for comparison. The comparison results are shown in Figure 8.
Overall, all four models predict the debris flow scale well, but IGWO-SVR comes closest to the actual values. To evaluate the models more directly, they are compared in terms of accuracy and efficiency. The prediction errors of the BPNN model are more scattered, and their range is wider than those of SVR and its improved variants, indicating that the BPNN predicts less well. Compared with SVR and GWO-SVR, the IGWO-SVR errors are more tightly concentrated near zero and span a smaller range than those of the other three methods, indicating better stability.
To analyze the overall performance of the prediction models, this paper uses the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) to evaluate the four models [28].
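$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2} \tag{15}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{16}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \tag{17}$$
In Equations (15)–(17), n is the number of samples, i indexes the i-th sample, $\hat{y}_i$ is the predicted debris flow scale, $y_i$ is the actual debris flow scale, and $\bar{y}$ is the mean of the actual values. The results are shown in Table 5.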
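The three metrics translate directly into NumPy. The sketch below is illustrative and its function names are hypothetical. R² is computed against the mean of the actual values.

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root mean square error."""
    y_pred, y_true = np.asarray(y_pred, dtype=float), np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mae(y_pred, y_true):
    """Mean absolute error."""
    y_pred, y_true = np.asarray(y_pred, dtype=float), np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))

def r_squared(y_pred, y_true):
    """1 - residual sum of squares / total sum of squares about mean(y_true)."""
    y_pred, y_true = np.asarray(y_pred, dtype=float), np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```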
As Table 5 shows, all four methods can serve as prediction models for debris flow scale. The IGWO-SVR model achieves RMSE = 7.75, MAE = 7.00, and R² = 0.95, outperforming the other three models. Notably, the prediction accuracy of the BPNN is significantly lower than that of the other three. This is because BPNN training requires a large amount of data, which makes it ill-suited to debris flow scale prediction, where only a small number of samples are available.
The running time of each debris flow prediction model was measured on a Windows 10 computer with an Intel(R) Core(TM) i5-9300H CPU @ 2.40 GHz, and the results are shown in Table 6. The running time of IGWO-SVR is 1.7876 s; compared with the BP neural network, SVR, and GWO-SVR, its efficiency is higher by 204.88%, 102.66%, and 29.46%, respectively. This shows that the proposed model predicts debris flow scale efficiently and is well suited to practical engineering applications.

4. Discussion

Sobol Method for Sensitivity Analysis

The Sobol method is a quantitative global sensitivity analysis algorithm based on variance decomposition [29]. This method decomposes the total variance of the objective function into individual parameter variances and multi-parameter interaction variances. It finds wide applications in sensitivity analysis. The results of first-order sensitivity indices and global sensitivity indices are shown in Figure 9.
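The first-order and total (global) Sobol indices can be estimated by Monte Carlo sampling. The NumPy sketch below uses the Saltelli (2010) estimator for first-order indices and the Jansen (1999) estimator for total indices. These are standard choices but not necessarily those used in this study. It also assumes independent inputs uniform on [0, 1] and reads "global sensitivity index" as the total-order index. The function name is hypothetical.

```python
import numpy as np

def sobol_indices(model, n_params, n_samples=100_000, seed=0):
    """Monte Carlo estimates of first-order (S1) and total (ST) Sobol indices.

    `model` maps an (N, n_params) array of inputs in [0, 1] to N outputs.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n_samples, n_params))
    B = rng.random((n_samples, n_params))
    f_A, f_B = model(A), model(B)
    var = np.var(np.concatenate([f_A, f_B]))
    S1 = np.empty(n_params)
    ST = np.empty(n_params)
    for i in range(n_params):
        AB = A.copy()
        AB[:, i] = B[:, i]                               # A with column i from B
        f_AB = model(AB)
        S1[i] = np.mean(f_B * (f_AB - f_A)) / var         # Saltelli (2010)
        ST[i] = 0.5 * np.mean((f_A - f_AB) ** 2) / var    # Jansen (1999)
    return S1, ST
```

For the additive test function f = x1 + 2·x2 with a third, unused input, the exact indices are 0.2, 0.8, and 0. The sketch should reproduce them up to sampling error.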
As Figure 10 shows, the global sensitivity indices of “Basin area”, “Basin relative relief”, and “Main channel length” with respect to the debris flow scale all exceed 0.2. “Basin relative relief”, which directly represents the potential energy available to debris flows, has the highest first-order and global sensitivity indices, 0.370 and 0.372, respectively. “Basin area” and “Main channel length”, as the most direct indicators of debris flow material sources, are closely related to debris flow discharge. In contrast, the sensitivity indices of “Drainage density” and “Shifting bed proportion” are relatively low, with first-order indices of only 0.012–0.013 and global indices of 0.080–0.085. This is because these two factors usually influence debris flow material sources only indirectly, and thereby affect the maximum debris flow discharge. These findings agree with the results of the correlation analysis of the debris flow influencing factors. The sensitivity analysis therefore confirms that the main factors affecting debris flow scale in Beichuan County are basin area, basin relative relief, and main channel length. Of these three factors, basin relative relief shows a clear positive relationship with debris flow scale. This is attributed to the large amount of loose deposits produced by the Wenchuan earthquake of 12 May 2008, to which these factors are most closely related. Debris flows in Beichuan County can therefore be controlled by addressing three aspects: drainage area, basin relative relief, and main channel length. For example, slopes can be cut to reduce their load, height, and gradient, lowering the risk of slope deformation and failure; alternatively, retaining structures such as retaining walls and anti-slide piles can be used to support and reinforce slopes and improve their stability.
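The gap between the two indices of each factor can be read as the share of output variance that arises from interactions. Assuming the "global sensitivity index" used here is the total-order index $S_{T_i}$, and writing $V_u$ for the partial variance of a factor subset $u$ in the Sobol decomposition, the values reported above give:

```latex
\begin{aligned}
S_{T_i} - S_i &= \frac{1}{V(Y)} \sum_{u \ni i,\ |u| \ge 2} V_u
  && \text{(variance from interactions involving } X_i\text{)} \\
\text{Basin relative relief: } S_{T_i} - S_i &= 0.372 - 0.370 = 0.002 \\
\text{Drainage density, shifting bed proportion: } \frac{S_{T_i} - S_i}{S_{T_i}} &\approx \frac{0.080 - 0.012}{0.080} = 0.85
\end{aligned}
```

Any pairing of the reported values (0.012–0.013 with 0.080–0.085) gives a ratio between about 0.84 and 0.86. On this reading, basin relative relief acts almost entirely through its direct effect. Most of the small total effect of the two weaker factors appears to come from interactions with other factors, which fits the indirect influence described above.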
It should be noted that these three factors are not necessarily the dominant factors for debris flows in every region. For example, Ikeya selected trench length and slope as the dominant factors when fitting the debris flow scale in the Pacific region [30]. Debris flow scale is therefore strongly regional, and a correlation analysis should be carried out before estimating debris flow scale in a given region. Moreover, although only five influencing factors were selected for analysis here, debris flow scale is affected by many internal and external factors; lithology, the degree of rock weathering, and vegetation distribution all influence it. This paper therefore provides only one feasible method for predicting debris flow scale. To further improve prediction accuracy, more influencing factors should be included and analyzed quantitatively. Finally, the method estimates the scale of debris flows that have already occurred or are certain to occur; it does not predict whether a debris flow will occur. In practical debris flow monitoring, it is therefore recommended to use it in conjunction with debris flow occurrence prediction.
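A regional correlation screening of the kind recommended above can be sketched as follows. This is a minimal pure-Python illustration under two assumptions: Pearson correlation is used (the correlation measure behind Table 3 is not restated in this section), and significance is judged by comparing the t statistic with a user-supplied two-sided critical value. The helper names and the small data set are hypothetical and are not the Beichuan County data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.inf
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def screen_factors(factors, target, t_crit):
    """Keep factors whose |t| exceeds the two-sided critical value t_crit."""
    n = len(target)
    kept = {}
    for name, values in factors.items():
        r = pearson_r(values, target)
        if abs(t_statistic(r, n)) > t_crit:
            kept[name] = round(r, 3)
    return kept


# Hypothetical example data (not from the paper).
scale = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
factors = {
    "factor_a": [1.0, 2.1, 2.9, 4.2, 5.0, 6.1],  # strongly related to scale
    "factor_b": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],  # weakly related to scale
}
# 4.604 is the two-sided critical t value for alpha = 0.01 with 4 degrees of freedom.
print(screen_factors(factors, scale, t_crit=4.604))
```

Only `factor_a` passes the screen here. With real data, the critical value should match the sample size and significance level chosen for the region.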

5. Conclusions

This study takes 72 debris flows in Beichuan County as its research object. Correlation analysis was used to identify the main factors influencing debris flow scale, and support vector regression optimized by the improved Grey Wolf Algorithm was used to train the model and predict debris flow scale. Comparison with two traditional methods and three machine learning methods leads to the following conclusions:
  • The leading factors of the debris flow scale in Beichuan County are the basin area, the basin relative relief, and the main channel length.
  • To address shortcomings of support vector machines such as slow convergence and a tendency to fall into local optima, the improved Grey Wolf Algorithm is used to optimize them, improving both the speed and the accuracy of debris flow scale prediction.
  • Given the regional characteristics of Beichuan County, basin area, basin relative relief, and main channel length have the greatest impact on debris flows, so debris flow prevention and control programmes there should focus on these three factors.
  • The improved Grey Wolf Algorithm presented in this paper reduces the influence of subjective judgment on debris flow scale prediction, and its results are reasonably reliable, providing technical support for the scientific assessment of debris flow hazard.
  • Future work could add more samples generated by numerical simulation to improve the model's predictive accuracy. However, a larger data set also increases model run time, so balancing data set size against run time is a direction for future research.
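The conclusions rely on a grey wolf optimizer to tune the support vector regression. For orientation, the sketch below implements only the standard Grey Wolf Optimizer of Mirjalili et al. [19] in pure Python. The specific improvements in this paper's IGWO are not reproduced. In IGWO-SVR, the objective would be a validation error of the SVR as a function of its hyperparameters (for example, the penalty factor and kernel width). Here a hypothetical sphere function stands in for that objective, so the example runs without any machine learning library.

```python
import random


def gwo(objective, dim, lower, upper, n_wolves=20, n_iter=200, seed=0):
    """Standard Grey Wolf Optimizer (minimization) within box bounds [lower, upper]."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # alpha, beta and delta: the three best positions found so far

    for t in range(n_iter):
        leaders = sorted(leaders + wolves, key=objective)[:3]
        a = 2.0 - 2.0 * t / n_iter  # control parameter, decreases linearly from 2 towards 0

        new_wolves = []
        for wolf in wolves:
            position = []
            for d in range(dim):
                moves = []
                for leader in leaders:
                    coef_a = 2.0 * a * rng.random() - a  # A = 2a*r1 - a
                    coef_c = 2.0 * rng.random()          # C = 2*r2
                    dist = abs(coef_c * leader[d] - wolf[d])
                    moves.append(leader[d] - coef_a * dist)
                x = sum(moves) / len(moves)  # average of the alpha-, beta- and delta-guided moves
                position.append(min(max(x, lower), upper))  # keep within bounds
            new_wolves.append(position)
        wolves = new_wolves

    leaders = sorted(leaders + wolves, key=objective)[:3]
    return leaders[0], objective(leaders[0])


def sphere(x):
    """Hypothetical stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, value = gwo(sphere, dim=2, lower=-10.0, upper=10.0)
print(best, value)
```

Keeping the three best positions found so far as leaders, rather than re-ranking only the current pack, preserves the best solution across iterations. Swapping `sphere` for a cross-validated SVR error would turn this into a plain GWO-SVR tuner.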

Author Contributions

Conceptualization, L.L.; Methodology, Z.Z.; Validation, B.N.; Investigation, H.W.; Data curation, H.L.; Writing—original draft, Z.Z.; Writing—review & editing, D.Z.; Visualization, Y.Q.; Project administration, S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Scientific and Technological Research Program of Chongqing Municipal Education Commission (Grant No. KJZD-M202301205, KJQN202001218, KJQN202301260, KJQN202101206, KJQN202201238), the Research development and application of “big data intelligent prediction and early warning cloud service platform for geological disasters in the Three Gorges Reservoir Area” of Chongqing Municipal Education Commission (Grant No. HZ2021012), the Open fund of Chongqing Three Gorges Reservoir Bank Slope and Engineering Structure Disaster Prevention and Control Engineering Technology Research Center (Grant No. SXAPGC21ZD01), the Science and technology innovation project of Chongqing Wanzhou District Bureau of science and technology (Grant No. wzstc20230303), Nanjing 2022 “Science and Technology Three Gorges” Chongqing Wanzhou District counterpart support project of Chongqing Wanzhou District Bureau of science and technology (Grant No. 2022101S-02), and 2023 Chongqing Postgraduate Research Innovation Project (Grant No. CYS23736).

Data Availability Statement

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fuchs, S.; Kaitna, R.; Scheidl, C.; Hübl, J. The application of the risk concept to debris flow hazards. Géoméch. Tunn. 2008, 1, 120–129.
  2. He, K.; Liu, B.; Hu, X.; Zhou, R.; Xi, C.; Ma, G.; Han, M.; Li, Y.; Luo, G. Rapid characterization of landslide-debris flow chains of geologic hazards using multi-method investigation: Case study of the Tiejiangwan LDC. Rock Mech. Rock Eng. 2022, 55, 5183–5208.
  3. Trujillo-Vela, M.G.; Ramos-Cañón, A.M.; Escobar-Vargas, J.A.; Galindo-Torres, S.A. An overview of debris-flow mathematical modelling. Earth-Sci. Rev. 2022, 232, 104135.
  4. de Haas, T.; Densmore, A.L. Debris-flow volume quantile prediction from catchment morphometry. Geology 2019, 47, 791–794.
  5. Ma, C.; Hu, K.; Tian, M. Comparison of debris-flow volume and activity under different formation conditions. Nat. Hazards 2013, 67, 261–273.
  6. Gartner, J.E.; Cannon, S.H.; Santi, P.M.; DeWolfe, V.G. Empirical models to predict the volumes of debris flows generated by recently burned basins in the western U.S. Geomorphology 2008, 96, 339–354.
  7. Chang, C.-W.; Lin, P.-S.; Tsai, C.-L. Estimation of sediment volume of debris flow caused by extreme rainfall in Taiwan. Eng. Geol. 2011, 123, 83–90.
  8. Arattano, M.; Bertoldi, G.; Cavalli, M.; Comiti, F.; D’Agostino, V.; Theule, J. Comparison of methods and procedures for debris-flow volume estimation. In Engineering Geology for Society and Territory-Volume 3: River Basins, Reservoir Sedimentation and Water Resources; Springer International Publishing: Berlin/Heidelberg, Germany, 2015; pp. 115–119.
  9. Tang, W.; Ding, H.-T.; Chen, N.-S.; Ma, S.-C.; Liu, L.-H.; Wu, K.-L.; Tian, S.-F. Artificial Neural Network-based prediction of glacial debris flows in the ParlungZangbo Basin, southeastern Tibetan Plateau, China. J. Mt. Sci. 2021, 18, 51–67.
  10. Lee, D.-H.; Cheon, E.; Lim, H.-H.; Choi, S.-K.; Kim, Y.-T.; Lee, S.-R. An artificial neural network model to predict debris-flow volumes caused by extreme rainfall in the central region of South Korea. Eng. Geol. 2021, 281, 105979.
  11. Huang, F.; Huang, J.; Jiang, S.; Zhou, C. Landslide displacement prediction based on multivariate chaotic model and extreme learning machine. Eng. Geol. 2017, 218, 173–186.
  12. Xiong, K.; Adhikari, B.R.; Stamatopoulos, C.A.; Zhan, Y.; Wu, S.; Dong, Z.; Di, B. Comparison of different machine learning methods for debris flow susceptibility mapping: A case study in the Sichuan Province, China. Remote Sens. 2020, 12, 295.
  13. Zhou, X.; Wang, H.; Xu, C.; Peng, L.; Xu, F.; Lian, L.; Deng, G.; Ji, S.; Hu, M.; Zhu, H.; et al. Application of kNN and SVM to predict the prognosis of advanced schistosomiasis. Parasitol. Res. 2022, 121, 2457–2460.
  14. Pham, V.H.S.; Nguyen, V.N. Cement transport vehicle routing with a hybrid sine cosine optimization algorithm. Adv. Civ. Eng. 2023, 2023, 2728039.
  15. Shen, D.; Zhang, S.; Ming, W.; He, W.; Zhang, G.; Xie, Z. Development of a new machine vision algorithm to estimate potato’s shape and size based on support vector machine. J. Food Process Eng. 2022, 45, e13974.
  16. Zhang, J.; Yu, Y.; Zhang, L.; Chen, J.; Wang, X.; Wang, X. Dig information of nanogenerators by machine learning. Nano Energy 2023, 114, 108656.
  17. Zhou, J.; Huang, S.; Wang, M.; Qiu, Y. Performance evaluation of hybrid GA–SVM and GWO–SVM models to predict earthquake-induced liquefaction potential of soil: A multi-dataset investigation. Eng. Comput. 2021, 38, 4197–4215.
  18. Bacanin, N.; Antonijevic, M.; Bezdan, T.; Zivkovic, M.; Rashid, T.A. Wireless sensor networks localization by improved whale optimization algorithm. In Proceedings of the 2nd International Conference on Artificial Intelligence: Advances and Applications: ICAIAA 2021, Jaipur, India, 27–28 March 2021; Springer Nature: Singapore, 2022; pp. 769–783.
  19. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
  20. Liao, K.; Wu, Y.; Miao, F.; Li, L.; Xue, Y. Using a kernel extreme learning machine with grey wolf optimization to predict the displacement of step-like landslide. Bull. Eng. Geol. Environ. 2020, 79, 673–685.
  21. Mao, M.; Yang, H.; Xu, F.; Ni, P.; Wu, H. Development of geosteering system based on GWO–SVM model. Neural Comput. Appl. 2022, 34, 12479–12490.
  22. Barthelemy, P.; Bertolotti, J.; Wiersma, D.S. A Lévy flight for light. Nature 2008, 453, 495–498.
  23. Rong, F.; Dazhi, M.; Dashun, X. Research progress of statistical correlation analysis methods. Math. Model. Its Appl. 2014, 3, 1. (In Chinese)
  24. Wang, X.; Zhao, J.; Li, Q.; Fang, N.; Wang, P.; Ding, L.; Li, S. A hybrid model for prediction in asphalt pavement performance based on support vector machine and grey relation analysis. J. Adv. Transp. 2020, 2020, 7534970.
  25. Guoqiang, Y.; Maosheng, Z.; Genlong, W.; Liang, P. Comparison and application of support vector machine and BP neural network in predicting average velocity of debris flow. J. Water Resour. 2012, 43, 105–110. (In Chinese)
  26. Ferentinou, M.; Fakir, M. Integrating rock engineering systems device and artificial neural networks to predict stability conditions in an open pit. Eng. Geol. 2018, 246, 293–309.
  27. Wang, Y.J. Hazard Assessment on Rainstorm Induced Debris Flows in Beichuan County of Wenchuan Earthquake Affected Area; Chengdu University of Technology: Chengdu, China, 2009. (In Chinese)
  28. Markovic, S.; Bryan, J.L.; Ishimtsev, V.; Turakhanov, A.; Rezaee, R.; Cheremisin, A.; Kantzas, A.; Koroteev, D.; Mehta, S.A. Improved oil viscosity characterization by low-field NMR using feature engineering and supervised learning algorithms. Energy Fuels 2020, 34, 13799–13813.
  29. Chen, P.Y.; Qiao, J.S.; Peng, Z.W.; Xie, K.; Yu, H. Screening of debris flow risk factors and risk evaluation based on rank correlation. Rock Soil Mech. 2013, 34, 1409–1415. (In Chinese)
  30. Ikeya, H.; Mizuyama, T. Flow and Deposit Properties of Debris Flow; Report; Public Works Research Institute: Tsukuba, Japan, 1982; pp. 157–162.
Figure 1. Grey wolf location update.
Figure 2. IGWO-SVM Debris Flow Scale Prediction Modelling Process.
Figure 3. Debris flow outburst scale based on IGWO-SVM.
Figure 4. Geological map of Beichuan County.
Figure 5. Distribution of debris flow in Beichuan County.
Figure 6. Frequency distribution.
Figure 7. Correlation analysis.
Figure 8. Comparison of training and prediction.
Figure 9. Four methods to predict data error frequency distribution.
Figure 10. Sensitivity index.
Table 1. The Basic Data Statistics Table of 72 Debris Flows.
Samples | Loose Source Material Reserves (10³ m³) | Basin Area (km²) | Drainage Density (km⁻¹) | Basin Relative Relief (km) | Shifting Bed Proportion (%) | Main Channel Length (km)
Chaimazigou#10.042.58.241.60.482.06
Shuxuegou39.0413.92.901.40.504.03
Yingtaogou#143.6510.33.781.40.723.89
Miaobagou728.207.85.261.460.854.10
Jinlongcun79.504.57.440.980.643.35
Hualingou385.9512.24.821.360.855.88
Wangjiashangou104.501.87.671.040.861.38
Xinzhigou#1240.45105.391.560.765.39
Chenjiabaogou50.201.97.161.10.481.36
Pijialianggou2.402.47.421.140.231.78
Xishanpogou15001.620.751.120.613.32
Renjiapinggou2420.514.60.460.840.73
Mofanggou160.700.813.630.660.721.09
Miaobagou6.607.53.811.380.392.86
Piankoxianggou#24.804.64.330.860.541.99
Xinzhigou#273.2021.83.592.040.427.82
Honglingou2.855.75.351.920.373.05
Chaimazigou#214.706.83.751.80.402.55
Qinglingou109.3023.23.232.30.617.49
Baishuihegou3510.64.011.680.474.25
Piankoxianggou#3160.34163.681251.040.515.89
Subaohegou603.56.431.240.652.25
Shuligou70.600.720.430.960.611.43
Xinigou40.530.719.4310.811.36
Tianbaigou163.3218.73.161.680.765.91
Piankoxianggou0.890.915.110.720.431.36
Lijiawangou601.212.080.860.411.45
Kaipingzhigou26.20113.200.60.621.32
Yuxuegou1016.400.814.380.880.861.15
Xiatongbaogou1967.9015.73.801.220.845.97
Sibapinggou378.2421.43.471.50.767.42
Zhibeigou1998.73.251.360.602.83
Yangliucun101.639.94.641.70.584.59
Yanghuziwangou40.201.212.080.820.811.45
Zhifanggou741.19.550.750.691.05
Yingtaogou#2119.3017.64.331.660.567.62
Sunjiagou15.552.710.701.220.452.89
Chayuanlianggou542.612.041.260.413.13
Hanjiashangou67.440.815.250.820.821.22
Baiguoshugou107.300.616.500.670.730.99
Weigou33.542.29.500.740.572.09
Weigou#2106.500.322.000.520.760.66
Madiwangou3.360.729.860.550.472.09
Huangjiawangou4.132.88.3910.472.35
Jingzhuyuangou51.801.19.000.590.460.99
Jiangjiagou12.140.523.000.920.521.15
Maoershi10.801.47.570.980.471.06
Subaogou5071.110.450.580.791.15
Liujiagou120.081.87.501.040.891.35
Daokaimengou15.983.18.190.840.512.54
Qingtangwangou303.55.140.820.751.80
Huangtulianggou11424.63.291.220.648.10
Guanmenzigou14.262.85.571.120.701.56
Shupinggou334.18.881.090.463.64
Dengjiacungou900.0322.25.121.70.4411.36
Qushanzhenggou2103.68.671.20.963.12
Guzhubagou1000.1075.741.220.874.02
Wangjiayangou4852.57.8810.811.97
Chenjiabagou931.2423.14.281.20.669.88
Tudilianggou12.2146.401.030.532.56
Tudimiaogou34.08163.691.280.395.91
Guaitangou0.0811.74.051.080.214.74
Dapingdigou16.805.45.591.460.403.02
Xiatongbaogou98.5022.73.371.860.767.66
Chanzipinggou67.202.55.281.020.831.32
Shangyantaigou17.501.511.671.240.91.75
Shuangyigou93.302.89.821.30.782.75
Shilonggou50.807.35.361.20.843.91
Yangjiawangou135.5726.43.201.80.678.46
Zhaojiawangou14.662.88.181.340.822.29
Dongxigou8.9510.93.781.50.574.12
Maliuwangou97.8217.13.761.280.706.43
Table 2. Parameter statistics.
Data Type | Loose Source Material Reserves (10³ m³) | Basin Area (km²) | Drainage Density (km⁻¹) | Basin Relative Relief (km) | Shifting Bed Proportion (%) | Main Channel Length (km) | Debris Flow Scale
minimum value0.040.310.680.460.210.666.3
maximum value1966.926.444.062.30.9611.36152.83
average value195.937.1022.181.180.633.4062.85
Table 3. Correlation Analysis.
Correlation Factor | Debris Flow Scale (10³ m³)
Basin area (km²) | 0.920**
Drainage density (km⁻¹) | 0.136
Basin relative relief (km) | 0.778**
Shifting bed proportion (%) | −0.154
Main channel length (km) | 0.766**
Note: ** p < 0.01.
Table 4. Linear Regression Analysis Results.
Variable | B (Unstandardized) | Standard Error | Beta (Standardized) | t | p | VIF | R² | Adjusted R² | F
Constant | 14.818 | 7.171 | - | 2.066 | 0.044* | - | 0.904 | 0.898 | F(3,46) = 144.282, p = 0.000
Basin area | 10.334 | 1.035 | 1.538 | 9.988 | 0.000** | 11.354 | | |
Basin relative relief | 39.329 | 7.251 | 0.385 | 5.424 | 0.000** | 2.414 | | |
Main channel length | −21.377 | 3.322 | −0.993 | −6.436 | 0.000** | 11.396 | | |
Note: Dependent variable: debris flow scale; D-W: 2.149; * p < 0.05, ** p < 0.01.
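The model-level statistics in Table 4 are linked by standard formulas, which gives a quick worked consistency check. We assume that F(3, 46) means k = 3 predictors and n − k − 1 = 46 residual degrees of freedom, i.e. n = 50 fitted samples. This sample count is inferred, not stated in the table. On that assumption, the adjusted R² and the overall F statistic can be recomputed from the reported R² = 0.904:

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of determination for n samples and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """Overall F statistic of a linear regression with an intercept."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


k = 3           # predictors: basin area, basin relative relief, main channel length
n = 46 + k + 1  # inferred (assumption) from the denominator degrees of freedom F(3, 46)
r2 = 0.904      # R² reported in Table 4

print(round(adjusted_r2(r2, n, k), 3))  # → 0.898
print(round(overall_f(r2, n, k), 1))    # → 144.4
```

The recomputed adjusted R² matches the table. The recomputed F of about 144.4 is close to the reported 144.282, and the small difference is consistent with R² having been rounded to three decimals.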
Table 5. Prediction Error Analysis of Different Prediction Models.
Model | RMSE | MAE | R²
IGWO-SVR | 7.75 | 7.0 | 0.95
GWO-SVR | 7.80 | 7.6 | 0.94
SVR | 10.99 | 8.79 | 0.92
BPNN | 13.70 | 14.47 | 0.83
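The error measures in Table 5 follow standard definitions, sketched below in pure Python. We assume R² is the coefficient of determination, 1 − SS_res/SS_tot, computed on the prediction set, because the formulas are not restated in this section. The observed and predicted values are hypothetical, not the Beichuan County results.

```python
import math


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2_score(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted debris flow scales (10^3 m^3), not from the paper.
observed = [20.0, 45.0, 60.0, 90.0, 130.0]
predicted = [24.0, 41.0, 63.0, 84.0, 135.0]

print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"MAE  = {mae(observed, predicted):.2f}")
print(f"R2   = {r2_score(observed, predicted):.3f}")
```

RMSE squares each error before averaging, so it penalizes a few large misses more heavily than MAE does. Reporting both, as Table 5 does, shows whether a model's errors are evenly spread or dominated by outliers.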
Table 6. Prediction Model Consumption Time Comparison.
SVRBPNNGWO-SVRIGWO-SVRSVR
Time/s3.62265.45002.31411.7876
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, L.; Zhang, Z.; Zhao, D.; Qiang, Y.; Ni, B.; Wu, H.; Hu, S.; Lin, H. Debris Flow Scale Prediction Based on Correlation Analysis and Improved Support Vector Machine. Water 2023, 15, 4161. https://doi.org/10.3390/w15234161

AMA Style

Li L, Zhang Z, Zhao D, Qiang Y, Ni B, Wu H, Hu S, Lin H. Debris Flow Scale Prediction Based on Correlation Analysis and Improved Support Vector Machine. Water. 2023; 15(23):4161. https://doi.org/10.3390/w15234161

Chicago/Turabian Style

Li, Li, Zhongxu Zhang, Dongsheng Zhao, Yue Qiang, Bo Ni, Hengbin Wu, Shengchao Hu, and Hanjie Lin. 2023. "Debris Flow Scale Prediction Based on Correlation Analysis and Improved Support Vector Machine" Water 15, no. 23: 4161. https://doi.org/10.3390/w15234161

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
