Article

Inversion of Nutrient Concentrations Using Machine Learning and Influencing Factors in Minjiang River

1
College of Earth Science, Chengdu University of Technology, Chengdu 610059, China
2
School of Surveying and Geo-Informatics, Sichuan Water Conservancy Vocational College, Chengdu 611200, China
3
Sichuan Water Conservancy Innovation and Development Research Institute, Chengdu 611200, China
4
School of Physics, University of Science Malaysia, Penang 11800, Malaysia
5
China Building Materials Southwest Survey and Design Co., Ltd., Chengdu 610052, China
*
Author to whom correspondence should be addressed.
Water 2023, 15(7), 1398; https://doi.org/10.3390/w15071398
Submission received: 13 March 2023 / Revised: 25 March 2023 / Accepted: 29 March 2023 / Published: 4 April 2023

Abstract
Remote sensing is widely used for lake-water-quality monitoring, but the inversion of non-optical parameters such as total nitrogen (TN) and total phosphorus (TP) in rivers remains difficult. Multispectral imagery with high spatial and temporal resolution, combined with machine learning techniques, is an effective way to address this difficulty. Three machine learning methods, support vector regression (SVR), neural network (NN) and random forest (RF), were used to invert TN and TP from measured water-quality data and Sentinel-2 remote-sensing images, and the factors influencing water quality were analyzed in terms of pollutant emissions and land use. The results show that RF performs best in both the TN (R² = 0.800, RMSE = 0.640, MSE = 0.400, MAE = 0.480) and TP (R² = 0.830, RMSE = 0.033, MSE = 0.001, MAE = 0.022) inversion models, and that the optimal selection of feature variables improves model performance. The TN and TP concentrations in the Minjiang River Meishan Water Function Development Zone were the highest in the downstream section and in 2018. Analysis of the factors influencing water quality shows that pollution sources and amounts were closely related to land-use types, and that land use in riparian zones at different spatial scales had different degrees of impact on water quality.

1. Introduction

Water is an incredibly important resource [1]. As an important water resource and carrier, inland rivers are the green lifeline of ecosystems and are related to the survival and development of human beings [2]. The decline in river water quality has become a global concern with the threat of major changes in the hydrological cycle brought about by population growth, urban construction, industry, agricultural development and climate change [3]. The most prevalent water-quality problem is eutrophication, with total nitrogen (TN) and total phosphorus (TP) being the main causal factors for eutrophication [4]. Therefore, the survival and development of human beings require the continuous monitoring of TN and TP status in inland rivers.
Traditional water-quality monitoring methods are time-consuming, laborious and costly. Remote sensing plays an important role in water-quality inversion and monitoring with its advantages of high efficiency, wide monitoring, low cost and ease of long-term dynamic monitoring [5]. Water-quality remote-sensing inversion can be divided into water color remote sensing and non-water color remote sensing, according to the optical and physical characteristics of water-quality parameters [6]. Scholars have studied more remote-sensing inversions of water color [7,8,9,10] and less remote-sensing inversions of non-optical water-quality parameters, which refer to parameters without significant spectral or optical features, such as TN, TP, etc. Therefore, estimating non-optical parameters from spectral features is difficult [11]. However, these non-optical water-quality parameters are related to eutrophication in water bodies and their studies are of great importance for environmental assessment and human health [12,13,14]. With the advent of Landsat TM, Landsat ETM+, SPOT HRV, MODIS, IKONOS, QuickBird and more hyperspectral high-resolution sensors, studies on non-optical water-quality parameters such as TN and TP are becoming more and more advanced, but there are very few studies on TN and TP inversions of inland rivers using the Sentinel-2 remote sensing satellite. Sentinel-2 is capable of providing multispectral data (13 bands) with a high spatial resolution (10 m, 20 m and 60 m), and the revisit period is shortened to 5 days for the simultaneous operation of the two satellites [15,16]. Due to the narrow waters of inland rivers, fast-changing water-quality conditions and complex optical composition, high spatial and temporal resolution multispectral satellite sensor data are required, and Sentinel-2 can meet all these requirements simultaneously.
As an important ecological barrier and water-supporting area in the upper reaches of the Yangtze River [17], the Minjiang River is representative of inland rivers. The Minjiang River Meishan Level 1 Water Function Development and Utilisation Area is the longest and one of the more developed sections of the Minjiang River, and it is a typical example of a key water function development and utilization area in China. The river here flows through many cities, and industry and agriculture are well developed within the watershed. Urban discharge, industrial discharge and agricultural surface pollution have caused nutrient indicators such as TN and TP in this river to exceed the standards. Therefore, long-term and regular water-quality monitoring in this area is particularly important to safeguard the health of the Minjiang River and the stability of the Yangtze River ecosystem. Current inland-river-water-quality inversion studies [18,19,20] lack work applicable to the Minjiang River and to key water function development and utilization areas. A study of the Minjiang River Meishan Water Function Development and Utilisation Area is therefore representative and important for inland-river-water-quality monitoring.
Previous studies have confirmed that empirical models are feasible when statistical relationships are established between TN, TP concentrations and reflectance values [21,22]. Most of these models are based on linear, exponential or logistic regression, and machine learning algorithms have excelled in improving performance. With the development of artificial intelligence and data science, scholars have used different machine learning methods such as support vector regression (SVR), neural network (NN) and random forest (RF) for water-quality monitoring in lakes and seas to construct inversion models of non-optical water-quality parameters such as TN and TP, and all of them have shown good performance [23,24,25]. However, because rivers flow in narrow strips, the selection and processing of images is limited and difficult. The variation in water depth, water temperature and flow velocity of rivers at different times of the year makes the inversion of their water-quality parameters more complex. Most studies have concentrated on constructing water-quality parameter regression models with different machine learning methods, while ignoring the location characteristics of monitoring sections and seasonal variation. These studies also lacked experiments on the correlation between the reflectance of each waveband of the water body, the combinations of wavebands and the output results. The scientific selection of model feature variables can further improve the performance of the model.
In this context, this study addresses the complexity of non-optical parameter inversion for inland rivers, using Sentinel-2 remote-sensing imagery with high spatial and temporal resolution and machine learning methods to invert TN and TP concentrations in the Minjiang River Meishan Water Function Development and Utilisation Area. Remote-sensing methods provide regular access to water-quality information; improving their applicability is the key to real-time water-quality monitoring technology [26]. This study conducts correlation experiments on the spatial and temporal information of the monitoring sections, the reflectance of each waveband and the waveband combinations, and optimizes the machine learning algorithms and their parameters. This enhances the performance of the water-quality inversion model and improves the applicability of water-quality monitoring. Additionally, it provides a reference for water environment management and water-quality improvement, and lays the foundation for subsequent in-depth research combined with real-time water-quality monitoring.

2. Materials and Methods

2.1. Study Area

The Minjiang River is an important first-class tributary of the upper reaches of the Yangtze River. Located at the western edge of the hinterland of the Sichuan Basin in China, it lies between 102°35′–104°40′ E and 28°25′–33°9′ N and has a total length of 753 km. The Minjiang River Meishan Water Function Development and Utilisation Area is the longest water function development and utilization area on the main stream of the Minjiang River and is also an important water function development and utilization area in China. It lies between 103°53′13″–103°49′34″ E and 30°16′8″–30°0′54″ N, with a total length of 41.2 km.
The study area had three major monitoring cross-sections before 2017. With the introduction of management practices for water function areas, water-quality monitoring in these areas was strengthened so that they could pass water-quality compliance assessments. Since January 2017, the number of water-quality monitoring sections in the Minjiang River Meishan Water Function Development and Utilisation Area has been increased to a total of 10 (M1–M10); their locations are shown in Figure 1. The monitoring sections were routinely sampled and monitored every month, and the government and relevant departments were regularly informed of the water-quality status of the water function areas. Water quality improved significantly in 2019, and the number of water-quality monitoring sections was gradually reduced in 2020.

2.2. Data Sources

The actual water-quality monitoring data were obtained from the Meishan Water Affairs Bureau. TN and TP concentrations were obtained using standard analytical methods: UV spectrophotometry with alkaline potassium persulphate and ammonium molybdate spectrophotometry with potassium persulphate, respectively. The TN and TP monitoring data of the study area between the key monitoring years of 2017 and 2019, including the monthly monitoring data of eight monitoring sections and the quarterly monitoring data of two monitoring sections, totaling 312 sets, were selected as the actual measurement data for this study.
The remote-sensing images were acquired by the multispectral imagers on the Sentinel-2A and Sentinel-2B satellites. When both satellites are in operation, together they image the Earth's equatorial region completely every five days. Using both satellites makes it easier to obtain images that match the timing of water-quality sampling. Sentinel-2 satellite images are available in 13 bands with a spatial resolution of 10–60 m. A total of 16 images from January 2017 to December 2019 were acquired from the Copernicus Open Access Hub (https://scihub.copernicus.eu/dhus/#/home, accessed on 4 November 2022). Each image and the corresponding water-quality data were collected less than 72 h apart. The images were pre-processed in the ENVI 5.3 software, including geometric correction, atmospheric correction and resampling. The reflectance of the 10 monitoring sections in 12 bands (all except Band 10) was extracted in the software, resulting in a total of 144 sets of valid reflectance data.
Data for the analysis of pollutants entering the river are from the 2018 Meishan Pollution Source Statistics. The scope of counting pollutants entering each river is the administrative area of the township where the river is located. Sentinel-2 satellite images of good quality in 2018 were selected, interpreted to obtain land-use maps, and the area of each land type and total area were counted to calculate the proportion of each land type in different buffer zones.
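The land-use proportion described above is a simple ratio of each land type's area to the total buffer area. The following is a minimal sketch of that arithmetic; the function name, the land-type labels and the area values are illustrative assumptions and are not taken from the study.

```python
def land_use_proportions(areas_by_type):
    """Return each land type's share of the total area of one buffer zone."""
    total = sum(areas_by_type.values())
    if total <= 0:
        raise ValueError("total buffer area must be positive")
    return {name: area / total for name, area in areas_by_type.items()}


# Hypothetical areas (km^2) inside one riparian buffer; not data from the study.
example_buffer = {"agricultural": 6.0, "urban": 3.0, "forest": 1.0}
shares = land_use_proportions(example_buffer)
```

Repeating the call for each buffer width gives one set of proportions per spatial scale, which can then be compared with the water-quality parameters.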

2.3. Methods

2.3.1. Correlation Analysis of Model Characteristics Variables

To improve model performance, correlation experiments were conducted to determine the model feature variables before modeling [27]. Correlation analysis is an exploratory, front-end analysis that examines the relationships between variables and their properties. Its results guide the next step of the approach and are the foundation for data mining. Correlation analysis was carried out using Pearson's and Spearman's correlation analysis in the SPSS Statistics 27 software. First, TN and TP concentrations were correlated with every single band to extract single-band modeling variables. Previous studies have shown that remote-sensing indices are more sensitive than single spectra [28] and that TN and TP correlate well with chlorophyll [29]. Therefore, we used the chlorophyll-related Normalized Difference Vegetation Index (NDVI) and band ratios for correlation analysis. Correlation coefficients and two-tailed significance were used jointly to evaluate the correlation of variables. In general, the closer the absolute value of the correlation coefficient is to 1, the stronger the correlation; however, the correlation coefficient is also influenced by the number of data sets. A significant two-tailed test indicates that the observed correlation coefficient is unlikely to have arisen by chance [30]. Using correlation coefficients and two-tailed significance together gives a more reliable evaluation of correlations.
Water reflectance is also influenced by water depth, temperature, and the location of monitoring points, so these extrinsic influences were also analyzed in correlation with TN and TP concentrations.
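The two coefficients used in this section can be illustrated with a short sketch. The study itself used SPSS Statistics 27; the pure-Python functions below are only an assumed, simplified equivalent that computes the coefficients (with average ranks for ties) and does not compute the two-tailed significance. The example values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up monotonic but non-linear relationship (not study data).
reflectance = [1.0, 2.0, 3.0, 4.0, 5.0]
concentration = [1.0, 4.0, 9.0, 16.0, 25.0]
r_pearson = pearson(reflectance, concentration)
r_spearman = spearman(reflectance, concentration)
```

In this toy case the Spearman coefficient reaches 1 while the Pearson coefficient stays below 1, which is the kind of difference expected when a relationship is monotonic but not linear.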

2.3.2. Support Vector Machines

The support vector machine (SVM) is a supervised learning algorithm developed based on the structural risk minimization principle. It seeks the best compromise between model complexity and learning ability based on the limited sample information to obtain good generalization [31]. Support vector machine regression (SVR) is a support vector machine used to solve regression problems [32]. For non-linear problems, SVR uses a non-linear mapping function to map the sample to a high-dimensional linear space, and then constructs a regression function in this space [33]; the formula is as follows:
$$f(x, w) = w \cdot \Phi(x) + b \tag{1}$$
where $x$ is the sample input, $\Phi(x)$ is the non-linear mapping, $w$ is the weight vector, and $b$ is the threshold value.
For a given training data set $(y_1, x_1), (y_2, x_2), \ldots, (y_l, x_l)$ and an $\varepsilon$-insensitive loss function, the constrained optimization problem can be expressed as:
$$\min_{w} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{l} (\xi_i + \xi_i^{*}) \quad \text{s.t.} \quad \begin{cases} y_i - w \cdot \Phi(x_i) - b \le \varepsilon + \xi_i \\ w \cdot \Phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i, \; \xi_i^{*} \ge 0 \end{cases} \quad i = 1, 2, \ldots, l \tag{2}$$
The optimization problem of Equation (2) can be transformed into a dual problem by introducing a Lagrangian function. The solution to Equation (1) is obtained by solving the dual problem:
$$f(x) = \sum_{i=1}^{n_{sv}} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b \tag{3}$$
where $\alpha_i, \alpha_i^{*}$ ($i = 1, 2, \ldots, l$) are the Lagrange multipliers; the samples with non-zero multipliers are the support vectors (sv); $n_{sv}$ is the number of support vectors; and $K(x_i, x)$ is the kernel function. Radial basis kernel functions are usually used:
$$K(x_i, x) = \exp\left(-\lambda \lVert x - x_i \rVert^{2}\right) \tag{4}$$
where $\lambda$ is the kernel parameter.
In this study, the MATLAB R2020a platform was used to run the SVR. During the construction of the regression model, parameter optimization functions were used to find the optimal penalty parameter and kernel function parameter to improve SVR model learning.
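To make the SVR formulas above concrete, the following minimal sketch evaluates a radial basis kernel, the ε-insensitive loss and the kernel-expansion form of the regression function. It is not the MATLAB implementation used in the study; the function names and the numeric values are assumptions chosen for illustration, and the dual coefficients are supplied by hand rather than obtained from training.

```python
import math


def rbf_kernel(x_i, x, lam):
    """Radial basis kernel exp(-lam * ||x - x_i||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x))
    return math.exp(-lam * squared_distance)


def epsilon_insensitive_loss(y_true, y_pred, eps):
    """Errors inside the eps tube cost nothing; larger errors cost linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x, support_vectors, dual_coefs, b, lam):
    """Kernel expansion; dual_coefs[i] stands for (alpha_i - alpha_i^*)."""
    return sum(
        coef * rbf_kernel(sv, x, lam)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + b


# Hand-picked toy values (assumptions, not fitted parameters).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.0]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, b=0.5, lam=1.0)
```

In practice the penalty parameter and the kernel parameter are the two quantities tuned during model construction, since they control the trade-off between fitting error and smoothness of the regression function.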

2.3.3. Neural Networks

Neural networks (NNs) are a machine learning technique that mimics the neural networks of the human brain to achieve artificial intelligence; a network comprises an input layer, a hidden layer and an output layer [34]. In traditional neural networks, neurons cannot store information: they cannot “remember”, and they are prone to overfitting. Recurrent neural networks (RNNs) are special neural network structures that allow information to be passed between neurons; long short-term memory (LSTM) networks are a special type of RNN model [35]. LSTM can alleviate the vanishing-gradient and dimensionality problems that arise during gradient backpropagation in RNNs and is well suited for learning from experience [36]. The selection of hyperparameters in the NN regression model has a significant impact on the performance of the model. The Grey Wolf Optimizer (GWO) is a heuristic optimization algorithm that mimics the social life of wolves, and it is characterized by fast convergence and global search. The GWO-LSTM model has better prediction and convergence [37]. This paper, therefore, uses the GWO-LSTM model for TN and TP neural network regression, run on the MATLAB R2020a platform.
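The idea behind GWO-based hyperparameter search can be sketched as follows. This is a basic textbook-style Grey Wolf Optimizer in plain Python, not the GWO-LSTM implementation used in the study: the objective below is a hypothetical stand-in for a validation error as a function of two hyperparameters, and all names, bounds and settings are assumptions for illustration.

```python
import random


def grey_wolf_optimize(objective, bounds, n_wolves=12, n_iter=60, seed=0):
    """Minimize `objective` inside the box `bounds` with a basic GWO."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    history = []  # best objective value seen so far, one entry per iteration
    for t in range(n_iter):
        leaders = sorted(wolves, key=objective)[:3]  # alpha, beta, delta
        leader_val = objective(leaders[0])
        if leader_val < best_val:
            best_pos, best_val = list(leaders[0]), leader_val
        history.append(best_val)
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        moved = []
        for wolf in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                estimates = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    estimates.append(leader[d] - A * distance)
                # Average the three leader-guided estimates, clipped to the box.
                position.append(min(hi, max(lo, sum(estimates) / 3.0)))
            moved.append(position)
        wolves = moved
    final = min(wolves, key=objective)  # evaluate the last generation too
    if objective(final) < best_val:
        best_pos, best_val = list(final), objective(final)
    return best_pos, best_val, history


def toy_validation_error(params):
    """Hypothetical error surface over (learning rate, hidden units)."""
    learning_rate, hidden_units = params
    return (learning_rate - 0.01) ** 2 + ((hidden_units - 64.0) / 100.0) ** 2


search_bounds = [(0.0001, 0.1), (8.0, 128.0)]
best_pos, best_val, history = grey_wolf_optimize(toy_validation_error, search_bounds)
```

In a real GWO-LSTM setting, each objective evaluation would train and validate an LSTM with the candidate hyperparameters, which is far more expensive than the toy function used here.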

2.3.4. Random Forest

Random forest (RF) is an ensemble learning algorithm based on classification trees (CT) [38]. To generate its many decision trees, both the samples of the modeling dataset and the feature variables are drawn at random. Each sampling result produces a tree, and each tree generates rules and decision values that match its own sample; the forest finally aggregates the rules and decision values of all decision trees to perform random forest regression [39].
The snake optimizer (SO) algorithm was introduced in the construction of the RF regression model. SO was proposed by Hashim et al. in 2022 and was inspired by the foraging and mating behavior of snakes. In this study, the model was run on the MATLAB R2020a platform, and the results showed that the SO algorithm is superior in terms of the number of iterations, solution time and fitness-value curve, and has excellent convergence performance [40].
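The sampling-and-aggregation idea behind random forest regression can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the SO-optimized MATLAB model of the study: each "tree" is a one-split regression stump, features are sampled once per tree, and the data are made up. All function names are assumptions.

```python
import random


def fit_stump(rows, targets, feature_ids):
    """Fit a one-split regression tree using only the given features."""
    overall = sum(targets) / len(targets)
    best = (float("inf"), None, None, overall, overall)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue
            mean_left, mean_right = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - mean_left) ** 2 for t in left)
            sse += sum((t - mean_right) ** 2 for t in right)
            if sse < best[0]:
                best = (sse, f, threshold, mean_left, mean_right)
    return best[1:]  # (feature, threshold, left mean, right mean)


def predict_stump(stump, row):
    feature, threshold, mean_left, mean_right = stump
    if feature is None:  # no valid split was found in the bootstrap sample
        return mean_left
    return mean_left if row[feature] <= threshold else mean_right


def fit_forest(rows, targets, n_trees=25, n_features=2, seed=0):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(
            fit_stump([rows[i] for i in sample], [targets[i] for i in sample], features)
        )
    return forest


def predict_forest(forest, row):
    """Aggregate by averaging the predictions of all trees."""
    return sum(predict_stump(stump, row) for stump in forest) / len(forest)


# Made-up feature rows and target concentrations (not study data).
rows = [[0.1, 1.0, 5.0], [0.2, 0.9, 4.0], [0.3, 0.8, 6.0],
        [0.4, 0.7, 5.5], [0.5, 0.6, 4.5], [0.6, 0.5, 5.2]]
targets = [1.0, 1.2, 1.9, 2.1, 2.8, 3.0]
forest = fit_forest(rows, targets)
estimate = predict_forest(forest, [0.35, 0.75, 5.0])
```

Full random forest implementations grow deep trees and usually resample the candidate features at every split; the stump version only shows why averaging many randomized trees yields a stable regression estimate.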

2.3.5. Model Accuracy Evaluation Indicators

Whether the task is classification or regression, the quality of a learner is measured in two main ways: bias and variance [41]. Typically, bias represents the accuracy of a model and variance represents its generalization ability. When training a model, both aspects have to be taken into account at the same time; if the bias is too large, the model is inaccurate, and if the variance is too large, the model generalizes poorly [42].
In this paper, the mean squared error (MSE), root mean square error (RMSE), mean absolute error (MAE) and coefficient of determination (R²) are used to evaluate model performance, as shown in Table 1.
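The four indicators can be computed as in the sketch below. It uses the standard textbook definitions, which are assumed here to match the formulas in Table 1; the function name and the example values are illustrative only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 for paired observations and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are equal")
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Made-up measured and predicted concentrations (not study data).
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

Because RMSE is the square root of MSE, the two always move together; MAE is less sensitive to a few large errors, and R² expresses the share of the variance in the measurements that the model explains.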

3. Results

3.1. Characteristic Variable Selection Results

Pearson's and Spearman's correlation analyses were applied to a total of 144 sets of experimental data to relate TN and TP concentrations to the reflectance of the 12 bands. P-TN and P-TP denote the Pearson correlation coefficients of TN and TP; S-TN and S-TP denote the Spearman correlation coefficients of TN and TP. The results showed that B11 and B12 were highly correlated in P-TN and S-TN, with B12 being the highest. B8, B9, B11 and B12 were highly correlated in P-TP and S-TP, with B9 being the highest. The single-band correlation analysis by both methods showed that Spearman's analysis was better suited than Pearson's to the study area data, which indicates that the relationships in these data are monotonic (rank-ordered) rather than strictly linear. Both methods show that the overall single-band reflectance correlation of TP is higher than that of TN, as shown in Figure 2.
Although some of the single bands were correlated with TN and TP concentrations and showed a significant correlation in the two-tailed test, the correlation coefficients were low and there were too few correlated bands. Thus, based on the results of the single-band correlation analysis and the experience of previous studies, correlations between band combinations and TN, TP concentrations were established. N and P are important elements for chlorophyll synthesis [43], and band ratios were commonly used in previous studies to establish correlations with TN, TP concentrations. Therefore, repeated experiments were conducted to explore the correlation between the NDVI, computed here as (B5 − B4)/(B5 + B4); band ratio operations; and TN, TP concentrations. The experimental results showed that TN correlated highest with B12/B5 and NDVI, and TP correlated highest with B5/B4 and NDVI.
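The band combinations named above are simple per-sample operations on reflectance values. The sketch below shows the NDVI in the B5/B4 form used in this section and the two band ratios; the function names and the reflectance values are assumptions for illustration, not data from the study.

```python
def ndvi_b5_b4(b5, b4):
    """NDVI in the form used here: (B5 - B4) / (B5 + B4)."""
    denominator = b5 + b4
    if denominator == 0:
        raise ValueError("B5 + B4 must be non-zero")
    return (b5 - b4) / denominator


def band_ratio(numerator_band, denominator_band):
    """Plain ratio of two band reflectances."""
    if denominator_band == 0:
        raise ValueError("denominator band reflectance must be non-zero")
    return numerator_band / denominator_band


# Hypothetical reflectances at one monitoring section.
sample = {"B4": 0.05, "B5": 0.15, "B12": 0.03}
features = {
    "NDVI": ndvi_b5_b4(sample["B5"], sample["B4"]),
    "B5/B4": band_ratio(sample["B5"], sample["B4"]),
    "B12/B5": band_ratio(sample["B12"], sample["B5"]),
}
```

Each derived value can then be correlated with TN or TP in the same way as a single-band reflectance.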
Correlation analysis of the external influences on reflectance shows that the hydrological period (wet or dry season) and the location have a strong influence on the experiment. The final TN model feature variables were temporal information, spatial information, B12, B12/B5 and NDVI; the final TP model feature variables were temporal information, spatial information, B9, B5/B4 and NDVI.

3.2. Model Validation and Accuracy Evaluation

We used a total of 144 sets of valid data from 10 monitoring sections. The TN and TP feature variables were used as the inputs to the SVR, NN and RF models and the TN and TP concentrations as the model outputs; 80% of the data formed the training set and 20% the test set. The models were validated separately, and the regression results of all three machine learning models showed good performance. SVR was average in the TN (R² = 0.690, RMSE = 0.780, MSE = 0.610, MAE = 0.610) and TP (R² = 0.720, RMSE = 0.040, MSE = 0.002, MAE = 0.027) inversion models; compared with SVR, NN had significantly better accuracy in the TN (R² = 0.740, RMSE = 0.710, MSE = 0.500, MAE = 0.540) and TP (R² = 0.770, RMSE = 0.038, MSE = 0.001, MAE = 0.029) inversion models; among the three machine learning methods, RF performed best in the TN (R² = 0.800, RMSE = 0.630, MSE = 0.400, MAE = 0.480) and TP (R² = 0.830, RMSE = 0.033, MSE = 0.001, MAE = 0.022) inversion models, as shown in Figure 3.
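The 80/20 partition described above can be sketched as a shuffled index split. The text does not state how the split was drawn or rounded, so the random shuffling, the seed and the rounding rule below are assumptions.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)  # rounding rule is an assumption
    return indices[n_test:], indices[:n_test]


# 144 valid data sets, as reported in the text.
train_idx, test_idx = train_test_split_indices(144)
```

Keeping the test indices fixed across SVR, NN and RF means that the reported metrics for the three models are computed on the same held-out samples and can be compared directly.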
The results of the model experiments show that the selection of model feature variables is useful for model performance improvement. SVR has a beneficial effect on model performance by finding the optimal penalty parameter c and kernel function parameter g in terms of optimizing hyperparameters. The LSTM in neural network models improves the gradient vanishing problem in RNNs, and the GWO optimization of hyperparameters improves the convergence of the model. However, in this set of experimental data, SVR still exhibits problems such as insensitivity to the selection of kernels, and the NN model suffers from problems such as sample dependency, resulting in overfitting of the training set and poor fitting of the test set in both models. RF builds on the integration of decision trees as the base classifier and further introduces random attribute selection in the training process of decision trees. It can balance errors for unbalanced, non-linear and poorly characterized data sets. Accuracy can be maintained when a large proportion of features are missing, and the model shows a high generalization ability [44]. The introduction of the SO algorithm to the RF regression model resulted in better iterations, adaptation value curves, and accuracy evaluation than before its introduction, with better fits to both the training and test sets.

3.3. Spatial Distribution Characteristics

Water-quality-parameter concentrations show different characteristics in different seasons due to factors such as water quantity and temperature changes [45]. TN and TP in water bodies mainly originate from various point and non-point source pollutants, so the spatial distribution characteristics also show a correlation with pollution sources [46].
TN concentrations show consistency in spatial distribution characteristics and variability across seasons in the water function areas. TN concentrations maintain a high consistency throughout the water function area, especially during the wet season (e.g., July and September). During the dry season (e.g., January and March), TN occasionally occurs in high concentrations at downstream monitoring cross-sections. The spatial distribution of the TN concentration indicates that the study area has low rainfall during the dry season and is mainly affected by point source pollution, resulting in a high TN concentration at individual monitoring sections. In contrast, during periods of the wet season, it is mainly influenced by non-point source pollution, showing a consistency of TN concentrations throughout the river, as shown in Figure 4.
TP concentrations are significantly higher in the dry season than in the wet season. High TP concentrations are due to low water levels, low flows, and no reduction in human activity during the dry season. This indicates that TP concentration is highly correlated with water level, water volume, temperature and other factors. The impact of pollution sources is similar to that of TN, with some monitoring sections being high during the dry season and more balanced overall during the wet season, as shown in Figure 5. In half of the 12 periods analyzed, concentrations in the lower reaches of the river were above 0.2 mg/L and in some cases exceeded 0.3 mg/L, exceeding the surface water environmental quality category IV standard, as shown in Figure 6. This is highly correlated with nearby land-use conditions and human activities, with urbanization and industrial development in the downstream section, and pollutants entering the river from tributaries becoming important influences on changes in water-quality parameters.

3.4. Temporal Distribution Characteristics

The temporal trends in TN and TP concentrations are generally consistent; concentrations were significantly higher in 2018 and then fell below 2017 levels in 2019, with an overall downward trend; this is related to the construction of urbanization and industrialization in the water function area during that period. The changes in the 10 monitoring sections are relatively consistent; the TN and TP concentrations are especially evenly distributed during the wet season, with very few localized high or low cases. This indicates that rainfall is abundant during the wet season and that rivers are mainly affected by non-point source pollution. During the dry season, there is a characteristic change from low in January to high in March, especially in 2018 when TN and TP concentrations were very low in January, but suddenly high in March. There is a gradual rise in local temperatures and increased human activity in March, so the concentrations increased significantly compared to January, as shown in Figure 7.
The comparison between dry and wet seasons shows that the TN and TP concentrations were consistent in the monitoring sections during the wet season and were locally high in the dry season. This suggests that the overall water quality changes are better characterized during the wet season and that abnormal changes at local monitoring sites are better detected during the dry season. From the point of view of pollution sources, the dry season is seriously affected by point-source pollution, while the wet season is mainly affected by non-point-source pollution.

4. Discussion

4.1. Evaluation of the Impact of the Characteristic Variables on the TN and TP Models

In previous machine-learning studies of TN and TP models [23,24], Li et al. constructed TN and TP inversion models using four machine learning methods in a marine water quality study, with all single-band reflectances of Landsat remote-sensing images as input variables. The results showed that seasonal changes affect the inversion accuracy of machine learning [24]. Zhang et al. conducted a correlation study on the input variables of the TN and TP inversion model for Dongting Lake. That study selected sensitive wavebands and combinations for modeling and analyzed the correlation between water level, flow, precipitation and temperature data and the TP and TN variables. However, these data were not used as feature variables in the modeling [23]. The model feature variables in this paper take into account the effects of water depth and temperature caused by seasonal changes. When building machine learning TN and TP models for inland rivers with complex optical features, temporal and spatial information together with sensitive bands and band combinations are selected as model input variables to reduce the influence of extraneous factors such as season on model construction.
We compared all single bands (ASB), sensitive bands and combinations (HSB), the sensitive bands and combinations combining temporal and spatial information (IHSB) in TN, TP machine learning models. The comparison results show that the accuracy IHSB > HSB > ASB in the SVR model, the accuracy ASB > IHSB > HSB in the NN model, and the accuracy IHSB > HSB > ASB in the RF model; the conclusions of the TN and TP models are consistent. Although the ASB model has the highest overall accuracy in NN, it suffers from over-fitting and has a large prediction error. This is caused by the involvement of noisy data in the fitting, showing the weak generalization ability of the model. Therefore, the input of characteristic variables has an impact on the modeling accuracy. According to the characteristics of inland-river-water-quality data and model features, optimizing the input variables is an effective way to improve the model’s performance and accuracy. We selected temporal and spatial information, sensitive bands and combinations as model input variables, and compared them with input variables from previous research methods in three modeling approaches. The results showed that our selected model feature variables performed best.
Water level, flow, precipitation and temperature have strong seasonal and regional variations. Temporal and spatial information records these variations well and can be used as model input variables to improve model performance. In the analysis of water-quality influencing factors, water level, flow and similar variables are themselves factors that influence water quality, and their relevance to water quality and the extent of their impact can be analyzed quantitatively. However, pollutants entering rivers are the source of the decline in water quality and affect it directly. Pollutants originate from human activity, and areas with intensive land use have high human activity. Therefore, land use indirectly reflects the pressures on river water quality.

4.2. Direct Impacts of Incoming Pollutants on Water Quality

The spatial and temporal variability of water-quality parameters shows that water-quality-parameter concentrations vary somewhat across river reaches and exhibit different variability characteristics across water periods and years. There are many direct and indirect factors affecting these variability characteristics. Incoming pollutants directly affect water-quality conditions [47]. Therefore, pollution sources and their discharges are important influences on the variability of water-quality-parameter concentrations.
The river in the study area flows through cities and towns, some of which discharge domestic sewage directly into it. The construction of domestic sewage treatment facilities in these towns is lagging. Moreover, the urban areas have not yet fully separated their rainwater and sewage drainage systems, and the collection and treatment rate of domestic sewage from urban residents is not high. Large-scale livestock and poultry farming is well developed along the river, and most of the manure and wastewater from the farms (households) has not yet been treated to meet the discharge standards. Some of it is illegally discharged into the rivers, which is the main cause of black and odorous water bodies. According to the survey and relevant information, there are eight large industrial areas and many small enterprises and workshops within the rural areas on both sides of the river. Some of them treat and discharge sewage below standard, forming black and odorous water bodies that pollute the Minjiang River. There is a large amount of agricultural land along the river; the residues of chemical fertilizers and pesticides applied in agricultural production, and the wastewater discharged from fertilized fish ponds, flow into the river with rainwater or residual irrigation water and pollute the water bodies. Therefore, urban domestic sewage pollution, livestock and poultry manure discharge pollution, enterprise production wastewater pollution and rural surface source pollution are the main sources of pollution affecting water-quality parameters in the study area. These pollutants directly affect water quality, so the analysis of pollutant discharges is very important.
To capture water-quality pollution and nutrient status, sources and emissions of the pollutants chemical oxygen demand (COD), ammonia nitrogen (NH3-N) and TP were monitored and counted in the study area during key monitoring years. COD is a specific composite indicator that characterizes organic pollution in environmental water samples and is a major source of eutrophication in influenced water bodies [48]. TN includes NH3-N, nitrite nitrogen and nitrate nitrogen; NH3-N contributes the most to eutrophication in water bodies [49], so NH3-N emissions can indirectly respond to the trophic contribution of TN to water quality.
Based on the results of the inversion and time variation analysis, it is evident that the nutrient concentration was highest in 2018. COD, NH3-N and TP are important pollutants that affect nutrient concentrations in rivers. Pollutants in the study area mainly originate from the main stream of Minjiang River and three important tributaries, Simon River, Liquan River and Mao River. Therefore, the sources and emissions of COD, NH3-N and TP emissions in these four river basins were analyzed. The results of the analysis showed that the three parameters of urban domestic sewage, livestock and poultry manure discharge and enterprise production wastewater pollution were the highest in Minjiang River, in addition to rural surface source pollution, which is related to the dense population, industrial development and urbanization in the functional area of Minjiang River. The Mao River has the highest agricultural pollution emissions for the three indicator parameters. Industrial pollution emissions are extremely small. The analysis of the land-use situation is related to the very high proportion of agricultural land in the Mao River basin, which does not involve urban and industrial land-use situations. The Simon and Liquan rivers are mainly concentrated on livestock and poultry manure emissions and rural surface pollution. These two rivers have more villages and agricultural land in their watersheds. Livestock and poultry manure discharge is a common phenomenon due to the imperfection of rural sewage systems, as shown in Figure 8. The results of the above analysis can provide a reference basis for the government’s pollutant management program and land-use planning for different river basins.

4.3. Indirect Impacts of Land Use on Water Quality

Land use is the use of terrestrial space by humans for economic, residential, recreational, conservation and governmental purposes [50]. Land use often alters watershed hydrological processes and nutrient transport, and indirectly causes the degradation of river water quality [51], thus affecting watershed ecology.
The analysis of riverine pollutants and related studies show that riverine pollutants directly affect river water quality. In addition, land-use type is correlated with riverine pollutants [52]. Construction land includes towns, villages and factories: domestic sewage pollution originates mainly from towns and cities, pollution from livestock and poultry manure mainly from villages, and production wastewater pollution mainly from factories. Therefore, the proportion of construction land in the watershed is positively correlated with the amount of pollutants discharged. Agricultural land includes paddy fields, dry land, orchards, etc. Rural surface source pollution mainly comes from pesticides and other inputs applied to agricultural land, so the proportion of agricultural land is positively correlated with the amount of rural surface source pollution. Forests and grasslands conserve water, purify water and conserve soil, and thus protect the water ecology. Therefore, the proportion of forest and grassland is negatively correlated with the pollutants entering the river. Land use at different spatial scales affects water quality to different degrees, so studying land use in riparian zones at different spatial scales helps clarify the indirect effects of riparian zones on water quality.
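The sign of the relationships described above (positive for construction and agricultural land, negative for forest and grassland) could be checked with a simple Pearson correlation across sub-basins. The following is a minimal sketch only: the function name `pearson_r` and all numbers are hypothetical placeholders chosen for illustration, not values from this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values for illustration only, NOT data from the study.
construction_share = [0.10, 0.20, 0.30, 0.40]  # fraction of basin area
forest_share = [0.40, 0.30, 0.20, 0.10]        # fraction of basin area
pollutant_load = [12.0, 19.0, 33.0, 41.0]      # arbitrary units

r_construction = pearson_r(construction_share, pollutant_load)  # expected > 0
r_forest = pearson_r(forest_share, pollutant_load)              # expected < 0
```

With only a handful of basins, such a coefficient indicates direction rather than statistical significance.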
We used remote-sensing imagery acquired in 2018 to interpret land-use maps within 5 km on both sides of the main stream shoreline in the study area. The river shoreline was then buffered, land-use maps at different spatial scales were extracted, and each land-use type and its share in the different buffer zones were analyzed statistically. Land-use composition changed noticeably with each 0.5 km increase in buffer width, so land use was tallied at 0.5 km increments. The results show that the 5 km buffer zone is dominated by construction land and agricultural land, which together account for 75% on average. The proportion of construction land gradually increases as the buffer widens, peaks at 39% in the 2.5 km buffer zone and then gradually decreases; the proportion of agricultural land gradually increases as the buffer widens, reaching 51% in the 5 km buffer zone. The average proportion of forested land in the 5 km buffer zone is less than 10%. The proportion of grassland is highest in the 0.5 km buffer zone, at only 6%, and gradually decreases as the extent increases, while the proportion of forested land remains stable across spatial scales, averaging only 7%, as shown in Figure 9.
The land-use results at different spatial scales show that the 5 km buffer zone in the Minjiang Meishan Water Function Development and Utilisation Area has a high land-use rate. The 2.5 km buffer zone has the greatest impact on water quality, owing to pollution from urban domestic sewage and enterprise production wastewater associated with construction land. Toward the 5 km buffer zone, the impact of rural surface source pollution from agricultural land increases, while the proportion of construction land continues to fluctuate within a high range. This confirms that the main sources of pollution in the study area are rural surface source pollution, urban domestic sewage pollution, livestock and poultry manure discharge pollution and enterprise production wastewater pollution. In summary, the high land-use rate in the Minjiang Meishan Water Function Development and Utilisation Area poses a considerable threat to the water environment, with land use in the 2.5 km buffer zone having the most significant impact on water quality. Within 0.5 km on both sides of the main stream, the land-use rate is low and forest and grassland are concentrated along the river banks; discharge into the river from this zone is small, so its impact on water quality is the weakest, which also reflects the ecological protection of the shoreline.
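The buffer statistics described here can be sketched as a tally of land-use area within cumulative buffers of increasing width. This is a simplified illustration, not the GIS workflow used in the study: it assumes each classified cell or parcel has already been assigned a distance to the shoreline, and the function name `buffer_land_use_shares`, the tuple layout and the sample cells are all hypothetical.

```python
from collections import Counter

def buffer_land_use_shares(cells, step_km=0.5, max_km=5.0):
    """Share of each land-use class inside cumulative riparian buffers.

    cells: iterable of (distance_to_shoreline_km, land_use_class, area_km2).
    Returns {buffer_width_km: {land_use_class: share_of_buffer_area}}.
    """
    cells = list(cells)
    n_steps = round(max_km / step_km)
    shares = {}
    for i in range(n_steps):
        width = round(step_km * (i + 1), 6)
        area_by_class = Counter()
        for distance_km, land_class, area_km2 in cells:
            if distance_km <= width:  # cumulative buffer: 0 km .. width
                area_by_class[land_class] += area_km2
        total = sum(area_by_class.values())
        shares[width] = (
            {c: a / total for c, a in area_by_class.items()} if total else {}
        )
    return shares

# Hypothetical cells for illustration only (not data from the study).
cells = [
    (0.2, "forest", 1.0),
    (0.4, "grassland", 1.0),
    (0.8, "construction", 2.0),
    (1.3, "agricultural", 2.0),
    (1.8, "construction", 2.0),
]
shares = buffer_land_use_shares(cells, step_km=0.5, max_km=2.0)
```

In this toy input, forest and grassland fill the innermost buffer, and their share shrinks as wider buffers take in construction and agricultural land, which mirrors the pattern reported for Figure 9.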

5. Conclusions

We used measured water-quality data and Sentinel-2 remote-sensing images of the Minjiang River Meishan Water Function Development and Utilisation Area from the key monitoring years (2017–2019) to construct three machine learning inversion models for TN and TP: SVR, NN and RF. The results show that machine learning with an optimal selection of feature variables, combined with Sentinel-2 remote-sensing imagery, is feasible for the inversion of non-optical water-quality parameters in inland rivers. RF performed better (TN: R2 = 0.800, RMSE = 0.640, MSE = 0.400, MAE = 0.480; TP: R2 = 0.830, RMSE = 0.033, MSE = 0.001, MAE = 0.022) than the other models (SVR, NN). We found that TN and TP concentrations in the study area were highest in 2018, that incoming pollutants have a major impact on river water quality, that land use in riparian zones at different spatial scales affects water quality to different degrees, and that the high land-use rate in the study area poses a considerable threat to the water environment. Our work provides a basis for water-ecology management and land-use planning. It also provides a methodological reference for the inversion of TN and TP in inland rivers through the optimized selection of feature variables for model building. In addition, we further confirm that remote sensing is a reliable tool for long-term water-quality monitoring and for analyzing the factors behind water-quality change.

Author Contributions

Conceptualization, Z.T. and W.L.; methodology, Z.T. and J.R.; formal analysis, Z.T. and S.L.; resources, J.R., R.Z., T.S. and S.L.; writing—original draft preparation, Z.T.; writing—review and editing, W.L., T.S. and S.L.; visualization, S.L.; supervision, S.L.; project administration, Z.T. and J.R.; funding acquisition, Z.T. and J.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Plan Project of Sichuan Province (Grant No. 2021YJ0369), the Chengdu water ecological civilization construction research key base (Grant No. SST2021-2022-15 and SST2021-2022-25) and the Chengdu University of Technology Postgraduate Innovative Cultivation Program (Grant No. CDUT2022BJCX001).

Data Availability Statement

Water-quality monitoring data were obtained from the monthly routine water-quality monitoring of the Meishan City Water Bureau. A total of 16 Sentinel-2 images from January 2017 to December 2019 were acquired from the Copernicus Open Access Hub (https://scihub.copernicus.eu/dhus/#/home).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, Y.; Xu, M.; Li, X.; Qi, J.; Zhang, Q.; Guo, J.; Yu, L.; Zhao, R. Hydrochemical Characteristics and Multivariate Statistical Analysis of Natural Water System: A Case Study in Kangding County, Southwestern China. Water 2018, 10, 80.
  2. Makinde, A. Environment and Sustainable Development; Springer: Delhi, India, 2013.
  3. Tian, Q.; Xu, K.H.; Dong, C.M.; Yang, S.L.; He, Y.J.; Shi, B.W. Declining sediment discharge in the Yangtze River from 1956 to 2017: Spatial and Temporal changes and their causes. Water Resour. Res. 2021, 57, e2020WR028645.
  4. Ulrich, A.E.; Malley, D.F.; Watts, P.D. Lake Winnipeg Basin: Advocacy, challenges and progress for sustainable phosphorus and eutrophication control. Sci. Total Environ. 2016, 542, 1030–1039.
  5. Mouw, C.B.; Greb, S.; Aurin, D.; DiGiacomo, P.M.; Lee, Z.; Twardowski, M.; Binding, C.; Hu, C.; Ma, R.; Moore, T.; et al. Aquatic color radiometry remote sensing of coastal and inland waters: Challenges and recommendations for future satellite missions. Remote Sens. Environ. 2015, 160, 15–30.
  6. Xu, Y.; Dong, X.Y.; Wang, J.J. Use of Remote Multispectral Imagine to Monitor Chlorophyll-a in Taihu Lake: A Comparison of Four Machine Learning Models. J. Hydroecology 2019, 40, 48–57.
  7. Li, S.; Song, K.; Wang, S.; Liu, G.; Wen, Z.; Shang, Y.; Lyu, L.; Chen, F.; Xu, S.; Tao, H.; et al. Quantification of chlorophyll-a in typical lakes across china using Sentinel-2 MSI imagery with machine learning algorithm. Sci. Total Environ. 2021, 778, 146271.
  8. He, Y.; Gong, Z.; Zheng, Y.; Zhang, Y. Inland Reservoir Water Quality Inversion and Eutrophication Evaluation Using BP Neural Network and Remote Sensing Imagery: A Case Study of Dashahe Reservoir. Water 2021, 13, 2844.
  9. Chen, Y.-P.; Fu, B.-J.; Zhao, Y.; Wang, K.-B.; Zhao, M.M.; Ma, J.-F.; Wu, J.-H.; Xu, C.; Liu, W.-G.; Wang, H. Sustainable development in the Yellow River Basin: Issues and strategies. J. Clean. Prod. 2020, 263, 121223.
  10. Li, N.; Zhang, Y.; Shi, K.; Zhang, Y.; Sun, X.; Wang, W.; Huang, X. Monitoring water transparency, total suspended matter and the beam attenuation coefficient in inland water using innovative ground-based proximal sensing technology. J. Environ. Manag. 2022, 306, 114477.
  11. Zhou, Y.; He, B.; Fu, C.; Xiao, F.; Feng, Q.; Liu, H.; Zhou, X.; Yang, X.; Du, Y. An improved Forel–Ule index method for trophic state assessments of inland waters using Landsat 8 and sentinel archives. GIScience Remote Sens. 2021, 58, 8.
  12. Zhou, B.T.; Zhang, Y.Y.; Shi, K. Research progress on remote sensing assessment of lake nutrient status and retrieval algorithms of characteristic parameters. Natl. Remote Sens. Bull. 2022, 26, 77–91.
  13. Sagan, V.; Peterson, K.T.; Maimaitijiang, M.; Sidike, P.; Sloan, J.; Greeling, B.A.; Maalouf, S.; Adams, C. Monitoring inland water quality using remote sensing: Potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing. Earth-Sci. Rev. 2020, 205, 103187.
  14. Niu, C.; Tan, K.; Jia, X.; Wang, X. Deep learning based regression for optically inactive inland water quality parameter estimation using airborne hyperspectral imagery. Environ. Pollut. 2021, 286, 117534.
  15. Liu, H.; He, B.; Zhou, Y.; Yang, X.; Zhang, X.; Xiao, F.; Feng, Q.; Liang, S.; Zhou, X.; Fu, C. Eutrophication monitoring of lakes in Wuhan based on Sentinel-2 data. GIScience Remote Sens. 2021, 58, 776–798.
  16. Pahlevan, N.; Smith, B.; Alikas, K.; Anstee, J.; Barbosa, C.; Binding, C.; Bresciani, M.; Cremella, B.; Giardino, C.; Gurlin, D.; et al. Simultaneous retrieval of selected optical water quality indicators from Landsat-8, Sentinel-2, and Sentinel-3. Remote Sens. Environ. 2022, 270, 112860.
  17. Qin, J.; Yang, W.; Zhang, J. Assessment of Ecosystem Water Conservation Value in the Upper Minjiang River, Sichuan, China. Chin. J. Appl. Environ. Biol. 2009, 15, 453–458.
  18. Tian, S.; Guo, H.; Xu, W.; Zhu, X.; Wang, B.; Zeng, Q.; Mai, Y.; Huang, J.J. Remote sensing retrieval of inland water quality parameters using Sentinel-2 and multiple machine learning algorithms. Environ. Sci. Pollut. Res. Int. 2022, 30, 7.
  19. Zhao, Y.; Yu, T.; Hu, B.; Zhang, Z.; Liu, Y.; Liu, X.; Liu, H.; Liu, J.; Wang, X.; Song, S. Retrieval of Water Quality Parameters Based on Near-Surface Remote Sensing and Machine Learning Algorithm. Remote Sens. 2022, 14, 5305.
  20. Zhang, H.J.; Wang, B.; Zhou, J.; Yu, Y.; Ke, S.; Huang, F.K. Remote sensing retrieval of inland river water quality based on BP neural network. J. Cent. China Norm. Univ. (Nat. Sci.) 2022, 56, 333–341.
  21. Li, X.B.; Chen, C.Q.; Shi, P.; Li, X. Retrieval of total inorganic nitrogen concentration in pearl river estuary by remote sensing. Acta Sci. Circumstantiae 2007, 27, 313–318.
  22. Dong, G.; Hu, Z.; Liu, X.; Fu, Y.; Zhang, W. Spatio-Temporal Variation of Total Nitrogen and Ammonia Nitrogen in the Water Source of the Middle Route of the South-To-North Water Diversion Project. Water 2020, 12, 2615.
  23. Zhang, Y.; Jin, S.; Wang, N.; Zhao, J.; Guo, H.; Pellikka, P. Total Phosphorus and Nitrogen Dynamics and Influencing Factors in Dongting Lake Using Landsat Data. Remote Sens. 2022, 14, 5648.
  24. Li, H.; Zhang, G.; Zhu, Y.; Kaufmann, H.; Xu, G. Inversion and Driving Force Analysis of Nutrient Concentrations in the Ecosystem of the Shenzhen-Hong Kong Bay Area. Remote Sens. 2022, 14, 3694.
  25. Xiong, J.; Lin, C.; Ma, R.; Cao, Z. Remote Sensing Estimation of Lake Total Phosphorus Concentration Based on MODIS: A Case Study of Lake Hongze. Remote Sens. 2019, 11, 2068.
  26. Yaroshenko, I.; Kirsanov, D.; Marjanovic, M.; Lieberzeit, P.A.; Korostynska, O.; Mason, A.; Frau, I.; Legin, A. Real-Time Water Quality Monitoring with Chemical Sensors. Sensors 2020, 20, 3432.
  27. Wang, X.Y. Correlation Analysis between the Concentrations of Nutrient and Phosphorus in Eutrophic Water and Its Spectral Reflectance. J. Hangzhou Norm. Univ. (Nat. Sci. Ed.) 2009, 8, 453–456.
  28. Zhang, L.F.; Peng, M.Y.; Sun, X.J.; Cen, Y.; Tong, Q.X. Progress and bibliometric analysis of remote sensing data fusion methods (1992–2018). J. Remote Sens. 2019, 23, 603–619.
  29. Chen, Z.; Zhao, D.; Li, M.; Tu, W.; Liu, X. A field study on the effects of combined biomanipulation on the water quality of a eutrophic lake. Environ. Pollut. 2020, 265 Pt A, 115091.
  30. Xu, W. A Review on Correlation Coefficients. J. Guangdong Univ. Technol. 2012, 29, 12–17.
  31. Ding, S.; Qi, B.; Tan, H. An Overview on Theory and Algorithm of Support Vector Machines. J. Univ. Electron. Sci. Technol. China 2011, 40, 2–10.
  32. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
  33. Yan, G.; Zhu, Y. Parameters Selection Method for Support Vector Machine Regression. Comput. Eng. 2009, 35, 218–220.
  34. Zhang, C.; Guo, Y.; Li, M. Review of Development and Application of Artificial Neural Network Models. Comput. Eng. Appl. 2021, 57, 57–69.
  35. Tanaka, T.; Moriya, T.; Shinozaki, T.; Watanabe, S.; Hori, T.; Duh, K. Evolutionary optimization of long short-term memory neural network language model. J. Acoust. Soc. Am. 2016, 140, 3062.
  36. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  37. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
  38. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  39. Fang, K.; Wu, J.; Zhu, J.; Xie, B. A Review of Technologies on Random Forests. Stat. Inf. Forum 2011, 26, 32–38.
  40. Hashim, F.A.; Hussien, A.G. Snake Optimizer: A novel meta-heuristic optimization algorithm. Knowl.-Based Syst. 2022, 242, 108320.
  41. Qiao, Z.; Sun, S.; Jiang, Q.O.; Xiao, L.; Wang, Y.; Yan, H. Retrieval of Total Phosphorus Concentration in the Surface Water of Miyun Reservoir Based on Remote Sensing Data and Machine Learning Algorithms. Remote Sens. 2021, 13, 4662.
  42. Refaeilzadeh, P.; Tang, L.; Liu, H. Cross-validation. Encycl. Database Syst. 2009, 5, 532–538.
  43. Wu, A.; Hu, X.; Wang, F.; Guo, C.; Wang, H.; Chen, F.-S. Nitrogen deposition and phosphorus addition alter mobility of trace elements in subtropical forests in China. Sci. Total Environ. 2021, 781, 146778.
  44. Lei, C.; Deng, J.; Cao, K.; Ma, L.; Xiao, Y.; Ren, L. A random forest approach for predicting coal spontaneous combustion. Fuel 2018, 223, 63–73.
  45. Chen, Y.; Yang, P.; Xiang, Q.; Yu, H. Evaluation of the Water Quality in Minjiang River and Analysis on its Variation Trend. Environ. Monit. China 2015, 31, 53–57.
  46. Schaffner, M.; Bader, H.P.; Scheidegger, R. Modeling the contribution of point sources and non-point sources to Thachin River water pollution. Sci. Total Environ. 2009, 407, 4902–4915.
  47. Xu, H.; Gao, Q.; Yuan, B. Analysis and identification of pollution sources of comprehensive river water quality: Evidence from two river basins in China. Ecol. Indic. 2022, 135, 108561.
  48. Lu, D.; Ren, Y.; Shi, Y.; Zhang, C.; Li, S.; Zhu, C.; Zhou, Y. Time-space Variation Characteristics of Waters Eutrophication in Malodorous Yuxi River. Environ. Sci. Technol. 2015, 38, 129–134.
  49. Zheng, L.W.; Zhai, W. Excess nitrogen in the Bohai and Yellow seas, China: Distribution, trends, and source apportionment. Sci. Total Environ. 2021, 794, 148702.
  50. Tian, H.Y.; Tong, L.; Yu, G.A.; Arika, B. Relationship between water quality and land use at different spatial scales: A case study of the Mun River basin, Thailand. J. Agro-Environ. Sci. 2020, 39, 2036–2047.
  51. Zhang, Z.; Chen, Y.; Wang, P.; Shuai, J.; Tao, F.; Shi, P. River discharge, land use change, and surface water quality in the Xiangjiang River, China. Hydrol. Process. 2014, 28, 4130–4140.
  52. Xiang, S.; Pang, Y.; Dou, J.S.; Lü, X.J.; Xue, L.Q.; Chu, Z.S. Impact of land use on the water quality of inflow river to Erhai Lake at different temporal and spatial scales. Acta Ecol. Sin. 2018, 38, 876–885.
Figure 1. Location of the study area and distribution of monitoring sections.
Figure 2. Correlation of TN, TP and single band.
Figure 3. Performance evaluation of TN, TP inversion models based on all experimental data for SVR (a,b), NN (c,d) and RF (e,f).
Figure 4. Spatial distribution of TN concentration.
Figure 5. Spatial distribution of TP concentration.
Figure 6. Distribution of TN and TP concentrations at each monitoring section. TN in dry season (a), TN in wet season (b), TP in dry season (c), TP in wet season (d).
Figure 7. Characteristics of the temporal distribution of TN and TP concentrations. Distribution and trend of all water-quality monitoring data (a), the annual variation in median water-quality monitoring data (b), temporal variation in each monitoring section (c).
Figure 8. Discharge of pollutants into the main streams and important tributaries in the study area. TP emissions (a), NH3-N emissions (b), COD emissions (c).
Figure 9. Riparian buffer-zone land use in the study area in 2018. Land-use map of the 5 km buffer zone along the main stream banks (a), different buffer-zone land-use types and percentages (b), land-use type legend (c) (percentages in (b), with the numerator being the area of each type of land and the denominator being the total area).
Table 1. Model performance evaluation indicators.
| Evaluation Indicators | Formula | Indicator Description | Features |
|---|---|---|---|
| MSE | $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$ | The squares of the deviations between the predicted and true values are summed and then averaged. | The smaller the value, the higher the accuracy of the model. |
| RMSE | $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$ | The square root of MSE. | The smaller the value, the higher the accuracy of the model. |
| MAE | $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert\hat{y}_i - y_i\right\rvert$ | The absolute values of the deviations of the predicted values from the true values are summed and then averaged. | The smaller the value, the higher the accuracy of the model. |
| R2 | $R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(\bar{y} - y_i\right)^2}$ | Reflects how accurately the model fits the data. | The closer the result is to 1, the more accurate the model is. |

where $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the true values, $y_i$ is the true value, and $n$ is the sample size.
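The four indicators in Table 1 can be computed directly from paired true and predicted values. The sketch below follows the table's definitions; the function name `regression_metrics` and the sample values are illustrative assumptions and are not taken from the study's data or code.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 for paired true and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    residuals = [p - t for p, t in zip(y_pred, y_true)]
    ss_res = sum(r * r for r in residuals)           # sum of squared deviations
    mse = ss_res / n
    rmse = math.sqrt(mse)                            # square root of MSE
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    r2 = 1 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

# Illustrative values only (not measurements from the study).
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

Because RMSE is the square root of MSE, the two always rank models identically; MAE is less sensitive to a few large errors, so it is useful to report alongside them.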

Share and Cite

MDPI and ACS Style

Tan, Z.; Ren, J.; Li, S.; Li, W.; Zhang, R.; Sun, T. Inversion of Nutrient Concentrations Using Machine Learning and Influencing Factors in Minjiang River. Water 2023, 15, 1398. https://doi.org/10.3390/w15071398

