Article

Prediction of Daily Water Consumption in Residential Areas Based on Meteorologic Conditions—Applying Gradient Boosting Regression Tree Algorithm

School of Environmental Science and Engineering, Tianjin University, Tianjin 300350, China
*
Authors to whom correspondence should be addressed.
Water 2023, 15(19), 3455; https://doi.org/10.3390/w15193455
Submission received: 5 September 2023 / Revised: 27 September 2023 / Accepted: 28 September 2023 / Published: 30 September 2023
(This article belongs to the Special Issue Green and Low Carbon Development of Water Treatment Technology)

Abstract

More accurate water consumption forecasting can help water utilities develop better-targeted scheduling plans for waterworks; therefore, this paper aims to establish a forecast model of daily water consumption based on meteorological conditions. At present, most studies of daily water consumption forecasting focus on historical data or on a single influencing factor, although daily water consumption can be influenced by several meteorological conditions at once. The influence of complex meteorological factors on water consumption is analyzed with a gradient-boosted regression tree (GBRT) model. The correlations of 10 meteorological factors are discussed, and the factors are divided into 5 categories: temperature, pressure, precipitation, sunshine, and wind. With the GBRT algorithm, the daily water consumption of a residential area can be predicted with a maximum error of ±8%. The results show that the average ground temperature (81% of the total feature importance) has the greatest impact on the daily water consumption of the residential community, followed by the somatosensory temperature (7% of the total feature importance). The method can supply more precise daily water consumption values for the consumption nodes of a municipal water supply network model, improving model accuracy. It also provides a reference for water utility operation schemes and urban development planning.

1. Introduction

Water consumption forecasting is an important task for the development and operation of water distribution networks (WDNs), and this paper mainly discusses the factors that influence daily water consumption in WDNs and the methods used to forecast it. By forecasting daily water consumption, a water utility can better plan its water reserves and allocation to balance supply and consumption. This helps to avoid situations of insufficient or excessive water supply, thus ensuring the reliability and quality of the water supply. In addition, accurate water consumption prediction improves the accuracy of the WDN model, which is commonly used by water utilities as a critical tool for developing more efficient operation plans. Accurate forecasts therefore help to reduce waste and losses, improving the efficiency and sustainability of the water supply while lowering its cost and providing residents with more affordable water services [1].
The advancement of smart metering technology enables the collection of more extensive water consumption data, thus holding substantial promise for enhancing daily water consumption prediction [2]. In the context of our society, the challenges posed by rapid population growth and urbanization have underscored the critical importance of effective water usage monitoring. The recent emergence of the Internet of Things (IoT) concept has opened up new avenues for the development of water-efficient smart devices, systems, and applications tailored to the needs of buildings and cities [3]. Notably, Parra Lorenad et al. [4] have introduced an integrated IoT architecture encompassing electricity, water, and gas smart metering, offering benefits to both consumers and utility providers. Moreover, Caldognetto N et al. [5] have proposed a comprehensive smart environmental data metering solution with capabilities for data collection, processing, and over-the-air (OTA) communication for IoT devices, with the overarching aim of mitigating water losses and ensuring the monitoring of water quality. These advancements collectively establish a robust research background for the study of water consumption forecast.
Zhang et al. established a short-term daily water consumption prediction model based on time factors and previous water consumption to scientifically predict urban daily water consumption [6]. Pacchin et al. established a model for predicting water consumption within a 24 h time window based on previous water consumption, which uses only one pair of coefficients whose values are updated at each prediction step [7]. Bruntan et al. proposed three methods of generating comprehensive water consumption time series from observation data based on time series, thereby producing the final consumption in different ways [8]. Roper et al. used weather factors as variables to analyze water consumption in the Boston metropolitan area and found that weather factors had a significant impact on water consumption [9].
Various data analysis methods have been used in water consumption prediction in recent years. The research is mainly focused on the following two types of models: traditional statistical models and machine learning models. Among the traditional statistical models, general multivariable linear regression (GMLR) models [10], copula-based multivariate analysis methods [11], and multidimensional contingency tables [12] have been applied to predict daily water consumption.
In terms of machine learning models, Huang et al. developed an ensemble learning-based method to predict the short-term water consumption, which improved the accuracy and stability of water consumption prediction [6]. Fiorillo et al. used a random forest model based on weather variables to establish the relationship between water consumption and weather, and the results showed that the total regional water consumption may increase by 9–10% in the hottest weeks [13]. Haque et al. used the Monte Carlo simulation to quantify the uncertainty of long-term water consumption prediction caused by the randomness and related structure of prediction variables [14].
From the analysis of factors affecting water consumption prediction, the following problems still exist: Firstly, the prediction of daily water consumption in water distribution networks requires consideration of many factors, such as meteorological conditions, historical water usage data, etc. The acquisition and updating of these factors pose certain difficulties, and there are errors and uncertainties in the data. Secondly, residential water consumption is influenced by various factors such as weather, seasonal changes, policies, cultural practices, and hotspot events. These factors operate collectively to influence the daily water consumption rather than independently. Moreover, these factors interrelate with one another, leading to complex interactions that impact water consumption patterns. Therefore, it is necessary to conduct a more in-depth analysis of meteorological factors and their contributions to water consumption.
Numerous scholars have dedicated their efforts to investigating the influence of urban models and urban planning on water consumption. Li et al. [15] employed CPMBNIP to predict urban water consumption in six lower-tier cities in southern China, utilizing historical water consumption data spanning from 1965 to 2004 to forecast the period from 2005 to 2013. This research contributes valuable insights into urban water consumption prediction under changing environmental conditions. Mousavi et al. [16] developed a Bayesian network (BN) model as a probabilistic approach and compared it with the gene expression programming (GEP) model, an evolutionary algorithm, for forecasting urban water consumption (UWC). Their findings suggest that sunshine hours exert a significant influence on UWC, and the BN model’s predictive capability is substantially enhanced by incorporating this predictor, especially in the context of a city located in an arid region experiencing rapid population growth. Wang et al. [17] utilized annual urban domestic water consumption data from Beijing, Chongqing, and Qingdao to introduce the Kernel Density Estimation-Fractional Order Reverse Accumulative Gray Model. Their results demonstrate that this proposed model outperforms others in predicting the urban domestic water consumption. This model serves as a potent decision support tool for addressing regional urban water consumption forecasting challenges within the water source sector. Gao et al. [18] have proposed a complex system for urban water management based on deep neural networks known as UWM-Id. Extensive experimentation has substantiated UWM-Id’s commendable performance, establishing its suitability for utilization within urban water management systems.
The existing methods have various limitations that can be categorized into the following areas. Traditional statistical models have limitations due to their inability to deal with large datasets and complex structures common in the era of big data. To overcome these limitations, more flexible methods need to be employed. Furthermore, machine learning models have insufficient generalization ability to handle unforeseen data, which is crucial for accurately predicting water consumption in practical applications. Traditional statistical models and machine learning-based methods suffer from overfitting or underfitting.
The present paper proposes a methodology to enhance the accuracy of daily water consumption forecasting. Based on a comprehensive analysis of meteorological factors, a daily water consumption prediction method is proposed using the gradient boosting regression tree (GBRT) model. GBRT is a highly interpretable machine learning algorithm: from the meteorological factors and the daily water consumption, it allows the contribution rate (feature importance) of each factor to be calculated. Calculating the importance of each factor also addresses the feature engineering problem. The GBRT model is used to analyze the impact of the factors on daily water consumption, and the model’s hyperparameters are optimized to mitigate the overfitting commonly seen in machine learning algorithms. The maximum error of this model in the prediction of daily water consumption is ±8%.

2. Materials and Methods

This study is grounded in meteorological factors, including air temperature, ground temperature, humidity, air pressure, wind speed, etc., as well as historical water consumption data in residential areas. Correlation analysis was employed to ascertain the factors exerting the most substantial influence on daily water consumption in residential areas. Finally, the association between these factors and water consumption is quantified utilizing GBRT.
The methodology employed in this study includes the following steps:
(1) Extraction and preprocessing of the meteorological data and the historical water consumption data;
(2) Correlation analysis between the historical water consumption and the meteorological data, to select the factors that correlate strongly with water consumption;
(3) Application of the GBRT model to characterize the relationship between the selected factors and water consumption, with a genetic algorithm (GA) used to optimize the hyperparameters of the GBRT model.
Figure 1 provides a visual representation of the general approach.
This chapter primarily consists of the following three sections: 1. data acquisition and preprocessing; 2. analysis of meteorological factors affecting water consumption; 3. gradient-boosted regression tree (GBRT).
In the first section, we provide an overview of the methods employed for the acquisition and preprocessing of meteorological data and water consumption data. The second section focuses on the methodology used to analyze the correlation between meteorological factors and water consumption. The third section delves into the fundamental concepts of the gradient-boosted regression tree (GBRT) model and outlines the steps involved in constructing this model.

2.1. Data Acquisition and Preprocessing

2.1.1. Meteorological Data

The meteorological data used in this study were extracted from “China Daily Value Dataset of Surface Meteorology Data (V3.0)” on the website of China National Meteorological Information Center (https://data.cma.cn/, accessed on 20 September 2023). The dataset consists of coded records produced by direct observation at meteorological stations across China, so the authenticity of the data is high; however, the data must be translated (decoded) and integrated before use.
The city involved in this study is located in northeastern China, so the Yongji monitoring station (station number 54171) was selected as its meteorological information source. The collated data contain 21 meteorological indicators, which are shown in Table 1.

2.1.2. Water Consumption Data

The daily water consumption (DWC) data for this study were obtained from a district metering area (DMA) system in a city in northeast China. The monitored data comprise the instantaneous flow rate, recorded at an interval of 1 min, so each day contains 1440 instantaneous flow rate data points. The daily water consumption is calculated from these instantaneous flow rates. The entire dataset covers March 2016 to March 2019. Figure 2a illustrates the distribution of water consumption data within one day (one cycle), which comprises 1440 instantaneous flow rate data points. The complete dataset comprises 1095 such cycles.
Daily water consumption is calculated in this manner because malfunctions of the collection device, such as power outages, can cause data loss, which leads to under-reporting and therefore to inaccurate totals. In addition, part of the monitoring data is abnormal owing to the monitoring equipment itself and to the monitoring conditions. If the analysis is to remain truthful and consistent with reality, processing and repairing the abnormal data is particularly important.
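The aggregation from 1 min flow readings to a daily total can be illustrated with a minimal sketch. It assumes that the instantaneous flow rate is expressed in m³/h (the unit is not stated in this section) and that the readings are held in a pandas Series indexed by timestamp; the function name `daily_consumption` and the demonstration values are hypothetical.

```python
import numpy as np
import pandas as pd


def daily_consumption(flow: pd.Series) -> pd.Series:
    """Integrate 1-min instantaneous flow (assumed m^3/h) into daily volume (m^3)."""
    # Each reading is taken to represent one minute, i.e. 1/60 of an hour.
    volume_per_minute = flow / 60.0
    # Sum the per-minute volumes over each calendar day.
    return volume_per_minute.resample("D").sum()


# Demonstration with a constant, made-up flow of 6 m^3/h over two days.
index = pd.date_range("2016-03-01", periods=2 * 1440, freq="min")
flow = pd.Series(6.0, index=index)
daily = daily_consumption(flow)
```

Because `sum` skips missing readings, a day with lost data yields a total that is too low, which is the under-reporting problem described above and the reason the data are cleaned and interpolated first.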
Regarding the identification of invalid monitoring data, the basic idea is to find the data that obviously do not conform to the pattern of water use. Such data generally fall into the following three types: missing data, duplicated data, and data fluctuations that clearly exceed the threshold. A typical example in the water consumption data is shown in Figure 2a.
Data loss is indicated by empty values for the flow rate recorded at the respective time stamp. The reasons for this error include packet loss during data transmission over the network, power failure of the data acquisition equipment, etc. This kind of error is relatively simple to identify: it is enough to find the null values in the data column. Duplicate data are identified when the same data are repeatedly recorded within the same time frame. The reasons for this type of error include an abnormal status in the data register or errors in the memory address incrementing process [19]. The identification of anomalous data is carried out following the method proposed by Z.X. Li et al. [20] in the paper ‘The Treatment Method of Abnormal Data in Water Consumption Monitoring in Construction Zone’.
Following the removal of abnormal data, missing data are interpolated to restore the water consumption variations within the community. The hot deck interpolation method is used, which follows the statistical pattern of the data to ensure accurate interpolation. This data interpolation method was proposed by Li et al. [20] in 2019. The water consumption data after data cleaning and interpolation are shown in Figure 2b. Comparing Figure 2a,b shows that data processing removed the noise characterized by data fluctuations well above the normal threshold for water consumption. Furthermore, the trend of the interpolated data was consistent with that of the original data, and its fluctuation pattern followed the probability distribution of the original data.
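The cleaning steps can be sketched as follows. This is not the procedure of Li et al. [20], whose details are not reproduced in this section; it is a simplified illustration under stated assumptions: duplicates are rows with a repeated time stamp, the threshold is a fixed upper bound (`upper_threshold`, a hypothetical parameter), and the hot deck donor for a gap is a valid reading taken at the same minute of the day on another day, chosen at random.

```python
import numpy as np
import pandas as pd


def clean_flow(df: pd.DataFrame, upper_threshold: float, seed: int = 0) -> pd.DataFrame:
    """Clean a table with columns 'time' (datetime) and 'flow' (float)."""
    # 1. Duplicates: keep the first record for each time stamp.
    df = (df.drop_duplicates(subset="time", keep="first")
            .sort_values("time")
            .reset_index(drop=True))
    # 2. Readings above the threshold are treated as invalid (set to NaN);
    #    readings that were already missing stay NaN.
    flow = df["flow"].where(df["flow"] <= upper_threshold)
    # 3. Hot deck fill (assumed donor rule): copy a valid reading observed
    #    at the same minute of the day on another day.
    minute = df["time"].dt.hour * 60 + df["time"].dt.minute
    rng = np.random.default_rng(seed)
    filled = flow.copy()
    for i in np.flatnonzero(flow.isna().to_numpy()):
        donors = flow[(minute == minute.iloc[i]) & flow.notna()]
        if len(donors) > 0:
            filled.iloc[i] = rng.choice(donors.to_numpy())
    df["flow"] = filled
    return df


# Made-up three-day record with one gap, one spike, and one duplicated row.
times = pd.date_range("2016-03-01", periods=3 * 1440, freq="min")
raw = pd.DataFrame({
    "time": times,
    "flow": 5.0 + np.sin(np.arange(len(times)) / 1440 * 2 * np.pi),
})
raw.loc[100, "flow"] = np.nan
raw.loc[2000, "flow"] = 500.0
raw = pd.concat([raw, raw.iloc[[10]]], ignore_index=True)
cleaned = clean_flow(raw, upper_threshold=50.0)
```

The sketch only repairs rows that are present in the table; time stamps that are absent altogether would first have to be reinserted as empty rows.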

2.2. Analysis of Meteorological Factors Affecting Water Consumption

Various meteorological factors significantly affect the daily water consumption behavior of residents. This section first discusses the impact of several meteorological factors that have not been widely used in short-term water consumption forecasting. These factors include somatosensory temperature, wind speed, air pressure, sunshine duration, wind direction, and ground temperature.

2.2.1. Meteorological Factors

Previous studies have shown that meteorological factors affect the use of air conditioners in summer and electric heaters in winter [21]. This fluctuation in electrical load can be attributed to weather changes altering human comfort, an issue that has been studied extensively in the power grid field. By the same reasoning, changes in meteorological conditions change human water use, which can be analyzed and predicted from the same meteorological factors. Wind speed, air pressure, and sunshine duration also affect water usage by influencing human perception.
Wind direction can influence the water use behavior of residents. The geographical location of the city is fixed, and the geographical features around it do not change substantially over a short period. Therefore, wind blowing from different directions produces different subjective sensations and has different effects on people.
Ground temperature is one of the important indicators for characterizing soil thermal properties. It represents the transformation of solar radiation absorbed by the soil surface into soil thermal energy and transferred to deeper layers [22]. The change in ground temperature is more conservative and lagging than that of air temperature. There are seasonal changes in the thermal energy at deeper soil depths, influenced by solar radiation and changes in internal heat flow within the earth. Ground temperature can reflect the impact of historical climate and is closer to the temperature inside buildings without air conditioning systems, where human production and life take place. Therefore, ground temperature also has an impact on water usage.

2.2.2. Composite Meteorological Factor

The somatosensory temperature is different from the temperature measured by the meteorological station, and the change is more complicated because of the comprehensive influence of air temperature, air humidity, wind speed, and other conditions [21].
Two methods are widely used for calculating the somatosensory temperature. The first is the “General Formula for Somatosensory Temperature” (RGST) proposed by Robert G. Steadman, which incorporates air temperature, water vapor pressure, wind speed, and relative humidity. Wind speed, air pressure, and sunshine duration also affect water consumption by affecting human perception [23].
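The exact RGST expression is not reproduced in this section. As an illustration only, the sketch below uses the non-radiation form of Steadman's apparent temperature as it is commonly quoted (for example, by the Australian Bureau of Meteorology); it may differ from the formula applied by the authors. The function name and the example inputs are hypothetical.

```python
import math


def apparent_temperature(air_temp_c: float, rel_humidity_pct: float,
                         wind_speed_ms: float) -> float:
    """Steadman-type apparent temperature (deg C), non-radiation version.

    Assumed inputs: air temperature in deg C, relative humidity in percent,
    and wind speed in m/s.
    """
    # Water vapor pressure (hPa) from relative humidity and air temperature.
    vapor_pressure = (rel_humidity_pct / 100.0) * 6.105 * math.exp(
        17.27 * air_temp_c / (237.7 + air_temp_c)
    )
    # Humidity raises and wind lowers the perceived temperature.
    return air_temp_c + 0.33 * vapor_pressure - 0.70 * wind_speed_ms - 4.00


# Example: a warm, moderately humid day with a light breeze.
at = apparent_temperature(30.0, 50.0, 2.0)
```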
In addition, meteorological factors such as wind speed and humidity affect the human somatosensory temperature differently in different temperature ranges. To reflect this, some scholars have proposed a segmented somatosensory temperature (SST) formula, which includes temperature, humidity, and wind speed [24].
Which of the two methods is more reasonable for predicting daily water consumption is discussed in the next section.
Correlation analysis is employed to examine the association between meteorological factors and daily water consumption. Based on the degree of correlation, the driving factors of water consumption used in the GBRT model are identified. The correlation coefficient quantitatively measures the relationship between each meteorological factor and daily water consumption. Meteorological factors and water consumption are represented by X and Y, respectively, and their correlation coefficient is denoted by $\rho_{X,Y}$, which is calculated as follows:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$
$\rho_{X,Y} > 0$ means positive correlation; $\rho_{X,Y} < 0$ means negative correlation. The closer the absolute value of $\rho_{X,Y}$ is to 1, the stronger the linear correlation between X and Y; the closer it is to 0, the weaker the linear correlation. If $0 < |\rho_{X,Y}| < 1$, X and Y are correlated. If $|\rho_{X,Y}| \geq 0.8$, the correlation is regarded as very strong; if $0.6 < |\rho_{X,Y}| < 0.8$, it is regarded as strong; if $0.2 < |\rho_{X,Y}| < 0.6$, it is regarded as moderate; and if $|\rho_{X,Y}| < 0.2$, it is regarded as weak [25].
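The correlation coefficient and the strength classes can be checked numerically with a short sketch. The function names are hypothetical, and because the text leaves the exact boundary values 0.2 and 0.6 unassigned, the sketch assumes that each boundary belongs to the stronger class.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient computed directly from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def correlation_strength(r: float) -> str:
    """Map |r| to the verbal classes used in the text.

    Assumption: the boundary values 0.2 and 0.6 count toward the stronger class.
    """
    a = abs(r)
    if a >= 0.8:
        return "very strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.2:
        return "moderate"
    return "weak"


# Made-up example: a factor that rises almost linearly with consumption.
factor = [1.0, 2.0, 3.0, 4.0, 5.0]
consumption = [10.1, 12.0, 13.9, 16.2, 18.0]
r = pearson_r(factor, consumption)
label = correlation_strength(r)
```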

2.3. Gradient Boosted Regression Tree

2.3.1. Algorithm Overview

Gradient boosting is a machine learning technique used for regression, classification, and ranking tasks, and is part of the boosting family of algorithms [26]. At each iteration, the gradient boosting algorithm builds a learner that reduces the loss along the direction of steepest descent, compensating for the deficiencies of the existing model. The classic AdaBoost algorithm can only handle two-class learning tasks using exponential loss functions, while the gradient boosting method can handle various learning tasks (multi-classification, regression, ranking, etc.) by setting different differentiable loss functions, which greatly expands the scope of application of the algorithm [27]. Each new learner is fitted to the negative gradient of the loss function, which serves as the residual to be fitted. If the base learner is a regression tree, a gradient boosting regression tree (GBRT) is obtained.

2.3.2. Input and Output

The algorithm takes into account input data consisting of meteorological and historical daily water consumption data. Initially, the meteorological data are correlated with historical daily water consumption data to identify the types of meteorological data that have a strong correlation with the historical daily water consumption. The selected meteorological data and pre-processed historical daily water consumption data are then fed into the model for processing.
The output of the model consists of two parts. The first part is the mapping relationship between meteorological data and historical daily water consumption data. This relationship can be utilized to forecast daily water consumption. In other words, the daily meteorological data are entered into the mapping relationship to compute the daily water consumption of the day. The second part of the model output is the significance of each meteorological factor during the prediction process, allowing factors to be categorized by their impact on daily water consumption.

2.3.3. Algorithmic Flow

The expression of the GBRT prediction function is as follows:
$$F(x;\omega) = \sum_{t=0}^{T} \rho_t h_t(x;\omega_t) = \sum_{t=0}^{T} f_t(x;\omega_t)$$
$x$: input sample;
$h_t$: the t-th regression tree;
$\omega$: regression tree parameters;
$\omega_t$: parameters of the t-th regression tree;
$\rho_t$: weight of the t-th regression tree.
The initial predicted value is set to the global mean, that is, the average of all daily water consumption values. Iterative training then proceeds as follows:
  (a) Compute the residual between the current predicted value and the actual daily water consumption as the target for the next round of training;
  (b) Construct a decision tree with the current target value as the label of daily water consumption and divide it into left and right subtrees based on meteorological factors;
  (c) Use the strategy of minimizing the regression loss function (mean square error) to determine the best partition feature and threshold for each node;
  (d) Allocate the samples to the corresponding subtree based on the partition result;
  (e) Calculate the optimal output value for each leaf node;
  (f) Update the current predicted value of daily water consumption by adding the prediction of the new decision tree multiplied by a learning rate;
  (g) Repeat steps (a)–(f) to generate more decision trees.
Termination condition: the preset number of trees or the maximum number of iterations is reached, or the change in the residual is small.
Output: The prediction results of all generated decision trees are accumulated as the final predicted value. A schematic view of the GBRT operation is shown in Figure 3.
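The training loop can be sketched from scratch for the squared-error case, where the negative gradient equals the residual. This is an illustrative re-implementation, not the authors' code: it relies on scikit-learn's `DecisionTreeRegressor` for the tree-building steps (choosing split features and thresholds, allocating samples, and setting leaf values), stops only on a fixed number of trees, and uses made-up data and hypothetical function names.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gbrt(X, y, n_trees=50, learning_rate=0.1, max_depth=3):
    """Fit a squared-error gradient boosting model and record the training MSE."""
    y = np.asarray(y, dtype=float)
    init = float(y.mean())                      # initial prediction: global mean
    pred = np.full(len(y), init)
    trees = []
    mse_history = [float(np.mean((y - pred) ** 2))]
    for _ in range(n_trees):
        residual = y - pred                     # target for the next tree
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)                   # splits chosen by minimum MSE
        pred = pred + learning_rate * tree.predict(X)   # shrunken update
        trees.append(tree)
        mse_history.append(float(np.mean((y - pred) ** 2)))
    model = {"init": init, "trees": trees, "learning_rate": learning_rate}
    return model, mse_history


def predict_gbrt(model, X):
    """Accumulate the initial value and the shrunken outputs of all trees."""
    pred = np.full(np.asarray(X).shape[0], model["init"])
    for tree in model["trees"]:
        pred = pred + model["learning_rate"] * tree.predict(X)
    return pred


# Made-up data: consumption rising with the first of three weather factors.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-10, 30, size=(300, 3))
y_demo = 100 + 2 * X_demo[:, 0] + rng.normal(0, 1, size=300)
model, mse_history = fit_gbrt(X_demo, y_demo)
y_hat = predict_gbrt(model, X_demo)
```

Plotting `mse_history` against the iteration number shows how each added tree lowers the training error, which is also why the number of trees and the learning rate have to be tuned together.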

2.3.4. Data Processing and Model Creation

In the process of data processing and model creation, several key steps were executed using various software libraries.
Software Used:
The data processing and modeling tasks were carried out using Python, leveraging the following libraries:
pandas: Used for data manipulation and loading;
scikit-learn (sklearn): Employed for machine learning tasks and model development;
numpy: Utilized for numerical operations.

Data Processing

(1) Data loading: The initial step involved loading the dataset from an external CSV file, in which the data were stored in comma-separated format;
(2) Data splitting: After loading, the data were divided into the following two main components: features (X) and the target variable (y). The first 10 columns of the dataset represented the input features (X), while the eleventh column represented the target variable (y).

Model Creation

(1) Data splitting for training and testing: The dataset was further divided into training and testing sets to assess the model’s performance. This was accomplished using the train_test_split function from scikit-learn. In this case, 90% of the data were allocated for training, and the remaining 10% for testing;
(2) Gradient boosting regressor initialization: A gradient boosting regressor model was initialized with specific hyperparameters. These hyperparameters included the choice of loss function, learning rate, the number of estimators, maximum depth of the trees, and minimum samples required to split a node;
(3) Model training: The gradient boosting regressor model was trained using the training dataset. During this phase, the model learned to map the input features (X) to the target variable (y);
(4) Model evaluation: To assess the model’s performance, both the training and testing datasets were used. The evaluation was performed by calculating the R-squared (R2) score, which measures the goodness of fit. The higher the R2 score, the better the model fits the data;
(5) Feature importance: The model was analyzed to determine the importance of each feature in making predictions. Feature importances provide insights into which input variables have the most influence on the target variable;
(6) Prediction: After training, the model was applied to the entire dataset (X) to make predictions. These predictions were used for further analysis.
In conclusion, the data processing and model creation process involved data loading, splitting, initializing a gradient boosting regressor model, training, evaluating model performance, assessing feature importance, and generating predictions. Python, along with the pandas and scikit-learn libraries, facilitated these tasks, enabling data-driven insights and predictions based on the provided dataset.

2.3.5. Algorithm Parameter

Hyperparameter optimization refers to selecting a suitable set of hyperparameters from the hyperparameter space to balance the bias and variance of the model, thereby improving its performance. This study adopts the GBRT algorithm, in which the following three hyperparameters need to be tuned: n_estimators, learning_rate, and max_depth.
n_estimators is the maximum number of iterations, that is, the maximum number of weak learners. learning_rate is the weight reduction factor ν of each weak learner, also known as the step size. max_depth is the maximum depth of the regression tree, which can control overfitting, because the deeper the regression tree, the more likely it is to overfit [28]. In general, when the model has a large sample size and many features, it is recommended to limit this maximum depth. The exact value depends on the distribution of the data. The hyperparameter settings follow the rules in Table 2.
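Tuning the three hyperparameters can be illustrated with a small search. The study uses a genetic algorithm for this step; the sketch below substitutes scikit-learn's exhaustive `GridSearchCV` because it is compact and readily available, so it shows the idea of the search rather than the authors' optimizer. The candidate values and the synthetic data are hypothetical and are not taken from Table 2.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical candidate values for the three tuned hyperparameters.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# Made-up data with one linear and one nonlinear effect.
rng = np.random.default_rng(0)
X_tune = rng.normal(size=(200, 4))
y_tune = 3 * X_tune[:, 0] + X_tune[:, 1] ** 2 + rng.normal(scale=0.3, size=200)

# 3-fold cross-validation scores every combination by R2 on held-out folds,
# which penalizes settings that overfit the training folds.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X_tune, y_tune)
best_params = search.best_params_
best_score = search.best_score_
```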

3. Results

3.1. Analysis of the Importance of Meteorology Factors

To assess the impact of meteorological factors on daily water consumption (DWC), their importance must be quantified. This section studies the impact of meteorological factors on water consumption and analyzes the degree of correlation between them, in order to improve the accuracy of water consumption forecasting.

3.1.1. Streamlining and Analysis of Drivers by Correlation Analysis

The study includes the following eight categories of raw meteorological data: barometric pressure, ground temperature, air temperature, relative humidity, precipitation, wind speed, wind direction, and sunshine duration. Each category of meteorological data includes several detailed subcategories. For example, average barometric pressure, highest barometric pressure, and lowest barometric pressure are included in the barometric pressure category. Since similar data of the same type can have similar impacts on residents’ water usage behavior, including all of them in the daily water consumption model tends to cause overfitting; therefore, the correlation analysis of similar data was performed first. In this way, 21 meteorological factors were analyzed, and 8 factors were adopted.
Minimum barometric pressure (MBP), average ground temperature (AGT), highest temperature (HT), average relative humidity (ARH), daily precipitation (DP), maximum wind speed (MWS), direction of maximum wind speed (MWSD), and sunshine duration (SD) have the strongest correlation with water consumption in each category of raw meteorological data. In addition to the raw meteorological data, two types of somatosensory temperature are also included in the subsequent correlation analysis, namely, segmented somatosensory temperature (SST) and general formula for somatosensory temperature (RGST). The Pearson correlation coefficients and significance (two-tailed) of all meteorological factors and water consumption were calculated, and a heat map was drawn (Figure 4).
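Selecting one representative factor per category can be sketched as follows. The sketch assumes that the representative is the subcategory with the largest absolute Pearson coefficient against DWC, and it reports the two-tailed p-value from `scipy.stats.pearsonr`; the column names, the category grouping, and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def strongest_per_category(data: pd.DataFrame, categories: dict, target: str) -> pd.DataFrame:
    """For each category, keep the column most strongly correlated with the target."""
    rows = []
    for category, columns in categories.items():
        best = None
        for column in columns:
            r, p = pearsonr(data[column], data[target])   # p is two-tailed
            if best is None or abs(r) > abs(best["r"]):
                best = {"category": category, "factor": column,
                        "r": float(r), "p_two_tailed": float(p)}
        rows.append(best)
    return pd.DataFrame(rows)


# Made-up daily records: two ground temperature and two wind speed variants.
rng = np.random.default_rng(0)
days = 365
avg_gt = rng.normal(10, 8, days)
max_ws = rng.normal(5, 2, days)
weather = pd.DataFrame({
    "avg_ground_temp": avg_gt,
    "max_ground_temp": avg_gt + rng.normal(0, 10, days),
    "max_wind_speed": max_ws,
    "avg_wind_speed": 0.5 * max_ws + rng.normal(0, 1, days),
    "dwc": 100 + 3 * avg_gt + rng.normal(0, 5, days),
})
groups = {
    "ground temperature": ["avg_ground_temp", "max_ground_temp"],
    "wind speed": ["max_wind_speed", "avg_wind_speed"],
}
selected = strongest_per_category(weather, groups, target="dwc")
```

Applying `weather.corr()` to the retained columns would give the matrix that underlies a heat map such as Figure 4.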
The heat map shows that the correlations of SST, RGST, MBP, AGT, HT, ART, DT, and MWSD with DWC were significant at the 0.01 level (two-tailed). Among them, AGT had the strongest correlation, with a Pearson correlation coefficient of 0.930; SD and MWS had the weakest correlation with water consumption. Although the correlations of sunshine duration (SD) and maximum wind speed (MWS) with water consumption did not reach a high significance level, these two factors should not be dismissed in the subsequent gradient boosting regression analysis. A direct linear correlation between SD and DWC is not conspicuous, but a pattern is apparent in Figure 5: along the upper boundary of the data points in the scatter plot, SD tends to decrease as DWC increases. This suggests an underlying relationship between DWC and SD. A similar trend is found between MWS and DWC. Furthermore, when these two factors act on DWC together with other meteorological variables, they may improve the forecasting results. The interplay among these variables, together with the influence of the other meteorological factors on water consumption, may account for the relatively modest correlation of SD and MWS with DWC. This modest correlation is not the only reason for including the two factors in the model.
How SD and MWS contribute to the accuracy of the model is revisited in the Discussion section.
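As a rough illustration of the screening step described in this subsection, the following sketch computes the Pearson correlation coefficient for each candidate series in one category and keeps the one most strongly correlated with DWC. The data values and the helper names (`pearson_r`, `t_statistic`, `strongest_in_category`) are hypothetical and are not taken from the study. The two-tailed p-value would additionally require the t-distribution with n - 2 degrees of freedom, which is omitted here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for the null hypothesis r = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def strongest_in_category(candidates, dwc):
    """Return (name, r) of the candidate with the largest |r| against DWC."""
    scored = {name: pearson_r(values, dwc) for name, values in candidates.items()}
    best = max(scored, key=lambda name: abs(scored[name]))
    return best, scored[best]

# Hypothetical toy data: daily water consumption and three pressure series (hPa).
dwc = [310.0, 325.0, 342.0, 360.0, 371.0, 390.0]
pressure = {
    "average": [1012.0, 1010.5, 1009.0, 1008.5, 1006.0, 1005.0],
    "highest": [1015.0, 1015.5, 1012.0, 1013.0, 1011.0, 1010.5],
    "lowest": [1009.0, 1007.5, 1006.0, 1004.0, 1003.0, 1001.0],
}
name, r = strongest_in_category(pressure, dwc)
print(name, round(r, 3), round(t_statistic(r, len(dwc)), 2))
```

Ranking by the absolute value of r means that strong negative correlations, such as the one expected between pressure and consumption in this toy data, are treated the same way as strong positive ones.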

3.1.2. Discussion of SST and RGST

Both SST and RGST are highly significant with respect to water consumption, and both show a positive correlation with it. However, both are essentially indicators of human sensitivity to combined meteorological factors, so it is inappropriate to include both as influencing factors in the subsequent analysis. A choice must therefore be made between them. The scatter plots of the two indicators against water consumption (Figure 6) show that both have a strong positive correlation with water consumption, indicating that there is indeed a significant positive correlation between somatosensory temperature and water consumption. Compared with RGST, however, SST shows less dispersion. In addition, the Pearson correlation coefficient between RGST and water consumption is 0.856, which is smaller than that between SST and water consumption (0.896). Therefore, SST is used in subsequent calculations to represent people’s intuitive perception of meteorological factors.

3.2. Analysis of Daily Water Consumption Using GBRT Algorithm

The forecast model is built in six steps: (1) prepare the data; (2) establish a preliminary model using a basic decision tree structure; (3) calculate the residual error as the basis for subsequent iterations; (4) fit a new decision tree to update the model; (5) integrate the resulting models; (6) evaluate the model. If overfitting is encountered, return to step (2) and adjust the parameters.
In this study, 90% of the monitoring data were used as the training group and the remaining 10% as the test group. The hyperparameters were optimized by combining a genetic algorithm (GA) with GBRT. The Pearson correlation coefficient between the water consumption predicted by the algorithm and the actual water consumption is used as the measure of prediction accuracy.
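The boosting loop outlined above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the preliminary model is a constant (the mean of the training targets), each update is a depth-1 regression tree (a stump), the data are synthetic, and taking every tenth sample as the test group is an arbitrary choice. All function names are hypothetical.

```python
def fit_stump(X, residuals):
    """Fit a depth-1 regression tree: best (feature, threshold) by squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def predict_stump(stump, row):
    j, thr, lm, rm = stump
    return lm if row[j] <= thr else rm

def fit_gbrt(X, y, n_estimators=50, learning_rate=0.1):
    base = sum(y) / len(y)                 # preliminary model (assumed: constant mean)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [t - p for t, p in zip(y, pred)]   # residual error
        stump = fit_stump(X, residuals)                # new tree fitted to residuals
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row)
                for p, row in zip(pred, X)]            # update the model
    return base, learning_rate, stumps                 # integrated model

def predict_gbrt(model, row):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)

def mse(model, X, y):
    return sum((predict_gbrt(model, row) - t) ** 2 for row, t in zip(X, y)) / len(y)

# Hypothetical toy data: columns are [AGT, SD]; the target stands in for DWC.
X = [[1.5 * i, float((7 * i) % 5)] for i in range(20)]
y = [300.0 + 4.0 * agt + 0.5 * sd for agt, sd in X]

# 90/10 split; using every tenth sample as the test group is an assumption.
test_idx = {9, 19}
X_train = [row for i, row in enumerate(X) if i not in test_idx]
y_train = [t for i, t in enumerate(y) if i not in test_idx]
X_test = [X[i] for i in sorted(test_idx)]
y_test = [y[i] for i in sorted(test_idx)]

baseline = (sum(y_train) / len(y_train), 0.1, [])      # constant-only model
model = fit_gbrt(X_train, y_train)
print(mse(baseline, X_train, y_train), mse(model, X_train, y_train))
```

Each stump is the least-squares fit to the current residuals, so with a learning rate between 0 and 1 the training error cannot increase from one iteration to the next; whether the test error follows is what the evaluation step checks.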

3.2.1. Hyperparameter Optimization for GBRT

Because the number of parameter combinations is very large, it is difficult to find the optimal combination with traditional parameter tuning, so a genetic algorithm was employed for parameter optimization in this study.
Each parameter adjustment is treated as the generation of new offspring, and the best individuals among the offspring and the parents are retained in each generation. With the Pearson correlation coefficient of the test group as the objective function, the convergence process is shown in Figure 7.
The results show that the Pearson correlation coefficient of the test group rose from 0.650 and converged at 0.971 after 97 iterations.
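The elitist search described above can be sketched as follows. The search ranges are those listed in Table 2; everything else is an assumption made for illustration: the population size, the crossover and mutation operators, the random seed, and in particular `toy_fitness`, which is a hypothetical stand-in for the real objective (training a GBRT model and returning the Pearson correlation coefficient of the test group).

```python
import random

# Search space from Table 2: (low, high, step).
SPACE = {
    "n_estimators": (10, 150, 1),
    "learning_rate": (0.01, 1.00, 0.01),
    "max_depth": (2, 100, 1),
}

def random_gene(name, rng):
    low, high, step = SPACE[name]
    n_steps = round((high - low) / step)
    return round(low + rng.randint(0, n_steps) * step, 2)

def random_individual(rng):
    return {name: random_gene(name, rng) for name in SPACE}

def crossover(a, b, rng):
    # Uniform crossover: each gene comes from one of the two parents.
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SPACE}

def mutate(ind, rng, rate=0.3):
    # Each gene is resampled from its range with probability `rate`.
    return {name: random_gene(name, rng) if rng.random() < rate else value
            for name, value in ind.items()}

def toy_fitness(ind):
    """HYPOTHETICAL objective; the study uses the test-group Pearson coefficient."""
    return (1.0
            - ((ind["n_estimators"] - 100) / 140) ** 2
            - (ind["learning_rate"] - 0.10) ** 2
            - ((ind["max_depth"] - 5) / 98) ** 2)

def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        offspring = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                     for _ in range(pop_size)]
        # Elitism: parents and offspring compete, and the best individuals survive.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
        history.append(fitness(population[0]))
    return population[0], history

best, history = genetic_search(toy_fitness)
print(best, round(history[0], 3), round(history[-1], 3))
```

Because the best individual of each generation always survives into the next, the best objective value can only stay level or improve, which is the shape expected of a convergence curve such as the one in Figure 7.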

3.2.2. Analysis of the Results of GBRT

With 90% of the monitoring data as the training group and the remaining 10% as the test group, the Pearson correlation coefficient obtained by the GBRT algorithm is 0.987 for the training group and 0.971 for the test group. The small gap between the two indicates that the model does not suffer from overfitting. Figure 8 shows the relative error between predicted and actual values: all relative errors fall within ±8%, and the mean relative error is 1.68%. Predictions with an error within ±2% account for 66.53% of the total, and those within ±4% account for 83.54%.
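For reference, error statistics of this kind can be computed as in the following sketch. The numbers are made up for illustration, the function names are hypothetical, and it is assumed here that the mean relative error is the mean of the absolute relative errors.

```python
def relative_errors(predicted, actual):
    """Signed relative error (predicted - actual) / actual for each sample."""
    return [(p - a) / a for p, a in zip(predicted, actual)]

def error_summary(predicted, actual):
    errs = [abs(e) for e in relative_errors(predicted, actual)]
    n = len(errs)
    return {
        "max_abs": max(errs),
        "mean_abs": sum(errs) / n,                      # assumed definition
        "share_within_2pct": sum(e <= 0.02 for e in errs) / n,
        "share_within_4pct": sum(e <= 0.04 for e in errs) / n,
    }

# Hypothetical values, not data from the study.
actual = [100.0, 200.0, 400.0, 500.0]
predicted = [101.0, 194.0, 400.0, 540.0]
summary = error_summary(predicted, actual)
print(summary)
```

In this toy case the relative errors are 1%, 3%, 0%, and 8%, so half of the predictions fall within ±2% and three quarters within ±4%.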

4. Discussion

4.1. Analysis of the Importance of Meteorological Factors Based on Gini Index

The Gini-index-based analysis measures how much each feature contributes in each tree, aggregates these contributions over all trees, and finally compares the contributions among features. The importance is calculated as follows:
$$GI_{q}^{(i)} = 1 - \sum_{c=1}^{C} \left( p_{qc}^{(i)} \right)^{2}$$
$$VIM_{jq}^{(Gini)(i)} = GI_{q}^{(i)} - GI_{l}^{(i)} - GI_{r}^{(i)}$$
$$VIM_{j}^{(Gini)(i)} = \sum_{q \in Q} VIM_{jq}^{(Gini)(i)}$$
$$VIM_{j}^{(Gini)} = \sum_{i=1}^{I} VIM_{j}^{(Gini)(i)}$$
$GI_{q}^{(i)}$: the Gini index of node q in the i-th tree;
$Gini^{(i)}$: the Gini indices of the i-th tree, including $GI_{q}^{(i)}$, $GI_{l}^{(i)}$, and $GI_{r}^{(i)}$;
$p_{qc}^{(i)}$: the proportion of category c at node q in the i-th tree, and j represents the feature;
$VIM_{jq}^{(Gini)(i)}$: the importance of feature j at node q of the i-th tree;
$GI_{l}^{(i)}$, $GI_{r}^{(i)}$: the Gini indices of the two new nodes after branching;
$VIM_{j}^{(Gini)(i)}$: the importance of feature j in the i-th tree;
$Q$: the set of nodes in which feature j appears in the i-th tree;
$VIM_{j}^{(Gini)}$: the importance of feature j.
Different from the correlation analysis mentioned above, factor importance analysis is a quantitative description of the proportion of contribution made by each driving factor to the forecast among all the factors in the forecasting process rather than the strength of one-to-one correspondence between a single influencing factor and daily water consumption.
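A small numerical sketch may help to make the importance formulas concrete. It follows the equations as printed, in which the Gini indices of the two child nodes are subtracted from that of the parent without weighting by sample share (library implementations typically weight the child nodes by their share of samples). The class counts, tree structure, and function names below are hypothetical.

```python
def gini_index(class_counts):
    """GI = 1 - sum_c p_c^2 for the class counts at one node."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def node_importance(parent_counts, left_counts, right_counts):
    """VIM at one node: GI of the node minus GI of its two child nodes."""
    return (gini_index(parent_counts)
            - gini_index(left_counts)
            - gini_index(right_counts))

def feature_importance(trees):
    """Sum node importances per feature over all trees, then normalise to shares.

    trees: list of trees; each tree is a list of
           (feature, parent_counts, left_counts, right_counts) splits.
    """
    totals = {}
    for tree in trees:
        for feature, parent, left, right in tree:
            totals[feature] = totals.get(feature, 0.0) + node_importance(parent, left, right)
    grand_total = sum(totals.values())
    return {feature: value / grand_total for feature, value in totals.items()}

# Hypothetical two-tree ensemble in which every split yields pure child nodes.
trees = [
    [("AGT", [5, 5], [5, 0], [0, 5])],
    [("AGT", [5, 5], [5, 0], [0, 5]), ("SST", [9, 1], [9, 0], [0, 1])],
]
shares = feature_importance(trees)
print({name: round(value, 3) for name, value in shares.items()})
```

Dividing by the grand total converts the summed importances into the kind of percentage shares reported for the meteorological factors.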
Figure 9 shows the feature importance percentages. The feature importance of average ground temperature accounts for 81% of the total, followed by somatosensory temperature (7%), minimum air temperature (5%), and minimum barometric pressure (2%). After removing the influence of the average ground temperature (Figure 9), the feature importance of somatosensory temperature is the highest, accounting for 36% of the total, followed by the lowest temperature (24%), the minimum barometric pressure (10%), the average relative humidity (8%), maximum wind speed (7%), sunshine duration (6%), daily precipitation (4%), wind direction of instantaneous wind speed (3%), and wind direction of mean maximum wind speed (2%).
Because it is affected by factors such as temperature, precipitation, and solar radiation, the average ground temperature reflects comprehensive meteorologic factors [29]. It is also affected by historical meteorologic conditions: it changes more conservatively than air temperature and lags behind it. Since most human work and daily life take place inside buildings of various kinds, the average ground temperature represents the temperature that people feel inside buildings better than other single meteorologic factors do. This can explain why the feature importance of the average ground temperature is the highest in this model. In addition, because people inevitably spend time outdoors for work, daily life, or entertainment, the somatosensory temperature is the meteorologic condition they feel most directly at those times. Therefore, in this model, the feature importance of somatosensory temperature is second only to that of average ground temperature. Other meteorological factors also have a certain impact on people’s water use behavior. For example, average relative humidity, sunshine duration, wind direction, and wind speed affect people’s bathing behavior [30], and daily precipitation, sunshine hours, and wind speed affect people’s laundry behavior. Incorporating these meteorologic factors into models can improve the accuracy of water consumption forecasts.

4.2. Analysis of Factor Necessity

The necessity of incorporating meteorological factors into the daily water consumption prediction model was analyzed by excluding GT, ST, SD, and WS as influencing factors and calculating the relative errors of the resulting models. The results are displayed in Figure 10. Removing these factors brings greater error and uncertainty to the forecast. The boxes in the box plot are arranged in descending order of the influence that the excluded factor has on the predicted outcomes. Among all the factors, GT has the most significant influence on the prediction of the model. Omitting GT from the predictor variables results in a maximum prediction error of 12.6%, which is 4.6 percentage points higher than that obtained with the full model. These findings demonstrate that incorporating GT significantly enhances the model’s accuracy. Furthermore, when WS and SD are excluded individually, the maximum relative errors increase by 0.46% and 1.25%, respectively, whereas excluding both factors raises the maximum relative error by 1.61%. The analysis of WS, SD, and daily water consumption shows that a weak pairwise correlation does not rule out a link between two variables; the link may emerge when the variables interact with other factors.
Removing influencing factors significantly affects the model’s fitting performance. Figure 11 shows that, through the boosting mechanism, GBRT retains a strong learning ability on the training group: irrespective of the factor combination, the Pearson correlation coefficient of the training group remains high. However, the Pearson correlation coefficient of the test group decreased significantly when some factors were eliminated. This implies that removing these influencing factors can cause the entire model to underfit.
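The leave-one-factor-out procedure used in this subsection can be sketched as a small harness that refits a model without each factor in turn and records the maximum relative error on the test group. To keep the example short and self-contained, a 1-nearest-neighbour predictor is used as a stand-in model; it is explicitly not GBRT, and the data, the test indices, and all function names are hypothetical.

```python
def max_relative_error(predicted, actual):
    return max(abs(p - a) / a for p, a in zip(predicted, actual))

def nearest_neighbour_predict(train_X, train_y, test_X):
    """HYPOTHETICAL stand-in model (1-nearest neighbour), not GBRT."""
    preds = []
    for row in test_X:
        dists = [sum((a - b) ** 2 for a, b in zip(row, tr)) for tr in train_X]
        preds.append(train_y[dists.index(min(dists))])
    return preds

def ablation(features, target, test_idx, fit_predict):
    """Maximum relative error of the full model and of each reduced model."""
    names = list(features)
    results = {}
    for excluded in [None] + names:
        kept = [n for n in names if n != excluded]
        rows = [[features[n][i] for n in kept] for i in range(len(target))]
        train_X = [r for i, r in enumerate(rows) if i not in test_idx]
        train_y = [t for i, t in enumerate(target) if i not in test_idx]
        test_X = [rows[i] for i in test_idx]
        test_y = [target[i] for i in test_idx]
        preds = fit_predict(train_X, train_y, test_X)
        results[excluded or "full model"] = max_relative_error(preds, test_y)
    return results

# Hypothetical toy data: ground temperature (GT) and sunshine duration (SD).
features = {
    "GT": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0],
    "SD": [5.0, 6.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0],
}
target = [200.0 + 5.0 * gt + 2.0 * sd
          for gt, sd in zip(features["GT"], features["SD"])]

results = ablation(features, target, test_idx=[3, 6],
                   fit_predict=nearest_neighbour_predict)
print({name: round(err, 4) for name, err in results.items()})
```

In this toy setting, dropping GT raises the maximum relative error sharply while dropping SD leaves it unchanged, which mirrors the ordering of factor influence discussed above without reproducing any of the reported numbers.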

4.3. Comparison and Limitations

Many scholars have studied water consumption and its influencing factors and have contributed to the field of water consumption prediction.
In terms of prediction scope, much research has been devoted to forecasting the water consumption of entire cities. Chen Wei et al. [31] predicted hourly water consumption in Nanjing City, and their predictions agreed well with the actual data, with small errors. Chen [32] studied probabilistic forecasting of daily urban water consumption using an approach based on Bayesian theory and developed a probabilistic forecasting system for daily water consumption.
Other scholars have investigated the water consumption behavior of individual residents. For instance, Jiang et al. [33] used data from water utility companies, in-person surveys, and questionnaires to examine water use in residential buildings across different classes of urban areas in the Chongqing region, and established water consumption quotas for different residential types. Reis et al. [34] analyzed the determinants of residential water use in a residential complex in Goiânia, Brazil, gaining insight into residents’ water consumption patterns.
This study focuses on predicting daily water consumption in residential areas. In contrast to individual-level predictions, it treats a residential area as an aggregate of the water consumption behaviors of many individual residents, which mitigates the uncertainty arising from differences in individual behavior. It also differs from city-wide predictions in that it captures the influence of climatic variables on specific residential locations. This finer prediction scale matches the requirements of network modeling and allows a more refined simulation of water use across the residential areas covered by the network model.
In terms of predictive modeling, substantial progress has also been made. Chen Wei et al. [31] used BP neural networks to build a combined model for time series forecasting and explanatory analysis of urban hourly water consumption. Zhao et al. [35] used Markov chain models to estimate the range of possible outcomes and formulated an equal-dimensional grey Markov prediction model. Chen [32] employed an adaptive Markov chain Monte Carlo simulation approach to derive posterior density estimates of daily water consumption, yielding probabilistic forecasts. Reis et al. [34] used the R programming language for graphical analysis, trend curve fitting, and multivariate regression analysis to identify the factors that influence residential water consumption.
This study adopts the gradient boosting regression tree (GBRT) model. Compared with neural networks, this model is more interpretable, which makes it possible to identify the factors most relevant to the prediction. Compared with the conventional multivariate regression models used in statistical analysis, it can accommodate a large number of independent variables, namely the factors influencing the phenomenon under investigation, and it is well suited to identifying complex interrelationships among these variables. This research also introduces ground temperature and a range of other meteorological parameters into the predictive framework and finds that daily water consumption is more strongly associated with ground temperature than with air temperature.
This study presents its findings on the influence of meteorological factors on water consumption within residential areas and the potential utility of meteorological variables for predicting daily water consumption within these residential areas. Notably, the study achieves a prediction with a maximum error of ±8%. However, it is essential to acknowledge certain limitations inherent in this research.
Firstly, the study’s prediction relative error, while respectable at ±8%, still encompasses a degree of variability. The predicted value error of ±2%, accounting for 66.53% of the total, and the error of ±4%, accounting for 83.54% of the total, indicate that there is room for further refinement and improvement in the predictive model’s precision.
Secondly, the research highlights that average ground temperature is the most influential meteorological factor, with a feature importance accounting for 81% of the total. While this is a significant finding, it is crucial to recognize that there may be other meteorological variables or latent factors not considered in this study that could also impact water consumption within residential areas. Further research may be needed to comprehensively explore and incorporate these additional variables.
Thirdly, this study solely utilizes meteorological factors as influencing variables for daily water consumption in residential areas. However, it is essential to acknowledge that there are numerous factors affecting daily water consumption, including significant events, major holidays, and policy changes. These factors may well be the reasons for the remaining errors in this model.
Lastly, while this study primarily focuses on predicting daily water consumption within residential areas, its applicability to other types of water consumption forecasting is suggested. It is important to recognize that extending this method to different water consumption nodes within a hydraulic pipeline network may introduce unique complexities and considerations that warrant careful investigation and adaptation.
In conclusion, while this research offers valuable insights into the prediction of residential water consumption using meteorological factors, it is essential to acknowledge its limitations in terms of prediction accuracy, variable selection, data integration, and generalizability to other water consumption contexts. Future research should aim to address these limitations to enhance the robustness and applicability of predictive models in the domain of water consumption forecasting.

5. Conclusions

This study examines the impact of meteorological factors on water consumption in residential areas. It explores the potential use of meteorological factors to predict daily water consumption in residential areas with a maximum error of ±8%. Among them, the predicted value error of ±2% accounted for 66.53% of the total, and the error of ±4% accounted for 83.54% of the total. In addition, this study also found that the average ground temperature (the feature importance accounted for 81% of the total) had the greatest impact on the daily water consumption of residential areas, followed by the somatosensory temperature (the feature importance accounted for 7% of the total).
In the process of using this model, the algorithm can continuously incorporate new water consumption data and meteorology factors into the historical water consumption dataset and meteorology factor dataset. In this way, the basic database of the algorithm is updated and refined to better predict the water consumption. In addition to predicting the daily water consumption of residential areas, in the follow-up research, this method can also be extended horizontally to other types of water consumption forecasting. This further enriches the water consumption prediction methods of different types of water consumption nodes in the hydraulic model of the pipeline network.
A novel water consumption prediction algorithm is proposed in this study and validated with a specific case study in a city in northeastern China. Importantly, this method can also be applied to forecast other types of water nodes and cities. This study establishes a framework for water consumption predictions and attribution analyses. The driving data applied in this study encompass historical water consumption data and historical meteorological data. Among these, historical water consumption data can reflect the water consumption characteristics of the nodes themselves. These usage characteristics can manifest the water behaviors of different types of nodes (residential communities, industrial, utilities, etc.). When historical water consumption data from different types of nodes are input into the model, the model can output corresponding predictions of daily water consumption for the respective types.
Furthermore, when this algorithm is applied to different types of water nodes, factors other than meteorological ones can also be considered within the model. For instance, in the prediction of industrial water consumption, date labels can be input as influencing factors into the model. Of course, for different types of water nodes, further exploration is needed to determine which influencing factors should be incorporated into the model.
Taking this research one step further, once it has been applied to predict water consumption for various types of water nodes, it can subsequently be applied to forecast the overall water consumption of a city. This is achieved by calculating the city’s water consumption based on the predicted water consumption values for each water node provided by this algorithm. If overall climate change trends are included in the model, it can provide estimates of water consumption under potential future climate change conditions. If urban development planning is integrated into the model, it would require mapping the scale and form of planned water nodes with existing water nodes to obtain water consumption predictions for an expanded city. Through these possible extensions of research, valuable insights can be provided for urban development and planning.
This study can also be combined with a Markov dynamic process and extended longitudinally to study how minute-level water consumption in residential areas varies with meteorological conditions. This would bring the behavior of the water consumption nodes of residential areas in the hydraulic model of the pipeline network closer to the real situation.

Author Contributions

Writing—original draft preparation, Z.L.; writing—review and editing, S.P.; software, resources, G.Z.; code debugging, X.C.; project administration, Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2022YFC3203803.

Data Availability Statement

The data used in this manuscript are available from the corresponding authors.

Acknowledgments

Thanks to the National Key R&D Program of China for supporting and funding this project, grant number 2022YFC3203803.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Qiao, Z.R.; Wu, L.F.; Yang, Z.Z. Prediction of Water Consumption in 31 Provinces of China Based on FGM(1,1) Model. Clean-Soil Air Water 2022, 50, 2200052.
  2. Ye, Y.L.; Yang, Y.H.; Zhu, L.; Wang, J.; Rao, D.N.; IEEE. A LoRa-based Low-power Smart Water Metering. In Proceedings of the IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15–17 January 2021; pp. 301–305.
  3. Al-Madhrahi, R.; Abdullah, J.; Alduais, N.A.M.; Mahdin, H.B.; Nasser, A.B.; Saad, A.; Alduais, H.S. An Efficient IoT-based Smart Water Meter System of Smart City Environment. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 420–428.
  4. Lloret, J.; Tomas, J.; Canovas, A.; Parra, L. An Integrated IoT Architecture for Smart Metering. IEEE Commun. Mag. 2016, 54, 50–57.
  5. Caldognetto, N.; Evangelisti, L.P.; Poltronieri, F.; Russo, M.; Stefanelli, C.; Tenani, S.; Toboli, S.; Tortonesi, M. Water 4.0: Enabling Smart Water and Environmental Data Metering. In Proceedings of the IEEE/IFIP Network Operations and Management Symposium, Electr Network, Budapest, Hungary, 25–29 April 2022.
  6. Huang, H.D.; Zhang, Z.X.; Song, F.X. An Ensemble-Learning-Based Method for Short-Term Water Demand Forecasting. Water Resour. Manag. 2021, 35, 1757–1773.
  7. Pacchin, E.; Alvisi, S.; Franchini, M. A Short-Term Water Demand Forecasting Model Using a Moving Window on Previously Observed Data. Water 2017, 9, 15.
  8. Brentan, B.M.; Meirelles, G.L.; Manzi, D.; Luvizotto, E. Water demand time series generation for distribution network modeling and water demand forecasting. Urban Water J. 2018, 15, 150–158.
  9. Roper, A.M.; Palmer, R.N. Analyzing the Effects of Temperature and Precipitation in the Context of a Water Demand Model. In Proceedings of the 20th Annual World Environmental and Water Resources Congress, Henderson, NV, USA, 17–21 May 2020; pp. 290–303.
  10. Stelzl, A.; Pointl, M.; Fuchs-Hanusch, D. Estimating Future Peak Water Demand with a Regression Model Considering Climate Indices. Water 2021, 13, 1912.
  11. Fontanazza, C.M.; Notaro, V.; Puleo, V.; Freni, G. Multivariate Statistical Analysis for Water Demand Modeling. In Proceedings of the 16th International Conference on Water Distribution System Analysis (WDSA), Bari, Italy, 14–17 July 2014; pp. 901–908.
  12. Ridolfi, E.; Vertommen, I.; Magini, R. Joint Probabilities of Demands on a Water Distribution Network: A Non-Parametric Approach. In Proceedings of the 11th International Conference of Numerical Analysis and Applied Mathematics (ICNAAM), Rhodes, Greece, 21–27 September 2013; pp. 1681–1684.
  13. Fiorillo, D.; Kapelan, Z.; Xenochristou, M.; De Paola, F.; Giugni, M. Assessing the Impact of Climate Change on Future Water Demand using Weather Data. Water Resour. Manag. 2021, 35, 1449–1462.
  14. Haque, M.M.; Rahman, A.; Hagare, D.; Kibria, G. Probabilistic Water Demand Forecasting Using Projected Climatic Data for Blue Mountains Water Supply System in Australia. Water Resour. Manag. 2014, 28, 1959–1971.
  15. Li, J.; Song, S.B. Urban Water Consumption Prediction Based on CPMBNIP. Water Resour. Manag. 2023, 1–25.
  16. Mousavi-Mirkalaei, P.; Roozbahani, A.; Banihabib, M.E.; Randhir, T.O. Forecasting urban water consumption using bayesian networks and gene expression programming. Earth Sci. Inform. 2022, 15, 623–633.
  17. Li, J.; Song, S.B.; Kang, Y.; Wang, H.J.; Wang, X.J. Prediction of Urban Domestic Water Consumption Considering Uncertainty. J. Water Resour. Plan. Manag.-ASCE 2021, 147, 05020028.
  18. Gao, X.; Zeng, W.R.; Shen, Y.; Guo, Z.W.; Yang, J.H.; Cheng, X.H.; Hua, Q.Z.; Yu, K.P. Integrated Deep Neural Networks-Based Complex System for Urban Water Management. Complexity 2020, 2020, 8848324.
  19. Li, Z. Analysis and Simulation on Water Consumption Law and Related Design Parameter in Residential Quarters. Master’s Thesis, Tianjin University, Tianjin, China, 2020.
  20. Li, Z.; Wu, X.; Jiang, A.; Liu, Z. The Treatment Method of Abnormal Data in Water Consumption Monitoring in Construction Zone. In Proceedings of the 3rd Session of the 3rd Membership Assembly and Academic Exchange Conference of the Water Supply and Drainage Research Branch of the Architectural Society of China, Guangzhou, China, 20–22 September 2019; p. 7.
  21. Shuo, X.; Wenxiong, M.; Le, L.; Rui, T.; Tian, L. LSTM load forecasting algorithm based on time-sharing somatosensory. In Proceedings of the 10th Renewable Power Generation Conference (RPG 2021), Online Conference, 14–15 October 2021.
  22. Chen, C.; Zhou, G. Analysis on variation characteristics of air temperature and ground temperature in Guilin from 1961 to 2010. Acta Ecol. Sin. 2013, 33, 2043–2053.
  23. Yin, Z.; Fan, J.; Chen, Y.; Li, D.; Zhang, L. Impact of Sensible Temperature on Summer Weather-Sensitive Power Load Rate in Huangshi City. Meteorol. Mon. 2017, 43, 620–627.
  24. Yu, B.; Liu, M.; Yan, M.; Yao, K. The apparent temperature model under cool condition and effects of wind, vapor-pressure and extra radiation. Sci. Meteorol. Sin. 2002, 22, 304–312.
  25. Xu, W.; Liu, H.; Zhang, Q.; Liu, P. Response of vegetation ecosystem to climate change based on remote sensing and information entropy: A case study in the arid inland river basin of China. Environ. Earth Sci. 2021, 80, 132.
  26. Zhang, Z.; Yang, W.; Wushour, S. Traffic Accident Prediction Based on LSTM-GBRT Model. J. Control Sci. Eng. 2020, 2020, 4206919.
  27. Wang, G.; Ruan, Y.; Wang, H.; Zhao, G.; Cao, X.; Li, X.; Ding, Q. Tribological performance study and prediction of copper coated by MoS2 based on GBRT method. Tribol. Int. 2023, 179, 108149.
  28. Yuan, H.; Yuan, K.; Zhao, Z. On Predicting Event Propagation on Weibo. In Proceedings of the 14th International Conference on Service Systems and Service Management (ICSSSM), Dongbei Univ Finance & Econ, Sch Management Sci & Engn, Dalian, China, 16–18 June 2017.
  29. Pandey, B.; Pathak, J.; Singh, P.; Kumar, R.; Kumar, A.; Kaushik, S.; Thakur, T.K. Microplastics in the Ecosystem: An Overview on Detection, Removal, Toxicity Assessment, and Control Release. Water 2023, 15, 51.
  30. Sun, J.; Wang, J.; Sun, Y.; Xu, M.; Shi, Y.; Liu, Z.; Wen, X. Electric Heating Load Forecasting Method Based on Improved Thermal Comfort Model and LSTM. Energies 2021, 14, 4525.
  31. Wei, C.; Jian, L.U.; Zhi-cheng, W.U. Combined forecast model of urban hourly water consumption based on BP neural network. J. Harbin Inst. Technol. 2009, 41, 197–200.
  32. Chen, L. Probabilistic daily water consumption forecasting using Bayesian theory. Syst. Eng.-Theory Pract. 2017, 37, 761–767.
  33. Jiang, W.; Huang, C.; Liu, Q.; Liu, Y.; Tian, S. Investigation on current situation of water consumption and water quota in Chongqing. Water Wastewater Eng. 2019, 45, 102–106.
  34. Reis, R.P.A.; Rocha, D.G.; de Rezende, G.P.; Campos, M.A.S.; Basso, R.E.; Fioramonte, B. Influence of the number of residents and climatic factors on residential water consumption. Water Supply 2023, 23, 1626–1640.
  35. Zhao, X.; Gai, M. Urban Water Consumption Forecasting in Dalian Based on Equal Dimensional and New Information Grey Markov Forecasting Model. Hydrology 2011, 31, 66–69, 87.
Figure 1. General framework.
Figure 1. General framework.
Water 15 03455 g001
Figure 2. Monitoring data (a) before and (b) after data processing.
Figure 2. Monitoring data (a) before and (b) after data processing.
Water 15 03455 g002
Figure 3. Schematic of GBRT.
Figure 3. Schematic of GBRT.
Water 15 03455 g003
Figure 4. Weather factor correlation and significance heatmap.
Figure 4. Weather factor correlation and significance heatmap.
Water 15 03455 g004
Figure 5. Correlation analysis meteorology factors and water consumption.
Figure 5. Correlation analysis meteorology factors and water consumption.
Water 15 03455 g005
Figure 6. Comparative analysis of SST and RGST—water consumption.
Figure 6. Comparative analysis of SST and RGST—water consumption.
Water 15 03455 g006
Figure 7. Convergence process of genetic algorithm for hyperparameter optimization.
Figure 7. Convergence process of genetic algorithm for hyperparameter optimization.
Water 15 03455 g007
Figure 8. Relative error between predicted and actual water consumption.
Figure 9. Pie chart of meteorology factor importance.
Figure 10. Relative error after a single meteorological factor is removed.
Figure 11. Pearson correlation after a single meteorological factor is removed.
Table 1. List of meteorological indicators.

| Meteorological Indicator | Resolution | Units |
|---|---|---|
| minimum evaporation | 0.1 | mm |
| maximum evaporation | 0.1 | mm |
| average barometric pressure | 0.1 | hPa |
| highest barometric pressure | 0.1 | hPa |
| lowest barometric pressure | 0.1 | hPa |
| average temperature | 0.1 | °C |
| maximum temperature | 0.1 | °C |
| minimum temperature | 0.1 | °C |
| average relative humidity | 0.1 | % |
| daily precipitation | 0.1 | mm |
| 8–20 precipitation | 0.1 | mm |
| 20–20 precipitation | 0.1 | mm |
| average wind speed | 0.1 | m/s |
| maximum wind speed | 0.1 | m/s |
| wind direction of maximum wind speed | | |
| maximum wind speed | 0.1 | m/s |
| wind direction of maximum wind speed | | |
| sunshine duration | 0.1 | h |
| average ground temperature | 0.1 | °C |
| highest ground temperature | 0.1 | °C |
| lowest ground temperature | 0.1 | °C |
Table 2. Hyperparameters’ settings.

| Hyperparameter Name | Default | Range | Step Size |
|---|---|---|---|
| N_estimators | 100 | 10–150 | 1 |
| Learning_rate | 1.00 | 0.01–1.00 | 0.01 |
| Max_depth | 50 | 2–100 | 1 |
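As an illustration of how the search space in Table 2 might be encoded for a hyperparameter optimizer such as the genetic algorithm of Figure 7, the following is a minimal sketch and not the authors' implementation. The class `HyperparamRange`, the helper `random_candidate`, and the lowercase parameter keys (chosen to resemble scikit-learn's naming) are assumptions introduced here; the numeric values are those read from Table 2.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperparamRange:
    """One row of Table 2: default value, inclusive bounds, and grid step."""
    default: float
    low: float
    high: float
    step: float

    def n_steps(self) -> int:
        # Number of grid intervals between low and high.
        return round((self.high - self.low) / self.step)

    def snap(self, value: float) -> float:
        """Clamp value to [low, high] and round it to the nearest grid point."""
        clamped = min(max(value, self.low), self.high)
        k = round((clamped - self.low) / self.step)
        return round(self.low + k * self.step, 10)

    def sample(self, rng: random.Random) -> float:
        """Draw one grid point uniformly at random."""
        k = rng.randint(0, self.n_steps())
        return round(self.low + k * self.step, 10)


# Values transcribed from Table 2. The lowercase keys are an assumption
# (hypothetical names; the source does not state which library was used).
SEARCH_SPACE = {
    "n_estimators": HyperparamRange(default=100, low=10, high=150, step=1),
    "learning_rate": HyperparamRange(default=1.00, low=0.01, high=1.00, step=0.01),
    "max_depth": HyperparamRange(default=50, low=2, high=100, step=1),
}

INTEGER_PARAMS = {"n_estimators", "max_depth"}


def random_candidate(rng: random.Random) -> dict:
    """Sample one hyperparameter set, e.g. one individual of a GA population."""
    candidate = {}
    for name, spec in SEARCH_SPACE.items():
        value = spec.sample(rng)
        candidate[name] = int(value) if name in INTEGER_PARAMS else value
    return candidate


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only for reproducibility of the demo
    print(random_candidate(rng))
```

Snapping mutated values back onto the grid with `snap` keeps every candidate inside the ranges and step sizes of Table 2, which is one simple way an optimizer could respect those constraints.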
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Li, Z.; Peng, S.; Zheng, G.; Chu, X.; Tian, Y. Prediction of Daily Water Consumption in Residential Areas Based on Meteorologic Conditions—Applying Gradient Boosting Regression Tree Algorithm. Water 2023, 15, 3455. https://doi.org/10.3390/w15193455
