This research consists of three parts. Part (1) describes data preparation, including rainfall data, crowdsourced waterlogging data obtained by web crawlers, and the basic underlying-surface data used to build the simulation model. Part (2) introduces the method for analyzing the characteristics of waterlogging rainstorms, which are classified using intensity–duration (ID) curves. Part (3) presents SWMM simulation results calibrated with the crawled waterlogging point information, which verify the rationality and feasibility of the method.
2.3.1. Crowdsourced Data Acquisition of Urban Waterlogging Based on Web Crawlers
Search engines and scalable crawlers are two ways of obtaining Internet crowdsourced data related to urban waterlogging. The latter offers clear advantages in customizable data acquisition and speed, so this study uses web crawlers to capture waterlogging-related data from a set of collection URLs (uniform resource locators) [38]. All of these URLs are first placed in an ordered queue. A URL is then taken from the queue and its page is downloaded; the page content is parsed, and newly found URLs are extracted and added to the queue to be crawled. This process is repeated until the URL queue is empty or a specific crawl termination condition is met, thereby traversing the web and collecting data efficiently [39].
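The queue-based traversal described above is essentially a breadth-first crawl. The sketch below is illustrative only: `fetch(url)` is a hypothetical caller-supplied downloader (for Weibo it would also have to handle the simulated login), links are extracted with Python's standard `html.parser`, and `max_pages` stands in for "a specific crawl termination condition".

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the current page URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(seeds: List[str], fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, str]:
    """Queue-based (breadth-first) crawl; returns {url: html} of downloaded pages."""
    queue = deque(seeds)  # ordered queue of URLs waiting to be crawled
    seen = set(seeds)     # URLs already queued, so none is crawled twice
    pages: Dict[str, str] = {}
    # Stop when the queue is empty or the page budget is used up.
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # hypothetical downloader; None means failure
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:  # store newly found URLs in the queue
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Offline usage with an in-memory stand-in for the web (hypothetical URLs).
FAKE_WEB = {
    "http://example.org/a": '<a href="/b">b</a> <a href="/c">c</a>',
    "http://example.org/b": '<a href="/a">back</a>',
    "http://example.org/c": "<p>waterlogging report</p>",
}
pages = crawl(["http://example.org/a"], FAKE_WEB.get)
print(sorted(pages))
```

Passing the downloader as a function keeps the traversal logic testable offline and separates it from platform-specific concerns such as login and rate limiting.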
The data are collected from Sina Weibo posts on urban waterlogging news published by V-certified (verified) accounts. Compared with other types of crowdsourced data, such as telephone call volumes to the municipal service center, pipe network maintenance records, and Internet flooding news, these data have three advantages: (1) Weibo is a closed platform, so V-certified news is highly credible; (2) Weibo news data carry geographic location and time attributes, making it easy to select data that meet the requirements; and (3) Weibo news data are easier to clean, with less redundancy and a higher density of valuable content. The crawler program takes location, time, and waterlogging keywords as input attributes. By simulating a login to the Weibo platform, it automatically crawls the information matching these attributes into a fixed storage path, and the results are output as rainfall information and waterlogging information. The method flow is shown in Figure 3.
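The attribute matching step can be sketched as a filter over already-parsed posts. This is a minimal sketch under stated assumptions: the `Post` record and its field names are hypothetical, the keywords are English stand-ins for the actual search terms, and the example posts and dates are invented. Keywords can be applied individually (any match) or in combination (all must match).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Sequence


@dataclass
class Post:
    """A crawled Weibo post (hypothetical record layout)."""
    text: str
    location: str
    time: datetime
    verified: bool  # True if published by a V-certified account


def matches(post: Post, location: str, start: datetime, end: datetime,
            keywords: Sequence[str], require_all: bool = False) -> bool:
    """Check the location, time and keyword attributes of one post."""
    if not post.verified or location not in post.location:
        return False
    if not (start <= post.time <= end):
        return False
    text = post.text.lower()
    hits = [kw.lower() in text for kw in keywords]
    # Keywords used individually (any) or in combination (all).
    return all(hits) if require_all else any(hits)


def select_posts(posts: Iterable[Post], location: str, start: datetime,
                 end: datetime, keywords: Sequence[str],
                 require_all: bool = False) -> List[Post]:
    """Keep only the posts whose attributes match the query."""
    return [p for p in posts
            if matches(p, location, start, end, keywords, require_all)]


# Hypothetical posts for one rainfall event.
POSTS = [
    Post("Heavy rain, waterlogging on Road A", "Zhengzhou", datetime(2019, 7, 22, 18), True),
    Post("Rain and flooding downtown", "Zhengzhou", datetime(2019, 7, 22, 19), False),
    Post("Waterlogging at the underpass", "Zhengzhou", datetime(2019, 8, 1, 9), True),
    Post("Rain again today", "Zhengzhou", datetime(2019, 7, 22, 20), True),
]
START, END = datetime(2019, 7, 22), datetime(2019, 7, 23)
for p in select_posts(POSTS, "Zhengzhou", START, END, ["rain", "waterlogging"]):
    print(p.text)
```

Changing `START` and `END` corresponds to changing the time parameter to collect different rainfall events.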
By changing the time parameter in the code, information on multiple rainfall and waterlogging events is collected. In this study, keywords such as rain, waterlogging, and flooding were used individually or in combination to expand the target database. Because Internet information collected by web crawlers is used to compensate for the lack of flood-related data, its reliability strongly affects the accuracy of the results, and data cleaning is therefore an essential step in obtaining reliable data. In this research, data are cleaned by eliminating redundant and erroneous information. Redundant information mainly arises from the same news being reprinted by different media, whereas erroneous information mainly comes from media reports on flooding in other regions or at other times. For example, on 21 July 2012, Beijing, China suffered its worst rainfall event in 70 years, which attracted widespread attention, and Weibo media in the study area reported and reprinted this news in large numbers. Such data are invalid for this research and must be eliminated. In general, the more severe the urban waterlogging, the more feedback it generates from citizens and news media; therefore, the amount of waterlogging information can be used as an objective indicator of waterlogging severity. The geographic locations mentioned in the posts are taken as the actual waterlogging points of the city.
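The two cleaning rules and the severity indicator can be sketched as follows. The text does not specify how reports about other regions or times are recognized, so this sketch assumes a hypothetical list of exclusion terms (for example, another city's name); reprints are detected by comparing texts after removing case, whitespace, and punctuation. All records are invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (post text, waterlogging location)


def normalize(text: str) -> str:
    """Lower-case and strip whitespace/punctuation so reprints compare equal."""
    return re.sub(r"[\W_]+", "", text.lower())


def clean(records: Iterable[Record], exclude_terms: Iterable[str]) -> List[Record]:
    """Drop redundant reprints and posts about other regions or events."""
    excluded = [term.lower() for term in exclude_terms]
    seen = set()
    kept: List[Record] = []
    for text, location in records:
        key = normalize(text)
        if key in seen:  # redundant: the same news reprinted by another outlet
            continue
        if any(term in text.lower() for term in excluded):  # wrong region/time
            continue
        seen.add(key)
        kept.append((text, location))
    return kept


def severity_by_location(records: Iterable[Record]) -> Counter:
    """Number of cleaned reports per location, used as a severity indicator."""
    return Counter(location for _, location in records)


RAW = [
    ("Heavy rain floods Road A!", "Road A"),
    ("heavy rain floods road A", "Road A"),    # reprint of the first post
    ("Beijing 21 July rainstorm", "Beijing"),  # other region and time
    ("Underpass B waterlogged", "Underpass B"),
    ("Road A still flooded", "Road A"),
]
cleaned = clean(RAW, exclude_terms=["Beijing"])
print(severity_by_location(cleaned))
```

Exact-match deduplication after normalization only catches verbatim reprints; lightly edited reprints would need a fuzzier similarity measure.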
2.3.2. Waterlogging Rainstorms Thresholds Based on Intensity–Duration (ID) Curves
The return period of rainfall, expressed in years (a), is an important basis for designing urban flood control facilities. It represents the average interval between occurrences of rainfall greater than or equal to a given intensity and equals the reciprocal of the frequency of such heavy rainfall. For example, the design standard for road drainage in Zhengzhou is 2 a, which is determined from the Zhengzhou rainstorm intensity formula, Equation (1).
where
i is the rainfall intensity of the design rainfall, mm/min;
P is the return period, a;
t is the duration, min.
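The coefficients of Equation (1) are not given in the text above. Rainstorm intensity formulas in Chinese design practice commonly take the form i = A1(1 + C lg P)/(t + b)^n, and the sketch below assumes that form with clearly placeholder coefficients (not the Zhengzhou values). It shows how design rainfall depths for a given return period and duration, such as the dotted design curves and the M20/M60 thresholds used below, follow from the formula as i × t.

```python
import math

# Placeholder coefficients (hypothetical). They are NOT the Zhengzhou values of
# Equation (1); replace them with the published local coefficients before use.
A1, C, B, N = 10.0, 0.8, 15.0, 0.75


def design_intensity(P: float, t: float, a1: float = A1, c: float = C,
                     b: float = B, n: float = N) -> float:
    """Design intensity i (mm/min) for return period P (a) and duration t (min),
    assuming the form i = a1 * (1 + c*lg P) / (t + b)**n."""
    if P <= 0 or t <= 0:
        raise ValueError("P and t must be positive")
    return a1 * (1.0 + c * math.log10(P)) / (t + b) ** n


def design_depth(P: float, t: float, **coeffs: float) -> float:
    """Design rainfall depth (mm) accumulated over duration t: i * t."""
    return design_intensity(P, t, **coeffs) * t


# Design depths for the four return periods at the 20- and 60-min durations.
for P in (0.5, 1, 2, 5):
    print(P, [round(design_depth(P, t), 2) for t in (20, 60)])
```

With real coefficients, `design_depth(1, 20)` and `design_depth(1, 60)` would give the 1-a thresholds quoted later in this section.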
Rainfall associated with urban waterlogging falls into four types. Short-duration heavy rainfall has excessive intensity, and poorly drained areas cannot discharge the excess rainfall in a short time; this is called rainfall intensity waterlogging (IW). For rainfall of low intensity but long duration, rainwater gradually fills the downstream pipe network until it overflows; this is called rainfall amount waterlogging (AW). Rainfall with both high intensity and long duration causes combined waterlogging (CW). Rainfall with both a small amount and low intensity generally causes no waterlogging (NW).
Because rainfall events are independent of one another, the characteristics of actual rainfall generally differ from those of design rainfall. Using Equation (1), we calculated the return periods of all rainfall events at 14 stations and randomly selected 12 events. Their ID curves are shown in Figure 4, where the dotted lines are the design rainfall curves for the 0.5-, 1-, 2-, and 5-a return periods. The ordinate is the maximum average rainfall intensity over a given duration, and the abscissa is that duration. By selecting appropriate short and long duration thresholds and a return period, waterlogging rainstorms can be divided into four categories. Table 2 shows the information and classification results of the 12 rainstorms for the 4 return periods. Taking duration thresholds of 20 min (M20) and 60 min (M60) and a return period of 1 a as an example, the four types of waterlogging rainfall are divided as follows.
- (1) M20 ≥ 21.36 mm and M60 ≥ 35.35 mm: both the intensity and the amount of rainfall reach flood-causing levels, corresponding to CW.
- (2) M20 ≥ 21.36 mm and M60 < 35.35 mm: the rainfall is concentrated and attenuates rapidly, resulting in IW.
- (3) M20 < 21.36 mm and M60 ≥ 35.35 mm: the rainfall is uniform and lasts a long time, corresponding to AW.
- (4) M20 < 21.36 mm and M60 < 35.35 mm: both the amount and the intensity are too small to cause waterlogging disasters, corresponding to NW.
Here, M20 and M60 denote an event's maximum 20-min and 60-min rainfall depths, and 21.36 and 35.35 mm are the corresponding design rainfall depths for the 1-a return period.
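Rules (1) to (4) can be applied directly to a rainfall record. The sketch below assumes a rainfall series at a fixed time step (1 min by default, an assumption), computes M20 and M60 as the largest depths over any 20- and 60-min moving window, and compares them with the 1-a thresholds quoted above. The example series are invented.

```python
from typing import Sequence

# Thresholds quoted in the text for the 1-a return period (mm).
M20_THRESHOLD = 21.36
M60_THRESHOLD = 35.35


def max_window_depth(rain_mm: Sequence[float], steps: int) -> float:
    """Largest rainfall depth (mm) summed over any `steps` consecutive records."""
    if steps <= 0:
        raise ValueError("window must contain at least one step")
    if len(rain_mm) <= steps:
        return float(sum(rain_mm))
    window = sum(rain_mm[:steps])
    best = window
    for k in range(steps, len(rain_mm)):
        window += rain_mm[k] - rain_mm[k - steps]  # slide the window by one step
        best = max(best, window)
    return float(best)


def classify_event(rain_mm: Sequence[float], step_min: int = 1,
                   t20: float = M20_THRESHOLD, t60: float = M60_THRESHOLD) -> str:
    """Return 'CW', 'IW', 'AW' or 'NW' following rules (1) to (4)."""
    m20 = max_window_depth(rain_mm, 20 // step_min)
    m60 = max_window_depth(rain_mm, 60 // step_min)
    if m20 >= t20 and m60 >= t60:
        return "CW"
    if m20 >= t20:
        return "IW"
    if m60 >= t60:
        return "AW"
    return "NW"


print(classify_event([1.25] * 20 + [0.0] * 100))  # short burst: 25 mm in 20 min
print(classify_event([0.7] * 120))                # long, steady rain
```

The moving-window maximum matches the ID-curve definition of "maximum average rainfall intensity during the period", expressed as a depth.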
Different return periods and duration thresholds correspond to different criteria for discriminating waterlogging disasters. In practice, the duration thresholds can be set according to regional rainfall characteristics; the most accurate return period can then be determined from the waterlogging point information obtained by the crawlers together with the flood simulation model, yielding an appropriate waterlogging rainfall threshold standard.
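The selection of "the most accurate return period" is not specified in detail. One possible reading, sketched here as an assumption, is to classify each event with the thresholds of every candidate return period and keep the period whose predictions (any class other than NW) agree most often with whether waterlogging was reported in the crawled data. Only the 1-a thresholds below come from the text; the others, and the event data, are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Thresholds = Tuple[float, float]  # (M20 threshold, M60 threshold) in mm


def classify(m20: float, m60: float, thr: Thresholds) -> str:
    """Rules (1) to (4): CW, IW, AW or NW."""
    t20, t60 = thr
    if m20 >= t20 and m60 >= t60:
        return "CW"
    if m20 >= t20:
        return "IW"
    if m60 >= t60:
        return "AW"
    return "NW"


def choose_return_period(events: Sequence[Tuple[float, float]],
                         observed: Sequence[bool],
                         thresholds_by_period: Dict[float, Thresholds]) -> Tuple[float, float]:
    """Pick the return period whose thresholds best reproduce the crawled
    waterlogging reports; returns (return period, agreement rate)."""
    if not events or len(events) != len(observed):
        raise ValueError("need one observation per event")
    if not thresholds_by_period:
        raise ValueError("no candidate return periods")
    best_p, best_rate = 0.0, -1.0
    for p in sorted(thresholds_by_period):
        thr = thresholds_by_period[p]
        agree = sum((classify(m20, m60, thr) != "NW") == obs
                    for (m20, m60), obs in zip(events, observed))
        rate = agree / len(events)
        if rate > best_rate:  # ties keep the shorter return period
            best_p, best_rate = p, rate
    return best_p, best_rate


# Hypothetical data: only the 1-a thresholds come from the text.
THRESHOLDS = {0.5: (15.0, 25.0), 1.0: (21.36, 35.35), 2.0: (28.0, 46.0)}
EVENTS = [(25.0, 30.0), (18.0, 28.0), (10.0, 40.0), (12.0, 20.0)]  # (M20, M60), mm
OBSERVED = [True, False, True, False]  # waterlogging reported in crawled data?
print(choose_return_period(EVENTS, OBSERVED, THRESHOLDS))
```

In the paper's workflow, the model simulation also enters this judgment; the agreement rate here stands in for that combined check.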
2.3.3. Analysis of Urban Waterlogging Process Based on SWMM
Waterlogging point data obtained by crawlers are often affected by the subjective reporting of citizens. For example, rainfall of the same magnitude elicits very different social responses depending on whether it occurs during rush hour or in the early morning, so the crawled waterlogging point data lack consistency. In contrast, the results of an urban waterlogging model are unaffected by crowdsourcing bias and are consistent and objective. Therefore, calibrating the model with waterlogging point data to obtain a reasonable spatial distribution of urban waterlogging is important for judging whether the design rainfall of different return periods, and hence the waterlogging rainstorm threshold standard, is reasonable.
SWMM, the Storm Water Management Model developed by the US Environmental Protection Agency (EPA), is a mature tool for urban waterlogging simulation and meets the needs of this research. Since its development in 1971, it has been applied and refined for more than 40 years and is widely recognized [40,41,42,43,44].
The model approximates overland flow on slopes (slope confluence) as multiple one-dimensional flow processes and computes it with the kinematic wave approach. Its basic principle is the simultaneous solution of the water balance equation, Equation (2), and the Manning equation, Equation (3) [45].
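Equations (2) and (3) are not shown in the text. Assuming they take the standard nonlinear-reservoir form used in SWMM's runoff module, which is consistent with the symbols defined below, they can be written as follows; the outflow is zero while h ≤ h_p, and depths must be expressed in metres for Q to come out in m³/s:

$$\frac{dV}{dt} = F\,\frac{dh}{dt} = F\,r_s - Q \qquad (2)$$

$$Q = \frac{W}{n}\,\left(h - h_p\right)^{5/3} s^{1/2} \qquad (3)$$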
where
F is the subcatchment area, m²;
V is the water storage volume of the subcatchment, m³;
h is the water storage depth of the subcatchment, mm;
rs is the surface runoff rate obtained from runoff generation analysis, m/s;
Q is the outflow rate, m³/s;
hp is the depression storage depth, mm;
W is the confluence (overland flow) width, m;
s is the slope of the subcatchment;
n is the Manning roughness coefficient.
Substituting Equation (3) into Equation (2) and solving the resulting finite difference scheme approximately with the Newton–Raphson iterative method gives the water depth process, Equation (4). In the slope confluence calculation, the water depth process is obtained from Equation (4), and the flow process is then obtained by introducing it into Equation (2).
where
h1 and h2 are the water depths at the beginning and end of the time step Δt;
K is the slope confluence coefficient, given by Equation (5).
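Equations (4) and (5) are not shown in the text either. The sketch below assumes the standard SWMM finite-difference form $(h_2 - h_1)/\Delta t = r_s - K(\bar h - h_p)^{5/3}$ with $\bar h = (h_1 + h_2)/2$ and $K = W s^{1/2}/(F n)$, which is what substituting Equation (3) into Equation (2) yields. It solves this implicit equation for h2 with Newton–Raphson and evaluates the Manning outflow. All quantities are in SI units (depths in metres, not the millimetres listed above), and the subcatchment parameters in the usage lines are hypothetical.

```python
import math


def slope_coefficient(W: float, s: float, F: float, n: float) -> float:
    """K = W * sqrt(s) / (F * n), assumed form of Equation (5)."""
    return W * math.sqrt(s) / (F * n)


def step_depth(h1: float, rs: float, dt: float, K: float, hp: float,
               tol: float = 1e-12, max_iter: int = 50) -> float:
    """Solve (h2 - h1)/dt = rs - K*max(0, (h1+h2)/2 - hp)**(5/3) for h2 (m)."""

    def residual(h2: float) -> float:
        excess = max(0.0, 0.5 * (h1 + h2) - hp)
        return h2 - h1 - dt * (rs - K * excess ** (5.0 / 3.0))

    def derivative(h2: float) -> float:
        excess = max(0.0, 0.5 * (h1 + h2) - hp)
        # d/dh2 of dt*K*excess^(5/3) = dt*K*(5/3)*excess^(2/3)*(1/2)
        return 1.0 + dt * K * (5.0 / 6.0) * excess ** (2.0 / 3.0)

    h2 = h1 + rs * dt  # initial guess: no outflow during the step
    for _ in range(max_iter):
        delta = residual(h2) / derivative(h2)
        h2 -= delta
        if abs(delta) < tol:
            break
    return max(h2, 0.0)  # simplification: depth is not allowed to go negative


def manning_outflow(h: float, W: float, s: float, n: float, hp: float) -> float:
    """Q = (W/n) * (h - hp)^(5/3) * sqrt(s) in m^3/s; zero while h <= hp."""
    return W / n * max(0.0, h - hp) ** (5.0 / 3.0) * math.sqrt(s)


# Hypothetical 1-ha subcatchment: 30 min of 30 mm/h runoff, then 30 min dry.
F, W, s, n, hp = 10000.0, 100.0, 0.01, 0.015, 0.002
K = slope_coefficient(W, s, F, n)
h, peak_q = 0.0, 0.0
for minute in range(60):
    rs = 30.0 / 1000.0 / 3600.0 if minute < 30 else 0.0  # m/s
    h = step_depth(h, rs, 60.0, K, hp)
    peak_q = max(peak_q, manning_outflow(h, W, s, n, hp))
print(f"final depth {h:.4f} m, peak outflow {peak_q:.4f} m3/s")
```

Because the residual is convex and increasing in h2, starting Newton–Raphson from the no-outflow depth approaches the root monotonically from above, which keeps the iteration stable.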
For flow routing through pipes and rivers, where water flows essentially in one dimension, one-dimensional flow equations are used. Of the three commonly used routing methods, the steady flow method and the kinematic wave method are simplified generalizations of the actual process, whereas the dynamic wave method is the most suitable for routing flow through urban pipes and rivers. Its governing equations are the Saint-Venant equations, which consist of the continuity equation and the momentum equation, Equations (6) and (7).
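Equations (6) to (8) are not shown in the text. Assuming they follow the form commonly given for SWMM's dynamic wave routing, with x the distance along the conduit, t time, n the Manning roughness coefficient, and R the hydraulic radius (symbols not defined in the list below), they read:

$$\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = 0 \qquad (6)$$

$$\frac{\partial Q}{\partial t} + \frac{\partial \left(Q^2/A\right)}{\partial x} + gA\,\frac{\partial H}{\partial x} + gA\,S_f = 0 \qquad (7)$$

$$S_f = \frac{n^2\,Q\,\lvert Q\rvert}{A^2 R^{4/3}} \qquad (8)$$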
where
Q is the flow rate, m³/s;
A is the cross-sectional area of flow, m²;
H is the water level (head), m;
g is the gravitational acceleration, 9.8 m/s²;
Sf is the friction slope, which is determined by Equation (8).
After simplification, the flow rate is given by Equation (9),
where the subscripts 1 and 2 denote the upstream and downstream nodes of the pipe or river section, an overbar denotes the average value over the time step Δt, and L is the length of the pipe or river section, m. In addition, the nodes of the pipe or river network must satisfy the continuity condition, Equation (10).
The finite difference form of the node water level can be expressed as Equation (11),
where
H is the node water level (or head), m;
Qi is the flow at the node, m³/s;
ω is the free water surface area at the node, m².
By solving Equations (9)–(11) together, the flow rate of each pipe or river section and the water level of each node are obtained.
In summary, SWMM can calculate nodal overflow and accurately simulate the formation and disappearance of ponding points. A waterlogging simulation model of the study area is built with SWMM and calibrated with the waterlogging data obtained by the crawlers. The calibrated model yields reasonable urban waterlogging simulations, from which the rationality of the threshold division is judged, forming a complete method for dividing urban waterlogging rainstorm thresholds.
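The text does not specify how simulated and crawled waterlogging points are compared during calibration. One simple option, shown here purely as an assumption, is the fraction of crawled points lying within a search radius of at least one simulated overflowing node. Coordinates are assumed to be in a projected coordinate system in metres, and the 200 m default radius is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in metres, projected coordinates


def hit_rate(simulated: Sequence[Point], crawled: Sequence[Point],
             radius: float = 200.0) -> float:
    """Fraction of crawled waterlogging points within `radius` (m) of at
    least one simulated overflowing node."""
    if not crawled:
        raise ValueError("no crawled points to compare against")
    hits = sum(
        any(math.hypot(cx - sx, cy - sy) <= radius for sx, sy in simulated)
        for cx, cy in crawled
    )
    return hits / len(crawled)


# Hypothetical layout: two simulated overflow nodes, three crawled reports.
print(hit_rate([(0.0, 0.0), (1000.0, 0.0)],
               [(50.0, 50.0), (500.0, 500.0), (1000.0, 100.0)]))
```

A hit rate alone rewards a model that floods everywhere, so in practice it would be paired with a check on simulated overflow nodes that have no nearby report.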