Article
Peer-Review Record

Integrating GRACE/GRACE Follow-On and Wells Data to Detect Groundwater Storage Recovery at a Small-Scale in Beijing Using Deep Learning

Remote Sens. 2023, 15(24), 5692; https://doi.org/10.3390/rs15245692
by Ying Hu 1, Nengfang Chao 1,*, Yong Yang 2, Jiangyuan Wang 1, Wenjie Yin 3, Jingkai Xie 4, Guangyao Duan 2, Menglin Zhang 2, Xuewen Wan 1, Fupeng Li 5, Zhengtao Wang 6 and Guichong Ouyang 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5:
Reviewer 6: Anonymous
Submission received: 19 September 2023 / Revised: 29 November 2023 / Accepted: 4 December 2023 / Published: 11 December 2023

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

 

This paper proposes a robust hybrid statistical downscaling method that integrates GRACE/GRACE Follow-On and wells data with deep learning techniques to detect small-scale groundwater storage anomalies (GWSA) in Beijing. The study focuses on the impact of the South-to-North Water Diversion Project's Middle Route (SNWDP-MR) on groundwater depletion and recovery in Beijing. The authors use three deep learning methods to reconstruct terrestrial water storage anomalies and downscale GWSA using wells and GRACE/GRACE-FO data. The results show improved accuracy in detecting GWSA and provide valuable insights into the dynamics of groundwater storage in Beijing. This study is novel and suitable for publication in RS after further improvement and clarification.

Specific comments:

1. What does "post-preprocessed" mean for the GRACE data in the text and the supplementary material? Should it be "post-processed"?

2. The figures are hard to read and need higher resolution.

3. The manuscript lacks important references, which need to be added.

4. Line 135: "However, existing research has demonstrated effectiveness of the grid resolution of 0.5°×0.5°." References are necessary here.

Line 215-218:  What kind of Kriging interpolation method was used? How was the accuracy of the interpolation assessed? How much does the interpolation method affect the results?

 Line 341: Please add a relevant description about Taylor diagrams.

Line 348:  Incorrect diagram number, Figure 9 should be Figure 8.
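The question about how the interpolation accuracy was assessed can be made concrete with leave-one-out cross-validation: each well is withheld in turn and predicted from the remaining wells. The sketch below is illustrative only. It uses inverse-distance weighting as a simple stand-in for Kriging (it is not the authors' method), and the well coordinates and anomaly values are hypothetical.

```python
import math

def idw_predict(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` (stand-in for Kriging)."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with a sample location
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def leave_one_out_rmse(points, values, predict=idw_predict):
    """Withhold each well in turn, predict it from the rest, return the RMSE."""
    sq_errors = []
    for i in range(len(points)):
        rest_pts = points[:i] + points[i + 1:]
        rest_vals = values[:i] + values[i + 1:]
        est = predict(rest_pts, rest_vals, points[i])
        sq_errors.append((est - values[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical well locations (lon, lat) and groundwater anomalies (cm).
wells = [(116.0, 39.8), (116.5, 39.8), (116.0, 40.2), (116.5, 40.2), (116.25, 40.0)]
gwsa = [-4.0, -2.0, -3.0, -1.0, -2.5]
print(round(leave_one_out_rmse(wells, gwsa), 3))
```

Any interpolator with the same call signature, including a Kriging routine, could be passed as `predict`, so that different interpolation choices are compared on the same withheld wells.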

 

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Comments on “Integrating GRACE/GRACE Follow-on and wells data to detect groundwater storage recovery at a small-scale in Beijing using deep learning” by Hu et al., 2023 (remotesensing-2646662).

 

 

This work attempts to detect groundwater storage recovery at a small scale in Beijing by integrating GRACE/GRACE Follow-On and wells data through a deep learning approach. The authors first developed three deep learning models to reconstruct the GRACE-derived terrestrial water storage anomaly (TWSA) on a 0.5°×0.5° grid. They then deducted the non-groundwater components from the GRACE-derived TWSA. Finally, they used LSTM to downscale the GRACE-derived GWSA from 0.5° to 0.25° using three proposed methods. This study delivers an innovative technical reference for dynamically monitoring small-scale GWSA from GRACE/GRACE-FO data. It is of high scientific value. I would like to recommend it for publication in this journal; however, some minor comments related to the writing and discussion presented in this manuscript need to be addressed before publication.

 

Comments:

1.      Lack of references (lines 130).

2.      The description of Figure 5 and 6 is confusing. Please revise them (lines 289 and lines 290).

3.      “From Figure 6” should be “From Figure 5” (lines 300).

4.      The description of Taylor diagram is unclear (lines 343).

5.      The figure number for Figure 9 is incorrect and it should be Figure 8 (lines 350).

6.      Please carefully verify the correspondence between the figure numbers and their respective descriptions in the text.

 

Comments on the Quality of English Language

Overall, the English presentation is good, but some paragraphs need to be double-checked.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Editor and Authors,

I have posted the comments on the attached file.

Best regards

Comments for author File: Comments.pdf

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The stated goal of the manuscript “Integrating GRACE/GRACE Follow-on and wells data to detect groundwater storage recovery at a small-scale in Beijing using deep learning” is to detect groundwater storage recovery due to water diversions in the Beijing area. However, the paper, as written, is not clear on what was actually done or what the goals were. I expect the work is correct, but poorly described. For example, on L22 in the abstract, they state the goal is to “reconstruct the 0.5x0.5 terrestrial water anomalies”, while just two lines later, on L24, they state they used different DL strategies to downscale to 0.25x0.25 degrees. Competing descriptions of this kind occur throughout the paper.

 

As near as I can tell, two things were done. The first was data imputation to fill the gaps in the GRACE data, the second was to look at GW anomalies in the Beijing area to determine if water diversions had increased groundwater levels. The paper describes at length the three different GRACE products, i.e., Table 1 and Figure 2, but it isn’t clear how they were used. Later in the paper, L231, it is implied that only the JPL data were used. There is never a clear distinction between the different Tasks. Figure 2 is very complex, but is confusing and never described well.

 

The biggest issue is best demonstrated by Figures 10 and 11. These are GW trends in the Beijing area at 0.25x0.25 degrees (Figure 10). I used the Lat/Long labels on the figure to determine the resolution, though the caption implies 0.5x0.5 degrees (before downscaling). Figure 11 appears to show a 2x increase in resolution to 0.125x0.125 degrees. This is never discussed in the paper. More importantly, it is impossible that Figure 11 is a downscaled version of Figure 10. There are numerous additional discrepancies between these two figures best seen in the bottom row (method 3), though obvious in the other two methods. In Fig 10 P1M3 (period 1 method 3) the scale goes from -10 to 5, the downscaled data, Fig 11, go from -60 to 0. It is difficult to compare the panels within either figure or between figures and the color scales are different for every panel. However, upward trends are blue and downward are red. If you look again at Fig 10 P1M3 and Fig 11 P1M3 the plots go from the overwhelming majority of the region as a downward trend (Fig 10) to an overwhelming majority of the region in an upward trend (Fig 11). While this may be true, the paper doesn’t explain how “downscaling” can completely change the spatial distribution, rather than just increase the resolution.
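One simple way to test whether a fine-resolution map is a plausible downscaled version of a coarse map is to average the fine cells back to the coarse grid and compare. The sketch below is an illustration under stated assumptions, not a procedure taken from the manuscript: it assumes a factor-of-two refinement, ignores latitude-dependent cell areas, and uses hypothetical trend values.

```python
def aggregate_2x2(fine):
    """Average non-overlapping 2x2 blocks of a fine grid (e.g. 0.25 deg to 0.5 deg)."""
    rows, cols = len(fine), len(fine[0])
    if rows % 2 or cols % 2:
        raise ValueError("fine grid must have even dimensions")
    coarse = []
    for r in range(0, rows, 2):
        row = []
        for c in range(0, cols, 2):
            total = fine[r][c] + fine[r][c + 1] + fine[r + 1][c] + fine[r + 1][c + 1]
            row.append(total / 4.0)
        coarse.append(row)
    return coarse

def max_abs_difference(a, b):
    """Largest cell-wise absolute difference between two equally shaped grids."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical trend maps (mm/yr): one coarse row of two cells and its refinement.
coarse_trend = [[-8.0, -6.0]]
fine_trend = [[-9.0, -7.0, -7.0, -5.0],
              [-9.0, -7.0, -7.0, -5.0]]
print(max_abs_difference(aggregate_2x2(fine_trend), coarse_trend))
```

A large residual, or a change of sign over most of the region, would indicate that the fine map carries information that does not come from the coarse map, which is the kind of discrepancy that needs to be explained.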

 

While I used Figure 10 and Figure 11 as an example, these types of issues occur through the paper. I cannot provide a better review until what was done is better described. The methods MAY have been appropriate or the methods MAY have been in error. It is impossible to judge with the current paper.

L350 Section 4.3 This should be a key point, but is not emphasized in abstract or descriptions of what is in the paper. The paper focuses too much on ML and not enough on evaluating GW. The ML things in this paper are just homework assignments. The implications and trends in GW are the important part and the reason the paper should be published. The ML stuff is just a tool – there is nothing new in the ML parts of the paper.

L385 Fig 9 You need to discuss the very large change in the in-situ data at the end of the period. What caused it, why didn’t GRACE show it, etc.? There is something either very interesting or very wrong here. Also, based on this graph, precipitation shows an upward trend after the red line. This could easily be the cause of the groundwater rise, rather than water diversion. This needs to be analyzed and discussed. These issues are much more publishable than all the ML stuff in this paper. Just treat the ML stuff as a tool; the paper should be about the GW issues: are the trends from diversions, are they from increased precipitation, etc.?

 

While I could not complete a full review, I did make some specific notes in reading the paper.

 

L39-40 GW is the main source in only a few regions; in most of the world, surface water provides the majority of the resources.

L45 replace “served” with “serves”

L87-88 I assume by “fill the gaps” you mean “impute” missing data. This is the first instance where it isn’t clear what you are doing: are you imputing gaps or “reconstructing TWSA of Beijing”? These are different tasks. You may need to do the first to complete the second, but throughout the paper it isn’t clear which task you are discussing or describing.

 

L101 replace “GWAS” with “GWSA”

L108 Problems with sentence and English.

L110-111 Item 3: this is not a focus of the paper (the stated purpose of the list), but a result.

L113-115 No pink circles (as described in the caption). Make figure bigger. Zoom into study region. Make legends and inset smaller. Maybe make right and left panels separate figures (or vertical panels) to increase size.

L127 Section 2.1 This discussion doesn’t follow Table 1. Also, Table 1 seems to just be an overview of available data, not the data used for this paper. The text and the table are not correlated.

L127-138 Section 2.1 It is never clear what data you used for your models, why you selected those data, or what format or resolution those data were in.

L135-137 You don’t need to provide this level of detail, just tell us which data set you used, cite the source, and tell us why.

L138 “GRACE-derived TWSA” which ones

L138 “resampled to 0.5x0.5”. In Table 1, all the data are listed as 0.5x0.5 or 0.25x0.25. What did you do? How did you resample? I know later you present the models, but it isn’t clear what you are resampling.

L140 Delete the phrase “used the laws of physics”.

L144 “resampled to 0.5 and 0.25” How were these data resampled?

L146-150 What grid resolution, did you resample, etc.

L152-157 What grid resolution, did you resample, etc. What time resolution

L158 Change “Wells” to “Well”

L165 Table 1: this should be a table of the data you used (and for what), what its original form was, and what you did to it. Currently it is more confusing than helpful because it lists everything and it isn’t clear what you used.

L166-173 This explanation isn’t clear. I think you imputed the missing GRACE data, used the GRACE GWA data along with well data to rescale these data to 0.25x0.25, then evaluated trends. If this is true, why not just use the 0.25x0.25 GRACE data? I understand you feel you can do a better job by including ground truth in the rescaling, but you need to describe this process and goal. You say you followed Figure 2, but I can’t follow Figure 2.

L176-187 Section 3.1 This is either too little information, if the differences matter for this study, or too much (which I think is true). Just list the models, cite the literature, and tell us if they are better at certain things. This feels like “we took a list of models from the literature and tried them all” paper, which is not research, it is homework.

L187 Spell out “SI” the first time used.

L188-193 Section 3.2 I think this is describing imputation to fill the gaps, using the existing data as ground truth to compute errors, but it isn’t described that way.

L188-193 Section 3.2 I think this is describing the top part of Figure 2, but this text does not follow the figure. I don’t see 6 types of GRACE data, or any of the other details in the text (or if they are there, they don’t match the figure).

L191 What are the “six types of GRACE-derived TWSA” you used? Again it isn’t clear what data you used in the study.

L194 – 201 Where did you get these data? What resolution, etc. Most studies use GLDAS, but you list a large number of sources earlier in the paper.

L206 You want to go from GRACE water equivalent to water height, the opposite of what you say. Eq 3 is correct.

L210-211 Over what period did you average the GW elevation? Why not just pick the first value as a datum? Either is fine, but you need to describe what you actually did.
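The effect of the datum choice can be illustrated with a short sketch. It assumes the well records are groundwater level elevations (so a rising level means more storage) and that storage anomalies are obtained by scaling level anomalies with a specific yield. The function names, the level values, and the specific yield of 0.06 are hypothetical and are not taken from the manuscript.

```python
def level_anomalies(levels, baseline=slice(None)):
    """Subtract the mean level over the baseline period from every record."""
    base = levels[baseline]
    if not base:
        raise ValueError("baseline period selects no records")
    mean = sum(base) / len(base)
    return [h - mean for h in levels]

def storage_anomalies(levels, specific_yield, baseline=slice(None)):
    """Scale level anomalies by specific yield to get equivalent water height."""
    return [specific_yield * a for a in level_anomalies(levels, baseline)]

# Hypothetical monthly groundwater levels (m above a fixed reference).
levels = [20.0, 19.0, 18.0, 19.0, 21.0, 23.0]

# Datum option 1: mean over the whole record.
print(storage_anomalies(levels, 0.06))
# Datum option 2: the first value only.
print(storage_anomalies(levels, 0.06, baseline=slice(0, 1)))
```

The two options differ only by a constant offset, so trends are unaffected, but the reported anomaly values are not comparable unless the baseline period is stated.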

L213 What is “the optimal” model? I think you mean the model you used after going through the fitting procedure, but this is misleading and you then describe several models, only one of which can be optimal by definition.

L214 These are not “forcing” variables, these are the “feature” variables for the ML models.

L215 How did you train a machine learning model without any target data? You say you trained a machine learning model to go from 0.5x0.5 to 0.25x0.25. What were the target values, where did they come from, etc.

L215-219 If Kriging is ground truth (it isn’t clear), what you actually did was train a ML model to Krig the data. This is fine, but there is no reason to do it. Just Krig the data. It is fast, accurate, and understandable. It seems that your error is the difference between the ML results and the Kriging results. What does ML give you that Kriging doesn’t?

L220 Eq 4 You need to discuss the fact that your test and train data sets are from different time periods on data that has trends. It would probably be better to have several test and train breaks in different time periods, rather than all the train data at the beginning and all the test data at the end.
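The suggestion of several train/test breaks can be sketched as a blocked split, in which each contiguous block of months serves as the test set once. This is a minimal illustration with hypothetical sizes. It does not address leakage between neighbouring months, which a real evaluation of sequence models would also need to consider.

```python
def blocked_time_folds(n_samples, n_folds):
    """Return (train, test) index lists; each contiguous block is the test set once."""
    if not 1 < n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds = []
    start = 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)  # spread the remainder over early folds
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds

# Hypothetical record of 10 monthly samples split into 3 blocks.
for train, test in blocked_time_folds(10, 3):
    print("test months:", test)
```

Reporting the error metrics per block would show whether model skill depends on which part of the trending record is withheld.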

L221 Eq 5 If the Kriging data (i.e., ground truth) was already gridded data, why are you Kriging the error data? You already have the error data on a grid. Whatever you did is not described in the paper.

 

L222 Eq 6 What does this equation mean? Is this just a statement that there is error in the process or did you add some estimated error to the simulation results? Again, if the error is the difference between the ML data and Kriging, why not just use Kriging for re-scaling?

L223-228 When you look at these average errors, I think you will find that, since they come from different time periods in data that have trends (and not constant trends), this type of operation is suspect (Eq 5).

L230 What does “optimally align” mean.

L231 Here is an example stated earlier. Up to this point in the paper, you talk about 3 or 6 or multiple GRACE data sets, here you only list the JPL data.

L232 How was “downscaling” done?

L235-239 Section 3.4 Why is there a section on Random Forest? Why is it here rather than earlier in the document. You never talk about using it, it isn’t in any Figures or methods.

L247 What is “GFZ SH TWSA”? Again, you need to clearly describe, earlier, exactly what data you used. Not use “we used everything”

L253 Figure 3 Yellow is a bad color for figures. Why not use 3 lines, rather than bars and two lines?

L269 Figure 4 This seems to be a data imputation step – which is required, but not described as such. Is Fig 3 the error from the data imputation step, or the GWa downscaling step. Again, you never clearly describe what you did.

L284 Table 2 This is not “Precision” but rather “Error metric”. Again, I assume that these are the results of doing data imputation on the 6 different GRACE data sets, but this isn’t clearly defined. Also, later in the paper, you appear to use only 1 to estimate GW trends. If that is the case, why impute data for all 6?

L307 Figure 5 What are all the points in the graph on the right panel? There are only 30 cells in the figures, but significantly more points on these plots. Again, you need to describe what was done.

L307 Figure 5 What are the different columns? The rows are data sets, but columns are not defined.

L323 Fig 6 Here you only seem to be using one GRACE data set with three different downscaling methods. What data set did you use? Why did you pick it? Why did you impute all 6 data sets if you only need 1. Both the figure and the text move from imputation to re-scaling without making it clear that you are talking about something different.

L323 Fig 6 How did you compute in-situ GWA? At what time step is the anomaly. What was the base period from which the anomaly was computed? Need to describe what you did.

L323 Fig 6 Need to change GWSA to GWA it isn’t a ground water surface anomaly. Do a search and replace on entire document.

L338 Fig 7 same comments as Fig 6

L323 Fig 7 Need to describe how both things were re-scaled. What was re-scaled out of the 6 GRACE data sets.

 

L348 Fig 9 If you fit a linear line to these data, there is NO correlation in the top two panels (horizontal line). The bottom left panel is slightly better. The error metrics support this. If Kriging was the ground truth, this is more information that says to just use Kriging. This should be discussed. Your data show that ML shows no benefit over Kriging for re-scaling.
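The agreement between downscaled and in-situ series can be summarized with a few standard scores. The sketch below computes Pearson correlation, RMSE, and Nash-Sutcliffe efficiency (NSE) on hypothetical values; it is not a reproduction of the metrics or data in the manuscript. An NSE of 0 means the prediction is no better than the mean of the observations.

```python
import math

def pearson_r(obs, sim):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    vo = sum((o - mo) ** 2 for o in obs)
    vs = sum((s - ms) ** 2 for s in sim)
    if vo == 0 or vs == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vo * vs)

def rmse(obs, sim):
    """Root-mean-square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mo = sum(obs) / len(obs)
    denom = sum((o - mo) ** 2 for o in obs)
    if denom == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / denom

# Hypothetical in-situ and downscaled anomalies (cm).
in_situ = [1.0, 2.0, 3.0, 4.0]
downscaled = [1.5, 1.5, 3.5, 3.5]
print(pearson_r(in_situ, downscaled), rmse(in_situ, downscaled), nse(in_situ, downscaled))
```

Computing the same scores for the Kriged field and for each downscaling method, against the same wells, would show directly whether the ML step adds skill.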

L348 Fig 9 Yellow is bad color for figures.

L350 Section 4.3 This should be a key point, but is not emphasized in abstract or descriptions of what is in the paper. The paper focuses too much on ML and not enough on evaluating GW. The ML things in this paper are just homework assignments. The implications and trends in GW are the important part and the reason the paper should be published. The ML stuff is just a tool – there is nothing new in the ML parts of the paper.

L360-370 You are now talking about time-lags and other issues. None of this was discussed in the methods or earlier in the paper. You need to describe what you did and how it was done. If these are results, you need to show me that there is a time lag – plots or something. Much more important than plots of error metrics for the ML models you don’t even use (5 of the GRACE data sets).
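A time lag between a driver (for example precipitation) and a response (for example GWSA) can be shown with lagged correlations. The sketch below is a generic illustration on synthetic numbers in which the response is the driver delayed by two steps. The series and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)

def lagged_correlations(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag."""
    if len(driver) != len(response):
        raise ValueError("series must have equal length")
    result = {}
    for lag in range(max_lag + 1):
        x = driver[:len(driver) - lag]
        y = response[lag:]
        result[lag] = pearson_r(x, y)
    return result

# Synthetic example: the response repeats the driver two steps later.
driver = [0.0, 2.0, -1.0, 3.0, 0.0, -2.0, 1.0, 4.0, -3.0, 2.0, 0.0, 1.0]
response = [0.0, 0.0] + driver[:-2]
corr = lagged_correlations(driver, response, max_lag=4)
best_lag = max(corr, key=corr.get)
print(best_lag)
```

A plot of correlation against lag, built from such a table, is the kind of evidence that would support a claimed time lag.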

L385 Fig 9 You need to discuss the very large change in the in-situ data at the end of the period. What caused it, why didn’t GRACE show it, etc.? There is something either very interesting or very wrong here. Also, based on this graph, precipitation shows an upward trend after the red line. This could easily be the cause of the groundwater rise, rather than water diversion. This needs to be analyzed and discussed. These issues are much more publishable than all the ML stuff in this paper. Just treat the ML stuff as a tool; the paper should be about the GW issues: are the trends from diversions, are they from increased precipitation, etc.?
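Whether precipitation also trends upward after the diversion date can be checked by fitting separate linear trends before and after the break. The sketch below uses ordinary least squares against the time index. The series and break position are hypothetical, and a real analysis would also need significance testing and seasonal adjustment.

```python
def ols_slope(y):
    """Least-squares slope of y against its index (units of y per time step)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def trends_around_break(series, break_index):
    """Slopes of the segments before and after `break_index`."""
    return ols_slope(series[:break_index]), ols_slope(series[break_index:])

# Hypothetical annual values with a change in behaviour at index 5.
series = [5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 3.0, 5.0, 7.0]
before, after = trends_around_break(series, 5)
print(before, after)
```

Applying the same split to precipitation, water use, and GWSA would show whether the groundwater recovery coincides with a precipitation increase, a diversion-driven change, or both.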

L433 Figure 12 Similar comments to Fig 9. These are the interesting data that could affect GW changes. Are the changes caused by water diversion, precipitation, or changes in use? What would have happened without the diversion? How have changes in use affected the GW? Just cite the ML stuff and do the analysis. Also, the better analysis might be Kriging of the in-situ data. If there are not enough data to Krig, then you need to explain that you used GRACE data to get a longer time trend, and that to do so you needed to impute missing values and rescale (though there are 0.25x0.25 degree GRACE data).

 

The paper puts forth a lot of stuff, without providing details or context. Figure out what story you are trying to tell - I expect it is “water diversions are offsetting use in the Beijing area” then focus on that. Everything else is just a tool

Comments on the Quality of English Language

English is ok. There are some issues with past/present tense

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

Thank you for the opportunity to review the manuscript “Integrating GRACE/GRACE Follow-on and wells data to detect groundwater storage recovery at a small-scale in Beijing using deep learning”. The authors propose machine-learning-based downscaling approaches, integrated with water well data, to improve the characterization of groundwater dynamics in an anthropogenically affected region, taking Beijing as a case study, where governmental agencies have taken the initiative to reduce the impacts of depletion. The research framework has certain strengths, but it also suffers from knowledge gaps, limited discussion of the international literature, and missed opportunities to evaluate the machine learning predictions using training, testing, and validation data. Although the manuscript is relatively complete, it still needs careful revision before it can be accepted for publication.

The introduction lacks many important details and background information. Lines 62-73: the discussion of the GRACE Follow-On mission is insufficient. Please consult more of the international literature and cite well-established studies on GRACE, specifically for groundwater storage estimation. Also, it is not entirely clear why the authors used LSTM, GRU, and MLP instead of the many other machine and deep learning methods. The authors need to discuss this further, identify the knowledge gaps in previous methods, and provide the selection criteria for, and the advancements of, the selected methods.

Statistical versus dynamical downscaling: again, the literature and justification are not well established in this section. The following studies might be good references: (1) https://www.mdpi.com/2073-4441/15/6/1017; (2) https://www.sciencedirect.com/science/article/abs/pii/S0022169422010174; (3) https://www.sciencedirect.com/science/article/pii/S0048969722031412; (4) https://www.sciencedirect.com/science/article/abs/pii/S0048969721022105

The flowchart of the study is not clear to me.

The downscaling procedure is incorrectly applied and not well established. The authors should split the grid pixels for training, testing, and evaluation. The model should be trained using GRACE and the explanatory variables, then tested using only the explanatory variables to see how much of the GWSA those variables can explain. The final step should be evaluation and application. I am not impressed by the straightforward methodology, which is the same as that applied in previous studies. The whole thing is correct, but I want to see the methods assessed on the training and testing sets.

Figure 5: the predicted and true values are too close because the authors used the same GRACE data for training and then for evaluation. That was my previous concern. It would be better to evaluate the predicted results on the testing set (without using GRACE data). Again, the results are good, but a standard methodology is lacking.
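The pixel-wise split requested here can be sketched as follows. The grid size, the split fractions, and the function name are hypothetical and are chosen only to illustrate keeping training, testing, and evaluation cells disjoint.

```python
import random

def split_pixels(pixel_ids, train_frac=0.7, test_frac=0.15, seed=0):
    """Shuffle grid-cell ids once and cut them into train / test / evaluation sets."""
    if train_frac <= 0 or test_frac < 0 or train_frac + test_frac >= 1:
        raise ValueError("fractions must leave room for an evaluation set")
    ids = list(pixel_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(len(ids) * train_frac)
    n_test = int(len(ids) * test_frac)
    return ids[:n_train], ids[n_train:n_train + n_test], ids[n_train + n_test:]

# Hypothetical 5 x 6 grid of coarse cells, identified by (row, col).
cells = [(r, c) for r in range(5) for c in range(6)]
train, test, evaluation = split_pixels(cells)
print(len(train), len(test), len(evaluation))
```

Because neighbouring cells are spatially correlated, a random split can still overstate skill. Withholding whole contiguous sub-regions is a stricter alternative.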

Table 2: the authors now report training and testing evaluations, but no reasonable evaluation or standard methodology for training and testing is described in the previous sections, including the methodology.

Figure 5: the evaluation metrics are hard to read. Please rebuild this figure at a better resolution.

Figure 7: there must be a comparison with the original GRACE data to show how accurate the downscaled product is.

The authors have used different methods and, apart from that, remote sensing and reanalysis variables at different scales. An uncertainty assessment must be performed to see how uncertain the final estimate is. Quantifying the uncertainty associated with each variable and algorithm might be a good idea. Figure 9: is it for the whole period? Do the dots in the scatter plot indicate well observations or months?
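One simple proxy for method-related uncertainty is the spread of the estimates across the downscaling methods at each time step. The sketch below is only a starting point under that assumption. It does not propagate uncertainty from the input variables, and the method names and values are hypothetical.

```python
import statistics

def ensemble_mean_and_spread(series_by_method):
    """Per-time-step mean and sample standard deviation across methods."""
    names = sorted(series_by_method)
    lengths = {len(series_by_method[n]) for n in names}
    if len(names) < 2 or len(lengths) != 1:
        raise ValueError("need at least two methods with equal-length series")
    means, spreads = [], []
    for values in zip(*(series_by_method[n] for n in names)):
        means.append(statistics.mean(values))
        spreads.append(statistics.stdev(values))
    return means, spreads

# Hypothetical GWSA estimates (cm) from three downscaling methods.
estimates = {
    "method_1": [1.0, 2.0, -1.0],
    "method_2": [3.0, 2.0, 0.0],
    "method_3": [5.0, 2.0, 1.0],
}
means, spreads = ensemble_mean_and_spread(estimates)
print(means, spreads)
```

The same function could be applied to runs that differ only in one input data source, which would separate data-related spread from algorithm-related spread.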

 

Conclusions: Need to be marginal based on main findings

Comments on the Quality of English Language

Moderate editing of English language required

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 6 Report

Comments and Suggestions for Authors

 

Major comments:

-Table 2 shows clear signs of overfitting: an extremely tight fit in the training period and a very weak fit in the test period.

-The correlation of GWSA with the well data is very low (see figures).

-It is not explained what “regional average scale” means. How was it done? This is important, since by doing this the correlation between GWSA and well data goes from very low to some correlation.

-Fig 9: Even after averaging at the regional scale, the fit between observed and downscaled GWSA is extremely poor for methods 1 and 2 and poor for method 3.

-Fig 9: The upper part shows that the temporal evolution of GWSA from none of the methods (1-3) follows the real measured data (!)

-Figs. 10 and 11 demonstrate that each method gives a completely different output for a given period (random, like rolling dice). This demonstrates the very poor performance of the DL models used.

-Line 203: The definition of specific yield is not correct. It is the volume of water per unit AREA per unit decline.

-Figures are often low quality and hard to read.

-The authors should keep a single term for a single parameter. Instead, they use many different names for the same parameter, such as “in-situ GWSA”, “in-situ data”, “measured GWSA”, etc.

-Chapter 4.4: How exactly was the effect of “human factors” compared with that of “climate factors”? What was used and how?

-Discussion, lines 440-450, shows that more precise estimates of the depletion and recovery rates have already been published, some of them using a much higher number of wells for verification (617). What is the value of publishing a less precise estimate verified by a lower number of wells, one that generally shows very poor correlation with well data and a low fit?

 

Minor comments

Line 416: “shortage of 1 km3 between water supply and water use”. Over what period? Why is this not given in more appropriate units, such as l/s or million m3/year?
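For reference, the unit conversion is straightforward once the period is known. The sketch below assumes, purely for illustration, that the 1 km3 shortage is an annual figure; the manuscript text quoted here does not state the period.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years

def km3_per_year_to_million_m3_per_year(volume_km3):
    """1 km3 = 1e9 m3 = 1000 million m3."""
    return volume_km3 * 1000.0

def km3_per_year_to_litres_per_second(volume_km3):
    """1 km3 = 1e12 litres, spread evenly over one year."""
    return volume_km3 * 1e12 / SECONDS_PER_YEAR

# Assumed annual shortage of 1 km3 (the period is an assumption).
print(km3_per_year_to_million_m3_per_year(1.0))
print(round(km3_per_year_to_litres_per_second(1.0), 2))
```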

Author Response

 Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 5 Report

Comments and Suggestions for Authors

The authors have provided a revised version of the manuscript "Integrating GRACE/GRACE Follow-on and wells data to detect groundwater storage recovery at a small-scale in Beijing using deep learning". The authors seem not to have fully incorporated my comments, mainly those on evaluating model training and testing at the pixel scale. I am well acquainted with downscaling frameworks, particularly for precipitation and GRACE datasets. The authors have not fully engaged with the international literature, particularly the studies recommended in my previous report. The authors use many variables from different sources, so uncertainty quantification with respect to the data sources and machine learning algorithms must be provided. I recommend that the authors carefully review my comments, go through the international literature, and provide an appropriate discussion of the available and recommended studies. I therefore do not recommend publishing this article in its current form. At present, the structure of this article needs further improvement.

Comments on the Quality of English Language

Minor editing of the English language required

Author Response

Please see the attachment

Author Response File: Author Response.docx
