Article
Peer-Review Record

A Machine-Learning Approach for Prediction of Water Contamination Using Latitude, Longitude, and Elevation

Water 2022, 14(5), 728; https://doi.org/10.3390/w14050728
by Kakoli Banerjee 1,*, Vikram Bali 2,*, Nishad Nawaz 3, Shivani Bali 4, Sonali Mathur 1, Ram Krishn Mishra 5 and Sita Rani 6
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 22 January 2022 / Revised: 14 February 2022 / Accepted: 16 February 2022 / Published: 24 February 2022
(This article belongs to the Special Issue AI and Deep Learning Applications for Water Management)

Round 1

Reviewer 1 Report

In this manuscript, the authors investigate whether factors based on water body location can be used as indicators to predict water quality in a given area, using machine learning approaches and a water quality index. Noida, a city in northern India, is selected as the study area. This study may be useful in identifying and controlling water pollution. The subject is suitable for publication in Water; however, the manuscript is poorly written and needs major revision. In its current form, the paper is therefore far from ready for publication in this journal.

My comments:

  1. Abstract: The abstract of the paper has to provide information on key aspects of the paper precisely. The abstract should state briefly the purpose of the research, the principal results and major conclusions. An abstract is often presented separately from the article, so it must be able to stand alone. The abstract needs to be revised.
  2. The introduction is not well structured; its paragraphs need to be revised to incorporate additional information on water pollution by heavy metal ions and organic residues. Key references:

IOP Conf. Series: Earth and Environmental Science 958 (2022) 012011

IOP Conf. Series: Earth and Environmental Science 612 (2020) 012023

Meanwhile, there is not enough description of innovation and scientific issues in this part.

  3. The authors assessed water quality parameters in the northern part of India, such as pH, temperature, hardness and alkalinity; however, other important water pollutants, such as bacteria, heavy metal ions, pesticides and herbicides, were not assessed.
  4. Heavy metal ions and organic compounds such as pesticide and herbicide residues in water are also related to human disease; some advanced detection work is also recommended for the introduction section. Heavy metals, pesticides and herbicides should be discussed as major sources of chemical contamination. Please see references.
  5. Table 1 and Figure 1 should be improved because they are not clear to the reader.
  6. In section 3.1.2 pH: the information presented in the first paragraph is too simple and general and should be omitted.
  7. Please avoid writing simple and general information throughout the manuscript.
  8. Is there any filtering treatment for the samples before analysis?
  9. The conclusion is too long and can be shortened to be more informative.
  10. Overall, the quality of English and grammar needs to be thoroughly checked.

Author Response

RESPONSES TO REVIEWER 1 COMMENTS

 

Original Manuscript ID: water-1517946              

Original Article Title: A Machine Learning Approach for Prediction of Water Contamination using Latitude, Longitude and Elevation

 

Dear Reviewer,

Thank you for your useful comments and suggestions for the modification of our manuscript. We have modified the manuscript accordingly, and detailed corrections are listed below point by point.

Reviewer#1, Concern # 1: Abstract: The abstract of the paper has to provide information on key aspects of the paper precisely. The abstract should state briefly the purpose of the research, the principal results and major conclusions. An abstract is often presented separately from the article, so it must be able to stand alone. The abstract needs to be revised.

 

Author response:   Thank you very much for your suggestion.

Author action: As per the suggestions given by the reviewer, the abstract has been restructured to provide the key aspects of the research. The abstract now presents the problem statement, followed by the work carried out and its conclusion.

The abstract now: “One of the significant issues that the world has faced in recent decades has been the estimation of water quality and of locations where safe drinking water is available. Due to the unexpected nature of the mode of water contamination, it isn't easy to analyze the quality and maintain it. Some machine learning techniques are used for predicting contaminating factors, but there is no technique that can predict the contamination using Latitude, Longitude and Elevation. The main aim of this paper is to take factors such as water body location and elevation as input to different machine learning techniques which predict the contamination. The results are reviewed and analyzed according to groundwater contamination and the chemical composition of the groundwater location. Non-changeable factors like Latitude, Longitude and Elevation are used to predict pH, Temperature, Turbidity, Dissolved Oxygen, Hardness, Chlorides, Alkalinity and Chemical Oxygen Demand. No such study, in which location-based factors are used to predict the water contamination of an area, has been conducted in the past. This research focuses on creating a relationship between the location-based factors and the factors affecting the water contamination in a given area.”

 

Reviewer#1, Concern # 2: The introduction is not well structured; its paragraphs need to be revised to incorporate additional information on water pollution by heavy metal ions and organic residues. Key references:
IOP Conf. Series: Earth and Environmental Science 958 (2022) 012011
IOP Conf. Series: Earth and Environmental Science 612 (2020) 012023
Meanwhile, there is not enough description of innovation and scientific issues in this part.

 

Author response:   Thank you very much for your suggestion.

Author action: The Introduction has been restructured as per the suggestions given by the reviewer.

The data set used in this study was collected through a ground survey and tested in the lab for different parameters. This data set is unique and has never been used for any research in the past. The testing of additional parameters such as heavy metals and organic contaminants could not be carried out in our laboratories.


Reviewer#1, Concern # 3: The authors assessed water quality parameters in the northern part of India, such as pH, temperature, hardness and alkalinity; however, other important water pollutants, such as bacteria, heavy metal ions, pesticides and herbicides, were not assessed.

 

Author response:   Thank you very much for your suggestion.

Author action: The data set used in this study was collected through a ground survey and tested in the lab for different parameters. This data set is unique and has never been used for any research in the past. The testing of additional parameters such as heavy metals, organic contaminants, pesticides and herbicides could not be carried out in our laboratories.

 

 Reviewer#1, Concern # 4: The heavy metal ions and organic compounds like pesticides and herbicides residues in water are also related to human disease, some advanced detection work is also recommended in the introduction section. Heavy metals, pesticides and herbicides should be discussed as the major reasons for some chemical contaminants. Please see references.

 

Author response:   Thank you very much for your suggestion.

Author action: The data set used in this study was collected through a ground survey and tested in the lab for different parameters. This data set is unique and has never been used for any research in the past. The testing of additional parameters such as heavy metals, organic contaminants, pesticides and herbicides could not be carried out in our laboratories; hence, the discussion of these contaminants has not been included. The main aim of this research is to establish a relationship between location-based factors and the factors affecting water contamination in a given area. Some past research has studied the water contamination of an area over a given timeline, but no study is available in which water contaminants are predicted from location coordinates. In simple words, if we have data for n locations in a particular area, this study can help to predict the water contaminants at the (n+1)th location without physically testing a water sample from that location. The same approach can therefore be used for other contaminants that have not been covered in this research.
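
The idea of estimating a parameter at an unsampled (n+1)th location from n sampled ones can be illustrated with a minimal sketch. The example below uses inverse-distance weighting, a simple interpolation scheme chosen here only for illustration; it is not one of the machine learning models evaluated in the manuscript. The function name `idw_predict`, the site coordinates and the pH values are all invented, and the raw mix of degrees and metres in the distance is a simplification that a real analysis would replace with rescaled features.

```python
import math

def idw_predict(known, query, power=2.0):
    """Inverse-distance-weighted estimate at `query`.

    `known` holds (latitude, longitude, elevation, value) tuples for the
    n sampled sites; `query` is the (latitude, longitude, elevation) of
    the unsampled (n+1)th site.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, elev, value in known:
        d = math.dist((lat, lon, elev), query)
        if d == 0.0:
            return value  # query coincides with a sampled site
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical sites (made-up numbers): latitude, longitude, elevation (m), pH
sites = [
    (28.57, 77.32, 200.0, 7.1),
    (28.58, 77.33, 201.0, 7.4),
    (28.60, 77.35, 203.0, 6.9),
]
estimate = idw_predict(sites, (28.59, 77.34, 202.0))
```

Because the estimate is a weighted average with positive weights, it always lies between the smallest and largest sampled values, which is one reason such a scheme is only a baseline.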

 

Reviewer#1, Concern # 5: Table 1 and figure 1 should be improved because it is not clear to the reader.

 

Author response:   Thank you very much for your suggestion.

Author action: The Spacing of Table 1 has been increased for more clarity.

 

Although we tried to bring more clarity to Figure 1 by changing the chart type, it cannot be enhanced further. Another major reason for the limited clarity of Figure 1 is the huge variation in the values of the parameters portrayed in the graph, which causes some parameters to fall very close to the X-axis. If we made a separate graph for each factor to avoid this, the length of the paper would increase drastically and comparison of the parameters would become difficult to portray. For more clarity, readers can refer to the parameter values shown in Table 1.

 

Reviewer#1, Concern # 6: In section 3.1.2 pH: the information presented in the first paragraph is too simple and general and should be omitted.

 

Author response:   Thank you very much for your suggestion.

Author action: Section 3.1.2 has been updated with more specific information as per the suggestions given.


Reviewer#1, Concern # 7: Please avoid writing simple and general information in the whole of the manuscript.

 

Author response:   Thank you very much for your suggestion.

Author action: The paper has been updated as per the suggestions given.


Reviewer#1, Concern # 8: Is there any filtering treatment for the samples before analysis?

 

Author response:   Thank you very much for your question.

Author action: No filtration has been done for the samples before analysis. The bottles, in which these samples were collected, were properly cleaned and dried in the laboratory. So there is no chance of any external contamination of the samples.

 

Reviewer#1, Concern # 9: The conclusion is too long and can be reduced to being more informative.

 

Author response:   Thank you very much for your suggestion.

Author action: The Conclusion has been updated as per the suggestions given.

 

 

Reviewer#1, Concern # 10: Overall the quality of English and grammar needs to be thoroughly checked.

 

Author response:   Thank you very much for your suggestion.

Author action: The paper has been updated as per the suggestions given.

Author Response File: Author Response.docx

Reviewer 2 Report

The topic is interesting.

Methods are sound but their description needs some clarification.

The presentation uses too much space: tables can be shortened and some pictures may be replaced by a table.

Please find thereafter my detailed comments:

1) Provide clear definitions of groundwater and underground water in the Introduction (line 51) with supporting examples

2) Clarify what is meant by changeable and non-changeable parameters (all parameters change value over the sample and in real measurements) in line 70

3) The paragraph at lines 76-85 is out of place since it is a reprise of the text dealing with motivation, while you are in the final part of the Introduction, where you should describe what you do and what you accomplish

4) Please state your original contribution at the end of the Introduction

5) Please do not develop Section 2 on related work by describing in sequence one paper after another (which is quite dull). Try to first group references according to the type of contribution they provide.

6) Table 1 is quite long (nearly three pages!) while it should represent a sample. Also, the caption is quite confusing. It would be better to indicate each variable by an abbreviation and then provide a legend accompanying the table

7) Not all the variables in Table 1 appear to have been described. I couldn't locate the description of BOD, Acidity, and Chloride (which should deserve a subsection like all the other variables)

8) Correct typo in Figure 2 (Sage...)

9) Review numbering of sections (There are two Sections 3.2)

10) Please clearly state which variables act as regressors and which ones as target (output) variables. It looks like location variables are the only regressors and all the other variables act as possible output variables. In that case, you actually have multiple regression problems (where each variable acts as the output variable in turn), which is not clear in the text.

11) In Figure 3 it looks like the output of each regressor model acts as an input of another regression model. I guess this is not the case. Also, the figure depicts quite a standard approach and could be dispensed with.

12) When showing the results, the error on the training dataset is typically considered non-significant for evaluation purposes. Of course, it has to be minimized, but what we use to rank algorithms is the error on the testing dataset. Figures 4 through 12 could be removed and replaced by a single table, reporting output variables on the rows and the training dataset error for the four models on the columns

13) The size of the dataset is not shown

14) I guess the dataset is not very large. The authors should adopt a cross-validation approach to make full use of the dataset and obtain a better accuracy

15) Reporting the precise performance figures in the Discussion with a large number of figures after the decimal point may be confusing at that point. Please describe the results on the basis of the figures reported in the results table, which the reader can read for himself.

16) In the Conclusion, it looks like you suggest using one model for all the target variables. Since there is not one model achieving better performance for all the target variables, I guess you could suggest a specific model for each regressor.

Author Response

RESPONSES TO REVIEWER 2 COMMENTS

 

Original Manuscript ID: water-1517946              

Original Article Title: A Machine Learning Approach for Prediction of Water Contamination using Latitude, Longitude and Elevation

 

Dear Reviewer,

Thank you for your useful comments and suggestions for the modification of our manuscript. We have modified the manuscript accordingly, and detailed corrections are listed below point by point.

Reviewer#2, Concern # 1: Provide clear definitions of groundwater and underground water in the Introduction (line 51) with supporting examples.

 

Author response:   Thank you very much for your suggestion.

Author action: The definition of groundwater, along with a supporting example, has been added to the Introduction, as per the suggestions given.

 

Groundwater is the water that exists beneath the Earth's surface in the pore spaces of rocks and soils, as well as in the fractures of rock formations. A unit of rock or an unconsolidated deposit that can supply a usable amount of water is called an aquifer. The water table is the depth at which soil pore spaces, cracks, and cavities in rock become completely saturated with water. Groundwater is recharged from the surface, and it can discharge naturally at springs and seeps, forming oases or wetlands. Extraction wells are frequently built and operated to extract groundwater for agricultural, municipal, and industrial purposes. The main source of groundwater for this particular study was borewells at different locations in Noida.

 

 

Reviewer#2, Concern # 2: Clarify what is meant by changeable and non-changeable parameters (all parameters change value over the sample and in real measurements) in line 70.

 

Author response:   Thank you very much for your suggestion.

Author action: The changeable parameters are those that we are trying to predict in our research. These are factors that change over time, such as pH, Temperature, Turbidity, Dissolved Oxygen, Hardness, Chlorides, Alkalinity, Chemical Oxygen Demand, etc. The non-changeable parameters are those that do not change over time, such as geolocation, which consists of Longitude, Latitude and Elevation.

 

Reviewer#2, Concern # 3: The paragraph at lines 76-85 is out of place since it is a reprise of the text dealing with motivation, while you are in the final part of the Introduction, where you should describe what you do and what you accomplish

 

Author response:   Thank you very much for your suggestion.

Author action: The paper has been updated and the Introduction has been restructured as per the suggestion given.

 

Reviewer#2, Concern # 4: Please state your original contribution at the end of the Introduction

 

Author response:   Thank you very much for your suggestion.

Author action: The original contribution of the authors has been added at the end of the Introduction.

 

Reviewer#2, Concern # 5: Please do not develop Section 2 on related work by describing in sequence one paper after another (which is quite dull). Try to first group references according to the type of contribution they provide.

 

Author response:   Thank you very much for your suggestion.

Author action: The work that has been done in the area of water contamination prediction is so varied that it is nearly impossible to group it. The techniques and parameters used in these papers were so different from each other that we could not find a method for grouping them. Our research is novel in that non-changeable parameters have never been used to predict changeable parameters of water; hence, we do not have any base paper for our research.

 

Reviewer#2, Concern # 6: Table 1 is quite long (nearly three pages!) while it should represent a sample. Also, the caption is quite confusing. It would be better to indicate each variable by an abbreviation and then provide a legend accompanying the table

 

Author response:   Thank you very much for your suggestion.

Author action: The caption of the table has been updated for more clarity as per the suggestions given. This novel data set of 52 locations has been presented in full in this paper so that other researchers can make use of it for further study.

 

Reviewer#2, Concern # 7: Not all the variables in Table 1 appear to have been described. I couldn't locate the description of BOD, Acidity, and Chloride (which should deserve a subsection like all the other variables)

 

Author response:   Thank you very much for your suggestion.

Author action: Chlorides have been discussed in Section 3.1.3. As per the suggestion, new sections for BOD and Acidity have been added as Sections 3.1.9 and 3.1.10, respectively.

 

 

Reviewer#2, Concern # 8: Correct typo in Figure 2 (Sage...)

Author response:   Thank you very much for your suggestion.

Author action: Figure 2 has been updated and the mistake has been removed.

 

 

Reviewer#2, Concern # 9: Review numbering of sections (There are two Sections 3.2)

Author response:   Thank you very much for your suggestion.

Author action: This issue has been fixed in the paper and the subsections have been reordered.

 

 

Reviewer#2, Concern # 10: Please clearly state which variables act as regressors and which ones as target (output) variables. It looks like location variables are the only regressors and all the other variables act as possible output variables. In that case, you actually have multiple regression problems (where each variable acts as the output variable in turn), which is not clear in the text.

Author response:   Thank you very much for your suggestion.

Author action: Yes, the location factors are fed as input to the machine learning algorithms, which predict the different changeable parameters such as pH, Temperature, Turbidity, Dissolved Oxygen, Hardness, Chlorides, Alkalinity, Chemical Oxygen Demand, etc.
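
The framing discussed in this exchange, with location variables as the only regressors and each water parameter as a separate target, can be sketched as a set of independent single-output regression problems. The sketch below is an illustration only: the rows, the column names and the helper `fit_per_target` are hypothetical, and the mean baseline is a deliberately trivial placeholder standing in for whichever regressor is actually used.

```python
from statistics import mean

# Hypothetical rows in the spirit of Table 1; all values are made up.
rows = [
    {"lat": 28.57, "lon": 77.32, "elev": 200.0, "pH": 7.1, "turbidity": 2.0},
    {"lat": 28.58, "lon": 77.33, "elev": 201.0, "pH": 7.4, "turbidity": 3.0},
    {"lat": 28.60, "lon": 77.35, "elev": 203.0, "pH": 6.9, "turbidity": 4.0},
]
REGRESSORS = ("lat", "lon", "elev")   # inputs shared by every problem
TARGETS = ("pH", "turbidity")         # one regression problem per entry

def fit_mean_baseline(X, y):
    """Placeholder regressor: ignores X and always predicts the mean of y."""
    m = mean(y)
    return lambda x: m

def fit_per_target(rows, fit):
    """Fit one independent single-output model per target variable."""
    X = [tuple(r[name] for name in REGRESSORS) for r in rows]
    return {t: fit(X, [r[t] for r in rows]) for t in TARGETS}

models = fit_per_target(rows, fit_mean_baseline)
ph_at_new_site = models["pH"]((28.59, 77.34, 202.0))
```

Any function with the same `fit(X, y) -> predict` shape can be passed in place of `fit_mean_baseline`, so the per-target loop stays the same when the model changes.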

 

Reviewer#2, Concern # 11: In Figure 3 it looks like the output of each regressor model acts as an input of another regression model. I guess this is not the case. Also, the figure depicts quite a standard approach and could be dispensed with.

Author response:   Thank you very much for your suggestion.

Author action: Figure 3 has been updated as per the suggestions given. Now after picking the parameter the flow moves separately to each Machine Learning Algorithm.

 

 

Reviewer#2, Concern # 12: When showing the results, the error on the training dataset is typically considered non-significant for evaluation purposes. Of course, it has to be minimized, but what we use to rank algorithms is the error on the testing dataset. Figures 4 through 12 could be removed and replaced by a single table, reporting output variables on the rows and the training dataset error for the four models on the columns

 

Author response:   Thank you very much for your suggestion.

Author action: The data set used in this paper was collected through a ground survey and analyzed through lab testing. This paper not only aims at predicting water contamination using location coordinates, but also proposes a new method for predicting the water contamination level of a particular area. As the data were collected through a ground survey, and we did not find any historical data on water contamination for this area, traditional machine learning was not applied. Because of this, the R² score may be low for some factors, but accuracy is expected to increase as the data set grows. Error metrics were not included in the paper because of the limited data available.

 

 

Reviewer#2, Concern # 13: The size of the dataset is not shown

Author response:   Thank you very much for your suggestion.

Author action: The data set is presented in full in the paper in Table 1. The data set used in this study was collected through a ground survey and tested in the lab for different parameters. This data set is unique and has never been used for any research in the past. This study was done for a specific city, Noida, and it is not possible to have a large data set for the following reasons:

  • Individual groundwater sampling and analytical events yield results which provide a snapshot of hydrogeologic and chemical conditions at a monitoring site. As the groundwater depth in Noida fell below 27 m in 2020, extraction of groundwater from every location in Noida was not possible. The samples were collected from points where submersible pumps with a reach of more than 27 m were available.
  • As Noida is an urban city, wells are scarce and most localities rely on supplied water; hence, locations where groundwater was accessible were very limited.
  • Achieving the information needs of a groundwater sampling program in a specified time requires careful planning and execution of the sampling design, which turned out to be a big challenge.
  • Careful planning is also particularly crucial to distinguishing between the actual hydrologic and chemical variability at a site and that which may arise from errors in the sample collection, handling, and analysis procedures.

 

 

 

 

 

Reviewer#2, Concern # 14: I guess the dataset is not very large. The authors should adopt a cross-validation approach to make full use of the dataset and obtain a better accuracy

 

Author response:   Thank you very much for your suggestion.

Author action: The data set used in this paper was collected through a ground survey and analyzed through lab testing. This paper not only aims at predicting water contamination using location coordinates, but also proposes a new method for predicting the water contamination level of a particular area. As the data are limited, a cross-validation approach has very limited scope in this research.
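
For readers unfamiliar with the technique the reviewer refers to, the following is a minimal sketch of leave-one-out cross-validation, the variant usually considered for very small data sets. Each sample is held out once, the model is fitted on the remaining samples, and the error is measured only on the held-out sample. The function names, the one-nearest-neighbour stand-in model and the toy data are assumptions made for illustration and do not come from the manuscript.

```python
def leave_one_out_errors(X, y, fit):
    """Return one held-out absolute error per sample (leave-one-out CV)."""
    errors = []
    for i in range(len(X)):
        # Train on every sample except the i-th one.
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        # Evaluate only on the sample that was left out.
        errors.append(abs(model(X[i]) - y[i]))
    return errors

def fit_nearest(X, y):
    """1-nearest-neighbour regressor, used here only as a stand-in model."""
    def predict(x):
        nearest = min(
            range(len(X)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(X[k], x)),
        )
        return y[nearest]
    return predict

# Made-up data: a single feature, with the target equal to the feature.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 1.0, 2.0, 3.0]
errors = leave_one_out_errors(X, y, fit_nearest)
mean_abs_error = sum(errors) / len(errors)
```

With n samples this fits n models, each trained on n - 1 samples, so every observation contributes to both training and evaluation without ever being scored by a model that has seen it.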

 

 

Reviewer#2, Concern # 15: Reporting the precise performance figures in the Discussion with a large number of figures after the decimal point may be confusing at that point. Please describe the results on the basis of the figures reported in the results table, which the reader can read for himself.

Author response:   Thank you very much for your suggestion.

Author action: The Discussion section has been updated as per the suggestions given.

 

Reviewer#2, Concern # 16: In the Conclusion, it looks like you suggest using one model for all the target variables. Since there is not one model achieving better performance for all the target variables, I guess you could suggest a specific model for each regressor

 

Author response:   Thank you very much for your suggestion.

Author action: The Conclusion section has been updated as per the suggestions given.
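
The reviewer's suggestion of recommending a specific model for each target variable amounts to a per-target selection over held-out scores. A minimal sketch follows, assuming a nested mapping from target to model to score; the model names and the numbers are placeholders invented for this example and are not results from the manuscript.

```python
# Hypothetical held-out R² scores; names and numbers are illustrative only.
test_scores = {
    "pH": {"model_a": 0.41, "model_b": 0.55},
    "turbidity": {"model_a": 0.62, "model_b": 0.30},
}

def best_model_per_target(scores):
    """Pick, for each target, the model with the highest held-out score."""
    return {
        target: max(by_model, key=by_model.get)
        for target, by_model in scores.items()
    }

recommended = best_model_per_target(test_scores)
```

Here `recommended` maps each target to a possibly different model, which is the situation the reviewer describes when no single model is best for every target.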

 

 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Thank you for the revised version of your manuscript. I recommend this manuscript for publication.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The title of this article is "A Machine Learning Approach for Water Contamination Prediction using Geo Location", but then in the article there is no longer any trace of the Geo Location as part of the methodology.

Then, the authors state in the abstract that in this study some machine learning techniques were used for predicting contaminating factors. Then, in the introduction they state that machine learning methodologies were used to predict the location of the contamination. Therefore, there is a significant discrepancy between the stated objectives and the study carried out. That said, the manuscript is poorly organized and poorly written. The study is not rigorous, and the article is full of serious flaws. I proceed to list the most obvious:

  • The number of data is insufficient for a methodology based on machine learning algorithms;
  • The abstract is not sufficiently representative of the contents, the title is misleading.
  • The literature review is insufficient. Many relevant references were not considered.
  • Section 2 consists of a set of copied and pasted sentences;
  • The description of the algorithms is poor;
  • The coefficient of determination R² alone is not a sufficient metric to compare results. At least one metric representative of the errors must be added.
  • R² cannot exceed 1; it certainly cannot take the values shown in the authors' figures.
  • The results and discussion sections are so poor that they are embarrassing. In fact, the results are not discussed, but only presented. There is no interpretation, nor any comparison with the existing literature.
  • English needs to be significantly improved.
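
The two points about metrics in the list above can be made concrete with a short sketch. With R² = 1 - SS_res / SS_tot and SS_res never negative, R² cannot exceed 1, although it can be negative for a poor model; a root-mean-square error (RMSE) reports the error in the units of the target. The function names and the toy numbers below are assumptions for illustration, and the sketch does not handle the degenerate case of a constant target, where SS_tot is zero.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

y_true = [1.0, 2.0, 3.0]
perfect = r2_score(y_true, [1.0, 2.0, 3.0])  # exact predictions give 1.0
poor = r2_score(y_true, [3.0, 2.0, 1.0])     # reversed predictions give -3.0
poor_rmse = rmse(y_true, [3.0, 2.0, 1.0])
```

Reporting both values side by side shows how well the model explains the variance and how large its typical error is.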

I can only recommend the rejection of this manuscript.  

Reviewer 2 Report

The proposed concept of machine learning for water contamination is not novel. The methodology section is very poorly presented. The presentation and descriptions of the ML methods are poor, with various errors in mathematical representation. Incorrect references have been given for the model descriptions. Validation is not correct. Comparative analysis is not implemented correctly.
