# Groundwater Level Prediction with Machine Learning to Support Sustainable Irrigation in Water Scarcity Regions


## Abstract


[…] km², was selected as the study area. Bilate is typical of areas in Africa with high water demand and limited availability of well data. Using a non-time-series dataset of 75 boreholes, machine learning models, including multiple linear regression (MLR), multivariate adaptive regression splines (MARS), artificial neural networks (ANN), random forest regression (RFR), and gradient boosting regression (GBR), were constructed to predict the depth to the water table. The study considered 20 independent variables, including elevation, soil type, and seasonal data (spanning three seasons) for precipitation, specific humidity, wind speed, daytime and nighttime land surface temperature (LST), and the Normalized Difference Vegetation Index (NDVI). GBR performed best, with an average R-squared value of 0.77 and a median absolute error of 19 m on the testing data. Finally, a map of predicted water levels in the Bilate watershed was created from the best model, with water levels ranging from 1.6 to 245.9 m. Despite the limited set of borehole data, the results show a clear signal that can guide borehole drilling decisions for sustainable irrigation, with additional implications for drinking water.

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Study Area

[…] km². Bilate is one of the largest watersheds in the Ethiopian Rift Valley Basin [24,25]. Elevation in the region ranges from 1194 m to 3216 m, as shown in Figure 2. The Bilate region is classified as having a three-season climate, with major rains from June to September, a dry season from October to January, and minor rains from February to May [26,27]. In Bilate, the minor rains typically start in March; however, for our analysis, we adhered to the general classification, which starts the season in February, as referenced in [24,25]. Annual average precipitation within the Bilate watershed ranges from 769 mm in the lowlands to 1339 mm in the highlands. The mean annual temperature ranges between 11 °C and 22 °C at Hossana, upstream, and between 16 °C and 30 °C at Bilate Tena, downstream in the watershed [24].

[…] $\times 10^{-5}$ and $2.78 \times 10^{-1}$ m/s […]$^{2}$ [28].

**Figure 2.** The Bilate watershed (pink region in the lower-left plot) belongs to the Rift Valley Basin (light green region in the upper-left plot). The map on the right shows the Bilate watershed with boreholes and elevation [29].

#### 2.2. Data Description

#### 2.3. Resampling Methods

#### 2.3.1. Leave-One-Out Cross-Validation

#### 2.3.2. Bootstrapping

#### 2.4. Machine Learning Algorithms

#### 2.4.1. Multiple Linear Regression

#### 2.4.2. Multivariate Adaptive Regression Spline

#### 2.4.3. Artificial Neural Networks

#### 2.4.4. Random Forest Regression

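[…] intercept ($\beta_0$) and slope ($\beta_1$) of the transformed prediction:

$$\hat{y}^{*} = \beta_0 + \beta_1 \times \hat{y}$$

where $\hat{y}$ is the original RFR prediction, $\beta_0$ is the coefficient for the intercept, and $\beta_1$ is the coefficient for $\hat{y}$. The objective is to find the parameters that minimize the mean square error:

$$\min_{\beta_0,\,\beta_1}\ \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \left(\beta_0 + \beta_1 \hat{y}_i\right)\right)^2$$

where $y_i$ is the $i$th observed value and $n$ is the number of observations.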
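The least-squares fit of $\beta_0$ and $\beta_1$ has a closed form. The following is a minimal Python sketch, assuming the transformation is fit on paired observed values and original RFR predictions. The function names `fit_linear_transform` and `apply_transform` are hypothetical, and the data are synthetic predictions compressed toward the mean, which mimics the bias the transformation targets.

```python
def fit_linear_transform(y_hat, y):
    """Least-squares estimates of b0, b1 in y ~ b0 + b1 * y_hat (minimizes the MSE)."""
    n = len(y)
    if n != len(y_hat) or n < 2:
        raise ValueError("need at least two paired values")
    mean_p = sum(y_hat) / n
    mean_y = sum(y) / n
    # Slope = covariance(y_hat, y) / variance(y_hat); the 1/n factors cancel.
    s_py = sum((p - mean_p) * (t - mean_y) for p, t in zip(y_hat, y))
    s_pp = sum((p - mean_p) ** 2 for p in y_hat)
    if s_pp == 0:
        raise ValueError("constant predictions: slope is undefined")
    b1 = s_py / s_pp
    b0 = mean_y - b1 * mean_p  # the fitted line passes through the two means
    return b0, b1


def apply_transform(y_hat, b0, b1):
    """Post-process raw predictions with the fitted intercept and slope."""
    return [b0 + b1 * p for p in y_hat]


# Synthetic example: predictions squeezed toward the mean of the observations.
observed = [10.0, 50.0, 90.0, 130.0, 170.0]
predicted = [40.0, 60.0, 80.0, 100.0, 120.0]
b0, b1 = fit_linear_transform(predicted, observed)
print(b0, b1)                                # -70.0 2.0
print(apply_transform(predicted, b0, b1))    # [10.0, 50.0, 90.0, 130.0, 170.0]
```

A slope greater than 1 stretches predictions away from their mean. This counteracts the tendency of averaged tree predictions to be pulled toward the center of the training data.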

#### 2.4.5. Gradient Boosting Regression

[…] where $y_i$ is the $i$th observed value, $f(x_i)$ is the predicted response value, and $n$ is the number of observations. Next, the predictions of the new decision tree, scaled by the learning rate, are added to the current ensemble's predictions to obtain an updated prediction, and the tree is added to the ensemble. The residuals are then recalculated from the updated predictions. The new residuals represent the errors that the current ensemble has not yet captured. The process continues for a specified number of iterations or until a stopping criterion is met. The final prediction is the initial prediction plus the scaled contributions of all trees in the ensemble. By iteratively correcting the errors of the previous models, gradient boosting regression can learn complex relationships and improve predictive accuracy.
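To make the iteration concrete, here is a minimal Python sketch of gradient boosting for squared-error loss. It uses a single numeric predictor and depth-one trees (stumps). It is an illustration under those assumptions and not the `gbm` implementation: it omits subsampling, deeper interaction depths, and `gbm`'s other options. The defaults of 100 iterations and a learning rate of 0.1 echo the tuned values reported in Section 3.6. The function names are hypothetical.

```python
def fit_stump(x, r):
    """Least-squares depth-one regression tree for residuals r on a 1-D feature x."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between distinct values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical: fall back to a constant leaf
        m = sum(r) / len(r)
        return lambda xi: m
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def gradient_boost(x, y, n_iter=100, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is fit to the current residuals."""
    f0 = sum(y) / len(y)  # initial prediction: the mean response
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_iter):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # errors not yet captured
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Update the ensemble prediction with the shrunken stump output.
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]

    def predict(xi):
        # Final prediction: initial value plus the scaled sum of all stumps.
        return f0 + learning_rate * sum(s(xi) for s in stumps)

    return predict


x = [1, 2, 3, 4, 5, 6]
y = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
model = gradient_boost(x, y)
print(round(model(1), 3), round(model(6), 3))  # 0.0 10.0
```

With a learning rate of 0.1, each iteration removes 10% of the remaining residual in this toy case. The small learning rate trades more iterations for a smoother, less overfit path.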

#### 2.5. Evaluation Metrics

## 3. Results

#### 3.1. Mutual Information Analysis

#### 3.2. Multiple Linear Regression

[…] `findCorrelation()` randomly picks one predictor to remove. The remaining predictors are soil type (X2), precipitation (X3–X5), wind speed from February to May (X10), daytime LST from February to May (X13), nighttime LST from October to January (X15), NDVI from February to May (X19), and NDVI from June to September (X20). Table 3 lists the remaining predictors and their coefficients. We found that the eutric vertisols soil type (X2) and NDVI from June to September (X20) had a significant relationship with the static water level at the 0.05 significance level.

#### 3.3. Multivariate Adaptive Regression Spline

#### 3.4. Artificial Neural Networks

[…] `nnet` and `caret` packages in R. The `nnet` package supports a single hidden layer between the input and output layers. In the preprocessing stage, the model centers and scales the predictors, a common strategy for normalizing variable scales.
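The preprocessing step and the single-hidden-layer architecture can be sketched as follows. This Python sketch assumes logistic hidden units and one linear output unit, the usual setup for regression with `nnet`. The weights are placeholders chosen by hand, not trained values, and the function names are hypothetical. Centering and scaling here use the sample standard deviation.

```python
import math
from statistics import mean, stdev


def center_scale(column):
    """Center a predictor on its mean and divide by its sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Single hidden layer: logistic hidden units feeding one linear output unit."""
    hidden = [logistic(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out


print(center_scale([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
# Two inputs, two hidden units; zero input weights make each hidden unit output 0.5.
print(forward([0.3, -1.2], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [2.0, 2.0], 1.0))  # 3.0
```

Scaling matters here because the logistic units saturate for large inputs. Without scaling, a predictor measured in metres of elevation would swamp one measured as an NDVI fraction.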

#### 3.5. Random Forest Regression

[…] `randomForest` package in R on the training dataset, using LOOCV as the resampling method. The hyperparameters of the RFR model were the number of predictors randomly selected at each split (mtry), the node size, and the number of trees (ntree). These hyperparameters were tuned jointly within a for loop. For mtry, a grid from 1 to 10 was set and fine-tuned using the `caret` package; the node size was assessed at 5, 6, and 7; and ntree was evaluated between 50 and 200. This grid search yielded optimal settings of mtry = 8, node size = 5, and ntree = 60.
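LOOCV combined with a grid search can be sketched generically. A random forest does not fit in a short example, so this Python sketch swaps in a k-nearest-neighbor regressor on a single predictor, with k as the only hyperparameter. The part meant to carry over is the structure: leave each observation out, refit, score, and pick the grid value with the lowest LOOCV RMSE. Function names and data are hypothetical.

```python
from statistics import mean


def knn_predict(train_x, train_y, x, k):
    """Average the responses of the k training points closest to x (stand-in model)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)


def loocv_rmse(xs, ys, k):
    """Leave-one-out CV: hold out each point once and predict it from the rest."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        squared_errors.append((ys[i] - knn_predict(train_x, train_y, xs[i], k)) ** 2)
    return mean(squared_errors) ** 0.5


def grid_search(xs, ys, grid):
    """Score every candidate value and return the one with the lowest LOOCV RMSE."""
    scores = {k: loocv_rmse(xs, ys, k) for k in grid}
    return min(scores, key=scores.get), scores


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
best_k, scores = grid_search(xs, ys, grid=[1, 2, 3])
print(best_k)  # 2
```

With n observations, LOOCV costs n model fits per grid point. It is affordable for a dataset of about 75 boreholes but grows quickly for larger datasets or grids.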

[…] intercept ($\beta_0$) and slope ($\beta_1$) were −29.65 and 1.3, respectively. Comparing the residual plots of the original RFR model and the post-processed model (Figure 6d,e) shows that the bias of the original RFR model has been mitigated.
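Plugging the reported coefficients into the transformation shows how it reshapes predictions. The following is a short Python sketch. The helper name `post_process` is hypothetical, and the two constants are the values reported above.

```python
B0, B1 = -29.65, 1.3  # intercept and slope reported for the RFR post-processing


def post_process(pred):
    """Apply the fitted linear transformation to a raw RFR prediction (m)."""
    return B0 + B1 * pred


# The prediction left unchanged by the transformation solves pred = B0 + B1 * pred.
fixed_point = -B0 / (B1 - 1)
print(round(fixed_point, 2))  # 98.83
for p in (50.0, 150.0):
    print(p, round(post_process(p), 2))
```

Predictions below about 98.8 m are moved shallower and deeper ones are moved deeper, which spreads out the compressed RFR predictions. Taken literally, the mapping would also send raw predictions below about 22.8 m (29.65 / 1.3) to negative depths.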

#### 3.6. Gradient Boosting Regression

[…] `gbm` package in R. Hyperparameters were optimized via grid search. These were the minimum number of observations in a node required for a split (n.minobsinnode), the complexity of the boosted trees (interaction.depth), the number of iterations, and the learning rate. We assessed n.minobsinnode at 5 and 10, interaction.depth from 1 to 3, the number of iterations at 50, 70, 100, and 120, and the learning rate at 0.1 and 0.01. The optimal hyperparameters, which gave the best model performance, were 5, 2, 100, and 0.1, respectively.
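The size of this search space is easy to check. The Python sketch below enumerates the grid described above. The dictionary keys mirror the tuning names `n.minobsinnode`, `interaction.depth`, `n.trees` (iterations), and `shrinkage` (learning rate) used by `gbm` and caret. The code itself only lists candidates and does not fit any model.

```python
from itertools import product

grid = {
    "n.minobsinnode": [5, 10],
    "interaction.depth": [1, 2, 3],
    "n.trees": [50, 70, 100, 120],  # number of boosting iterations
    "shrinkage": [0.1, 0.01],       # learning rate
}

# Every combination of the listed values is one candidate model to resample.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(candidates))  # 48

reported_best = {"n.minobsinnode": 5, "interaction.depth": 2,
                 "n.trees": 100, "shrinkage": 0.1}
print(reported_best in candidates)  # True
```

Each of the 48 candidates is evaluated with the chosen resampling scheme, so the grid size multiplies the number of model fits. With LOOCV on n training points, for example, the search would require 48 × n fits.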

## 4. Discussion

#### 4.1. Important Variables Analysis

#### 4.2. Model Performance Evaluation and Comparison

#### 4.3. Grid Points Prediction Evaluation Based on the Best Model

#### 4.4. Final Map of the Predicted Water Level

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Chandrasekharan, K.M.; Subasinghe, S.; Haileslassie, A. Mapping Irrigated and Rainfed Agriculture in Ethiopia (2015–2016) Using Remote Sensing Methods; International Water Management Institute (IWMI): Colombo, Sri Lanka, 2021; ISBN 978-92-9090-913-2.
- FAO. Small Family Farms Country Factsheet Ethiopia—Food and Agriculture; FAO: Rome, Italy, 2018.
- Haileslassie, A.; Agide, Z.; Erkossa, T.; Hoekstra, D.; Schmitter, P.; Langan, S. On-Farm Smallholder Irrigation Performance in Ethiopia: From Water Use Efficiency to Equity and Sustainability; ILRI Editorial and Publishing Services: Addis Ababa, Ethiopia, 2016; ISBN 978-92-9146-468-5.
- Khan, M.S.; Coulibaly, P. Application of Support Vector Machine in Lake Water Level Prediction. J. Hydrol. Eng. **2006**, 11, 199–205.
- Liang, C.; Li, H.; Lei, M.; Du, Q. Dongting Lake Water Level Forecast and Its Relationship with the Three Gorges Dam Based on a Long Short-Term Memory Network. Water **2018**, 10, 1389.
- Chen, S.; Qiao, Y. Short-Term Forecast of Yangtze River Water Level Based on Long Short-Term Memory Neural Network. IOP Conf. Ser. Earth Environ. Sci. **2021**, 831, 012051.
- Choi, C.; Kim, J.; Han, H.; Han, D.; Kim, H.S. Development of Water Level Prediction Models Using Machine Learning in Wetlands: A Case Study of Upo Wetland in South Korea. Water **2020**, 12, 93.
- Wang, Q.; Wang, S. Machine Learning-Based Water Level Prediction in Lake Erie. Water **2020**, 12, 2654.
- Assem, H.; Ghariba, S.; Makrai, G.; Johnston, P.; Gill, L.; Pilla, F. Urban Water Flow and Water Level Prediction Based on Deep Learning. In Machine Learning and Knowledge Discovery in Databases; Altun, Y., Das, K., Mielikäinen, T., Malerba, D., Stefanowski, J., Read, J., Žitnik, M., Ceci, M., Džeroski, S., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Berlin/Heidelberg, Germany, 2017; Volume 10536, pp. 317–329. ISBN 978-3-319-71272-7.
- Kim, D.; Han, H.; Wang, W.; Kim, H.S. Improvement of Deep Learning Models for River Water Level Prediction Using Complex Network Method. Water **2022**, 14, 466.
- Sahoo, S.; Jha, M.K. Groundwater-Level Prediction Using Multiple Linear Regression and Artificial Neural Network Techniques: A Comparative Assessment. Hydrogeol. J. **2013**, 21, 1865–1887.
- Sahoo, S.; Russo, T.A.; Elliott, J.; Foster, I. Machine Learning Algorithms for Modeling Groundwater Level Changes in Agricultural Regions of the U.S. Water Resour. Res. **2017**, 53, 3878–3895.
- Zhang, J.; Zhu, Y.; Zhang, X.; Ye, M.; Yang, J. Developing a Long Short-Term Memory (LSTM) Based Model for Predicting Water Table Depth in Agricultural Areas. J. Hydrol. **2018**, 561, 918–929.
- Liu, D.; Mishra, A.K.; Yu, Z.; Lü, H.; Li, Y. Support Vector Machine and Data Assimilation Framework for Groundwater Level Forecasting Using GRACE Satellite Data. J. Hydrol. **2021**, 603, 126929.
- Hikouei, I.S.; Eshleman, K.N.; Saharjo, B.H.; Graham, L.L.B.; Applegate, G.; Cochrane, M.A. Using Machine Learning Algorithms to Predict Groundwater Levels in Indonesian Tropical Peatlands. Sci. Total Environ. **2023**, 857, 159701.
- Rahman, A.T.M.S.; Hosono, T.; Quilty, J.M.; Das, J.; Basak, A. Multiscale Groundwater Level Forecasting: Coupling New Machine Learning Approaches with Wavelet Transforms. Adv. Water Resour. **2020**, 141, 103595.
- Wen, X.; Feng, Q.; Deo, R.C.; Wu, M.; Si, J. Wavelet Analysis–Artificial Neural Network Conjunction Models for Multi-Scale Monthly Groundwater Level Predicting in an Arid Inland River Basin, Northwestern China. Hydrol. Res. **2016**, 48, 1710–1729.
- Bahmani, R.; Ouarda, T.B.M.J. Groundwater Level Modeling with Hybrid Artificial Intelligence Techniques. J. Hydrol. **2021**, 595, 125659.
- Liu, W.; Yu, H.; Yang, L.; Yin, Z.; Zhu, M.; Wen, X. Deep Learning-Based Predictive Framework for Groundwater Level Forecast in Arid Irrigated Areas. Water **2021**, 13, 2558.
- Wu, Z.; Lu, C.; Sun, Q.; Lu, W.; He, X.; Qin, T.; Yan, L.; Wu, C. Predicting Groundwater Level Based on Machine Learning: A Case Study of the Hebei Plain. Water **2023**, 15, 823.
- Kochhar, A.; Singh, H.; Sahoo, S.; Litoria, P.K.; Pateriya, B. Prediction and Forecast of Pre-Monsoon and Post-Monsoon Groundwater Level: Using Deep Learning and Statistical Modelling. Model. Earth Syst. Environ. **2022**, 8, 2317–2329.
- Mohaghegh, A.; Farzin, S.; Anaraki, M.V. A New Framework for Missing Data Estimation and Reconstruction Based on the Geographical Input Information, Data Mining, and Multi-Criteria Decision-Making; Theory and Application in Missing Groundwater Data of Damghan Plain, Iran. Groundw. Sustain. Dev. **2022**, 17, 100767.
- Ramirez, S.G.; Williams, G.P.; Jones, N.L.; Ames, D.P.; Radebaugh, J. Improving Groundwater Imputation through Iterative Refinement Using Spatial and Temporal Correlations from In Situ Data with Machine Learning. Water **2023**, 15, 1236.
- Orke, Y.A.; Li, M.-H. Hydroclimatic Variability in the Bilate Watershed, Ethiopia. Climate **2021**, 9, 98.
- Tekle, A. Assessment of Climate Change Impact on Water Availability of Bilate Watershed, Ethiopian Rift Valley Basin. In Proceedings of the AFRICON 2015, Addis Ababa, Ethiopia, 14–17 September 2015; pp. 1–5.
- Wolde-Georgis, T.; Aweke, D.; Hagos, Y. The Case of Ethiopia Reducing the Impacts of Environmental Emergencies through Early Warning and Preparedness: The Case of the 1997–98 El Niño; National Meteorological Service Agency (NMSA): Addis Ababa, Ethiopia, 2000.
- Legese, W.; Koricha, D.; Ture, K. Characteristics of Seasonal Rainfall and Its Distribution Over Bale Highland, Southeastern Ethiopia. J. Earth Sci. Clim. Chang. **2018**, 9, 1000443.
- Verner, K.; Megerssa, L.; Hroch, T.; Buriánek, D.; Martínek, K.; Janderková, J.; Šíma, J.; Kryštofová, E.; Gebremariyam, H.; Tadesse, E.; et al. Explanatory Notes to the Thematic Geoscientific Maps of Ethiopia at a Scale of 1:50,000; Map Sheet 0637-D3 Arba Minch; Czech Geological Survey: Prague, Czech Republic, 2018.
- Alaska Satellite Facility. Available online: https://asf.alaska.edu/ (accessed on 1 August 2022).
- Muluneh, M. Web-Based Decision Support Systems for Managing Water Resources of Abaya Chamo Basin Project; In progress; Water and Land Resource Center, Addis Ababa University: Addis Ababa, Ethiopia, 2018; Available online: https://wlrc-eth.org/ (accessed on 28 September 2023).
- U.S. Geological Survey. USGS EROS Archive—Digital Elevation—Shuttle Radar Topography Mission (SRTM) 1 Arc-Second Global. Available online: https://www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-shuttle-radar-topography-mission-srtm-1#overview (accessed on 1 August 2022).
- Food and Agriculture Organization of the United Nations. Harmonized World Soil Database. Available online: https://www.fao.org/soils-portal/data-hub/soil-maps-and-databases/harmonized-world-soil-database-v12/en/ (accessed on 16 November 2022).
- GPM IMERG Final Precipitation L3 1 Month 0.1 Degree x 0.1 Degree V06. Goddard Earth Sciences Data and Information Services Center (GES DISC). Greenbelt, MD, 2019. Available online: https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGM_06/summary (accessed on 31 July 2022).
- McNally, A. GES DISC Dataset: FLDAS Noah Land Surface Model L4 Global Monthly 0.1 × 0.1 Degree (MERRA-2 and CHIRPS) (FLDAS_NOAH01_C_GL_M 001). Available online: https://disc.gsfc.nasa.gov/datasets/FLDAS_NOAH01_C_GL_M_001/summary (accessed on 31 July 2022).
- Wan, Z.; Hook, S.; Hulley, G. MODIS/Terra Land Surface Temperature/Emissivity Daily L3 Global 1km SIN Grid V006. NASA EOSDIS Land Processes DAAC. 2015. Available online: https://lpdaac.usgs.gov/products/mod11a1v006/ (accessed on 31 July 2022).
- Didan, K. MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V006. NASA EOSDIS Land Processes DAAC. 2015. Available online: https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/MOD13Q1 (accessed on 1 August 2022).
- QGIS Development Team. QGIS Geographic Information System. Available online: https://qgis.org/en/site/ (accessed on 1 August 2023).
- R Core Team. R: The R Project for Statistical Computing. Available online: https://www.r-project.org/ (accessed on 1 August 2023).
- Hastie, T.; Friedman, J.; Tibshirani, R. The Elements of Statistical Learning; Springer Series in Statistics; Springer: New York, NY, USA, 2001; ISBN 978-1-4899-0519-2.
- Anaraki, M.V.; Kadkhodazadeh, M.; Morshed-Bozorgdel, A.; Farzin, S. Predicting Rainfall Response to Climate Change and Uncertainty Analysis: Introducing a Novel Downscaling CMIP6 Models Technique Based on the Stacking Ensemble Machine Learning. J. Water Clim. Chang. **2023**, 14, jwc2023477.
- Greitzer, F.L.; Li, W.; Laskey, K.B.; Lee, J.; Purl, J. Experimental Investigation of Technical and Human Factors Related to Phishing Susceptibility. ACM Trans. Soc. Comput. **2021**, 4, 1–48.
- Tang, L.; Mahmoud, Q.H. A Survey of Machine Learning-Based Solutions for Phishing Website Detection. Mach. Learn. Knowl. Extr. **2021**, 3, 672–694.
- Zhou, W. Condition State-Based Decision Making in Evolving Systems: Applications in Asset Management and Delivery. Ph.D. Thesis, George Mason University, Fairfax, VA, USA, 2023.
- Zantalis, F.; Koulouras, G.; Karabetsos, S.; Kandris, D. A Review of Machine Learning and IoT in Smart Transportation. Future Internet **2019**, 11, 94.
- Harvey, A.; Laskey, K.; Chang, K.-C. Machine Learning Applications for Sensor Tasking with Non-Linear Filtering. Sensors **2022**, 6, 2229. Available online: https://www.researchgate.net/profile/Kathryn-Laskey/publication/350358429_Machine_Learning_Applications_for_Sensor_Tasking_with_Non-Linear_Filtering/links/605e0c18299bf173676e9028/Machine-Learning-Applications-for-Sensor-Tasking-with-Non-Linear-Filtering.pdf (accessed on 28 September 2023).
- Fan, Z. Models and Algorithms for Data-Driven Scheduling. Ph.D. Thesis, George Mason University, Fairfax, VA, USA, 2023.
- Fan, Z.; Chang, K.; Raz, A.K.; Harvey, A.; Chen, G. Sensor Tasking for Space Situation Awareness: Combining Reinforcement Learning and Causality. In Proceedings of the 2023 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2023; pp. 1–9.
- Freedman, D.A. Statistical Models: Theory and Practice; Cambridge University Press: Cambridge, UK, 2009; ISBN 978-1-139-47731-4.
- Friedman, J.H. Multivariate Adaptive Regression Splines. Ann. Stat. **1991**, 19, 1–67.
- Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012; ISBN 978-0-262-30432-0.
- Breiman, L. Random Forests. Mach. Learn. **2001**, 45, 5–32.
- Liaw, A.; Wiener, M. Classification and Regression by RandomForest. Forest **2001**, 2, 18–22.
- Zhang, G.; Lu, Y. Bias-Corrected Random Forests in Regression. J. Appl. Stat. **2012**, 39, 151–160.
- Malhotra, S.; Karanicolas, J. A Numerical Transform of Random Forest Regressors Corrects Systematically-Biased Predictions. arXiv **2020**, arXiv:2003.07445.
- Ross, B.C. Mutual Information between Discrete and Continuous Data Sets. PLoS ONE **2014**, 9, e87357.
- Li, X.; Li, G.; Zhang, Y. Identifying Major Factors Affecting Groundwater Change in the North China Plain with Grey Relational Analysis. Water **2014**, 6, 1581–1600.
- Shi, H.; Guo, J.; Deng, Y.; Qin, Z. Machine Learning-Based Anomaly Detection of Groundwater Microdynamics: Case Study of Chengdu, China. Sci. Rep. **2023**, 13, 14718.
- Sapitang, M.; Ridwan, W.M.; Ahmed, A.N.; Fai, C.M.; El-Shafie, A. Groundwater Level as an Input to Monthly Predicting of Water Level Using Various Machine Learning Algorithms. Earth Sci. Inform. **2021**, 14, 1269–1283.
- Seeyan, S.; Merkel, B.; Abo, R. Investigation of the Relationship between Groundwater Level Fluctuation and Vegetation Cover by Using NDVI for Shaqlawa Basin, Kurdistan Region—Iraq. J. Geogr. Geol. **2014**, 6, p187.
- Hao, X.; Li, W. Impacts of Ecological Water Conveyance on Groundwater Dynamics and Vegetation Recovery in the Lower Reaches of the Tarim River in Northwest China. Environ. Monit. Assess. **2014**, 186, 7605–7616.
- Kenda, K.; Čerin, M.; Bogataj, M.; Senožetnik, M.; Klemen, K.; Pergar, P.; Laspidou, C.; Mladenić, D. Groundwater Modeling with Machine Learning Techniques: Ljubljana Polje Aquifer. Proceedings **2018**, 2, 697.
- Kanyama, Y.; Ajoodha, R.; Seyler, H.; Makondo, N.; Tutu, H. Application of Machine Learning Techniques In Forecasting Groundwater Levels in the Grootfontein Aquifer. In Proceedings of the 2020 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), Kimberley, South Africa, 25–27 November 2020; pp. 1–8.
- Sharafati, A.; Asadollah, S.B.H.S.; Neshat, A. A New Artificial Intelligence Strategy for Predicting the Groundwater Level over the Rafsanjan Aquifer in Iran. J. Hydrol. **2020**, 591, 125468.
- Gintamo, T.T. Ground Water Potential Evaluation Based on Integrated GIS and Remote Sensing Techniques, in Bilate River Catchment: South Rift Valley of Ethiopia. Am. Sci. Res. J. Eng. Technol. Sci. **2014**, 10, 85–120.

**Figure 3.** A simple neural network with one hidden layer [50].

**Figure 6.** Plot of residuals (m) versus predicted values (m) on the training sample for (**a**) MLR; (**b**) MARS; (**c**) ANN; (**d**) original RFR; (**e**) RFR with linear transformation; (**f**) GBR.

**Figure 8.** Observed (m) versus predicted (m) plots on the full dataset for (**a**) MLR; (**b**) MARS; (**c**) ANN; (**d**) original RFR; (**e**) RFR with linear transformation; (**f**) GBR.

**Figure 9.** (**a**) Taylor diagram for the training data; (**b**) Taylor diagram for the testing data. The circle on the x-axis represents the observed (reference) dataset. The radial distance from the origin represents the standard deviation of the predicted data. The angle with the x-axis represents the correlation coefficient between the predicted and observed data. The distance from the reference point represents the centered root mean square difference between the predicted and observed data. Together, these three statistics provide a comprehensive view of model performance.

**Figure 10.** (**a**) Residual plot for the predicted water level for the nearest grid point; (**b**) actual static water level versus predicted water level for the nearest grid point.

Data | Unit | Source and Description | Type |
---|---|---|---|
Static water level | m | 75 borehole points collected by AWTI in 2007 | Numerical |
Elevation | m | U.S. Geological Survey (USGS) Shuttle Radar Topography Mission (SRTM) digital elevation data with 30 m resolution [31] | Numerical |
Soil type | -- | FAO Harmonized World Soil Database v1.2 [32]; four categories: chromic luvisols, eutric vertisols, humic nitisols, and vitric or mollic andosols | Categorical |
Precipitation | mm/hour | NASA Global Precipitation Measurement (GPM) with 0.1 degree spatial resolution [33] | Numerical |
Specific humidity | kg/kg | NASA Famine Early Warning Systems Network Land Data Assimilation System (FLDAS) Noah Land Surface Model with 0.1 degree spatial resolution [34] | Numerical |
Wind speed | m/s | NASA FLDAS Noah Land Surface Model with 0.1 degree spatial resolution [34] | Numerical |
LST at daytime | K | USGS Moderate Resolution Imaging Spectroradiometer (MODIS) Terra Land Surface Temperature with 1 km spatial resolution [35] | Numerical |
LST at nighttime | K | USGS MODIS Terra Land Surface Temperature with 1 km spatial resolution [35] | Numerical |
NDVI | -- | USGS MODIS Terra Vegetation Indices (16-day) at 250 m spatial resolution [36] | Numerical |

Variable | Description | Variable | Description | Variable | Description |
---|---|---|---|---|---|
Y | Static water level | X1 | Elevation | X2 | Soil type |
X3 | Precipitation Oct to Jan (monthly average) | X4 | Precipitation Feb to May (monthly average) | X5 | Precipitation Jun to Sep (monthly average) |
X6 | Specific humidity Oct to Jan (daily average) | X7 | Specific humidity Feb to May (daily average) | X8 | Specific humidity Jun to Sep (daily average) |
X9 | Wind speed Oct to Jan (daily average) | X10 | Wind speed Feb to May (daily average) | X11 | Wind speed Jun to Sep (daily average) |
X12 | LST daytime Oct to Jan (daily average) | X13 | LST daytime Feb to May (daily average) | X14 | LST daytime Jun to Sep (daily average) |
X15 | LST nighttime Oct to Jan (daily average) | X16 | LST nighttime Feb to May (daily average) | X17 | LST nighttime Jun to Sep (daily average) |
X18 | NDVI Oct to Jan (16-day average) | X19 | NDVI Feb to May (16-day average) | X20 | NDVI Jun to Sep (16-day average) |

Variables | Coefficients | Standard Error | p Value |
---|---|---|---|
Intercept | 354.04 | 1963 | 0.85 |
X2 Eutric Vertisols | −97.05 | 40.22 | 0.02 ** |
X2 Humic Nitisols | −13.83 | 27.07 | 0.61 |
X2 Vitric & Mollic Andosols | −19.95 | 33.77 | 0.56 |
X3 | −9.25 | 5.93 | 0.12 |
X4 | −8.96 | 11.28 | 0.43 |
X5 | 2.03 | 1.58 | 0.20 |
X10 | 42.15 | 57.00 | 0.46 |
X13 | 3.30 | 6.26 | 0.60 |
X15 | −5.53 | 7.23 | 0.45 |
X19 | 65.85 | 162.0 | 0.69 |
X20 | 249.16 | 91.97 | 0.009 ** |

Dataset | Model | RMSE (m) | MAE (m) | R Squared |
---|---|---|---|---|
Training | MLR | 43.56 | 28.54 | 0.40 |
Training | MARS | 27.85 | 18.67 | 0.76 |
Training | ANN | 3.55 | 0.27 | 0.99 |
Training | Original RFR | 19.51 | 9.05 | 0.88 |
Training | RFR with linear transformation | 15.55 | 6.41 | 0.92 |
Training | GBR | 15.66 | 9.28 | 0.92 |
Testing | MLR | 45.46 | 23.62 | 0.37 |
Testing | MARS | 38.32 | 24.94 | 0.55 |
Testing | ANN | 35.48 | 15.43 | 0.61 |
Testing | Original RFR | 33.66 | 23.79 | 0.65 |
Testing | RFR with linear transformation | 30.34 | 16.97 | 0.72 |
Testing | GBR | 27.86 | 21.84 | 0.76 |

Dataset | Model | RMSE (m) | MAE (m) | R Squared |
---|---|---|---|---|
Training | MLR | 42.49 | 29.92 | 0.43 |
Training | MARS | 40.56 | 28.46 | 0.47 |
Training | ANN | 7.58 | 2.10 | 0.96 |
Training | RFR | 19.92 | 12.83 | 0.88 |
Training | GBR | 12.74 | 7.58 | 0.95 |
Testing | MLR | 46.81 | 26.14 | 0.31 |
Testing | MARS | 49.63 | 34.39 | 0.23 |
Testing | ANN | 36.45 | 23.74 | 0.49 |
Testing | RFR | 29.46 | 18.77 | 0.68 |
Testing | GBR | 24.55 | 18.92 | 0.77 |

Model | Data | RMSE (m) | MAE (m) | R Squared |
---|---|---|---|---|
GBR | Training | 31.45 | 26.39 | 0.69 |
GBR | Testing | 36.61 | 30.02 | 0.60 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, W.; Finsa, M.M.; Laskey, K.B.; Houser, P.; Douglas-Bate, R.
Groundwater Level Prediction with Machine Learning to Support Sustainable Irrigation in Water Scarcity Regions. *Water* **2023**, *15*, 3473.
https://doi.org/10.3390/w15193473

**Chicago/Turabian Style**

Li, Wanru, Mekuanent Muluneh Finsa, Kathryn Blackmond Laskey, Paul Houser, and Rupert Douglas-Bate.
2023. "Groundwater Level Prediction with Machine Learning to Support Sustainable Irrigation in Water Scarcity Regions" *Water* 15, no. 19: 3473.
https://doi.org/10.3390/w15193473