Comparative Analysis of Machine Learning and Deep Learning Models for Groundwater Potability Classification

Suleiman, Ahmad Abubakar; Yousafzai, Arsalaan Khan; Zubair, Muhammad

doi:10.3390/ASEC2023-15506

Open Access Proceeding Paper

Comparative Analysis of Machine Learning and Deep Learning Models for Groundwater Potability Classification^†

by Ahmad Abubakar Suleiman ^1,2,*, Arsalaan Khan Yousafzai ^3,4 and Muhammad Zubair ^5

^1 Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia

^2 Department of Statistics, Aliko Dangote University of Science and Technology, Wudil 713281, Nigeria

^3 Department of Civil and Environmental Engineering, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia

^4 Department of Civil Engineering, University of Engineering & Technology, Peshawar 25000, Pakistan

^5 Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia

^* Author to whom correspondence should be addressed.

^† Presented at the 4th International Electronic Conference on Applied Sciences, 27 October–10 November 2023; Available online: https://asec2023.sciforum.net/.

Eng. Proc. 2023, 56(1), 249; https://doi.org/10.3390/ASEC2023-15506

Published: 31 October 2023

(This article belongs to the Proceedings of The 4th International Electronic Conference on Applied Sciences)


Abstract:

Ensuring access to safe drinking water is a critical concern, particularly in regions with limited resources. This study evaluates groundwater potability using a range of machine learning models, including logistic regression, K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), and Random Forest, as well as deep learning models such as Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Feedforward Neural Networks (FNNs), and Long Short-Term Memory (LSTM). We collected thirty groundwater samples from residential and industrial locations in Jaen, Kano State, Nigeria, focusing on nine crucial physicochemical parameters: electric conductivity, pH, total dissolved solids, calcium, magnesium, chloride, zinc, manganese, and copper. Among the machine learning models, Logistic Regression and Random Forest achieved accuracy scores of 0.833. Among the deep learning models, the ANN matched this accuracy score of 0.833, while LSTM scored 0.666. KNN and SVC provided moderately accurate predictions, scoring 0.667, while CNN and FNN achieved lower scores of 0.333 and 0.5, respectively. This study represents a significant step toward ensuring safe drinking water for communities and preserving the sustainability of natural resources.

Keywords:

groundwater; artificial intelligence; machine learning; deep learning; classification; logistic regression; Random Forest; artificial neural network; convolutional neural network

1. Introduction

Groundwater serves as a vital source of drinking water worldwide. To ensure the safety and purity of groundwater for domestic use, it is imperative to regularly assess its quality, a crucial step in enhancing the well-being of the growing global population [1]. Safe drinking water is a critical component of public health and environmental sustainability. Access to clean and potable groundwater is critical, especially in resource-limited areas. In many of these places, numerous physicochemical parameters can impair groundwater quality, posing serious health hazards to residents. Traditional methods of determining groundwater potability require time-consuming and expensive laboratory studies. Moreover, groundwater quality can also be affected by the type of pumping technique employed [2]. Several researchers have employed various techniques to assess the quality of groundwater for drinking purposes, for instance, multivariate statistics [3], an automatic exponential smoothing model [4], explanatory analysis [5], theoretical probability models [6,7], correlation and regression analyses [8], and other methods. However, groundwater quality is influenced by a complex interplay of physicochemical parameters, including electric conductivity (EC), pH, total dissolved solids (TDS), and concentrations of various ions and minerals. Ensuring the safety of this vital resource necessitates the development of reliable predictive models that can rapidly and accurately classify groundwater samples as potable or non-potable.

The introduction of machine learning and deep learning techniques has opened new paths for efficient, accurate, and cost-effective potability prediction. These techniques have emerged as powerful tools, offering the potential to enhance and complement traditional methods for more accurate and efficient evaluation. While deep learning techniques have been widely used in computer vision applications, such as Optical Character Recognition (OCR) [9], face and skin detection/recognition [10,11], visual pattern classification [12], and image classification [13], limited work has been performed in the domain of water quality evaluation [14,15]. This study embarks on a comprehensive evaluation of machine learning and deep learning models for groundwater potability classification. We explore the efficacy of machine learning models, such as logistic regression, K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), and Random Forest, alongside deep learning models, including Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Feedforward Neural Networks (FNNs), and Long Short-Term Memory (LSTM). These models are applied to groundwater samples collected from both industrial and residential locations, with a specific focus on nine important physicochemical parameters.

The primary aim of this study is to identify the most effective machine learning models for this specific task. The motivation behind this research lies in the fundamental importance of safe and potable groundwater for public health and sustainable water resource management, particularly where water scarcity and contamination are pressing concerns. Therefore, by evaluating and comparing machine learning models, this research offers a novel and practical approach to groundwater potability prediction, bridging the gap between data science and real-world challenges in water resource management and public health.

This paper is outlined as follows: Section 2 contains the materials and methods. Section 3 provides results and discussion. Section 4 gives the concluding remarks.

2. Materials and Methods

In this section, we investigate classification algorithms based on machine learning and deep learning and evaluate their effectiveness in classifying groundwater potability. Among the machine learning algorithms are logistic regression, KNN, SVC, and Random Forest. Deep learning models, on the other hand, include ANNs, CNNs, FNNs, and LSTM. The performance of each model was evaluated based on key metrics, including accuracy, precision, recall, and F1-Score. These metrics provide insights into the models’ ability to classify groundwater samples correctly and are indicative of their overall performance. Python is used for all computations, training, and testing. The general flowchart of the investigation is depicted in Figure 1.

2.1. Water Sampling and Data Preprocessing

Thirty groundwater samples were randomly taken from open wells and boreholes in both industrial and residential areas of Jaen, Kano State, Nigeria, during a study carried out in August 2020. The Global Positioning System (GPS) was used to locate the sampling stations by determining their latitude and longitude coordinates. Following standard procedures, the samples were stored in sterile plastic bottles and kept in an icebox. For each groundwater sample, fifteen physicochemical parameters were measured. This study focuses on nine of these parameters, namely electric conductivity (EC), pH, total dissolved solids (TDS), calcium, magnesium, chloride, zinc, manganese, and copper. Most of the parameters were expressed in milligrams per liter (mg/L), except for EC (µS/cm), pH, and TDS (NTU).

Additionally, the data set featured binary labels denoting the potability of each sample, with ‘1’ signifying potable water and ‘0’ indicating non-potable water. The potability and non-potability standards for drinking water quality in Nigeria can be found in [16]. The class distribution of the groundwater data used for binary classification is shown in Figure 2.

To prepare the data for analysis, the following preprocessing steps were performed:

Data cleaning: any missing or erroneous data points were identified and either corrected or removed from the dataset;
Feature scaling: continuous variables were scaled to have a mean of 0 and a standard deviation of 1 to ensure that all features contributed equally to model training;
Data split: the data set was divided into training and testing sets using an 80-20 split, ensuring that the same split was applied consistently across all models.
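The scaling and splitting steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the fixed seed, and the sample pH values are assumptions made for the example.

```python
import random
import statistics

def standardize(column):
    """Scale one feature column to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(value - mean) / std for value in column]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle row indices once with a fixed seed so every model sees the same split."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

# Illustrative pH readings only (not the study's measurements).
ph = [5.6, 6.1, 6.7, 7.0, 7.2]
ph_scaled = standardize(ph)

# With 30 samples, an 80-20 split leaves 24 training rows and 6 testing rows.
samples = list(range(30))  # stand-ins for the 30 sample records
train, test = train_test_split(samples)
```

Fixing the seed is one simple way to reuse the same split for every model. The source does not say whether the scaling statistics were computed before or after the split; computing them on the training rows only is a common way to avoid leaking test information.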

2.2. Model Selection and Training

For the comparative analysis, four machine learning models and four deep learning models were chosen.

2.2.1. Machine Learning Models

1. Logistic Regression

Logistic regression is one of the most popular supervised machine learning algorithms. It predicts a categorical dependent variable from a given set of independent variables and is therefore a classification algorithm. It is used in this study to predict the potability class of groundwater from the set of predictors. The logistic curve relates the independent variable, X, to the mean of the dependent variable, Y. This relationship can be expressed as

P = \frac{\exp(a + b X)}{1 + \exp(a + b X)} = \frac{1}{1 + \exp(-(a + b X))},

where P is the probability of a 1 (the proportion of 1s and the mean of Y), \exp is the exponential function (whose base, e, is the base of the natural logarithm), and a and b are the parameters of the model.
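As a small worked illustration of the logistic relationship, the sketch below evaluates both algebraic forms of P for a single predictor. The parameter values a and b and the 0.5 decision threshold are hypothetical choices for the example, not fitted values from the study.

```python
import math

def logistic_probability(x, a, b):
    """P = 1 / (1 + exp(-(a + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def logistic_probability_alt(x, a, b):
    """Equivalent form: P = exp(a + b*x) / (1 + exp(a + b*x))."""
    z = math.exp(a + b * x)
    return z / (1.0 + z)

# Hypothetical parameters, chosen so that a + b*x = 0 at x = 0.5.
a, b = -1.0, 2.0
p = logistic_probability(0.5, a, b)

# Assumed decision rule: label the sample potable (1) when P >= 0.5.
label = 1 if p >= 0.5 else 0
```

When a + bX equals zero the probability is exactly one half, which is why the class boundary of a thresholded logistic model sits where the linear term changes sign.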

2. Random Forest Classifier

Random Forest is a popular ensemble learning technique used for both classification and regression tasks. It is an ensemble of decision trees, where each tree independently makes predictions, and the final prediction is determined by a majority vote (for classification) or the averaging (for regression) of the individual tree predictions. The ‘random’ aspect in Random Forest refers to two main sources of randomness:

Bootstrap Aggregation (Bagging): Multiple subsets of the training data are created through bootstrapping, a resampling technique. Each decision tree in the forest is trained on a different subset, introducing diversity among the trees.
Feature Randomness: Random subsets of features are considered when splitting nodes in each tree. This ensures that not all features are used for every split, reducing the risk of overfitting and improving generalization.
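The two sources of randomness and the majority vote can be sketched in a few lines. This is an illustration of the ideas only, not a full Random Forest: no trees are grown, and the per-tree votes, the seed, and the choice of three features per split are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(n_features, k, rng):
    """Feature randomness: pick k distinct feature indices to consider at a split."""
    return sorted(rng.sample(range(n_features), k))

def majority_vote(predictions):
    """Final class is the most common label among the trees' predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = list(range(24))                        # stand-ins for 24 training samples
sample = bootstrap_sample(rows, rng)          # some rows repeat, some are left out
features = random_feature_subset(9, 3, rng)   # 9 parameters, 3 considered per split
tree_votes = [1, 0, 1, 1, 0]                  # hypothetical predictions from 5 trees
final = majority_vote(tree_votes)
```

Because each tree sees a different bootstrap sample and a different feature subset, the trees make partly independent errors, which the vote tends to average out.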

3. K-Nearest Neighbors

The K-Nearest Neighbors (KNN) algorithm is a popular machine learning technique used for classification and regression tasks. It relies on the idea that similar data points tend to have similar labels or values. During the training phase, the KNN algorithm stores the entire training dataset as a reference. When making predictions, it calculates the distance between the input data point and all the training examples using a chosen distance metric such as Euclidean distance. Next, the algorithm identifies the K nearest neighbors of the input data point based on their distances. In the case of classification, the algorithm assigns the most common class label among the K neighbors as the predicted label for the input data point. For regression, it calculates the average or weighted average of the target values of the K neighbors to predict the value for the input data point. With N denoting the sample size and p_{i} the probability associated with sample i, the nearest neighbor is expressed as

p_{i} = \sum_{j \in K_{i}}^{N} p_{i j},

where K_{i} denotes the set of points that fall within the same class as sample i, and p_{i j} denotes the softmax over squared Euclidean distances within the embedded space (with D the transformation into that space), as given by

p_{i j} = \frac{\exp(- {|D x_{i} - D x_{j}|}^{2})}{\sum_{m \neq i}^{N} \exp(- {|D x_{i} - D x_{m}|}^{2})}, p_{i i} = 0 .
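The classification procedure described above (store the training set, rank by Euclidean distance, take a majority vote among the K nearest) can be sketched as follows. The two-feature points, their labels, and K = 3 are hypothetical values for illustration; the source does not report the K it used.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points closest to query."""
    order = sorted(range(len(train_points)),
                   key=lambda i: euclidean(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical, already-scaled two-feature samples with binary potability labels.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
train_labels = [0, 0, 0, 1, 1]

prediction = knn_predict(train_points, train_labels, (0.15, 0.15), k=3)
```

Because the prediction depends directly on distances, feature scaling matters here: an unscaled parameter with a large numeric range, such as EC, would otherwise dominate the distance.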

4. Support Vector Classifier

Support Vector Classifier (SVC) is one of the most commonly used supervised algorithms. It works for both regression and classification tasks, but generally works best for classification problems. SVC algorithms find the hyperplane that best separates the two classes in an N-dimensional space. A hyperplane is simply a line if there are only two input features, and a two-dimensional plane if there are three input features. In the best-case scenario, when the data are perfectly separable, the chosen hyperplane maximizes the distance to the nearest elements of each class. SVC aims to approximate this scenario as accurately as possible. In nonlinear SVC, hyperplanes are replaced with different classes of manifolds, but the basic principle is the same.
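For the linear case, the separating hyperplane can be written as w·x + b = 0, and the distance of a point from it is |w·x + b| / ||w||. The sketch below shows only this geometry; it does not train a classifier, and the weight vector w, offset b, and test points are hypothetical.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign class 1 on the non-negative side of the hyperplane, class 0 otherwise."""
    return 1 if decision_value(w, b, x) >= 0 else 0

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical hyperplane in a two-feature space (a line): 3*x1 + 4*x2 - 5 = 0.
w, b = (3.0, 4.0), -5.0
point = (3.0, 4.0)

side = classify(w, b, point)                  # w.x + b = 9 + 16 - 5 = 20
margin = distance_to_hyperplane(w, b, point)  # 20 / 5 = 4.0
```

Training an SVC amounts to choosing w and b so that the smallest such distance over the training points is as large as possible.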

2.2.2. Deep Learning Models

1. Artificial Neural Networks (ANNs)

ANNs are foundational deep learning models that consist of interconnected layers of artificial neurons, organized into input, hidden, and output layers. These models are highly versatile and excel at capturing complex patterns and relationships in data.

2. Convolutional Neural Networks (CNNs)

CNNs are specifically designed for processing grid-like data, such as images. They comprise convolutional layers to automatically detect features and patterns in data. While originally developed for image analysis, CNNs can be adapted to various domains.

3. Feedforward Neural Networks (FNNs)

FNNs are neural networks in which information flows in one direction only, from input to output. Like ANNs, they are characterized by layers of interconnected neurons, each connected to the next layer without feedback loops. FNNs are commonly used for regression and classification tasks.
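A forward pass through such a network, with each layer feeding only the next, can be sketched as below. The layer sizes, weights, and the ReLU and sigmoid activations are assumptions for illustration; the source does not report the architectures it used.

```python
import math

def sigmoid(z):
    """Squash a real number into (0, 1); used here for the output probability."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: pass positive values, zero out negative ones."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: one weight row and one bias per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + bias)
            for row, bias in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden layer -> single output neuron, with no feedback loops."""
    hidden = dense(x, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)[0]

# Hypothetical weights: 3 inputs -> 2 hidden neurons -> 1 output.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.0]]
hidden_b = [0.0, 0.1]
out_w = [[1.0, -1.0]]
out_b = [0.0]

prob = forward([1.0, 0.5, -1.0], hidden_w, hidden_b, out_w, out_b)
```

For this input the hidden activations are about 0.3 and 0.2, so the output neuron receives about 0.1 and returns a probability slightly above one half.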

4. Long Short-Term Memory (LSTM)

LSTM is designed to handle sequences and time series data. It excels at capturing long-range dependencies in data, making it a suitable choice for sequential data such as time series measurements [17].

For each model, the following steps were executed:

Model Training: The model was trained on the training dataset using default hyperparameters;
Model Evaluation: The model’s performance was evaluated on the testing dataset using accuracy score.

3. Results and Discussion

3.1. Descriptive Statistics

Table 1 provides a statistical overview of the groundwater parameters that were measured. EC and TDS have the largest standard deviations (302.61 and 151.26, respectively), suggesting the greatest variability. The conductivity ranges from a low of 209 to a high of 1490, with a mean of 672.73, indicating significant diversity in groundwater conductivity. The pH of the water ranges from 5.59 to 7.22, with a mean of 6.67, indicating slightly acidic water. A positive skewness implies that the data have a right tail, whereas a negative skewness indicates that the data have a left tail; in Table 1, only pH is negatively skewed. Kurtosis readings indicate that the groundwater has platykurtic distributions. These investigations not only help us understand groundwater composition, but they also have practical consequences for water quality monitoring and environmental management.

3.2. Correlation Analysis

Correlation analysis is a useful statistical tool for determining associations between various water quality parameters. We can obtain useful insights into the complex mechanisms that govern groundwater quality by identifying these relationships [18,19,20]. The correlation analysis results demonstrate substantial positive and negative relationships among the evaluated groundwater parameters, indicating similar patterns in their variations that are statistically significant at the 1% level (Figure 3). Notably, the extraordinarily strong correlation of 0.99 between calcium and magnesium implies a nearly linear relationship: the concentrations of the two ions rise and fall together across the groundwater samples. Furthermore, conductivity and TDS have a strong positive association of 0.99. Hence, the groundwater quality of the study area is influenced by the complex interplay of these parameters.
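A correlation coefficient of the kind reported in Figure 3 can be computed as in the sketch below. It assumes the Pearson coefficient, which the source does not name explicitly, and the EC and TDS values are illustrative numbers, not the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Illustrative values in which TDS is exactly half of EC (a perfectly linear relation).
ec = [200.0, 400.0, 600.0, 800.0]
tds = [100.0, 200.0, 300.0, 400.0]

r = pearson_r(ec, tds)  # close to +1 for a perfectly increasing linear relation
```

A value near +1, such as the 0.99 reported for EC and TDS, means one parameter carries almost the same information as the other, which is worth keeping in mind when both are used as model inputs.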

3.3. Model Performance Evaluation

The comparative analysis of machine learning and deep learning models for groundwater potability prediction yielded insightful results. The primary evaluation metrics, including accuracy, precision, recall, and F1-Score, were employed to assess the performance of each model, providing a comprehensive view of model effectiveness.

Table 2 shows the results of the four machine learning models employed to assess groundwater potability. Logistic regression and Random Forest yielded the highest accuracy of 0.833, accompanied by a precision score of 0.900. Both models achieved a recall of 0.750 and an F1-Score of 0.778. KNN and SVC delivered moderate accuracy with scores of 0.667. These findings underscore the potential of machine learning models for precise groundwater classification, highlighting their relevance in water quality assessment.
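The four metrics can be computed from true and predicted labels as sketched below. The source does not report confusion matrices or the averaging scheme, so the macro-averaging over the two classes, the six-sample test set (20% of 30), and the label vectors are all assumptions; they are one hypothetical pattern that happens to reproduce the figures reported for Logistic Regression and Random Forest.

```python
def per_class_scores(y_true, y_pred, cls):
    """Precision, recall, and F1 when cls is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_report(y_true, y_pred):
    """Accuracy plus precision, recall, and F1 averaged equally over the classes."""
    classes = sorted(set(y_true))
    scores = [per_class_scores(y_true, y_pred, c) for c in classes]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro = [sum(s[i] for s in scores) / len(classes) for i in range(3)]
    return accuracy, macro[0], macro[1], macro[2]

# Hypothetical test labels: 6 samples, one of them misclassified.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]

accuracy, precision, recall, f1 = macro_report(y_true, y_pred)
```

With only six test samples, each misclassification moves accuracy by about 0.167, so differences between models in the table correspond to one or two samples.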

Figure 4 displays a graphical representation of the comparative performance metrics across various machine learning model algorithms. The visual presentation aids in readily identifying the highest accuracy scores achieved by different model algorithms.

Table 3 presents the outcomes of the deep learning models applied for groundwater potability classification assessment. Among these models, the ANN achieved the highest accuracy, 0.833, along with a remarkable precision score of 0.800. The ANN exhibited an impressive recall of 0.987, signifying its exceptional ability to correctly identify potable groundwater. LSTM also delivered reliable results with an accuracy score of 0.666 and a balanced precision of 0.750. Conversely, the CNN and FNN displayed lower accuracy scores of 0.333 and 0.500, respectively. These results highlight the diverse performance of deep learning models and their potential for accurate groundwater classification, which is especially notable in the case of ANN and LSTM, underlining their significance in groundwater quality classification assessment.

Figure 5 provides a graphical illustration of the comparative performance metrics across diverse deep learning model algorithms. This visual representation facilitates the quick identification of the highest accuracy scores attained by different model algorithms.

3.4. Discussion

The results from our comprehensive evaluation of machine learning and deep learning models have provided valuable insights into their effectiveness in predicting groundwater potability. In the realm of machine learning, Logistic regression and Random Forest have demonstrated their capability by achieving the highest accuracy scores of 0.833. Logistic regression, in particular, maintained a commendable recall of 0.750, emphasizing its consistent predictive accuracy. Similarly, Random Forest showcased its potential with a recall of 0.750. KNN and SVC exhibited moderate accuracy levels, both scoring 0.667. Notably, Random Forest and Logistic regression achieved balanced F1-Scores of 0.778, highlighting their reliability.

These findings underscore the potential of machine learning models in precisely classifying groundwater, emphasizing their significance in water quality assessment. Machine learning models are well poised for applications in real-world scenarios where accurate predictions are crucial. The strength of machine learning models lies in their ability to handle data with low correlations among parameters.

On the other hand, deep learning models have also made significant contributions to the groundwater potability prediction task. ANN delivered an accuracy score of 0.833, rivaling the performance of Logistic regression and Random Forest. LSTM demonstrated its potential with an accuracy score of 0.666. Although CNN and FNN achieved lower accuracy scores of 0.333 and 0.5, respectively, they remain promising in specific applications.

However, it is important to note that while accuracy is a valuable metric, it may not be the sole determinant of model suitability. Factors such as computational efficiency, interpretability, and the specific needs of the application must be considered when selecting a model for real-world deployment. Moreover, the potential for further model refinement through hyperparameter tuning and the exploration of ensemble techniques should not be overlooked. These avenues may enhance the predictive capabilities of the models and offer improved potability classifications for groundwater resources in the study area.

4. Concluding Remarks

This study has provided an extensive comparative assessment of machine learning and deep learning models for predicting groundwater potability in the Jaen district of Kano State, Nigeria. Our findings offer valuable insights into the selection and performance of these models, particularly in situations characterized by low correlations among groundwater parameters. Logistic regression and Random Forest have emerged as standout performers, each achieving an accuracy score of 0.833. These models not only exhibit robust predictive capabilities but also offer interpretability, positioning them as promising candidates for practical applications in water resource management and public health initiatives. On the other hand, our exploration of deep learning models yielded a range of outcomes. ANN matched this accuracy score of 0.833, underlining its potential for accurate predictions. LSTM followed with a score of 0.666. CNN and FNN delivered lower scores of 0.333 and 0.5, respectively, emphasizing the need for further investigation and refinement. However, we emphasize that the selection of a suitable model should consider various factors, such as computational efficiency and the specific requirements of the application. Future research directions include the fine-tuning of models and the exploration of ensemble techniques to enhance predictive accuracy. These ongoing efforts hold the promise of advancing the precision of groundwater potability predictions, contributing to the overarching goal of ensuring clean and safe drinking water for communities and the sustainable management of vital natural resources.

Author Contributions

Conceptualization, A.A.S.; methodology, A.A.S.; software, A.A.S.; validation, A.A.S., A.K.Y. and M.Z.; writing—original draft, A.A.S., A.K.Y. and M.Z.; data curation, A.A.S.; visualization, M.Z., A.K.Y. and A.A.S. All authors have read and agreed to the published version of the manuscript.

Funding

Special gratitude is owed to the Tertiary Education Trust Fund (TETFund) for funding this research through the institution-based grant allocated to Aliko Dangote University of Science and Technology, Wudil, Nigeria.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data in this paper are available from the corresponding author upon request.

Acknowledgments

The authors extend their appreciation to Universiti Teknologi PETRONAS for providing research facilities. Lastly, the authors would like to convey special thanks to the organizers and sponsors of this conference.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Suleiman, A.A.; Abdullahi, U.A.; Suleiman, A.; Yunus, R.B.; Suleiman, S.A. Assessment of Groundwater Quality Using Multivariate Statistical Techniques. In Intelligent Systems Modeling and Simulation II: Machine Learning, Neural Networks, Efficient Numerical Algorithm and Statistical Methods; Springer: Berlin/Heidelberg, Germany, 2022; pp. 567–579.
2. Adil, M.; Arshad, M.; Aslam, M. Low Cost Water Pumping for Sustainable Irrigation Using Renewable Energy Based Ram Pump. In Proceedings of the 5th International Mechanical Engineering Congress, Karachi, Pakistan, 9–10 May 2015; pp. 9–10.
3. Thomas, E.O. Evaluation of groundwater quality using multivariate, parametric and non-parametric statistics, and GWQI in Ibadan, Nigeria. Water Sci. 2023, 37, 117–130.
4. Nsabimana, A.; Wu, J.; Wu, J.; Xu, F. Forecasting groundwater quality using automatic exponential smoothing model (AESM) in Xianyang City, China. Hum. Ecol. Risk Assess. Int. J. 2023, 29, 347–368.
5. Suleiman, A.A.; Ibrahim, A.; Abdullahi, U.A. Statistical explanatory assessment of groundwater quality in Gwale LGA, Kano state, northwest Nigeria. Hydrospatial Anal. 2020, 4, 1–13.
6. Singh, V.V.; Suleman, A.A.; Ibrahim, A.; Abdullahi, U.A.; Suleiman, S.A. Assessment of probability distributions of groundwater quality data in Gwale area, north-western Nigeria. Ann. Optim. Theory Pract. 2020, 3, 37–46.
7. Ibrahim, A.; Suleiman, A.A.; Abdullahi, U.A.; Suleiman, S.A. Monitoring Groundwater Quality using Probability Distribution in Gwale, Kano state, Nigeria. J. Stat. Model. Anal. (JOSMA) 2021, 3, 2.
8. Suleiman, A.A.; Abdullahi, U.A.; Suleiman, A.; Suleiman, S.A.; Abubakar, H.U. Correlation and regression model for physicochemical quality of groundwater in the Jaen District of Kano State, Nigeria. J. Stat. Model. Anal. (JOSMA) 2022, 4, 1.
9. Rehman, S.U.; Tu, S.; Huang, Y.; Rehman, O.U. A benchmark dataset and learning high-level semantic embeddings of multimedia for cross-media retrieval. IEEE Access 2018, 6, 67176–67188.
10. Rehman, S.U.; Tu, S.; Huang, Y.; Yang, Z. Face recognition: A novel un-supervised convolutional neural network method. In Proceedings of the 2016 IEEE International Conference of Online Analysis and Computing Science (ICOACS), Chongqing, China, 28–29 May 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 139–144.
11. Rehman, S.U.; Tu, S.; Waqas, M.; Huang, Y.; Rehman, O.; Ahmok, B.; Ahmad, S. Unsupervised pre-trained filter learning approach for efficient convolution neural network. Neurocomputing 2019, 365, 171–190.
12. Tu, S.; Huang, Y.; Liu, G. CSFL: A novel unsupervised convolution neural network approach for visual pattern classification. Ai Commun. 2017, 30, 311–324.
13. Rehman, S.U.; Tu, S.; Rehman, O.U.; Huang, Y.; Magurawalage, C.M.S.; Chang, C.-C. Optimization of CNN through novel training strategy for visual classification problems. Entropy 2018, 20, 290.
14. Prasad, D.V.V.; Venkataramana, L.Y.; Kumar, P.S.; Prasannamedha, G.; Harshana, S.; Srividya, S.J.; Harrinei, K.; Indraganti, S. Analysis and prediction of water quality using deep learning and auto deep learning techniques. Sci. Total Environ. 2022, 821, 153311.
15. Im, Y.; Song, G.; Lee, J.; Cho, M. Deep Learning Methods for Predicting Tap-Water Quality Time Series in South Korea. Water 2022, 14, 3766.
16. Nigerian Standard for Drinking Water Quality. Available online: https://africacheck.org/sites/default/files/Nigerian-Standard-for-Drinking-Water-Quality-NIS-554-2015.pdf (accessed on 1 October 2023).
17. Othman, M.; Indawati, R.; Suleiman, A.A.; Qomaruddin, M.B.; Sokkalingam, R. Model Forecasting Development for Dengue Fever Incidence in Surabaya City Using Time Series Analysis. Processes 2022, 10, 2454.
18. Suleiman, A.A.; Abdullahi, U.A.; Suleiman, A.; Suleiman, S.A.; Abubakar, B.; Muhammad, T.; Kabir Isah, N.; Hussaini Tafida, A. Assessment of Groundwater Quality Parameters of Jaen District, Kano State, Nigeria. BIMA 2021, 5, 288–297.
19. Salleh, S.F.; Suleiman, A.A.; Daud, H.; Othman, M.; Sokkalingam, R.; Wagner, K. Tropically Adapted Passive Building: A Descriptive-Analytical Approach Using Multiple Linear Regression and Probability Models to Predict Indoor Temperature. Sustainability 2023, 15, 13647.
20. Suleiman, A.A.; Suleiman, A.; Abdullahi, U.A.; Suleiman, S.A. Estimation of the case fatality rate of COVID-19 epidemiological data in Nigeria using statistical regression analysis. Biosaf. Health 2021, 3, 4–7.

Figure 1. Study flowchart.

Figure 2. Distribution of dataset for binary classification.

Figure 3. Correlation matrix for the groundwater parameters.

Figure 4. Comparative performance metrics of machine learning algorithms.

Figure 5. Comparative performance metrics of deep learning algorithms.

Table 1. Statistical summary of the groundwater parameters.

Parameters	Mean	Median	Min	Max	Std. Deviation	Skewness	Kurtosis
EC	672.73	640	209	1490	302.61	0.63	0.32
pH	6.67	6.73	5.59	7.22	0.41	−0.92	0.55
TDS	339.2	322.5	104	731	151.26	0.54	0.04
Calcium	0.59	0.32	0.07	1.75	0.5	1.21	0.14
Magnesium	0.25	0.15	0.03	0.74	0.21	1.19	0.14
Chloride	1.26	0.8	0	4.6	1.18	1.25	1.09
Zinc	0.12	0.11	0.01	0.7	0.13	3.35	14.58
Manganese	0.09	0.1	0	0.3	0.09	0.77	−0.17
Copper	0.11	0.1	0.04	0.3	0.06	1.88	3.64

Table 2. Machine learning model performance for groundwater potability classification.

Machine Learning Algorithm	Accuracy	Precision	Recall	F1-Score
Logistic Regression	0.833	0.900	0.750	0.778
KNN	0.667	0.333	0.500	0.400
SVC	0.667	0.333	0.500	0.400
Random Forest	0.833	0.900	0.750	0.778

Table 3. Deep learning model performance for groundwater potability classification.

Deep Learning Algorithm	Accuracy	Precision	Recall	F1-Score
ANN	0.833	0.800	0.987	0.888
CNN	0.333	0.500	0.500	0.500
FNN	0.500	0.666	0.500	0.571
LSTM	0.666	0.750	0.750	0.750

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
