Article

Machine Learning Approach to Predict the Illite Weight Percent of Unconventional Reservoirs from Well-Log Data: An Example from Montney Formation, NE British Columbia, Canada

by Azzam Barham and Nor Syazwani Zainal Abidin *
Department of Geosciences, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(1), 318; https://doi.org/10.3390/app14010318
Submission received: 15 August 2023 / Revised: 10 December 2023 / Accepted: 11 December 2023 / Published: 29 December 2023

Abstract

Shale mineralogy is critical for the proper design and execution of hydraulic fracturing operations and for evaluating production potential. Relatively little research has explored the use of artificial intelligence for mineralogical prediction in the Montney Formation. This study aims to predict the illite wt.% of the Montney Formation using readily available conventional logs; illite is one of the constituents of shale and can aid in delineating the brittle and ductile zones within the shale formation. The wt.% of illite is usually determined by examining core samples or cuttings using XRD or QEMSCAN; both techniques are time-consuming, costly, and cannot be performed without physical samples. Based on conventional log readings, this study uses artificial neural networks (ANNs) and principal component analysis (PCA) to construct an accurate prediction model for illite wt.%. The feed-forward neural network (FFNN), trained with the backpropagation algorithm and the Bayesian Regularization (B.R.) technique on eight input parameters, achieved good overall performance in illite wt.% prediction (R2 = 92%). The ANN model was then tested on three wells drawn from the same log dataset but excluded from the training and testing phases. An overall R2 of 88.5% was obtained in these tests, which is encouraging. This work demonstrates the viability of employing ANNs to evaluate the mineralogical components of a target source rock from conventional logs, especially when geochemical data are missing or inadequate.

1. Introduction

Understanding the flow and fluid transport potential of shale hydrocarbon reserves requires characterizing their textural qualities, such as porosity and permeability. The amount of pore space and the reservoir permeability influence the amount of hydrocarbons stored in the rocks. Although it consists of tight sandstone and siltstone interbedded with shale, the Lower Triassic Montney Formation of British Columbia (100–300 m thick) comprises fine-grained clastic deposits classified as gas shale plays. Prior to recent breakthroughs in horizontal drilling and hydraulic fracture stimulation, the unconventional source rock layers of the Montney Formation were not economically viable [1].
Unconventional hydrocarbon shale reservoirs have received less attention than conventional sandstone reservoirs in core examinations utilizing traditional analytical techniques such as X-ray diffraction (XRD) and Scanning Electron Microscopy (SEM). The standard industry practice for unconventional reservoirs is to conduct a mineralogical investigation of chosen samples to calibrate the petrophysical data provided by downhole logging devices. As a result, quantitative (or semi-quantitative) mineralogical analyses are carried out over the entire drilled stratigraphic sequence. These findings are crucial because the relative concentrations of the (mineral) constituents can make or break a potential resource play [2].
The presence of illite in petroleum reservoirs may play a crucial role in influencing management and production activities within the petroleum sector, as its exact value might have major implications. The precipitation of substances within reservoir pores has been observed to have negative effects on both porosity and permeability, leading to the obstruction of fluid flow. This phenomenon is particularly prominent in reservoirs with low porosity and permeability, such as tight reservoirs [2,3]. Nevertheless, illite is predominantly favored for the purpose of hydraulic fracturing due to its tendency to signify rocks that are brittle and non-reactive to water [2,4].
The principal sources of illite in shale strata are detrital mica, its weathering products, and diagenetic illite produced during burial [4]. Illite, a potassium aluminum phyllosilicate clay mineral, is widely found in shale and other sedimentary rock formations. This mineral group includes clay minerals with a 1 nm X-ray powder diffraction repeat [5,6,7,8,9]. Within this classification, mixed-layer illite/smectite (I/S) is a distinct form of illite in which the unit cell layers are shuffled like playing cards [10].
I/S is common in shales; a significant proportion of illite may also occur as I/S. In most sedimentary basins and geologic periods, the percentage of illite in I/S increases with depth and temperature [11]. Smectite can gradually transform into illite in a solid state or layer-by-layer, with the illite ultimately inheriting the fundamental structure of smectite [12]. Nadeau [13] proposed the concepts of fundamental particles and interparticle diffraction to explain mixed-layer clays. While other minerals, such as smectite and feldspar, dissolve in shales, illite crystals tend to form. Although the subject of I/S remains disputed, an increase in the illite content in I/S with increasing burial depth indicates the formation of progressively thicker illite crystals [14].
The precise value of illite may play a significant role in the administration and production procedures of petroleum reservoirs. The weight percentage of illite obtained here is the product of rigorous, experimental analysis of core samples. Although beneficial, these methods have limitations due to their reliance on core samples, which can sometimes be unavailable or lost, and the substantial effort and cost involved in drilling even a small number of wells.
In addition, most currently available logging tools cannot directly measure illite content, only total clay. Machine learning algorithms have been widely applied across petroleum science and engineering to improve efficiency and reliability over conventional methods [15]. Several previous studies have combined machine learning with mineral analysis, yet the use of A.I. in mineralogy prediction has received scant attention in the academic literature. To better develop shale gas and oil and to provide precise solutions for various production operations when essential information about the shale mineralogical composition is lacking, it is helpful to find a suitable approach for predicting these essential shale minerals in such unconventional resources [16]. The prediction of clay and quartz content in organic-rich shales has previously been investigated through the analysis of well-log responses, encompassing measurements such as gamma-ray, acoustic, density, neutron porosity, resistivity, and mineralogy logs [16]. Another study used the NPOR, DEN, D.T., and G.R. logs to predict the weight fractions of quartz, k-feldspar, calcite, dolomite, pyrite, and total clay [17]. A model was proposed to predict the contents of 12 minerals in gas hydrate sediment based on XRD patterns integrated with well-log data, using four machine learning algorithms: LSTM, MLP, R.F., and CNN [18]. Nearly all of these studies, however, concentrated on bulk mineralogical contents rather than on a single clay mineral such as illite.
Combining the dimension reduction of principal component analysis (PCA) with the A.I. technique of artificial neural networks (ANNs) has proven to be a powerful combination for estimating reservoir properties in recent years. Many applications, including data mining, prediction, risk assessment, uncertainty quantification, and data integration, have emerged due to these methods’ inherent capacity to capture the nonlinearity and complex heterogeneity of the reservoir [19,20,21,22].
This study focuses entirely on illite, which causes the most extraction problems, especially for gas recovery. By developing an ANN-based model, which is fast and has been shown to provide highly reliable results, this study predicts the illite wt.% while taking advantage of conventional well-log data.

2. Geological Background

The Montney Formation comprises a tight siltstone reservoir that was deposited on the western edge of the North American continent during the Lower Triassic. The sediments that make up the Montney Formation were derived mainly from the felsic terrain of the Canadian Shield and deposited in the basin when the climate was hot, dry, and characterized by periodic monsoons. Sediments were deposited in the hot, acidic, and sometimes oxygen-free Triassic ocean. For these reasons, the deposited sediment predominantly consisted of quartz, chert, feldspars, dolomite, and mixed-layer I/S clays. For a long time, scientists disagreed on whether the Montney Formation contained clay minerals in appreciable amounts [23]. Authigenic clay precipitation occurred in an intermediate stage, followed by quartz and feldspar dissolution and a second phase of carbonate precipitation [24].

3. ANN Conception and Benefits

The ANN is a foundation of A.I. that solves problems that are difficult or intractable by human or classical statistical means. ANNs have self-learning capabilities that allow them to provide better results as more data become available. When combined with the dimension reduction technique of principal component analysis (PCA), intelligent methods such as ANNs are effective for TOC estimation [19,25,26].
ANNs are among the most powerful A.I. techniques. The ANN methodology is essential in building models for data analysis without relying on a predefined model or classical statistics describing the behavior of the data. The process learns from the data to generate a robust model that accurately represents the data points and can be reapplied at any time [27].
Unlike mathematical models, which require complete information on all parameters and their relationships, neural networks can use data from multiple scenarios to estimate the illite wt.% even when information on all the possible causal variables and their relationships is lacking [21]. The neural network achieves this by creating a model that correctly interpolates between closely similar patterns on which it is trained. The network learns sophisticated nonlinear interactions even if the input relationship is noisy, imprecise, and poorly understood. Pattern recognition is one of the ANNs’ most potent capabilities [19,21] because of their capacity to internalize the dependency between input and output.
Because learning relies entirely on pattern recognition, a trained network classifies a learned pattern, predicting expected events [22]. ANN techniques are also simple to apply because (1) the analyst does not have to commit to a specific physical formula governing the system, and (2) the approaches are entirely nonparametric and free of linearity assumptions [27]. Parametric tests are predicated on the assumption that the data conform to a particular distribution, such as the normal distribution, and possess specific characteristics, such as homogeneity of variance. Nonparametric tests do not rely on the assumptions made by parametric tests and can be used to analyze a wide range of data, including ordinal or nominal data [28]. In this study, we utilize the ANN approach to estimate illite wt.% from standard well-log parameters, taking advantage of the linearity-free nature of the approach. As is generally known, large datasets rarely conform to simple patterns of expected behavior. The fundamental problem is a lack of awareness of the linked factors influencing illite wt.%.
Even with robust computer-aided history matching, several essential relationships may be missed due to the influence of a significant latent issue or the application of inconsistent elements [27]. Some common inconsistent elements in a multi-layer perceptron (MLP) (Figure 1) are noisy data, overfitting, improper learning rate, and data imbalance. This inconsistency can be prevented by cleaning the data, using a proper regularization technique, or establishing an appropriate weight initialization.

4. Methodology

4.1. Workflow for Network Design

The development of an ANN model is organized within a unified process flow. Data gathering is only the beginning of the lengthy process of building an ANN model; other aspects, such as training and validation, are just as important. Because the ANN relies on data, the data need to be stored and managed correctly. A unified process flow enables the construction, prediction, and fine-tuning of an ANN model, and each problem addressed with a neural network calls for a different design strategy [29,30,31,32,33,34]. A graphic workflow was therefore used to coordinate the numerous phases of neural network model creation (Figure 2).

4.2. Data Acquisition

Illite wt.% was extracted from whole-rock quantitative and clay-fraction XRD data obtained from the British Columbia Oil and Gas Commission (BCOGC) database, representing ten wells for a total of 306 core and cutting samples (Figure 3), together with a complete set of conventional well-log data comprising depth, bulk density (B.D.), gamma-ray (API), resistivity, sonic (DT-C, DT-SH), and spectral gamma-ray (U, Th, K, SGR, CGR) measurements (Table 1). The well-log responses correspond only to the depths of the obtained core samples.
To avoid including data representing other mineralogical constituents, notably the mixed-layer I/S, only data representing illite wt.% were selected on the basis of the Th/K ratio; readings falling between 2 and 3.5 (ppm/wt.%) according to [35] were used to build the network. A total of 206 readings out of 307 fall within the range suitable for this study.
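As an illustration only, the following Python sketch shows how such a Th/K screening rule could be applied to a tabulated dataset; the DataFrame layout, the column names "Th" and "K", and the helper function are hypothetical and not taken from the authors' workflow.

```python
# Minimal sketch (not the authors' code): keep only samples whose Th/K ratio
# falls in the illite window of 2-3.5 (ppm per wt.%) after Fertl (1979) [35].
import pandas as pd

def filter_illite_samples(df: pd.DataFrame, low: float = 2.0, high: float = 3.5) -> pd.DataFrame:
    ratio = df["Th"] / df["K"]          # Th in ppm, K in wt.%
    return df[(ratio >= low) & (ratio <= high)].copy()

# Toy example: the middle sample (Th/K ~ 4.6) is screened out.
samples = pd.DataFrame({"Th": [8.3, 4.1, 12.0], "K": [2.8, 0.9, 4.0]})
print(filter_illite_samples(samples))
```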

4.3. Statistical Processing

With neural networks, predictions are made using statistics derived from the data and all available information. When using the model to make predictions, we must consider how much the predictions differ from the actual values. If the prediction errors are large, we must ensure that the data used to train the model are consistent. Descriptive statistics help achieve this by summarizing the data’s minimum, maximum, mean, and standard deviation. In this study, we examined the smallest and largest values in the data along with the average and the spread of the data (standard deviation) [36]. We further analyzed the data using PCA and the Pearson correlation coefficient to understand them better. Based on previous research, these approaches simplify the machine-learning process by reducing the number of factors that must be considered [37].
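The sketch below illustrates, under assumed names, how this statistical processing could be reproduced in Python with pandas and scikit-learn; the original analysis may well have used other software, and the DataFrame `logs` and the column name "Illite" are placeholders.

```python
# Illustrative sketch of the descriptive statistics, Pearson correlation, and PCA steps.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def summarize_and_reduce(logs: pd.DataFrame, n_components: int = 3):
    # Min, max, mean, and standard deviation of every variable (cf. Table 1).
    summary = logs.describe().loc[["min", "max", "mean", "std"]]

    # Pearson correlation of each log with the illite weight percent (cf. Table 2).
    pearson = logs.corr(method="pearson")["Illite"].sort_values()

    # PCA on the standardized inputs to group the logs into components (cf. Table 3).
    X = StandardScaler().fit_transform(logs.drop(columns="Illite"))
    pca = PCA(n_components=n_components).fit(X)

    return summary, pearson, pca.explained_variance_ratio_
```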

4.4. Input Data Selection and Division

Neural networks encode information in the strengths of individual neurons, which combine many input values and produce an output value as a continuous variable [38]. Selecting data for a neural network is not trivial, because the input variables must correlate well with the output [39]. The physics underlying the data connects parameters and elements and simplifies their interpretation. Identifying every potential input is challenging due to the multiple complexities involved, including evaluating the settings under which the input data were generated [40]. Numerous networks can be trained with different input parameters to identify and select the best model [40]. Finding the inputs with the greatest impact on the output is critical for accurate prediction [32,41]. These methodologies were tested to discover which elements performed best, providing the majority of the data for this study.
The neural network data are divided into training, validation, and testing sets [20]. The training set contains data used to train and adjust the network’s structure; the success of the training technique depends on selecting a meaningful and well-chosen training set. The test set consists of data patterns with exact input and output variables used to test the network and ensure good performance. A validation dataset evaluates a neural network’s capacity to generalize to novel patterns; this ability cannot be assessed on the patterns used for training, so the training patterns do not include the test patterns [38].
The test set is used only to evaluate performance; it does not contribute to network adaptation. A good practice is to build the test set by setting aside 20–30% of the patterns before training; usually, 70–80% of the data are used for training and the rest for testing [42]. This study split the original dataset into 70% for training and 30% for testing.
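A minimal sketch of such a 70/30 split is shown below; the random seed, the placeholder arrays, and the use of scikit-learn's train_test_split are assumptions rather than details reported in the paper.

```python
# Sketch of a 70/30 train/test split on 206 samples with 8 input logs.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(206, 8)   # placeholder for the selected well-log inputs
y = np.random.rand(206)      # placeholder for the illite wt.% targets

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)
print(X_train.shape, X_test.shape)  # (144, 8) (62, 8)
```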

4.5. Data Scaling

Before loading the variables into the network, they must be scaled from their original range into the range the neural network can handle efficiently. Networks often function in two numerical domains, ranging from 0 to 1 and −1 to 1, depending on the activation functions used.
The data were scaled, or normalized, to ensure that all their attributes were roughly of the same size and significance. Normalization is advantageous for classification algorithms that use distance measures or neural networks, such as nearest-neighbor classification and clustering [43]. The maximum and minimum method normalizes data to a fixed interval such as [0, 1], whereas the mean and standard deviation (z-score) method rescales data to zero mean and unit variance [43]. Using the Min-Max technique, the data were rescaled as shown below (Equation (1)):
$$P_n = \frac{2\,(P - P_{\min})}{P_{\max} - P_{\min}} - 1 \qquad (1)$$
where P represents the input data value, and P_min and P_max are the minimum and maximum values of that input.
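The following short function is a direct transcription of Equation (1); the function name and the toy example are illustrative only.

```python
# Scale each input column to the range [-1, 1], as in Equation (1).
import numpy as np

def minmax_scale_signed(p: np.ndarray) -> np.ndarray:
    p_min = p.min(axis=0)
    p_max = p.max(axis=0)
    return 2.0 * (p - p_min) / (p_max - p_min) - 1.0

# Example: a column spanning 10..30 maps to -1..1.
print(minmax_scale_signed(np.array([[10.0], [20.0], [30.0]])).ravel())  # [-1.  0.  1.]
```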

4.6. ANN Architecture

Various network topologies are available, including competitive neural networks, Jordan neural networks, Fully Recurrent Neural Networks (FRNNs), and Multi-Layer Perceptron (MLP) neural networks. In this study, we employed the most popular type of neural network model, the MLP network, trained using feed-forward (FF-ANN) learning [29,44]. The FF-ANN distinguishes itself through its rapid processing compared to alternative models and through its use of supervised learning and nonlinear classification techniques. Nonetheless, it is essential to acknowledge the inherent limitations of this network, which include the difficulty of providing comprehensive explanations for specific outcomes, as the inner workings of the hidden layers remain unobservable. Additionally, this network requires substantial volumes of data for effective operation and is susceptible to overfitting.
Each FF-ANN model comprises an input layer with multiple input nodes, at least one hidden layer with multiple nodes, and an output layer with one or more output nodes (Figure 4). Each node in a network can manipulate data using an activation function, such as the sigmoidal function [21,44]. FF-ANNs, on the other hand, have no connections between nodes located on the same layer. Regularization is often utilized in learning methods to increase generalization capacity when working with small sample sets or when data samples are meaningfully contaminated by noise [45]. In this research, we used the Bayesian Regularization (B.R.) method to improve the ANN’s generalization ability by modifying the objective function of the ANN to avoid overfitting and to smooth the ANN’s mapping, enforcing certain prior distributions on the model parameters and penalizing large weights [46].
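The configuration below is a rough, illustrative stand-in for such a network in Python. Because common open-source libraries do not expose MATLAB-style Bayesian Regularization directly, an L2 weight penalty (the alpha parameter of scikit-learn's MLPRegressor) is used here as a simpler proxy for penalizing large weights; the hyperparameter values are assumptions, not the authors' settings.

```python
# Sketch of a two-hidden-layer feed-forward network with weight-penalty regularization.
from sklearn.neural_network import MLPRegressor

def build_ffnn(hidden1: int = 30, hidden2: int = 15) -> MLPRegressor:
    return MLPRegressor(
        hidden_layer_sizes=(hidden1, hidden2),  # two hidden layers
        activation="tanh",                      # sigmoidal (tansig-like) transfer function
        solver="lbfgs",                         # full-batch optimizer suited to small datasets
        alpha=1e-2,                             # L2 weight penalty standing in for B.R.
        max_iter=5000,
        random_state=0,
    )
```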
The network identifies the deviation by comparing its output to the desired outcome. The network then modifies its weights (w) and biases (b) (Tables S1 and S2) so that its output (y) moves closer to the target value (t). A common performance function is the mean sum of squares of the network errors, a measure of the gap between the target and actual outputs (t and y, respectively) (Equation (2)) [47].
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(t_i - y_i\right)^2 \qquad (2)$$
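As a small worked example, Equation (2) can be evaluated as follows (the function name is illustrative):

```python
# Mean sum of squared errors between targets t and network outputs y, per Equation (2).
import numpy as np

def mse(t: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean((t - y) ** 2))

print(mse(np.array([1.0, 2.0, 3.0]), np.array([1.5, 1.5, 3.0])))  # 0.1666...
```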
Figure 4. Visualizations of artificial neural networks (ANNs) showing the architecture following [48].

5. Results and Discussion

5.1. Statistical Analysis

For each conventional log, 206 readings were taken, and their essential statistical properties were determined using descriptive statistics to shed light on the physical significance of the data (Table 1). The data were analyzed using the Pearson correlation coefficient and principal component analysis to identify the factors that significantly impact the illite wt.% and thus choose the most relevant input parameters.
According to the Pearson correlation matrix, the parameters affecting the illite wt.% in ascending order were K, Resist., Depth, CGR, GR, B.D., SGR, and U (Table 2). PCA, a dimensionality reduction technique, is one of the oldest and most widely used algorithms of its kind. Its basic premise is to minimize a dataset’s dimensionality while retaining as much “variability” (i.e., statistical information) as possible. As a descriptive tool, PCA requires no distributional assumptions and, as such, is a flexible exploratory method that may be used for a wide range of numerical data [49]. PCA classifies the parameters into groups (components) and identifies the variables with the most significant influence on the illite wt.% (Table 3). The extracted components show that illite is most strongly affected by component 1, which contains B.D., SGR, GR, and U, followed by component 3, which contains Resist., while the weakest influence comes from component 2, which contains Depth, K, and CGR (Table 3). These results are in line with the Pearson correlation coefficient. Based on the outcomes of the statistical analysis and the graphical depiction of the log responses in relation to the illite wt.% (Figure 5), the inputs for the ANN comprised the log responses of K, Resist., Depth, CGR, GR, B.D., SGR, and U. These variables correspond to the depths at which the XRD samples were collected. The sole output of the ANN was the illite wt.%.

5.2. Model Training and Testing

The parameters influencing illite wt.% as determined by the Pearson correlation coefficient and the PCA were embedded in an FF-ANN consisting of an input layer, two hidden layers, and an output layer (Figure 6). The training procedure was repeated using hidden layer neuron combinations from 5 to 50 with a step value of 5 to find the optimal model for the target parameters. The optimum results were recorded and are listed in Table 4.
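A sketch of how such a search over hidden-layer sizes might look is given below; the exhaustive grid, the scikit-learn model settings, and the selection by testing R2 are assumptions intended only to illustrate the procedure.

```python
# Try hidden-layer sizes of 5 to 50 (step 5) in both layers and keep the best testing R2.
from itertools import product
from sklearn.metrics import r2_score
from sklearn.neural_network import MLPRegressor

def search_hidden_sizes(X_train, y_train, X_test, y_test, sizes=range(5, 55, 5)):
    best = None
    for h1, h2 in product(sizes, sizes):
        model = MLPRegressor(hidden_layer_sizes=(h1, h2), activation="tanh",
                             solver="lbfgs", alpha=1e-2, max_iter=5000,
                             random_state=0).fit(X_train, y_train)
        result = {"hidden1": h1, "hidden2": h2,
                  "train_R2": r2_score(y_train, model.predict(X_train)),
                  "test_R2": r2_score(y_test, model.predict(X_test))}
        if best is None or result["test_R2"] > best["test_R2"]:
            best = result
    return best  # e.g., the (30, 15) combination reported in Table 4
```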
The best overall performance for the FF-ANN was obtained using the B.R. method with 30 and 15 neurons in the first and second hidden layers, respectively, yielding an R2 value of 0.92 for the illite wt.% prediction (Table 4). For large models with sufficient representational capacity to overfit the task, it is frequently observed that the training error decreases consistently over time while the validation error begins to rise after a certain point. A popular regularization strategy known as early stopping can be employed to address this issue. In this strategy, the parameter setting yielding the smallest validation error is retained; once training terminates, the model associated with that optimal parameter setting, rather than the most recent model, is selected.
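The snippet below is a generic illustration of this early-stopping logic, not the authors' training loop; the callbacks train_step and validate, the epoch limit, and the patience value are all hypothetical.

```python
# Keep the parameters with the smallest validation error, not the most recent ones.
import numpy as np

def train_with_early_stopping(train_step, validate, max_epochs=1000, patience=50):
    best_params, best_err, stale = None, np.inf, 0
    for _ in range(max_epochs):
        params = train_step()          # run one training epoch, return current parameters
        err = validate(params)         # validation error for those parameters
        if err < best_err:
            best_params, best_err, stale = params, err, 0
        else:
            stale += 1
            if stale >= patience:      # stop once validation error keeps rising
                break
    return best_params                 # the best-so-far model is selected
```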
As a result, the network in this study was calibrated against a separate test set. Once calibrated, the network can deliver correct results when applied to new data, since its parameters have been tuned to the optimal values identified on the test set.
After training to find the optimal model for our goals, the outcomes were quite promising. We aimed for a high R2 and a low mean squared error (MSE) between the predicted and the desired values. A scatter plot for the illite wt.% was created from the best-fit model extracted during training, with an R2 value of 0.92 (Figure 7).
The neural network was trained using available data and subsequently tested against data from three wells distinct from those used in training and initial testing to determine the illite wt.%. The network’s performance was evaluated based on its ability to accurately predict the illite wt.% for these unseen wells. The results demonstrated a favorable average coefficient of determination R2 of 0.885 across the three evaluated wells. This indicates a strong correlation between the predicted and actual illite weight percentages, highlighting the effectiveness of the neural network model in accurately estimating this critical parameter (Figure 8).
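A minimal sketch of this blind-well evaluation is shown below; the helper name, the model object, and the per-well arrays are placeholders for the trained network and its withheld data.

```python
# Average the coefficient of determination over the three wells held out of training.
import numpy as np
from sklearn.metrics import r2_score

def blind_well_r2(model, wells):
    """wells: list of (X_well, y_well) pairs for wells excluded from training/testing."""
    scores = [r2_score(y, model.predict(X)) for X, y in wells]
    return scores, float(np.mean(scores))
```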
Using the petrophysical properties of conventional well logs and XRD data, this study produced optimistic predictions of the illite wt.% using a four-layer FFNN with a backpropagation algorithm and a hyperbolic tangent sigmoid transfer function. After applying the model to wells that were not part of the training and testing processes, the findings showed that it was excellent and dependable at predicting the illite wt.%. The application of PCA contributed to the accuracy of the results by efficiently reducing the input data to the most important contributors to the output, providing a high degree of predictability in the matches. Artificial intelligence approaches combined with petrophysical logs proved useful for estimating illite wt.%. Our findings demonstrate the usefulness, speed, and low cost of employing ANNs to estimate illite wt.% in the absence of core samples. The model’s high level of predictability can be applied to a wide range of Montney plays and potentially to other plays.

6. Conclusions

This study integrates the petrophysical properties of conventional well logs and XRD results to predict the illite wt.% using an FF-ANN with a backpropagation algorithm and a hyperbolic tangent sigmoid transfer function. Increasing the number of conventional log parameter inputs significantly improves the performance of the considered algorithm, which is also evident from the correlation values. The results showed that the ability of this method to predict the illite wt.% was excellent and reliable after applying the model to wells that were outside the training and testing processes, and the application of PCA supported the reliability of the results. The following conclusions are derived from this study:
(1) PCA is an excellent method for reducing the number of input parameters, since it focuses on the parameters that mainly influence the output and provides a high level of matching predictivity.
(2) The utilization of artificial intelligence methods, specifically FF-ANNs, in calculating illite wt.%, together with petrophysical log data, proved highly advantageous, yielding a remarkable level of accuracy as seen in the excellent match observed during both the training and testing stages, with an R2 value of 92%.
(3) Tests conducted on a data set from three wells withheld from model development demonstrate that the model has an exceptional capacity to forecast, giving R2 = 88.5% for illite wt.% in the test wells.
(4) The findings indicate that ANNs can be a practical, fast, and low-cost method for calculating the illite weight percentage in the absence of core samples.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app14010318/s1, Table S1: Weights and biases of the optimized ANN model between the input layer and the first hidden layer; Table S2: Weights and biases of the optimized ANN model between the second hidden and output layers (first part from W2-1 to W2-15, second part from W2-16 to W2-30).

Author Contributions

Methodology, A.B.; Software, A.B.; Validation, N.S.Z.A.; Resources, N.S.Z.A.; Data curation, A.B.; Writing–original draft, A.B.; Writing–review & editing, N.S.Z.A.; Funding acquisition, N.S.Z.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Yayasan Universiti Teknologi PETRONAS grant number YUTP-FRG 015LC0-433.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wüst, R.A.; Nassichuk, B.R.; Bustin, R.M. Porosity Characterization of Various Organic-Rich Shales from the Western Canadian Sedimentary Basin, Alberta and British Columbia, Canada; AAPG: Tulsa, OK, USA, 2013. [Google Scholar]
  2. Alexander, T.; Baihly, J.; Boyer, C.; Clark, B.; Waters, G.; Jochen, V.; Le Calvez, J.; Lewis, R.; Miller, C.K.; Thaeler, J. Shale gas revolution. Oilfield Rev. 2011, 23, 40–55. [Google Scholar]
  3. Luffel, D.L.; Herrington, K.L.; Harrison, C.W. Fibrous illite controls productivity in frontier gas sandstones, Moxa Arch, Wyoming. SPE Form. Eval. 1992, 7, 345–351. [Google Scholar] [CrossRef]
  4. Kennedy, R.L.; Knecht, W.N.; Georgi, D.T. Comparisons and contrasts of shale gas and tight gas developments, North American experience and trends. In Proceedings of the SPE Kingdom of Saudi Arabia Annual Technical Symposium and Exhibition, Dammam, Saudi Arabia, 24–27 April 2017; p. SPE–160855-MS. [Google Scholar]
  5. Grim, R.E.; Bray, R.H.; Bradley, W.F. The mica in argillaceous sediments. Am. Mineral. J. Earth Planet. Mater. 1937, 22, 813–829. [Google Scholar]
  6. Srodon, J. Illite, In Micas. Rev. Mineral. 1984, 13, 495–539. [Google Scholar]
  7. Jackson, J.A. Glossary of Geology; American Geological Institute: Alexandria, VA, USA, 1997; 769p. [Google Scholar]
  8. Brindley, G.; Brown, G. Quantitative X-ray mineral analysis of clays. Cryst. Struct. Clay Miner. Their X-ray Identif. 1980, 5, 411–438. [Google Scholar]
  9. Newman, A.C. Chemistry of Clays and Clay Minerals; Springer: Berlin/Heidelberg, Germany, 1987. [Google Scholar]
  10. Jeans, C.; Moore, D.M.; Reynolds, R.C., Jr. X-ray Diffraction and the Identification and Analysis of Clay Minerals. Geol. Mag. 1998, 135, 819–842. [Google Scholar] [CrossRef]
  11. Jeans, C.; Eslinger, E.; Pevear, D. Clay Minerals for Petroleum Geologists and Engineers; SEPM Short Course Notes no. 22; Society of Economic Paleontologists and Mineralogists: Tulsa, OK, USA, 1985; ISBN 0 918985 000. [Google Scholar]
  12. Hower, J. Shale Diagenesis. Clays and the Resource Geologist; Cambridge University Press: Cambridge, UK, 1981. [Google Scholar]
  13. Nadeau, P. The physical dimensions of fundamental clay particles. Clay Miner. 1985, 20, 499–514. [Google Scholar] [CrossRef]
  14. Pevear, D.R. Illite and hydrocarbon exploration. Proc. Natl. Acad. Sci. USA 1999, 96, 3440–3446. [Google Scholar] [CrossRef]
  15. Yin, Q.; Yang, J.; Hou, X.; Tyagi, M.; Zhou, X.; Cao, B.; Sun, T.; Li, L.; Xu, D. Drilling performance improvement in offshore batch wells based on rig state classification using machine learning. J. Pet. Sci. Eng. 2020, 192, 107306. [Google Scholar] [CrossRef]
  16. Mustafa, A.; Tariq, Z.; Mahmoud, M.; Radwan, A.E.; Abdulraheem, A.; Abouelresh, M.O. Data-driven machine learning approach to predict mineralogy of organic-rich shales: An example from Qusaiba Shale, Rub’al Khali Basin, Saudi Arabia. Mar. Pet. Geol. 2022, 137, 105495. [Google Scholar] [CrossRef]
  17. Kim, D.; Choi, J.; Kim, D.; Byun, J. Predicting mineralogy by integrating core and well log data using a deep neural network. J. Pet. Sci. Eng. 2020, 195, 107838. [Google Scholar] [CrossRef]
  18. Park, S.Y.; Son, B.-K.; Choi, J.; Jin, H.; Lee, K. Application of machine learning to quantification of mineral composition on gas hydrate-bearing sediments, Ulleung Basin, Korea. J. Pet. Sci. Eng. 2022, 209, 109840. [Google Scholar] [CrossRef]
  19. Huang, Z.; Williamson, M.A. Artificial neural network modelling as an aid to source rock characterization. Mar. Pet. Geol. 1996, 13, 277–290. [Google Scholar] [CrossRef]
  20. Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice-Hall, Inc.: Hoboken, NJ, USA, 2007. [Google Scholar]
  21. Cranganu, C.; Luchian, H.; Breaban, M.E. Artificial Intelligent Approaches in Petroleum Geosciences; Springer: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
  22. Mohaghegh, S. Virtual-intelligence applications in petroleum engineering: Part 1—Artificial neural networks. J. Pet. Technol. 2000, 52, 64–73. [Google Scholar] [CrossRef]
  23. Vaisblat, N.; Harris, N.B.; Bish, D.L. Origin and Evolution of Clay Mineralogy in the Montney Formation; University of Calgary: Calgary, AB, Canada, 2019. [Google Scholar]
  24. Vaisblat, N.; Harris, N.; DeBhur, C.; Euzen, T.; Gasparrini, M.; Crombez, V.; Rohais, S.; Krause, F.; Ayranci, K. Diagenetic model for the deep Montney Formation, northeastern British Columbia. Geosci. BC Rep. 2017, 1, 37–48. [Google Scholar]
  25. Mahmoud, A.A.A.; Elkatatny, S.; Mahmoud, M.; Abouelresh, M.; Abdulraheem, A.; Ali, A. Determination of the total organic carbon (TOC) based on conventional well logs using artificial neural network. Int. J. Coal Geol. 2017, 179, 72–80. [Google Scholar] [CrossRef]
  26. Ouadfeul, S.-A.; Doghmane, M.Z.; Aliouane, L. Wellbore Stability in Shale Gas Reservoirs with a Case Study from the Barnett Shale. In Unconventional Hydrocarbon Resources: Prediction and Modeling Using Artificial Intelligence Approaches; Wiley Online Library: Hoboken, NJ, USA, 2023; p. 21. [Google Scholar]
  27. Luger, G.F. Artificial Intelligence: Structures and Strategies for Complex Problem Solving; Pearson Education: London, UK, 2005. [Google Scholar]
  28. Hoskin, T. Parametric and Nonparametric: Demystifying the Terms; Mayo Clinic: Rochester, MN, USA, 2012; pp. 1–5. [Google Scholar]
  29. Hush, D.R.; Horne, B.G. Progress in supervised neural networks. IEEE Signal Process. Mag. 1993, 10, 8–39. [Google Scholar] [CrossRef]
  30. Bowden, G.J.; Maier, H.R.; Dandy, G.C. Optimal division of data for neural network models in water resources applications. Water Resour. Res. 2002, 38, 2-1–2-11. [Google Scholar] [CrossRef]
  31. Shahin, M.A.; Maier, H.R.; Jaksa, M.B. Predicting settlement of shallow foundations using neural networks. J. Geotech. Geoenviron. Eng. 2002, 128, 785–793. [Google Scholar] [CrossRef]
  32. Goda, H.M.; Maier, H.; Behrenbruch, P. The development of an optimal artificial neural network model for estimating initial water saturation-Australian reservoir. In Proceedings of the SPE Asia Pacific Oil and Gas Conference and Exhibition, Jakarta, Indonesia, 5–7 April 2005. [Google Scholar]
  33. Al-Bulushi, N. Predicting Reservoir Properties Using Artificial Neural Networks (ANNs); Imperial College London: London, UK, 2008. [Google Scholar]
  34. Guillod, T.; Papamanolis, P.; Kolar, J.W. Artificial neural network (ANN) based fast and accurate inductor modeling and design. IEEE Open J. Power Electron. 2020, 1, 284–299. [Google Scholar] [CrossRef]
  35. Fertl, W.H. Gamma ray spectral data assists in complex formation evaluation. Log Anal. 1979, 20. [Google Scholar]
  36. Da Fonseca, J.; Gnoatto, A.; Grasselli, M. A flexible matrix Libor model with smiles. J. Econ. Dyn. Control 2013, 37, 774–793. [Google Scholar] [CrossRef]
  37. Landau, S. A Handbook of Statistical Analyses Using SPSS; CRC: Boca Raton, FL, USA, 2004. [Google Scholar]
  38. Fausett, L.; Elwasif, W. Predicting performance from test scores using backpropagation and counterpropagation. In Proceedings of the 1994 IEEE International Conference on Neural Networks (ICNN’94), Orlando, FL, USA, 28 June–2 July 1994; pp. 3398–3402. [Google Scholar]
  39. Masood, I.; Hassan, A. Issues in development of artificial neural network-based control chart pattern recognition schemes. Eur. J. Sci. Res. 2010, 39, 336–355. [Google Scholar]
  40. Ataie-Ashtiani, B.; Hassanizadeh, S.M.; Oung, O.; Weststrate, F.; Bezuijen, A. Numerical modelling of two-phase flow in a geocentrifuge. Environ. Model. Softw. 2003, 18, 231–241. [Google Scholar] [CrossRef]
  41. Lachtermacher, G.; Fuller, J.D. Back propagation in time-series forecasting. J. Forecast. 1995, 14, 381–393. [Google Scholar] [CrossRef]
  42. Assidjo, E.; Yao, B.; Kisselmina, K.; Amané, D. Modeling of an industrial drying process by artificial neural networks. Braz. J. Chem. Eng. 2008, 25, 515–522. [Google Scholar] [CrossRef]
  43. Han, J.; Kamber, M.; Pei, J. Data mining concepts and techniques third edition. Morgan Kaufmann Ser. Data Manag. Syst. 2011, 5, 83–124. [Google Scholar]
  44. Rumelhart, D.; Hinton, G.; Williams, R. Learning Internal Representations by Error Propagation, Parallel Distributed Processing; Foundations; MIT Press: Cambridge, MA, USA, 1986; Volume 1. [Google Scholar]
  45. Jiang, Y.; Zur, R.M.; Pesce, L.L.; Drukker, K. A study of the effect of noise injection on the training of artificial neural networks. In Proceedings of the 2009 International Joint Conference on Neural Networks, Atlanta, GA, USA, 14–19 June 2009; pp. 1428–1432. [Google Scholar]
  46. Okut, H. Bayesian regularized neural networks for small n big p data. Artif. Neural Netw. Models Appl. 2016, 28–48. [Google Scholar]
  47. Hagan, M.T.; Demuth, H.B.; Beale, M. Neural Network Design; PWS Publishing Co.: Boston, MA, USA, 1997. [Google Scholar]
  48. Barham, A.; Ismail, M.S.; Hermana, M.; Padmanabhan, E.; Baashar, Y.; Sabir, O. Predicting the maturity and organic richness using artificial neural networks (ANNs): A case study of Montney Formation, NE British Columbia, Canada. Alex. Eng. J. 2021, 60, 3253–3264. [Google Scholar] [CrossRef]
  49. Diamantaras, K.I.; Kung, S.Y. Principal Component Neural Networks: Theory and Applications; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1996. [Google Scholar]
Figure 1. Two hidden layers in an MLP network that display stacked weight vectors.
Figure 2. Flowchart illustrating the neural network preparation processes.
Figure 3. Map showing the studied wells’ locations in this study (numbers represent the W.A. #).
Figure 5. Petrophysical logs and the relationship with the illite wt.%.
Figure 6. The MLP ANN structure shows the input, hidden, and output layers with sigmoid-sigmoid transform functions and the neurons in the hidden layers.
Figure 7. Linear regression graph shows the R2 relation between measured and predicted illite wt.%.
Figure 8. A regression model comparing observed and forecasted illite wt.% values in test wells.
Table 1. Descriptive analysis of the conventional log data in line with illite wt.%.

| Statistic | Depth (m) | B.D. (g/cm3) | G.R. (API) | Resist. (Ohm.m) | DT-C (µs/m) | DT-SH (µs/m) | SGR (API) | U (ppm) | Th (ppm) | K (wt.%) | CGR (API) | Illite (wt.%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 2360.93 | 2.604 | 132.41 | 32.87 | 232.12 | 351.11 | 134.41 | 7.46 | 8.26 | 2.75 | 74.55 | 12.62 |
| SD | 369.83 | 0.05 | 34.90 | 118.36 | 45.16 | 57.31 | 28.72 | 3.64 | 2.25 | 0.73 | 21.95 | 5.35 |
| Min | 1827.3 | 2.46 | 63.04 | 12.78 | 173.66 | 196.73 | 70.86 | 1.83 | 2.67 | 0.87 | 25.88 | 1.00 |
| Max | 3085.0 | 2.74 | 309.72 | 498.39 | 367.83 | 485.87 | 271.22 | 23.53 | 14.01 | 4.36 | 145.00 | 28.00 |
| Count | 206 | 206 | 206 | 206 | 206 | 206 | 206 | 206 | 206 | 206 | 206 | 206 |

BD: bulk density. GR: gamma-ray counts. Resist.: deep resistivity. DT-C: compressional slowness. DT-SH: shear slowness. SGR: spectral gamma-ray. U: uranium concentration. Th: thorium concentration. K: potassium concentration. CGR: spectral gamma-ray without uranium.
Table 2. The correlation matrix shows the strength of the relationship between the log parameters and illite.

| Correlation Matrix | Depth | B.D. | G.R. | Resist. | DT-C | DT-SH | SGR | U | Th | K | CGR | Illite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Depth | 1 | | | | | | | | | | | |
| B.D. | −0.12 | 1 | | | | | | | | | | |
| GR | 0.12 ** | −0.42 ** | 1 | | | | | | | | | |
| Resist. | 0.08 | 0.01 | 0.31 ** | 1 | | | | | | | | |
| DT-C | −0.45 ** | −0.11 | 0.04 | −0.22 ** | 1 | | | | | | | |
| DT-SH | 0.19 ** | −0.27 ** | 0.03 | −0.18 ** | −0.59 ** | 1 | | | | | | |
| SGR | −0.03 | −0.37 ** | 0.71 ** | 0.19 ** | 0.02 | 0.16 * | 1 | | | | | |
| U | 0.24 ** | −0.41 ** | 0.77 ** | 0.22 ** | −0.03 | 0.16 * | 0.79 ** | 1 | | | | |
| Th | 0.05 | −0.07 | 0.09 | 0.10 | 0.11 | −0.1 | 0.17 * | 0.06 | 1 | | | |
| K | 0.25 ** | −0.11 | 0.08 | −0.02 | 0.001 | −0.06 | 0.16 * | 0.08 | 0.87 ** | 1 | | |
| CGR | 0.26 ** | −0.09 | 0.11 | 0.04 | 0.07 | −0.12 | 0.28 ** | 0.20 ** | 0.87 ** | 0.91 ** | 1 | |
| Illite | −0.16 * | 0.26 ** | −0.26 ** | −0.13 | −0.03 | 0.016 | −0.29 ** | −0.49 ** | −0.07 | −0.10 | −0.21 ** | 1 |

** Correlation is significant at the 0.01 level (2-tailed). * Correlation is significant at the 0.05 level (2-tailed).
Table 3. The principal component analysis (PCA) performed on the log data reveals the separation of the three components.

Rotated Component Matrix a

| | Component 1 | Component 2 | Component 3 |
|---|---|---|---|
| Depth | 0.089 | 0.422 | 0.407 |
| B.D. | −0.662 | −0.056 | 0.294 |
| GR | 0.837 | 0.002 | 0.272 |
| Resist. | 0.147 | −0.056 | 0.87 |
| SGR | 0.857 | 0.092 | 0.045 |
| U | 0.9 | 0.07 | 0.208 |
| K | 0.057 | 0.955 | −0.057 |
| CGR | 0.149 | 0.947 | 0.012 |
| Illite | −0.507 | −0.16 | −0.124 |

Extraction method: principal component analysis. Rotation method: varimax with Kaiser normalization. a Rotation converged in 4 iterations.
Table 4. The optimum results from the different neuron combinations in the hidden layers.

| Network | Neurons in Hidden Layer 1 | Neurons in Hidden Layer 2 | Training R2 | Testing R2 |
|---|---|---|---|---|
| 1 | 10 | 5 | 0.85 | 0.77 |
| 2 | 15 | 5 | 0.81 | 0.86 |
| 3 | 25 | 10 | 0.86 | 0.78 |
| 4 | 25 | 15 | 0.79 | 0.63 |
| 5 | 30 | 15 | 0.92 | 0.885 |
| 6 | 35 | 20 | 0.83 | 0.76 |
| 7 | 45 | 25 | 0.84 | 0.67 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
