Utilization of a Low-Cost Sensor Array for Mobile Methane Monitoring

Silberstein, Jonathan; Wellbrook, Matthew; Hannigan, Michael

doi:10.3390/s24020519

Open Access Communication


by Jonathan Silberstein ¹,*, Matthew Wellbrook ² and Michael Hannigan ¹

¹ Department of Mechanical Engineering, University of Colorado at Boulder, 1111 Engineering Drive, Boulder, CO 80309, USA

² Urban Labs, University of Chicago, 33 North LaSalle Street Suite 1600, Chicago, IL 60602, USA

* Author to whom correspondence should be addressed.

Sensors 2024, 24(2), 519; https://doi.org/10.3390/s24020519

Submission received: 13 December 2023 / Revised: 5 January 2024 / Accepted: 10 January 2024 / Published: 14 January 2024

(This article belongs to the Section Environmental Sensing)


Abstract

The use of low-cost sensors (LCSs) for the mobile monitoring of oil and gas emissions is an understudied application of low-cost air quality monitoring devices. To assess the efficacy of low-cost sensors as a screening tool for the mobile monitoring of fugitive methane emissions stemming from well sites in eastern Colorado, we colocated an array of low-cost sensors (XPOD) with a reference-grade methane monitor (Aeris Ultra) on a mobile monitoring vehicle from 15 August through 27 September 2023. Fitting our low-cost sensor data with a bootstrap-aggregated (bagged) random forest model, we found a high correlation between the reference and XPOD CH₄ concentrations (r = 0.719) and a low experimental error (RMSD = 0.3673 ppm). Other calibration models, including multilinear regression and artificial neural networks (ANN), were either unable to distinguish individual methane spikes above baseline or had a significantly elevated error (RMSD_ANN = 0.4669 ppm) when compared to the random forest model. Using out-of-bag predictor permutations, we found that the sensors that showed the highest correlation with methane displayed the greatest significance in our random forest model. As we reduced the percentage of colocation data employed in the random forest model, errors did not significantly increase until a specific threshold (50 percent of total calibration data). Using a peakfinding algorithm, we found that our model was able to predict 80 percent of methane spikes above 2.5 ppm throughout the duration of our field campaign, with a false response rate of 35 percent.

Keywords: low-cost sensors; oil and gas well emissions; mobile monitoring; model calibration; quantification; screening tools


1. Introduction

Methane (CH₄) is a colorless, odorless, flammable gas that makes up the majority of natural gas. Methane is typically used for power generation and heating, as well as a fuel for natural gas vehicles. Methane emissions represent the second largest contributor to climate change, following carbon dioxide (CO₂) [1,2]. The annual mass of methane emissions is only three percent of that associated with CO₂; however, the 100-year global warming potential of methane is 28 times greater than that of CO₂, as the radiative forcing of methane is much greater than that of CO₂ on a per-mass basis [3]. Methane concentrations have increased dramatically from preindustrial levels of approximately 690 parts per billion (ppb) to current concentrations of 1850 ppb [4,5]. Elevated atmospheric methane concentrations contribute significantly to climate change and tropospheric ozone. Understanding the scope of methane sources may help forestall further increases in atmospheric methane.

Fugitive methane emissions arise from a variety of sources, including livestock, landfills, coal mining, and oil and natural gas wells. Within the United States, it is estimated that between 50 and 65 percent of total methane emissions originate from anthropogenic activities [5]. Of the anthropogenic fraction of US methane emissions, oil and natural gas wells account for approximately one-third of the total flux [6]. Well sites are not distributed evenly throughout the US and are often situated in socioeconomically disadvantaged areas, which raises environmental justice concerns [7,8]. Leaking wells release not only methane but also other harmful volatile organic compounds (VOCs), including benzene, toluene, ethylbenzene, and xylene (collectively referred to as BTEX). Exposure to BTEX has been linked to increased incidences of asthma, cancer, and serious cardiovascular impacts [9,10]. As the production of natural gas has expanded in recent decades with an increase in the number of wells and the proliferation of new drilling methods, the share of methane emissions attributable to oil and gas infrastructure has risen accordingly [11,12].

Accurately assessing methane concentrations is a necessary step in determining the emission rate of various point sources (such as oil and gas infrastructure) [13]. Commercially available tools used for fence line methane quantification typically rely on optical measurements to accurately measure methane concentrations. However, these tools require costly equipment and can demand extensive training and expertise to run correctly. Alternatively, over the past decade, studies have pioneered the use of low-cost sensors (LCSs) to accurately quantify methane concentrations in lab studies [14,15,16], stationary deployments in urban areas [17,18,19], and fence line monitoring [20]. Other studies have leveraged the combination of low-cost sensors with machine learning methods to specify and predict individual VOC concentrations in stationary laboratory and field experiments with high fidelity [21,22]. Similarly, researchers have demonstrated the ability of machine learning algorithms to classify and quantify individual VOCs [22,23] and CH₄ [24]. At a fraction of the cost of regulatory and research-grade monitors, LCS networks are used to supplement regulatory monitors by providing high-resolution spatiotemporal pollutant data and to inform local policies to best mitigate exposure.

Low-cost methane sensors typically fall within one of two classes: electrochemical (EC) and metal oxide (MOx) sensors. MOx sensors operate via an oxidation or reduction reaction when exposed to a target gas [25]. Target species interact with the sensor surface, resulting in the introduction or removal of free electrical charge in the semiconductive material [25]. This process changes the resistivity of the material, which is then measured and converted to a gas concentration. The resistivity of MOx sensors is highly dependent on both temperature and humidity, and sensor performance can degrade with time [26]. MOx sensors designed to react with CH₄ as the target gas typically employ SnO₂ deposited on an electrode and an Al₂O₃ substrate [27]. EC sensors operate via chemical reactions within a cell, which produce a current proportional to the concentration of the target gas [28]. However, these sensors are also susceptible to long-term degradation and effects from temperature and humidity changes [29,30]. The calibration of both MOx and EC LCS is often difficult due to a combination of sensor cross-sensitivity and performance degradation over time. Other commonly employed low-cost sensor technologies used for VOC assessment, such as photoionization detectors (PIDs), are not sensitive to changes in methane concentrations [31,32].

While the applications of low-cost sensor networks for oil and gas emission monitoring have been demonstrated in many studies [19,33,34,35], there is little research on the efficacy of these sensors for mobile applications. Tracking emissions over a large spatial boundary with transient sources requires an LCS platform capable of mobile measurements. In stationary studies, LCSs are deployed at either one or several fixed locations throughout the study duration, whereas for mobile studies, the sensor package is constantly moving. The use of LCSs for mobile applications requires the consideration of several additional criteria. Sensors may be exposed to an emission plume for a period of multiple hours in stationary studies, whereas sensors in mobile monitoring applications may only be exposed to a source for several seconds. Sensors should therefore be calibrated to collect data at a greater time resolution to account for the decreased time exposed to an emissions plume. Additionally, a subset of low-cost sensors interacts with both VOCs and carbon monoxide, both of which are produced from vehicle exhaust. To avoid artifacts stemming from sensors responding to vehicle exhaust rather than methane emissions, sensors must be sited on the monitoring vehicle to minimize exposure to tailpipe emissions. While commercial monitors have been successfully employed to track these emissions, the cost of these instruments is often prohibitive [36]. A calibrated LCS device capable of accurately assessing CH₄ concentrations may act as a low-cost alternative to costly research-grade equipment for the identification of CH₄ spikes when coupled with an accurate peakfinding algorithm. To our knowledge, this study represents the first attempt to leverage LCS technology for mobile CH₄ tracking from oil and gas infrastructure.

2. Materials and Methods

LCS data were collected over 16 days from 15 August 2023 to 27 September 2023. Data were collected using the XPOD monitoring platform, a low-cost sensor package developed by the Hannigan Lab that employs commercially available sensors designed for air quality monitoring. Raw XPOD data were collected every two seconds during monitoring. Sensors employed by the XPOD monitoring platform to assess methane concentrations are shown in Table 1. Sensors were selected for this study based on a combination of price, sensing technology, and widespread usage in the relevant literature. Though the selected sensors have differing manufacturer-prescribed sensing ranges and target gases, all sensors (with the exception of the Alphasense VOC-B4, which, to our knowledge, has not been previously characterized in the literature) have been extensively studied and have been shown to correlate well with CH₄ [16,33,37]. The XPOD monitor was placed within a University of Chicago emission monitoring vehicle (Figure S1). The average velocity of the vehicle near O&G facilities was several meters per second. All data collected from the XPOD and Aeris over the duration of this study were mobile data. The inlet of the XPOD was connected to the roof of the monitoring vehicle via 8 feet of inert Tygon tubing. Inlet air was pumped into the XPOD via a micro pump (Sensidyne) calibrated to a flow rate of 2.5 L per minute. Reference CH₄ measurements were provided via a research-grade Aeris Ultra gas analyzer (Aeris Ultra, Project Canary, Denver, CO, USA). The inlet of the Aeris gas analyzer was placed adjacent to the XPOD inlet on the roof of the monitoring vehicle to minimize differences in the gas composition during deployment [38] (Figure S2). The response lags for pumped gas to enter the two instruments were equal.

Sampling occurred in the Julesburg Basin, a region comprising the area east of the Rocky Mountains in Colorado and Wyoming that extends to the western portion of Kansas and Nebraska. The Julesburg Basin produces both oil and natural gas from a combination of sand and shale formations [39]. Large-scale commercial oil and gas extraction within the Julesburg Basin has occurred since the early 1950s, resulting in a large number of legacy wells [39,40]. With the development of new extraction techniques, including the combination of horizontal drilling and hydraulic fracturing, new wells have continued to proliferate within the Julesburg Basin [41] (Figures S3 and S4). Methane leaks from O&G infrastructure are treated as point sources and are often analyzed using a Gaussian plume dispersion model [42]. Many of these wells are in close proximity to large population centers within this region, including the cities of Greeley, Cheyenne, and Denver. A map of the Julesburg Basin and the spatial extent of our field monitoring are shown below in Figure 1.

XPOD CH₄ concentrations were calculated using raw signal from sensors displayed in Table 1, as well as temperature and humidity. Reference and XPOD data were time-averaged in 15 s intervals using mean values over each period. To reconstruct CH₄ concentrations from model variables, we applied multilinear regressions, random forest models (RF), and artificial neural networks (ANN), trained using Aeris reference measurements. We assessed the performance of these models using 2-fold cross-validation. Training and evaluation datasets for RF and ANN models were fit according to the methodology outlined in [43,44], with 80 percent of data used for training and 20 percent used for testing. Prior to fitting, XPOD sensor warmup periods were removed from deployment data, and the distribution of reference CH₄ data was cleaned and then resampled using five concentration bins according to the methodology outlined in a study by Furuta et al. [37]. Following data binning, the training dataset comprised 18 h of data (corresponding to approximately 4300 data points), and the evaluation dataset comprised 4.5 h of data (corresponding to approximately 1100 data points). Resampling results in an improved balance among the number of experimental observations at different CH₄ concentrations, at the expense of reducing the size of the overall dataset (Figure S5). The original and resampled reference CH₄ distributions, as well as model fitting parameters on the original data distribution, are displayed in Figure S5 and Tables S1–S10.
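The two preprocessing steps described above (15 s averaging and concentration-bin resampling) can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, and the balancing rule (five equal-width bins, each downsampled to the size of the smallest non-empty bin) is an assumption, since the exact procedure follows Furuta et al. [37] and is not spelled out here.

```python
import random
from statistics import fmean


def average_by_interval(records, interval_s=15):
    """Average (time_s, value) samples over consecutive interval_s windows.

    Returns a list of (window_start_s, mean_value), sorted by time.
    """
    buckets = {}
    for t, value in records:
        start = int(t // interval_s) * interval_s
        buckets.setdefault(start, []).append(value)
    return [(start, fmean(vals)) for start, vals in sorted(buckets.items())]


def balance_by_bins(rows, key, n_bins=5, seed=0):
    """Downsample rows so every non-empty concentration bin has equal size.

    ASSUMPTION: equal-width bins over the observed range, downsampled to
    the smallest non-empty bin. The published procedure may differ.
    """
    values = [key(r) for r in rows]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data
    bins = [[] for _ in range(n_bins)]
    for row, v in zip(rows, values):
        idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx].append(row)
    non_empty = [b for b in bins if b]
    target = min(len(b) for b in non_empty)
    rng = random.Random(seed)
    balanced = []
    for b in non_empty:
        balanced.extend(rng.sample(b, target))  # sample without replacement
    return balanced
```

Under this assumption, the balanced dataset is smaller than the original one, which is consistent with the reduction in dataset size noted above.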

2.1. Multilinear Regression Model

Multilinear regressions between reference Aeris CH₄ concentrations and XPOD sensor signals, as well as temperature and humidity, are the simplest models to employ, as well as the easiest to interpret. Other authors have used these models during stationary deployments to accurately assess CH₄ concentrations via LCS [17,41]. We produced multilinear regressions between reference CH₄ from the Aeris Ultra monitor and the XPOD, calculated as

$$
\begin{aligned}
\hat{y}_{CH_4}\left(S_{2600}, S_{2602}, S_{2611C00}, S_{2611E00}, S_{AlphasenseVOC}, S_{MQ4}, T_{air}, RH_{air}\right) = {} & \alpha_0 + \alpha_1 S_{2600} + \alpha_2 S_{2602} + \alpha_3 S_{2611C00} + \alpha_4 S_{2611E00} \\
& + \alpha_5 S_{AlphasenseVOC} + \alpha_6 S_{MQ4} + \alpha_7 T_{air} + \alpha_8 RH_{air}
\end{aligned}
\tag{1}
$$

where $\hat{y}_{CH_4}$ is the predicted CH₄ concentration (in ppm), $S_{xx}$ represents raw sensor outputs for respective sensors, $T_{air}$ is the temperature of the air (in K), and $RH_{air}$ is the relative humidity of the air (in %).
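As an illustration of how the coefficients α₀ to α₈ in Equation (1) could be estimated, the sketch below fits an ordinary least-squares model by solving the normal equations with Gauss-Jordan elimination. It uses only the Python standard library; `fit_mlr` and `predict_mlr` are hypothetical names, and the source does not state which solver was actually used.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (A^T A) a = A^T y.

    X is a list of predictor rows (sensor signals, temperature, humidity);
    an intercept column is prepended, so the result is [a0, a1, ..., ap].
    """
    A = [[1.0] + list(row) for row in X]
    n = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(n)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot solve")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def predict_mlr(coeffs, row):
    """Evaluate the fitted linear model for one row of predictors."""
    return coeffs[0] + sum(a * s for a, s in zip(coeffs[1:], row))
```

Because several of the sensor signals are strongly correlated with one another, the normal equations can be poorly conditioned; a QR- or SVD-based least-squares routine would be the more robust choice in practice.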

2.2. Artificial Neural Networks

Recent studies of stationary low-cost sensor networks have shown that artificial neural network machine learning models are able to accurately translate sensor voltages to CH₄ concentrations [19]. ANNs are composed of single units (neurons) ordered in a connected layer, with weights and an activation function applied to each neuron. Each layer of the ANN is connected to units in the previous layer. In our ANN, information is propagated forward through the network from the inputs, through hidden layers and bias functions to the output. For this model, the hyperbolic tangent function was chosen as the activation function. Neuron and bias weights in the ANN network were assigned randomly and iteratively adjusted to minimize a predetermined cost function. The number of hidden layer neurons was manually tuned to achieve optimal fitting performance. For our ANN, we employed a Bayesian regularization training function, which applies an additional term to the cost function that penalizes the network for increased complexity in order to help prevent overfitting. This regularization algorithm has been previously shown to perform well for regression applications independent of network architecture [19,45]. Other commonly used regularization functions, including Levenberg–Marquardt regularization and gradient descent regularization, displayed poorer fits than Bayesian regularization (Table S10). More complex ANN architectures were not employed in this study, as more intricate ANN designs with additional hidden layers and neurons are likely to result in overfitting in smaller datasets [46]. A visualization of the ANN architecture is shown in Figure S6, and additional information regarding model hyperparameters and settings is included in Table S11.

2.3. Random Forest Models

Random forests are a general class of machine learning ensemble models consisting of several decision trees used to fit complicated data [47]. Random forests operate by creating an ensemble of decision trees fit on a training dataset, constructed from a random subset of predictor variables. Fitting parameters, including the number of leaves, the number of observed predictors included in the model, and sampling with replacement, were determined by minimizing the fitting error on testing data (Tables S12–S14). Accordingly, the minimum leaf size in our random forest was set to five, and all eight parameters were sampled at each node. Data were sampled with replacement, and the prediction was generated by averaging the outputs of all trees. For this analysis, bootstrap aggregation (bagging) was employed for RF models due to the low dimensionality of our dataset.
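The bagging procedure described above can be sketched in a few dozen lines. This is a simplified illustration, not the authors' implementation: every function name is hypothetical, and the tree-growing rule (exhaustive search for the split with the lowest sum of squared errors) is an assumed, common choice. It does mirror the stated settings, namely a minimum leaf size of five, all predictors considered at every node, bootstrap sampling with replacement, and averaging over trees.

```python
import random
from statistics import fmean


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, min_leaf=5):
    """Grow a regression tree; a leaf is a float, a node is a tuple."""
    best = None
    n = len(rows)
    if n >= 2 * min_leaf:
        for f in range(len(rows[0])):  # every predictor is tried at each node
            order = sorted(range(n), key=lambda i: rows[i][f])
            for k in range(min_leaf, n - min_leaf + 1):
                lo, hi = rows[order[k - 1]][f], rows[order[k]][f]
                if lo == hi:
                    continue  # cannot split between identical values
                left, right = order[:k], order[k:]
                sse = (_sse([targets[i] for i in left])
                       + _sse([targets[i] for i in right]))
                if best is None or sse < best[0]:
                    best = (sse, f, lo, left, right)
    if best is None:
        return fmean(targets)
    _, f, threshold, left, right = best
    return (
        f,
        threshold,
        build_tree([rows[i] for i in left], [targets[i] for i in left], min_leaf),
        build_tree([rows[i] for i in right], [targets[i] for i in right], min_leaf),
    )


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


def fit_bagged_forest(rows, targets, n_trees=100, min_leaf=5, seed=0):
    """Fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(
            build_tree([rows[i] for i in idx], [targets[i] for i in idx], min_leaf)
        )
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return fmean([predict_tree(tree, row) for tree in forest])
```

The exhaustive split search is quadratic in the number of samples per node, so this sketch is only suitable for small demonstrations; a production library would use a far more efficient split search.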

2.4. Model Performance Evaluation

Optimal model parameters were selected by minimizing the root mean squared deviation (RMSD) between the reference and experimental measurements [48,49]. The squared RMSD (the mean squared deviation) is the sum of three components: the squared bias (SB), the squared difference between the standard deviations (SDSD), which captures the difference in the magnitude of fluctuation, and the lack of positive correlation weighted by the standard deviations (LCS). In this section, LCS denotes this error component rather than low-cost sensors.

$$
SB = \left(\bar{y}_{CH_4} - \bar{\hat{y}}_{CH_4}\right)^2 \tag{2}
$$

$$
SDSD = \left(\sigma_{ref} - \sigma_{model}\right)^2 \tag{3}
$$

$$
LCS = 2\,\sigma_{ref}\,\sigma_{model}\,(1 - \rho) \tag{4}
$$

$$
RMSD = \sqrt{SB + SDSD + LCS} \tag{5}
$$

where $\bar{y}_{CH_4}$ is the mean of the reference data, $\bar{\hat{y}}_{CH_4}$ is the mean of the model-predicted data, $\sigma_{ref}$ is the standard deviation of the reference data, $\sigma_{model}$ is the standard deviation of the model-predicted data, and $\rho$ is the Pearson correlation coefficient between the model-predicted and reference data.
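Equations (2)–(5) can be checked numerically with a short sketch. The function name `msd_components` is hypothetical. One assumption matters for the identity to hold exactly: the standard deviations and the correlation must be computed with population (divide-by-n) moments, in which case the square root of SB + SDSD + LCS equals the directly computed RMSD.

```python
from math import sqrt
from statistics import fmean, pstdev


def msd_components(reference, predicted):
    """Split the mean squared deviation into SB, SDSD, and LCS.

    Uses population (divide-by-n) moments so that
    SB + SDSD + LCS equals mean((reference - predicted)^2).
    """
    ref_mean, mod_mean = fmean(reference), fmean(predicted)
    s_ref, s_mod = pstdev(reference), pstdev(predicted)
    cov = fmean(
        [(r - ref_mean) * (p - mod_mean) for r, p in zip(reference, predicted)]
    )
    # Pearson correlation; set to 0 when either series is constant
    rho = cov / (s_ref * s_mod) if s_ref and s_mod else 0.0
    sb = (ref_mean - mod_mean) ** 2
    sdsd = (s_ref - s_mod) ** 2
    lcs = 2.0 * s_ref * s_mod * (1.0 - rho)
    return {"SB": sb, "SDSD": sdsd, "LCS": lcs, "RMSD": sqrt(sb + sdsd + lcs)}
```

Reporting the three components separately shows whether a model's error comes mainly from an offset (SB), from damped or exaggerated variability (SDSD), or from poor tracking of the reference signal (LCS).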

3. Results and Discussion

3.1. Calibration and Model Parameters

In developing a mobile CH₄ model, we found a large variation in the efficacy of specific sensors in quantifying CH₄. Alphasense’s electrochemical VOC sensor and MOx sensors designed with CH₄ as a target gas displayed greater correlation with reference CH₄ than general VOC MOx sensors (Figure 2). Relative humidity and temperature displayed significant correlations with reference CH₄ concentrations over the duration of our study, which may be attributed to variable local meteorological conditions. All variables shown in Figure 2 were employed in calibration models during XPOD deployment to assess CH₄ concentrations.

MLR and NN models were able to quantify longer-term changes in CH₄ baseline but were unable to process rapid changes in CH₄ from fugitive O&G leaks (Figure 3). RF models were able to quantify both short-term and longer-term fluctuations in CH₄ signals (Figure 3). All three models displayed lower signal variability than the reference data at high CH₄ concentrations due to the large contribution of baseline data during deployment. We ran each model on training data 100 times to minimize stochastic variation between runs of the same model, and selected the best-performing models within each model class for further analysis. Pre-binned data displayed no variation in RMSD from run to run as model parameters were fit to the same sample dataset. Additional information regarding model parameters and statistics can be found in Tables S1–S14.

Prior to binning, model fits displayed high errors, as the RMSD for pre-binned data ($RMSD_{MLR}$ = 0.4189, $RMSD_{ANN}$ = 0.6917, and $RMSD_{RF}$ = 0.5023) was comparable to the variation between data points. The distribution of the reference methane concentrations (Figure S5) is heavily weighted toward baseline concentrations, as the majority of measurements occur at ambient conditions rather than being evenly distributed across the measured concentration spectrum. Applying data binning to reduce the inequality across the measured concentration gradient dramatically reduced fitting errors, as the calibration models were then trained on a greater relative share of higher-concentration methane data.

Following data binning, an analysis of the RMSD on testing data showed that, for NN models, a model consisting of a single hidden layer and 10 neurons minimized errors ($\mu_{RMSD}$ = 0.4669 ppm and $\sigma_{RMSD}$ = 0.0737 ppm). NN models displayed high variability as the number of neurons changed, indicating a sensitivity to tuning parameters. NN model variability may be attributed to the small sample size of the dataset and the select range of variables to alter. RMSDs for MLR models ($\mu_{RMSD}$ = 0.3652 ppm and $\sigma_{RMSD}$ = 0.0012 ppm) were lower than those of the machine learning model configurations, but MLR models displayed the lowest sensitivity to short-term variation in CH₄, making them poorly suited for mobile monitoring, where CH₄ spikes may last only several seconds. Larger NN models consisting of additional neurons better fit the training data, but they resulted in nonphysical interpretations of testing data due to overfitting, thus resulting in greater RMSDs. RF models most accurately assessed variation in short-term CH₄ spikes and captured the overall trends in baseline variation. The RMSD on testing data showed a lower error for all RF models than for any of the assessed NN models, and a similar magnitude of error to that of MLR models. RF models displayed significantly lower variability as the model inputs (tree number) were changed when compared to NNs. We attribute this diminished variability to the ensemble bagging process employed by RF models, which aggregates predictions from multiple trees to inform the final model prediction, thus reducing the weight of predictions from any single tree. Furthermore, the presence of outlier CH₄ spikes and the high correlation between many of our sensors are well suited for random forest regression [47]. The optimal RF configuration consists of a forest comprising 100 trees ($\mu_{RMSD}$ = 0.3674 ppm and $\sigma_{RMSD}$ = 0.0182 ppm). This model was analyzed in further detail in the subsequent sections.

3.2. Evaluating the Impact of Additional RF Training Data

For regression applications, machine learning model performance often varies non-linearly with the amount of training data employed [50]. While reducing the total amount of training data in machine learning models can lead to overfitting, gains from including additional training data must be balanced against the cost of data collection [51]. We investigated the error between reference and model data for different percentages of total training data (Figure 4). We reduced data in all bins by percentages varying between 5 and 95 percent and ran 100 RF models on each reduced set of training data. Each 5 percent of binned data represents approximately 4 h of sampling before any preprocessing functions are applied. Between 5 and 50 percent of the total training data, the RMSD for testing data decreases, indicating that additional data points reduce error and further improve the RF model (Figure 4). A t-test analysis of adjacent data percentages (Table S8) shows that the mean RMSDs are more likely to be statistically distinct for different amounts of data between 5 and 50 percent (Z1 in Figure 4) than between 50 and 100 percent (Z2 in Figure 4). With larger percentages of training data, additional data points no longer improve the RMSD, indicating that the precision of the XPOD sensor array may limit the predictive power of the RF model when more data points are used.

3.3. Assessing RF Model Sensor Performance

Due to the black-box nature of machine learning regression algorithms, it is often difficult to interpret which variables are contributing to model performance [52]. To qualitatively assess which variables are most relevant for RF model performance, we analyzed the distribution of predictor importance estimates for all model variables by running our chosen RF model 100 times, removing a specific sensor variable for each set of runs (Figure 5). We assessed the distribution of error values for each subset model and then subtracted the mean baseline error for the full RF model. Variables that, when removed, resulted in significant increases in model error have greater predictive importance than those that have a minimal impact. The sensor variables with the highest experimental correlations with CH₄ (Fig-2600, Fig-2611-E00, and Alphasense; Figure 2) have the greatest importance in our RF model, indicating that sensor variables that are highly correlated with methane have greater predictive power. The removal of the Fig-2600 and Fig-2611-E00 sensors increased model error by 25 and 30 percent, respectively. The MQ4 MOx sensor, which displayed a moderate correlation with the reference CH₄, had a lower predictive importance than other sensors with similar correlation coefficients (Fig-2611-E00 and Alphasense). We hypothesize that the lower predictive importance of the MQ4 and Fig-2611 may be attributed to the high correlation between the MQ4 and Fig-2611-E00 sensors (r = 0.93) and between the Fig-2611 and Fig-2611-E00 sensors (r = 0.84). Permutations to the MQ4 signal may have had a diminished influence on the error metric, as this sensor may provide redundant data to the model, with the weight of the data accordingly reduced. Excluding the Fig-2602 sensor, which displayed a negative correlation with CH₄ over the course of the deployment, marginally reduced the RMSD of the RF model, indicating that this sensor may have contributed to overfitting during model calibration.
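The remove-one-sensor comparison described in this section can be expressed as a small, model-agnostic helper. This is a sketch under assumptions: `drop_one_importance` and the `fit_and_score` callback are hypothetical names, and the callback is expected to train a model on the listed predictor indices and return its test RMSD for a given run.

```python
from statistics import fmean


def drop_one_importance(fit_and_score, n_features, n_runs=100):
    """Mean increase in error when each predictor is left out in turn.

    fit_and_score(feature_indices, run) is a user-supplied (hypothetical)
    callback that trains a model on the given predictors and returns the
    RMSD on held-out data; `run` lets it vary the random seed.
    """
    all_features = list(range(n_features))
    baseline = fmean([fit_and_score(all_features, run) for run in range(n_runs)])
    importance = {}
    for f in all_features:
        kept = [g for g in all_features if g != f]
        errors = [fit_and_score(kept, run) for run in range(n_runs)]
        # Positive values: the model gets worse without this predictor
        importance[f] = fmean(errors) - baseline
    return importance
```

A negative value for a predictor corresponds to the Fig-2602 case above, where leaving the sensor out slightly lowered the RMSD.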

3.4. Utility of RF Model for CH₄ Peakfinding

We further investigated the capability of our chosen RF model to assess short-term spikes in CH₄ concentrations. CH₄ spikes were determined via a one-dimensional peakfinding algorithm, whereby a peak was defined as a sample greater than its two nearest neighbors. We included additional prominence and magnitude constraints in the peakfinding algorithm to assess only the largest methane spikes. Reference CH₄ spikes were defined as CH₄ concentrations greater than 2.5 ppm with >0.5 ppm prominence. Throughout the monitoring campaign, there were a total of 20 peaks that met these criteria (Figure 6). The CH₄ spike criteria for RF model data, which displayed lower variability at elevated CH₄ concentrations when compared to the reference data, were adjusted accordingly. RF CH₄ spikes were defined as points where model CH₄ concentrations were greater than 2.05 ppm and model prominence was >0.1 ppm. Using the calibrated RF model, there were 27 peaks that fit these criteria (Figure 6). Of the 20 peaks defined by the reference CH₄, 16 overlapped between reference and calibrated data, 3 peaks for reference CH₄ displayed an increased model CH₄ concentration below our target threshold, and 1 peak was missed by the RF model, indicating an accuracy of 80 percent for our RF model in determining CH₄ spikes. Of the seven extra peaks predicted by the RF model, all but one occurred in regions with elevated CH₄ baselines, as the RF model may have difficulty in predicting additional elevations to CH₄ when baseline concentrations are raised. Increasingly sophisticated peakfinding algorithms, which employ local concentration and prominence thresholds rather than global values, may display better predictive value for assessing CH₄ spikes.
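The peakfinding step can be sketched as follows. This is not the authors' implementation: the function names are hypothetical, the prominence definition follows the common convention (peak height minus the higher of the two lowest points reached before a taller sample on either side), and the sample tolerance used to decide whether a reference peak and a model peak "overlap" is an assumption, since the matching rule is not specified.

```python
def prominence(signal, i):
    """Height of signal[i] above the higher of its two flanking minima.

    Each flank extends outward until a sample taller than the peak
    (or the end of the record) is reached.
    """
    height = signal[i]
    left_min = height
    j = i - 1
    while j >= 0 and signal[j] <= height:
        left_min = min(left_min, signal[j])
        j -= 1
    right_min = height
    j = i + 1
    while j < len(signal) and signal[j] <= height:
        right_min = min(right_min, signal[j])
        j += 1
    return height - max(left_min, right_min)


def find_peaks(signal, min_height, min_prominence):
    """Indices of samples above both neighbours that pass both thresholds."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] > signal[i + 1]
        if (is_local_max and signal[i] > min_height
                and prominence(signal, i) > min_prominence):
            peaks.append(i)
    return peaks


def match_peaks(ref_peaks, model_peaks, tolerance=2):
    """Count detected reference peaks and unmatched (extra) model peaks.

    ASSUMPTION: two peaks overlap when they lie within `tolerance` samples.
    """
    hits = sum(
        any(abs(r - m) <= tolerance for m in model_peaks) for r in ref_peaks
    )
    extras = sum(
        not any(abs(m - r) <= tolerance for r in ref_peaks) for m in model_peaks
    )
    return hits, extras
```

With the thresholds quoted above, the reference series would be processed with `find_peaks(reference, 2.5, 0.5)` and the model series with `find_peaks(model, 2.05, 0.1)`. The reported counts give a detection rate of 16/20 = 80 percent; the 35 percent false response rate quoted in the abstract is consistent with 7 extra peaks relative to 20 reference peaks, although that normalization is an inference rather than something stated explicitly.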

4. Summary and Conclusions

Over a month-long deployment, we employed an array of LCS mounted on a mobile monitoring platform to reconstruct short- and longer-term fluctuations in CH₄ concentrations stemming from fugitive oil and gas emissions. For mobile monitoring, specific MOx sensors targeting CH₄ were able to quantify CH₄ variations more accurately than general gas phase VOC sensors. Employing preprocessing functions to equitably sample CH₄ concentrations across the full range of concentration space drastically improved fitting performance. Testing a wide range of models to fit our deployment data, we found that an RF model outperformed both ANN and MLR. RFs were able to capture longer-term variation in CH₄ concentrations, as well as short-term spikes caused by fugitive emissions. Additionally, as the percentage of colocation data was reduced, the RF model performance did not significantly suffer until approximately half of the data were removed, indicating that, even in data-scarce environments, RF models may achieve high performance given a small parameter space from which to sample. Accordingly, even short-term field campaigns with LCS networks may be sufficient to achieve relatively high fidelity for CH₄ measurements, assuming that measurements within the concentration space are well distributed.

Using our model, we were able to achieve similar error metrics when compared to other stationary LCS CH₄ quantification studies [17]. Given the transient sampling environment in which our study occurred, our model may be less likely than stationary studies to be fit to specific local behavior and may be more generalizable, thus minimizing the risk of overfitting. However, in our study, we found the distribution of CH₄ measurements to be heavily skewed toward baseline values, which may have caused our model to underpredict concentrations of CH₄ spikes. Finally, the cost associated with generating data for our study was much greater than stationary monitoring, as the XPOD required active transportation to different field sites. In the future, extended field campaigns will need to be conducted to better understand and model longer-term seasonal fluctuations in CH₄.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s24020519/s1, Figure S1: XPOD low-cost methane sensor configuration; Figure S2: Inlet setup atop monitoring vehicle; Figure S3: Oil and gas flaring facility on the Eastern plains of Colorado; Figure S4: Oil and gas facility on the Eastern plains of Colorado; Figure S5: Pre- and post-binning reference methane distribution; Figure S6: Sample ANN architecture employed in model fitting; Figure S7: Time series plot comparison during data collection on 30 August 2023; Table S1: Aeris raw data; Table S2: XPod raw data; Table S3: No binning fits; Table S4: MLR fits; Table S5: ANN fits; Table S6: RF fits; Table S7: Validation data post-binning; Table S8: RF percentages and Ttests; Table S9: MSD without parameters; Table S10: ANN regularizations; Table S11: ANN hyperparameters; Table S12: RF leaf size parameter fits; Table S13: RF num predictor fits; Table S14: RF sample WO replacement.

Author Contributions

Conceptualization, J.S. and M.W.; methodology, J.S. and M.W.; software, J.S.; validation, J.S.; formal analysis, J.S.; investigation, M.H.; resources, M.H.; data curation, J.S.; writing—original draft preparation, J.S.; writing—review and editing, J.S., M.H. and M.W.; visualization, J.S.; supervision, M.H.; project administration, M.H.; funding acquisition, M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated and analyzed in this study are provided in the Supplementary Materials. Datasets include sensor values, reference Aeris values, and model fitting parameters.

Acknowledgments

Thank you to Emma Burke for her information regarding Aeris, Covert and the Urban Labs at the University of Chicago for providing the opportunity to conduct this project, Caroline Frischmon for her information regarding machine learning models, and the Colorado Department of Public Health and Environment for aiding in conceptualization and collaboration.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Balcombe, P.; Speirs, J.F.; Brandon, N.P.; Hawkes, A.D. Methane emissions: Choosing the right climate metric and time horizon. Environ. Sci. Process. Impacts 2018, 20, 1323–1339.
2. Saunois, M.; Jackson, R.; Bousquet, P.; Poulter, B.; Canadell, J. The growing role of methane in anthropogenic climate change. Environ. Res. Lett. 2016, 11, 120207.
3. Etminan, M.; Myhre, G.; Highwood, E.J.; Shine, K.P. Radiative forcing of carbon dioxide, methane, and nitrous oxide: A significant revision of the methane radiative forcing. Geophys. Res. Lett. 2016, 43, 12614–12623.
4. Mitchell, L.E.; Brook, E.J.; Sowers, T.; McConnell, J.; Taylor, K. Multidecadal variability of atmospheric methane, 1000–1800 CE. J. Geophys. Res. Biogeosci. 2011, 116.
5. Miller, S.M.; Wofsy, S.C.; Michalak, A.M.; Kort, E.A.; Andrews, A.E.; Biraud, S.C.; Dlugokencky, E.J.; Eluszkiewicz, J.; Fischer, M.L.; Janssens-Maenhout, G.; et al. Anthropogenic emissions of methane in the United States. Proc. Natl. Acad. Sci. USA 2013, 110, 20018–20022.
6. Francoeur, C.B.; McDonald, B.C.; Gilman, J.B.; Zarzana, K.J.; Dix, B.; Brown, S.S.; de Gouw, J.A.; Frost, G.J.; Li, M.; McKeen, S.A.; et al. Quantifying methane and ozone precursor emissions from oil and gas production regions across the contiguous US. Environ. Sci. Technol. 2021, 55, 9129–9139.
7. Lieberman-Cribbin, W.; Fang, X.; Morello-Frosch, R.; Gonzalez, D.J.; Hill, E.; Deziel, N.C.; Buonocore, J.J.; Casey, J.A. Multiple dimensions of environmental justice and oil and gas development in Pennsylvania. In Environmental Justice; Mary Ann Liebert, Inc.: Larchmont, NY, USA, 2022.
8. Johnston, J.E.; Chau, K.; Franklin, M.; Cushing, L. Environmental justice dimensions of oil and gas flaring in South Texas: Disproportionate exposure among Hispanic communities. Environ. Sci. Technol. 2020, 54, 6289–6298.
9. Bolden, A.L.; Kwiatkowski, C.F.; Colborn, T. New look at BTEX: Are ambient levels a problem? Environ. Sci. Technol. 2015, 49, 5261–5276.
10. Davidson, C.J.; Hannigan, J.H.; Bowen, S.E. Effects of inhaled combined Benzene, Toluene, Ethylbenzene, and Xylenes (BTEX): Toward an environmental exposure model. Environ. Toxicol. Pharmacol. 2021, 81, 103518.
11. Williams, J.P.; Regehr, A.; Kang, M. Methane emissions from abandoned oil and gas wells in Canada and the United States. Environ. Sci. Technol. 2020, 55, 563–570.
12. Stern, D.I.; Kaufmann, R.K. Estimates of global anthropogenic methane emissions 1860–1993. Chemosphere 1996, 33, 159–176.
13. National Academies of Sciences, Engineering, and Medicine. Improving Characterization of Anthropogenic Methane Emissions in the United States; National Academies Press: Washington, DC, USA, 2018.
14. Furst, L.; Feliciano, M.; Frare, L.; Igrejas, G. A portable device for methane measurement using a low-cost semiconductor sensor: Development, calibration and environmental applications. Sensors 2021, 21, 7456.
15. Aldhafeeri, T.; Tran, M.K.; Vrolyk, R.; Pope, M.; Fowler, M. A review of methane gas detection sensors: Recent developments and future perspectives. Inventions 2020, 5, 28.
16. Nagahage, I.S.P.; Nagahage, E.A.A.D.; Fujino, T. Assessment of the applicability of a low-cost sensor–based methane monitoring system for continuous multi-channel sampling. Environ. Monit. Assess. 2021, 193, 509.
17. Collier-Oxandale, A.; Casey, J.G.; Piedrahita, R.; Ortega, J.; Halliday, H.; Johnston, J.; Hannigan, M.P. Assessing a low-cost methane sensor quantification system for use in complex rural and urban environments. Atmos. Meas. Tech. 2018, 11, 3569–3594.
18. Lin, J.J.; Buehler, C.; Datta, A.; Gentner, D.R.; Koehler, K.; Zamora, M.L. Laboratory and field evaluation of a low-cost methane sensor and key environmental factors for sensor calibration. Environ. Sci. Atmos. 2023, 3, 683–694.
19. Casey, J.G.; Collier-Oxandale, A.; Hannigan, M. Performance of artificial neural networks and linear models to quantify 4 trace gas species in an oil and gas production region with low-cost sensors. Sens. Actuators B Chem. 2019, 283, 504–514.
20. Riddick, S.N.; Riley, A.; Fancy, C.; Bell, C.S.; Duggan, A.; Bennett, K.E.; Zimmerle, D.J. A cautionary report of calculating methane emissions using low-cost fence-line sensors. Elementa 2022, 10, 00021.
21. Okorn, K.; Hannigan, M. Applications and Limitations of Quantifying Speciated and Source-Apportioned VOCs with Metal Oxide Sensors. Atmosphere 2021, 12, 1383.
22. Ma, D.; Gao, J.; Zhang, Z.; Zhao, H. Gas recognition method based on the deep learning model of sensor array response map. Sens. Actuators B Chem. 2021, 330, 129349.
23. Tang, S.; Chen, W.; Jin, L.; Zhang, H.; Li, Y.; Zhou, Q.; Zen, W. SWCNTs-based MEMS gas sensor array and its pattern recognition based on deep belief networks of gases detection in oil-immersed transformers. Sens. Actuators B Chem. 2020, 312, 127998.
24. Wu, X.; Zhao, Z.; Wang, L. Deep belief network based coal mine methane sensor data classification. J. Phys. Conf. Ser. 2019, 1302, 032013.
25. Masson, N.; Piedrahita, R.; Hannigan, M. Approach for quantification of metal oxide type semiconductor gas sensors used for ambient air quality monitoring. Sens. Actuators B Chem. 2015, 208, 339–345.
26. Abdullah, A.N.; Kamarudin, K.; Mamduh, S.M.; Adom, A.H.; Juffry, Z.H.M. Effect of environmental temperature and humidity on different metal oxide gas sensors at various gas concentration levels. IOP Conf. Ser. Mater. Sci. Eng. 2020, 864, 012152.
27. Hong, T.; Culp, J.T.; Kim, K.J.; Devkota, J.; Sun, C.; Ohodnicki, P.R. State-of-the-art of methane sensing materials: A review and perspectives. TrAC Trends Anal. Chem. 2020, 125, 115820.
28. Hagan, D.H.; Isaacman-VanWertz, G.; Franklin, J.P.; Wallace, L.M.; Kocar, B.D.; Heald, C.L.; Kroll, J.H. Calibration and assessment of electrochemical air quality sensors by co-location with regulatory-grade instruments. Atmos. Meas. Tech. 2018, 11, 315–328.
29. Laref, R.; Losson, E.; Sava, A.; Siadat, M. Empiric unsupervised drifts correction method of electrochemical sensors for in field nitrogen dioxide monitoring. Sensors 2021, 21, 3581.
30. Wei, P.; Ning, Z.; Ye, S.; Sun, L.; Yang, F.; Wong, K.C.; Westerdahl, D.; Louie, P.K. Impact analysis of temperature and humidity conditions on electrochemical sensor response in ambient air quality monitoring. Sensors 2018, 18, 59.
31. Nyquist, J.E.; Wilson, D.L.; Norman, L.A.; Gammage, R.B. Decreased sensitivity of photoionization detector total organic vapor detectors in the presence of methane. Am. Ind. Hyg. Assoc. J. 1990, 51, 326–330.
32. Rutolo, M.F.; Clarkson, J.P.; Harper, G.; Covington, J.A. The use of gas phase detection and monitoring of potato soft rot infection in store. Postharvest Biol. Technol. 2018, 145, 15–19.
33. Okorn, K.; Jimenez, A.; Collier-Oxandale, A.; Johnston, J.; Hannigan, M. Characterizing methane and total non-methane hydrocarbon levels in Los Angeles communities with oil and gas facilities using air quality monitors. Sci. Total Environ. 2021, 777, 146194.
34. Okorn, K.; Hannigan, M. Improving Air Pollutant Metal Oxide Sensor Quantification Practices through: An Exploration of Sensor Signal Normalization, Multi-Sensor and Universal Calibration Model Generation, and Physical Factors Such as Co-Location Duration and Sensor Age. Atmosphere 2021, 12, 645.
35. Cho, Y.; Smits, K.M.; Riddick, S.N.; Zimmerle, D.J. Calibration and field deployment of low-cost sensor network to monitor underground pipeline leakage. Sens. Actuators B Chem. 2022, 355, 131276.
36. Commane, R.; Hallward-Driemeier, A.; Murray, L.T. Intercomparison of commercial analyzers for atmospheric ethane and methane observations. Atmos. Meas. Tech. 2023, 16, 1431–1441.
37. Furuta, D.; Sayahi, T.; Li, J.; Wilson, B.; Presto, A.A.; Li, J. Characterization of inexpensive metal oxide sensor performance for trace methane detection. Atmos. Meas. Tech. 2022, 15, 5117–5128.
38. Khreis, H.; Johnson, J.; Jack, K.; Dadashova, B.; Park, E.S. Evaluating the performance of low-cost air quality monitors in Dallas, Texas. Int. J. Environ. Res. Public Health 2022, 19, 1647.
39. Arps, J.J.; Roberts, T.G. Economics of drilling for Cretaceous oil on east flank of Denver-Julesburg basin. AAPG Bull. 1958, 42, 2549–2566.
40. Townsend-Small, A.; Ferrara, T.W.; Lyon, D.R.; Fries, A.E.; Lamb, B.K. Emissions of coalbed and natural gas methane from abandoned oil and gas wells in the United States. Geophys. Res. Lett. 2016, 43, 2283–2290.
41. Casey, J.G.; Hannigan, M.P. Testing the performance of field calibration techniques for low-cost gas sensors in new deployment locations: Across a county line and across Colorado. Atmos. Meas. Tech. 2018, 11, 6351–6378.
42. Riddick, S.N.; Cheptonui, F.; Yuan, K.; Mbua, M.; Day, R.; Vaughn, T.L.; Duggan, A.; Bennett, K.E.; Zimmerle, D.J. Estimating regional methane emission factors from energy and agricultural sector sources using a portable measurement system: Case study of the Denver–Julesburg Basin. Sensors 2022, 22, 7410.
43. Liang, Y.C.; Maimury, Y.; Chen, A.H.L.; Juarez, J.R.C. Machine learning-based prediction of air quality. Appl. Sci. 2020, 10, 9151.
44. Mahajan, S.; Kumar, P. Evaluation of low-cost sensors for quantitative personal exposure monitoring. Sustain. Cities Soc. 2020, 57, 102076.
45. Burden, F.; Winkler, D. Bayesian regularization of neural networks. In Artificial Neural Networks: Methods and Applications; Springer: Berlin/Heidelberg, Germany, 2009; pp. 23–42.
46. Sheela, K.G.; Deepa, S.N. Review on methods to fix number of hidden neurons in neural networks. Math. Probl. Eng. 2013, 2013, 425740.
47. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
48. Rivera Martinez, R.A.; Santaren, D.; Laurent, O.; Broquet, G.; Cropley, F.; Mallet, C.; Ramonet, M.; Shah, A.; Rivier, L.; Bouchet, C.; et al. Reconstruction of high-frequency methane atmospheric concentration peaks from measurements using metal oxide low-cost sensors. Atmos. Meas. Tech. 2023, 16, 2209–2235.
49. Kobayashi, K.; Salam, M.U. Comparing simulated and measured values using mean squared deviation and its components. Agron. J. 2000, 92, 345–352.
50. Rodríguez-Pérez, R.; Bajorath, J. Prediction of compound profiling matrices, part II: Relative performance of multitask deep learning and random forest classification on the basis of varying amounts of training data. ACS Omega 2018, 3, 12033–12040.
51. Ying, X. An overview of overfitting and its solutions. J. Phys. Conf. Ser. 2019, 1168, 022022.
52. Zhao, Q.; Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. 2021, 39, 272–281.

Figure 1. Map of study area. The pink-shaded area represents the Denver-Julesburg Basin, and the blue-shaded region represents the extent of our study.

Figure 2. Scatter plot matrix of the raw sensor signal. The bottom triangle displays pair plots between individual variables, blue plots along the diagonal display distributions of each variable, and the upper triangle displays Pearson correlation coefficients for each variable pair.

Figure 3. Model comparison over a single day of data acquisition. Individual models are displayed on the left-side plots, and the full Aeris reference dataset is displayed on the right-side time series. Additional model comparisons on August 30th are displayed in Figure S7.

Figure 4. Error comparison for various percentages of training data run on 100 RF models. Error bars represent 1σ of RMSD.

Figure 5. RF model ΔRMSD (ppm) excluding specific sensor signal over 100 runs. ΔRMSD (ppm) is calculated as the difference in RMSD between the sensor-excluded model and the base RF model. Error bars represent 1σ of ΔRMSD.
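The ΔRMSD quantity in the Figure 5 caption can be sketched as a short calculation. This is an illustrative example under stated assumptions, not the authors' code: the function name is hypothetical, the runs are assumed to be paired one-to-one between the base and sensor-excluded models, the sample standard deviation is assumed for the 1σ error bar, and the RMSD values are made up.

```python
import statistics


def delta_rmsd_summary(base_rmsd_runs, excluded_rmsd_runs):
    """Return (mean, sample standard deviation) of per-run delta RMSD in ppm.

    Delta RMSD for each run is the sensor-excluded RMSD minus the base RMSD,
    so a positive value means the fit worsened when the sensor was removed.
    """
    if len(base_rmsd_runs) != len(excluded_rmsd_runs):
        raise ValueError("runs must be paired one-to-one")
    deltas = [e - b for b, e in zip(base_rmsd_runs, excluded_rmsd_runs)]
    return statistics.mean(deltas), statistics.stdev(deltas)


# Made-up RMSD values (ppm) for three paired runs; the caption refers to 100 runs.
base_runs = [0.36, 0.37, 0.38]
without_sensor_runs = [0.40, 0.43, 0.40]
mean_delta, sigma_delta = delta_rmsd_summary(base_runs, without_sensor_runs)
```

If the runs are not paired, the difference of the two mean RMSD values gives the same central estimate, but the spread would need to be combined from the two sets of runs rather than taken from per-run differences.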

Figure 6. CH₄ spike prediction using reference (orange) and calibrated model (blue) data.

Table 1. Sensors employed in mobile CH₄ monitoring platform.

Sensor	Manufacturer	Target Gas	Sensing Range	Approx. Cost (USD 2023)	Technology
Figaro 2600	Figaro Engineering (Osaka, Japan)	General VOC/air pollutants	Hydrogen 1–30 ppm	10	MOx
Figaro 2602	Figaro Engineering	General VOC/air pollutants	Ethanol 1–30 ppm	10	MOx
Figaro 2611-C00	Figaro Engineering	Methane	10,000–250,000 ppm	10	MOx
Figaro 2611-E00	Figaro Engineering	Methane	10,000–250,000 ppm	15	MOx
Alphasense VOC	Alphasense (Great Notley, Braintree, Essex, United Kingdom)	General VOCs	Gas dependent	150	EC
MQ4	Henan Hanwei Electronics (Zhengzhou, China)	Methane	200–10,000 ppm	5	MOx

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
