Active Knowledge Extraction from Cyclic Voltammetry

Vaddi, Kiran; Wodo, Olga

doi:10.3390/en15134575

Open Access Article

Kiran Vaddi and Olga Wodo *

Materials Design and Innovation Department, University at Buffalo, Buffalo, NY 14260, USA

* Author to whom correspondence should be addressed.

Energies 2022, 15(13), 4575; https://doi.org/10.3390/en15134575

Submission received: 6 May 2022 / Revised: 13 June 2022 / Accepted: 16 June 2022 / Published: 23 June 2022

(This article belongs to the Special Issue Advanced Materials and Devices for Energy Application)


Abstract

Cyclic Voltammetry (CV) is an electrochemical characterization technique used in initial material screening for desired properties and to extract information about electrochemical reactions. In some applications, extracting kinetic information about the associated reactions (e.g., rate constants and turnover frequencies) requires the CV curve to have a specific shape (for example, an S-shape). However, the characterization settings that yield such a curve are often not known a priori. In this paper, an active search framework is defined to accelerate the identification of characterization settings that enable knowledge extraction from CV experiments. Towards this goal, a representation of CV responses is used in combination with the Bayesian Model Selection (BMS) method to efficiently label each response as either S-shaped or not S-shaped. Using active search with a BMS oracle, we report a linear rate of target identification in a six-dimensional search space (comprising thermodynamic, mass transfer, and solution variables as dimensions). Our framework has the potential to be a powerful virtual screening technique for molecular catalysts, bi-functional fuel cell catalysts, and other energy conversion and storage systems.

Keywords:

accelerated catalyst discovery; Gaussian processes; Bayesian model selection; active learning; cyclic voltammetry

1. Introduction

Cyclic Voltammetry (CV) is an electrochemical characterization technique that measures the current generated under a cyclic voltage load swept between an initial and a final voltage at a given rate [1]. The measured current is a highly nonlinear response arising from various physical phenomena such as mass transport, kinetics, and adsorption. In principle, it is possible to determine the properties associated with the underlying physical phenomena; in practice, however, property extraction is a non-trivial task. In a CV experiment, a steady-state current is obtained when all reactions in the mechanism have the same apparent rate constant [2]. This is because the facile reactions in the sequence are held back from their maximum rates by the sluggish reaction(s), called the rate-determining step, which also determines the magnitude of the steady-state current. Extracting the rate constant of the rate-determining step thus requires the CV curve to have an S-shape [3,4], with a clear steady-state current region resolved during the measurement. Obtaining an S-shaped CV curve, in turn, requires the experiment to be run under a set of conditions (e.g., temperature, substrate concentration, scan rate) that are amenable to S-shaped CV curves and are unknown a priori. Moreover, the conditions under which a given electrochemical system exhibits an S-shaped CV curve depend on the underlying electrochemical reaction(s), which are also unknown for novel materials. In the absence of a known mechanism, an exhaustive search over all tunable parameters is performed [5] to narrow down the region of interest. Such an exhaustive strategy comes at the price of very high computational cost, especially in the high-dimensional search spaces of multiple complex reaction mechanisms [6].

As an alternative approach, experts define a figure of merit (FOM) [4] (a performance measure) as a proxy signature of a physical phenomenon of interest. FOMs extracted from CV can also be used for materials discovery with data-driven methods; for example, different types of FOM have been used for catalyst discovery in [7,8,9,10]. Given an FOM, the goal is then to find a material that produces a response with a better FOM than those of known materials. For instance, in a high-throughput exploration for a new catalyst, the overpotential is a common FOM [11] used in combinatorial searches [10]. The overpotential can be thought of as the voltage (beyond the thermodynamic requirement) required to produce a (pre-defined) target current. This FOM has clear utility for screening well-performing materials, but it misses the main advantage of CV: the capability to extract kinetic information (such as rate constants [12] and turnover frequencies [13,14]).

Given time and financial constraints, we propose to accelerate the extraction of kinetic information from CV curves using active learning [15,16,17]. Rather than relying on the selection of a figure of merit, we build function-space representations of our target (S-shaped) and non-target (everything else) CV responses and use Bayesian Model Selection (BMS) for automatic classification. We encode prior knowledge of target and non-target CV responses using the basis functions of the function space of a given Gaussian process (GP). GPs have previously been used to infer the kinetic parameters [18,19] of a CV response using maximum likelihood estimation and GP regression. In other work [8,20], a Bayesian approach is used to search for an approximate rate constant when the reaction mechanism is known. In this work, however, we use a GP as a data representation model to distinguish S-shaped CV curves from other types of continuous CV curves. Once an S-shaped CV curve is collected, the foot-of-the-wave analysis (FOWA) [12] can be used to extract the rate constant of the rate-determining step. Combined with FOWA, the proposed approach can be a robust technique that does not require any knowledge of the actual reaction mechanism.

In this work, we focus on S-shaped CV responses due to their utility for: (a) extracting kinetic information [3,4] using the foot-of-the-wave analysis [12], which can only be applied to an S-shaped CV curve; and (b) screening for bi-functional catalysts, i.e., materials that produce CV curves similar to an S-shape in two different voltage sweep ranges [21,22]. Although the two applications differ, both can be approached within a common framework of active search in a combinatorial space, where the aim is to find S-shaped CV curves.

The rest of the paper is organized as follows. (i) First, we introduce the Bayesian active learning framework with a general probabilistic model, and establish the connection between the data collected at observed locations, the oracle used to label them, and the decision model updated during active learning. (ii) We introduce the Bayesian Model Selection (BMS) procedure, which computes a classification preference for targets and non-targets based on collected data and a set of parametric models. (iii) We then motivate a GP model as a function-space representation of the collected data and use it as a parametric model in BMS. (iv) We apply our methodology to the search space of a simple EC mechanism and demonstrate how the BMS oracle ranks CV responses by the degree of their S-shape. (v) Finally, we use the BMS oracle in active search to address the challenges of knowledge extraction and virtual screening of materials for electrochemical applications using cyclic voltammetry.

2. Methods

Our goal is to identify the measurement conditions from which one can extract the kinetic information captured in a CV response. Towards this goal, we seek measurement conditions for which an S-shaped CV curve is collected and registered as such by our oracle. We use the active learning technique summarized in Figure 1 to accelerate the search for measurement settings within a fixed computational budget. Our active learning approach iteratively collects data points from a search space $\mathcal{S}$. The process starts with a small set of observed data $D = (S, Y)$, where $S \subset \mathcal{S}$ are the observed locations and $Y$ are the corresponding labels, $y_i \in \{-1, +1\}$. In each iteration, the algorithm collects data and incrementally updates the decision model $p(y = +1 \mid D)$ that it aims to learn, where $y$ denotes a label. A user-defined selector (or policy) identifies a candidate location (or a batch of candidate locations) in the search space at which to observe responses. The policy typically maximizes a utility function given the decision model. For example, given $D$, we can define a utility function that simply counts the number of targets in the dataset, $u(S) = \sum_{s_i \in S,\, y_i \in Y} [y_i = +1]$, where $[\cdot]$ equals 1 if its argument is true and 0 otherwise. A policy that aims to add more targets to the data pool $D$ then selects the next location using:

s^{*} = \underset{s \in \mathcal{S} \setminus S}{\arg\max}\; \mathbb{E}\left[u(S \cup \{s\}) \mid D\right]

(1)
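For the counting utility $u$, the expectation in Equation (1) simplifies: adding a candidate $s$ raises the expected count by exactly $p(y = +1 \mid s, D)$, so the one-step policy reduces to picking the unlabeled location with the highest predicted target probability. The following minimal Python sketch illustrates this, assuming a hypothetical `prob_target` callable that stands in for the decision model; it is not the implementation used in this work.

```python
from typing import Callable, Hashable, Iterable, Sequence, Set


def utility(labels: Sequence[int]) -> int:
    """Counting utility u(S): number of observed labels equal to +1."""
    return sum(1 for y in labels if y == +1)


def expected_utility(labels: Sequence[int], p_target: float) -> float:
    """E[u(S ∪ {s}) | D] = u(S) + p(y = +1 | s, D) for the counting utility."""
    return utility(labels) + p_target


def select_next(
    candidates: Iterable[Hashable],
    observed: Set[Hashable],
    labels: Sequence[int],
    prob_target: Callable[[Hashable], float],
) -> Hashable:
    """One-step greedy policy of Equation (1) over the not-yet-labeled candidates."""
    pool = [s for s in candidates if s not in observed]
    if not pool:
        raise ValueError("no unlabeled candidates left")
    # prob_target is a hypothetical decision model p(y = +1 | s, D)
    return max(pool, key=lambda s: expected_utility(labels, prob_target(s)))


# Usage: three candidate locations, "b" already labeled as a target.
p = {"a": 0.2, "b": 0.9, "c": 0.5}
print(select_next(["a", "b", "c"], {"b"}, [+1], p.get))  # prints: c
```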

Given a location $s^{*} \in \mathcal{S}$, the corresponding experiment is performed and a response is collected. In this work, we collect a CV response curve from a simulator and pass it to an oracle. The oracle labels the response as either a target or a non-target (for example, Figure 1 shows a non-target-like CV shape, which is assigned the label $y^{*} = -1$). The next step is to augment $D$ with the data collected in the current iteration, i.e., $D^{*} = D \cup \{(s^{*}, y^{*})\}$. The decision model is then updated with $D^{*}$. This process is repeated until a computational budget, defined as the total number of label queries or, equivalently, the number of simulations, is exhausted. As the oracle, we use Bayesian Model Selection (BMS), which operates on two models, $M_1$ and $M_2$, referred to as the null model (representing a typical CV curve) and the target model (representing an S-shaped CV curve), respectively. Moreover, we use a variant of active learning called active search [23], which maximizes the number of targets found, in contrast to traditional active learning, where the selector aims to closely approximate $p(y = +1 \mid D)$.

3. Bayesian Model Selection

The key component of our active learning framework is the oracle. We use BMS as a tool to identify a preferred model from a family of parametric probability distributions, each of which can explain the observed data with a differing degree of fidelity. In a supervised learning setting relating an input $X$ to an output $y$, we compute the model posterior using Bayes' rule and select the model that best explains the observed data $D = (X, y)$.

Given the observed data $D = (X, y)$, we compute, for each model encoding our prior information, the posterior probability that the data were generated by that model. These posterior probabilities serve as a score to decide whether the collected response (for example, a CV response) is a target (higher probability under the target model) or not. In this work, BMS and active learning are used in related but different contexts. BMS takes as observed data a single CV curve, whereas active learning operates on the search space, with locations and their binary labels (i.e., target or not) as observed data. BMS serves as the oracle for the active learning task, with the models $M_j$ defined using GPs.

For each model $M$, parameterized by $\theta$ (a concatenated vector of hyper-parameters), we first compute the model evidence $p(y \mid X, M)$ on the observed data $D = (X, y)$:

p(y \mid X, M) = \int p(y \mid X, \theta, M)\, p(\theta \mid M)\, d\theta

(2)

where $p(y \mid X, \theta, M)$ is the probability of obtaining outputs $y$ given input data $X$, parameters $\theta$, and model $M$, and $p(\theta \mid M)$ is the prior distribution of the parameters $\theta$ for model $M$.
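Equation (2) integrates over $\theta$; a common approximation, consistent with the MAP estimation described in Section 3.1, evaluates the likelihood $p(y \mid X, \theta, M)$ at a single point estimate of $\theta$. For a zero-mean GP with Gaussian observation noise of variance $\sigma_n^2$ (an assumption here; the noise parameters are not specified in this section), the likelihood has the closed form $\log p(y \mid X, \theta, M) = -\tfrac{1}{2} y^{\top} K^{-1} y - \tfrac{1}{2}\log|K| - \tfrac{n}{2}\log 2\pi$ with $K = k(X, X) + \sigma_n^2 I$. A dependency-free Python sketch via a Cholesky factorization, with the kernel (and hence $\theta$) passed in as a plain function:

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with a = L L^T; a must be symmetric positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def gp_log_likelihood(
    X: Sequence[Point], y: Sequence[float], kernel: Kernel, noise_var: float
) -> float:
    """log p(y | X, theta, M) for a zero-mean GP; theta lives inside `kernel`."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    # Forward substitution solves L z = y, so that y^T K^{-1} y = z^T z.
    z: List[float] = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|K|
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


def toy_kernel(a: Point, b: Point) -> float:
    """Hypothetical unit-scale squared exponential kernel, for illustration only."""
    return math.exp(-0.5 * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


print(gp_log_likelihood([(0.0,), (0.5,), (1.0,)], [0.1, 0.4, 0.9], toy_kernel, 1e-4))
```

Evaluating this function for $M_1$ and $M_2$, each at its own fitted hyper-parameters, gives approximate log evidences that can be compared through Equations (3) and (4).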

To decide which model to prefer from a finite set of models $\{M_i\}_{i=1}^{n}$, we apply Bayes' rule to compute the posterior probability of each model $M_j$, $j \in \{1, 2, \dots, n\}$, given data $D$, using the evidence of Equation (2):

p(M_j \mid D) = \frac{p(y \mid X, M_j)\, p(M_j)}{p(y \mid X)}, \qquad p(y \mid X) = \sum_{i=1}^{n} p(y \mid X, M_i)\, p(M_i)

(3)

where $p(M_j)$ is a prior over the finite set of models, typically taken to be uniform, i.e., no prior preference for any single model. It is common to work with the logarithm of a probability, whose negative can be interpreted as the information content of a probability model given data. Taking the logarithm of Equation (3) with a uniform prior, we obtain (see Supplementary Information):

\log p(M_j \mid D) = -\log\left[1 + \sum_{i \neq j}^{n} \frac{p(y \mid X, M_i)}{p(y \mid X, M_j)}\right]

(4)
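In practice the evidences in Equations (3) and (4) are tiny numbers, so they are best handled as log evidences. Because the $i = j$ term contributes the 1 inside the bracket of Equation (4), the bracket equals $\sum_i \exp(\log p_i - \log p_j)$, which a log-sum-exp evaluates without overflow. A minimal Python sketch, assuming the log evidences have already been computed (for example, from a GP likelihood at fitted hyper-parameters):

```python
import math
from typing import List, Optional, Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Stable log(sum(exp(v))) by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_model_posterior(log_evidences: Sequence[float], j: int) -> float:
    """Equation (4) with a uniform prior: log p(M_j | D) = log p_j - log sum_i p_i."""
    return log_evidences[j] - log_sum_exp(log_evidences)


def model_posteriors(
    log_evidences: Sequence[float], log_priors: Optional[Sequence[float]] = None
) -> List[float]:
    """Equation (3) for a general prior p(M_j); uniform if no prior is given."""
    n = len(log_evidences)
    if log_priors is None:
        log_priors = [-math.log(n)] * n
    joint = [le + lp for le, lp in zip(log_evidences, log_priors)]
    norm = log_sum_exp(joint)  # log p(y | X)
    return [math.exp(a - norm) for a in joint]


# Usage: null model M1 vs target model M2 with a 3:1 evidence ratio.
print(model_posteriors([0.0, math.log(3.0)]))  # approximately [0.25, 0.75]
```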

3.1. $GP$ Models for Catalytic Responses

In this paper, we consider a GP as a distribution over smooth real-valued functions $g: \mathcal{X} \to \mathbb{R}$. Assuming the observation model $p(y \in \mathbb{R} \mid g)$ is known, the standard non-parametric Bayesian approach places a GP distribution over $g$, i.e., $p(g) = \mathcal{GP}(\mu(x), k(x, x'))$. Here, $\mu(x): \mathcal{X} \to \mathbb{R}$ is a mean function and $k(x, x'): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a covariance function. A function-space viewpoint offers an intuitive explanation of a GP as a vector space of functions in a chosen (potentially nonlinear) feature space with $\phi(x)$ as a basis. In this representation, the function $g$ is a weighted combination of the basis functions, $g(x) = \phi(x)^{\top} W$, with random weights $W$. It can be shown (see Supplementary Information) that $\phi(x)$ is implicitly defined by the covariance function $k(x, x')$ between pairs of inputs $x, x' \in \mathcal{X}$, together with the mean function $\mu(x)$. For this reason, we use the terms covariance function $k(x, x')$ and basis functions of a GP interchangeably in this paper. The mean function $\mu$ encodes the average behavior of $g$ over $\mathcal{X}$. The covariance function $k(x, x')$ encodes the correlation between the outputs $g(x)$ and $g(x')$ for any given pair of input points $(x, x')$. We denote the concatenated vector of the parameters in $\mu(x)$ and $k(x, x')$ as $\theta$. Once we select a GP encoding our prior beliefs, we use Bayes' rule to obtain the posterior $p(g \mid D)$ conditioned on observed data $D = (X, y)$, where $y$ are discrete evaluations of $g$ at inputs $X$. For more information on GPs, readers are referred to [24].

We represent a typical response from a cyclic voltammetry experiment as a function $I(t, v)$, where $I$ is the current collected at time $t$ for a time-dependent applied voltage $v = V(t)$. The voltage load $V(t)$ is typically chosen to be linear (piecewise linear for a cyclic sweep), and the technique is then often referred to as direct current voltammetry [8]. Classifying a CV curve as S-shaped (or not) can be viewed as determining the model evidence of the function $(v, t) \mapsto I$ defined by the CV curve under a GP function space, with the observed data $D$ given by the discrete CV curve, $X = (v, t)$ and $y = I$. The covariance of a CV curve gives rise to the basis functions in the GP space, and the time-voltage grid becomes the input space where the function is evaluated. For any given CV curve, its representation in the GP function space is obtained by finding the $\theta$ that maximizes the posterior $p(\theta \mid D) \propto p(y \mid X, \theta, M)\, p(\theta \mid M)$, i.e., the maximum a posteriori (MAP) estimate. We choose the GP model with a non-stationary covariance as the target model $M_2$. It follows from reproducing kernel Hilbert space (RKHS) theory (Ch. 12.4 in [25]) that a smooth function can be represented using a suitable kernel or covariance function. Thus, for the null model $M_1$, it is sufficient to use a GP with a covariance function whose smoothness is controllable. The covariance functions selected as basis functions are briefly described below. For both models $M_1$ and $M_2$, the mean function is chosen to be $\mu(x) = 0$, as we normalize the response curves $I(v, t)$ to lie within $(0, 1)$ and expect the covariance function to determine the shape of the CV curve.
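The following Python sketch shows how a simulated CV trace could be arranged as GP training data, with the voltage-time pairs as inputs and the normalized current as outputs. The min-max scaling is an assumption about how the normalization is done (it maps onto the closed interval [0, 1]), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def normalize_current(current: Sequence[float]) -> List[float]:
    """Min-max scale a current trace to [0, 1]."""
    lo, hi = min(current), max(current)
    if hi == lo:
        raise ValueError("a constant current trace cannot be normalized")
    return [(c - lo) / (hi - lo) for c in current]


def cv_to_gp_data(
    time: Sequence[float], voltage: Sequence[float], current: Sequence[float]
) -> Tuple[List[Tuple[float, float]], List[float]]:
    """Pair each GP input x = (v, t) with its normalized current as the output."""
    if not len(time) == len(voltage) == len(current):
        raise ValueError("time, voltage, and current must have equal length")
    X = [(v, t) for v, t in zip(voltage, time)]
    return X, normalize_current(current)


X, y = cv_to_gp_data([0.0, 1.0, 2.0], [-0.5, 0.0, -0.5], [2.0, 6.0, 4.0])
print(y)  # [0.0, 1.0, 0.5]
```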

3.1.1. Squared Exponential Covariance

We use the well-known squared exponential kernel (Equation (5)) as the covariance model for $M_1$; the resulting feature map $\phi(x)$ forms a basis for functions that are smooth and stationary:

k(x, x') = \sigma_f^{2} \exp\left(-\tfrac{1}{2}(x - x')^{\top} \Lambda^{-2} (x - x')\right)

(5)

In Equation (5), $\sigma_f$ is a scaling parameter, and $\Lambda$ is a diagonal matrix whose entries are the length scales for the corresponding dimensions of $x, x' \in \mathcal{X}$. The left panel of Figure 2 depicts five samples drawn at random from the GP with the covariance in Equation (5). The right panel of the same figure visualizes the covariance function as contours on a uniform grid of $\mathcal{X} \times \mathcal{X}$. Figure 2 shows that the covariance is strong (≈1) between inputs whose Euclidean distance is smaller than the length scale set by $\Lambda$, and decays as the distance grows.
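A direct Python transcription of the squared exponential covariance, assuming the standard form $\sigma_f^2 \exp\left(-\tfrac{1}{2}\sum_d (x_d - x'_d)^2/\ell_d^2\right)$ with the diagonal entries of $\Lambda$ read as length scales $\ell_d$. Its value depends only on $x - x'$, which is the stationarity property discussed above.

```python
import math
from typing import Sequence


def se_kernel(
    x: Sequence[float], xp: Sequence[float], sigma_f: float, lengthscales: Sequence[float]
) -> float:
    """Squared exponential covariance with a diagonal length-scale matrix."""
    r2 = sum(((a - b) / ell) ** 2 for a, b, ell in zip(x, xp, lengthscales))
    return sigma_f ** 2 * math.exp(-0.5 * r2)


# Stationarity: shifting both inputs by the same amount leaves k unchanged.
print(se_kernel((0.0,), (1.0,), 1.0, (1.0,)) == se_kernel((5.0,), (6.0,), 1.0, (1.0,)))  # True
```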

3.1.2. Neural Network Covariance

We use a neural network covariance kernel to build a GP function-space representation for the target model $M_2$ (Equation (6) and Figure 3). Responses with fast kinetics (i.e., S-shaped curves) have a non-stationary covariance; hence, we choose a covariance that is effective at handling rapidly changing signals:

\begin{aligned} k(x, x') &= \sigma_f^{2} \sin^{-1}\left(\frac{x^{\top} \Lambda^{-2} x'}{\sqrt{h(x)\, h(x')}}\right) \\ h(x) &= 1 + x^{\top} \Lambda^{-2} x \end{aligned}

(6)

In Equation (6), $\sigma_f$ is a scaling parameter, and $\Lambda$ is a diagonal matrix whose entries are length scales. Figure 3 is analogous to Figure 2: the covariance is high (≈1) within two blocks of input locations that are separated by completely unrelated input locations (covariance ≈ 0). This is in contrast to $M_1$, where the covariance is determined by a form of distance between input points.
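A matching Python sketch of Equation (6), with $\Lambda$ again read as a diagonal matrix of length scales. The arcsine argument always lies strictly inside (−1, 1) by the Cauchy–Schwarz inequality, since each factor $h$ adds 1 to a quadratic form. Unlike the squared exponential, the value is not a function of $x - x'$ alone: inputs on the same side of the origin are positively correlated, inputs on opposite sides are anti-correlated, and the origin is uncorrelated with every other input. The textbook form of this kernel also augments $x$ with a constant bias input; the sketch follows Equation (6) as written.

```python
import math
from typing import Sequence


def nn_kernel(
    x: Sequence[float], xp: Sequence[float], sigma_f: float, lengthscales: Sequence[float]
) -> float:
    """Neural network covariance of Equation (6) with Lambda = diag(lengthscales)."""

    def quad(a: Sequence[float], b: Sequence[float]) -> float:
        # a^T Lambda^{-2} b for a diagonal Lambda
        return sum(ai * bi / (ell * ell) for ai, bi, ell in zip(a, b, lengthscales))

    h_x = 1.0 + quad(x, x)
    h_xp = 1.0 + quad(xp, xp)
    return sigma_f ** 2 * math.asin(quad(x, xp) / math.sqrt(h_x * h_xp))


# Non-stationary: the same separation gives different covariances at different locations.
print(nn_kernel((1.0,), (2.0,), 1.0, (1.0,)) < nn_kernel((11.0,), (12.0,), 1.0, (1.0,)))  # True
```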

4. Results

To demonstrate active search for S-shaped CV curves, we choose the classic EC mechanism, which consists of two reactions, E and C (reactions (R1) and (R2)), corresponding to one electron transfer reaction (E) and one chemical reaction (C), respectively. The EC mechanism is selected because it is well studied [3,4,26] and produces a variety of CV shapes, and thus serves as a good test case for the oracle proposed in this paper. In this work, we use the MECSim [27] simulator to generate CV curves on demand.

4.1. Data Generation

The EC mechanism is a two-step reaction comprising an electron transfer (R1) followed by a chemical reaction (R2):

P + e^{-} \rightleftharpoons Q

(R1)

Q + A \rightarrow P

(R2)

The electrochemical kinetics of the EC mechanism can be modeled by a set of governing partial differential equations [28]. In this work, we are interested in modeling the kinetics of the species (and electrons) that contribute to current generation under a cyclic voltage sweep at a given sweep rate.

Towards this goal, the transport of the three species (P, Q, A) in the solution is modeled using Fick’s second law of diffusion with a source term corresponding to the homogeneous chemical reaction (R2):

\begin{aligned} \frac{\partial C_P}{\partial t} &= D_{diff} \frac{\partial^{2} C_P}{\partial u^{2}} + k_s C_Q C_A \\ \frac{\partial C_Q}{\partial t} &= D_{diff} \frac{\partial^{2} C_Q}{\partial u^{2}} - k_s C_Q C_A \\ \frac{\partial C_A}{\partial t} &= D_{diff} \frac{\partial^{2} C_A}{\partial u^{2}} - k_s C_Q C_A \end{aligned}

(7)

with the initial and boundary conditions:

\begin{aligned} t = 0,\ \forall u: &\quad C_P = C_P^{0},\ C_A = C_A^{0},\ C_Q = C_Q^{0} \\ t > 0,\ u \to \infty: &\quad C_P = C_P^{0},\ C_A = C_A^{0},\ C_Q = C_Q^{0} \\ t > 0,\ u = 0: &\quad \frac{\partial C_A}{\partial u} = 0;\ \frac{\partial C_P}{\partial u} + \frac{\partial C_Q}{\partial u} = 0;\ \frac{C_P}{C_Q} = \exp\left(\frac{F}{RT}(V - E^{0})\right) \end{aligned}

(8)

In Equations (7) and (8), $E^{0}$ is the formal reversible potential of the electron transfer reaction (R1), and $C_P$, $C_Q$, and $C_A$ are the concentrations of the catalyst P, the species Q, and the substrate A, respectively. $D_{diff}$ is a common diffusion coefficient for all species, and $k_s$ is the rate constant of the forward reaction (R2). The spatial coordinate $u$ starts at the working electrode ($u = 0$), assuming a semi-infinite domain, and $t$ denotes the simulation time. Initial concentrations (i.e., at $t = 0$) are denoted with a superscript 0, and $V$ is the time-varying applied voltage. For a cyclic voltage sweep between voltages $V_i$ and $V_f$ at a rate of $\nu$ (V/s), $V$ is given by Equation (9), where $T_s = (V_f - V_i)/\nu$ is the switching time:

V(t) = \begin{cases} V_i + \nu t, & 0 \le t \le T_s \\ V_f - \nu (t - T_s), & T_s < t \le 2T_s \end{cases}

(9)
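Equation (9) in Python, as a minimal sketch assuming a single cycle that starts at $V_i$, sweeps up to $V_f > V_i$ at a positive scan rate $\nu$, and returns; the switching time follows from $T_s = (V_f - V_i)/\nu$.

```python
def applied_voltage(t: float, v_i: float, v_f: float, nu: float) -> float:
    """Triangular CV waveform V(t) of Equation (9) over one cycle 0 <= t <= 2*T_s."""
    if v_f <= v_i or nu <= 0.0:
        raise ValueError("expects v_f > v_i and a positive scan rate nu")
    t_s = (v_f - v_i) / nu  # switching time
    if not 0.0 <= t <= 2.0 * t_s:
        raise ValueError("t lies outside one cycle")
    if t <= t_s:
        return v_i + nu * t  # forward sweep
    return v_f - nu * (t - t_s)  # reverse sweep


# The [-0.5 V, 0.5 V] window at 0.5 V/s gives T_s = 2 s.
print([applied_voltage(t, -0.5, 0.5, 0.5) for t in (0.0, 1.0, 2.0, 3.0, 4.0)])
# [-0.5, 0.0, 0.5, 0.0, -0.5]
```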

Digital simulation of the system of partial differential equations in Equations (7) and (8) is performed to determine the spatio-temporal concentration profiles of species P, Q, and A. The Faradaic current observed during the cyclic voltage load is computed using Equation (10), following the Butler–Volmer model for heterogeneous electron transfer at the electrode surface:

i(t, v) = F A_{surf} k^{0} \left[C_Q \exp\left(\frac{\alpha F}{RT}(V - E^{0})\right) - C_P \exp\left(-\frac{(1 - \alpha) F}{RT}(V - E^{0})\right)\right]

(10)

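In Equation (10), $F$ is Faraday’s constant, $A_{surf}$ is the surface area of the electrode (= 1 cm²), $R$ is the universal gas constant, and $T$ is the temperature (room temperature). $k^{0}$ is the heterogeneous electron transfer rate constant, and $\alpha$ is the charge transfer coefficient, taken to be symmetric (= 0.5). The concentrations $C_P$ and $C_Q$ are evaluated at the electrode surface ($u = 0$).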
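A Python sketch of the Butler–Volmer current, written with opposite signs in the two exponents so that the net current vanishes when the surface concentrations satisfy the Nernst ratio of Equation (8). It assumes $T = 298.15$ K for room temperature, concentrations in mol cm⁻³, $k^0$ in cm s⁻¹, and $A_{surf}$ in cm², giving a current in amperes; a positive value corresponds to oxidation of Q.

```python
import math

FARADAY = 96485.33212  # C mol^-1
GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def bv_current(
    c_q: float, c_p: float, v: float, e0: float, k0: float,
    area: float = 1.0, alpha: float = 0.5, temperature: float = 298.15,
) -> float:
    """Butler-Volmer current from the surface concentrations of Q and P."""
    f = FARADAY / (GAS_CONSTANT * temperature)
    eta = v - e0  # overpotential relative to the formal potential
    return FARADAY * area * k0 * (
        c_q * math.exp(alpha * f * eta) - c_p * math.exp(-(1.0 - alpha) * f * eta)
    )


# At the Nernst ratio C_P / C_Q = exp(F (V - E0) / RT) the net current vanishes.
f_298 = FARADAY / (GAS_CONSTANT * 298.15)
c_q = 1.0e-6
c_p = c_q * math.exp(f_298 * 0.05)
print(abs(bv_current(c_q, c_p, 0.05, 0.0, 0.01)) < 1e-12)  # True
```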

We use the freeware MECSim [27,29] to digitally simulate the cyclic voltammetry response in the voltage range [−0.5 V, 0.5 V] (http://www.garethkennedy.net/MECSimDownload.html, accessed on 1 June 2021). Along with the parameters used in Equations (7) and (8), MECSim can also simulate the effects of an uncompensated resistance ($R_u$) and a double-layer capacitance ($C_{dl}$), which are not used in this work. We form a six-dimensional design search space using $C_P^{0}$, $C_A^{0}$, $k_s$, $k^{0}$, $\nu$, and $E^{0}$, and set $R_u = 0$, $C_{dl} = 0$, $\alpha = 0.5$, and $D_{diff} = 1 \times 10^{-5}$. Table 1 lists the combinatorial space defined by the six design variables (dimensions of the search space) and the number of samples along each dimension used to create an exhaustive search grid of tunable settings. After excluding responses from diverging simulations, which arise from combinations of non-physical parameters for MECSim (see the MECSim documentation for known limitations), we obtain a total of approximately $17 \times 10^{3}$ CV curves in our database.
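MECSim performs the digital simulation internally. Purely as an illustration of what one explicit time step of Equation (7) involves (and not a description of MECSim's numerical scheme), the Python sketch below applies a forward-time, centred-space (FTCS) finite-difference update to the interior grid nodes. The boundary nodes are left for the caller to set from Equation (8), and this explicit scheme is only stable for $D_{diff}\,\Delta t/\Delta u^2 \le 1/2$.

```python
from typing import List, Sequence, Tuple

Profile = List[float]


def ftcs_step(
    c_p: Sequence[float], c_q: Sequence[float], c_a: Sequence[float],
    d_diff: float, k_s: float, du: float, dt: float,
) -> Tuple[Profile, Profile, Profile]:
    """One explicit step of Equation (7) on interior nodes; boundary nodes are copied."""
    lam = d_diff * dt / du ** 2
    if lam > 0.5:
        raise ValueError("unstable step: need d_diff * dt / du**2 <= 0.5")
    new_p, new_q, new_a = list(c_p), list(c_q), list(c_a)
    for j in range(1, len(c_p) - 1):
        rate = k_s * c_q[j] * c_a[j]  # rate of the chemical step Q + A -> P
        new_p[j] = c_p[j] + lam * (c_p[j + 1] - 2 * c_p[j] + c_p[j - 1]) + dt * rate
        new_q[j] = c_q[j] + lam * (c_q[j + 1] - 2 * c_q[j] + c_q[j - 1]) - dt * rate
        new_a[j] = c_a[j] + lam * (c_a[j + 1] - 2 * c_a[j] + c_a[j - 1]) - dt * rate
    return new_p, new_q, new_a


p, q, a = ftcs_step([1.0] * 5, [0.5] * 5, [2.0] * 5, 1e-5, 1.0, 0.01, 0.1)
print([round(pi + qi, 12) for pi, qi in zip(p, q)])  # [1.5, 1.5, 1.5, 1.5, 1.5]
```

Because the chemical step converts Q into P one-for-one, $C_P + C_Q$ is conserved at every node of a spatially uniform profile, which makes a convenient sanity check.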

4.2. Using BMS as an Oracle to Identify S-Shaped CV Curves

We demonstrate the application of the BMS oracle to label CV responses as targets if they are S-shaped, and as non-targets otherwise. We couple the BMS oracle with standard active search techniques to find our “targets” within a given budget of label queries (i.e., the number of queries to the simulator). To accommodate high-throughput searches that run a batch of experiments at a time, we run the active search using both sequential selection of query locations (batch size $b = 1$) and batch selection ($b = 100$). We use the design space in Table 1 and aim to find as many targets as possible in the resulting six-dimensional combinatorial design space $\mathcal{S}$.

To demonstrate the efficacy of the proposed method, we first pre-compute labels for a set of ten CV curves of varying shapes in $\mathcal{S}$. Figure 4 depicts the chosen CV curves ordered by their model posterior (Equation (4)) percentile rank. Notably, the highest-scored CV curves have the S-shape that is of interest in this work. Figure 4 also shows that BMS assigned the highest score to CV curves whose forward and backward sweeps overlap exactly, i.e., with no hysteresis or capacitive behavior (highlighted using a red box). At the other end of the spectrum, the oracle assigns low scores to several types of CV curves, including the classic “duck-shaped” curves and diffusion-driven curves with peaks in the forward and backward sweeps.

We attribute the good performance of BMS in our study to its inherent ability to handle Gaussian noise, which is frequently observed in cyclic voltammetry responses [18]. For comparison, we studied two other oracles that rank a CV curve by its S-shape characteristics using point-wise comparison: (a) a similarity score, a generic approach that computes the Euclidean norm between a given CV curve and a user-defined reference S-shape; and (b) a FOWA score, which is physics-inspired and defined as the $R^{2}$ value of a CV curve in Foot of the Wave Analysis (FOWA) coordinate space (a perfectly symmetric S-shaped CV curve has $R^{2} = 1$).

4.2.1. Similarity Score Based Oracle

We compute the Euclidean norm between a reference CV curve $I_{ref}$ and any given CV curve $I$, both represented as vectors in a high-dimensional space, as $\lVert I - I_{ref} \rVert_2 = \sqrt{\sum_k (I_k - I_{ref,k})^{2}}$. In Figure 5, ten representative CV curves are sorted and color-coded by the similarity score (similar to Figure 4). The similarity score performs poorly, placing only two targets in the top three. Although it correctly assigns a high score to the first curve, it fails to identify another S-shaped curve that is shifted along the voltage axis with respect to the first curve.
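The similarity score is straightforward to compute. In the Python sketch below, logistic curves stand in for S-shaped CV responses (a hypothetical stand-in, not simulator output); shifting the onset by 0.2 V already produces a large distance from the reference, illustrating why a purely point-wise comparison penalizes otherwise identical S-shapes.

```python
import math
from typing import List, Sequence


def similarity_score(curve: Sequence[float], reference: Sequence[float]) -> float:
    """Euclidean distance between two curves sampled on the same voltage grid."""
    if len(curve) != len(reference):
        raise ValueError("curves must be sampled on the same grid")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve, reference)))


def logistic_s_curve(grid: Sequence[float], onset: float, steepness: float = 20.0) -> List[float]:
    """Hypothetical S-shaped curve rising from 0 to 1 around `onset`."""
    return [1.0 / (1.0 + math.exp(-steepness * (v - onset))) for v in grid]


grid = [-0.5 + 0.01 * k for k in range(101)]
reference = logistic_s_curve(grid, onset=0.0)
shifted = logistic_s_curve(grid, onset=0.2)
print(similarity_score(reference, reference))  # 0.0
print(similarity_score(shifted, reference) > 1.0)  # True
```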

4.2.2. FOWA-Based Score

We present a scoring method based on the Foot of the Wave Analysis (FOWA) presented in [3]. In FOWA, the original signal $(I, V)$ is transformed using the map $(I, V) \mapsto \left(I, 1/\left(1 + \exp\left[\frac{F}{RT}(V - E^{0})\right]\right)\right)$. An S-shaped CV curve is expected to become a straight line passing through the origin in FOWA coordinate space. Thus, we propose to score a CV curve by its $R^{2}$ value, computed with reference to a user-defined S-shape (for multiple reference S-shaped CV curves, the maximal $R^{2}$ value over the set is used), which quantifies the linearity of the curve after the FOWA transformation. Figure 6 is similar to Figure 4, with the FOWA-based score used to rank the CV curves. The FOWA-based score ranks two of the target S-shaped CV curves in the top category. However, it fails to rank highly another S-shaped CV curve, such as the fourth curve, whose current onset is only slightly shifted. Regarding effectiveness, we note that both approaches in this section depend on reference S-shapes against which the score (the norm or the $R^{2}$ value) is computed. In particular, a user needs to select a set of S-shapes that span the expected range of current onsets, rate constants, etc., which is a non-trivial choice and adds to the heuristics. For these reasons, BMS is the preferred approach, as it exploits the geometrical shape of CV curves directly, without the need for user-defined references. Moreover, BMS has an inherent ability to account for Gaussian noise.
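A simplified Python sketch of a FOWA-based score: the voltage axis is mapped to the FOWA abscissa $1/(1 + \exp[F(V - E^0)/RT])$ and the linearity of current versus this abscissa is measured by the ordinary least-squares $R^2$. Treating the reference $E^0$ as the only user-chosen input and assuming $T = 298.15$ K are simplifications; under them, an ideal logistic S-curve scores 1, while a symmetric peak-shaped response scores near 0.

```python
import math
from typing import Sequence

F_OVER_RT = 96485.33212 / (8.314462618 * 298.15)  # V^-1, assuming T = 298.15 K


def fowa_abscissa(v: float, e0: float) -> float:
    """FOWA coordinate 1 / (1 + exp[F (V - E0) / RT])."""
    return 1.0 / (1.0 + math.exp(F_OVER_RT * (v - e0)))


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares line y ~ a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("R^2 is undefined for constant data")
    return sxy * sxy / (sxx * syy)


def fowa_score(voltage: Sequence[float], current: Sequence[float], e0: float) -> float:
    """R^2 of the CV curve after the FOWA transformation of its voltage axis."""
    return r_squared([fowa_abscissa(v, e0) for v in voltage], current)


grid = [-0.5 + 0.01 * k for k in range(101)]
ideal = [2.0 * fowa_abscissa(v, 0.0) for v in grid]  # S-shape: linear in FOWA space
peak = [math.exp(-((v / 0.05) ** 2)) for v in grid]  # peak-shaped response
print(round(fowa_score(grid, ideal, 0.0), 6))  # 1.0
print(fowa_score(grid, peak, 0.0) < 0.1)  # True
```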

5. Active (Batch) Search for S-Shaped CV Curves

Active search with batch selection of locations in the design space has recently been studied and successfully applied to high-throughput combinatorial searches in materials and drug discovery [15]. We use the state-of-the-art active batch search introduced by Jiang et al. [15], with a fixed budget of 1000 queries to the simulator (≈6% of the exhaustive search; see Table 1), for batch sizes $b \in \{1, 100\}$ to actively query our combinatorial search space $\mathcal{S}$. For $b = 1$, the decision model $p(y = +1 \mid D)$ is updated after each iteration, whereas for $b = 100$, it is updated after every 100 CV measurements from the simulator and oracle. The batch size reflects the setting of high-throughput analysis, as materials are often prepared in batches.
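The sketch below shows only the bookkeeping of sequential ($b = 1$) versus batch ($b > 1$) active search under a fixed label budget: the decision model is consulted once per batch and sees only labels from earlier batches. For brevity it fills each batch greedily with the top-$b$ candidates; this is a simplification and not the batch policy of Jiang et al. [15]. The `oracle` and `prob_target` callables are hypothetical stand-ins for the simulator-plus-BMS labeler and the decision model.

```python
from typing import Callable, Dict, Hashable, Sequence

Labels = Dict[Hashable, int]


def batch_active_search(
    pool: Sequence[Hashable],
    oracle: Callable[[Hashable], int],
    prob_target: Callable[[Hashable, Labels], float],
    budget: int,
    batch_size: int,
    seed_point: Hashable,
) -> Labels:
    """Query the oracle in batches until `budget` labels have been collected."""
    labels: Labels = {seed_point: oracle(seed_point)}
    while len(labels) < budget:
        unlabeled = [s for s in pool if s not in labels]
        if not unlabeled:
            break
        b = min(batch_size, budget - len(labels), len(unlabeled))
        # Score every candidate with the model fit to the current labels only.
        scores = {s: prob_target(s, labels) for s in unlabeled}
        batch = sorted(unlabeled, key=scores.__getitem__, reverse=True)[:b]
        for s in batch:  # in practice: run b simulations, then label them with BMS
            labels[s] = oracle(s)
    return labels


# Toy run: locations 0..19, targets are s >= 10, and a model that favors large s.
found = batch_active_search(
    pool=range(20), oracle=lambda s: 1 if s >= 10 else -1,
    prob_target=lambda s, labels: s / 20, budget=6, batch_size=2, seed_point=0,
)
print(sorted(found), sum(y == 1 for y in found.values()))  # [0, 15, 16, 17, 18, 19] 5
```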

A label for any given location is assigned by applying the BMS oracle to the corresponding CV curve $I(v, t)$, simulated by solving Equations (7) and (8) over $4 \times 10^{3}$ discrete time points. We label a CV response as a target if its BMS oracle score falls within the range defined by the top three percentile ranks shown in Figure 4. This threshold is a heuristic and can be altered depending on the application. Similarly, for active search of bi-functional oxygen electrocatalysts, one can label a material as a target if both its OER and ORR experimental CV curves fall within the top three percentile ranks of BMS scores.
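One possible reading of this labeling rule, sketched in Python: the reference scores are those of a ranked set of curves such as the ten in Figure 4, and a new response is a target if its score reaches the third-highest reference score. Both this reading and the hypothetical `top_k` parameter are assumptions that an application would tune.

```python
from bisect import bisect_right
from typing import Sequence


def percentile_rank(score: float, reference_scores: Sequence[float]) -> float:
    """Percentage of reference scores that are less than or equal to `score`."""
    ranked = sorted(reference_scores)
    return 100.0 * bisect_right(ranked, score) / len(ranked)


def label_response(score: float, reference_scores: Sequence[float], top_k: int = 3) -> int:
    """+1 (target) if `score` is at least the k-th highest reference score, else -1."""
    if not 1 <= top_k <= len(reference_scores):
        raise ValueError("top_k must be between 1 and the number of reference scores")
    threshold = sorted(reference_scores, reverse=True)[top_k - 1]
    return +1 if score >= threshold else -1


refs = [0.1 * i for i in range(10)]  # hypothetical reference scores
print(label_response(0.75, refs), label_response(0.65, refs))  # 1 -1
print(percentile_rank(0.45, refs))  # 50.0
```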

We assume that our design space is continuous; thus, following the approach in [15], a k-nearest-neighbor probability model is used as the decision model in Bayesian active learning. This assumption implies that, if we find a target at a certain location in the search space, its k nearest neighbors in the design space are also likely to be targets.
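A minimal Python sketch of a k-nearest-neighbor decision model in the style used in the active search literature: the probability that $s$ is a target is estimated from the labels of its k nearest neighbors, with a small pseudo-count $\gamma$ so that unexplored regions keep a nonzero probability. The values of $k$ and $\gamma$, and the use of plain Euclidean distance (which in practice would be applied to suitably scaled design variables), are assumptions.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]


def knn_prob_target(
    s: Point, pool: Sequence[Point], labels: Dict[Point, int], k: int = 5, gamma: float = 0.1
) -> float:
    """Estimate p(y = +1 | s, D) from the labeled points among the k nearest neighbors of s."""
    neighbors = sorted((p for p in pool if p != s), key=lambda p: math.dist(p, s))[:k]
    observed = [labels[p] for p in neighbors if p in labels]
    n_targets = sum(1 for y in observed if y == +1)
    return (gamma + n_targets) / (1.0 + len(observed))


pool = [(float(i),) for i in range(10)]
labels = {(1.0,): 1, (2.0,): 1, (8.0,): -1}
print(round(knn_prob_target((0.0,), pool, labels, k=2), 3))  # 0.7
```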

In Figure 7, we report the average number of targets found in the design space versus the number of label queries for the two batch sizes considered ($b = 1$ and $b = 100$). The number of targets is averaged over 20 active searches, each starting from a randomly selected sample in the search space. Our results demonstrate that searching the design space using active learning is useful, with near-linear target detection. Figure 7 also shows that, for any given number of allowed label queries to the oracle (or, equivalently, simulation queries to the simulator), sequential selection finds marginally more targets than batch selection with $b = 100$. This observation is in accordance with Theorem 1 in [15]. Jiang et al. [15] argue that batch selection is at a disadvantage because each batch must be selected with fewer observed responses and locations available. However, from an experimental point of view, one needs to weigh the advantages and disadvantages of sequential selection against those of batch selection.

Our results highlight the capability to detect the target type of signal at a low cost: as the number of queries increases, the number of targets found increases approximately linearly. This is achieved with a budget of only about 6% of the queries required by the exhaustive search. Altogether, our results showcase the acceleration achieved compared to the exhaustive search.

6. Conclusions and Future Work

In conclusion, we defined and evaluated a GP-based oracle for materials discovery using cyclic voltammetry. We then combined the oracle with a state-of-the-art active batch search to identify conditions resulting in the targeted shape of CV curve. We demonstrated a robust high-throughput combinatorial search that finds target responses using fewer than 6% of the CV experiments of the corresponding exhaustive search (with a discrete sampling of a modest five levels per dimension).

This work has implications for identifying characterization conditions under which kinetic knowledge can be extracted from cyclic voltammetry more effectively. Specifically, we have illustrated a framework that can be used to identify S-shaped CV curves. Once an S-shaped CV curve is obtained, a foot-of-the-wave analysis can be applied [12] to extract the rate constant of the rate-determining step, the overpotential-dependent turnover frequency, etc. This work was motivated by the challenges in catalyst screening for bi-functional alkaline fuel cells [30]. However, the framework applies to the screening of any homogeneous or heterogeneous catalysts used in energy conversion and storage systems [31,32].

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/en15134575/s1. Reference [33] is cited in the Supplementary Materials.

Author Contributions

Conceptualization, K.V. and O.W.; methodology, K.V. and O.W.; software, K.V.; validation, K.V.; resources, O.W.; data curation, K.V.; writing, K.V. and O.W. All authors have read and agreed to the published version of the manuscript.

Funding

The authors thank the Toyota Research Institute for providing the support through Accelerated Material Design Discovery program.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the data and code to reproduce the experiments from this paper can be found at https://github.com/kiranvad/gpcv, accessed on 1 June 2021.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Gosser, D.K. Cyclic Voltammetry: Simulation and Analysis of Reaction Mechanisms; VCH: New York, NY, USA, 1993; Volume 43.
2. Bard, A.J.; Faulkner, L.R. Fundamentals and applications. Electrochem. Methods 2001, 2, 580–632.
3. Costentin, C.; Saveant, J.M. Cyclic voltammetry analysis of electrocatalytic films. J. Phys. Chem. C 2015, 119, 12174–12182.
4. Rountree, E.S.; McCarthy, B.D.; Eisenhart, T.T.; Dempsey, J.L. Evaluation of homogeneous electrocatalysts by cyclic voltammetry. Inorg. Chem. 2014, 53, 9983–10002.
5. Martin, D.J.; McCarthy, B.D.; Rountree, E.S.; Dempsey, J.L. Qualitative extension of the EC Zone Diagram to a molecular catalyst for a multi-electron, multi-substrate electrochemical reaction. Dalton Trans. 2016, 45, 9970–9976.
6. Shim, H.S.; Yeo, I.H.; Park, S.M. Simultaneous multimode experiments for studies of electrochemical reaction mechanisms: Demonstration of concept. Anal. Chem. 2002, 74, 3540–3546.
7. Stein, H.S.; Guevarra, D.; Shinde, A.; Jones, R.J.; Gregoire, J.M.; Haber, J.A. Functional mapping reveals mechanistic clusters for OER catalysis across (Cu–Mn–Ta–Co–Sn–Fe)O_x composition and pH space. Mater. Horizons 2019, 6, 1251–1258.
8. Li, J.; Kennedy, G.F.; Gundry, L.; Bond, A.M.; Zhang, J. Application of Bayesian Inference in Fourier-Transformed Alternating Current Voltammetry for Electrode Kinetic Mechanism Distinction. Anal. Chem. 2019, 91, 5303–5309.
9. Haber, J.A.; Xiang, C.; Guevarra, D.; Jung, S.; Jin, J.; Gregoire, J.M. High-Throughput Mapping of the Electrochemical Properties of (Ni-Fe-Co-Ce) Ox Oxygen-Evolution Catalysts. ChemElectroChem 2014, 1, 524–528.
10. Suram, S.K.; Haber, J.A.; Jin, J.; Gregoire, J.M. Generating information-rich high-throughput experimental materials genomes using functional clustering via multitree genetic programming and information theory. ACS Comb. Sci. 2015, 17, 224–233.
11. Nørskov, J.K.; Rossmeisl, J.; Logadottir, A.; Lindqvist, L.; Kitchin, J.R.; Bligaard, T.; Jonsson, H. Origin of the overpotential for oxygen reduction at a fuel-cell cathode. J. Phys. Chem. B 2004, 108, 17886–17892.
12. Wang, V.C.C.; Johnson, B.A. Interpreting the Electrocatalytic Voltammetry of Homogeneous Catalysts by the Foot of the Wave Analysis and Its Wider Implications. ACS Catal. 2019, 9, 7109–7123.
13. Costentin, C.; Drouet, S.; Robert, M.; Saveant, J.M. Turnover numbers, turnover frequencies, and overpotential in molecular catalysis of electrochemical reactions. Cyclic voltammetry and preparative-scale electrolysis. J. Am. Chem. Soc. 2012, 134, 11235–11242.
14. Matheu, R.; Neudeck, S.; Meyer, F.; Sala, X.; Llobet, A. Foot of the wave analysis for mechanistic elucidation and benchmarking applications in molecular water oxidation catalysis. ChemSusChem Commun. 2016, 8, 3361–3369.
15. Jiang, S.; Malkomes, G.; Moseley, B.; Garnett, R. Efficient nonmyopic active search with applications in drug and materials discovery. arXiv 2018, arXiv:1811.08871.
16. Oglic, D.; Oatley, S.A.; Macdonald, S.J.; Mcinally, T.; Garnett, R.; Hirst, J.D.; Gärtner, T. Active search for computer-aided drug design. Mol. Inform. 2018, 37, 1700130.
17. Gardner, J.R.; Song, X.; Weinberger, K.Q.; Barbour, D.L.; Cunningham, J.P. Psychophysical Detection Testing with Bayesian Active Learning. In Proceedings of the UAI, Amsterdam, The Netherlands, 12–16 July 2015; pp. 286–295.
18. Gavaghan, D.J.; Cooper, J.; Daly, A.C.; Gill, C.; Gillow, K.; Robinson, M.; Simonov, A.N.; Zhang, J.; Bond, A.M. Use of Bayesian inference for parameter recovery in DC and AC Voltammetry. ChemElectroChem 2018, 5, 917–935.
19. Robinson, M.; Simonov, A.N.; Zhang, J.; Bond, A.M.; Gavaghan, D. Separating the Effects of Experimental Noise from Inherent System Variability in Voltammetry. Anal. Chem. 2018, 91, 1944–1953.
20. Gundry, L.; Guo, S.X.; Kennedy, G.; Keith, J.; Robinson, M.; Gavaghan, D.; Bond, A.M.; Zhang, J. Recent advances and future perspectives for automated parameterisation, Bayesian inference and machine learning in voltammetry. Chem. Commun. 2021, 57, 1855–1870.
21. Bradley, K.; Giagloglou, K.; Hayden, B.E.; Jungius, H.; Vian, C. Reversible perovskite electrocatalysts for oxygen reduction/oxygen evolution. Chem. Sci. 2019, 10, 4609–4617.
22. Jung, J.I.; Risch, M.; Park, S.; Kim, M.G.; Nam, G.; Jeong, H.Y.; Shao-Horn, Y.; Cho, J. Optimizing nanoparticle perovskite for bifunctional oxygen electrocatalysis. Energy Environ. Sci. 2016, 9, 176–183.
23. Garnett, R.; Krishnamurthy, Y.; Xiong, X.; Schneider, J.; Mann, R. Bayesian optimal active search and surveying. arXiv 2012, arXiv:1206.6406.
24. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2.
25. Deisenroth, M.P.; Faisal, A.A.; Ong, C.S. Mathematics for Machine Learning; Cambridge University Press: Cambridge, UK, 2020.
26. Kennedy, G.F.; Zhang, J.; Bond, A.M. Automatically identifying electrode reaction mechanisms using deep neural networks. Anal. Chem. 2019, 91, 12220–12227.
27. Kennedy, G.F.; Bond, A.M.; Simonov, A.N. Modelling ac voltammetry with MECSim: Facilitating simulation–experiment comparisons. Curr. Opin. Electrochem. 2017, 1, 140–147.
28. Saveant, J.; Su, K. Homogeneous redox catalysis of electrochemical reaction: Part VI. Zone diagram representation of the kinetic regimes. J. Electroanal. Chem. Interfacial Electrochem. 1984, 171, 341–349.
29. Kennedy, G. Monash Electrochemistry Simulator (MECSim). 2015. Available online: http://www.garethkennedy.net/MECSim.html (accessed on 1 June 2021).
30. Guerin, S.; Hayden, B.E.; Lee, C.E.; Mormiche, C.; Owen, J.R.; Russell, A.E.; Theobald, B.; Thompsett, D. Combinatorial electrochemical screening of fuel cell electrocatalysts. J. Comb. Chem. 2004, 6, 149–158.
31. Jin, X.; Duan, X.; Jiang, W.; Wang, Y.; Zou, Y.; Lei, W.; Sun, L.; Ma, Z. Structural design of a composite board/heat pipe based on the coupled electro-chemical-thermal model in battery thermal management system. Energy 2021, 216, 119234.
32. Duan, X.; Jiang, W.; Zou, Y.; Lei, W.; Ma, Z. A coupled electrochemical–thermal–mechanical model for spiral-wound Li-ion batteries. J. Mater. Sci. 2018, 53, 10987–11001.
33. Gardner, J.; Malkomes, G.; Garnett, R.; Weinberger, K.Q.; Barbour, D.; Cunningham, J.P. Bayesian active model selection with an application to automated audiometry. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 2386–2394.

Figure 1. Active learning framework as a flowchart. Active learning iterations start with a few labeled data points in the search space $S$. We stop collecting data when we have selected a pre-defined number (called the budget) of locations for updating our decision (or belief) model.
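The loop in Figure 1 can be sketched in a few lines. The example below is a minimal, hypothetical illustration: it uses a greedy one-step policy with a simple k-nearest-neighbour probability model in the spirit of the active-search literature [23], not the nonmyopic batch policy used in this work. The function names, the toy 6 × 6 grid, the `corner_oracle` stand-in for the BMS oracle, and `gamma = 0.1` are all assumptions chosen for illustration.

```python
import math


def knn_graph(pool, k):
    """For every pool point, the indices of its k nearest neighbours (itself excluded)."""
    graph = []
    for i, x in enumerate(pool):
        others = sorted((j for j in range(len(pool)) if j != i),
                        key=lambda j: math.dist(x, pool[j]))
        graph.append(others[:k])
    return graph


def target_probability(i, graph, labels, gamma=0.1):
    """k-NN model: (gamma + targets among labeled neighbours) / (1 + labeled neighbours)."""
    observed = [labels[j] for j in graph[i] if j in labels]
    return (gamma + sum(observed)) / (1.0 + len(observed))


def greedy_active_search(pool, oracle, initial, budget, k=3, gamma=0.1):
    """Start from a few labeled points, then repeatedly query the unlabeled point
    with the highest model probability of being a target until the budget is spent."""
    graph = knn_graph(pool, k)
    labels = {i: oracle(pool[i]) for i in initial}   # belief model's training data
    queries = []
    while len(queries) < budget and len(labels) < len(pool):
        candidates = [i for i in range(len(pool)) if i not in labels]
        best = max(candidates,
                   key=lambda i: target_probability(i, graph, labels, gamma))
        labels[best] = oracle(pool[best])             # ask the oracle, update the model
        queries.append(best)
    found = sum(labels[i] for i in queries)
    return queries, found


def corner_oracle(point):
    """Hypothetical stand-in for the BMS oracle: 1 = target (S-shaped), 0 = not."""
    return int(point[0] >= 4 and point[1] >= 4)


# Toy 2-D search space; start from a single non-target point, as in Figure 7.
pool = [(a, b) for a in range(6) for b in range(6)]
queries, found = greedy_active_search(pool, corner_oracle, initial=[0], budget=10)
print(f"queried {len(queries)} points, found {found} targets")
```

Points with no labeled neighbours keep the prior value `gamma`, so the policy first explores away from known non-targets and then exploits neighbourhoods of discovered targets.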


Figure 2. A pictorial representation of Equation (5). Left panel: five samples drawn at random from the GP built using Equation (5), shown in five colors, capture the smooth and locally correlated nature of the GP. Right panel: a contour plot of the correlation between outputs for one-dimensional inputs $x, x' \in X$. The color code represents the covariance $k(x, x')$, with red indicating high covariance, i.e., output values $g(x)$ and $g(x')$ are highly correlated, and vice versa.
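The left panel of Figure 2 can be reproduced in spirit with a short script. This sketch assumes that Equation (5) is the standard squared exponential covariance $k(x, x') = \sigma^2 \exp(-(x - x')^2 / (2\ell^2))$; the hyperparameter values, the small diagonal jitter, and the helper names are illustrative choices, not the authors' code. Prior samples are drawn as $g = Lz$, where $LL^\top$ is the Cholesky factorization of the covariance matrix and $z$ is standard normal noise.

```python
import math
import random


def squared_exponential(x, x2, variance=1.0, lengthscale=0.2):
    """Squared exponential covariance (assumed form of Equation (5))."""
    return variance * math.exp(-((x - x2) ** 2) / (2.0 * lengthscale ** 2))


def cholesky(K, jitter=1e-6):
    """Lower-triangular L with L L^T = K + jitter * I (pure Python, small matrices)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            # jitter only touches the diagonal, keeping the factorization stable
            L[i][j] = math.sqrt(s + jitter) if i == j else s / L[j][j]
    return L


def sample_gp_prior(xs, kernel, n_samples=5, seed=0):
    """Draw zero-mean GP prior samples g = L z with z ~ N(0, I)."""
    K = [[kernel(a, b) for b in xs] for a in xs]
    L = cholesky(K)
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in xs]
        samples.append([sum(L[i][m] * z[m] for m in range(i + 1))
                        for i in range(len(xs))])
    return samples


xs = [i / 29 for i in range(30)]              # 30 inputs on [0, 1]
samples = sample_gp_prior(xs, squared_exponential)
# Nearby inputs are strongly correlated, distant inputs only weakly:
print(squared_exponential(0.0, 0.05), squared_exponential(0.0, 0.5))
```

Because the covariance depends only on $x - x'$, the samples look equally smooth everywhere on the axis, which is the stationary behaviour the figure illustrates.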


Figure 3. A pictorial representation of Equation (6). Left panel: five samples drawn at random from the GP built using Equation (6), shown in five colors, capture the non-stationary nature of the GP, signified by regions of constant value and sharp rises in the response. Right panel: a contour plot of the covariance between two one-dimensional inputs $x, x' \in X$. A positive value of $k(x, x')$ signifies that output values $g(x)$ and $g(x')$ are positively correlated, and vice versa.


Figure 4. Representative CV curves from the dataset ordered and color coded using the BMS score. CV curves boxed in red will be labeled as targets by the oracle.
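One plausible way to turn the two GP models into the scalar BMS score used to order the curves in Figure 4 is the log Bayes factor between them, i.e., the difference of their log marginal likelihoods $\log p(\mathbf{y} \mid X) = -\tfrac{1}{2}\mathbf{y}^\top K^{-1}\mathbf{y} - \tfrac{1}{2}\log|K| - \tfrac{n}{2}\log 2\pi$. The sketch below is hypothetical: it assumes fixed (not optimized) kernel hyperparameters, a fixed noise variance, a normalized potential axis, and a decision threshold of zero, and none of the function names come from the authors' code.

```python
import math


def se_kernel(x, x2, variance=1.0, lengthscale=0.3):
    """Squared exponential covariance (smooth, stationary model)."""
    return variance * math.exp(-((x - x2) ** 2) / (2.0 * lengthscale ** 2))


def nn_kernel(x, x2, sigma0_sq=1.0, sigma_sq=10.0):
    """Arcsine neural-network covariance (step-like, non-stationary model)."""
    dot = sigma0_sq + sigma_sq * x * x2
    nx = 1.0 + 2.0 * (sigma0_sq + sigma_sq * x * x)
    nx2 = 1.0 + 2.0 * (sigma0_sq + sigma_sq * x2 * x2)
    return (2.0 / math.pi) * math.asin(2.0 * dot / math.sqrt(nx * nx2))


def log_marginal_likelihood(xs, ys, kernel, noise_var=1e-2):
    """log p(y | X) for a zero-mean GP with Gaussian observation noise."""
    n = len(xs)
    K = [[kernel(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    # Cholesky factorization K = L L^T
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution L alpha = y, so that y^T K^-1 y = alpha^T alpha
    alpha = []
    for i in range(n):
        alpha.append((ys[i] - sum(L[i][m] * alpha[m] for m in range(i))) / L[i][i])
    data_fit = -0.5 * sum(a * a for a in alpha)
    complexity = -sum(math.log(L[i][i]) for i in range(n))   # equals -0.5 log|K|
    return data_fit + complexity - 0.5 * n * math.log(2.0 * math.pi)


def bms_score(xs, ys):
    """Hypothetical BMS score: log Bayes factor of the NN-kernel GP over the SE-kernel GP."""
    return (log_marginal_likelihood(xs, ys, nn_kernel)
            - log_marginal_likelihood(xs, ys, se_kernel))


def is_target(xs, ys, threshold=0.0):
    """Hypothetical oracle decision: label the curve as S-shaped if the score is high."""
    return bms_score(xs, ys) > threshold


xs = [-1.0 + 2.0 * i / 24 for i in range(25)]        # normalized potential axis
ys = [1.0 / (1.0 + math.exp(-8.0 * x)) for x in xs]   # idealized S-shaped response
score = bms_score(xs, ys)
print(score, is_target(xs, ys))
```

Ordering curves by such a score and thresholding it gives a ranking and a red-box cut of the kind shown in the figure; in practice the hyperparameters would normally be fitted before comparing evidences.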

Figure 5. Representative CV curves from the dataset ordered and color coded using the similarity score. CV curves in the red box will be labeled as targets by the similarity score based oracle described in this work.

Figure 6. Representative CV curves from the dataset ordered and color coded using the FOWA score. CV curves in the red box will be labeled as targets by the FOWA-score-based oracle described in this work.

Figure 7. Active target detection in the EC mechanism combinatorial search space (see Table 1 for the definition of the search space). We repeat the active search 20 times, each time starting with a randomly chosen non-S-shaped data point in $S$.


Table 1. Combinatorial space used to generate CV responses in an EC mechanism along with the number of levels used in the exhaustive search.

Parameter	Range	Number of Levels per Dimension
$\log C_{P}^{0}$	[−2, 3]	5
$\log C_{A}^{0}$	[−2, 3]	5
$E^{0}$	[−0.4, 0.4]	5
$\log k_{s}$	[−1, 6]	5
$\log k^{0}$	[−1, 6]	5
$\log \nu$	[−2, 4]	6


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Vaddi, K.; Wodo, O. Active Knowledge Extraction from Cyclic Voltammetry. Energies 2022, 15, 4575. https://doi.org/10.3390/en15134575





3.1. $GP$ Models for Catalytic Responses