The Versatile Biocatalyst of Cytochrome P450 CYP102A1: Structure, Function, and Engineering

Sun, Yudong; Huang, Xiaoqiang; Osawa, Yoichi; Chen, Yuqing Eugene; Zhang, Haoming

doi:10.3390/molecules28145353

Open AccessReview

The Versatile Biocatalyst of Cytochrome P450 CYP102A1: Structure, Function, and Engineering

by

Yudong Sun

¹,

Xiaoqiang Huang

^2,*

,

Yoichi Osawa

¹,

Yuqing Eugene Chen

² and

Haoming Zhang

^1,*

¹

Department of Pharmacology, University of Michigan, Ann Arbor, MI 48109, USA

²

Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA

^*

Authors to whom correspondence should be addressed.

Molecules 2023, 28(14), 5353; https://doi.org/10.3390/molecules28145353

Submission received: 20 June 2023 / Revised: 7 July 2023 / Accepted: 10 July 2023 / Published: 12 July 2023

(This article belongs to the Special Issue Protein Design and Protein Engineering)

Download

Browse Figures

Versions Notes

Abstract

:

Wild-type cytochrome P450 CYP102A1 from Bacillus megaterium is a highly efficient monooxygenase for the oxidation of long-chain fatty acids. The unique features of CYP102A1, such as high catalytic activity, expression yield, regio- and stereoselectivity, and self-sufficiency in electron transfer as a fusion protein, afford the requirements for an ideal biocatalyst. In the past three decades, remarkable progress has been made in engineering CYP102A1 for applications in drug discovery, biosynthesis, and biotechnology. The repertoire of engineered CYP102A1 variants has grown tremendously, whereas the substrate repertoire is avalanched to encompass alkanes, alkenes, aromatics, organic solvents, pharmaceuticals, drugs, and many more. In this article, we highlight the major advances in the past five years in our understanding of the structure and function of CYP102A1 and the methodologies used to engineer CYP102A1 for novel applications. The objective is to provide a succinct review of the latest developments with reference to the body of CYP102A1-related literature.

Keywords:

CYP102A1; directed evolution; computational protein design; machine learning

Graphical Abstract

1. Introduction

Cytochrome P450 (CYP or P450) is a superfamily of heme-containing enzymes that have been extensively studied for over five decades. A large number of CYP genes have been found in all kingdoms of life, including 57 in humans, 1056 in mammals, 3753 in bacteria, and many more in plants and fungi [1]. P450s utilize electrons from NAD(P)H to catalyze oxygen insertion into inert C–H bonds, overcoming a high activation energy barrier. P450-catalyzed reactions were first reported in 1955 [2]; it was found that tissue catalysts in the liver microsomes of rabbits were responsible for the metabolism of multiple foreign compounds like hexobarbital and ephedrine. In 1958, Klingenberg characterized and named the catalyst P450 because he discovered the emergence of absorption at 450 nm from rat liver microsomes upon the addition of carbon monoxide (CO) [3]. In 1963, the seminal work by Omura and Sato identified P450s to be a heme-containing protein [4]. Human P450s play an important role in the metabolism of drugs, biopharmaceutical reagents, toxins, and environmental pollutants [5]. The metabolism of drugs by P450s leads to drug metabolites, which are more hydrophilic and are excreted more easily. In bacteria and fungi, P450s are involved in both anabolic and catabolic processes, such as antibiotic synthesis and degradation.

P450-catalyzed reactions depend on electron transfer (ET) from NAD(P)H via redox partners to split dioxygen and insert a single oxygen atom into the substrates [6,7]. The redox partner of mammalian hepatic P450s is NADPH-cytochrome P450 oxidoreductase (POR). POR is a diflavinal protein containing one flavin mononucleotide (FMN) and one flavin adenine dinucleotide (FAD) as its cofactors, capable of sequentially transferring two electrons from NADPH to the heme moiety of P450s where monooxygenation occurs. The center of all P450s is a prosthetic heme where the heme Fe is ligated to the four pyrrole rings of protoporphyrin IX, forming the heme plane to a cysteinyl residue forming the axial thiolate ligand, as in chloroperoxidase and nitric oxide synthetase. The unique thiolate-Fe ligand produces a signature absorption peak at 450 nm when CO binds to the ferrous heme, Fe²⁺. At the resting state (Fe³⁺), the sixth axial ligand is usually occupied by a weakly bound water molecule, resulting in a low-spin ferric heme [8,9]. When the water ligand is expelled by substrate molecules, the heme transits from a low- to high-spin state, a useful feature widely used for spectral titration to determine the substrate binding constant.

P450s catalyze the oxidation of a vast number of small molecules, estimated to be in the range of millions. The catalytic prowess of P450s can be attributed to their ability to accumulate oxidizing equivalents through the catalytic cycle (Figure 1). As aforementioned, substate binding dissociates the sixth aquo ligand, increasing the redox potential of the heme that facilitates ET [10,11]. Then, an electron from NADPH is transferred to the heme, reducing the ferric iron to the ferrous state (the first electron), followed by rapid binding of dioxygen to the heme forming the oxyferrous species. Subsequent delivery of the second electron and two protons through proton relay machinery leads to the formation of a highly oxidizing intermediate, referred to as Compound I. P450 Compound I is a Fe(IV)-oxo species with an additional oxidizing equivalent delocalized in the thiolate-heme moiety [12]. Through a proton abstraction and rebound mechanism originally postulated by Groves [13], a single oxygen atom is inserted into the C–H bond, forming hydroxylated products. Dissociation of the hydroxylated product from the active site returns the catalytic cycle to the resting ferric state. The redox potential of P450 Compound I is estimated to be >1 V [14]. As such, P450 is nicknamed mother nature’s “blow torch”. An unwanted feature of the catalytic cycle is the formation of byproducts, such as superoxide and hydrogen peroxide, via uncoupling reactions (Figure 1, dashed arrows). The formation of these byproducts not only reduces the catalytic coupling efficiency (the molar ratio of the product/NADPH) but also increases the risk of oxidative stress in vivo.

The extraordinary capability of P450s to activate inert C–H bonds at ambient temperatures has attractive applications in biosynthesis, drug discovery, and bioremediation [15,16]. Furthermore, the P450-catalyzed reaction is regio- and stereoselective. These desirable properties of P450s appeal to synthetic chemists because conventional chemical synthesis requires harsh conditions to activate inert C–H bonds, often with low yield and poor regio- and stereoselectivity. The synthesis of drug metabolites was incentivized after the Federal Drug Administration (FDA) issued guidance on the safety testing of drug metabolites in 2016. According to the latest guidance issued in March 2020, disproportionate drug metabolites, referred to as metabolites found only in humans or present at higher levels in humans than in animals, should be considered for safety assessment prior to drug approval. As such, the FDA recommends additional toxicity studies of these metabolites for safety concerns. However, the synthesis of sufficient quantities of drug metabolites, which often contain chiral centers, can be highly challenging using conventional chemical synthesis. Unfortunately, human P450s are poor biocatalysts for practical applications because of their instability, low turnover rate, and high cost.

Fortunately, an ideal biocatalyst for P450-catalyzed reactions is CYP102A1. Wild-type (WT) CYP102A1 catalyzes the regio- and stereoselective hydroxylation of long-chain fatty acids at a high turnover rate of >3000 min⁻¹, the highest among all P450s [17,18,19]. Hydroxylation of long-chain fatty acids by CYP102A1 is highly coupled with a catalytic efficiency as high as 98%. Furthermore, recombinant CYP102A1 is highly expressed in bacteria and yeast at a yield of 0.5–1 g/L of the culture. CYP102A1 is a self-sufficient, highly active, and highly expressed biocatalyst for activating hydrocarbons.

The unique features of CYP102A1 have stimulated intense interest in engineering CYP102A1 to harness its catalytic prowess. Pioneered by Nobel Laureate Dr. Frances H. Arnold, a large repertoire of CYP102A1 variants has been generated by directed evolution and/or rational design to oxidize a plethora of non-natural substrates. Over 15,371 patents have been filed that utilize CYP102A1 variants in a wide range of applications. Here, we briefly review the recent advances in the understanding and application of CYP102A1, with a focus on protein engineering and design. This review is not intended to be comprehensive since several excellent reviews have been published in the past covering various aspects of CYP102A1 [6,18,20].

2. Structure and Function of CYP102A1

CYP102A1, found in Bacillus megaterium in the 1980s, is a soluble fusion form of the P450 enzyme, where its catalytic heme domain (BMP) and diflavinal reductase domain (BMR) are fused in a single peptide chain (Figure 2A). Both the BMP and BMR are highly homologous to the human P450 and POR, respectively. The catalytic properties of CYP102A1 can be attributed to its unique structural organization. The early work by Black and Martin found that CYP102A1 is not monomeric in a solution but rather oligomeric and likely a dimer [21]. It was later found that CYP102A1 is active as a homodimer at a salt concentration of >20 mM but inactive at a low salt concentration of ≤5 mM due to monomerization [22]. The salt-dependent dimerization of CYP102A1 points to an electrostatic nature responsible for dimerization.

The inactive monomeric CYP102A1 raises an interesting question about the mechanism by which the electron is transferred from NADPH to the heme. Biochemical and site-directed mutagenesis studies have proposed two models: (1) intraflavin ET followed by interchain ET to the heme and (2) interflavin ET followed by intrachain ET to the heme [22,23]. The CYP102A1 homodimer structures determined by our group have provided strong evidence in favor of Model 1 [24]. As shown in Figure 2B, the CYP102A1 dimer exists in both “closed” and “open” conformations. The N-terminal BMP and C-terminal BMR domains are connected by a linker peptide. Intraflavin ET from the FAD to the FMN occurs in closed conformation due to the proximity of the FAD and FMN within the same BMR domain. The open conformation supports the interchain ET from the FMN to the heme since the rotation of the FMN subdomain shortens the FMN–heme distance from over 40 Å to ~21 Å [24] (Figure 2B). Consistent with mammalian P450s, the electron is delivered to the heme through the proximal side of the heme, where the thiolate ligand is located, as previously reported [25]. The structures of the individual domains and ET pathways of CYP102A1 are highly homologous to mammalian counterparts. Recently, an intermolecular-crossed complex model of the CYP102A1 homodimer was proposed by Urban and Pompon based on AlphaFold2 modeling [26]. In their model, as shown in Figure 3A, the FMN domain is dimerized with the FAD domain of the opposing monomer, which would favor interflavin ET and intrachain ET to the heme (i.e., Model 2). We thus compared the density map generated from their AlphaFold2 model with our cryo-EM density map. The simulated density map (7.6 Å) for the AlphaFold2 model shows the crossed-density connectivity between the FMN and FAD domains (Figure 3A). However, as shown in Figure 3B, the density connectivity is absent in our experimental cryo-EM map (7.6 Å) at an appropriate contour threshold level of >2.0 to remove background noises between the two BMR domains, suggesting that the two BMR domains are not interconnected. This is further supported by the cryo-EM map from a deletion of the CYP102A1 variant (Δ12) obtained at 6.3 Å (Figure 3C). The Δ12 variant lacks 12 amino acid residues in the linker region (amino acids 456–478). As a result of the deletion, the map was solved at a higher resolution and clearly shows the connectivity from the BMP domain to the FMN domain and then to the FAD domain within the same monomer.

Regardless of the mode of dimerization, the active site architecture of WT CYP102A1 is tailored to accommodate long-chain fatty acids through evolution. Thus, it shows poor activity toward non-natural substrates. For instance, the promiscuous activity toward non-natural substrates such as chlorzoxazone and coumarin is more than 100-fold lower than that of long-chain fatty acids [27]. Extensive efforts have been made in the past three decades to engineer CYP102A1 for the oxidation of non-natural substrates. Most of the efforts have focused on the modification of the catalytic BMP domain, especially residues lining the active site. In addition, many peripheral residues in the BMP domain have been reported to significantly affect the catalytic activity for reasons that are not well understood. It is also recognized that residues in the linker region and BMR domain may also affect the catalytic efficiency. We reported that shortening the linker enhanced the hydroxylation of omeprazole (OMP) by the CYP102A1 A82F variant, whereas others reported that mutations in the FAD domain enhanced coupling reactions [28]. Readers are referred to the excellent review by Whitehouse and coworkers for an extended list of sites mutated to engineer CYP102A1 [18].

3. Engineering and Design of CYP102A1

Protein engineering and design require modification of protein primary sequences to achieve desirable protein properties, such as catalytic activity, stability, regio- and stereoselectivity, etc. Many methods and strategies have been applied to engineer and design CYP102A1, generating novel applications.

3.1. Laboratory Directed Evolution (LDE)

LDE is the most widely used method to engineer CYP102A1 and many other protein targets. It involves the construction of mutant libraries, selection of desirable mutants by screening and triage, and validation with a purified variant protein. To construct the libraries, mutations are introduced by random mutagenesis, targeted mutagenesis, or recombination (family shuffling between homologs). Random mutagenesis utilizes error-prone DNA polymerase to induce mutations randomly in the cDNA sequence of CYP102A1, primarily in the BMP domains. Target mutagenesis can be carried out at selected sites to saturate the mutations. Mutation saturation at the selected sites with prior knowledge can greatly enhance the efficiency in obtaining mutants with desired properties. Recombination can generate chimeric enzymes using protein homologs (usually >70% of the sequence homology). Although LDE is highly efficient for constructing mutant libraries, the burden of success depends on the screening and triage of positive hits with the desired properties.

Using LDE, a large library of novel CYP102A1 variants have been identified that carry out the oxidation of a variety of non-natural substrates, including alkane, alkene, aromatics, pharmaceuticals, drugs, etc. WT CYP102A1 is not known to oxidize alkanes. By directed evolution of the BMP domain coupled with a rapid, reproducible colorimetric screening method, novel variants of CYP102A1 were found to oxidize octane to octanol [29]. After two generations of directed evolution, the turnover rate for the oxidation of octane increased by more than five-fold compared with that of WT CYP102A1. Additional rounds of LDE with modified screening methods permitted the selection of mutant 139-3, which turned out to be the fastest alkane hydroxylase known [30]. In addition, mutant 139-3 is also capable of oxidizing a range of alkenes, including styrene, benzene, cyclohexene, 1-hexene, and propylene [31]. LDE introduced a total of 11 mutations in mutant 139-3 (V78A, H138Y, T175I, V178I, A184V, H236Q, E252G, R255S, A290V, A195T, and L353V). Out of the 11 residues, V78A was the only residue in direct contact with the substrate. This raises intriguing questions about the roles of peripheral residues.

Another promising application of engineering CYP102A1 by LDE is found in the synthesis of human drug metabolites. The synthesis of sufficient quantities of human drug metabolites for preclinical bioactivity and toxicology evaluation can be challenging using conventional methods of chemical synthesis because human drug metabolites often contain chiral centers that require stringent control of regio- and/or stereoselectivity. As reported by Sawayama and coworkers [32], a panel of 120 CYP102A1 variants was selected by a combination of genetic diversification techniques, including random mutagenesis, targeted mutagenesis, and homolog recombination, to metabolize verapamil (a calcium channel blocker) and astemizole (antihistamine, withdrawn from the market in 1999). A subset of 43 variants was active in the metabolism of verapamil, generating all six human metabolites. For example, variant D6H10 converts 78% verapamil with a turnover rate of >1500 min⁻¹, whereas variant 9-10A F87L converts 49% to a single metabolite with a 97% yield. A similar observation was made for astemizole, where a subset of 42 variants produced all seven human metabolites. It was reported by van Vugy-Lussenburg and coworkers that three generations of random mutagenesis resulted in a 200-fold increase in the turnover rate toward the drug substrate dextromethorphan and 3,4-methylenedioxymethylamphetamine [33]. Two of the mutations, F81I and E267V, which have great effects on activity, were not located in the active site. CYP102A1 variants generated by random mutagenesis were also found capable of producing chiral metabolites of simvastatin and lovastatin, two drugs prescribed to treat hyperlipidemia and hypercholesterolemia [34]. In humans, they are primarily metabolized by human CYP3A4/5 to produce several metabolites, including 6′β-hydroxy and exomethylene. The production of these metabolites can be recapitulated at high turnover rates by the M16 and M17 variants. Both variants contain multiple mutations in the BMP domain, including R47L, F81I, F87V, E143G, L188Q, and E267V in M16 and an extra mutation of E64G in M17. The same group also identified CYP102A1 variants by mutagenesis that metabolized OMP, a proton pump inhibitor used to treat gastric acid hypersecretion, to produce 5-hydroxy OMP; the reaction was, however, not stereoselective [35].

In addition to the BMP domain, LDE has also been applied to engineer the BMR domain. Unlike the oxidation of fatty acids by WT CYP102A1, a common problem with CYP102A1 variants in the oxidation of non-natural substrates is the low coupling efficiency; a large portion of consumed NADPH is non-productive, resulting in the formation of superoxide and hydrogen peroxide (Figure 1, the dashed arrows). Using a domain-based strategy where LDE was performed on each domain separately, followed by the combination of the best hits from each domain, Fasan and coworkers reported that novel variants, P450_PMOR1 and P450_PMOR2, exhibited native-like catalytic properties [28]. The two variants oxidized propane with >94% coupling efficiency. Mutations of P654K, D698G, and E1037G in the BMR domain contributed to the high coupling reaction.

LDE is, in general, effective in generating novel mutants with enhanced activity, provided that an effective screening method is available. Furthermore, it is sensitive to subtle changes in locations that are not well understood but exert great efforts on enzyme activity. However, the LDE workload can be demanding, with unpredictable outcomes. For a protein of 200 amino acids, it may be necessary to screen thousands of clones, which rapidly add up in multi-generations of evolution. As more mutant libraries are generated and the structure–function relationship of CYP102A1 is characterized in more detail, many residues that affect substrate selectivity and turnover have been recognized [36,37,38], which facilitates the rational design of CYP102A1 variants. LDE and rational design mutually benefit each other, facilitating the selection of desired mutants.

3.2. Rational Design

Based on prior knowledge of the structure–function relationship of CYP102A1, rational design can be efficient and cost-effective. Early work by Ost and coworkers rationally redesigned the substrate binding site of CYP102A1 to accommodate short-chain fatty acids (C4–C10) based on the crystal structure of WT CP102A1 in a complex with palmitoleate [39]. Mutation of the residues involved in palmitoleate binding, such as R47A/Y51F, L181K, and L181K/L75T, altered the substrate specificity to accommodate the short-chain fatty acids. The catalytic efficiency for hexanoate was increased by 15-fold in the L181K/L75T double mutant. Guided by the structural knowledge of CYP102A1, Dr. Munro’s group converted the WT CYP102A1 to a human CYP2C19-like enzyme by mutating a single residue (A82 → F) [36]. The A82F variant is capable of metabolizing OMP, a typical substrate of CYP2C19, to 5-hydroxy OMP. In humans, OMP is principally metabolized by CYP2C19 to produce 5-hydroxy OMP [40,41]. The catalytic efficiency (k_cat/K_M) of the A82F mutant for OMP hydroxylation is 78 µM⁻¹·min⁻¹. The OMP hydroxylase activity was further enhanced to 777 µM⁻¹·min⁻¹ in the A82F/F87V double mutant. Subsequent studies by the same group revealed that these variants also metabolize a range of proton pump inhibitors, including esomeprazole (S-OMP), lansoprazole, pantoprazole, and rabeprazole [42], providing new routes to the production of these metabolites. F87 is a key residue in the active site of CYP102A1; the mutation of F87 alters the binding and regioselectivity of fatty acids [18,37]. Thus, the enhancement in the hydroxylase activity by F87V can be rationalized. The role of A82F is intriguing. As documented by Butler and coworkers [36], A82 is a “gatekeeper” that controls conformation landscapes. The A82F mutation alters the landscapes in favor of the substrate-bound state. As we can see, knowledge-based design can be of great value in searching for novel variants for biotechnological applications.

3.3. Computational Protein Design (CPD)

Over the past decade, the availability of fast-speed computers and sophisticated algorithms for protein modeling has stimulated increasing attention to CPD. CPD offers researchers in silico tools to perform the redesign and de novo design of enzymes with more predictable outcomes and a higher probability of selecting the desired enzymes. It is a cost-effective approach, complementary to LDE since the time and cost spent for experimental testing are significantly reduced. CPD for enzyme design typically starts with a good understanding of the transition state (TS) or near-attack conformation (NAC) state for the substrate–enzyme complex. Then, an adequate ensemble of ligand conformers that approximate the TS/NAC state is generated in the active-site pocket of a given enzyme scaffold and used for enzyme sequence design. Next, the binding residues within the pocket are redesigned to enhance the ligand–enzyme binding affinity without destabilizing the enzyme’s active site. The designer sequences are usually triaged by an energy-based score function, which varies from one enzyme design tool to another. The designs can then be re-ranked and filtered by more computational analysis, such as molecular dynamics simulation. Regardless, the top-ranked designer enzymes need to be validated experimentally.

Recently, we presented UniDesign (https://github.com/tommyhuangthu/UniDesign, accessed on 15 May 2023) as a universal CPD framework for designing functional protein sequences. Its effectiveness has not only been demonstrated in protein–nucleic acid interaction design by accurately modeling the CRISPR/Cas9 PAM recognition [43] but also in small molecule-catalyzing enzyme design by redesigning CYP102A1 to improve its stereoselectivity for metabolizing OMP (in press). As illustrated in Figure 4, UniDesign consists of five steps: (1) define a suitable set of mutable design sites and another set of repackable sites based on the enzyme structure; (2) generate an ensemble of ligand conformers to mimic the TS/NAC state using a “grow-and-check” strategy, where the conformer generation needs to satisfy appropriate catalytic constraints defined by geometric parameters; (3) search for structure analogs to the enzyme scaffold to build an evolution-based position-specific scoring matrix (PSSM); (4) perform a simulated annealing Monte Carlo simulation to redesign/repack the chosen sites to optimize the weighted total energy of the enzyme–ligand system using a composite score function, which is a combination of the PSSM and a physics- and knowledge-based energy function (UniEF); and (5) rank and choose the top designs for experimental test.

Using UniDesign, we successfully redesigned CYP102A1 variants that hydroxylate R-OMP with high regio- and stereoselectivity. As aforementioned, Dr. Munro’s group reported that CYP102A1 mutants A82F and F87V and the double mutant (A82F/F87V) hydroxylate OMP to produce 5-hydroxy OMP [36,42]. Furthermore, a triple variant (R47L/F87V/L188Q) was reported to hydroxylate OMP to 5-hydroxy OMP [44]. However, these variants do not show stereoselectivity for R-OMP over S-OMP. OMP is administered as a racemate (50/50) consisting of R- and S-OMP. R-OMP is efficiently metabolized by CYP2C19, while S-OMP is slowly metabolized by CYP3A4 [41]. Differentiating the enantiomers would construct a truly 2C19-like CYP102A1 variant. According to the UniDesign binding energy for the two enantiomers, we selected three variants (UD1, UD2, and UD3) for experimentation. All three variants contain a triple mutant (A82F/F87V/L188Q) and one additional mutation of L75I in UD1, A264G in UD2, and A328V in UD3. Detailed characterization showed that all three variants hydroxylate R-OMP to produce 5-hydroxy OMP with high turnover rates; in marked contrast, they all exhibit low turnover rates for S-OMP. The enantiomeric excess (e.e.) values are 92, 87, and 70% for UD1, UD2, and UD3, respectively, a marked improvement over the previously reported CYP102A1 variants for OMP oxidation. The results demonstrate the effectiveness of UniDesign for engineering enzyme stereoselectivity.

In another study, the Rosetta CPD approach has been recently applied to CYP105AS1 to invert the stereoselectivity for the production of the cholesterol-lowering drug pravastatin [45]. CYP105AS1 from Amycolatopsis orientalis hydroxylates compactin (also known as mevastatin in the statin class) to 6-epi-pravastatin. To produce pravastatin, the stereoselectivity for the hydroxylation of compactin must be inverted from a pro-R to a pro-S reaction. LDE was initially adopted to engineer the stereoselectivity of CYP105AS1. More than 15,000 clones were screened and triaged in three generations of LDE. One variant carrying five mutations, referred to as P450_Prava, was selected to produce pravastatin as a major product [45,46]. To further increase the e.e. value, Ashworth and coworkers employed the CoupledMoves protocol in Rosetta to design new variants from P450_Prava. The advantage of the CoupledMoves protocol is the inclusion of protein plasticity in Rosetta, which generates an ensemble of mutant configurations. Short molecular dynamics simulations were performed to eliminate unwanted variants. In the initial round of searching, 14 positions in the active site were chosen based on the alignment analysis to homologous sequences in UniProt. As a result, 9052 low-energy designer sequences were generated, and eight out of the fourteen chosen positions showed credibility for modifying stereoselectivity to the pro-S pathway. Among them, P450_Prava T95F and P450_Prava V180M produced a roughly equal ratio of pravastatin to 6-epi-pravastatin, around 95:5. Rosetta modeling suggested that T95F and V180M were compatible, and their combination (P450_Prava T95F/V180M) produced nearly 100% pravastatin without detection of any unwanted side products.

Li and coworkers employed an integrated approach, combining CPD, evolutionary information, and experimental data-driven optimization to redesign a plant P450 enzyme CYP87D20 for the production of mogrol [47]. The plant family Cucurbitaceae biosynthesizes cucurbitacins, bitter taste compounds, and mogrosides, potent natural sweeteners, from the same cucurbitane backbone. In cucumber, the biosynthesis of cucurbitacin C is catalyzed by CYP87D20 to oxidize cucurbitadienol at C11 to form a keto product, 11C-cucurbitadienol, and then at C25 to form a hydroxylated product, 11C-20H-cucurbitadienol. In the first step toward a new metabolic pathway to biosynthesize mogrol, the authors redesigned CYP87D20 to carry out a single hydroxylation at C11 to produce 11H-cucurbitadienol. Due to the lack of crystal structures of plant CYP enzymes, a homology model was constructed using Rosetta, followed by docking cucurbitadienol into the active site. All side-chain and backbone atoms with 6 Å of the ligand were analyzed using evolutionary information by both a PSSM and their co-evolutionary couplings. In the first round of design, 24 positions with positive PSSM values were identified, resulting in 30 designs that were experimentally evaluated. The results showed that E286 and F112 were key residues for the discrimination between the hydroxylation and ketone formations at C11 of the substrate. This piece of information was then fed into the subsequent rounds of optimization to further improve specificity and activity. The authors systematically generated all protein variants in silico, where >27,000 designs were clustered and filtered. Several design variants, such as L109/F113L/E286A, showed high selectivity to produce 11H-cucurbitadienol.

The CPD methodology is clearly cost-effective in surveying a large protein sequence space for the intended properties. However, the outcome is weighed on an energy-based score function and knowledge of the enzyme–ligand complex. Many factors affect enzyme–ligand interactions, but their effects may not be fully accounted for. For instance, the effects of peripheral residues and conformational dynamics on enzyme activity are difficult to quantify in terms of energy contribution. If no experimental structure is available, CPD will depend on homology models that need validation.

3.4. Machine Learning (ML)

The “holy grail” for protein design and engineering is a fully automated process with little human intervention. The success of the Google AlphaFold project propels innovations in this direction. Through deep learning, AlphaFold2 employs a neural network with repeated layers to predict protein 3D structures from input primary sequences, resulting in astonishingly accurate predictions [48]. The goal to predict protein function seems increasingly plausible with the development of machine learning (ML) algorithms and available big data. Over the past three decades, the LDE of CYP102A1 has already generated a large amount of data, but only a tiny portion of which is deemed useful for the selection of positive clones, while unimproved and inactive mutants toward specific substrates are discarded. However, for ML models that are data-driven, both functionally improved and unimproved/inactive mutants can be useful for model training and refinement, as illustrated in Figure 5. LDE-, CPD-, and ML-based enzyme design approaches are mutually beneficial to each other, not exclusive [49,50].

There are multiple types of algorithms with different specialties having the same goal of discovering the patterns within provided datasets. The most straightforward model is linear regression, which directly correlates the input with the output. Linear algorithms are fundamental ML models typically used to test the suitability of datasets for training purposes. In most cases, a well-prepared dataset is split into at least two parts, training and testing (and sometimes having one more group for validation) [50,51,52]. There are many other advanced ML models, including regression trees, neural networking, and the Gaussian process (GP) [53,54,55]. It is not tenable to assert the superiority or inferiority of one model over another, as all models are inherently flawed. However, sometimes, some are useful.

ML algorithms are classified into supervised and unsupervised, depending on the labeling of the datasets. Supervised ML models are usually adopted for enzyme function predictions due to the presence of labeled enzyme properties, including the catalytic parameters, solubility, and thermostability. They can efficiently discover the patterns between characterized features and labels, forming a decision boundary to predict the properties of the unlabeled or designer inputs [52]. Unsupervised ML models are usually used to analyze unlabeled data so that some unknown sequence–function relationships can be found. However, it can be harder to test the accuracy of unsupervised ML models because unlabeled data may not provide good standards (decision boundaries) for classification.

In 2013, Romero and coworkers utilized the GP, a Bayesian learning technique, to navigate how protein sequence maps function, termed the “fitness landscape”, and they demonstrated that the fitness landscape could be inferred from experimental data with high accuracy [56]. Various protein sequence properties, such as the functional status, thermostability, enzyme activity, and ligand binding affinity, can be modeled using the GP. Initially, the GP model was trained on a set of 242 experimental T₅₀ data (the temperature at which half of the protein is irreversibly inactivated in 10 min) from a diverse set of chimeric P450s whose sequences were generated by combining eight sequence fragments from the heme domains of CYP102A1, CYP102A2, and CYP102A3 (the three isoforms share an ~65% pairwise sequence identity). The trained GP model showed excellent predictive power with a cross-validated Pearson correlation coefficient of 0.95 and a mean absolute error of 1.4 °C. The GP model outperformed a linear regression model weighed on individual sequence fragments. Furthermore, training directly on the experimental data allows the GP model to capture numerous implicit factors, most of which are unknown, and determine whether a sequence will encode a functional P450 enzyme. The trained model predicted 20 sequences for functional enzymes, 17 out of which were experimentally confirmed. Building on this set of sequences, the second-round design produced nine out of ten sequences. Next, a total of 29 functional enzymes were expressed and characterized. The experimental data were used in the GP model to predict the enzyme activity and ligand binding affinity. After a few rounds of optimization, the GP model predicted sequences encoding for highly thermostable enzymes. The most stable enzyme, EXPc5, had a T₅₀ of 69.5 °C, higher than any previously identified variants through directed evolution [56,57].

Although the application of ML models to P450 enzymes is scarce, the number of ML applications to other enzyme targets has mushroomed [51,58]. Fox and coworkers used a partial least-square regression algorithm to identify beneficial mutants in a bacterial halohydrin dehalogenase, leading to an ~4000-fold increase in volumetric productivity [59]. This met the criteria for the commercially relevant process to produce atorvastatin. Liao and coworkers combined high-throughput gene synthesis and ML-based algorithms to engineer proteinase K [60]. Through two cycles of training and testing, a 20-fold increase in proteinase K activity was achieved after testing only 95 variants. Du and coworkers used ML to assist in the design of new antioxidants [61]. Han et al. reported an ML application to improve protein solubility and activity [62].

3.5. Chemical Engineering

The aforementioned strategies are aimed at tuning the protein properties to fit substrates of interest. Conversely, chemical engineering optimizes substrate oxidation not by altering the enzyme but rather by fitting the substrate of interest into the active site of the WT CYP102A1 with a decoy molecule system [63,64] (Figure 6). Taking advantage of the high oxidizing potential of Compound I, decoy molecules facilitate the generation of Compound I and the positioning of the substrate close to the heme iron for oxidation. Assisted with perfluorinated carboxylic acids as decoy molecules, WT CYP102A1 is capable of oxidizing gaseous alkanes, such as propane and butane, as well as benzene and toluene [64,65]. Recently, WT CYP102A1 was reported to oxidize small gaseous methane to methanol at room temperature at 10 MPa [66]. It is notoriously difficult to functionalize methane; the small size of methane does not fit well into the active site of CYP102A1. Using the decoy molecule (3CHPA-Pip-Phe) that mimics N-palmitoylglycine, the authors were able to sequester methane in the proximity of the heme iron for oxidation.

4. Concluding Remarks

Protein design and engineering have transformed CYP102A1 from a fatty acid hydroxylase with high specificity to a highly versatile biocatalyst capable of oxidizing a wide range of non-natural substrates. The utility of CYP102A1 variants has extended to many areas of applications, such as drug discovery, biotechnology, biosynthesis, bioremediation, and beyond. LDE and structured-based CPD methods play a major role in the design and engineering of CYP102A1 for this remarkable transformation. LDE is relatively straightforward to implement, but the outcome depends on rapid, reproducible screening methods, and the workload can be daunting. CPD is cost-effective but requires solid structure–function modeling to effectively design functional variants. Thanks to the functional and structural characterization of the WT and variants of CYP102A1, a wealth of information is already available, which undoubtedly facilitates the CPD approaches. Nonetheless, the factors that affect enzyme activity are complex and not fully understood. Moreover, the protein sequence space is vast, and the functional variant constitutes only a tiny fraction of the sequence space. Thus, the energy-based score function employed by CPD may not fully account for all these factors. The structure–function relationship of CYP102A1 merits further investigation, and we need a holistic approach to understand the molecular machinery of CYP102A1. One of the main challenges for the use of CYP102A1 on the preparative scale is to improve the coupling efficiency, which requires careful consideration of the interactions between the BMP and BMR domains. The cryo-EM structures of the full-length CYP102A1 homodimer show that the CYP102A1 homodimer is highly dynamic, and the ET from the BMR domain to the heme domain involves significant conformational changes, contributing to the overall coupling. The more information available about the structure and function of CYP102A1 mutants, the better ML models can be constructed to predict new variants with enhanced enzyme properties. The use of ML to design and engineer enzymes is a new field with tremendous potential. The success of ML depends on experimental big data and specific algorithms to augment its predictive power. ML for CYP102A1 engineering is in its infancy, with one successful application to improve the thermostability of chimeric P450 enzymes, but the outcome is astonishing. We anticipate that more ML applications to CYP102A1 will emerge in the future.

Author Contributions

Conceptualization, X.H. and H.Z.; writing—original draft preparation, Y.S.; writing—review and editing, X.H., H.Z., Y.O. and Y.E.C.; project administration, X.H. and H.Z.; funding acquisition, X.H., H.Z., Y.O. and Y.E.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by grants from the National Institutes of Health (GM149016 to X.H., GM077430 to Y.O., HL159900, HL147527, HL159871 to Y.E.C., and ES030791 to H.Z.) and the University of Michigan Pandemic Research Recovery fund (to H.Z.).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank the University of Michigan Advanced Research Computing for providing computing resources.

Conflicts of Interest

The authors declare no conflict of interest.

References

Nelson, D.R. Cytochrome P450 diversity in the tree of life. Biochim. Biophys. Acta Proteins Proteom. 2018, 1866, 141–154. [Google Scholar] [CrossRef] [PubMed]
Brodie, B.B.; Axelrod, J.; Cooper, J.R.; Gaudette, L.; La Du, B.N.; Mitoma, C.; Udenfriend, S. Detoxication of drugs and other foreign compounds by liver microsomes. Science 1955, 121, 603–604. [Google Scholar] [CrossRef]
Klingenberg, M. Pigments of rat liver microsomes. Arch. Biochem. Biophys. 1958, 75, 376–386. [Google Scholar] [CrossRef] [PubMed]
Omura, T.; Sato, R. Fractional solubilization of haemoproteins and partial purification of carbon monoxide-binding cytochrome from liver microsomes. Biochim. Biophys. Acta 1963, 71, 224–226. [Google Scholar] [CrossRef] [PubMed]
Guengerich, F.P. Human Cytochrome P450 Enzyme. In Cytochrome P450: Structure, Mechanism, and Biochemistry; de Montellano, P.R., Ed.; Plenum: New York, NY, USA, 1995; pp. 473–535. [Google Scholar]
Dunham, N.P.; Arnold, F.H. Nature’s Machinery, Repurposed: Expanding the Repertoire of Iron-Dependent Oxygenases. ACS Catal. 2020, 10, 12239–12255. [Google Scholar] [CrossRef]
White, R.E.; Coon, M.J. Oxygen activation by cytochrome P-450. Annu. Rev. Biochem. 1980, 49, 315–356. [Google Scholar] [CrossRef] [PubMed]
Sligar, S.G. Coupling of spin, substrate, and redox equilibria in cytochrome P450. Biochemistry 1976, 15, 5399–5406. [Google Scholar] [CrossRef]
Sligar, S.G.; Cinti, D.L.; Gibson, G.G.; Schenkman, J.B. Spin state control of the hepatic cytochrome P450 redox potential. Biochem. Biophys. Res. Commun. 1979, 90, 925–932. [Google Scholar] [CrossRef]
Zhang, H.; Gruenke, L.; Arscott, D.; Shen, A.; Kasper, C.; Harris, D.L.; Glavanovich, M.; Johnson, R.; Waskell, L. Determination of the rate of reduction of oxyferrous cytochrome P450 2B4 by 5-deazariboflavin adenine dinucleotide T491V cytochrome P450 reductase. Biochemistry 2003, 42, 11594–11603. [Google Scholar] [CrossRef]
Guengerich, F.P.; Ballou, D.P.; Coon, M.J. Purified liver microsomal cytochrome P-450. Electron-accepting properties and oxidation-reduction potential. J. Biol. Chem. 1975, 250, 7405–7414. [Google Scholar] [CrossRef]
Rittle, J.; Green, M.T. Cytochrome P450 compound I: Capture, characterization, and C-H bond activation kinetics. Science 2010, 330, 933–937. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Groves, J.T.; McClusky, G.A. Aliphatic hydroxylation via oxygen rebound. Oxygen transfer catalyzed by iron. J. Am. Chem. Soc. 1976, 98, 859–861. [Google Scholar] [CrossRef]
Koppenol, W.H. Oxygen activation by cytochrome p450: A thermodynamic analysis. J. Am. Chem. Soc. 2007, 129, 9686–9690. [Google Scholar] [CrossRef] [PubMed]
Bloom, J.D.; Meyer, M.M.; Meinhold, P.; Otey, C.R.; MacMillan, D.; Arnold, F.H. Evolving strategies for enzyme engineering. Curr. Opin. Struct. Biol. 2005, 15, 447–452. [Google Scholar] [CrossRef] [PubMed]
Caswell, J.M.; O’Neill, M.; Taylor, S.J.; Moody, T.S. Engineering and application of P450 monooxygenases in pharmaceutical and metabolite synthesis. Curr. Opin. Chem. Biol. 2013, 17, 271–275. [Google Scholar] [CrossRef]
Narhi, L.O.; Fulco, A.J. Characterization of a catalytically self-sufficient 119,000-dalton cytochrome P-450 monooxygenase induced by barbiturates in Bacillus megaterium. J. Biol. Chem. 1986, 261, 7160–7169. [Google Scholar] [CrossRef]
Whitehouse, C.J.; Bell, S.G.; Wong, L.L. P450(BM3) (CYP102A1): Connecting the dots. Chem. Soc. Rev. 2012, 41, 1218–1260. [Google Scholar] [CrossRef]
Thistlethwaite, S.; Jeffreys, L.N.; Girvan, H.M.; McLean, K.J.; Munro, A.W. A Promiscuous Bacterial P450: The Unparalleled Diversity of BM3 in Pharmaceutical Metabolism. Int. J. Mol. Sci. 2021, 22, 1380. [Google Scholar] [CrossRef]
Munro, A.W.; Girvan, H.M.; Mason, A.E.; Dunford, A.J.; McLean, K.J. What makes a P450 tick? Trends Biochem. Sci. 2013, 38, 140–150. [Google Scholar] [CrossRef]
Black, S.D.; Martin, S.T. Evidence for conformational dynamics and molecular aggregation in cytochrome P450 102 (BM-3). Biochemistry 1994, 33, 12056–12062. [Google Scholar] [CrossRef]
Kitazume, T.; Haines, D.C.; Estabrook, R.W.; Chen, B.; Peterson, J.A. Obligatory intermolecular electron-transfer from FAD to FMN in dimeric P450BM-3. Biochemistry 2007, 46, 11892–11901. [Google Scholar] [CrossRef] [PubMed]
Neeli, R.; Girvan, H.M.; Lawrence, A.; Warren, M.J.; Leys, D.; Scrutton, N.S.; Munro, A.W. The dimeric form of flavocytochrome P450 BM3 is catalytically functional as a fatty acid hydroxylase. FEBS Lett. 2005, 579, 5582–5588. [Google Scholar] [CrossRef] [PubMed]
Su, M.; Chakraborty, S.; Osawa, Y.; Zhang, H. Cryo-EM reveals the architecture of the dimeric cytochrome P450 CYP102A1 enzyme and conformational changes required for redox partner recognition. J. Biol. Chem. 2020, 295, 1637–1645. [Google Scholar] [CrossRef]
Sevrioukova, I.F.; Li, H.; Zhang, H.; Peterson, J.A.; Poulos, T.L. Structure of a cytochrome P450-redox partner electron-transfer complex. Proc. Natl. Acad. Sci. USA 1999, 96, 1863–1868. [Google Scholar] [CrossRef] [PubMed]
Urban, P.; Pompon, D. Confrontation of AlphaFold models with experimental structures enlightens conformational dynamics supporting CYP102A1 functions. Sci. Rep. 2022, 12, 15982. [Google Scholar] [CrossRef] [PubMed]
Kang, J.Y.; Kim, S.Y.; Kim, D.; Kim, D.H.; Shin, S.M.; Park, S.H.; Kim, K.H.; Jung, H.C.; Pan, J.G.; Joung, Y.H.; et al. Characterization of diverse natural variants of CYP102A1 found within a species of Bacillus megaterium. AMB Express 2011, 1, 1. [Google Scholar] [CrossRef] [Green Version]
Fasan, R.; Chen, M.M.; Crook, N.C.; Arnold, F.H. Engineered alkane-hydroxylating cytochrome P450(BM3) exhibiting nativelike catalytic properties. Angew. Chem. Int. Ed. Engl. 2007, 46, 8414–8418. [Google Scholar] [CrossRef]
Farinas, E.T.; Schwaneberg, U.; Glieder, A.; Arnold, F.H. Directed evolution of a cytochrome P450 monooxygenase for alkane oxidation. Adv. Synth. Catal. 2001, 343, 601–606. [Google Scholar] [CrossRef]
Glieder, A.; Farinas, E.T.; Arnold, F.H. Laboratory evolution of a soluble, self-sufficient, highly active alkane hydroxylase. Nat. Biotechnol. 2002, 20, 1135–1139. [Google Scholar] [CrossRef]
Farinas, E.T.; Alcalde, M.; Arnold, F. Alkene epoxidation catalyzed by cytochrome P450BM-3 139-3. Tetrahedron 2004, 60, 525–528. [Google Scholar] [CrossRef]
Sawayama, A.M.; Chen, M.M.; Kulanthaivel, P.; Kuo, M.S.; Hemmerle, H.; Arnold, F.H. A panel of cytochrome P450 BM3 variants to produce drug metabolites and diversify lead compounds. Chemistry 2009, 15, 11723–11729. [Google Scholar] [CrossRef] [PubMed] [Green Version]
van Vugt-Lussenburg, B.M.; Stjernschantz, E.; Lastdrager, J.; Oostenbrink, C.; Vermeulen, N.P.; Commandeur, J.N. Identification of critical residues in novel drug metabolizing mutants of cytochrome P450 BM3 using random mutagenesis. J. Med. Chem. 2007, 50, 455–461. [Google Scholar] [CrossRef]
Kim, K.H.; Kang, J.Y.; Kim, D.H.; Park, S.H.; Park, S.H.; Kim, D.; Park, K.D.; Lee, Y.J.; Jung, H.C.; Pan, J.G.; et al. Generation of human chiral metabolites of simvastatin and lovastatin by bacterial CYP102A1 mutants. Drug Metab. Dispos. 2011, 39, 140–150. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Ryu, S.H.; Park, B.Y.; Kim, S.Y.; Park, S.H.; Jung, H.J.; Park, M.; Park, K.D.; Ahn, T.; Kang, H.S.; Yun, C.H. Regioselective hydroxylation of omeprazole enantiomers by bacterial CYP102A1 mutants. Drug Metab. Dispos. 2014, 42, 1493–1497. [Google Scholar] [CrossRef] [Green Version]
Butler, C.F.; Peet, C.; Mason, A.E.; Voice, M.W.; Leys, D.; Munro, A.W. Key mutations alter the cytochrome P450 BM3 conformational landscape and remove inherent substrate bias. J. Biol. Chem. 2013, 288, 25387–25399. [Google Scholar] [CrossRef] [Green Version]
Noble, M.A.; Miles, C.S.; Chapman, S.K.; Lysek, D.A.; MacKay, A.C.; Reid, G.A.; Hanzlik, R.P.; Munro, A.W. Roles of key active-site residues in flavocytochrome P450 BM3. Biochem. J. 1999, 339 Pt 2, 371–379. [Google Scholar] [CrossRef] [PubMed]
Dietrich, M.; Do, T.A.; Schmid, R.D.; Pleiss, J.; Urlacher, V.B. Altering the regioselectivity of the subterminal fatty acid hydroxylase P450 BM-3 towards gamma- and delta-positions. J. Biotechnol. 2009, 139, 115–117. [Google Scholar] [CrossRef]
Ost, T.W.; Miles, C.S.; Murdoch, J.; Cheung, Y.; Reid, G.A.; Chapman, S.K.; Munro, A.W. Rational re-design of the substrate binding site of flavocytochrome P450 BM3. FEBS Lett. 2000, 486, 173–177. [Google Scholar] [CrossRef]
Yamazaki, H.; Inoue, K.; Shaw, P.M.; Checovich, W.J.; Guengerich, F.P.; Shimada, T. Different contributions of cytochrome P450 2C19 and 3A4 in the oxidation of omeprazole by human liver microsomes: Effects of contents of these two forms in individual human samples. J. Pharmacol. Exp. Ther. 1997, 283, 434–442. [Google Scholar]
Li, X.Q.; Weidolf, L.; Simonsson, R.; Andersson, T.B. Enantiomer/enantiomer interactions between the S- and R- isomers of omeprazole in human cytochrome P450 enzymes: Major role of CYP2C19 and CYP3A4. J. Pharmacol. Exp. Ther. 2005, 315, 777–787. [Google Scholar] [CrossRef] [Green Version]
Butler, C.F.; Peet, C.; McLean, K.J.; Baynham, M.T.; Blankley, R.T.; Fisher, K.; Rigby, S.E.; Leys, D.; Voice, M.W.; Munro, A.W. Human P450-like oxidation of diverse proton pump inhibitor drugs by ‘gatekeeper’ mutants of flavocytochrome P450 BM3. Biochem. J. 2014, 460, 247–259. [Google Scholar] [CrossRef] [PubMed]
Huang, X.; Zhou, J.; Yang, D.; Zhang, J.; Xia, X.; Chen, Y.E.; Xu, J. Decoding CRISPR–Cas PAM recognition with UniDesign. Brief. Bioinform. 2023, 24, bbad133. [Google Scholar] [CrossRef]
Jang, H.H.; Ryu, S.H.; Le, T.K.; Doan, T.T.; Nguyen, T.H.; Park, K.D.; Yim, D.E.; Kim, D.H.; Kang, C.K.; Ahn, T.; et al. Regioselective C-H hydroxylation of omeprazole sulfide by Bacillus megaterium CYP102A1 to produce a human metabolite. Biotechnol. Lett. 2017, 39, 105–112. [Google Scholar] [CrossRef] [PubMed]
Ashworth, M.A.; Bombino, E.; de Jong, R.M.; Wijma, H.J.; Janssen, D.B.; McLean, K.J.; Munro, A.W. Computation-Aided Engineering of Cytochrome P450 for the Production of Pravastatin. ACS Catal. 2022, 12, 15028–15044. [Google Scholar] [CrossRef]
McLean, K.J.; Hans, M.; Meijrink, B.; van Scheppingen, W.B.; Vollebregt, A.; Tee, K.L.; van der Laan, J.M.; Leys, D.; Munro, A.W.; van den Berg, M.A. Single-step fermentative production of the cholesterol-lowering drug pravastatin via reprogramming of Penicillium chrysogenum. Proc. Natl. Acad. Sci. USA 2015, 112, 2847–2852. [Google Scholar] [CrossRef]
Li, D.; Ma, Y.; Zhou, Y.; Gou, J.; Zhong, Y.; Zhao, L.; Han, L.; Ovchinnikov, S.; Ma, L.; Huang, S.; et al. A structural and data-driven approach to engineering a plant cytochrome P450 enzyme. Sci. China Life Sci. 2019, 62, 873–882. [Google Scholar] [CrossRef] [PubMed]
Senior, A.W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Zidek, A.; Nelson, A.W.R.; Bridgland, A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710. [Google Scholar] [CrossRef]
Athavale, S.V.; Gao, S.; Das, A.; Mallojjala, S.C.; Alfonzo, E.; Long, Y.; Hirschi, J.S.; Arnold, F.H. Enzymatic Nitrogen Insertion into Unactivated C-H Bonds. J. Am. Chem. Soc. 2022, 144, 19097–19105. [Google Scholar] [CrossRef]
Wu, Z.; Kan, S.B.J.; Lewis, R.D.; Wittmann, B.J.; Arnold, F.H. Machine learning-assisted directed protein evolution with combinatorial libraries. Proc. Natl. Acad. Sci. USA 2019, 116, 8852–8858. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Yang, K.K.; Wu, Z.; Arnold, F.H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 2019, 16, 687–694. [Google Scholar] [CrossRef] [Green Version]
Mazurenko, S.; Prokop, Z.; damborsky, J. Machine learning in enxyme engineering. ACS Catal. 2020, 10, 1210–1223. [Google Scholar] [CrossRef]
Yang, Y.; Niroula, A.; Shen, B.; Vihinen, M. PON-Sol: Prediction of effects of amino acid substitutions on protein solubility. Bioinformatics 2016, 32, 2032–2034. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Zhang, S.; Zhou, J.; Hu, H.; Gong, H.; Chen, L.; Cheng, C.; Zeng, J. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 2016, 44, e32. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Defresne, M.; Barbe, S.; Schiex, T. Protein Design with Deep Learning. Int. J. Mol. Sci. 2021, 22, 1741. [Google Scholar] [CrossRef] [PubMed]
Romero, P.A.; Krause, A.; Arnold, F.H. Navigating the protein fitness landscape with Gaussian processes. Proc. Natl. Acad. Sci. USA 2013, 110, E193–E201. [Google Scholar] [CrossRef]
Salazar, O.; Cirino, P.C.; Arnold, F.H. Thermostabilization of a cytochrome p450 peroxygenase. ChemBioChem 2003, 4, 891–893. [Google Scholar] [CrossRef]
Xu, Y.T.; Verma, D.; Sheridan, R.P.; Liaw, A.; Ma, J.S.; Marshall, N.M.; McIntosh, J.; Sherer, E.C.; Svetnik, V.; Johnston, J.M. Deep Dive into Machine Learning Models for Protein Engineering. J. Chem. Inf. Model. 2020, 60, 2773–2790. [Google Scholar] [CrossRef]
Fox, R.J.; Davis, S.C.; Mundorff, E.C.; Newman, L.M.; Gavrilovic, V.; Ma, S.K.; Chung, L.M.; Ching, C.; Tam, S.; Muley, S.; et al. Improving catalytic function by ProSAR-driven enzyme evolution. Nat. Biotechnol. 2007, 25, 338–344. [Google Scholar] [CrossRef]
Liao, J.; Warmuth, M.K.; Govindarajan, S.; Ness, J.E.; Wang, R.P.; Gustafsson, C.; Minshull, J. Engineering proteinase K using machine learning and synthetic genes. BMC Biotechnol. 2007, 7, 16. [Google Scholar] [CrossRef] [Green Version]
Du, S.; Wang, X.; Wang, R.; Lu, L.; Luo, Y.; You, G.; Wu, S. Machine-learning-assisted molecular design of phenylnaphthylamine-type antioxidants. Phys. Chem. Chem. Phys. 2022, 24, 13399–13410. [Google Scholar] [CrossRef]
Han, X.; Ning, W.; Ma, X.; Wang, X.; Zhou, K. Improving protein solubility and activity by introducing small peptide tags designed with machine learning models. Metab. Eng. Commun. 2020, 11, e00138. [Google Scholar] [CrossRef] [PubMed]
Stanfield, J.K.; Shoji, O. The Power of Deception: Using Decoy Molecules to Manipulate P450BM3 Biotransformations. Chem. Lett. 2021, 50, 2025–2031. [Google Scholar] [CrossRef]
Zilly, F.E.; Acevedo, J.P.; Augustyniak, W.; Deege, A.; Hausig, U.W.; Reetz, M.T. Tuning a P450 Enzyme for Methane Oxidation. Angew. Chem. Int. Edit 2011, 50, 2720–2724. [Google Scholar] [CrossRef] [PubMed]
Shoji, O.; Kunimatsu, T.; Kawakami, N.; Watanabe, Y. Highly Selective Hydroxylation of Benzene to Phenol by Wild-type Cytochrome P450BM3 Assisted by Decoy Molecules. Angew. Chem. Int. Edit 2013, 52, 6606–6610. [Google Scholar] [CrossRef]
Ariyasu, S.; Yonemura, K.; Kasai, C.; Aiba, Y.; Onoda, H.; Shisaka, Y.; Sugimoto, H.; Tosha, T.; Kubo, M.; Kamachi, T.; et al. Catalytic oxidation of methane by wild-type cytochrome P450BM3 with chemicallly evolved decoy molecules. ACS Catal. 2023, 13, 8613–8623. [Google Scholar] [CrossRef]

Figure 1. P450 catalytic cycle. RH, substrate; Cys, thiolate ligand. The three dashed lines indicate potential routes of uncoupling reactions leading to the formation of byproducts like superoxide and hydrogen peroxide.

Figure 2. The primary structure architecture of CYP102A1 (A) and the “closed” and “open” conformations of the full-length CYP102A1 homodimer (B). The cryo-EM structure of full-length CYP102A1 was used to generate the models (EMDB: 20785 and 20786). Chain A is colored silver, while the three domains in chain B are colored using the same color scheme as (A). The chain in which the heme, FMN, and FAD cofactors are associated is shown in parentheses. The blue arrow indicates electron transfer from FMN to the heme. BMP, the catalytic heme domain; BMR, diflavinal reductase domain.

Figure 3. Comparison of the AlphaFold2 model for CYP102A1 homodimer with the cryo-EM models. The two chains of the homodimer are colored grey and orange, respectively. (A) The AlphaFold2 model showing the BMR domain. The backbone is shown in ribbons. The density map, displayed on a transparent surface, was generated at 7.6 Å using Molmap in ChimeraX. The green arrows show the densities connecting the FMN and FAD domains in the intermolecular-crossed complex. (B) Cryo-EM map of the full-length CYP102A1 homodimer at 7.6 Å showing the densities from the BMR domains only (EMDB: 20785). The contour threshold is 2.1 above the background. (C) Cryo-EM map of the Δ12 CYP102A1 homodimer at 6.3 Å showing both the BMP and BMR domains (EMDB: 21100). The dashed green arrows show the linker connecting the BMP and the FMN domains and the hinge connecting the FMN domain and FAD domain within the CYP102A1 monomer.

Figure 4. The UniDesign workflow for the computational enzyme design. The details of the five steps are described in the main text.

Figure 5. Directed evolution and machine learning workflows for enzyme engineering. For directed evolution, random or targeted mutations were introduced into parent sequences, generating libraries of mutants. After screening for function, improved variants will be selected for the next round of mutagenesis or used for biosynthetic purposes, and unimproved variants will be discarded. For machine learning models, both improved and unimproved variants will be for modeling training and testing. The machine learning-designed sequences can undergo further in silico screening and ranking before experimentation.

Figure 6. Illustration of the decoy molecule system for oxidation of small molecule substrate by WT CYP102A1. The small light-blue circle represents the small molecule substrate, methane, whereas Cys-S represents the thiolate ligand to the heme iron. The decoy molecule (3CHPA-Pip-Phe) is positioned at the entrance of the substrate’s binding pocket, facilitating the binding of methane.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sun, Y.; Huang, X.; Osawa, Y.; Chen, Y.E.; Zhang, H. The Versatile Biocatalyst of Cytochrome P450 CYP102A1: Structure, Function, and Engineering. Molecules 2023, 28, 5353. https://doi.org/10.3390/molecules28145353

AMA Style

Sun Y, Huang X, Osawa Y, Chen YE, Zhang H. The Versatile Biocatalyst of Cytochrome P450 CYP102A1: Structure, Function, and Engineering. Molecules. 2023; 28(14):5353. https://doi.org/10.3390/molecules28145353

Chicago/Turabian Style

Sun, Yudong, Xiaoqiang Huang, Yoichi Osawa, Yuqing Eugene Chen, and Haoming Zhang. 2023. "The Versatile Biocatalyst of Cytochrome P450 CYP102A1: Structure, Function, and Engineering" Molecules 28, no. 14: 5353. https://doi.org/10.3390/molecules28145353

Article Menu

The Versatile Biocatalyst of Cytochrome P450 CYP102A1: Structure, Function, and Engineering

Abstract

1. Introduction

2. Structure and Function of CYP102A1

3. Engineering and Design of CYP102A1

3.1. Laboratory Directed Evolution (LDE)

3.2. Rational Design

3.3. Computational Protein Design (CPD)

3.4. Machine Learning (ML)

3.5. Chemical Engineering

4. Concluding Remarks

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI