1. Introduction
Diosgenin, a well-known steroidal sapogenin, is not only an important and valuable raw material for the production of many steroid drugs [1] but is also of great pharmaceutical value itself [2,3,4,5,6]. Dioscin, the main form of diosgenin in plants, consists of diosgenin (a 27-carbon steroidal skeleton) with an oligosaccharide attached at the C3 hydroxy group. The water solubility and efficacy of dioscin are often greatly reduced after hydrolysis of the sugar side chain. In nature, diosgenin is distributed in many plants, but it is mainly Dioscorea plants that can meet the requirements for industrial-scale extraction. About 137 Dioscorea species contain diosgenin, and in more than 30 of them the diosgenin content exceeds 1% of dry weight [3]. The highest recorded content, 16.15%, was found in a single Dioscorea zingiberensis rhizome in China [7].
Previous studies have shown that cholesterol, whose 27-carbon skeleton most closely matches that of diosgenin (Figure 1b,c), is the main precursor of diosgenin biosynthesis [6,7,8,9,10,11,12]. It is generally recognized that the basic 17-carbon perhydrocyclopentanophenanthrene skeleton (Figure 1a) is synthesized starting from acetyl coenzyme A (acetyl-CoA). 2,3-Oxidosqualene can be produced through the mevalonic acid (MVA) pathway or the methylerythritol phosphate (MEP) pathway. Subsequently, 2,3-oxidosqualene is channeled into the cholesterol biosynthesis pathway by cycloartenol synthase (CAS) and finally converted to cholesterol by a series of enzymes [13,14,15]. Many in-depth studies have been conducted on the metabolism of cholesterol [13] and its derivatives, including phytosterols, steroidal glycoalkaloids (SGAs) [16,17] and brassinosteroids [12]. To produce diosgenin, a series of hydroxylation, oxidation and ring-formation reactions occur sequentially on the side chain of cholesterol, catalyzed by cytochrome P450 monooxygenases (CYP450s) [10,11]. Meanwhile, UDP-dependent glycosyltransferases (UGTs) are involved in the glycosylation of hydroxy groups, mainly at the C3 position [18,19]. De novo synthesis of diosgenin in yeast was scaled up from the milligram to the gram level through metabolic engineering strategies aimed at the synthesis of cholesterol precursors [20]. The cholesterol biosynthesis network has also been elucidated in Paris polyphylla, and a high-efficiency cholesterol synthesis system was established that improved the efficiency of diosgenin production in a plant chassis [21].
It has been reported that furostanol I [22], protodioscin and sitosterol [23] are intermediates in diosgenin biosynthesis. Furostanol I has a typical 27-carbon skeleton with an E ring formed by an ether bridge between C16 and C22 (C16-O-C22, Figure 1c). A similar compound has also been found in the biosynthesis of SGAs from cholesterol in solanaceous plants. In that process, several CYP450s, including GAME7/PGA2, GAME8/PGA1, GAME11 and GAME6, sequentially catalyze the hydroxylation and oxidation of cholesterol at the C22, C26 and C16 positions, finally forming the furostanol-type aglycone [16]. DWF4 (CYP90B1) in Arabidopsis thaliana [23], CYP90B2 in tomato (Solanum lycopersicum) and DzCYP90B71 (D. zingiberensis) were previously reported to be cholesterol C22-hydroxylases [11,24]. Another report demonstrated that PpCYP90G4 in Himalayan paris (Paris polyphylla) and TfCYP90B50 in fenugreek (Trigonella foenum-graecum L.) encode sterol C16,22-dihydroxylases in the diosgenin biosynthesis pathway, acting together with several C26-hydroxylases, including PpCYP94D108, PpCYP94D109, PpCYP72A616, TfCYP72A613 and TfCYP82J17 [10].
DzCYP90G6 in D. zingiberensis (a monocot) encodes a C16-hydroxylase, unlike its homolog PpCYP90G4, which acts as a C16,22-dihydroxylase. PGA1 and PGA2 in potato (Solanum tuberosum) belong to the CYP72A subfamily and have been reported to be C26- and C22-hydroxylases, respectively. These findings suggest that CYP450s of the CYP72A subfamily play a crucial role in diosgenin biosynthesis. Recently, a chromosome-scale genome (629 Mb) of D. zingiberensis was generated [25]. Based on that work, a genome-wide analysis of CYP72A genes in D. zingiberensis was performed: a total of 25 CYP72A genes were identified in silico, and nine of them were reported to be highly correlated with diosgenin metabolism [26].
Tobacco (Nicotiana tabacum L.), a model plant with an endogenous cholesterol biosynthesis pathway, can supply sufficient precursors for diosgenin synthesis [21]. In addition, as a eukaryote, tobacco provides the cellular environment that cytochrome P450 enzymes require for activity. It is therefore well suited to studying CYP450 candidates related to the diosgenin biosynthesis pathway. Indeed, the tobacco transient expression system has been used for high-throughput functional characterization of unknown CYP450s and elucidated the diosgenin biosynthesis pathway in Himalayan paris and fenugreek for the first time [9]. In the present study, a high-throughput functional characterization platform was built by co-expressing DzCYP90B71 (22R-oxidation) and DzCYP90G6 (C16-oxidation) in tobacco, which can convert cholesterol into 16,22-dihydroxycholesterol to supply the substrate for a C26-hydroxylase. Using this platform, candidate CYP450s were screened, and the DzCYP72A12-4 gene was finally cloned and identified as a sterol C26-hydroxylase in D. zingiberensis. The amount of diosgenin in the transgenic tobacco plants was determined by ultra-performance liquid chromatography (Vanquish UPLC, Thermo Fisher Scientific, Bremen, Germany) coupled with tandem mass spectrometry (Q Exactive Hybrid Quadrupole-Orbitrap Mass Spectrometer, Thermo Fisher Scientific, Bremen, Germany) (UPLC–MS/MS), using a Hypersil GOLD C18 column (2.1 mm × 100 mm, 3 μm, Thermo Fisher Scientific, Bremen, Germany). In addition, several physiological indexes of the transgenic diosgenin-producing tobacco plants were measured to explore their drought adaptability.
3. Discussion
Since the beginning of the steroidal pharmaceutical industry in the first half of the 20th century, diosgenin has been an important starting material for the industrial-scale synthesis of steroidal drugs. Among the various sapogenins, diosgenin is prized for its suitable structure and for sources abundant enough to meet industrial-scale demand [28]. Recently, many studies have found that diosgenin itself has therapeutic effects on a variety of diseases, such as cancers [29], inflammatory and metabolic diseases [30], diabetes [5], hyperlipidemia [31], atherosclerosis [32] and cardiac diseases [33]. With its increasing use in the pharmaceutical industry and in therapeutic treatment, the demand for diosgenin is rising rapidly. However, the sources of diosgenin remain unchanged, and the germplasm resources have not been significantly improved. This rapidly increasing demand places great pressure on industrial extraction and, at the same time, gives rise to problems such as environmental pollution and germplasm degradation.
Increasing the diosgenin content biosynthesized in plants could both alleviate the economic pressure on the industrial extraction of diosgenin and avoid the policy risks posed by environmental issues. Great progress has been made in exploring the diosgenin biosynthesis pathway, and many cytochrome P450s, for instance members of the CYP90B, CYP90G, CYP94 and CYP72A families, have been found to catalyze the hydroxylation and oxidation of cholesterol to produce diosgenin [9,10]. As the last step in diosgenin production, C26-oxidation (also called end-of-tail hydroxylation) can be catalyzed by any one of several distinct CYP450s, such as PpCYP72A616, PpCYP94D108 and PpCYP94D109 in Himalayan paris or TfCYP72A613 and TfCYP82J17 in fenugreek, which reflects relatively loose specificity between substrates and enzymes at this step. However, none of the CYP72As, a large CYP450 subfamily that plays crucial roles in catalyzing various sterol biosynthesis reactions, had been identified as a diosgenin C26-hydroxylase in D. zingiberensis.
In our previous work, we published a high-quality chromosome-scale genome (629 Mb) of D. zingiberensis [25], which revealed that the DzCYP72A genes are physically adjacent to each other on chromosomes LG01 and LG09, forming two gene clusters and suggesting that the DzCYP72As are closely related in function. The genome-wide analysis in D. zingiberensis identified twenty-five DzCYP72A genes, nine of which were found to correlate with diosgenin content [26]. In addition, PpCYP72A616 and TfCYP72A613 have been characterized as C26-hydroxylases/oxidases for diosgenin biosynthesis, so their CYP72A homologs in D. zingiberensis may be inferred to have similar catalytic activity.
Screening for the specific CYP72As that participate in diosgenin biosynthesis is meaningful but challenging work, given the high structural and functional similarity of these CYP450s. A phylogenetic tree of C26-hydroxylases/oxidases was constructed to narrow down the number of CYP450 candidates involved in diosgenin biosynthesis. Then, 47 protein sequences described in previous work were chosen for further analysis (Table S1). The phylogenetic analysis showed that the DzCYP72As fell into two clades (Clade I and Clade II, Figure 2), corresponding to the two gene clusters mentioned above. Multiple sequence alignment revealed extremely high similarity among the coding sequences of the DzCYP72As within Clade I (DzCYP72A1-13/23/24) and within Clade II (DzCYP72A18-20) (Table S3), suggesting close evolutionary relationships. Pairwise alignments of the protein and nucleotide sequences were performed to calculate the identity between each pair of DzCYP72As (Table S4), revealing very high nucleotide sequence identity within each gene cluster. In addition, DzCYP72A6/16/17 were reported to be highly correlated with diosgenin metabolism in D. zingiberensis.
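The pairwise identities summarized in Table S4 can be illustrated with a minimal Python sketch that scores two already-aligned sequences. It assumes the alignment was produced elsewhere (for example with Biostrings) and uses one common denominator convention, counting every alignment column except gap-gap columns; Biostrings' `pid()` offers several alternative definitions (PID1 to PID4), so values may differ slightly. The function names and toy sequences are hypothetical and are not DzCYP72A data.

```python
from itertools import combinations


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Denominator (an assumption): all alignment columns except gap-gap columns.
    A gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        columns += 1
        if a == b:  # both are residues, since gap-gap columns were skipped
            identical += 1
    return 100.0 * identical / columns if columns else 0.0


def identity_table(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Identity for every unordered pair of sequences in a multiple alignment."""
    return {
        (x, y): percent_identity(aligned[x], aligned[y])
        for x, y in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not real DzCYP72A sequences.
    toy = {"geneA": "ATGGCT-A", "geneB": "ATGACTTA", "geneC": "ATGGCT-A"}
    for pair, pid in identity_table(toy).items():
        print(pair, f"{pid:.1f}%")
```

Running the same function on protein and on nucleotide alignments gives the two kinds of identity compared in the text.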
To enable high-throughput functional characterization of new CYP450s, the tobacco transient expression system was first used to explore the diosgenin biosynthesis pathway in Himalayan paris and fenugreek [9]. Subsequently, additional diosgenin-biosynthesis-related CYP450s in D. zingiberensis, namely DzCYP90B71 (22R-oxidation), DzCYP90G6 (C16-oxidation) and DzCYP94N8/DzCYP94D143 (C26-oxidation), were identified and introduced into the cholesterol-producing yeast (Saccharomyces cerevisiae) strain RH6829, which successfully produced diosgenin [10]. Building on those studies, a high-throughput functional characterization platform was built in tobacco harboring a combination of DzCYP90B71, DzCYP90G6 and DzCYP94N8. Transgenic tobacco containing DzCYP90B71 and DzCYP90G6 can provide enough substrate for a C26-hydroxylase to biosynthesize diosgenin, so it was used as a chassis to test the potential C26-oxidation activity of the DzCYP72A candidates, with DzCYP94N8 as the positive control. Using this strategy, a novel DzCYP72A12-4 gene encoding a C26-hydroxylase was identified from D. zingiberensis. Diosgenin was detected in transgenic tobacco harboring DzCYP90B71/DzCYP90G6/DzCYP72A12-4 but not in transgenic plants harboring only DzCYP90B71/DzCYP90G6 or only DzCYP72A12-4, suggesting that DzCYP72A12-4 encodes a C26-hydroxylase involved in diosgenin biosynthesis.
Drought resistance can be understood as the ability of plants to sense water-deficit signals and initiate coping strategies; it is a very complex trait, and its improvement is assessed through diverse indicators. The germination rate (GR) is a common index of seed vigor. Seed germination usually requires sufficient water. Drought stress can decrease vigor and increase harmful substances such as malondialdehyde (MDA). MDA is the final decomposition product of membrane lipid peroxidation, and its content reflects the degree of stress injury suffered by plants. The transgenic tobacco plants (OE3) had a higher germination rate than the controls (NC1, NC2 and WT), and the MDA concentration in 16-day-old OE3 seedlings was lower than that in the controls. The newly produced diosgenin may be a major cause of this phenomenon, as a previous report showed that diosgenin can reduce oxidative stress damage [5]. Reconstituting diosgenin biosynthesis in tobacco might therefore improve drought adaptability. In addition, the expression of DzCYP72A12-4 in D. zingiberensis was upregulated after treatment with 15% PEG 6000, further supporting a relationship between DzCYP72A12-4 and drought stress. However, the specific mechanism needs to be elucidated through further research.
4. Materials and Methods
4.1. Plant Materials and Growth Conditions
D. zingiberensis plants were collected from Ankang, Shaanxi Province, China. Fresh rhizomes were transplanted into a greenhouse under strictly controlled conditions (25 ± 2 °C, 16 h light/day). New plants sprouted and were propagated to produce seeds. Samples awaiting analysis were freeze-dried to constant weight at −80 °C and ground into powder with a tissue lyser.
Tobacco (Nicotiana benthamiana) was used for the transgenic assays. Mature tobacco seeds were surface-sterilized with 75% (v/v) ethanol for 60 s and rinsed three times with sterile water. Sterilized seeds were sown on 1/2 Murashige and Skoog (MS) medium (Phyto Technology Laboratories, Lenexa, KS, USA) containing 3% sucrose and 0.75% agar, adjusted to pH 5.8 with 1 M KOH. After two days of cold stratification at 4 °C, the seeds were germinated in a growth chamber at 23 °C under a 16 h light/8 h dark photoperiod. Tissue-culture materials were grown in separate chambers under the same conditions.
4.2. Standards and Chemical Reagents
HPLC-grade reagents, including methanol, ethanol, chloroform, acetonitrile and n-hexane, were purchased from Thermo Fisher Scientific Inc. (Waltham, MA, USA). Standards, including dioscin (98%) and diosgenin (98%), were purchased from Shanghai Yuanye Bio-Technology Co., Ltd. (Shanghai, China).
4.3. Bioinformatics Analysis of CYP72A Genes
The basic local alignment search tool (BLAST) was used to search the D. zingiberensis genome (NCBI project number: PRJNA716093) for candidate CYP450 genes homologous to functionally characterized CYP450s (E-value ≤ 1 × 10⁻¹⁰, identity ≥ 50%). Amino acid sequences were analyzed using the online ExPASy ProtParam tool (https://web.expasy.org/protparam/, accessed on 18 November 2022) [34]. Multiple sequence alignment was performed using Vector NTI Advance software (version 11.5, Thermo Fisher Scientific, Frederick, MD, USA). The phylogenetic tree was constructed in MEGA11 using the neighbor-joining method [35]. Pairwise alignment was performed using the R Biostrings package [36]. The conserved motifs of the DzCYP72A12-4 protein were analyzed using the following online tools:
4.4. Vector Construction and Agrobacterium tumefaciens-Mediated Transformation
For stable expression in tobacco, the coding sequences of the selected genes were amplified by PCR from D. zingiberensis cDNA using the primers listed in Table S5. For single-gene expression, the purified PCR products were inserted into pUC57 plasmids, on which the gene expression cassettes (CaMV 35S promoter-gene-Nos terminator or UBI1 promoter-gene-Nos terminator) were assembled using Gibson Assembly® Master Mix (E2611) (New England Biolabs, Beijing, China). At the same time, two restriction endonuclease recognition sites were added flanking each cassette to facilitate multi-cassette assembly. Single or multiple cassettes were digested with the corresponding restriction endonucleases and inserted into a modified pCAMBIA plasmid through the compatible cohesive ends generated by the same enzymes. In total, four expression vectors were constructed for plant transformation (Table S6).
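The multi-cassette strategy above only works if the flanking enzymes do not also cut inside a cassette. A minimal Python sketch of that check is given below. The enzymes used are not named in the text; AscI (GGCGCGCC) and PacI (TTAATTAA) are hypothetical stand-ins chosen because they are 8-bp rare cutters, and `find_sites` and `add_flanking_sites` are illustrative helpers, not functions of any cloning library.

```python
# Hypothetical enzyme choice for illustration; recognition sequences 5'->3'.
SITES = {"AscI": "GGCGCGCC", "PacI": "TTAATTAA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return 0-based forward-strand start positions of each site on either strand."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for name, site in sites.items():
        positions = set()
        for motif in {site.upper(), reverse_complement(site)}:
            start = seq.find(motif)
            while start != -1:
                positions.add(start)
                start = seq.find(motif, start + 1)  # allow overlapping matches
        if positions:
            hits[name] = sorted(positions)
    return hits


def add_flanking_sites(cassette: str, left: str = "AscI", right: str = "PacI",
                       sites: dict[str, str] = SITES) -> str:
    """Wrap a cassette in two sites, refusing if either enzyme would cut inside it."""
    internal = {k: v for k, v in find_sites(cassette, sites).items() if k in (left, right)}
    if internal:
        raise ValueError(f"internal site(s) found: {internal}")
    return sites[left] + cassette.upper() + sites[right]


if __name__ == "__main__":
    # Toy "cassette" (start codon, one codon, stop codon), not a real construct.
    print(add_flanking_sites("ATGAAATAA"))
```

A fuller check would also scan the new junctions and the destination vector backbone for the same sites.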
All constructs were verified by sequencing and transformed into A. tumefaciens strain GV3101 by electroporation (MicroPulser, Bio-Rad, Los Angeles, CA, USA). The colonies were validated by PCR.
A. tumefaciens-mediated stable transformation of tobacco was performed using the leaf-disc method. Aseptic wild-type tobacco seedlings cultured for about 30 days were used. Leaf edges were excised, and the leaves were placed on solid MS medium and precultured at 25 °C in the dark for 2 days. The leaves were then immersed for 10 min in a suspension of the transformed A. tumefaciens prepared in advance and co-cultivated in the dark at 25 °C for two days. After co-cultivation, the leaves were transferred to fresh solid MS medium and cultured at 25 °C under a 16 h light/8 h dark photoperiod. Calli growing on the leaf edges were isolated and cultured on MS medium containing hygromycin (50 mg/L), and clustered buds were induced after 20 days of culture. Positive transgenic plants were selected by PCR and cultured for further study. Transient expression in tobacco was performed as previously described [9], with some modifications.
4.5. RNA Extraction and Gene Expression Analysis
Total plant RNA was isolated using the RNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions, with modifications. Contaminating DNA was removed by treatment with DNase I (Takara, Beijing, China). RNA quality and concentration were determined using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, NC, USA). First-strand cDNA was synthesized from 1 μg of total RNA using the HiScript III 1st Strand cDNA Synthesis Kit (Vazyme, Nanjing, China) according to the manufacturer's instructions. Quantitative real-time PCR (qRT-PCR) was performed with SYBR Green PCR Master Mix (Applied Biosystems, Warrington, UK) on a CFX96 Real-Time PCR Detection System (Bio-Rad, Los Angeles, CA, USA), following the manufacturer's instructions. The primers used in this study are listed in Table S5. Total RNA from young leaves, mature leaves, stems, flowers and rhizomes of D. zingiberensis was used to examine the tissue expression profile of DzCYP72A12-4. To explore the effects of drought stress, young leaves were harvested for RNA extraction at 4 h, 8 h, 12 h, 24 h, 48 h, 72 h, 96 h and 120 h after treatment with 15% PEG 6000, and leaves of normally watered plants were used as the control. The expression pattern of DzCYP72A12-4 under drought stress was then analyzed, with DzGAPDH as the internal control. The qRT-PCR thermal cycling conditions were as follows: initial denaturation at 95 °C for 3 min, followed by 40 cycles of denaturation at 95 °C for 10 s and annealing/extension at 57 °C for 30 s. Three biological and three technical replicates were performed to ensure the accuracy of the results. Relative gene expression compared with the control was calculated using the 2^(−ΔΔCt) method [40].
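The 2^(−ΔΔCt) calculation can be written out as a short Python sketch. It follows the Livak formulation and therefore assumes near-100% amplification efficiency for both DzCYP72A12-4 and DzGAPDH; the Ct values in the example are made up, and averaging technical replicates before taking differences is one common convention rather than necessarily the exact procedure used here. The function name `fold_change` is hypothetical.

```python
from statistics import mean


def fold_change(ct_target: list[float], ct_ref: list[float],
                ct_target_ctrl: list[float], ct_ref_ctrl: list[float]) -> float:
    """Relative expression (treated vs. control) by the 2^-ddCt method.

    Each list holds technical-replicate Ct values, which are averaged first.
    """
    d_ct_treated = mean(ct_target) - mean(ct_ref)          # normalize to reference gene
    d_ct_control = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    dd_ct = d_ct_treated - d_ct_control                    # treated relative to control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: target gene and DzGAPDH, PEG-treated vs. watered leaves.
    fc = fold_change([24.0, 24.2, 23.8], [18.0, 18.1, 17.9],
                     [26.0, 26.1, 25.9], [18.0, 18.0, 18.0])
    print(f"fold change = {fc:.2f}")  # ddCt = -2, i.e. about 4-fold upregulation
```

A negative ΔΔCt means the target reached threshold earlier in treated samples than expected from the reference gene, which is why it maps to a fold change above 1.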
4.6. Analysis of Diosgenin
Diosgenin in transgenic tobacco and D. zingiberensis was analyzed following previously reported methods, with modifications [10,11,25]. Briefly, 50 mg of freeze-dried leaf powder was suspended in 1 mL of solvent (chloroform/methanol, 2:1), sonicated for 30 min and centrifuged at 12,000× g for 10 min. This extraction was repeated three times for each sample, and the supernatants were pooled and filtered through a 0.22 μm membrane before analysis. Diosgenin content was determined by ultra-performance liquid chromatography (Vanquish UPLC system, Thermo Fisher Scientific, Bremen, Germany) coupled with tandem mass spectrometry (Q Exactive Hybrid Quadrupole-Orbitrap Mass Spectrometer, Thermo Fisher Scientific, Bremen, Germany). A Hypersil GOLD C18 column (2.1 mm × 100 mm, 3 μm, Thermo Fisher Scientific, Bremen, Germany) was installed on the UPLC system, with the column temperature set to 40 °C and a mobile-phase flow rate of 0.3 mL/min.
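If diosgenin is quantified against the purchased diosgenin standard, the content per gram of dry powder follows from an external calibration line and the extraction volumes. The Python sketch below assumes a linear peak-area response, a pooled extract of 3 mL (three 1 mL extractions of 50 mg powder) with no further dilution or concentration, and made-up calibration numbers; none of these values are reported results, and the helper names are hypothetical.

```python
def fit_calibration(conc_ug_per_ml: list[float], peak_area: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line: area = slope * concentration + intercept."""
    n = len(conc_ug_per_ml)
    mx = sum(conc_ug_per_ml) / n
    my = sum(peak_area) / n
    sxx = sum((x - mx) ** 2 for x in conc_ug_per_ml)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc_ug_per_ml, peak_area))
    slope = sxy / sxx
    return slope, my - slope * mx


def content_ug_per_g(area: float, slope: float, intercept: float,
                     extract_ml: float = 3.0, dry_mass_mg: float = 50.0) -> float:
    """Convert a sample peak area to ug diosgenin per g dry weight."""
    conc = (area - intercept) / slope              # ug/mL in the pooled extract
    return conc * extract_ml / (dry_mass_mg / 1000.0)


if __name__ == "__main__":
    # Hypothetical standard curve (ug/mL vs. peak area) and sample peak area.
    slope, intercept = fit_calibration([0.1, 0.5, 1.0, 2.0], [150.0, 550.0, 1050.0, 2050.0])
    print(round(content_ug_per_g(550.0, slope, intercept), 3))
```

Keeping the extract volume and powder mass as parameters makes it easy to adjust the sketch if the pooled supernatant was dried and redissolved in a different volume.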
To determine diosgenin content, the UPLC–MS/MS system described above was used. The mobile phases were eluent A (water with 5 mM ammonium acetate) and eluent B (methanol with 5 mM ammonium acetate). The gradient program was as follows: 0–3 min, isocratic 65% B; 3–25 min, linear gradient from 65% to 99% B; 25–33 min, isocratic 99% B; 33–33.5 min, 99% to 65% B; and 33.5–35 min, isocratic 65% B. The Q Exactive MS was operated in positive targeted-SIM (t-SIM) mode with the following parameters: ion source, HESI-II; probe heater temperature, 350 °C; precursor ion selected for diosgenin, m/z 415.32 ([M+H]+). Raw data were processed with Xcalibur (Thermo Fisher Scientific, Bremen, Germany).
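Two numbers in this paragraph can be checked with a short Python sketch: the mobile-phase composition at any time point of the gradient (interpolating linearly between the listed breakpoints, as the program describes) and the theoretical m/z of the diosgenin precursor ion. The formula C27H42O3 for diosgenin and the monoisotopic masses below are standard values; treating the selected ion as the singly protonated [M+H]+ species is an assumption consistent with positive-ion mode, and the helper names are hypothetical.

```python
# (time in min, % eluent B) breakpoints transcribed from the gradient program.
GRADIENT = [(0.0, 65.0), (3.0, 65.0), (25.0, 99.0),
            (33.0, 99.0), (33.5, 65.0), (35.0, 65.0)]

# Monoisotopic masses (Da) of the elements and of the proton.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.00727646688


def percent_b(t_min: float, program=GRADIENT) -> float:
    """Linearly interpolate %B at time t_min within the gradient program."""
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError(f"{t_min} min is outside the {program[-1][0]} min program")


def mz_protonated(formula: dict[str, int]) -> float:
    """m/z of the singly charged [M+H]+ ion for an elemental formula."""
    neutral = sum(MONO[el] * n for el, n in formula.items())
    return neutral + PROTON


if __name__ == "__main__":
    print(percent_b(14.0))                                      # midpoint of the 3-25 min ramp
    print(round(mz_protonated({"C": 27, "H": 42, "O": 3}), 4))  # diosgenin [M+H]+
```

The computed [M+H]+ value of about 415.3207 agrees with the 415.32 precursor setting given in the method.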
4.7. Drought Stress Treatment in Transgenic Tobacco
Seeds were sterilized and sown on solid ½ MS medium. After two days of cold stratification at 4 °C, the seeds were grown at 23 °C under long-day conditions (16 h light/8 h dark). After germination, seedlings of uniform size were transplanted into fresh MS medium and cultivated for 20 days. All seedlings were grown in the same environment, with wild-type (WT) tobacco plants as the control. The 28-day-old transgenic and WT seedlings were then transplanted into new trays filled with sterilized vermiculite and watered with freshly prepared Hoagland solution.
The tobacco seed germination test was performed according to the International Rules for Seed Testing. Seeds were placed in Petri dishes (9 cm in diameter) lined with two layers of filter paper. Then, 2 mL of 15% PEG 6000 was added to the test groups and 2 mL of distilled water to the control group, and the solutions were replenished daily. Growth conditions were controlled as described above. Each Petri dish contained 100 seeds, with three replicates per group. The number of germinated seeds was recorded daily, and the germination rate and physiological indexes were determined on day 16.
The germination rate (GR) was calculated as the number of seeds germinated by day 16 divided by the number of seeds sown (100 per dish), expressed as a percentage. The concentration of malondialdehyde (MDA) in fresh seedlings was measured after 16 days of growth using the Plant MDA Assay Kit with TBA (Shanghai Yuanye Bio-Technology Co., Ltd., Shanghai, China).
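The germination-rate definition can be expressed as a small Python sketch. It assumes the daily records are counts of newly germinated seeds (so the day-16 total is their cumulative sum) and that each dish starts with 100 seeds; the counts in the example are invented, the function names are hypothetical, and reporting mean ± SD over the three dishes is a common convention rather than a detail stated in the text.

```python
from statistics import mean, stdev


def germination_rate(daily_new: list[int], sown: int = 100, day: int = 16) -> float:
    """GR (%) = seeds germinated by `day` / seeds sown x 100.

    daily_new[i] is assumed to be the number of seeds first seen germinated on day i + 1.
    """
    germinated = sum(daily_new[:day])
    if germinated > sown:
        raise ValueError("more germinated seeds than seeds sown")
    return 100.0 * germinated / sown


def summarize(dishes: list[list[int]], sown: int = 100) -> tuple[float, float]:
    """Mean and sample standard deviation of GR across replicate dishes."""
    rates = [germination_rate(d, sown) for d in dishes]
    return mean(rates), stdev(rates)


if __name__ == "__main__":
    # Invented daily counts for three replicate dishes of one treatment.
    dishes = [[0, 0, 10, 30, 20, 10, 5, 3, 2],
              [0, 2, 12, 30, 25, 10, 4, 2, 0],
              [0, 0, 8, 25, 25, 15, 10, 5, 2]]
    gr_mean, gr_sd = summarize(dishes)
    print(f"GR = {gr_mean:.1f} +/- {gr_sd:.1f} %")
```

With 100 seeds per dish, the percentage equals the raw count, which is why the two descriptions of GR coincide in this design.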
4.8. Statistical Analyses
Qualitative identification of diosgenin was based on mass spectra. All experiments were performed with three independent biological replicates, each with at least three technical replicates. Data were analyzed using Student's t test. Differences were considered statistically significant at p < 0.05 and highly significant at p < 0.01, marked with * and **, respectively.
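The significance marks can be reproduced with a standard-library-only Python sketch of the two-sample Student's t test (pooled variance, two-tailed). To avoid a statistics dependency, it compares |t| with tabulated two-tailed critical values for alpha = 0.05 and 0.01 at the degrees of freedom produced by 2 to 4 replicates per group, which is equivalent to thresholding p; with SciPy available, `scipy.stats.ttest_ind(a, b)` (equal variances by default) would return the p-value directly. The function names and example data are hypothetical.

```python
import math
from statistics import mean, variance

# Two-tailed critical t values (alpha = 0.05, alpha = 0.01) by degrees of freedom.
T_CRITICAL = {2: (4.303, 9.925), 4: (2.776, 4.604), 6: (2.447, 3.707)}


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def significance_mark(a: list[float], b: list[float]) -> str:
    """'**' if p < 0.01, '*' if p < 0.05, otherwise 'ns'."""
    t, df = student_t(a, b)
    if df not in T_CRITICAL:
        raise ValueError(f"no tabulated critical value for df = {df}")
    crit_05, crit_01 = T_CRITICAL[df]
    if abs(t) > crit_01:
        return "**"
    if abs(t) > crit_05:
        return "*"
    return "ns"


if __name__ == "__main__":
    # Invented triplicate measurements (e.g., germination rates in %).
    print(significance_mark([90.0, 91.0, 92.0], [85.0, 86.0, 87.0]))
```

With three biological replicates per group, the relevant row is df = 4.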
5. Conclusions
In the present study, a novel DzCYP72A12-4 gene was identified by combining phylogenetic analysis with functional screening in transgenic tobacco. Diosgenin was detected by UPLC–MS/MS in the transgenic tobacco plants (OE3) but not in control plants lacking at least one of DzCYP90B71, DzCYP90G6 or DzCYP72A12-4. In addition, the diosgenin-producing tobacco plants showed improved drought adaptability, with a higher germination rate and lower MDA content than the controls. These findings strengthen the understanding of the DzCYP72A gene subfamily in diosgenin biosynthesis and lay a foundation for further exploration of drought resistance in D. zingiberensis.
Given the size of the DzCYP72A subfamily, further study is needed to determine whether other DzCYP72As are involved in diosgenin biosynthesis. The platform established in this study should be useful for further characterization of DzCYP72As as well as other genes of interest. Moreover, because steroid compounds share similar skeletons, it is worth exploring the structural similarities among key enzymes that catalyze similar substrates. The relationship between DzCYP72A and drought stress also deserves more attention. Although diosgenin biosynthesis from cholesterol has been preliminarily clarified, more research is needed to reveal the full picture of the pathway. Specifically, efforts should focus on three aspects: (1) identifying more key enzymes involved in diosgenin biosynthesis, (2) revealing the possible regulatory mechanisms of diosgenin synthesis, and (3) analyzing the interactions among different bioactive compounds.