Article

Evolutionary Invariant of the Structure of DNA Double Helix in RNAP II Core Promoters

by Anastasia V. Melikhova, Anastasia A. Anashkina and Irina A. Il’icheva *
V.A. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2022, 23(18), 10873; https://doi.org/10.3390/ijms231810873
Submission received: 25 July 2022 / Revised: 7 September 2022 / Accepted: 13 September 2022 / Published: 17 September 2022
(This article belongs to the Special Issue Bioinformatics of Gene Regulations and Structure - 2022)

Abstract

Eukaryotic and archaeal RNA polymerase II (POL II) machinery is highly conserved, regardless of the extreme changes in promoter sequences in different organisms. The goal of our work is to find the cause of this conservatism. Representative sets of aligned promoter sequences of fifteen organisms belonging to different evolutionary stages were studied. Their textual profiles, as well as profiles of the indexes that characterize the secondary structure and the mechanical and physicochemical properties, were analyzed. An evolutionarily stable, extremely heterogeneous secondary structure of POL II core promoters was revealed, which includes two singular regions: the hexanucleotide “INR” around the TSS and the octanucleotide “TATA element” about 28 bp upstream. Such a structure may have developed at some stage of evolution. It turned out to be so well matched to pre-initiation complex formation and the subsequent initiation of transcription by the POL II machinery that, in the course of evolution, only those nucleotide sequences were selected that were able to reproduce these structural properties. The individual features of the specific sequences representing the singular regions of the promoter of each gene can affect the kinetics of DNA-protein complex formation and facilitate strand separation in double-stranded DNA at the TSS position.

1. Introduction

The heterogeneity of the three-dimensional structure of the double-stranded DNA plays an important role in the regulation of genetic processes. This heterogeneity is modulated by the nucleotide sequence. Proteins may recognize the shape of DNA (“indirect readout”) or the unique chemical signatures of the DNA bases (“direct readout”) [1]. As a rule, DNA-binding proteins combine both readout mechanisms to achieve DNA-binding specificity [2].
RNA polymerase II (Pol II) in eukaryotes is responsible for the transcription of messenger RNA and some non-protein-coding small nuclear RNAs. Pol II core promoters are fragments of genomic DNA, about 100 bp long, surrounding the transcription start site (TSS). Transcription initiation occurs when TATA-binding protein (TBP) binds to the eight base-pair TATA elements of Pol II core promoter, coordinating accretion of class II initiation factors and Pol II into a functional preinitiation complex (PIC). This process is a slow stage of transcription; it leads to the formation of a long-lived protein-DNA complex [3].
The mechanical, thermodynamic, and structural properties of Pol II promoter regions have long attracted the attention of researchers [4,5,6,7,8,9]. Regardless of the length of the analyzed promoter fragment and analysis methods, all studies come to the same conclusion. In the vicinity of the TSS, all structural properties of DNA noticeably deviate from the average level, and core promoter regions are exceptionally heterogeneous.
The nucleotide sequences of the core promoters are usually represented by the DNA coding strand (namely, the strand with the 5′→3′ vector directed to the TSS from the upstream region; hereinafter, we will call it the upper strand). The TSS position is taken as coordinates −1, +1 (there is no nucleotide with zero coordinates). In all organisms, positions −2, +4 are occupied by the initiator element (INR). At this region, the complementary strands of the double helix diverge, and Pol II recognizes the template strand. TATA element in the promoters of most organisms is located at a distance of about −28 bp from the TSS.
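The coordinate convention described above (no zero position, with the TSS between −1 and +1) is easy to get wrong when indexing aligned sequences. The following is a minimal sketch of the conversion, assuming a window that starts at −50; the function names and the `upstream` parameter are illustrative and do not come from the original work.

```python
def index_to_coord(i: int, upstream: int = 50) -> int:
    """Map a 0-based index in an aligned window to a promoter coordinate.

    Assumes the window starts at position -upstream and that there is
    no position 0: index upstream-1 is -1 and index upstream is +1.
    """
    if i < 0:
        raise ValueError("index must be non-negative")
    return i - upstream if i < upstream else i - upstream + 1


def coord_to_index(coord: int, upstream: int = 50) -> int:
    """Inverse of index_to_coord; coordinate 0 is not defined."""
    if coord == 0:
        raise ValueError("there is no nucleotide with coordinate 0")
    if coord < -upstream:
        raise ValueError("coordinate lies upstream of the window")
    return coord + upstream if coord < 0 else coord + upstream - 1


# The INR hexanucleotide spans -2..+4, i.e. six positions.
inr_indices = [coord_to_index(c) for c in (-2, -1, 1, 2, 3, 4)]
```

With these definitions, the INR occupies six consecutive indices even though its coordinates skip from −1 to +1.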
The common regularities of the core promoter architecture in each species may be revealed after the superposition of signals from a huge amount of species’ promoter sequences properly aligned at the TSS. The well-annotated database of promoter sequences is an essential basis for identifying general patterns in the promoter structure. To analyze structural features of DNA that determine RNA polymerase II core promoter [10], we previously used the EPD New database [11]. The profiles of the averaged textual, structural, mechanical, and physicochemical characteristics in each position of the sets of 60 bp core promoter sequences (positions from −50 to +10) in the eight organisms available at that time from the EPD New database [11] (H. sapiens, M. musculus, D. melanogaster, D. rerio, C. elegans, A. thaliana, S. cerevisiae, S. pombe), were constructed. The analysis of these profiles allowed us to reveal the common scheme of the animal and plant core promoter architecture. The promoters of the unicellular fungus S. pombe were found to correspond to the same structural scheme, but the structure of the core promoter of another unicellular fungus, S. cerevisiae, turned out to be different [10].
To date, the number of organisms available for analysis in the EPD New database [12] has increased markedly. In addition to representatives of the Metazoa (vertebrates and invertebrates), plants, and unicellular fungi (S. cerevisiae, S. pombe), a representative of the Protozoa appeared, namely the parasite P. falciparum, whose genome is 80% AT-pairs. Moreover, the total number of promoters in the samples of those organisms that were previously represented in this database also increased noticeably. Therefore, it became possible to check the generality of the conclusions obtained by us earlier and to analyze the degree of influence of the percentage of AT pairs in the genomes of different organisms on the structural features of their promoters.

2. Results

The sets of promoters of fifteen evolutionarily different organisms were retrieved from the EPD New section of the Eukaryotic Promoter Database (EPD) (http://epd.vital-it.ch (accessed on 24 July 2022)) [12]. This resource allows access to the collection of databases of experimentally validated promoters of several model organisms, for which TSS mapping was the result of high-throughput experiments such as CAGE and Oligo-capping, resulting in high precision and high coverage. We used promoter sets of ten animals (vertebrates and invertebrates, including insects), namely H. sapiens, M. mulatta, M. musculus, R. norvegicus, C. familiaris, G. gallus, D. rerio, C. elegans, D. melanogaster, and A. mellifera; two plants, namely A. thaliana and Z. mays; two unicellular fungi, namely S. cerevisiae and S. pombe; and one protozoan, namely P. falciparum. The profiles of the averaged textual, structural, mechanical, and physicochemical properties of 80 bp core promoter sequences (positions from −50 to +30) were constructed.

2.1. Comparative Statistical Characteristics of the Nucleotide Sequences in the Core Promoters of Metazoans, Plants, Unicellular Fungi, and Protozoan

First, we compared the percentages of the A, T, G, and C nucleotides in core promoter sequences in different organisms. For simplicity, according to IUPAC nomenclature, we will use the terms W (for nucleotides A and T) and S (for nucleotides G and C). Frequencies of mononucleotides occurrence at each position along the coding strand are shown in Figure 1A–D.
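The per-position W and S percentages can be computed directly from a set of equal-length, TSS-aligned sequences. The sketch below shows one way to do this; the function name and the toy input are hypothetical, and the toy sequences are not taken from the EPD New database.

```python
from typing import Dict, List


def ws_profile(seqs: List[str]) -> List[Dict[str, float]]:
    """Return the fraction of W (A/T) and S (G/C) at every aligned position."""
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    profile = []
    for pos in range(length):
        column = [s[pos].upper() for s in seqs]
        n = len(column)
        w = sum(base in "AT" for base in column) / n
        s_frac = sum(base in "GC" for base in column) / n
        profile.append({"W": w, "S": s_frac})
    return profile


# Toy alignment for illustration only.
toy = ["TATAAAAG", "GCGCGCGC", "TATATATA", "CCTATAAG"]
toy_profile = ws_profile(toy)
```

Plotting the `W` and `S` values against the promoter coordinate gives a profile of the kind described for Figure 1.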
The frequencies of occurrence of dinucleotides in the core promoter sequences of all fifteen species are shown in Figure S1. The frequencies of occurrence of tetranucleotides TATA and AAAA in the core promoter sequences of all fifteen species are shown in Figure S2.
The logo-representation of the promoter sequences with an information content of 1.0 bits is shown in Figure 2, while that with an information content of 0.4 bits is shown in Figure S3. We present two options for scaling the logo image to best reveal the features of different fragments of core promoters because the frequencies of occurrence of nucleotides differ sharply in different regions. Logos were made at http://weblogo.threeplusone.com (accessed on 24 July 2022).
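For readers unfamiliar with the vertical axis of a sequence logo, the information content of one alignment column can be sketched as follows. This is the plain Shannon form, 2 bits minus the column entropy; it omits the small-sample correction that logo software may apply, so it is an approximation of what a logo displays rather than a description of the tool used here. The function name is illustrative.

```python
import math
from collections import Counter


def information_content(column: str) -> float:
    """Information content (bits) of one DNA alignment column.

    Computed as log2(4) minus the Shannon entropy of the base
    frequencies; no small-sample correction is applied.
    """
    counts = Counter(column.upper())
    n = sum(counts[base] for base in "ACGT")
    if n == 0:
        raise ValueError("column contains no A, C, G or T")
    entropy = 0.0
    for base in "ACGT":
        p = counts[base] / n
        if p > 0:
            entropy -= p * math.log2(p)
    return 2.0 - entropy
```

A column with all four bases at equal frequency carries 0 bits, and a fully conserved column carries 2 bits, which is why weakly conserved regions need a smaller axis range to become visible.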
For all of the considered mammalian promoters (H. sapiens, M. mulatta, M. musculus, R. norvegicus, C. familiaris), as well as for promoters of G. gallus, the percentage of S exceeds that of W in all positions, with the exception of the TATA element, where the percentage of W is almost equal to that of S (Figure 1A). On the other hand, the promoters of A. mellifera, as well as promoters of A. thaliana, the unicellular fungi S. cerevisiae and S. pombe, and the protozoan P. falciparum have the highest percentage of W nucleotides at all positions (Figure 1B–D). The promoters of another insect, D. melanogaster, as well as promoters of C. elegans and D. rerio, are composed of roughly equal amounts of W and S nucleotides, while the TATA element is also enriched in W nucleotides. The promoters of another plant, Z. mays, have a noticeable asymmetry in the distribution of G and C nucleotides between the coding and non-coding strands. In the coding strand, the content of cytidines is ~15% higher than the content of guanines. This determines both the highest frequency of occurrence of the CC dinucleotide before and after TSS (Figure S1) and the extremely low frequencies of occurrence of the TATA and AAAA tetranucleotides (Figure S2). Another distinguishing feature of Z. mays promoters is the presence of a well-defined motif in the vicinity of the +25 position in Figure 1C, Figure 2 and Figure S3. This was also noted earlier [13], where cap analysis of gene expression (CAGE) was used to identify genome-wide TSSs in root and stem tissues of two maize (Z. mays) inbred lines (B73 and Mo17). The authors hypothesized that the region around +25 harbors an element other than the GC-rich motif that correlates with the presence of TATA consensus. The profiles of all of the species except for S. cerevisiae have two regions where the frequencies of dinucleotide occurrence deviate from the mean values (Figure S1). These two regions are located at the TATA-box position and in the region around TSS.
Logo representation (Figure 2) provides detailed information about the characteristic features of the TATA elements and the INR elements in the promoters of each organism. In the position of the TATA element of all mammals, as well as of G. gallus, all four nucleotides (G, C, A, and T) occur with equal frequency. In the other organisms considered (with the exception of S. cerevisiae), the frequencies of nucleotides A and T in the TATA element are higher than those of G and C. However, the degree of the excess differs quite noticeably between organisms in this group. In both insects (D. melanogaster and A. mellifera), it is minimal, and it is most pronounced in D. rerio, A. thaliana, and S. pombe. The logo image of P. falciparum differs sharply from those of all other organisms since the frequencies of occurrence of the A and T nucleotides are significantly higher.
The occurrences of various octanucleotides in the position of the TATA element of all organisms under consideration are shown in Table 1. Table S1 also includes the absolute number of each of the octanucleotides in that position for every organism and presents the frequencies of occurrence of various octanucleotides in positions −10 to −3 in the promoters of S. cerevisiae.
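Tabulating the octanucleotides found at a fixed window of aligned promoters is a simple counting task. The sketch below is one possible way to build such a table; the window start index is a hypothetical parameter that has to be chosen per organism, and the toy sequences are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple


def kmer_table(seqs: List[str], start: int, k: int = 8) -> List[Tuple[str, int, float]]:
    """Count the k-mers found at a fixed 0-based window start.

    Returns (k-mer, absolute count, fraction) sorted by decreasing count.
    Sequences too short to cover the window are skipped.
    """
    windows = [s[start:start + k].upper() for s in seqs if len(s) >= start + k]
    if not windows:
        raise ValueError("no sequence covers the requested window")
    counts = Counter(windows)
    total = len(windows)
    return [(kmer, n, n / total) for kmer, n in counts.most_common()]


# Toy input for illustration only; not real promoter data.
toy_seqs = ["GGTATAAAAGCC", "CCTATAAAAGTT", "AATATATATAGG"]
toy_table = kmer_table(toy_seqs, start=2)
```

Keeping both the absolute count and the fraction mirrors the distinction between Table S1 and Table 1.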
We have chosen the TATA-box position in the promoters of each organism based on the positions of the minimum in the profiles of the physical parameter “Stacking energy” and of the maximum in the profiles of the physical parameter “Mobility to bend towards major groove”, which we present in Figure 3, Figure 4, Figure 5 and Figure 6 (lines a,f). A perceptible shift in the position of the TATA box for A. thaliana promoters coincides with the data obtained earlier [14].
From Table 1, one can see that the frequencies of occurrence of the different octanucleotides representing the TATA box are rather close. The leading position in this list for all of the analyzed mammals, as well as for D. melanogaster and C. elegans, is occupied by the TATAAAAG sequence; however, other octanucleotides occur with a very close frequency. So, the term consensus only conditionally reflects the real situation. Analysis of the TBP-TATA box minor groove interface, based on crystallographic structures of their complexes refined to better than 2 Å [15], has shown that van der Waals interactions between nonpolar atoms and between nonpolar and polar atoms are factors in complex formation. Moreover, kinetic probing showed that TBP has less than a 10³-fold preference for binding the TATAWAAR sequence compared to binding nonspecific yeast genomic DNA [16]. These results allow us to suggest that hydrogen bonding does not play any role in TBP–TATA box complex formation. Therefore, those octanucleotides that are selected on the basis of low energy costs for bending towards the major groove can be TATA elements.
In contrast, the INR element of all of the organisms is highly selective for the nucleotide sequence. The details can be seen in the logo representation (Figure 2 and Figure S3) and Table 2, Table 3 and Table 4.
From Table 2, one can see that all of the organisms show a preference for PyPu in positions −1 and +1. However, it should be noted that the occurrence of PuPu and PyPy in mammals, G. gallus, D. rerio, as well as in the plant Z. mays is also rather high, noticeably higher than in both insects (D. melanogaster and A. mellifera), in the plant A. thaliana, in the invertebrate C. elegans, and in unicellular organisms (S. cerevisiae, S. pombe, and P. falciparum). We find it interesting that the promoters of pure lines of the plant Z. mays are somewhat different from the promoters of wild-type A. thaliana.
All of the organisms, with the exception of S. cerevisiae, S. pombe, and P. falciparum, display CA as the preferred dinucleotide in this position. In both unicellular fungi (S. cerevisiae and S. pombe), the dinucleotides CA and TA are present in equal amounts in positions −1 and +1, while P. falciparum, as expected, prefers the dinucleotide TA.
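The classification of the dinucleotide occupying positions (−1, +1) into PyPu, PuPu, PyPy, and PuPy can be sketched as follows. The function names and the `upstream` parameter (the number of bases preceding the TSS in each aligned sequence) are assumptions made for illustration, and the toy sequences are not real promoters.

```python
from collections import Counter
from typing import Dict, List

PURINES = set("AG")
PYRIMIDINES = set("CT")


def step_class(dinuc: str) -> str:
    """Classify a dinucleotide as PyPu, PuPu, PyPy or PuPy."""
    def kind(base: str) -> str:
        if base in PURINES:
            return "Pu"
        if base in PYRIMIDINES:
            return "Py"
        raise ValueError(f"unexpected base: {base!r}")

    first, second = dinuc.upper()
    return kind(first) + kind(second)


def tss_step_classes(seqs: List[str], upstream: int = 50) -> Dict[str, float]:
    """Fraction of each step class at positions (-1, +1) of aligned promoters."""
    counts = Counter(step_class(s[upstream - 1:upstream + 1]) for s in seqs)
    total = sum(counts.values())
    return {cls: counts[cls] / total for cls in ("PyPu", "PuPu", "PyPy", "PuPy")}


# Toy sequences with three upstream bases; illustration only.
toy_fractions = tss_step_classes(["GGCAGG", "TTTATT", "AAGGAA", "CCCACC"], upstream=3)
```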
What properties of PyPu dinucleotides, and especially of the CA dinucleotide, determine their preference in position (−1, +1)? This position is responsible for the double helix divergence, so the dinucleotide step that occupies it must have unique properties. It is known that the deformability of dinucleotides decreases in the order PyPu > PuPu > PuPy. This was shown with the help of a spin probe in a study of the effects of nucleotide sequence on DNA duplex dynamics [17]. The special mobility of PyPu steps is explained by the greater intensity of the S↔N dynamics in furanose cycles in 5′-terminal pyrimidines compared to 5′-terminal purines, and after 5′Cyt, it reaches its maximum [18]. The advantage of the CpA step over CpG in positions −1 and +1 can be explained by the presence of only two hydrogen bonds in the A·T pair, which must be broken at the initial stage of chain divergence. This explanation is confirmed by reactivity with the conformation-sensitive reagent chloroacetaldehyde, which reacts with unpaired adenines and cytosines. This reactivity was confined strictly to adenosine in the d(CA/TG) repeat [19]. In this regard, it is interesting to note that during the formation of nucleosomes, two conformationally flexible pyrimidine–purine steps can act as strong positioning signals. These are the pyrimidine–purine step CA/TG, which is unique among the 10 possible dinucleotide steps in being located preferentially at both inward- and outward-facing minor grooves but not in between, and TA, which is located at inward-facing minor grooves [20].
The occurrence of tetranucleotides in positions −2 to +2, specific for each of the 15 species, is shown in Table 4. It can be assumed that the greater the percentage of less deformable dinucleotides (PuPu or PuPy) in the TSS position of promoter samples of a particular organism, the more variable the strength of different promoters in this organism will be.

2.2. Physical and Structural Anisotropy of the Naked DNA in the Core Promoters

The heterogeneity of any DNA fragment is the result of the variation of the physical and structural characteristics of individual base-pair steps. Bending anisotropy, for example, is sequence-dependent and, to a first approximation, reflects both the geometry and stability of the individual base steps [20]. We have built profiles of the base step characteristics for the sets of the core promoters of all 15 organisms using indexes of numerical parameterization for the ten double-stranded duplexes, which are collected in the database DiProDB http://diprodb.fli-leibniz.de (accessed on 24 July 2022) [21]. Among the parameters of a large number of different properties of the ten double-stranded duplexes, which are held in the database, we chose six parameters most suitable for evaluating the anisotropy of nucleotide sequences for DNA axis bending. They are the stacking energy, Roll and Slide, the stiffness of the structure to Roll alteration and to Slide alteration, as well as the stiffness of the structure to bend towards the major groove, which includes alteration to all of the base-pair steps parameters. The database contains several versions of the parameters of the same name, and earlier [10], we verified that the profiles built from different versions of the parameters are in qualitative agreement with each other. Profiles of physical and structural parameters are presented in Figure 3a–f, Figure 4a–f, Figure 5a–f, Figure 6a–f and Figure 7a–f.
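The construction of an averaged base-step profile can be illustrated with a short sketch. It assumes a strand-symmetric property given for the ten unique duplex steps, expands it to all 16 dinucleotides through reverse complements, and averages over TSS-aligned sequences. The function names are illustrative, and the numeric values in `TOY_STEPS` are placeholders, not DiProDB parameters.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def expand_to_16(unique_steps: Dict[str, float]) -> Dict[str, float]:
    """Expand values for the 10 unique duplex steps to all 16 dinucleotides."""
    table: Dict[str, float] = {}
    for step, value in unique_steps.items():
        table[step] = value
        table[reverse_complement(step)] = value
    if len(table) != 16:
        raise ValueError("input does not cover the ten unique duplex steps")
    return table


def mean_profile(seqs: List[str], table: Dict[str, float]) -> List[float]:
    """Average a dinucleotide property at every step of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    profile = []
    for pos in range(length - 1):
        values = [table[s[pos:pos + 2].upper()] for s in seqs]
        profile.append(sum(values) / len(values))
    return profile


# Placeholder values (1..10), NOT real parameters from any database.
_STEPS = ["AA", "AC", "AG", "AT", "CA", "CC", "CG", "GA", "GC", "TA"]
TOY_STEPS = {step: float(i) for i, step in enumerate(_STEPS, start=1)}
toy_table16 = expand_to_16(TOY_STEPS)
toy_step_profile = mean_profile(["TATA", "CACA"], toy_table16)
```

This expansion applies only to properties defined per duplex step; a strand-specific parameter resolved for all 16 dinucleotides would be supplied as a full 16-entry table instead.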
We present the profiles of the variations in the stacking energy (Figure 3a, Figure 4a, Figure 5a, Figure 6a and Figure 7a) and the base-pair step parameters of Roll and Slide (Figure 3b–d, Figure 4b–d, Figure 5b–d, Figure 6b–d and Figure 7b–d) in the parametrization of Perez et al. [22], and the profiles of stiffness variation in the DNA double helix to Roll and Slide changes (Figure 3c–e, Figure 4c–e, Figure 5c–e, Figure 6c–e and Figure 7c–e) in the parametrization of Goni et al. [23]. These five parameters describe DNA at the base-pair step resolution. To evaluate the stiffness of the structure to bend towards the major groove, we used the parametrization of Gartenberg and Crothers [24]. Their parameter “Mobility to bend towards major groove” was resolved for all 16 dinucleotides and related to each of the complementary strands. In Figure 3f, Figure 4f, Figure 5f, Figure 6f and Figure 7f, this characteristic is presented for the upper strand (the strand complementary to the template). Figure 3, Figure 4, Figure 5 and Figure 6 present the profiles of the characteristics of the core promoters for all of the 15 organisms, while Figure 7 presents the profiles of the same characteristics of two non-promoter regions in H. sapiens genomic sequences, the regions (−500–−420) and (−300–−220), and the profiles of a set of 30,000 computer-simulated random nucleotide sequences of 80 bp. They are presented along with the profiles of the H. sapiens core promoters.
Stacking energy is a part of the enthalpy of DNA formation and defines its stabilizing forces. Its value in the core promoter sequences of all of the mammals and G. gallus is about −16.5 ± 0.2 kcal/mol (Figure 3a and Figure 4a), while in invertebrates and unicellular fungi, the stacking energy is somewhat lower (Figure 4a). In plants, the value of the stacking energy is intermediate (Figure 5a). The lowest level of stacking energy is in the promoter sequences of P. falciparum (Figure 6a). It can be assumed that in this protozoan, this is due to the compensation of low DNA stability in the absence of a third hydrogen bond in AT-rich sequences. A shallow global minimum on the stacking energy profiles in the region around −28 bp to −34 bp relative to TSS (depending on the organism) is present in the profiles of all organisms, with the exception of C. elegans, A. mellifera, and S. cerevisiae. In P. falciparum, its depth is the smallest. The good base stacking in the TATA box region is the property of the majority of the specially selected sequences of naked DNA. This is confirmed by the absence of local minima in the stacking energy profiles of the non-promoter regions, as well as in the profiles of the random sequences (Figure 7a). It is interesting that the average level of the stacking energy in the non-promoter regions of the human genome is practically the same as in the promoter regions, while in the set of random sequences, it is somewhat lower. We assume that this is due to the percentage of the AT pairs in the sequences: in the human genome, the percentage of AT pairs is less than the percentage of GC, while in the random sequences, the AT and GC content is approximately the same.
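A random control set with approximately equal AT and GC content, as described above, can be generated in a few lines. The text does not specify how the simulated sequences were produced, so the sketch below simply assumes independent, uniform sampling of the four bases; the function names and the fixed seed are illustrative.

```python
import random
from typing import List


def random_sequences(n: int = 30000, length: int = 80, seed: int = 0) -> List[str]:
    """Generate n random DNA sequences with uniform base probabilities.

    Assumption: bases are drawn independently and uniformly, which
    yields roughly 50% AT content.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    return ["".join(rng.choices("ACGT", k=length)) for _ in range(n)]


def at_fraction(seqs: List[str]) -> float:
    """Overall fraction of A and T bases in a set of sequences."""
    total = sum(len(s) for s in seqs)
    at = sum(s.count("A") + s.count("T") for s in seqs)
    return at / total


# A smaller set than in the text, to keep the example fast.
control = random_sequences(n=1000)
```

Profiles computed on such a set serve as a baseline: any position-specific feature seen in promoters but absent here cannot be attributed to base composition alone.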
Base-pair step parameter Roll defines an angle between the average planes of two neighboring base pairs. The positive value of this angle corresponds to its opening towards the minor groove. Among the three rotational parameters (helical Twist, Roll, and Tilt), Roll is the most important for understanding the bending of DNA [23,25].
Base-pair step parameter Slide defines the mutual displacement of the neighboring base pairs in the direction perpendicular to the minor and major grooves. Positive Slide values are a distinguishing feature of B-DNA, while in the A-form of DNA, the values of Slide are always negative. Thus, the sign of Slide is an important indicator that allows us to discriminate between the B- and A-DNA forms [26,27].
The values of these two parameters show that the structure of the naked DNA double helix in the promoter regions of mammals, invertebrates, plants, and unicellular fungi (with the exception of their INR element) belongs to the B family. In fact, the structural parameters of Roll and Slide in the core promoter regions of the mammals and G. gallus vary between 1.35–1.7° and 0.25–0.48 Å, respectively. In the core promoters of A. thaliana, the values of Roll and Slide are somewhat lower than in mammals, especially Slide (~0.2 Å), but in the promoters of another plant, Z. mays, the values of these parameters are the same as in mammals. In the promoter sequences of unicellular fungi, the values of Roll and Slide are also close to those of mammals. The exception is P. falciparum. In this protozoan, double-stranded DNA, at least in the core promoter region that we have analyzed, may represent an intermediate form with a negative value of Slide, which corresponds to some structure on the B↔A transition path [26,28].
Our profiles show that the values of Roll and Slide, as well as their stiffness in the TATA-box position of all the species (except for S. cerevisiae), differ from the average level. The extent of the difference depends on the organism. It is most pronounced in plants, S. pombe, and most mammals. The invertebrates present maximum diversity in the TATA-box position. For example, the profiles of Slide and its stiffness in C. elegans do not have peculiarities in the TATA-box position, but the profiles of Roll and its stiffness do. It is important to note that while the values of both structural parameters (Roll and Slide) are somewhat less than the average level, the rigidity of the Roll drops noticeably, while the rigidity of the Slide either remains at an average level or increases. Hence, it can be concluded that binding to TBP is accompanied by an increase in the opening of the angle between adjacent base pairs towards the minor groove. This is what happens when the helical axis is bent towards the major groove. The profiles of the parameter “Mobility to bend towards major groove” in the core promoters of all the organisms (Figure 3f, Figure 4f, Figure 5f and Figure 6f), with the exception of S. cerevisiae, clearly reflect this predisposition for octanucleotides in the TATA-box regions. It should be noted that in the core-promoter sequences of A. mellifera, the increase in the values of the “Mobility to bend towards major groove” parameter is noticeably less than in other invertebrates. Moreover, in the profiles of S. cerevisiae, the maximum falls on the position of −8 bp.

2.3. Variations of Ultrasonic Cleavage and DNase I Cleavage Intensities in Core Promoter Sequences

The intensities of the sequence-specific ultrasonic cleavage of double-stranded DNA provide information on the intensity of the intramolecular conformational movements in every strand [18,29,30], and the DNase I enzymatic cleavage of double-stranded DNA provides information on the width of its grooves [31,32,33,34]. Therefore, the variation in the local structure in the DNA double helix can also be assessed using the data of these independent new methods.
The relative intensities of the cleavage of the central phosphodiester bond in the 16 dinucleotides and 256 tetranucleotides were determined by multivariate statistical analysis [18]. The experimental details are also given in [29,30]. It was shown that the cleavage rates for all pairs of complementary dinucleotides are significantly different, and the sequence-dependent ultrasonic cleavage rates are consistent with the intensity of N↔S interconversion at the 5′-sugar ring [18]. Therefore, cleavage rates may be useful for characterizing the functional regions of the genome as a measure of local conformational dynamics. We use several indexes for the description of the intensity of ultrasonic cleavage [10]: R is the relative cleavage intensities of the central position of each of the 16 dinucleotides; T is the relative cleavage intensities of the central position of each of the 256 tetranucleotides; S is the combination of indexes R and T (S = T − R). The S index provides information on the effect of the nearest context on the intensity of ultrasonic cleavage in the dinucleotide, i.e., if S < 0, the first and the fourth nucleotides of a tetranucleotide bring down the intensity of the cleavage in the central step; otherwise they increase it.
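The relation S = T − R can be made concrete with a short sketch that slides a tetranucleotide window along one strand and subtracts the dinucleotide index of the central step. The function name is hypothetical, and the numbers in the toy tables are placeholders chosen for illustration, not measured cleavage intensities.

```python
from typing import Dict, List


def s_index_profile(seq: str, r_table: Dict[str, float],
                    t_table: Dict[str, float]) -> List[float]:
    """Return S = T - R for every tetranucleotide window of seq.

    Entry i describes the central step between seq[i+1] and seq[i+2].
    A negative value means that the flanking bases lower the cleavage
    intensity of the central step; a positive value means they raise it.
    """
    seq = seq.upper()
    profile = []
    for i in range(len(seq) - 3):
        tetra = seq[i:i + 4]
        central = tetra[1:3]
        profile.append(t_table[tetra] - r_table[central])
    return profile


# Placeholder tables covering only the windows of the toy sequence.
r_toy = {"AT": 1.0, "TA": 0.5}
t_toy = {"TATA": 1.25, "ATAT": 0.25}
s_toy = s_index_profile("TATAT", r_toy, t_toy)
```

In a full analysis, `r_table` would hold all 16 dinucleotide indexes and `t_table` all 256 tetranucleotide indexes, and the profile would be computed separately for the upper and lower strands.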
The cutting rates of bovine pancreatic deoxyribonuclease I (DNase I) vary along a given DNA sequence, indicating that the enzyme recognizes sequence-dependent structural changes in the DNA double-helix. The high-resolution crystal structures of the two DNase I-DNA complexes showed that the enzyme binds tightly in the minor groove and to the sugar–phosphate backbones of both strands, thereby inducing widening in the minor groove and bending towards the major groove [31,32]. The context near the dinucleotide step strongly affects its cleavage efficiency. These can be rationalized by the fact that six base pairs are in contact with the enzyme. The intrinsic rate of the cleavage by DNase I closely tracks the width of the minor groove [33]. We have used the intensity indices of DNase I cleavage at the hexanucleotide level (D), which were obtained in [34].
Figure 8, Figure 9, Figure 10 and Figure 11 show the profiles of the ultrasonic cleavage and DNase I cleavage in H. sapiens, D. melanogaster, Z. mays, and P. falciparum, while Figures S4–S14 show the profiles of the ultrasonic cleavage and DNase I cleavage of M. mulatta, M. musculus, R. norvegicus, C. familiaris, G. gallus, D. rerio, C. elegans, A. mellifera, A. thaliana, S. cerevisiae, and S. pombe, respectively.
The profiles of the ultrasonic indexes R, T, and S and the DNase I cleavage index D are depicted in blue for the upper strand and in red for the lower (template) strand.
The lowest value of the ultrasonic cleavage for the H. sapiens core promoters was detected in the region from −32 to −24 bp relative to TSS (Figure 8, indexes R and T). The same region of the promoter has the highest DNase I cleavage (Figure 8, index D). This indicates a decrease in the conformational motion in this region and minor groove widening. The minimum ultrasonic cleavage of the upper (coding) strand falls at position −26, but in the lower (template) strand, it falls at position −29. This means that there is some shift in the intensity of the conformational movement between the complementary strands. The profiles of the differences in the S-indexes between the strands revealed periodic alternation of the conformational motion intensity in the complementary strands up to the position of −3 bp. The observed behavior of the core promoter fragment structure is in good agreement with the results of the MD calculations in [35], which confirmed an important role of the indirect readout mechanism in TATA-box recognition, and revealed regular oscillations between several alternate structures in the process of TBP binding.
All of the profiles lose their smoothness around TSS.
The profiles of the ultrasonic cleavage and DNase I cleavage of all of the other mammalian (M. mulatta, M. musculus, R. norvegicus, and C. familiaris), as well as of G. gallus, D. rerio, D. melanogaster, A. thaliana, Z. mays, and S. pombe are presented in Figures S4–S14, respectively.
It is significant that the cleavage intensities of the TATA element, as well as those of the INR, have singular properties in the profiles of all but one species. Ultrasonic cleavage is diminished in the TATA element, while DNase I cleavage is enhanced. The exception is the TATA region in the core promoters of S. cerevisiae. Both methods show a messy pattern of cleavage around the TSS in all species.

3. Discussion

Previously, we found a special structural organization in the nucleotide sequences of double-stranded DNA of minimal core promoters of POL II in metazoans and Schizosaccharomyces pombe. They have singular mechanical and structural properties at the positions of the TATA-box and around TSS [10].
This work was undertaken because new data appeared that significantly expanded both the range of organisms available for analysis and the number of promoter nucleotide sequences available. As a result, the characteristics of the mechanical and structural properties of the core promoters of POL II in fifteen organisms from different steps of the evolutionary ladder were obtained. These are ten representatives of the animal kingdom (mammals, other vertebrates, and invertebrates), namely, H. sapiens, M. mulatta, M. musculus, R. norvegicus, C. familiaris, G. gallus, D. rerio, C. elegans, D. melanogaster, and A. mellifera; two representatives of the plant kingdom (A. thaliana and Z. mays); two representatives of the unicellular fungi (S. cerevisiae and S. pombe); and a representative of Protozoa (P. falciparum). The AT and GC contents of the genomes of these organisms are different. Some of them have a GC-rich genome, the genomes of others contain nearly equivalent amounts of AT and GC or a slight excess of AT, and 80% of the P. falciparum genomic sequence consists of AT. The aim of the present work was to assess the generality of the characteristics of the core promoters obtained earlier based on the analysis of a much wider range of organisms that differ significantly in evolutionary development and in the percentage of AT pairs in the genomic DNA.
As a result, here we have shown that the core promoters of POL II in organisms representing the kingdoms of animals, plants, fungi, and protozoa have a special structural organization. The fragments of 80 bp (positions from −50 to +30), regardless of the AT content in the genomic DNA, have two singular regions: a hexanucleotide with coordinates −2 to +4 (INR) surrounding the transcription start site (TSS) and an octanucleotide located about 28–35 bp upstream of the TSS (depending on the organism). In the TSS position (−1, +1), the occurrence of the PyPu/PyPu steps is exceptionally high, with a noticeable predominance of the d(CA/TG) dinucleotide. The conformational features of this dinucleotide remarkably favor the formation of an open complex (PIC). The TATA-box region of all but one organism is about 28–35 bp upstream and has unique mechanical and structural properties. Its mobility to bend towards the major groove is increased, and the stacking energy is reduced; the minor groove expands significantly, and the conformational dynamics are reduced. These local properties of the TATA region contribute to its indirect readout by TBP and the subsequent PIC formation.
Importantly, the profiles of control fragments of the same length, taken from the human genome in the vicinity of −300 and −500 as well as from a sample of 30,000 random sequences, do not reveal any such structural organization.
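A random control set of this kind could be generated as in the following sketch. The composition model is an assumption: the text does not state how the random sequences were simulated, so the sketch draws each base independently and uniformly from A, C, G, and T. The function name and the `seed` parameter are hypothetical.

```python
import random


def random_sequences(n_sequences=30_000, length=80, seed=0):
    """Generate random DNA strings (assumed model: uniform, independent bases)."""
    rng = random.Random(seed)  # fixed seed so that the control set is reproducible
    return [
        "".join(rng.choices("ACGT", k=length))
        for _ in range(n_sequences)
    ]
```

Profiles computed on such a set serve as a baseline against which position-specific features of real promoters can be judged.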
It should be noted, however, that there is no TATA element at the position around −28 bp in the promoters of S. cerevisiae; instead, structural features that resemble the TATA box are found in the S. cerevisiae profiles at positions −3 to −10. We also identified three organisms (C. elegans, A. mellifera, and P. falciparum) in which the TATA element at the position around −28 bp is present, but some of its features are less pronounced. Let us consider the features of the TATA element in these organisms in more detail.
In C. elegans, the profiles of Slide and Slide stiffness show no peculiarities at the TATA-box position, whereas the profiles of Roll and Roll stiffness do. The maximum in the profile of the parameter “Mobility to bend towards the major groove” is lower than in other organisms, and the profiles of ultrasonic cleavage and DNase I cleavage show no peculiarities in the TATA region or anywhere else upstream of the TSS. We suppose that these features arise because C. elegans uses not TBP but the TBP-like factor CeTLF to activate Pol II [36,37]. Therefore, its PIC assembly machinery may have its own characteristics.
The profiles of ultrasonic cleavage and DNase I cleavage intensity for A. mellifera show no features in the area of the TATA element, and the parameter “Mobility to bend towards the major groove” is noticeably less pronounced than in the profiles of the other invertebrates. A. mellifera is an insect characterized by complex social behavior. Its transcription is still insufficiently studied, and few data are available for understanding the details of this process [38].
The extremely high AT content of the P. falciparum genomic sequence (about 80%) does not allow the formation of a completely autonomous structure of the Pol II core promoter that would not require additional control. In P. falciparum, both ultrasonic and DNase I cleavage remain virtually unchanged throughout the entire region upstream of the TSS. However, Figure 6f shows a faintly pronounced, wide maximum in the P. falciparum profile of “Mobility to bend towards major groove”. This appears to be a marker for TBP binding, but it is too weak. Apparently, additional mechanisms are needed to identify the TATA element in the promoters of P. falciparum and to realize gene expression. The role of G-quadruplexes in gene expression is widely discussed [39], and the presence of G-quadruplex-forming DNA motifs in the P. falciparum genome has been shown [40]. This is all the more surprising given that 80% of its genome consists of AT pairs. In any case, the P. falciparum genome must rely on some additional mechanisms to facilitate the recognition of the TATA element.
Let us consider to what extent the deviations in the profiles of these three organisms change the idea of an evolutionarily stable structural organization of RNA polymerase II promoters. Despite the absence of some structural features in the region of the TATA element in these three organisms, one of its characteristics is present in all organisms without exception: “Mobility to bend towards the major groove” reaches its maximum in the TATA region (Figure 4f), and the presence of the motifs in the logo representations (Figure 2 and Figure S3) of C. elegans and A. mellifera is evident. Thus, C. elegans, A. mellifera, and P. falciparum still have a marker of the TATA element. Note that the messy pattern of cleavage around the TSS is present in all organisms.
The only organism whose promoter sequences lack the structural markers of the TATA element at a position around −28 bp upstream of the TSS is S. cerevisiae. However, we registered a maximum in the profile of the parameter “Mobility to bend towards major groove” at position −8 bp. We obtained the same result previously when processing a smaller sample of its promoters [10]. This peculiarity may be due to the specific functioning of Pol II in S. cerevisiae, which was revealed by comparison with the S. pombe transcription machinery [41]. The differences in the structural organization of the core promoters of the two yeasts may be associated with the evolutionary distance between S. pombe and S. cerevisiae. Indeed, these organisms diverged about 500 million years ago [42]. The features of Pol II functioning during transcription in S. cerevisiae have recently been studied in detail [43].

4. Materials and Methods

We analyzed sets of promoters of fifteen evolutionarily different organisms retrieved from the EPDnew section of the Eukaryotic Promoter Database (EPD) (http://epd.vital-it.ch (accessed on 24 July 2022)) [12]. We used sets of animal promoters (29,597 promoters for H. sapiens, 9556 promoters for M. mulatta, 25,111 promoters for M. musculus, 12,569 promoters for R. norvegicus, 6126 promoters for G. gallus, 7352 promoters for C. familiaris, 16,972 promoters for D. melanogaster, 6461 promoters for A. mellifera, 10,726 promoters for D. rerio, 7120 promoters for C. elegans); plant promoters (22,702 promoters for A. thaliana, 17,059 promoters for Z. mays); unicellular fungi promoters (5117 promoters for S. cerevisiae and 4802 promoters for S. pombe); and protozoan promoters (5597 promoters for P. falciparum). We checked that all of these sequences are 80 nucleotides long and strictly defined. The profiles of the averaged textual, structural, mechanical, and physicochemical properties of the 80 bp core promoter sequences (positions from −50 to +30) were constructed.
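The length check and the position numbering described above might look as follows in Python. This is a sketch under stated assumptions, not the authors' code: “strictly defined” is interpreted here as containing only the letters A, C, G, and T, and the function names are hypothetical.

```python
def position_labels(upstream=50, downstream=30):
    """Return the labels -50..-1, +1..+30; there is no position 0."""
    return list(range(-upstream, 0)) + list(range(1, downstream + 1))


def is_valid_promoter(seq, length=80):
    """True if seq has the expected length and only unambiguous bases.

    Assumption: "strictly defined" means no symbols other than A, C, G, T.
    """
    return len(seq) == length and set(seq.upper()) <= set("ACGT")
```

With the default arguments, `position_labels()` yields 80 labels, one per nucleotide of a core promoter fragment.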
For the analysis of the structural, mechanical, and physicochemical properties of the core promoter sequences, we used the indexes of numerical parameterization for the ten double-stranded dinucleotide duplexes, collected from the database DiProDB (http://diprodb.fli-leibniz.de (accessed on 24 July 2022)) [21]. For the profiles of the variations in the stacking energy and in the base-pair step parameters Roll and Slide, we used the parametrization of Pérez et al. [22]; for the profiles of the stiffness of the DNA double helix to Roll and Slide changes, we used the parametrization of Goñi et al. [23]; and for the profile of the mobility to bend towards the major groove, we used the parametrization of Gartenberg and Crothers [24].
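For properties that do not depend on which strand is read, the sixteen dinucleotides reduce to ten unique double-stranded steps, because each step is equivalent to its reverse complement (for example, CA and TG). The following Python sketch enumerates them; the function names are hypothetical, and the choice of the lexicographically smaller string as the representative is an arbitrary convention made for illustration.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_step(dinucleotide):
    """Representative of a double-stranded step.

    Convention (illustrative): the lexicographically smaller of the step
    and its reverse complement.
    """
    step = dinucleotide.upper()
    return min(step, reverse_complement(step))


# The ten unique steps obtained from all sixteen dinucleotides.
UNIQUE_STEPS = sorted(
    {canonical_step("".join(pair)) for pair in product("ACGT", repeat=2)}
)
```

A lookup table keyed by these ten representatives is enough to assign a strand-symmetric property value to any dinucleotide via `canonical_step`.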

Profiles Construction

The X-axes of the profiles define the position relative to the TSS, which is denoted as +1 bp; negative and positive numbers denote the upstream and downstream regions, respectively. The Y-axes present the mean value of the chosen characteristic taken from the corresponding database. For textual characteristics, defined at the mononucleotide level, for each of the 80 positions on the X-axis (numbered −50, −49, … −1, +1, +2, … +30), the occurrences of each type of nucleotide (A, C, G, T) in all core promoters of the chosen species are summed, and the resulting sum is divided by the number of promoters. For the physical or structural characteristics defined at the base-pair step level, and for the ultrasound cleavage rates at the dinucleotide level, for each of the 79 positions on the X-axis (numbered −49, −48, … −1, +1, +2, … +30), the values of these characteristics for the dinucleotides at positions (−50, −49); (−49, −48); … (−1, +1); … (+29, +30), taken from DiProDB (physical and structural characteristics) or from the work [18] (ultrasound cleavage rates at the dinucleotide level), are summed, and the resulting sum is divided by the number of promoters. For the ultrasound cleavage rates at the tetranucleotide level, for each of the 77 positions on the X-axis (numbered −48, −47, … −1, +1, +2, … +29), the values for the tetranucleotides at positions (−50, −49, −48, −47); (−49, −48, −47, −46); … (−2, −1, +1, +2); … (+27, +28, +29, +30), taken from the Supporting Material to the work [18], are summed, and the resulting sum is divided by the number of promoters. For the DNase I cleavage rates at the hexanucleotide level, for each of the 75 positions on the X-axis (numbered −47, −46, … −1, +1, +2, … +27, +28), the values for the hexanucleotides at positions (−50, −49, −48, −47, −46, −45); (−49, −48, −47, −46, −45, −44); … (−3, −2, −1, +1, +2, +3); … (+25, +26, +27, +28, +29, +30), taken from the Supplementary Material to the work [34], are summed, and the resulting sum is divided by the number of promoters.
The programs for profile construction were written in Python 3.10.
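The averaging procedure described in this section can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' programs: the function names are hypothetical, the `values` table passed to `kmer_profile` stands for whichever dinucleotide, tetranucleotide, or hexanucleotide index is being profiled (no real parameter values are included), and each window of length k starting at string index i is labelled with the position at index i + k // 2, which reproduces the numbering given above for dinucleotides (−49 to +30) and tetranucleotides (−48 to +29).

```python
from collections import Counter


def mononucleotide_profile(promoters):
    """Fraction of A, C, G, T at every position of equally long promoters."""
    n = len(promoters)
    length = len(promoters[0])
    profile = {base: [0.0] * length for base in "ACGT"}
    for i in range(length):
        counts = Counter(seq[i] for seq in promoters)
        for base in "ACGT":
            profile[base][i] = counts[base] / n
    return profile


def kmer_profile(promoters, k, values, upstream=50):
    """Mean of a k-mer index over all promoters, keyed by position label.

    values: dict mapping every k-mer to a number (hypothetical table).
    upstream: number of nucleotides before position +1 (assumed 50).
    """
    n = len(promoters)
    length = len(promoters[0])
    # Position labels without zero: -upstream..-1, +1..(length - upstream).
    labels = [p for p in range(-upstream, length - upstream + 1) if p != 0]
    profile = {}
    for start in range(length - k + 1):
        total = sum(values[seq[start:start + k]] for seq in promoters)
        profile[labels[start + k // 2]] = total / n
    return profile
```

Multiplying the output of `mononucleotide_profile` by 100 gives percentages of the kind plotted in Figure 1; the same `kmer_profile` function covers the 79-, 77-, and 75-point profiles by setting k to 2, 4, or 6.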

5. Conclusions

Eukaryotic organisms, regardless of their level of evolutionary development and the AT content of their genomic sequences, share common structural features of the naked DNA in the RNA polymerase II core promoter region. These features are the exceptional heterogeneity and asymmetry of the 3D structure and the presence of two singular regions: the hexanucleotide (“INR”) around the TSS and the octanucleotide (“TATA element”) upstream. The strength of each promoter depends, to some extent, on the nucleotide sequences forming its singular regions. In our opinion, all of the data presented here are consistent with the bottom-up conception of evolution [44], which starts from the physicochemical properties of nucleic acid and amino acid polymers.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijms231810873/s1.

Author Contributions

I.A.I. designed research, A.V.M. performed research; I.A.I. and A.A.A. analyzed data and wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

Anastasia A. Anashkina thanks the Russian Fund for Basic Research for support (grant 20-04-01085 A).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We are grateful to Robert V. Polosov for useful discussions and valuable comments and Alexei A. Adzhubei for reading the early version of the manuscript and making useful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

bp: Base pair; DNase I: Bovine pancreatic deoxyribonuclease I; Pol II: RNA polymerase II; TBP: TATA-binding protein; TFs: Transcription factors; TSS: Transcription start site.

References

  1. Sarai, A.; Kono, H. Protein-DNA Recognition Patterns and Predictions. Annu. Rev. Biophys. Biomol. Struct. 2005, 34, 379–398.
  2. Rohs, R.; Jin, X.; West, S.M.; Joshi, R.; Honig, B.; Mann, R.S. Origins of Specificity in Protein-DNA Recognition. Annu. Rev. Biochem. 2010, 79, 233–269.
  3. Burley, S.K. Structural Studies of Eukaryotic Transcription Initiation. In Mechanisms of Transcription; Nucleic Acids and Molecular Biology; Eckstein, F., Lilley, D.M.J., Eds.; Springer: Berlin/Heidelberg, Germany, 1997; pp. 251–264. ISBN 978-3-642-60691-5.
  4. Pedersen, A.G.; Baldi, P.; Chauvin, Y.; Brunak, S. DNA Structure in Human RNA Polymerase II Promoters. J. Mol. Biol. 1998, 281, 663–673.
  5. Fukue, Y.; Sumida, N.; Nishikawa, J.; Ohyama, T. Core Promoter Elements of Eukaryotic Genes Have a Highly Distinctive Mechanical Property. Nucleic Acids Res. 2004, 32, 5834–5840.
  6. Kanhere, A.; Bansal, M. Structural Properties of Promoters: Similarities and Differences between Prokaryotes and Eukaryotes. Nucleic Acids Res. 2005, 33, 3165–3175.
  7. Florquin, K.; Saeys, Y.; Degroeve, S.; Rouzé, P.; Van de Peer, Y. Large-Scale Structural Analysis of the Core Promoter in Mammalian and Plant Genomes. Nucleic Acids Res. 2005, 33, 4255–4264.
  8. Abeel, T.; Saeys, Y.; Bonnet, E.; Rouzé, P.; Van de Peer, Y. Generic Eukaryotic Core Promoter Prediction Using Structural Features of DNA. Genome Res. 2008, 18, 310–323.
  9. Akan, P.; Deloukas, P. DNA Sequence and Structural Properties as Predictors of Human and Mouse Promoters. Gene 2008, 410, 165–176.
  10. Il’icheva, I.A.; Khodikov, M.V.; Poptsova, M.S.; Nechipurenko, D.Y.; Nechipurenko, Y.D.; Grokhovsky, S.L. Structural Features of DNA That Determine RNA Polymerase II Core Promoter. BMC Genom. 2016, 17, 973.
  11. Dreos, R.; Ambrosini, G.; Périer, R.C.; Bucher, P. The Eukaryotic Promoter Database: Expansion of EPDnew and New Promoter Analysis Tools. Nucleic Acids Res. 2015, 43, D92–D96.
  12. Dreos, R.; Ambrosini, G.; Groux, R.; Cavin Périer, R.; Bucher, P. The Eukaryotic Promoter Database in Its 30th Year: Focus on Non-Vertebrate Organisms. Nucleic Acids Res. 2017, 45, D51–D55.
  13. Mejía-Guerra, M.K.; Li, W.; Galeano, N.F.; Vidal, M.; Gray, J.; Doseff, A.I.; Grotewold, E. Core Promoter Plasticity between Maize Tissues and Genotypes Contrasts with Predominance of Sharp Transcription Initiation Sites. Plant Cell 2015, 27, 3309–3320.
  14. Molina, C.; Grotewold, E. Genome Wide Analysis of Arabidopsis Core Promoters. BMC Genom. 2005, 6, 25.
  15. Nikolov, D.B.; Chen, H.; Halay, E.D.; Hoffman, A.; Roeder, R.G.; Burley, S.K. Crystal Structure of a Human TATA Box-Binding Protein/TATA Element Complex. Proc. Natl. Acad. Sci. USA 1996, 93, 4862–4867.
  16. Coleman, R.A.; Pugh, B.F. Evidence for Functional Binding and Stable Sliding of the TATA Binding Protein on Nonspecific DNA. J. Biol. Chem. 1995, 270, 13850–13859.
  17. Okonogi, T.M.; Alley, S.C.; Reese, A.W.; Hopkins, P.B.; Robinson, B.H. Sequence-Dependent Dynamics of Duplex DNA: The Applicability of a Dinucleotide Model. Biophys. J. 2002, 83, 3446–3459.
  18. Grokhovsky, S.L.; Il’icheva, I.A.; Nechipurenko, D.Y.; Golovkin, M.V.; Panchenko, L.A.; Polozov, R.V.; Nechipurenko, Y.D. Sequence-Specific Ultrasonic Cleavage of DNA. Biophys. J. 2011, 100, 117–125.
  19. Kladde, M.P.; Kohwi, Y.; Kohwi-Shigematsu, T.; Gorski, J. The Non-B-DNA Structure of d(CA/TG)n Differs from That of Z-DNA. Proc. Natl. Acad. Sci. USA 1994, 91, 1898–1902.
  20. Travers, A.A. The Structural Basis of DNA Flexibility. Philos. Trans. A Math. Phys. Eng. Sci. 2004, 362, 1423–1438.
  21. Friedel, M.; Nikolajewa, S.; Sühnel, J.; Wilhelm, T. DiProDB: A Database for Dinucleotide Properties. Nucleic Acids Res. 2009, 37, D37–D40.
  22. Pérez, A.; Noy, A.; Lankas, F.; Luque, F.J.; Orozco, M. The Relative Flexibility of B-DNA and A-RNA Duplexes: Database Analysis. Nucleic Acids Res. 2004, 32, 6144–6151.
  23. Goñi, J.R.; Pérez, A.; Torrents, D.; Orozco, M. Determining Promoter Location Based on DNA Structure First-Principles Calculations. Genome Biol. 2007, 8, R263.
  24. Gartenberg, M.R.; Crothers, D.M. DNA Sequence Determinants of CAP-Induced Bending and Protein Binding Affinity. Nature 1988, 333, 824–829.
  25. Suzuki, M.; Allen, M.D.; Yagi, N.; Finch, J.T. Analysis of Co-Crystal Structures to Identify the Stereochemical Determinants of the Orientation of TBP on the TATA Box. Nucleic Acids Res. 1996, 24, 2767–2773.
  26. Vargason, J.M.; Henderson, K.; Ho, P.S. A Crystallographic Map of the Transition from B-DNA to A-DNA. Proc. Natl. Acad. Sci. USA 2001, 98, 7265–7270.
  27. Lu, X.-J.; Olson, W.K. 3DNA: A Software Package for the Analysis, Rebuilding and Visualization of Three-Dimensional Nucleic Acid Structures. Nucleic Acids Res. 2003, 31, 5108–5121.
  28. Il’icheva, I.A.; Vlasov, P.K.; Esipova, N.G.; Tumanyan, V.G. The Intramolecular Impact to the Sequence Specificity of B-->A Transition: Low Energy Conformational Variations in AA/TT and GG/CC Steps. J. Biomol. Struct. Dyn. 2010, 27, 667–693.
  29. Grokhovsky, S.L.; Il’icheva, I.A.; Golovkin, M.V.; Nechipurenko, Y.D.; Nechipurenko, D.Y.; Panchenko, L.A.; Polozov, R.V. Mechanochemical Cleavage of DNA by Ultrasound. Adv. Eng. Res. 2013, 213, 1–24.
  30. Grokhovsky, S.; Il’icheva, I.; Nechipurenko, D.; Golovkin, M.; Taranov, G.; Panchenko, L.; Polozov, R.; Nechipurenko, Y. Quantitative Analysis of Electrophoresis Data—Application to Sequence-Specific Ultrasonic Cleavage of DNA. Gel Electrophor. Princ. Basics 2012, 217, 238.
  31. Suck, D.; Lahm, A.; Oefner, C. Structure Refined to 2A of a Nicked DNA Octanucleotide Complex with DNase I. Nature 1988, 332, 464–468.
  32. Weston, S.A.; Lahm, A.; Suck, D. X-ray Structure of the DNase I-d(GGTATACC)2 Complex at 2.3 A Resolution. J. Mol. Biol. 1992, 226, 1237–1256.
  33. Suck, D. DNA Recognition by DNase I. J. Mol. Recognit. 1994, 7, 65–70.
  34. Lazarovici, A.; Zhou, T.; Shafer, A.; Dantas Machado, A.C.; Riley, T.R.; Sandstrom, R.; Sabo, P.J.; Lu, Y.; Rohs, R.; Stamatoyannopoulos, J.A.; et al. Probing DNA Shape and Methylation State on a Genomic Scale with DNase I. Proc. Natl. Acad. Sci. USA 2013, 110, 6376–6381.
  35. Mondal, M.; Choudhury, D.; Chakrabarti, J.; Bhattacharyya, D. Role of Indirect Readout Mechanism in TATA Box Binding Protein-DNA Interaction. J. Comput. Aided Mol. Des. 2015, 29, 283–295.
  36. Kaltenbach, L.; Horner, M.A.; Rothman, J.H.; Mango, S.E. The TBP-like Factor CeTLF Is Required to Activate RNA Polymerase II Transcription during C. Elegans Embryogenesis. Mol. Cell 2000, 6, 705–713.
  37. Chen, R.A.-J.; Down, T.A.; Stempor, P.; Chen, Q.B.; Egelhofer, T.A.; Hillier, L.W.; Jeffers, T.E.; Ahringer, J. The Landscape of RNA Polymerase II Transcription Initiation in C. Elegans Reveals Promoter and Enhancer Architectures. Genome Res. 2013, 23, 1339–1347.
  38. Khamis, A.M.; Hamilton, A.R.; Medvedeva, Y.A.; Alam, T.; Alam, I.; Essack, M.; Umylny, B.; Jankovic, B.R.; Naeger, N.L.; Suzuki, M.; et al. Insights into the Transcriptional Architecture of Behavioral Plasticity in the Honey Bee Apis Mellifera. Sci. Rep. 2015, 5, 11136.
  39. Gazanion, E.; Lacroix, L.; Alberti, P.; Gurung, P.; Wein, S.; Cheng, M.; Mergny, J.-L.; Gomes, A.R.; Lopez-Rubio, J.-J. Genome Wide Distribution of G-Quadruplexes and Their Impact on Gene Expression in Malaria Parasites. PLoS Genet. 2020, 16, e1008917.
  40. Gage, H.L.; Merrick, C.J. Conserved Associations between G-Quadruplex-Forming DNA Motifs and Virulence Gene Families in Malaria Parasites. BMC Genom. 2020, 21, 236.
  41. Yang, C.; Ponticelli, A.S. Evidence That RNA Polymerase II and Not TFIIB Is Responsible for the Difference in Transcription Initiation Patterns between Saccharomyces Cerevisiae and Schizosaccharomyces Pombe. Nucleic Acids Res. 2012, 40, 6495–6507.
  42. Rhind, N.; Chen, Z.; Yassour, M.; Thompson, D.A.; Haas, B.J.; Habib, N.; Wapinski, I.; Roy, S.; Lin, M.F.; Heiman, D.I.; et al. Comparative Functional Genomics of the Fission Yeasts. Science 2011, 332, 930–936.
  43. Qiu, C.; Jin, H.; Vvedenskaya, I.; Llenas, J.A.; Zhao, T.; Malik, I.; Visbisky, A.M.; Schwartz, S.L.; Cui, P.; Čabart, P.; et al. Universal Promoter Scanning by Pol II during Transcription Initiation in Saccharomyces Cerevisiae. Genome Biol. 2020, 21, 132.
  44. Auboeuf, D. Physicochemical Foundations of Life That Direct Evolution: Chance and Natural Selection Are Not Evolutionary Driving Forces. Life 2020, 10, 7.
Figure 1. Profiles of core promoter sequences as the mononucleotide frequencies of occurrence (in percentages) at each position along the strand complementary to the template, for the data sets of (A) H. sapiens, M. mulatta, M. musculus, R. norvegicus, C. familiaris, and G. gallus; (B) D. melanogaster, A. mellifera, D. rerio, and C. elegans; (C) A. thaliana and Z. mays; (D) S. cerevisiae, S. pombe, and P. falciparum.
Figure 2. Logo representation with information content 1.0 bits of the promoter sequences of all 15 organisms.
Figure 3. Local variations of the values of physical and structural parameters in core promoter regions of H. sapiens, M. mulatta, M. musculus, R. norvegicus, and G. gallus. (a) Stacking energy (in kcal/mol). (b) Roll (in degrees). (c) Stiffness of the duplex structure to Roll alteration (in kcal/mol degree). (d) Slide (in angstroms). (e) Stiffness of the duplex structure to Slide alteration (in kcal/mol angstrom). (f) Mobility to bend towards major groove (in mobility units).
Figure 4. Local variations of the values of physical and structural parameters in core promoter regions of C. familiaris, D. melanogaster, A. mellifera, D. rerio, C. elegans. (a) Stacking energy (in kcal/mol). (b) Roll (in degrees). (c) Stiffness of the duplex structure to Roll alteration (in kcal/mol degree). (d) Slide (in angstroms). (e) Stiffness of the duplex structure to Slide alteration (in kcal/mol angstrom). (f) Mobility to bend towards major groove (in mobility units).
Figure 5. Local variations of the values of physical and structural parameters in core promoter regions of A. thaliana and Z. mays. (a) Stacking energy (in kcal/mol). (b) Roll (in degrees). (c) Stiffness of the duplex structure to Roll alteration (in kcal/mol degree). (d) Slide (in angstroms). (e) Stiffness of the duplex structure to Slide alteration (in kcal/mol angstrom). (f) Mobility to bend towards major groove (in mobility units).
Figure 6. Local variations of the values of physical and structural parameters in core promoter regions of S. cerevisiae, S. pombe, and P. falciparum. (a) Stacking energy (in kcal/mol). (b) Roll (in degrees). (c) Stiffness of the duplex structure to Roll alteration (in kcal/mol degree). (d) Slide (in angstroms). (e) Stiffness of the duplex structure to Slide alteration (in kcal/mol angstrom). (f) Mobility to bend towards major groove (in mobility units).
Figure 7. Local variations of the values of physical and structural parameters in two non-promoter regions from H. sapiens genomic sequences: the regions (−500–−420) and (−300–−220), and the profiles of the 80 bp set of 30,000 computer simulated random nucleotide sequences along with the profiles of H. sapiens core promoters. (a) Stacking energy (in kcal/mol). (b) Roll (in degrees). (c) Stiffness of the duplex structure to Roll alteration (in kcal/mol degree). (d) Slide (in angstroms). (e) Stiffness of the duplex structure to Slide alteration (in kcal/mol angstrom). (f) Mobility to bend towards major groove (in mobility units).
Figure 8. Profiles of ultrasonic cleavage indexes and DNase I cleavage indexes for H. sapiens core promoters.
Figure 9. Profiles of ultrasonic cleavage indexes and DNase I cleavage indexes for D. melanogaster core promoters.
Figure 10. Profiles of ultrasonic cleavage indexes and DNase I cleavage indexes for Z. mays core promoters.
Figure 11. Profiles of ultrasonic cleavage indexes and DNase I cleavage indexes for P. falciparum core promoters.
Table 1. Frequencies of occurrence of different octanucleotides in the TATA-box position of each studied organism (with the exception of S. cerevisiae).

| Rank | H. sapiens (−30–−23) | M. mulatta (−31–−24) | M. musculus (−30–−23) | R. norvegicus (−30–−23) | C. familiaris (−30–−23) | G. gallus (−30–−23) | D. melanogaster (−31–−24) | A. mellifera (−32–−25) |
|---|---|---|---|---|---|---|---|---|
| 1 | TATAAAAG 0.20% | TATAAAAG 0.14% | TATAAAAG 0.34% | TATAAAAG 0.40% | TATAAAAG 0.20% | GGGGCGGG 0.26% | TATAAAAG 0.84% | TATATATA 0.66% |
| 2 | TTTTTTTT 0.12% | GGGGCGGG 0.14% | TTTTTTTT 0.32% | TATAAAGG 0.18% | GGGCGGGG 0.20% | TATATAAG 0.20% | TATAAATA 0.36% | ATATATAT 0.49% |
| 3 | ATAAAAGG 0.11% | CGCCGCCG 0.13% | TATATAAG 0.21% | ATAAAAGG 0.16% | TATATAAG 0.16% | TTTTTTTT 0.18% | CTATAAAA 0.35% | TATATATT 0.28% |
| 4 | GGGCGGGG 0.11% | GGCGGCGG 0.11% | ATAAAAGG 0.19% | TATAAATA 0.12% | TATAAAAA 0.15% | CCGCCCCG 0.18% | ATAAAAGC 0.34% | CATATATA 0.15% |
| 5 | GCCCCGCC 0.10% | CTATAAAG 0.10% | TATAAAGG 0.15% | TATATAAG 0.12% | CCGGAAGT 0.13% | GGCGGGGC 0.18% | GTATAAAA 0.27% | GTATAAAA 0.15% |
| 6 | TATATAAG 0.10% | CTATAAAA 0.10% | TATAAATA 0.12% | TATATAAA 0.12% | ATAAAGGC 0.12% | GCGGGGCG 0.18% | CTATATAA 0.26% | TATATAAG 0.15% |
| 7 | GGGGCGGG 0.10% | CCCCGCCC 0.10% | ATATAAGG 0.11% | TATAAAGA 0.12% | GCGGCGGC 0.12% | TATAAAAG 0.16% | TATAAAAA 0.24% | TATAAAAG 0.15% |
| 8 | TATAAAAA 0.10% | CCGGAAGC 0.10% | ATAAATAG 0.11% | CTATAAAA 0.12% | TATAAATA 0.12% | ATAAAAGC 0.16% | TATATAAG 0.24% | ATATATAA 0.15% |
| 9 | TATAAAGG 0.09% | CGGCGGCG 0.09% | ATAAAAGC 0.10% | ATAAAAAG 0.11% | GCCCCGCC 0.12% | GCGGCGGG 0.15% | TATATAAA 0.22% | TATATAAA 0.14% |
| 10 | CCCCTCCC 0.08% | TGGGCGGG 0.08% | ATAAAAAG 0.10% | ATAAAAGC 0.10% | CGCCGCCG 0.11% | GATAAAAG 0.15% | ATAAATAG 0.19% | TTATATAT 0.12% |
| 11 | CCCCGCCC 0.08% | CCGCCCCG 0.08% | GGGGCGGG 0.10% | TATAAAGC 0.10% | TTTTTTTT 0.11% | TATAAAGG 0.15% | ATATAAAA 0.17% | TATAAATA 0.11% |
| 12 | ATATAAAG 0.08% | CTATATAA 0.08% | GCCCCGCC 0.09% | GCCCCGCC 0.10% | GGGGCGGG 0.11% | TATAAAGC 0.15% | GTATATAA 0.14% | CTATATAT 0.11% |
| 13 | CTATAAAA 0.08% | CGCCCCGC 0.08% | TATATAAA 0.09% | ATAAAGGC 0.10% | CCCGCCCC 0.11% | TATAAAAA 0.15% | ATAAAAAC 0.13% | TTATATTT 0.11% |
| 14 | ATAAAAGC 0.08% | AAAAAAAA 0.08% | TATAAGAG 0.09% | ATATAAAG 0.10% | ATAAAAGG 0.09% | CCCGCCCC 0.13% | TTATAAAA 0.13% | GTATATAT 0.11% |
| 15 | CCCGCCCC 0.08% | TTATAAAA 0.07% | ATAAAAGA 0.08% | TATAAGAG 0.10% | ATATAAGG 0.09% | ATAAAAGG 0.13% | ATATAAGC 0.12% | ATGTATAT 0.09% |
| 16 | CTATAAAG 0.07% | GCGCCTGC 0.07% | ATATAAAG 0.08% | TAAAAGCC 0.09% | CCCCGCCC 0.09% | TCCCTCCC 0.13% | TATAAAAT 0.12% | TATATGTA 0.09% |
| 17 | GAATAAAA 0.06% | GGAGGAGG 0.07% | CTATAAAA 0.08% | GGGGCGGG 0.09% | CCCTCCCC 0.09% | CGGGGCGG 0.13% | ATAAAAGA 0.12% | AGTATATA 0.09% |
| 18 | TTAAAAGG 0.06% | GCGGCGCG 0.07% | GATAAAAG 0.08% | AGATAAAA 0.09% | GGCGGCGG 0.09% | CACTTCCG 0.11% | TAAAAGCC 0.12% | ATATAAAT 0.09% |
| 19 | TTTAAAAG 0.06% | CATAAAAG 0.07% | TTTAAAAG 0.08% | ATAAATAG 0.09% | GCTTCCGG 0.09% | CGCTTCCG 0.11% | ATAAAAGG 0.12% | ATTATATA 0.09% |
| 20 | TATAAGAG 0.06% | GCGGCGGC 0.07% | AATAAAAG 0.07% | TATAAAAA 0.08% | TATAAAGG 0.09% | GCCCCGCC 0.11% | GTATAAAT 0.12% | TAAATATT 0.08% |
Rank | A. mellifera (−32 to −25) | D. rerio (−30 to −23) | C. elegans (−31 to −24) | A. thaliana (−34 to −27) | Z. mays (−34 to −27) | S. pombe (−34 to −27) | P. falciparum (−39 to −32)
1 | TATATATA 0.66% | TATAAATA 0.28% | TATAAAAG 0.90% | TATATATA 1.43% | TATATATA 0.60% | TATATATA 0.67% | ATATATAT 5.79%
2 | ATATATAT 0.49% | TTTATTTT 0.22% | GTATAAAA 0.42% | TATAAATA 0.98% | CTATAAAT 0.34% | ATATATAT 0.42% | TATATATA 4.97%
3 | TATATATT 0.28% | TATAAAAG 0.21% | TATAAATA 0.38% | ATATATAT 0.76% | CTATATAA 0.29% | TATATAAA 0.27% | AAAAAAAA 3.73%
4 | CATATATA 0.15% | CTTTTATT 0.20% | CTATAAAA 0.28% | TATATAAA 0.65% | TATAAATA 0.29% | CTATATAA 0.23% | TTTTTTTT 3.59%
5 | GTATAAAA 0.15% | TTTTATTT 0.18% | TATATAAA 0.28% | CTATAAAT 0.60% | ATATATAT 0.29% | CATATATA 0.21% | TATATATT 1.05%
6 | TATATAAG 0.15% | TTTAAAAG 0.17% | TATAAAAA 0.25% | CTATATAA 0.52% | CTATATAT 0.25% | GTATATAT 0.21% | ATATATAA 0.79%
7 | TATAAAAG 0.15% | TATAAAAA 0.15% | ATAAAAGA 0.25% | CTATATAT 0.51% | TATATAAA 0.24% | CTATATAT 0.21% | ATATAATA 0.71%
8 | ATATATAA 0.15% | TATAAAGC 0.15% | GTATATAA 0.24% | ATATATAA 0.46% | CTATAAAA 0.23% | ATATATAA 0.19% | ATATATTT 0.70%
9 | TATATAAA 0.14% | TATAAAAC 0.15% | ATATAAAA 0.21% | TCTATATA 0.41% | CCTATAAA 0.18% | TATATAAG 0.19% | ATTTTTTT 0.66%
10 | TTATATAT 0.12% | ATAAAAGC 0.14% | TATATAAG 0.20% | TCTATAAA 0.39% | GTATATAT 0.15% | ACTATATA 0.17% | TTATATAT 0.63%
11 | TATAAATA 0.11% | TATATAAA 0.14% | TATAAAAT 0.20% | ATATAAAT 0.39% | TCTATATA 0.15% | ATATAAAT 0.17% | TAAATAAA 0.59%
12 | CTATATAT 0.11% | TTATTTTG 0.12% | GTATAAAT 0.20% | TATAAAAA 0.29% | ATATATAC 0.14% | TATAAAAG 0.17% | TTTATTTT 0.57%
13 | TTATATTT 0.11% | TTTAAAAA 0.12% | ATATAAAT 0.15% | TTATAAAT 0.28% | ATATATAA 0.14% | AAACGATG 0.17% | TATATAAT 0.55%
14 | GTATATAT 0.11% | GAGAGAGA 0.11% | AGTATAAA 0.15% | CTATAAAA 0.28% | GCTATAAA 0.14% | GTATAAAT 0.17% | AATAAATA 0.55%
15 | ATGTATAT 0.09% | ACTTTTAT 0.11% | ATAAAAGG 0.14% | TTTATATA 0.27% | ATAAATAG 0.13% | TGAATAAA 0.15% | AATATATA 0.55%
16 | TATATGTA 0.09% | ATAAAAGG 0.11% | GGTATAAA 0.13% | TTATATAT 0.24% | TATAAAAG 0.13% | TGTATATA 0.15% | TATTTTTT 0.55%
17 | AGTATATA 0.09% | ATAAATAC 0.11% | TATAAATT 0.11% | GTATATAT 0.24% | TATATAAG 0.13% | TTAAAAAA 0.12% | TTTTTTTA 0.50%
18 | ATATAAAT 0.09% | TATAAACA 0.11% | ATAAAAAG 0.11% | ATATAAAC 0.23% | TATAAAAA 0.13% | ATATATAG 0.12% | ATATTATA 0.50%
19 | ATTATATA 0.09% | TTTAAATA 0.11% | TATATATA 0.11% | ATAAATAA 0.23% | TATAAAAC 0.12% | TATATATT 0.12% | TTTTATTT 0.50%
20 | TAAATATT 0.08% | TTTAAAAC 0.10% | ATAAATAG 0.10% | TTATATAA 0.23% | ATATAAAC 0.12% | AATATAAA 0.12% | TAAATATA 0.48%
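A ranking like the one above can be produced by counting the k-mers that occupy a fixed window in a set of TSS-aligned promoter sequences. The following Python sketch illustrates the idea only; it is not the authors' pipeline. The function names, the `tss_index` parameter, and the toy sequences are hypothetical, and the sketch assumes the usual numbering convention in which the TSS is +1 and there is no position 0.

```python
from collections import Counter


def position_to_index(pos, tss_index):
    """Map a TSS-relative position to a 0-based string index.

    Assumption: there is no position 0; the TSS is +1 and the base
    immediately upstream is -1. `tss_index` is the index of the +1 base.
    """
    if pos == 0:
        raise ValueError("TSS-relative numbering has no position 0")
    return tss_index + pos if pos < 0 else tss_index + pos - 1


def window_kmer_content(aligned_seqs, tss_index, first_pos, last_pos, top=20):
    """Return the `top` most frequent k-mers in a fixed window, as percentages."""
    if not aligned_seqs:
        return []
    lo = position_to_index(first_pos, tss_index)
    hi = position_to_index(last_pos, tss_index)
    # One k-mer per sequence: the substring covering first_pos..last_pos.
    counts = Counter(seq[lo:hi + 1].upper() for seq in aligned_seqs)
    total = sum(counts.values())
    return [(kmer, 100.0 * n / total) for kmer, n in counts.most_common(top)]


# Toy input, made up for illustration: three 12-nt sequences, TSS at index 10.
toy = ["GGTATAAAAGAG", "CCTATAAAAGAT", "GGTATATATAAG"]
ranking = window_kmer_content(toy, tss_index=10, first_pos=-8, last_pos=-1)
for kmer, pct in ranking:
    print(f"{kmer} {pct:.2f}%")
```

The same helper covers the shorter windows used in the later tables: `first_pos=-1, last_pos=1` selects the dinucleotide at positions −1, +1, and `first_pos=-2, last_pos=2` selects the tetranucleotide at positions −2, +2.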
Table 2. The content (percentage) of dinucleotides PyPu, PuPu, PyPy, and PuPy in positions −1, +1.
Organism | PyPu | PuPu | PyPy | PuPy
H. sapiens | 72.17% | 13.83% | 9.66% | 4.34%
M. mulatta | 76.49% | 11.61% | 8.78% | 3.11%
M. musculus | 77.63% | 10.73% | 8.70% | 2.94%
R. norvegicus | 77.71% | 10.19% | 9.81% | 2.29%
C. familiaris | 65.34% | 15.73% | 13.04% | 5.89%
G. gallus | 68.30% | 12.39% | 13.96% | 5.35%
D. melanogaster | 91.26% | 2.87% | 3.54% | 2.33%
A. mellifera | 95.40% | 2.59% | 1.60% | 0.40%
D. rerio | 83.52% | 9.97% | 5.71% | 0.79%
C. elegans | 90.97% | 2.78% | 5.67% | 0.58%
A. thaliana | 88.81% | 5.97% | 3.55% | 1.67%
Z. mays | 75.15% | 9.38% | 10.29% | 5.18%
S. cerevisiae | 93.59% | 2.21% | 2.19% | 2.01%
S. pombe | 97.42% | 1.10% | 1.08% | 0.40%
P. falciparum | 95.09% | 1.68% | 1.95% | 1.29%
Table 3. The content (percentage) of each of 16 dinucleotides in positions −1, +1.
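Organism | AA | AC | AG | AT | CA | CC | CG | CT | GA | GC | GG | GT | TA | TC | TG | TT
H. sapiens | 1.65% | 0.90% | 1.83% | 0.25% | 38.24% | 5.06% | 13.41% | 0.44% | 4.90% | 2.04% | 5.44% | 1.15% | 8.18% | 3.74% | 12.35% | 0.41%
M. mulatta | 2.02% | 1.11% | 1.80% | 0.07% | 39.11% | 4.88% | 16.66% | 0.22% | 4.43% | 1.78% | 3.37% | 0.16% | 8.21% | 3.57% | 12.51% | 0.11%
M. musculus | 2.25% | 1.11% | 1.93% | 0.30% | 42.17% | 4.31% | 11.82% | 0.49% | 3.89% | 1.18% | 2.66% | 0.34% | 9.96% | 3.65% | 13.69% | 0.25%
R. norvegicus | 1.76% | 0.86% | 1.94% | 0.11% | 39.84% | 4.98% | 13.55% | 0.18% | 3.43% | 1.13% | 3.06% | 0.19% | 9.96% | 4.44% | 14.37% | 0.20%
C. familiaris | 2.01% | 1.31% | 2.86% | 0.25% | 29.88% | 7.36% | 19.60% | 0.97% | 5.32% | 3.57% | 5.54% | 0.76% | 5.46% | 4.36% | 10.39% | 0.36%
G. gallus | 2.27% | 1.27% | 1.83% | 0.36% | 31.02% | 8.37% | 19.57% | 0.52% | 4.24% | 3.09% | 4.05% | 0.64% | 4.93% | 4.68% | 12.78% | 0.38%
D. melanogaster | 0.62% | 0.71% | 0.74% | 0.74% | 57.76% | 0.85% | 3.34% | 0.41% | 0.86% | 0.58% | 0.65% | 0.30% | 22.95% | 1.86% | 7.21% | 0.43%
A. mellifera | 0.40% | 0.06% | 0.77% | 0.20% | 39.59% | 0.59% | 10.23% | 0.06% | 0.57% | 0.06% | 0.85% | 0.08% | 28.96% | 0.85% | 16.63% | 0.11%
D. rerio | 2.03% | 0.18% | 2.20% | 0.03% | 36.72% | 1.94% | 14.04% | 0.61% | 3.31% | 0.45% | 2.43% | 0.14% | 13.82% | 2.58% | 18.93% | 0.59%
C. elegans | 0.62% | 0.15% | 0.45% | 0.14% | 53.76% | 1.35% | 4.34% | 0.74% | 1.18% | 0.21% | 0.53% | 0.07% | 23.56% | 3.01% | 9.31% | 0.58%
A. thaliana | 0.96% | 0.31% | 0.82% | 0.27% | 43.42% | 1.16% | 5.97% | 0.29% | 3.12% | 0.48% | 1.07% | 0.60% | 27.08% | 1.78% | 12.34% | 0.31%
Z. mays | 1.48% | 1.45% | 2.21% | 0.71% | 35.00% | 4.52% | 18.22% | 1.74% | 3.15% | 2.07% | 2.54% | 0.94% | 9.62% | 3.20% | 12.32% | 0.83%
S. cerevisiae | 0.76% | 0.68% | 0.65% | 0.70% | 47.65% | 0.66% | 6.08% | 0.33% | 0.43% | 0.33% | 0.37% | 0.29% | 30.22% | 0.61% | 9.64% | 0.59%
S. pombe | 0.21% | 0.19% | 0.19% | 0.00% | 36.17% | 0.48% | 6.40% | 0.00% | 0.42% | 0.13% | 0.29% | 0.06% | 36.30% | 0.44% | 18.59% | 0.15%
P. falciparum | 1.36% | 0.21% | 0.14% | 1.04% | 15.60% | 0.18% | 1.93% | 0.25% | 0.16% | 0.00% | 0.00% | 0.04% | 60.68% | 0.39% | 16.89% | 1.13%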
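The four purine/pyrimidine classes of Table 2 are aggregates of the sixteen dinucleotides of Table 3: each dinucleotide falls into exactly one class according to whether each of its two bases is a purine (A, G) or a pyrimidine (C, T). The short Python sketch below makes that relationship explicit using the H. sapiens row of Table 3. The function and variable names are illustrative and do not come from the paper.

```python
PURINES = set("AG")  # A and G are purines; C and T are pyrimidines


def dinucleotide_class(dinuc):
    """Classify a dinucleotide as PyPu, PuPu, PyPy or PuPy."""
    first = "Pu" if dinuc[0] in PURINES else "Py"
    second = "Pu" if dinuc[1] in PURINES else "Py"
    return first + second


def aggregate_classes(dinuc_percent):
    """Sum per-dinucleotide percentages into the four Pu/Py classes."""
    totals = {"PyPu": 0.0, "PuPu": 0.0, "PyPy": 0.0, "PuPy": 0.0}
    for dinuc, pct in dinuc_percent.items():
        totals[dinucleotide_class(dinuc)] += pct
    return totals


# H. sapiens row of Table 3 (percentages at positions -1, +1).
H_SAPIENS = {
    "AA": 1.65, "AC": 0.90, "AG": 1.83, "AT": 0.25,
    "CA": 38.24, "CC": 5.06, "CG": 13.41, "CT": 0.44,
    "GA": 4.90, "GC": 2.04, "GG": 5.44, "GT": 1.15,
    "TA": 8.18, "TC": 3.74, "TG": 12.35, "TT": 0.41,
}

classes = aggregate_classes(H_SAPIENS)
for name, pct in classes.items():
    print(f"{name}: {pct:.2f}%")
```

For this row the sums come to about 72.18% (PyPu), 13.82% (PuPu), 9.65% (PyPy) and 4.34% (PuPy), which agree with the H. sapiens values in Table 2 (72.17%, 13.83%, 9.66%, 4.34%) to within 0.01 percentage points. The small differences are what one would expect from summing already rounded entries.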
Table 4. The content (percentage) of tetranucleotides in positions −2, +2.
Rank | H. sapiens | M. mulatta | M. musculus | R. norvegicus | C. familiaris | G. gallus | D. melanogaster | A. mellifera | D. rerio | C. elegans | A. thaliana | Z. mays | S. cerevisiae | S. pombe | P. falciparum
1 | CCAG 6.98% | GCAG 7.62% | TCAG 7.39% | TCAG 7.05% | GCAG 6.40% | GCAG 8.32% | TCAG 20.34% | TCAG 12.47% | TCAG 6.74% | TCAT 16.03% | TCAT 7.58% | CCAC 4.64% | ACAA 6.27% | CCAA 6.60% | TTAT 22.78%
2 | GCAG 6.49% | CCAG 6.85% | GCAG 7.02% | GCAG 6.79% | CCAG 6.02% | TCAG 3.77% | TTAG 7.30% | TTAG 6.48% | TCAC 4.46% | TTAT 7.46% | TCAA 6.31% | CCAG 3.99% | CCAA 6.06% | TTAC 6.08% | TTAA 12.92%
3 | TCAG 5.99% | TCAG 5.64% | CCAG 6.97% | CCAG 6.53% | TCAG 4.14% | GCGC 3.48% | TCAC 5.17% | CCAG 3.67% | GCAG 4.07% | TCAC 6.42% | TCAC 5.55% | CCAA 3.66% | TCAA 5.22% | TTAA 5.60% | ATAT 10.26%
4 | TCAC 3.21% | CCAC 3.13% | TCAC 3.62% | TCAC 3.32% | CCGC 3.06% | CCGC 3.28% | TCAT 5.13% | ACAG 3.65% | TCAT 2.93% | CCAT 5.01% | TTAT 3.79% | GCAG 2.96% | GCAA 4.03% | TCAA 5.46% | TTGT 6.72%
5 | CCAC 2.92% | TCAC 3.05% | CCAC 3.18% | CCAC 3.13% | GCGG 2.88% | GCAC 3.26% | CCAG 4.61% | TCAT 3.51% | TTGT 2.71% | TCAG 4.27% | CCAA 3.52% | TCAC 2.94% | CCAT 3.89% | CTAC 4.58% | TCAT 4.45%
6 | GCAC 2.35% | GCGC 2.44% | GCAC 2.50% | GCGC 2.36% | GCGC 2.88% | CCAG 3.07% | GCAG 4.58% | GCAG 3.22% | ACAG 2.68% | TCAA 4.10% | TTAA 3.45% | TCAG 2.46% | ATAA 3.85% | CTAA 4.41% | TTGA 4.43%
7 | ACAG 1.96% | GCAC 2.29% | ACAG 2.33% | GCAC 2.33% | CCAC 2.32% | TCAC 2.63% | ACAG 3.29% | TTGA 3.13% | ACAC 2.40% | TTGT 3.03% | ACAA 3.22% | CCGC 2.41% | GTAA 3.65% | TTGC 3.89% | ATAA 4.40%
8 | GCGC 1.91% | CCGC 2.23% | CTGT 1.98% | CTGT 2.04% | CCGG 2.25% | GCGG 2.61% | TTAT 2.75% | TCAC 2.93% | CCAG 2.29% | CTAT 2.77% | CCAT 3.06% | TCGC 2.27% | ATAT 3.21% | TCAC 3.69% | TCAA 2.93%
9 | CCGC 1.89% | GCGG 2.17% | GCGC 1.88% | ACAG 2.02% | TCAC 2.15% | TCCT 2.50% | TCAA 2.55% | TTAC 2.82% | TTAC 1.95% | ACAT 2.61% | CTAA 2.89% | CTGC 2.11% | ACAT 3.15% | CCAC 3.56% | TTAG 2.29%
10 | GGAG 1.80% | CCAT 2.02% | CCAT 1.79% | TCCT 1.89% | GGAG 2.11% | CCGT 2.14% | TTAA 2.11% | TTAT 2.76% | TTGA 1.83% | CCAC 2.42% | TCAG 2.74% | CCAT 2.11% | CTAA 2.89% | TTAG 3.37% | ACAT 2.16%
11 | CCAT 1.68% | CTGT 1.99% | TTAG 1.74% | CTGA 1.86% | TCCT 2.00% | CCAC 2.04% | TTGT 2.02% | TTGT 2.70% | GCAC 1.75% | GCAT 2.32% | TTGT 2.47% | GCAC 2.09% | TTAA 2.81% | TTGA 3.31% | TTAC 1.98%
12 | TCAT 1.67% | ACAG 1.84% | CTGA 1.59% | CTAG 1.82% | GCAC 1.63% | CTGT 2.02% | GCAT 1.80% | GTAG 2.23% | TTAG 1.75% | TTAA 2.23% | TTAC 2.39% | GCAA 2.01% | TCAT 2.76% | TTGT 3.12% | GTAT 1.57%
13 | GCGG 1.64% | GGAG 1.66% | CCGC 1.58% | CCGC 1.80% | ACAG 1.55% | CCAT 2.02% | CCAC 1.68% | ATAG 2.16% | CTGT 1.72% | TTAC 2.02% | TTGA 2.32% | TCAA 1.60% | GCAT 2.68% | CCAT 2.85% | ACAA 1.57%
14 | CTGT 1.52% | CTAG 1.65% | CTAG 1.56% | TTAG 1.79% | CCGA 1.54% | GCCT 1.91% | CCAT 1.68% | CTAG 2.06% | CCAC 1.71% | ACAA 1.76% | CTAT 2.00% | CTAG 1.55% | GTAT 2.64% | TCAG 2.33% | ATGT 1.27%
15 | TCCT 1.51% | TTGT 1.62% | TCAT 1.55% | CCAT 1.60% | GCCT 1.40% | GCGT 1.81% | GCAC 1.63% | TTAA 2.03% | CTGC 1.59% | ACAC 1.59% | ATAA 1.88% | CCGA 1.52% | TTGA 2.40% | TTAT 2.27% | TTGG 1.13%
16 | TTGT 1.44% | TCCT 1.58% | TTGT 1.54% | TTGT 1.48% | GCGT 1.40% | CTGC 1.68% | CTAG 1.60% | TCGA 1.85% | GCGC 1.57% | GCAC 1.59% | CCAC 1.86% | TTGC 1.49% | CCAG 2.03% | ACAA 2.12% | CTAT 1.09%
17 | CTGA 1.42% | TCAT 1.58% | TCCT 1.47% | CTGC 1.44% | CTGT 1.30% | GGAG 1.62% | TTAC 1.31% | ATCA 1.77% | GTGT 1.54% | TTAG 1.50% | ACAT 1.84% | TCGA 1.45% | ACAG 1.95% | CTGC 2.06% | CCAT 1.09%
18 | TTAG 1.42% | CCGG 1.55% | CTGC 1.43% | GCGG 1.43% | CCGT 1.27% | CCCT 1.47% | ACAT 1.26% | TCAA 1.74% | CTGA 1.53% | TTGA 1.49% | GCAA 1.82% | TCGT 1.43% | CTAT 1.93% | CCAG 2.02% | CTAA 1.04%
19 | CTGC 1.40% | CTGC 1.37% | GGAG 1.31% | TCAT 1.36% | CCAT 1.25% | GTGC 1.42% | ATAG 1.19% | ATAT 1.40% | TTAT 1.36% | CCAG 1.40% | GTAA 1.72% | TCAT 1.40% | GCAG 1.92% | CTAT 1.90% | ATGA 1.04%
20 | CCGG 1.24% | GCGT 1.35% | GCGG 1.27% | GTAG 1.33% | CTGC 1.25% | GCAT 1.39% | ACAC 1.10% | CCAT 1.28% | TCAA 1.34% | GTAT 1.28% | ATAT 1.67% | ACAG 1.39% | CCAC 1.90% | TCAT 1.81% | GTAA 1.02%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Melikhova, A.V.; Anashkina, A.A.; Il’icheva, I.A. Evolutionary Invariant of the Structure of DNA Double Helix in RNAP II Core Promoters. Int. J. Mol. Sci. 2022, 23, 10873. https://doi.org/10.3390/ijms231810873
