Molecular Epidemiology Surveillance of SARS-CoV-2: Mutations and Genetic Diversity One Year after Emerging

Flores-Alanis, Alejandro; Cruz-Rangel, Armando; Rodríguez-Gómez, Flor; González, James; Torres-Guerrero, Carlos Alberto; Delgado, Gabriela; Cravioto, Alejandro; Morales-Espinosa, Rosario

doi:10.3390/pathogens10020184

by Alejandro Flores-Alanis ¹, Armando Cruz-Rangel ², Flor Rodríguez-Gómez ³, James González ⁴, Carlos Alberto Torres-Guerrero ⁵, Gabriela Delgado ¹, Alejandro Cravioto ¹ and Rosario Morales-Espinosa ¹,*

¹ Departamento de Microbiología y Parasitología, Facultad de Medicina, Universidad Nacional Autónoma de México, Mexico City 04360, Mexico

² Laboratorio de Bioquímica de Enfermedades Crónicas, Instituto Nacional de Medicina Genómica, Mexico City 14610, Mexico

³ Departamento de Ciencias Computacionales, Centro Universitario de Ciencias Exactas e Ingenierías, Universidad de Guadalajara, Guadalajara 44430, Jalisco, Mexico

⁴ Departamento de Biología Celular, Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico

⁵ Posgrado en Edafología, Colegio de Postgraduados, Mexico City 56230, Mexico

* Author to whom correspondence should be addressed.

Pathogens 2021, 10(2), 184; https://doi.org/10.3390/pathogens10020184

Submission received: 8 January 2021 / Revised: 2 February 2021 / Accepted: 6 February 2021 / Published: 9 February 2021

(This article belongs to the Collection SARS-CoV Infections)

Abstract:

In December 2019, the first cases of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were identified in the city of Wuhan, China. Since then, it has spread worldwide with new mutations being reported. The aim of the present study was to monitor the changes in genetic diversity and track non-synonymous substitutions (dN) that could be implicated in the fitness of SARS-CoV-2 and its spread in different regions between December 2019 and November 2020. We analyzed 2213 complete genomes from six geographical regions worldwide, which were downloaded from GenBank and GISAID databases. Although SARS-CoV-2 presented low genetic diversity, there has been an increase over time, with the presence of several hotspot mutations throughout its genome. We identified seven frequent mutations that resulted in dN substitutions. Two of them, C14408T>P323L and A23403G>D614G, located in the nsp12 and Spike protein, respectively, emerged early in the pandemic and showed a considerable increase in frequency over time. Two other mutations, A1163T>I120F in nsp2 and G22992A>S477N in the Spike protein, emerged recently and have spread in Oceania and Europe. There were associations of P323L, D614G, R203K and G204R substitutions with disease severity. Continuous molecular surveillance of SARS-CoV-2 will be necessary to detect and describe the transmission dynamics of new variants of the virus with clinical relevance. This information is important to improve programs to control the virus.

Keywords:

SARS-CoV-2; genetic diversity; molecular surveillance; natural selection; non-synonymous substitution

1. Introduction

Following reports of a new infectious disease in the city of Wuhan, China, in December 2019, the subsequent global pandemic has led to 82,579,768 confirmed cases and 1,818,849 deaths up to 2 January 2021 [1]. The infectious agent responsible for this pandemic was identified as a virus of the coronavirus family (CoVs), which was named severe acute respiratory syndrome CoV 2 (SARS-CoV-2) [2], while the disease caused by this virus was called COVID-19 (coronavirus disease 2019). SARS-CoV-2 is a positive-sense, single-stranded RNA virus with a genome of approximately 29 kb that is organized into 11 open reading frames (ORFs) [3]. The first ORF represents approximately 70% of the viral genome and is composed of two overlapping ORFs (ORF1a and ORF1b). These ORFs encode two polypeptides that are processed into 16 non-structural proteins (nsp1–16). The main non-structural proteins include the RNA-dependent RNA polymerase (RdRp or nsp12) and a 3′→5′ exonuclease (ExoN or nsp14) [4]. The remaining ORFs encode four structural proteins, namely the Spike surface glycoprotein (S), an envelope protein (E), a membrane protein (M) and the nucleocapsid protein (N), as well as accessory proteins (ORF3a, ORF6, ORF7a/b, ORF8 and ORF10) [5,6].

An important factor in the evolution of RNA viruses is their high mutation rate (10⁻⁶ to 10⁻⁴ substitutions/nucleotide/cell infection) [7]. This phenomenon can be explained partially because the RNA polymerase cannot correct mistakes during genome replication [8]. However, CoVs possess an ExoN with the capacity to correct mistakes that occur during replication [9]. This feature has contributed to the low mutation rate of CoVs compared to other RNA viruses [10,11,12].

When a virus is well adapted to its environment, the establishment of new mutations in the virus population is not favored because most mutations become deleterious (purifying selection). In general, mutations that increase in frequency could be advantageous and fixed by positive selection, or they could be neutral and fixed by genetic drift. It is important to note that when neutral mutations increase in frequency, they can be confused with positive natural selection [8,13]. Therefore, in studies involving the evolutionary dynamics of a new pathogenic virus, such as SARS-CoV-2, it is important to know if the increase in the frequency of mutations is due to natural selection, in order to determine the possible consequences for its fitness, such as increased infectiousness and pathogenicity, or due to adaptation, thereby becoming drug resistant or having the ability to evade the immune system.

The aim of the present study was to monitor the genetic diversity of SARS-CoV-2 and use molecular epidemiology to track non-synonymous substitutions (dN) that could be implicated in the fitness of SARS-CoV-2 and its spread in different regions between December 2019 and November 2020. The information generated will be useful to understand the evolutionary dynamics of SARS-CoV-2 better in order to improve intervention measures against it.

2. Results

2.1. Global Genetic Diversity of SARS-CoV-2

A comparison among the 2213 SARS-CoV-2 genomes showed high nucleotide identity (99.9–100%), with an average pairwise difference of 12.78 nucleotides between any two genomes. The global nucleotide diversity (π) of the 2213 whole genomes was low (π = 0.00044 ± 0.00001). This diversity was not evenly distributed throughout the virus genome, with several high diversity peaks or hotspot mutations in ORF1ab, S gene and N gene being detected. N gene showed the highest peak of nucleotide diversity (π = 0.02934) (Figure 1).
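As a rough illustration of how π relates to the average pairwise difference reported above, the following Python sketch computes π for a small aligned sample. The function name and the toy sequences are illustrative assumptions; this is not the DnaSP procedure used in the study, which may treat gaps and missing data differently.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi) for aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    mean_diffs = total_diffs / len(pairs)
    return mean_diffs / length


# Toy alignment (hypothetical data): three sequences of 10 sites.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
pi_toy = nucleotide_diversity(toy)

# Consistency check with the figures in the text: a mean of 12.78 pairwise
# differences over the 29,256-nucleotide alignment described in the
# Materials and Methods.
pi_reported = 12.78 / 29256
```

Dividing the reported mean pairwise difference by the alignment length gives a value that rounds to the global π reported in this section.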

2.2. Spatial–Temporal Genetic Diversity of SARS-CoV-2

Over time, an increase in the global π values was observed, which coincided with the increase in COVID-19 cases from December 2019 to October 2020 (Figure 2). There was a slight decrease in π values in November 2020, but we only sampled until 13th November 2020, so a decrease for the month as a whole was expected. Regional analysis around the world showed that π values were low and similar to each other (United States of America (US) π = 0.00044 ± 0.00001, Latin America (LA) π = 0.00037 ± 0.00002, Europe (EU) π = 0.00043 ± 0.00002, Africa (AF) π = 0.00047 ± 0.00002, Asia (AS) π = 0.00042 ± 0.00001 and Oceania (OC) π = 0.00046 ± 0.00001), although AF and OC regions showed the highest diversities.

Fluctuations in the π values with a tendency to increase over time were observed in US (January π = 0.00025 ± 0.00007–October π = 0.00071 ± 0.00003), EU (January π = 0.00008 ± 0.00002–November π = 0.00072 ± 0.00004), AF (February π = 0.00038 ± 0.00019–November π = 0.00100 ± 0.00004) and AS (December π = 0.00006 ± 0.00002–October π = 0.00062 ± 0.00005). In LA, there was an increase in the π values from March to August (π = 0.00028 ± 0.00002–π = 0.00046 ± 0.00004) but in September, a drastic decrease in the π value (π = 0.00025 ± 0.00012) was detected. While OC showed low diversity between February and September (π = 0.00014 ± 0.00006–π = 0.00020 ± 0.00002), the diversity increased dramatically (π = 0.00074 ± 0.00003 and π = 0.00080 ± 0.00022, respectively) during the months of October and November (Figure 3).

2.3. Non-Synonymous Substitutions and Natural Selection

Among the 2213 whole genomes analyzed, we found 3178 polymorphic sites (S), of which a high proportion (58.5%, 1861 sites) were non-synonymous (dN) when compared with the reference strain, Wuhan-Hu-1. Although there was a large number of dN substitutions, the majority were neutral (dN/dS values were between −22.85 and 7.96 but not statistically significant). In general, it appears that the global population of SARS-CoV-2 is subject to purifying selection (dN/dS = −3.533; p < 0.01).

When we analyzed dN substitutions in total, we identified seven in the global population of SARS-CoV-2 with frequencies > 10% (Table 1). The frequencies of these seven substitutions varied by region: T85I (nsp2) and Q57H (ORF3a) were the most frequent in US; I120F (nsp2) in OC; and R203K and G204R (N protein) in LA, AF and OC; P323L (nsp12) and D614G (S protein) were highly frequent in all regions. Positive selection was seen in T85I (dN/dS = 5.89; p < 0.01) and P323L (dN/dS = 7.49; p < 0.01), while I120F, D614G, Q57H and G204R had positive values of dN/dS, but these were not significant. Meanwhile, R203K presented a negative dN/dS value, but again, this was not significant (Table 1).

2.4. Phylogeny and Dynamics of the Highly Frequent Global dN Substitutions

Phylogenetic analysis, using the Nextstrain nomenclature [14], showed that the 2213 genomes were grouped into seven clusters (Figure 4). G614 was related to clade 19A and the emergence of clade 20A, while L323 was related to the emergence of clade 20A. Clades 20B and 20C, and the subclades 20A.EU1 and 20A.EU2, arose from clade 20A. K203 and R204 were related to the emergence of clade 20B, while I85 was related to the emergence of clade 20C. H57 and F120 emerged into clades 20A and 20B, respectively. Finally, subclade 20A.EU1 was related to G614 and L323, and subclade 20A.EU2 to G614, L323 and H57 (Figure 4).

Subsequently, we performed a spatial–temporal analysis of the dN substitutions with the highest global frequencies (>75%), G614 and L323 (Table 1). G614 was detected for the first time in January 2020, and L323 in February 2020, with both substitutions presenting a high increase in their frequencies between February and March 2020. From April to September 2020, these substitutions were present in >90% of the isolates analyzed each month, and in October 2020, they presented in 100% of the isolates (Figure 5). By region, we observed fluctuations in their frequencies over time, but they were persistent in all regions. In the US and LA regions, both substitutions were detected from February 2020 to October 2020; in EU, G614 was detected from January 2020 to October 2020 and L323 from February 2020 to October 2020; in AF, both substitutions were detected from February 2020 to October 2020, while in AS and OC, they were detected from March 2020 to October 2020 (Figure 5). November was not included in the analysis because we only sampled until 13th November, and genomes could not be obtained from all regions. However, 43 isolates from EU, AF and OC were recovered in this month, and all of these presented the G614 and L323 substitutions.
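The spatial–temporal tracking described above amounts to counting, for each month, the share of isolates that carry a given substitution. A minimal Python sketch follows, assuming each genome has already been reduced to a collection month and a set of substitution labels. The record layout and the example records are hypothetical and are not the study's data.

```python
from collections import defaultdict


def monthly_frequency(records, substitution):
    """Percentage of isolates per month that carry `substitution`.

    `records` is an iterable of (month, substitutions) pairs, where month is a
    'YYYY-MM' string and substitutions is a set of labels such as 'D614G'.
    """
    totals = defaultdict(int)
    carriers = defaultdict(int)
    for month, subs in records:
        totals[month] += 1
        if substitution in subs:
            carriers[month] += 1
    return {m: 100.0 * carriers[m] / totals[m] for m in sorted(totals)}


# Hypothetical records for illustration only.
records = [
    ("2020-01", {"D614G"}),
    ("2020-01", set()),
    ("2020-03", {"D614G", "P323L"}),
    ("2020-03", {"D614G"}),
]
freq_d614g = monthly_frequency(records, "D614G")
```

Running the same function per region (by filtering the records first) would give the regional curves.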

Interestingly, we found that L323 and G614 showed similar frequencies and distributions, and both substitutions presented a strong linkage disequilibrium (LD) (R² = 0.944; p < 0.001). The Nextstrain phylogenetic tree indicated that these substitutions emerged early in the pandemic (G614, 2020-01-06 [CI 2019-12-27–2020-01-16]; L323, 2020-01-20 [CI 2020-01-11–2020-01-21]) and have spread all over the world (Figure S1).
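For readers unfamiliar with the R² index, the sketch below shows one common way to compute it for two biallelic sites in haploid genomes: the squared covariance of the two alleles divided by the product of their variances. The encoding (1 for the derived allele, 0 for the reference allele) and the toy vectors are assumptions for illustration; this is not the DnaSP output.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites given as equal-length lists of 0/1."""
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of genomes carrying the derived allele at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Hypothetical sample of 10 genomes.
site_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
site_2 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
ld = r_squared(site_1, site_2)
```

Values close to 1, such as the one reported for L323 and G614, indicate that the two substitutions almost always occur together.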

2.5. Emergence and Transmission of New Variants of SARS-CoV-2

In addition to those previously described in Section 2.3 above, we investigated if there were dN substitutions with a significant increase in frequency by region. We found a dN substitution in the S gene (G22992A > S477N) with a dN/dS value of 1.92 (p = 0.485) and a frequency of 42.6% (n = 153) in the virus population from the OC region. An I120F substitution was also present in high frequency in OC (43.2%, n = 155) (Table 1).
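The notation used here (for example, G22992A > S477N) pairs a genome coordinate with an amino acid position. The sketch below converts one into the other. It assumes that the S gene starts at position 21,563 of the reference genome; that coordinate is not given in the text and is taken from the public reference annotation, so it should be verified before reuse. The function name is hypothetical.

```python
# Assumed 1-based start of the S gene in the reference genome (not from the text).
S_GENE_START = 21563


def locate_in_gene(genome_pos, gene_start=S_GENE_START):
    """Return (codon_number, position_in_codon), both 1-based."""
    offset = genome_pos - gene_start
    if offset < 0:
        raise ValueError("position lies upstream of the gene start")
    return offset // 3 + 1, offset % 3 + 1


s477n = locate_in_gene(22992)  # G22992A
d614g = locate_in_gene(23403)  # A23403G
```

Under this assumption, both mutations fall on the second position of codons 477 and 614, respectively, which matches the amino acid numbering used in the text.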

The Nextstrain phylogenetic tree showed that F120 emerged in late March (2020-03-21; CI 2020-03-12–2020-03-27) in AS, and it spread in AS and OC regions (Figure S1). We detected F120 with moderate frequency (11.4%, n = 67) in AS (Bangladesh) from April 2020 to July 2020 and again in October 2020, and in a genome from EU (Wales) in September 2020. Meanwhile, N477 emerged in late May (2020-05-27; CI 2020-05-08–2020-06-05) in OC and it spread throughout this region (Figure S1 and Figure 6A).

F120 and N477 presented similar distributions and frequencies over time in OC, and were under strong linkage disequilibrium (R² = 0.977; p < 0.001). Both substitutions were detected from June 2020 to October 2020, with their highest frequencies of 98–100% being seen from July 2020 to September 2020. However, between September 2020 and October 2020, their frequencies decreased dramatically from 100% to 32.4% (Figure 6B,C).

A second cluster carrying the S477N substitution was detected between September 2020 and November 2020. It included genomes from EU (France, Netherlands, Norway, Belgium and Denmark; n = 15, 78.95%), AF (Tunisia; n = 2, 10.53%), AS (Hong Kong; n = 1, 5.26%) and OC (New Zealand; n = 1, 5.26%) (Figure 6A). The phylogenetic trees showed that this cluster corresponded to subclade 20A.EU2 (Figure 4 and Figure 6A). Moreover, one genome from AF (Ivory Coast) located in clade 20A also carried this substitution. The Nextstrain phylogenetic tree suggests that subclade 20A.EU2 emerged in EU during July (2020-07-24; CI 2020-07-09–2020-08-03) (Figure S1).

2.6. Association between Amino acid Variation and Disease Severity

We focused on the dN substitutions located in the S protein (D614G), nsp12 (P323L) and N protein (R203K and G204R) to analyze associations between viral variants and disease severity. We found clinical information available for 118 patients; 21 (17.8%) patients from the low/mild disease group and 84 (71.2%) patients from the hospitalized/severe disease group had the L323 substitution. The G614 substitution was detected in 21 (17.8%) patients from the low/mild disease group and 81 (68.6%) from the hospitalized/severe disease group. K203 and R204 were detected in three (2.5%) patients from the low/mild disease group and 36 (30.5%) patients from the hospitalized/severe disease group. We found a significant association between hospitalized/severe disease and the presence of G614 (p = 0.0047), L323 (p = 0.0005), and K203 and R204 (p = 0.0015).

3. Discussion

Our results showed that the nucleotide diversity of the global population of SARS-CoV-2 has increased over time. Genome diversity was not homogeneous, with regions showing high and low diversity. We found that more than 3000 mutations have emerged in the whole genome of the virus, and more than half of these (58.5%) have resulted in non-synonymous substitutions (dN), with most of them being neutral or likely neutral substitutions. The P323L and D614G substitutions in the global SARS-CoV-2 population have increased dramatically in their frequency over time. By October and November 2020, they were present in 100% of the virus population analyzed. Moreover, we detected two dN substitutions that spread in Oceania from July to October 2020, and we found a significant association between the G614, L323, K203 and R204 substitutions and hospitalized/severe disease.

Analysis of the 2213 SARS-CoV-2 genomes revealed that they shared a high nucleotide identity, suggesting that genetic variation is limited within the global population of the virus. In the whole genome, we detected genomic regions of high and low nucleotide diversity, implying that some genomic regions are evolving faster than others [15,16]. This difference between genomic regions may be useful because regions with low diversity could be considered more suitable to develop and test new antiviral drugs, vaccines and detection methods (RT-PCR), in order to reduce the possibility of rapid drug resistance, immune system evasion and high numbers of false negatives when testing [17,18,19].

Global nucleotide diversity (π) varied over time and coincided with the increase in COVID-19 cases from December 2019 to October 2020. Previous studies have reported a positive association between sampling time and the evolution of the virus, indicating that more recent isolates have accumulated more mutations than older ones [15,19]. Although the number of samples per month in the current study was not homogeneous, the increase in diversity over time suggests that the global effective population size of SARS-CoV-2 is relatively high. Regionally, we also found a tendency for diversity to increase over time; however, there were fluctuations in the π values, which could be explained by the sample size per month per region and the over-representation of a few genotypes at a given time. Infection patterns during outbreaks that occur in a region over a given time period could result in the over-representation of some mutations, leading to a decrease in genetic diversity and an effect similar to that of natural selection [12,13].

Although we found a large number of dN substitutions, it is still unclear if they play a significant role since most of them are neutral or likely neutral. Seven of them presented frequencies > 10% in the global SARS-CoV-2 population and were detected in nsp2 (T85I and I120F), nsp12 (P323L), S protein (D614G), ORF3a (Q57H) and N protein (R203K and G204R). Additionally, we found a substitution in the S protein (S477N) with a high frequency in OC. Although the dN/dS values were positive for T85I, I120F, D614G, P323L, Q57H, G204R and S477N, only T85I and P323L presented statistical significance, indicating positive natural selection.

Interestingly, we found that G614 and L323, and F120 and N477, presented a strong LD. This suggests that the LD is the result of natural selection and that the average fitness of isolates carrying both mutations could exceed the fitness conferred by each substitution alone [20], so the LD between the substitutions in each pair could persist over time [21]. However, more detailed bioinformatics and experimental analyses of LD, epistasis and natural selection are needed to understand the detailed evolution of these substitutions.

Our results, together with those from the Nextstrain phylogenetic analysis, show that L323 and G614 emerged early in the pandemic in EU and AS, respectively, and these have spread worldwide with dramatic increases in frequency over time. Other substitutions were more frequent on a regional basis; for example, F120 and N477 were highly frequent in OC. F120 emerged in AS and was then introduced to OC where it spread, while N477 emerged and spread in OC. The phylogenetic analysis showed that N477 has also been detected in other regions, principally in EU where it formed a well-defined clade (20A.EU2). In OC, N477 could be the result of an outbreak during the period June–October 2020 with cryptic transmission of SARS-CoV-2 in the region, which may be the case for other outbreaks that have been reported in US [22]. A recent study reported the presence of N477 in EU between June and September 2020, with its frequency increasing over time, principally in France [23]. Our results indicate that the presence of this substitution in OC and EU was the result of two independent events (homoplasy), but the few cases observed in AF, AS and OC in clade 20A.EU2 suggest genetic flow from EU to those regions.

Another homoplasy event was the emergence of the mutation A23063T > N501Y in England and South Africa. In the middle of December 2020, an outbreak of a new SARS-CoV-2 strain (named lineage B.1.1.7) was reported in England. The most significant changes in this strain were the mutations A23063T and C23604A, which resulted in the substitutions N501Y and P681H, respectively, in the S protein. Although this strain was first detected in late September 2020, a rapid increase in its frequency was reported in England in December 2020, and it has spread from the UK to other countries in Europe, Africa, Asia, Oceania and America [24,25,26]. This variant is roughly 50–56% more transmissible than other virus variants but does not cause more severe disease [27,28]. Its rapid spread has been associated with the N501Y and P681H substitutions, which could be implicated in viral infectivity [29]. The variant 501Y.V2, which was first identified in South Africa in October 2020 [30], also carries the N501Y substitution. Recently, the British government reported two imported cases from South Africa [30]; those genomes had the N501Y substitution but did not share the other mutations of the B.1.1.7 lineage.

The combination of several mutations and phylogenetic associations provides information that helps to determine the origin of the viral genotypes, and so theoretically, if we know the origin of the genotypes, both local and imported cases can be detected allowing us to track the dynamics of viral spread at a local and global level. Thanks to molecular epidemiology, it has been possible to detect the emergence, introduction and transmission of new variants of the virus in different regions during this current pandemic [10,31,32,33,34]. This information is vital for developing public health interventions and policy to control viral spread.

Given its function, nsp12 is essential for the replication/transcription of the SARS-CoV-2 genome, and this protein serves as a target for the treatment of COVID-19. The P323L substitution is located in an interface region of nsp12 that, together with nsp7 and nsp8, has been reported to play an important role in the formation of a protein complex [35], which provides structural stability to nsp12 for its processivity [36,37]. However, a previous report suggests that L323 could cause structural alterations [38] and have an adverse effect on proofreading during the genomic replication of the virus [15]. Meanwhile, the P323L substitution is located in a pocket that has been predicted as a possible druggable site [39]; however, further research is needed to discover if the mutation could affect these properties.

The S protein is a key factor for the entry of the virus into the host cell [40]. This protein has a receptor binding motif (RBM) that interacts with the ACE-2 receptor of the host cell [41]. The S477N substitution is located in the RBM, and a recent study showed that it increases the affinity for the ACE-2 receptor [42]. Moreover, this position is part of an epitope recognized by human neutralizing antibodies [43], but further analysis is required to determine if N477 alters recognition by human antibodies.

G614 has gained relevance since the presence of this substitution correlated with a higher capacity for infection by SARS-CoV-2 [44,45,46]. Moreover, studies in vitro have shown that this substitution is responsible for making the virus 2.4 times more infectious [47]. It has also been reported that the viral load is higher in COVID-19 patients infected with isolates carrying this substitution than in patients with isolates that do not present it [48,49,50].

Furthermore, the S protein is among the elements targeted in the development of vaccines against SARS-CoV-2. Initial studies have shown that the presence of the D614G substitution produces a reduction in the neutralization titers using antibodies from convalescent plasma obtained from patients with COVID-19 [47]. This suggests that the substitution affected the antigenic response to the S protein. Recent reports have shown that mutations in the S protein are becoming more frequent as the pandemic spreads and that variants carrying these mutations can have an increased capacity to spread [51,52]. To date, most serum samples from either volunteers in vaccine trials or patients recovering from COVID-19 have shown full or slightly diminished capacity to inactivate some of the more widespread SARS-CoV-2 variants, except for B.1.1.7 (N501Y substitution), 501Y.V2 (N501Y, K417N and E484K substitutions) and 501Y.V3 (N501Y and E484K substitutions), which have shown reduced neutralization in assays using the aforementioned serum samples [53,54,55,56].

Although an effective COVID-19 vaccine could be the proximal solution to the SARS-CoV-2 pandemic, genetic diversity in the S protein and its implication in host immune evasion must be taken into account in order to develop improved vaccines in the future, which may be required to protect against new mutations.

Finally, nsp2 is a helical transmembrane protein implicated in the modulation of the host cell environment [57], although its precise function remains unknown. Previous studies have reported that a stabilizing mutation in the endosome-associated protein-like domain of nsp2 could be associated with the more contagious phenotype of SARS-CoV-2 when compared with SARS-CoV [58]. The I120F substitution occurs in the N-terminal region of nsp2, which is located in the extracellular region of the protein. We recommend further study of its possible implications for virus fitness.

Some genetic changes in SARS-CoV-2 may confer an evolutionary advantage, such as high transmissibility, evasion of the host immune system or future drug resistance, but they could also be implicated in clinical outcomes. We found that the D614G substitution in the S protein, P323L in nsp12, and R203K and G204R in the N protein had a significant association with disease severity. More specific studies will be needed to determine how these substitutions contribute to disease severity. Although clinical data were available for only a small fraction (5.3%) of the analyzed genomes, this analysis could provide a preliminary approach for determining the association between SARS-CoV-2 variants and COVID-19 disease severity. Recently, Nagy et al. [59] showed a direct correlation between dN substitutions and clinical outcomes. They found five dN substitutions in ORF8, nsp6, ORF3a, nsp4 and N protein related to mild disease, while 17 dN substitutions distributed in S protein, nsp12, ORF3a, N protein, nsp3, ORF6 and nsp7 were related to hospitalization and severe disease, including D614G, P323L, Q57H, R203K and G204R. Associations between the presence of the moderate and severe forms of COVID-19 in pediatric patients and the P323L and D614G substitutions were also reported [60]. Moreover, the P323L and D614G substitutions may correlate with higher fatality rates [61]. The development of a barcoding system could be useful to detect viral variants and to support the diagnosis of severe COVID-19 disease.

One year after the emergence of the SARS-CoV-2, the virus continues to mutate, and it will keep accumulating novel mutations with possible clinical and therapeutic repercussions requiring the development of new strategies to reduce the burden of COVID-19 disease. Molecular epidemiologic surveillance needs to continue in order to detect genetic changes that might be involved in pathogenesis, host immune system evasion and/or future drug resistance, as well as its worldwide spread. Such information will contribute greatly to the development of more efficient interventions for SARS-CoV-2, as well as to providing a solid foundation for tackling other viral pandemics in the future.

4. Materials and Methods

4.1. Sequences, Alignments and Quality Control

A total of 2500 complete genomes of SARS-CoV-2 from six regions around the world (United States of America (US), Latin America (LA), Europe (EU), Africa (AF), Asia (AS) and Oceania (OC)) were obtained randomly from the NCBI [62] and GISAID [24] databases up to 13 November 2020. The genomes were aligned using MAFFT v7.3 [63] and reviewed with BioEdit v7.2 software [64], using the Wuhan-Hu-1 isolate (GenBank: NC_045512) as the reference strain. Non-coding regions were eliminated, as were all genomes that presented more than 15 non-determined (N) or other ambiguous nucleotides according to the IUPAC nucleotide code. For the final analysis, we included 2213 genomes of 29,256 nucleotides each, distributed throughout the period between December 2019 and November 2020 (Supplementary Table S1).
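The quality-control rule described above (discard genomes with more than 15 undetermined or ambiguous nucleotides) can be sketched in Python as follows. The function names are hypothetical, and ignoring alignment gaps when counting is an assumption made here, not something stated in the text.

```python
UNAMBIGUOUS = set("ACGT")


def count_ambiguous(seq):
    """Count characters other than A, C, G or T; alignment gaps ('-') are ignored."""
    return sum(1 for base in seq.upper() if base not in UNAMBIGUOUS and base != "-")


def passes_quality_control(seq, max_ambiguous=15):
    """True if the genome has at most `max_ambiguous` ambiguous nucleotides."""
    return count_ambiguous(seq) <= max_ambiguous


# Hypothetical sequences for illustration only.
kept = passes_quality_control("ACGT" * 10 + "N" * 15)
dropped = passes_quality_control("ACGT" * 10 + "N" * 16)
```

Any IUPAC ambiguity code (R, Y, N and so on) counts towards the threshold in this sketch.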

4.2. Genetic Analyses

We used DnaSP v5.1 software [65] to determine the number of polymorphic sites (S), nucleotide diversity (π), the number of non-synonymous (dN) substitutions and linkage disequilibrium (LD) given by the R² index. The variation of π throughout the genome was estimated using a 50 bp window with 10 bp steps. To determine whether diversity departs from neutrality, the difference between non-synonymous and synonymous substitutions (dN/dS) was evaluated using MEGA v6.0 software [66]. This estimation was based on the joint maximum likelihood reconstruction of ancestral states under the Muse–Gaut model of codon substitution [67] and the Felsenstein model of nucleotide substitution [68]. Moreover, the software calculates the probability of rejecting the null hypothesis of neutral evolution (p value). We obtained dN frequencies using Jalview v2.11 software [69].
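To make the sliding-window estimate concrete, the following Python sketch computes π in 50 bp windows moved in 10 bp steps across a toy alignment. It is a simplified stand-in for the DnaSP analysis, not a reproduction of it; the function names and toy sequences are assumptions, and only full-length windows are reported.

```python
from itertools import combinations


def pi(seqs):
    """Mean pairwise differences per site for aligned sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / len(seqs[0])


def sliding_window_pi(seqs, window=50, step=10):
    """Return (start, end, pi) per full window; 0-based start, exclusive end."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, start + window, pi(chunk)))
    return results


# Hypothetical alignment: two 70-nt sequences differing at one site (index 55).
seq_a = "A" * 70
seq_b = "A" * 55 + "G" + "A" * 14
windows = sliding_window_pi([seq_a, seq_b])
```

In this toy case the single variable site raises π only in the windows that contain it, which is how diversity peaks of the kind shown in Figure 1 arise.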

4.3. Phylogenetic Analysis

A maximum likelihood phylogenetic tree of the 2213 SARS-CoV-2 genomes analyzed in this study, on a background of 1888 reference genomes, was constructed in Nextstrain [70] on 30 November 2020. In addition, we obtained a maximum likelihood phylogenetic tree of SARS-CoV-2 (named the Nextstrain phylogenetic tree) from Nextstrain [70] on the same date in order to localize the nucleotide changes and dN substitutions into their respective clades together with divergence times.

4.4. Clinical Classification and Genetic–Phenotype Association Analysis

For the 711 sequences downloaded from the GISAID database, we obtained the patient follow-up status (Supplementary Table S1). Only 118 sequences had informative patient status; the rest were marked as “unknown” or as “live”. The 118 patient samples with informative status were grouped into two categories: low/mild disease, which included patients who were marked as “asymptomatic”, “home”, “not hospitalized”, “outpatient”, “mild clinical signs without hospitalization” and “isolation”, and hospitalized/severe disease, which included patients who were marked as “hospitalized”, “released”, “deceased”, “intensive care unit” and “recovered”.
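The grouping rule above can be written as a simple lookup. The Python sketch below uses the status labels quoted in the text; the function name is hypothetical, and the case-insensitive exact matching is an assumption, since the raw metadata may be formatted differently.

```python
LOW_MILD = {
    "asymptomatic", "home", "not hospitalized", "outpatient",
    "mild clinical signs without hospitalization", "isolation",
}
HOSPITALIZED_SEVERE = {
    "hospitalized", "released", "deceased", "intensive care unit", "recovered",
}


def classify_status(status):
    """Map a patient status label to a severity group, or None if uninformative."""
    key = status.strip().lower()
    if key in LOW_MILD:
        return "low/mild"
    if key in HOSPITALIZED_SEVERE:
        return "hospitalized/severe"
    return None  # e.g. "unknown" or "live"


groups = [classify_status(s) for s in ["Home", "Deceased", "unknown", "live"]]
```

Records that map to None would be excluded from the association analysis.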

Association between genotype and disease severity was assessed using Fisher's exact test and odds ratio calculation via a 2 × 2 contingency table, and statistical analysis was performed using RStudio v3.2.2 [71].
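The study reports using R for this step; the Python sketch below is only an illustration of the same kind of test on a 2 × 2 table. It computes a two-sided Fisher's exact p value by summing the hypergeometric probabilities of all tables that are no more likely than the observed one, together with the sample odds ratio. The function names and the example counts are hypothetical and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum over tables whose probability does not exceed the observed one.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p_value, 1.0)


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); infinite if b or c is zero."""
    if b == 0 or c == 0:
        return float("inf")
    return (a * d) / (b * c)


# Hypothetical table: rows = substitution present/absent,
# columns = hospitalized/severe vs low/mild.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
ratio = odds_ratio(3, 1, 1, 3)
```

With real data, one such table would be built for each substitution of interest.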

Supplementary Materials

The following are available online at https://www.mdpi.com/2076-0817/10/2/184/s1, Table S1. Accession numbers, database, region, country and collection date of the 2213 genomes analyzed. Figure S1. Maximum-likelihood phylogeny of 3156 SARS-CoV-2 genomes deposited in the GISAID database. The 8 dN substitutions analyzed here are located at the base of the nodes. The tree was constructed on the Nextstrain website [70], which allowed us to consult the divergence time of each node associated with each dN substitution.

Author Contributions

R.M.-E. and A.F.-A. contributed to the study design; A.F.-A. performed the data curation, genetic and phylogenetic analyses, and the interpretation of results; F.R.-G. contributed to the interpretation of results; A.F.-A. and R.M.-E. wrote the original manuscript; R.M.-E., F.R.-G., A.C.-R., J.G., C.A.T.-G., G.D., and A.C. made a critical review and edited the final manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by DGAPA-PAPIIT grant IN213816.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank Luisa Sandner Miranda for helpful discussions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. World Health Organization. WHO Coronavirus Disease (COVID-19) Dashboard. 2020. Available online: https://covid19.who.int/ (accessed on 29 September 2020).
2. Gorbalenya, A.E.; Baker, S.C.; Baric, R.S.; de Groot, R.J.; Drosten, C.; Gulyaeva, A.A.; Haagmans, B.L.; Lauber, C.; Leontovich, A.M.; Neuman, B.W.; et al. The species Severe acute respiratory syndrome-related coronavirus: Classifying 2019-nCoV and naming it SARS-CoV-2. Nat. Microbiol. 2020, 5, 536–544.
3. Wu, F.; Zhao, S.; Yu, B.; Chen, Y.-M.; Wang, W.; Song, Z.-G.; Hu, Y.; Tao, Z.-W.; Tian, J.-H.; Pei, Y.-Y.; et al. A new coronavirus associated with human respiratory disease in China. Nature 2020, 579, 265–269.
4. Von Brunn, A.; Teepe, C.; Simpson, J.C.; Pepperkok, R.; Friedel, C.C.; Zimmer, R.; Roberts, R.; Baric, R.; Haas, J. Analysis of Intraviral Protein-Protein Interactions of the SARS Coronavirus ORFeome. PLoS ONE 2007, 2, e459.
5. Khailany, R.A.; Safdar, M.; Ozaslan, M. Genomic characterization of a novel SARS-CoV-2. Gene Rep. 2020, 19, 100682.
6. Cui, J.; Li, F.; Shi, Z.-L. Origin and evolution of pathogenic coronaviruses. Nat. Rev. Microbiol. 2019, 17, 181–192.
7. Sanjuán, R.; Nebot, M.R.; Chirico, N.; Mansky, L.M.; Belshaw, R. Viral Mutation Rates. J. Virol. 2010, 84, 9733–9748.
8. Peck, K.M.; Lauring, A.S. Complexities of Viral Mutation Rates. J. Virol. 2018, 92, 1–8.
9. Ogando, N.S.; Ferron, F.; Decroly, E.; Canard, B.; Posthuma, C.C.; Snijder, E.J. The Curious Case of the Nidovirus Exoribonuclease: Its Role in RNA Synthesis and Replication Fidelity. Front. Microbiol. 2019, 10, 1813.
10. Bedford, T.; Riley, S.; Barr, I.G.; Broor, S.; Chadha, M.; Cox, N.J.; Daniels, R.S.; Gunasekaran, C.P.; Hurt, A.C.; Kelso, A.; et al. Global circulation patterns of seasonal influenza viruses vary with antigenic drift. Nature 2015, 523, 217–220.
11. Holmes, E.C.; Dudas, G.; Rambaut, A.; Andersen, K.G. The evolution of Ebola virus: Insights from the 2013–2016 epidemic. Nature 2016, 538, 193–200.
12. Liu, Q.; Zhao, S.; Shi, C.-M.; Song, S.; Zhu, S.; Su, Y.; Zhao, W.; Li, M.; Bao, Y.; Xue, Y.; et al. Population Genetics of SARS-CoV-2: Disentangling Effects of Sampling Bias and Infection Clusters. Genom. Proteom. Bioinform. 2020, 4–11.
13. Vitti, J.J.; Grossman, S.R.; Sabeti, P.C. Detecting Natural Selection in Genomic Data. Annu. Rev. Genet. 2013, 47, 97–120.
14. Nextclade. Available online: https://clades.nextstrain.org/ (accessed on 16 December 2020).
15. Pachetti, M.; Marini, B.; Benedetti, F.; Giudici, F.; Mauro, E.; Storici, P.; Masciovecchio, C.; Angeletti, S.; Ciccozzi, M.; Gallo, R.C.; et al. Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. J. Transl. Med. 2020, 18, 1–9.
16. Day, T.; Gandon, S.; Lion, S.; Otto, S.P. On the evolutionary epidemiology of SARS-CoV-2. Curr. Biol. 2020, 30, R849–R857.
17. Ramírez, J.D.; Florez, C.; Muñoz, M.; Hernández, C.; Castillo, A.; Gomez, S.; Rico, A.; Pardo, L.; Barros, E.C.; Castañeda, S.; et al. The arrival and spread of SARS-CoV-2 in Colombia. J. Med. Virol. 2020.
18. Wright, E.; Lakdawala, S.; Cooper, V. SARS-CoV-2 genome evolution exposes early human adaptations. bioRxiv 2020.
19. Van Dorp, L.; Acman, M.; Richard, D.; Shaw, L.P.; Ford, C.E.; Ormond, L.; Owen, C.J.; Pang, J.; Tan, C.C.; Boshier, F.A.; et al. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infect. Genet. Evol. 2020, 83, 104351.
20. Felsenstein, J. The effect of linkage on directional selection. Genetics 1965, 52, 349–363.
21. Karlin, S.; Feldman, M.W. Linkage and selection: Two locus symmetric viability model. Theor. Popul. Biol. 1970, 1, 39–71.
22. Bedford, T.; Greninger, A.L.; Roychoudhury, P.; Starita, L.M.; Famulare, M.; Huang, M.-L.; Nalla, A.; Pepper, G.; Reinhardt, A.; Xie, H.; et al. Cryptic transmission of SARS-CoV-2 in Washington state. Science 2020, 370, 571–575.
23. Hodcroft, E.B.; Zuber, M.; Nadeau, S.; Crawford, K.H.D.; Bloom, J.D.; Stadler, T.; Neher, R.A. Emergence and spread of a SARS-CoV-2 variant through Europe in the summer of 2020. medRxiv 2020.
24. GISAID–Initiative. Available online: https://www.gisaid.org/ (accessed on 30 November 2020).
25. Preliminary Genomic Characterisation of an Emergent SARS-CoV-2 Lineage in the UK Defined by a Novel Set of Spike Mutations. Virological. Available online: https://virological.org/t/preliminary-genomic-characterisation-of-an-emergent-sars-cov-2-lineage-in-the-uk-defined-by-a-novel-set-of-spike-mutations/563 (accessed on 23 December 2020).
26. Wise, J. Covid-19: New coronavirus variant is identified in UK. BMJ 2020, 371, m4857.
27. Volz, E.; Mishra, S.; Chand, M.; Barrett, J.C.; Johnson, R.; Hopkins, S.; Gandy, A.; Rambaut, A.; Ferguson, N.M. Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data. medRxiv 2021.
28. Davies, N.G.; Barnard, R.C.; Jarvis, C.I.; Kucharski, A.J.; Munday, J.; Pearson, C.A.B.; Russell, T.W.; Tully, D.C.; Abbott, S.; Gimma, A.; et al. Estimated transmissibility and severity of novel SARS-CoV-2 Variant of Concern 202012/01 in England. medRxiv 2020.
29. Gu, H.; Chen, Q.; Yang, G.; He, L.; Fan, H.; Deng, Y.-Q.; Wang, Y.; Teng, Y.; Zhao, Z.; Cui, Y.; et al. Adaptation of SARS-CoV-2 in BALB/c mice for testing vaccine efficacy. Science 2020, 369, 1603–1607.
30. What Do We Know about the Two New Covid-19 Variants in the UK? The Guardian. Available online: https://www.theguardian.com/world/2020/dec/23/what-do-we-know-about-the-two-new-covid-19-variants-in-the-uk (accessed on 23 December 2020).
31. Puenpa, J.; Suwannakarn, K.; Chansaenroj, J.; Nilyanimit, P.; Yorsaeng, R.; Auphimai, C.; Kitphati, R.; Mungaomklang, A.; Kongklieng, A.; Chirathaworn, C.; et al. Molecular epidemiology of the first wave of severe acute respiratory syndrome coronavirus 2 infection in Thailand in 2020. Sci. Rep. 2020, 10, 1–8.
32. Laiton-Donato, K.; Villabona Arenas, C.J.; Usme Ciro, J.; Franco Munoz, C.; Alvarez-Diaz, D.A.; Villabona-Arenas, L.; Echeverria-Londono, S.; Franco-Sierra, N.; Cucunuba, Z.; Florez-Sanchez, A.C.; et al. Genomic epidemiology of SARS-CoV-2 in Colombia. medRxiv 2020.
33. Lemieux, J.E.; Siddle, K.J.; Shaw, B.M.; Loreth, C.; Schaffner, S.F.; Gladden-Young, A.; Adams, G.; Fink, T.; Tomkins-Tinch, C.H.; Krasilnikova, L.A.; et al. Phylogenetic analysis of SARS-CoV-2 in Boston highlights the impact of superspreading events. Science 2021, 371, eabe3261.
34. Lai, A.; Bergna, A.; Caucci, S.; Clementi, N.; Vicenti, I.; Dragoni, F.; Cattelan, A.M.; Menzo, S.; Pan, A.; Callegaro, A.; et al. Molecular Tracing of SARS-CoV-2 in Italy in the First Three Months of the Epidemic. Viruses 2020, 12, 798.
35. Gao, Y.; Yan, L.; Huang, Y.; Liu, F.; Zhao, Y.; Cao, L.; Wang, T.; Sun, Q.; Ming, Z.; Zhang, L.; et al. Structure of the RNA-dependent RNA polymerase from COVID-19 virus. Science 2020, 368, 779–782.
36. Subissi, L.; Posthuma, C.C.; Collet, A.; Zevenhoven-Dobbe, J.C.; Gorbalenya, A.E.; Decroly, E.; Snijder, E.J.; Canard, B.; Imbert, I. One severe acute respiratory syndrome coronavirus protein complex integrates processive RNA polymerase and exonuclease activities. Proc. Natl. Acad. Sci. USA 2014, 111, E3900–E3909.
37. Wang, R.; Chen, J.; Gao, K.; Hozumi, Y.; Yin, C.; Wei, G.W. Characterizing SARS-CoV-2 mutations in the United States. arXiv 2020.
38. Chand, G.B.; Banerjee, A.; Azad, G.K. Identification of novel mutations in RNA-dependent RNA polymerases of SARS-CoV-2 and their implications on its protein structure. PeerJ 2020, 8, e9492.
39. Ruan, Z.; Liu, C.; Guo, Y.; He, Z.; Huang, X.; Jia, X.; Yang, T. SARS-CoV-2 and SARS-CoV: Virtual screening of potential inhibitors targeting RNA-dependent RNA polymerase activity (NSP12). J. Med. Virol. 2020, 93, 389–400.
40. Ye, Z.-W.; Yuan, S.; Yuen, K.-S.; Fung, S.-Y.; Chan, C.-P.; Jin, D.-Y. Zoonotic origins of human coronaviruses. Int. J. Biol. Sci. 2020, 16, 1686–1697.
41. Lan, J.; Ge, J.; Yu, J.; Shan, S.; Zhou, H.; Fan, S.; Zhang, Q.; Shi, X.; Wang, Q.; Zhang, L.; et al. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature 2020, 581, 215–220.
42. Starr, T.N.; Greaney, A.J.; Hilton, S.K.; Ellis, D.; Crawford, K.H.; Dingens, A.S.; Navarro, M.J.; Bowen, J.E.; Tortorici, M.A.; Walls, A.C.; et al. Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell 2020, 182, 1295–1310.e20.
43. Barnes, C.O.; Jette, C.A.; Abernathy, M.E.; Dam, K.-M.A.; Esswein, S.R.; Gristick, H.B.; Malyutin, A.G.; Sharaf, N.G.; Huey-Tubman, K.E.; Lee, Y.E.; et al. SARS-CoV-2 neutralizing antibody structures inform therapeutic strategies. Nature 2020, 588, 682–687.
44. Hou, Y.J.; Chiba, S.; Halfmann, P.; Ehre, C.; Kuroda, M.; Dinnon, K.H.; Leist, S.R.; Schäfer, A.; Nakajima, N.; Takahashi, K.; et al. SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo. Science 2021, 370, 1464–1468.
45. Brufsky, A. Distinct viral clades of SARS-CoV-2: Implications for modeling of viral spread. J. Med. Virol. 2020, 92, 1386–1390.
46. Brufsky, A.; Lotze, M.T. DC/L-SIGNs of hope in the COVID-19 pandemic. J. Med. Virol. 2020, 92, 1396–1398.
47. Hu, J.; He, C.L.; Gao, Q.; Zhang, G.J.; Cao, X.X.; Long, Q.X.; Deng, H.J.; Huang, L.Y.; Chen, J.; Wang, K.; et al. The D614G mutation of SARS-CoV-2 spike protein enhances viral infectivity. bioRxiv 2020.
48. Yao, H.; Lu, X.; Chen, Q.; Xu, K.; Chen, Y.; Cheng, M.; Chen, K.; Cheng, L.; Weng, T.; Shi, D.; et al. Patient-derived SARS-CoV-2 mutations impact viral replication dynamics and infectivity in vitro and with clinical implications in vivo. Cell Discov. 2020, 6, 1–16.
49. Lorenzo-Redondo, R.; Nam, H.H.; Roberts, S.C.; Simons, L.M.; Jennings, L.J.; Qi, C.; Achenbach, C.J.; Hauser, A.R.; Ison, M.G.; Hultquist, J.F.; et al. A clade of SARS-CoV-2 viruses associated with lower viral loads in patient upper airways. EBioMedicine 2020, 62, 103112.
50. Volz, E.; Hill, V.; McCrone, J.T.; Price, A.; Jorgensen, D.; O’Toole, Á.; Southgate, J.; Johnson, R.; Jackson, B.; Nascimento, F.F.; et al. Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity. Cell 2021, 184, 64–75.e11.
51. Korber, B.; Fischer, W.M.; Gnanakaran, S.; Yoon, H.; Theiler, J.; Abfalterer, W.; Hengartner, N.; Giorgi, E.E.; Bhattacharya, T.; Foley, B.; et al. Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus. Cell 2020, 182, 812–827.e19.
52. Plante, J.A.; Liu, Y.; Liu, J.; Xia, H.; Johnson, B.A.; Lokugamage, K.G.; Zhang, X.; Muruato, A.E.; Zou, J.; Fontes-Garfias, C.R.; et al. Spike mutation D614G alters SARS-CoV-2 fitness. Nature 2020, 1–6.
53. Greaney, A.J.; Loes, A.N.; Crawford, K.H.; Starr, T.N.; Malone, K.D.; Chu, H.Y.; Bloom, J.D. Comprehensive mapping of mutations to the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human serum antibodies. bioRxiv 2021.
54. Wang, Z.; Schmidt, F.; Weisblum, Y.; Muecksch, F.; Finkin, S.; Schaefer-Babajew, D.; Cipolla, M.; Gaebler, C.; Lieberman, J.A.; Yang, Z.; et al. mRNA vaccine-elicited antibodies to SARS-CoV-2 and circulating variants. bioRxiv 2021.
55. Wibmer, C.K.; Ayres, F.; Hermanus, T.; Madzivhandila, M.; Kgagudi, P.; Lambson, B.E.; Vermeulen, M.; Van Den Berg, K.; Rossouw, T.; Boswell, M.; et al. SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma. bioRxiv 2021.
56. Novavax Vaccine Delivers 89% Efficacy against COVID-19 in UK, but Is Less Potent in South Africa. Science, AAAS. Available online: https://www.sciencemag.org/news/2021/01/novavax-vaccine-delivers-89-efficacy-against-covid-19-uk-less-potent-south-africa (accessed on 29 January 2021).
57. Davies, J.; Almasy, K.; McDonald, E.; Plate, L. Comparative multiplexed interactomics of SARS-CoV-2 and homologous coronavirus non-structural proteins identifies unique and shared host-cell dependencies. bioRxiv 2020.
58. Angeletti, S.; Benvenuto, D.; Bianchi, M.; Giovanetti, M.; Pascarella, S.; Ciccozzi, M. COVID-2019: The role of the nsp2 and nsp3 in its pathogenesis. J. Med. Virol. 2020, 92, 584–588.
59. Nagy, Á.; Pongor, S.; Győrffy, B. Different mutations in SARS-CoV-2 associate with severe and mild outcome. Int. J. Antimicrob. Agents 2021, 57, 106272.
60. Pandey, U.; Yee, R.; Shen, L.; Judkins, A.R.; Bootwalla, M.; Ryutov, A.; Maglinte, D.T.; Ostrow, D.; Precit, M.; Biegel, J.A.; et al. High Prevalence of SARS-CoV-2 Genetic Variation and D614G Mutation in Pediatric Patients with COVID-19. Open Forum Infect. Dis. 2020.
61. Toyoshima, Y.; Nemoto, K.; Matsumoto, S.; Nakamura, Y.; Kiyotani, K. SARS-CoV-2 genomic variations associated with mortality rate of COVID-19. J. Hum. Genet. 2020, 65, 1075–1082.
62. SARS-CoV-2 Resources, NCBI. Available online: https://www.ncbi.nlm.nih.gov/sars-cov-2/ (accessed on 30 November 2020).
63. Katoh, K.; Toh, H. Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics 2010, 26, 1899–1900.
64. Hall, T.A. BioEdit: A user-friendly biological sequences alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 1999, 41, 95–98.
65. Librado, P.; Rozas, J. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics 2009, 25, 1451–1452.
66. Tamura, K.; Stecher, G.; Peterson, D.; Filipski, A.; Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0. Mol. Biol. Evol. 2013, 30, 2725–2729.
67. Muse, S.V.; Gaut, B.S. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol. Biol. Evol. 1994, 11, 715–724.
68. Felsenstein, J. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 1981, 17, 368–376.
69. Waterhouse, A.M.; Procter, J.B.; Martin, D.M.A.; Clamp, M.; Barton, G.J. Jalview Version 2: A multiple sequence alignment editor and analysis workbench. Bioinformatics 2009, 25, 1189–1191.
70. Nextstrain/Ncov/Global. Available online: https://nextstrain.org/ncov/global (accessed on 30 November 2020).
71. RStudio: Open Source & Professional Software for Data Science Teams. Available online: https://rstudio.com/ (accessed on 29 January 2021).

Figure 1. Nucleotide diversity (π) in a total of 2213 SARS-CoV-2 genomes. Several hotspot mutations were detected along the genome. Seven nucleotide substitutions with frequencies > 10% in the sample population are indicated, all of which resulted in amino acid non-synonymous (dN) substitutions. The π values were calculated within a sliding window of 50 bp moving with 10 bp steps.
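The sliding-window calculation mentioned in the Figure 1 caption can be illustrated with a short sketch. It assumes an alignment of equal-length sequences, computes per-site diversity as the unbiased heterozygosity n/(n−1)·(1 − Σp²), and averages it over each window. The function names are hypothetical, and the treatment of gaps and ambiguous bases (simply ignored here) is an assumption that may differ from the software used in the study.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def site_diversity(column: Sequence[str]) -> float:
    """Unbiased per-site diversity; equals the mean pairwise difference at the site."""
    counts = Counter(base for base in column if base in "ACGT")
    n = sum(counts.values())
    if n < 2:
        return 0.0
    sum_sq = sum(c * c for c in counts.values())
    return (n / (n - 1)) * (1 - sum_sq / (n * n))


def sliding_pi(
    seqs: List[str], window: int = 50, step: int = 10
) -> List[Tuple[int, float]]:
    """Return (window start, pi) pairs along an alignment.

    Defaults follow the 50 bp window and 10 bp step given in the caption.
    Gaps and ambiguous bases are ignored (an assumption of this sketch).
    """
    length = len(seqs[0])
    per_site = [site_diversity([s[i] for s in seqs]) for i in range(length)]
    return [
        (start, sum(per_site[start:start + window]) / window)
        for start in range(0, length - window + 1, step)
    ]
```

Peaks in the returned values correspond to windows that contain polymorphic sites at intermediate frequency, which is how hotspot positions stand out in a plot of this kind.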

Figure 2. Temporal changes in SARS-CoV-2 nucleotide diversity (π) and monthly incidence of COVID-19 cases according to confirmed global cases from December 2019 to November 2020. The number of cases in November was not considered because data were recorded only until 13 November 2020. The number of isolates analyzed per month was as follows: Dec = 15, Jan = 103, Feb = 84, Mar = 628, Apr = 222, May = 221, June = 118, July = 233, Aug = 196, Sep = 179, Oct = 171 and Nov = 43. Abbreviations: Dec, December; Jan, January; Feb, February; Mar, March; Apr, April; Aug, August; Sep, September; Oct, October; Nov, November.

Figure 3. Temporal changes in SARS-CoV-2 nucleotide diversity (π) by region. Abbreviations: Dec, December; Jan, January; Feb, February; Mar, March; Apr, April; Aug, August; Sep, September; Oct, October; Nov, November; US, United States of America; LA, Latin America; EU, Europe; AF, Africa; AS, Asia; OC, Oceania.

Figure 4. Maximum-likelihood phylogeny of 2213 SARS-CoV-2 genomes. The branches with tip circles represent the 2213 genomes analyzed in the present work, and branches without a tip circle represent the 1888 reference genomes. The 7 dN substitutions analyzed here are located at the base of the nodes. The color of each circle indicates the clade to which it belongs according to Nextstrain nomenclature.

Figure 5. Spatial–temporal frequencies of D614G (A) and P323L (B) substitutions. Abbreviations: Dec, December; Jan, January; Feb, February; Mar, March; Apr, April; Aug, August; Sep, September; Oct, October; OC, Oceania; AS, Asia; AF, Africa; EU, Europe; LA, Latin America; US, United States of America. The number of isolates analyzed per month was as follows: Dec = 15, Jan = 103, Feb = 84, Mar = 628, Apr = 222, May = 221, June = 118, July = 233, Aug = 196, Sep = 179, and Oct = 171.

Figure 6. Maximum-likelihood phylogeny of 2213 SARS-CoV-2 genomes (A), and temporal frequencies of I120F (B) and S477N (C) substitutions in Oceania. F120 and N477 are located at the base of the nodes. The presence of N477 is indicated in yellow. Abbreviations: Dec, December; Jan, January; Feb, February; Mar, March; Apr, April; Aug, August; Sep, September; Oct, October. In panels (B,C), the number of isolates analyzed per month was as follows: Jan = 1, Feb = 3, Mar = 144, Apr = 17, May = 6, June = 18, July = 42, Aug = 46, Sep = 42, and Oct = 37.

Table 1. Non-synonymous Substitutions (dN) of Medium–High Frequency in the Global Population of SARS-CoV-2.

Nucleotide Change	Amino Acid Change	Genomic Location	dN/dS (p Value)	US (%)	LA (%)	EU (%)	AF (%)	AS (%)	OC (%)	Global (%)
C1059T	T85I	ORF1a (nsp2)	5.89 (0.009)	49.2	12.5	9.80	5.80	2.40	11.4	14.41
A1163T	I120F	ORF1a (nsp2)	4.79 (0.052)	0.00	0.00	0.20	0.00	11.4	43.2	10.08
C14408T	P323L	ORF1b (nsp12)	7.49 (0.002)	80.6	92.3	81.8	88.4	68.2	81.1	79.58
A23403G	D614G	S gene	2.42 (0.153)	80.3	90.9	85.3	92.6	69.0	81.1	80.80
G25563T	Q57H	ORF3	7.13 (0.105)	59.8	18.7	21.1	18.6	25.9	16.2	27.47
G28881A	R203K	N gene	−0.43 (0.805)	7.90	41.8	34.2	48.8	26.9	54.0	33.44
G28883C	G204R	N gene	1.79 (0.285)	7.90	41.8	34.0	48.3	26.7	54.0	33.30

Abbreviations: US, United States of America; LA, Latin America; EU, Europe; AF, Africa; AS, Asia; OC, Oceania.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
