Article

Genetic Diversity and Population Structure Analysis of Castanopsis hystrix and Construction of a Core Collection Using Phenotypic Traits and Molecular Markers

1 Guangdong Academy of Forestry, Guangdong Key Laboratory of Forest Cultivation, Protection and Utilization, Guangzhou 510520, China
2 College of Horticulture and Landscape Architecture, Zhongkai College of Agriculture and Engineering, Guangzhou 510225, China
3 Guangdong Forest Resource Conservation Center, Guangzhou 510173, China
4 College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
5 Academy of Forestry, Hebei Agricultural University, Baoding 071000, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this research.
Genes 2022, 13(12), 2383; https://doi.org/10.3390/genes13122383
Submission received: 18 October 2022 / Revised: 20 November 2022 / Accepted: 5 December 2022 / Published: 16 December 2022
(This article belongs to the Section Population and Evolutionary Genetics and Genomics)

Abstract

Castanopsis hystrix is a valuable native, broad-leaved, fast-growing tree in South China. In this study, 15 phenotypic traits and 32 simple sequence repeat (SSR) markers were used to assess the genetic diversity and population structure of a natural population of C. hystrix and to construct a core germplasm collection from a set of 232 accessions. The results showed that the original population of C. hystrix had relatively high genetic diversity, with the number of alleles (Na), effective number of alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), Shannon’s information index (I), and polymorphism information content (PIC) averaging 26.188, 11.565, 0.863, 0.897, 2.660, and 0.889, respectively. Three sub-populations were identified based on a STRUCTURE analysis, indicating a strong genetic structure. The results of the phylogenetic and population structure analyses showed a high level of agreement, with the 232 germplasms being classified into three main groups. The analysis of molecular variance (AMOVA) indicated that 96% of the total variance was derived from within populations, which revealed low differentiation among populations. A core collection of 157 germplasms was then constructed for the first time; its diversity parameters did not differ significantly from those of the original population. These results revealed the genetic diversity and population structure of C. hystrix germplasms, which have implications for germplasm management and genome-wide association studies on C. hystrix, as well as for core collection establishment in other wood-producing hardwood species.

1. Introduction

Information on genetic diversity is important for understanding the extent of genetic variability in existing plant material and for the breeding and conservation of genetic resources [1,2]. However, tree breeding usually involves the recurrent selection of genetically superior materials, which may alter diversity levels in breeding populations [3]. Various types of markers can be used to estimate genetic diversity. In the past, phenotypic traits were widely used; however, they are influenced by the environment, so diversity estimates based on them alone can be inaccurate. In recent decades, DNA molecular markers have been increasingly used for genetic diversity analysis. They can be employed to investigate levels of genetic diversity among categories such as cultivars and closely related species in germplasm banks [4,5].
The collection and preservation of germplasm resources are of great significance to genetic improvement, new variety breeding, and germplasm innovation of forest trees [6]. Brown [7,8] and Frankel and Brown [9] first proposed the concept of core collection, an approach using a minimum number of germplasm resources from the whole germplasm bank to represent the maximum genetic diversity of the original collection. The construction of a core collection provided a new approach for the in-depth evaluation, efficient protection, and utilization of germplasm resources [10,11,12] and has gradually become a hot topic in international germplasm resources research [13]. The research on core collection in China started in 1994 and was first applied to some crops [14,15], while applications in trees were relatively rare. Using molecular markers and phenotypic traits to construct a core collection, the full genetic diversity of tree species can be better retained, resulting in improved accuracy and reliability, as shown in Cryptomeria japonica [16] and Robinia pseudoacacia [17]. Recent theoretical studies on sampling ratio, sampling strategy, and effectiveness evaluation of core collection further advance the field by providing a theoretical basis for the construction and representative evaluation of core collections [18,19,20,21].
Castanopsis hystrix is an evergreen broad-leaved tree in the Fagaceae, mainly distributed in Guangdong, Guangxi, Fujian, and other southern provinces of China. Its wood is widely used in furniture, construction, and shipbuilding, and the species is valued for its fast growth and good adaptability. C. hystrix has been identified as a precious species in Guangdong Province. Since 1999, Guangdong has continuously collected germplasm materials of C. hystrix from different habitats. To develop effective and efficient conservation practices for plant genetic resources, it is important to understand the genetic diversity between and within populations [22,23,24]. Assessing relatedness among accessions is an important prerequisite for the identification of core collections suitable for optimizing association studies [25]. The analysis of genetic diversity and population genetic structure is also important for verifying domestication events and genetic relationships of C. hystrix. In the past, molecular markers such as random amplified polymorphic DNA (RAPD) [26] and inter-simple sequence repeat (ISSR) [27,28,29] have been applied to assess the genetic diversity of C. hystrix resources. Molecular markers provide a powerful tool for genetic diversity analysis and core collection establishment. Among the marker types, SSR markers, which consist of 2–6 bp nucleotide repeat motifs, are considered to be among the most effective for studying genetic diversity. They are codominant, highly polymorphic, genome-specific, and fairly evenly distributed over the genome [30]. SSR markers have been applied in analyses of genetic diversity and population structure, gene mapping, and marker-assisted selection for crop improvement [31,32,33,34]. However, SSR markers have been less frequently applied in C. hystrix, and there is no relevant research on the construction of a C. hystrix core collection.
In this study, to ascertain the genetic diversity and population structure and to construct a core collection, a total of 232 C. hystrix accessions from 17 provenances covering the whole distribution area were analyzed using 15 phenotypic traits and 32 SSR markers distributed throughout the C. hystrix genome. Our objectives were to estimate the levels of genetic diversity and to characterize the population structure of the C. hystrix germplasm collection. The results are intended to provide a molecular basis for understanding C. hystrix genetic diversity and for effectively preserving and utilizing C. hystrix germplasm resources, thereby providing better materials for C. hystrix breeding and helping to maintain important traits in the population.

2. Materials and Methods

2.1. Experimental Materials

The germplasm gene bank was established in 2003. Since 1999, the C. hystrix Genetic Improvement Research Collaboration Group, composed of the Guangdong Academy of Forestry and other institutions, has systematically investigated, selected, and sampled superior trees from 17 provenance areas with relatively concentrated distributions of C. hystrix germplasm resources, according to the distribution of existing C. hystrix resources, and has collected the corresponding germplasm. Superior trees were selected by the five-dominant-tree method, with one tree selected every 30–50 m. Generally, 10–30 superior trees were selected from each producing area, depending on the distribution area. After selecting the superior trees, we collected seeds and current-year, sun-exposed branches from the south side of the middle and upper part of the crown from October to November 2001. The material was sent directly to the grafting site, where 30–50 plants were grafted for each superior tree. The grafted trees were transplanted to the gene bank in the spring of 2003, with five plants per superior tree; the plant and row spacing was 3 m × 5 m, and the size of the planting hole was 60 cm × 60 cm × 40 cm.
A set of 232 accessions of C. hystrix was collected from across the species' range (Figure 1). The accessions came from 17 provenances located in Guangxi (5 provenances), Guangdong (4), Fujian (3), Hainan (2), Yunnan (2), and Hunan (1). At present, all of them are preserved in the Maofeng Mountain C. hystrix germplasm gene bank (113° 46′ E and 23° 29′ N), Baiyun District, Guangzhou City, Guangdong Province. From November to December 2018, three trees were selected for each clone, and 15 phenotypic traits were investigated, including 5 growth traits, 2 morphological traits, and 8 wood properties (Table S1); the detailed passport data are presented in Table 1. DNA extraction and SSR genotyping had been completed by the laboratory at an earlier stage, and 32 pairs of SSR primers were finally chosen (Tables S2 and S3). For the specific experimental methods and steps, refer to Yang [35].

2.2. Genetic Diversity Analysis

The format conversion software Convert v1.31 [36] was used to convert the genotype data into the POPGENE format. Observed heterozygosity (Ho), expected heterozygosity (He), and polymorphism information content (PIC) were estimated using CERVUS v3.0.7 [37,38]. The number of alleles (Na), number of effective alleles (Ne), Shannon’s information index (I), genetic differentiation index (Fst), and gene flow (Nm) were calculated using GenAlEx v6.5 [39,40].
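As an illustration of what these parameters measure, the following minimal sketch computes them for a single locus from diploid genotypes using the standard textbook formulas. It is not the implementation used by CERVUS or GenAlEx; the function name, the example genotypes, and the use of the uncorrected He estimator are assumptions made for this example.

```python
import math
from collections import Counter


def locus_diversity(genotypes):
    """Summarize one codominant locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid tree.
    Returns a dict with Na, Ne, Ho, He, I and PIC.
    """
    alleles = [allele for pair in genotypes for allele in pair]
    counts = Counter(alleles)
    freqs = [n / len(alleles) for n in counts.values()]

    sum_sq = sum(p * p for p in freqs)  # expected homozygosity
    # Cross term of the PIC formula: sum over i < j of 2 * p_i^2 * p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return {
        "Na": len(freqs),                                   # observed alleles
        "Ne": 1.0 / sum_sq,                                 # effective alleles
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "He": 1.0 - sum_sq,                                 # uncorrected (assumption)
        "I": -sum(p * math.log(p) for p in freqs),          # Shannon's index
        "PIC": 1.0 - sum_sq - cross,
    }


# Hypothetical allele sizes (bp) for four trees at one SSR locus.
example = [(150, 154), (150, 150), (154, 158), (150, 158)]
stats = locus_diversity(example)
```

Averaging each statistic over all 32 loci would give collection-level values comparable in kind to those reported in the Results; software packages may apply sample-size corrections that this sketch omits.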

2.3. Population Structure, Principal Coordinate Analysis, and Evolutionary Tree Analysis

An analysis of molecular variance (AMOVA) was carried out in GenAlEx v6.5 to determine the relative partitioning of the total genetic variation among and within different groups of genotypes. The principal coordinate analysis (PCoA) was also performed using GenAlEx v6.5. The genetic structure of unique genotypes was investigated using STRUCTURE v2.3.4 [41] under an admixture model, with a burn-in of 10,000 followed by 10,000 Markov chain Monte Carlo (MCMC) iterations. Twenty independent runs were performed for each number of genetically homogeneous clusters (K = 1–10). The most probable K value was determined with the ΔK method [42] in STRUCTURE HARVESTER v0.6 [43] and used to estimate the membership coefficient of each clone. The web tool iTOL (https://itol.embl.de/ (accessed on 2 July 2021)) was used for data visualization. Additionally, to analyze the relationships among the 232 germplasms, a genetic distance matrix between the clones was generated, and an unrooted phylogenetic tree was constructed using the neighbor-joining method in PowerMarker v3.25 [44].
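The ΔK criterion can be sketched as follows, assuming the common formulation in which ΔK is the absolute second-order difference of the mean ln P(D) across replicate runs divided by the standard deviation of ln P(D) at that K. The function name and the ln P(D) values are hypothetical and are not output from this study; STRUCTURE HARVESTER should be consulted for its exact procedure.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to the list of ln P(D) values from replicate runs.
    """
    result = {}
    for k, runs in lnp_by_k.items():
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # delta K is undefined at the ends of the K range
        second_diff = (
            mean(lnp_by_k[k + 1]) - 2 * mean(runs) + mean(lnp_by_k[k - 1])
        )
        result[k] = abs(second_diff) / stdev(runs)
    return result


# Hypothetical ln P(D) values, three replicate runs per K (not study data).
lnp = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-950.0, -953.0, -947.0],
    3: [-850.0, -852.0, -848.0],
    4: [-845.0, -850.0, -840.0],
    5: [-842.0, -852.0, -832.0],
}
dk = delta_k(lnp)
best_k = max(dk, key=dk.get)  # K with the largest delta K
```

Because the statistic needs both neighbouring K values, it cannot be evaluated at the smallest or largest K tested, which is one reason a range such as K = 1–10 is run. The sketch also assumes the replicate runs at each K are not all identical, since a zero standard deviation would make the ratio undefined.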

2.4. Construction of the Core Collection and Evaluation

QGAStation v2.0 software (Guobo Chen, Futao Zhang, and Jun Zhu, Zhejiang University, China) was used to construct the core collection based on phenotypic data, combining three systematic clustering methods (unweighted pair-group average method, Ward’s method, and median distance method), two genetic distances (Euclidean distance and Mahalanobis distance), three sampling methods (random sampling, deviation sampling, and preferred sampling), and seven sampling ratios (10%, 15%, 20%, 25%, 30%, 35%, and 40%) [45].
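The full factorial design described above can be enumerated in a few lines. The sketch below only lists the combinations of settings; it does not reproduce QGAStation itself, and the short labels used for each option are illustrative assumptions.

```python
from itertools import product

# Option labels are illustrative; only the counts come from the text.
clustering_methods = ["unweighted pair-group average", "Ward", "median distance"]
genetic_distances = ["Euclidean", "Mahalanobis"]
sampling_methods = ["random", "deviation", "preferred"]
sampling_ratios = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]

# Every combination of ratio, clustering method, distance and sampling method.
schemes = list(
    product(sampling_ratios, clustering_methods, genetic_distances, sampling_methods)
)
n_schemes = len(schemes)  # 7 * 3 * 2 * 3 candidate core collections
```

Each of the resulting schemes yields one candidate core collection to be evaluated.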
Core Finder v1.1 [46] and Core Hunter 3 [47] were used to establish core collections from the molecular marker data; in Core Hunter 3, a series of sampling ratios was set. An independent t-test was used to test the significance of differences in genetic diversity parameters between the core collection and the original collection [48]. If the differences were not significant, the constructed core collection was considered representative of the original collection. A principal coordinates analysis (PCoA) was performed to generate the distribution map of the core and original collections to evaluate the core collection.
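The idea of selecting accessions so that all alleles are retained can be illustrated with a simple greedy set-cover heuristic. This is a simplified sketch of allele maximization in general, not the algorithm implemented in Core Finder or Core Hunter 3; the function name, accession names, and alleles are hypothetical.

```python
def greedy_allele_core(accessions):
    """Pick accessions until every allele in the collection is represented.

    accessions: dict mapping accession name -> set of (locus, allele) pairs.
    Returns the selected accession names in the order they were chosen.
    """
    all_alleles = set().union(*accessions.values())
    remaining = dict(accessions)
    covered = set()
    core = []
    while covered != all_alleles:
        # Choose the accession adding the most new alleles;
        # ties are broken alphabetically because names are sorted first.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        core.append(best)
        covered |= remaining.pop(best)
    return core


# Hypothetical genotypes at two loci for four accessions.
collection = {
    "A": {("L1", 150), ("L1", 154), ("L2", 200)},
    "B": {("L1", 150), ("L2", 200), ("L2", 204)},
    "C": {("L1", 158), ("L2", 204)},
    "D": {("L1", 150), ("L2", 200)},
}
core = greedy_allele_core(collection)
retained = set().union(*(collection[name] for name in core))
allele_retention = 100 * len(retained) / len(set().union(*collection.values()))
```

In this toy example two of the four accessions are enough to retain every allele, which shows how redundant accessions drop out of a core set. A greedy heuristic does not guarantee the smallest possible core, so dedicated software may return a different selection.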

3. Results

3.1. Genetic Diversity Analysis of the Original Population

A total of 335 alleles were detected by the 32 polymorphic microsatellite markers, with an average of 10.458 alleles per locus; the largest number, 15 alleles, was detected at the SSR04 locus. SSR02, SSR10, SSR15, SSR18, SSR23, and SSR27 had the fewest alleles, with only eight each (Table 2). The average number of effective alleles (Ne) at all loci was 7.115, ranging from 3.832 to 10.931. The average Shannon diversity index (I) was 2.035, ranging from 1.597 to 2.469. The average observed heterozygosity (Ho) was 0.861, ranging from 0.736 to 0.990. The average expected heterozygosity (He) of the 32 loci was 0.824, ranging from 0.704 to 0.894, indicating that C. hystrix had a high level of diversity. The average polymorphism information content (PIC) was 0.889, ranging from 0.744 to 0.958. According to the standard of PIC ≥ 0.5 [49], all of the loci were highly polymorphic. The average coefficient of genetic differentiation (Fst) across all loci was 0.081, ranging from 0.056 to 0.138, indicating slight genetic differentiation. The average gene flow (Nm) was 2.948, ranging from 1.561 to 4.200; although Nm varied considerably among loci, all values exceeded 1, indicating a high degree of gene exchange among the C. hystrix populations. Overall, the genetic diversity of these SSR loci was generally high; considering all genetic parameters together, SSR04 showed the highest genetic diversity and SSR10 the lowest.
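The link between Fst and Nm can be made concrete with the widely used island-model approximation Nm = (1/Fst − 1)/4. The sketch below assumes that this is the relationship behind the reported values; the function name is illustrative.

```python
def gene_flow(fst):
    """Estimate Nm from Fst with the island-model approximation (assumed)."""
    return (1.0 / fst - 1.0) / 4.0


# Fst extremes and mean reported in the text.
nm_at_max_fst = gene_flow(0.138)   # most differentiated locus
nm_at_min_fst = gene_flow(0.056)   # least differentiated locus
nm_at_mean_fst = gene_flow(0.081)  # Nm evaluated at the mean Fst
```

Under this assumption, the extreme Fst values of 0.138 and 0.056 give Nm of about 1.56 and 4.21, close to the reported range. Because the relationship is nonlinear, Nm evaluated at the mean Fst (about 2.84) need not equal the mean of the per-locus Nm values (2.948).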
To analyze the genetic diversity of the 17 C. hystrix populations, the genetic parameters Na, Ne, Ho, He, and I were calculated for each population (Table 3). The average number of alleles (Na) was 10, ranging from 3 to 15. The average number of effective alleles (Ne) was 7.115, ranging from 2.821 to 9.519. The average Shannon diversity index (I) was 2.035, ranging from 1.025 to 2.395. The mean observed heterozygosity (Ho) was 0.861, ranging from 0.766 to 0.923. The average expected heterozygosity (He) was 0.824, ranging from 0.609 to 0.876. These results indicated that the overall genetic diversity of C. hystrix was high and that variation was abundant.

3.2. Population Structure of the Original Population

The population structure of the 232 accessions was estimated using STRUCTURE software based on the 32 SSR markers. First, the number of subpopulations (K) was identified based on the maximum likelihood and ΔK values; ΔK reached its highest value at K = 3, which indicated that the whole population could be divided into three subgroups (Figure 2). The three subgroups were designated Q1, Q2, and Q3 (indicated in red, green, and blue, respectively, in Figure 3). At K = 3, group I included 100 accessions, group II contained 57 accessions, and group III contained 75 accessions (Table S4, Figure S1). Admixture occurred between the clusters, indicating a certain degree of gene exchange among the populations.
A principal component analysis was performed to create a three-dimensional scatter plot using the data of the SSRs identified in the 232 C. hystrix germplasms to visualize the relationships between genotypes. A three-dimensional graph was created based on the value of each sample in the first (PC1), second (PC2), and third (PC3) principal components (Figure 4). The first, second, and third principal components explained 8.4%, 6.3%, and 4.8% of the total genetic variability, respectively. The scattered dots of different colors in the PCA figure represent samples of different populations, and the results show that the accessions are clustered together, indicating that the differences of these accessions are small.
A neighbor-joining analysis was performed, and the 232 germplasms were classified into three main groups, designated groups I, II, and III (Figure 5). The distribution of the C. hystrix germplasms in the inferred groups is shown in Table S5. Group I included 61 germplasms, group II included 68 germplasms, and group III included 103 germplasms. From a geographic perspective, some germplasms of the same geographic origin clustered in different groups (Table S4); the three groups classified by phylogenetic analysis thus each contained germplasms from different geographical locations. The neighbor-joining dendrogram, based on the genetic distance between individual trees, was used to determine the genetic relationships among C. hystrix accessions, and the result was similar to that of the STRUCTURE analysis at K = 3.
To understand the level of genetic differentiation and the source of variation among C. hystrix populations, the variation was partitioned into two levels (between populations and within populations), and an analysis of molecular variance (AMOVA) was performed (Table 4). The results showed that the genetic variation of C. hystrix mainly came from individuals within populations (96%), whereas only 4% of the variation was between populations.

3.3. Core Collection Establishment and Evaluation

Using QGAStation software with the phenotypic data, a total of 126 core collections were constructed. Among the 126 core collections, three with CR < 80% were removed (Figure 6, Table S6); three core collections had a CR of 10%. The remaining 123 core collections had a mean difference (MD) of <20% and a coincidence rate of range (CR) of >80%, indicating that they are good representations of the genetic diversity of the original collection. Based on the maximum variance difference (VD) and rate of variation in the coefficient of variation (VR) values at each sampling ratio (Table 5), the VR had its maximum value at the 10% sampling ratio, and the CR and VD had their maximum values at the 15% sampling ratio. As the sampling proportion increased, the VR gradually decreased. Therefore, 15% was taken as the optimal sampling ratio. Under the preferred sampling method (D3), the CR of the core collection was 100%, whereas the CR obtained with the deviation sampling method (D2) was lower than 100%. Therefore, the preferred sampling method (D3) was the best sampling method. At the 15% sampling ratio, the VD value for C3 was greater than that for C2, and the VR values were similar; thus, the best clustering method was determined to be the median distance method (C3). We conclude that the core collection generated at the 15% sampling ratio (B2C3D3) is the best core collection; it contains 32 clones.
Core Finder software was used to analyze the SSR data with the M strategy, which is based on the principle of maximizing alleles. The resulting core collection, Mc1, had a sample size of 158, retained 100% of the alleles of the original collection, and showed increased genetic parameters (Ne, He, I, and PIC); thus, Mc1 should be a good representation of the original collection. In addition, Core Hunter 3 was used to determine the optimal size of the core collection using 10 preset sampling ratios (Table 6). The six genetic diversity parameters (Na, Ne, Ho, He, I, and PIC) of the core collections and the original collection were compared. Core collections with a sampling ratio of less than 50% had Na and Ho values significantly different from those of the original collection; thus, the optimal sampling ratio was taken to be 55% (H-55, referred to henceforth as Mc2). To determine which of Mc1 and Mc2 was the better core collection, we analyzed the Ne, I, Ho, He, and PIC of the constructed core collections (Table 7). Mc1 and Mc2 preserved 68.1% and 55.2% of the original collection resources, respectively. The t-test results showed that the six genetic diversity parameters of Mc1 and Mc2 were not significantly different from those of the original collection, indicating that both core collections could represent the original population well. Based on the values summarized in Tables 6 and 7, the Ne, He, I, and PIC values of Mc1 obtained by Core Finder were all higher than those of the original collection. This indicates that genetic redundancy in the original collection was removed in Mc1, which increased the corresponding genetic diversity parameters. Only the He and PIC values of Mc2 obtained by Core Hunter 3 were higher than those of the original collection, and the retention rates of all six genetic parameters were lower than those of Mc1. Moreover, the Na retention of Mc1 generated by Core Finder was 99.9%, the retention rates of Ne, He, I, and PIC were all above 100%, and PIC remained ≥ 0.5, indicating high polymorphism and a good core collection. Therefore, the best core collection was identified as Mc1, which has 158 clones.
Finally, the phenotypic core collection (B2C3D3-15) and the molecular core collection (Mc1) were combined into the final core set (BM), which contained a total of 157 clones. To check whether BM could effectively represent the genetic diversity of the whole germplasm, a principal coordinates analysis (PCoA) of the SSR data was used to generate a distribution map of the core and original collections (Figure 7). The distributions of the core collection and the original collection largely coincided in the middle part of the plot, indicating that this part of the core collection is a good representation of the original collection. However, there was some deviation in the upper right and lower right parts.

4. Discussion

4.1. Genetic Diversity of C. hystrix Germplasm Resources

Characterizing breeding collection germplasms is crucial in plant breeding, as the genetic advancement of economically valuable traits relies on the genetic diversity available within the breeding gene pool. Knowledge of genetic diversity also helps to minimize the use of closely related clones as parents in breeding programs. Genetic diversity is an integral part of all biological diversity; it is the basis of biological evolution and species differentiation and is of great significance for population maintenance, reproduction, and adaptation to habitat changes. The higher the genetic diversity, the more likely a population is to adapt to different environments, and variations in DNA sequences are the primary drivers of such diversity [50]. Molecular markers provide powerful tools for genetic diversity analyses and the establishment of core collections. In recent years, various molecular markers such as RAPD, SSR, and ISSR have been used to study the phylogeny, inter-species relationships, and genetic diversity of forest species including Pinus leucodermis, Eucalyptus globulus, Swietenia macrophylla, and Populus deltoides [51,52,53,54]. It has been reported that SSRs are abundant and ubiquitous in prokaryotic and eukaryotic genomes [55,56]. SSRs offer breeding programs high-resolution markers that go far beyond traditional approaches depending solely on pedigrees [7] or phenotypic data [57]. Consequently, SSRs have become one of the most widely used marker types.
C. hystrix is a precious local timber species and an efficient, multi-purpose, fast-growing tree in South China. Knowledge of genetic diversity, population structure, and molecular markers may accelerate the selection of desirable traits in C. hystrix. Nei’s gene diversity, the observed heterozygosity and expected heterozygosity, the Shannon–Wiener index, the polymorphism information content, etc., have all been used to evaluate the level of genetic diversity of plant species [50]. The high number of alleles obtained in some studies may be due to the use of a large amount of highly diversified plant material [58,59] as well as the high number of samples employed in the analysis. In the present study, a total of 335 alleles were revealed using 32 SSR markers, with an average of 10 alleles per locus, revealing a high level of variability within the sample set. This high average number of alleles per locus can be attributed to the high genetic diversity of the investigated genotypes. The PIC value affords a fairer estimation of diversity than the actual number of alleles because it takes into account the relative frequencies of each allele present [60,61]. In our study, the overall average PIC of the SSR loci was 0.889, with values ranging from 0.744 to 0.958. SSRs with PIC values from 0.25 to 0.5 are considered moderately informative and those with PIC ≥ 0.5 highly polymorphic [49], so all loci used here were highly informative. A similar result was reported for Xanthoceras sorbifolium, and higher PIC and genetic diversity scores have been reported in studies using SSRs [62]. According to the genetic diversity of the 17 C. hystrix populations, the P2 population (Bobai, Guangxi) had the highest genetic diversity, while the P13 population (Jianghua, Hunan) had the lowest genetic diversity. The genetic diversity of the P13 population was significantly lower than that of the other populations, which may be due to the small number of samples. As previously demonstrated, the SSR assay approach is appropriate for genetic relationship studies [63,64], and it proved to be an efficient tool for assessing the genetic diversity of C. hystrix and identifying its populations in China.

4.2. SSR-Based Genetic Relationships among C. hystrix Germplasm Resources

Population structure is an important component in association mapping analyses between molecular markers and traits. Differences in population genetic structure reflect genetic diversity and convey the adaptation potential of a species to its changing environment [65]. To understand the genetic relationships and population genetic structure of the C. hystrix germplasm at the genomic level, the SSR data of the 232 germplasms were analyzed using STRUCTURE, neighbor-joining (NJ) clustering, and PCoA. Based on these analyses, the collection was clearly clustered into three groups, which is more than in previous studies using SSR and ISSR markers [27,66]. Neither the NJ nor the Bayesian model-based clustering grouped the germplasm accessions definitively by geographic origin. Although both methods produced three groups, the group memberships differed somewhat, which may be caused by the different clustering methods. Overall, the results of the phylogenetic analysis and the population structure analysis were basically consistent and complemented one another, but the accessions did not cluster completely according to geographical origin. This is mainly because the elite germplasms used in this test were all selected from local gene pools. Moreover, in the long-term selection process, germplasms from different provinces were introduced or exchanged. Wang et al. [67] reported similar results in a study conducted on 119 Xanthoceras sorbifolium accessions.
AMOVA is a suitable tool for evaluating the variation within and among populations. It is generally believed that woody plants that are widely distributed, perennial, and outcrossing, with seeds dispersed by wind or by birds and other animals, have higher levels of genetic diversity, and that their genetic diversity within populations is richer than that between populations [68,69,70]. In this study, the genetic variation both between and within populations was significant (p < 0.001), and the within-population variation (96%) was much greater than the between-population variation (4%); variation within populations was thus the primary source of the total variation. This indicates that there is little genetic differentiation among the populations, which matches the results of previous studies [27,28]. The finding that within-population variation accounted for the largest proportion of genetic variability (Table 4) was also similar to the results of Li et al. [66] and Belaj et al. [71,72]. In conclusion, the genetic variation mainly came from within-population variation, the genetic diversity within populations was rich, and the genetic differentiation among populations was small. This may be so for the following reasons: (1) C. hystrix is an outcrossing plant mainly pollinated by wind, and its wide distribution provides opportunities for gene recombination and produces rich genetic diversity; at the same time, gene exchange between populations is promoted and differentiation between populations is reduced. (2) Plant genetic diversity is related to environmental adaptation [73]. C. hystrix has a wide distribution range and strong adaptability; under long-term natural selection, a wide range of genetic variation has been produced, resulting in the formation of geographical provenances with different phenotypes and different requirements for environmental conditions. (3) The level of genetic diversity within populations is also influenced by the number of samples [74,75], and the genetic diversity within a population increases with the number of samples [76].

4.3. Core Collection Construction and Evaluation

In this study, the genetic diversity, population structure, population differentiation, and core collection of C. hystrix resources were evaluated using SSR molecular markers, which identified rich genetic diversity within populations of the C. hystrix germplasm. Although core collections have been established for many trees, a core collection of C. hystrix had not previously been constructed. According to the method of Hu [20], when MD% ≤ 20% and CR% > 80%, the core collection can be considered to represent the genetic diversity of the original collection. The smaller the MD% and the larger the VD%, CR%, and VR%, the more representative the core collection. The retention ratio of alleles should be greater than 70%, and the larger the other genetic parameters, the better [77,78]. We successfully established a core collection of C. hystrix with 100% allelic representation based on 15 phenotypic traits and 32 SSR markers.
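Two of these evaluation statistics can be sketched directly, assuming the commonly used definitions in which CR% is the mean ratio of trait ranges (core to original) and VR% is the mean ratio of coefficients of variation, both averaged over traits and expressed as percentages. The function names, trait names, and values are hypothetical; MD% and VD% are omitted because they additionally require significance tests on trait means and variances.

```python
from statistics import mean, stdev


def coincidence_rate(original, core):
    """CR%: mean over traits of range(core) / range(original), times 100."""
    ratios = [
        (max(core[t]) - min(core[t])) / (max(original[t]) - min(original[t]))
        for t in original
    ]
    return 100 * mean(ratios)


def variable_rate(original, core):
    """VR%: mean over traits of CV(core) / CV(original), times 100."""
    ratios = [
        (stdev(core[t]) / mean(core[t])) / (stdev(original[t]) / mean(original[t]))
        for t in original
    ]
    return 100 * mean(ratios)


# Hypothetical measurements for two traits (not data from the study).
original = {
    "height_m": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "dbh_cm": [20.0, 22.0, 24.0, 26.0, 28.0, 30.0],
}
core = {
    "height_m": [10.0, 14.0, 20.0],
    "dbh_cm": [20.0, 24.0, 28.0],
}
cr = coincidence_rate(original, core)
vr = variable_rate(original, core)
```

In this toy example the core keeps the full range of one trait and 80% of the range of the other, so CR% is 90, which would pass the CR% > 80% criterion. VR% exceeds 100 because sampling the extremes raises the coefficient of variation relative to the original collection.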
Because the diversity of germplasm resources differs among species, the sampling ratio of core collections varies; for woody plants, it has ranged from 10.00% to 45.00% [13,79,80]. In this study, seven sampling ratios were investigated (10%, 15%, 20%, 25%, 30%, 35%, and 40%), and the final sampling ratio of the best C. hystrix phenotypic core collection was 15%, which is consistent with previous studies [81]. Although phenotypic data can often reflect the genetic diversity of materials, the phenotypes of perennial trees are vulnerable to environmental effects. Molecular marker technology, in contrast, has the advantages of low cost and fast data acquisition and is not affected by external factors. Constructing the core collection by combining the genetic diversity and phenotypic variation of the original population [82,83,84] can improve its effectiveness.
Based on the M strategy, Core Finder software selects core collections by maximizing the number of alleles at each locus, which eliminates genetic duplication among materials during construction and screens for materials with a large number of alleles and low redundancy. Core Hunter 3 mainly screens the core collection by maximizing genetic diversity and allele richness, and different sampling ratios can be set. In this study, a C. hystrix core collection was constructed using Core Finder and Core Hunter 3 software. The results showed that the retention rates of Core Finder for the four genetic parameters Na, Ne, Ho, and I were higher than those of Core Hunter 3 (Table 7). Moreover, the Ne, He, I, and PIC values of Mc1 obtained by Core Finder were higher than those of the original collection; this was expected, as diversity increases with the elimination of genetically similar accessions during core collection development [85]. In conclusion, Core Finder software is more suitable for the construction of the C. hystrix core collection. Consistent with this study, Gong et al. [86] constructed an Astragalus core collection from 380 Astragalus samples using different methods, such as the M strategy-based method in Core Finder and the stepwise sampling-based method in Core Hunter 3; the authors concluded that Core Finder software combined with the M strategy was the most suitable method for constructing the Astragalus core collection. In this study, a core C. hystrix germplasm set, BM, comprising 157 C. hystrix accessions, was constructed based on 15 phenotypic traits and 32 SSR markers. The results of the principal component analysis showed that some of the core collection overlapped with the original collection, whereas some accessions were scattered around it. The reason may be that the amount of data was too large when Core Finder software was used to analyze the SSR data and extract the core collection.
In the future, other software could be used to construct a core collection based on the SSR data, and the results could be compared.
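The idea behind allele-maximizing selection can be sketched with a simple greedy procedure. This is an illustrative assumption about how such a strategy might work, not the algorithm implemented in Core Finder or Core Hunter 3; the accession names, locus names, and allele sizes below are hypothetical.

```python
def greedy_allele_core(genotypes):
    """Greedily pick accessions until every observed allele is captured.

    genotypes: dict mapping accession name -> set of (locus, allele) pairs.
    Returns the selected accessions in the order they were chosen.
    """
    remaining = set().union(*genotypes.values())  # alleles not yet captured
    selected = []
    while remaining:
        # Choose the accession that adds the most uncaptured alleles;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(genotypes), key=lambda acc: len(genotypes[acc] & remaining))
        selected.append(best)
        remaining -= genotypes[best]
    return selected


# Hypothetical toy genotypes (locus name, allele size in bp).
genotypes = {
    "acc1": {("SSR01", 150), ("SSR01", 154), ("SSR02", 200)},
    "acc2": {("SSR01", 150), ("SSR02", 200)},  # adds nothing beyond acc1
    "acc3": {("SSR01", 158), ("SSR02", 204), ("SSR02", 208)},
    "acc4": {("SSR01", 154), ("SSR02", 208)},
}

core_set = greedy_allele_core(genotypes)
print(core_set)
```

Here two of the four toy accessions carry all six alleles between them, so the redundant accessions are left out. A greedy pass of this kind does not guarantee the smallest possible core set, which is one reason dedicated tools use more elaborate search strategies.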

5. Conclusions

In this study, the genetic diversity, population structure, population differentiation, and core collection of C. hystrix resources were evaluated using SSR molecular markers. The results showed that the genetic diversity of these SSR loci was rich. Moreover, the C. hystrix samples were grouped into three clusters. We successfully established a core collection, BM, by combining 15 phenotypic traits and 32 SSR molecular markers. We demonstrated that SSR markers were effective for assessing the genetic diversity and structure of the C. hystrix populations. The established core collection can be used for future genome-wide association analysis and breeding program research. This study provides a theoretical basis for germplasm resource management as well as the conservation and utilization of C. hystrix germplasm resources.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/genes13122383/s1, Table S1: List of 232 clones of C. hystrix; Table S2: Characteristics of 32 microsatellite primers developed for C. hystrix; Table S3: The primer information of the 32 pairs of SSR of the C. hystrix 17 populations (The first row in the table lists the 32 pairs of SSR molecular markers, and the first column is the sample number corresponding to the phenotype); Table S4: Three groups obtained from the STRUCTURE analysis result; Table S5: Phylogenetic tree clustering results of the phylogenetic trees based on the neighbor-joining method (Clustered into three groups, each group is represented by I, II, III; 61,68,103 represent the number of each group and list in detail what is in each type); Table S6: Candidate core collection obtained by a combination of 126. Three combinations were screened out according to CR < 80%, among which the MD value of 123 combinations was 0 (red font in the table). Figure S1: The classification results of structure.

Author Contributions

Methodology, F.X.; writing—original draft, N.L., Y.Y. and X.C.; software, N.L. and F.X.; investigation, N.L., Y.Y., R.W. and Z.L.; writing—review and editing, N.L.; supervision, F.X. and W.Z.; funding acquisition, W.P. and W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research and Development Program of the Guangdong Province, grant number 2020B020215002.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data necessary for confirming the conclusions of the article are present within the article, figures and tables, and within Supplementary tables.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yanchuk, A.D. A Quantitative Framework for Breeding and Conservation of Forest Tree Genetic Resources in British Columbia. Can. J. For. Res. 2001, 31, 566–576.
  2. Glaszmann, J.C.; Kilian, B.; Upadhyaya, H.D.; Varshney, R.K. Accessing Genetic Diversity for Crop Improvement. Curr. Opin. Plant Biol. 2010, 13, 167–173.
  3. White, T.L.; Adams, W.T.; Neale, D.B. Forest Genetics; CABI: Wallingford, UK, 2007; ISBN 1-84593-286-2.
  4. Erlich, H.A.; Gelfand, D.; Sninsky, J.J. Recent Advances in the Polymerase Chain Reaction. Science 1991, 252, 1643–1651.
  5. Mullis, K.B. The Unusual Origin of the Polymerase Chain Reaction. Sci. Am. 1990, 262, 56–65.
  6. Liping, R.; Xiyuan, N.; Jixiang, H. Core Collection of a Representative Germplasm Population in Brassica napus. Sci. Agric. Sin. 2008, 41, 3521–3531.
  7. Brown, A.H.D. Core Collections: A Practical Approach to Genetic Resources Management. Genome 1989, 31, 818–824.
  8. Brown, A.H.D. The Case for Core Collections. In The Use of Plant Genetic Resources; Cambridge University Press: Cambridge, UK, 1989; pp. 136–156.
  9. Frankel, O.H.; Brown, A.H.D. Current Plant Genetic Resources—A Critical Appraisal. In Proceedings of the XV International Congress of Genetics, Genetics: New Frontiers, New Delhi, India, 12–21 December 1983; Chopra, V.L., Joshi, B.C., Sharma, R.P., Bansal, H.C., Eds.; IBH Publishing Co.: Oxford, UK, 1984.
  10. Frankel, O.H. Genetic Perspectives of Germplasm Conservation. Genet. Manip. Impact Man Soc. 1984, 161, 170.
  11. Frankel, O.H. Plant Genetic Resources Today: A Critical Appraisal. Crop Genet. Resour. Conserv. Eval. 1984, 241–256.
  12. Brown, A.H.D.; Frankel, O.H.; Marshall, D.R.; Williams, J.T. The Use of Plant Genetic Resources; Cambridge University Press: Cambridge, UK, 1989; ISBN 0-521-36886-3.
  13. Belaj, A.; del Dominguez-García, M.C.; Atienza, S.G.; Martín Urdíroz, N.; De la Rosa, R.; Satovic, Z.; Martín, A.; Kilian, A.; Trujillo, I.; Valpuesta, V. Developing a Core Collection of Olive (Olea europaea L.) Based on Molecular Markers (DArTs, SSRs, SNPs) and Agronomic Traits. Tree Genet. Genomes 2012, 8, 365–378.
  14. Wang, J.C.; Jin, H.U.; Zhang, C.F.; Zhang, S. Assessment on Evaluating Parameters of Rice Core Collections Constructed by Genotypic Values and Molecular Marker Information. Rice Sci. 2007, 14, 101–110.
  15. Zhang, H.; Zhang, D.; Wang, M.; Sun, J.; Qi, Y.; Li, J.; Han, L.; Qiu, Z.; Tang, S.; Li, Z. A Core Collection and Mini Core Collection of Oryza sativa L. in China. Theor. Appl. Genet. 2011, 122, 49–61.
  16. Miyamoto, N.; Ono, M.; Watanabe, A. Construction of a Core Collection and Evaluation of Genetic Resources for Cryptomeria japonica (Japanese Cedar). J. For. Res. 2015, 20, 186–196.
  17. Guo, Q.; Liu, J.; Li, J.; Cao, S.; Zhang, Z.; Zhang, J.; Zhang, Y.; Deng, Y.; Niu, D.; Su, L. Genetic Diversity and Core Collection Extraction of Robinia pseudoacacia L. Germplasm Resources Based on Phenotype, Physiology, and Genotyping Markers. Ind. Crops Prod. 2022, 178, 114627.
  18. Wang, J.C.; Hu, J.; Xu, H.M.; Zhang, S. A Strategy on Constructing Core Collections by Least Distance Stepwise Sampling. Theor. Appl. Genet. 2007, 115, 1–8.
  19. van Hintum, T.J.; Brown, A.H.D.; Spillane, C. Core Collections of Plant Genetic Resources; Bioversity International: Rome, Italy, 2000; ISBN 92-9043-454-6.
  20. Hu, J.; Zhu, J.; Xu, H.M. Methods of Constructing Core Collections by Stepwise Clustering with Three Sampling Strategies Based on the Genotypic Values of Crops. Theor. Appl. Genet. 2000, 101, 264–268.
  21. Yao, Q.; Fang, P.; Yang, K.; Pan, G. Methods of Constructing a Core Collection of Maize Landraces in Southwest China Based on SSR Data. J. Hunan Agric. Univ. 2009, 35, 225–228.
  22. Barut, M.; Nadeem, M.A.; Karaköy, T.; Baloch, F.S. DNA Fingerprinting and Genetic Diversity Analysis of World Quinoa Germplasm Using iPBS-Retrotransposon Marker System. Turk. J. Agric. For. 2020, 44, 479–491.
  23. Guney, M.; Kafkas, S.; Keles, H.; Zarifikhosroshahi, M.; Gundesli, M.A.; Ercisli, S.; Necas, T.; Bujdoso, G. Genetic Diversity among Some Walnut (Juglans regia L.) Genotypes by SSR Markers. Sustainability 2021, 13, 6830.
  24. Tuna, G.S.; Yücel, G.; Aşçioğul, T.K.; Ateş, D.; Eşiyok, D.; Tanyolac, M.B.; Tuna, M. Molecular Cytogenetic Characterization of Common Bean (Phaseolus vulgaris L.) Accessions. Turk. J. Agric. For. 2020, 44, 612–630.
  25. Garris, A.J.; McCouch, S.R.; Kresovich, S. Population Structure and Its Effect on Haplotype Diversity and Linkage Disequilibrium Surrounding the Xa5 Locus of Rice (Oryza sativa L.). Genetics 2003, 165, 759–769.
  26. Wang, M.G.; Tu, Z.J.; Qiu, X.J.; Zhu, J.Y.; Jiang, Y.; Ye, Z.Y. Genetic Diversity of Castanopsis hystrix by RAPD Analysis. J. Xiamen Univ. 2006, 45, 570–574. (In Chinese)
  27. Zhu, J.Y.; Jiang, Y.; Tan, Z.; Tan, Y.; Xie, Y.; Jiang, H.; Dong, M.; Lei, J.; Su, X. Genetic Diversity of Natural and cultivated Populations of Castanopsis hystrix. J. Agric. Sci. Tech. 2012, 13, 1516–1520. (In Chinese)
  28. Xu, B.; Zhang, F.Q.; Pan, W.; Liu, Y.C. Genetic Diversity and Genetic Structure in Natural Populations of Castanopsis hystrix from Its Main Distribution in China. Sci. Silvae Sin. 2013, 49, 162–166. (In Chinese)
  29. Yang, F. Hereditary Constitution of Castanponsis hystrix Superior Genealogy by ISSR Marker and Analysis. Cent. South Univ. Technol. 2012. (In Chinese)
  30. Powell, W.; Morgante, M.; Andre, C.; Hanafey, M.; Vogel, J.; Tingey, S.; Rafalski, A. The Comparison of RFLP, RAPD, AFLP and SSR (Microsatellite) Markers for Germplasm Analysis. Mol. Breed. 1996, 2, 225–238.
  31. Akkaya, M.S.; Bhagwat, A.A.; Cregan, P.B. Length Polymorphisms of Simple Sequence Repeat DNA in Soybean. Genetics 1992, 132, 1131–1139.
  32. Huang, X.; Börner, A.; Röder, M.; Ganal, M. Assessing Genetic Diversity of Wheat (Triticum aestivum L.) Germplasm Using Microsatellite Markers. Theor. Appl. Genet. 2002, 105, 699–707.
  33. Zhang, D.; Bai, G.; Zhu, C.; Yu, J.; Carver, B.F. Genetic Diversity, Population Structure, and Linkage Disequilibrium in US Elite Winter Wheat. Plant Genome 2010, 3, 117–127.
  34. Hao, C.; Wang, L.; Ge, H.; Dong, Y.; Zhang, X. Genetic Diversity and Linkage Disequilibrium in Chinese Bread Wheat (Triticum aestivum L.) Revealed by SSR Markers. PLoS ONE 2011, 6, e17279.
  35. Yang, Y.M. Genetic Diversity based on SSR and Its Association Analysis with Phenotypic Traits in Germplasm Resources of Castanopsis hystrix. South China Agric. Univ. 2019. (In Chinese)
  36. Glaubitz, J.C. Convert: A User-friendly Program to Reformat Diploid Genotypic Data for Commonly Used Population Genetic Software Packages. Mol. Ecol. Notes 2004, 4, 309–310.
  37. Marshall, T.C.; Slate, J.; Kruuk, L.E.B.; Pemberton, J.M. Statistical Confidence for Likelihood-based Paternity Inference in Natural Populations. Mol. Ecol. 1998, 7, 639–655.
  38. Kalinowski, S.T.; Taper, M.L.; Marshall, T.C. Revising How the Computer Program CERVUS Accommodates Genotyping Error Increases Success in Paternity Assignment. Mol. Ecol. 2007, 16, 1099–1106.
  39. Peakall, R.; Smouse, P.E. GENALEX 6: Genetic Analysis in Excel. Population Genetic Software for Teaching and Research. Mol. Ecol. Notes 2010, 6, 288–295.
  40. Peakall, R.; Smouse, P.E. GenAlEx 6.5: Genetic Analysis in Excel. Population Genetic Software for Teaching and Research—An Update. Bioinformatics 2012, 28, 2537–2539.
  41. Pritchard, J.K.; Stephens, M.J.; Donnelly, P.J. Inference of Population Structure Using Multilocus Genotype Data. Genetics 2000, 155, 945–959.
  42. Evanno, G.S.; Regnaut, S.J.; Goudet, J. Detecting the Number of Clusters of Individuals Using the Software Structure: A Simulation Study. Mol. Ecol. 2005, 14, 2611–2620.
  43. Earl, D.A.; Vonholdt, B.M. Structure Harvester: A Website and Program for Visualizing Structure Output and Implementing the Evanno Method. Conserv. Genet. Resour. 2012, 4, 359–361.
  44. Liu, K.; Muse, S.V. PowerMarker: An Integrated Analysis Environment for Genetic Marker Analysis. Bioinformatics 2005, 21, 2128–2129.
  45. Liu, J.; Liao, K.; Cao, Q.; Sun, Q.; Liu, H.; Jia, Y. Establishment of Wild Apricot Core Collection Based on Phenotypic Characters. J. Fruit Sci. 2015, 32, 787–796.
  46. Cipriani, G.; Spadotto, A.; Jurman, I.; Di Gaspero, G.; Crespan, M.; Meneghetti, S.; Frare, E.; Vignani, R.; Cresti, M.; Morgante, M. The SSR-Based Molecular Profile of 1005 Grapevine (Vitis vinifera L.) Accessions Uncovers New Synonymy and Parentages, and Reveals a Large Admixture amongst Varieties of Different Geographic Origin. Theor. Appl. Genet. 2010, 121, 1569–1585.
  47. De Beukelaer, H.; Davenport, G.F.; Fack, V. Core Hunter 3: Flexible Core Subset Selection. BMC Bioinform. 2018, 19, 1–12.
  48. Escribano, P.; Viruel, M.A.; Hormaza, J.I. Comparison of Different Methods to Construct a Core Germplasm Collection in Woody Perennial Species with Simple Sequence Repeat Markers. A Case Study in Cherimoya (Annona cherimola, Annonaceae), an Underutilised Subtropical Fruit Tree Species. Ann. Appl. Biol. 2008, 153, 25–32.
  49. Botstein, D.; White, R.L.; Skolnick, M.; Davis, R.W. Construction of a Genetic Linkage Map in Man Using Restriction Fragment Length Polymorphisms. Am. J. Hum. Genet. 1980, 32, 314.
  50. Qiao, D.; Wang, P.; Wang, S.; Li, L.; Gao, L.; Yang, R.; Wang, Q.; Li, Y. Genetic Diversity Analysis of Lagerstroemia Germplasm Resources Based on SNP Markers. J. Nanjing. For. Univ. Nat. Sci. Ed. 2020, 44, 21–28.
  51. Bucci, G.; Vendramin, G.G.; Lelli, L.; Vicario, F. Assessing the Genetic Divergence of Pinus leucodermis Ant. Endangered Populations: Use of Molecular Markers for Conservation Purposes. Theor. Appl. Genet. 1997, 95, 1138–1146.
  52. Nesbitt, K.A.; Potts, B.M.; Vaillancourt, R.E.; Reid, J.B. Fingerprinting and Pedigree Analysis in Eucalyptus globulus Using RAPDs. Silvae Genet. 1997, 46, 6–10.
  53. Gillies, A.C.M.; Navarro, C.; Lowe, A.J.; Newton, A.C.; Hernandez, M.; Wilson, J.; Cornelius, J.P. Genetic Diversity in Mesoamerican Populations of Mahogany (Swietenia macrophylla), Assessed Using RAPDs. Heredity 1999, 83, 722–732.
  54. Chen, C.; Chu, Y.; Ding, C.; Su, X.; Huang, Q. Genetic Diversity and Population Structure of Black Cottonwood (Populus deltoides) Revealed Using Simple Sequence Repeat Markers. BMC Genet. 2020, 21, 1–12.
  55. Silver, L.M. Bouncing off Microsatellites. Nat. Genet. 1992, 2, 8–9.
  56. Mrázek, J.; Guo, X.; Shah, A. Simple Sequence Repeats in Prokaryotic Genomes. Proc. Natl. Acad. Sci. USA 2007, 104, 8472–8477.
  57. Widrlechner, M.P. Genetic Markers and Plant Genetic Resource Management. Plant Breed. Rev. 2010, 13, 11.
  58. Baldoni, L.; Cultrera, N.G.; Mariotti, R.; Ricciolini, C.; Arcioni, S.; Vendramin, G.G.; Buonamici, A.; Porceddu, A.; Sarri, V.; Ojeda, M.A. A Consensus List of Microsatellite Markers for Olive Genotyping. Mol. Breed. 2009, 24, 213–231.
  59. Belaj, A.; Muñoz-Diez, C.; Baldoni, L.; Porceddu, A.; Barranco, D.; Satovic, Z. Genetic Diversity and Population Structure of Wild Olives from the North-Western Mediterranean Assessed by SSR Markers. Ann. Bot. 2007, 100, 449–458.
  60. Laborda, P.R.; Oliveira, K.M.; Garcia, A.A.F.; Paterniani, M.; De Souza, A.P. Tropical Maize Germplasm: What Can We Say about Its Genetic Diversity in the Light of Molecular Markers? Theor. Appl. Genet. 2005, 111, 1288–1299.
  61. Cömertpay, G.; Baloch, F.S.; Kilian, B.; Ülger, A.C.; Özkan, H. Diversity Assessment of Turkish Maize Landraces Based on Fluorescent Labelled SSR Markers. Plant Mol. Biol. Rep. 2012, 30, 261–274.
  62. Shen, Z.; Zhang, K.; Ma, L.; Duan, J.; Ao, Y. Analysis of the Genetic Relationships and Diversity among 11 Populations of Xanthoceras sorbifolia Using Phenotypic and Microsatellite Marker Data. Electron. J. Biotechnol. 2017, 26, 33–39.
  63. Ben-Ayed, R.; Sans-Grout, C.; Moreau, F.; Grati-Kamoun, N.; Rebai, A. Genetic Similarity among Tunisian Olive Cultivars and Two Unknown Feral Olive Trees Estimated through SSR Markers. Biochem. Genet. 2014, 52, 258–268.
  64. Sonnante, G.; Carluccio, A.V.; De Paolis, A.; Pignone, D. Identification of Artichoke SSR Markers: Molecular Variation and Patterns of Diversity in Genetically Cohesive Taxa and Wild Allies. Genet. Resour. Crop Evol. 2008, 55, 1029–1046.
  65. de Oliveira Melo, A.T.; Coelho, A.S.G.; Pereira, M.F.; Blanco, A.J.V.; Franceschinelli, E.V. High Genetic Diversity and Strong Spatial Genetic Structure in Cabralea canjerana (Vell.) Mart. (Meliaceae): Implications to Brazilian Atlantic Forest Tree Conservation. Nat. Conserv. 2014, 12, 129–133.
  66. Li, J.; Ge, X.J.; Cao, H.L.; Ye, W.H. Chloroplast DNA Diversity in Castanopsis hystrix Populations in South China. For. Ecol. Manag. 2007, 243, 94–101.
  67. Wang, Y.; Li, Y. Population Genetics and Development of a Core Collection from Elite Germplasms of Xanthoceras sorbifolium Based on Genome-Wide SNPs. Forests 2022, 13, 338.
  68. Hamrick, J.L. Isozymes and the Analysis of Genetic Structure in Plant Populations. In Isozymes in Plant Biology; Springer: Dordrecht, The Netherlands, 1989; pp. 87–105.
  69. Hamrick, J.L.; Godt, M.J.W.; Sherman-Broyles, S.L. Factors Influencing Levels of Genetic Diversity in Woody Plant Species. In Population Genetics of Forest Trees; Springer: Dordrecht, The Netherlands, 1992; pp. 95–124.
  70. da Silva, T.A.; Cantagalli, L.B.; Saavedra, J.; Lopes, A.D.; Mangolin, C.A.; da Silva, M.d.F.P.; Scapim, C.A. Population Structure and Genetic Diversity of Brazilian Popcorn Germplasm Inferred by Microsatellite Markers. Electron. J. Biotechnol. 2015, 18, 181–187.
  71. Belaj, A.; Leon, L.; Satovic, Z.; De la Rosa, R. Variability of wild olives (Olea europaea subsp. europaea var. sylvestris) analysed by agro-morphological traits and SSR markers. Sci. Hortic. 2011, 129, 561–569.
  72. Belaj, A.; Muñoz-Diez, C.; Baldoni, L.; Satovic, Z.; Barranco, D. Genetic diversity and relationships of wild and cultivated olives at regional level in Spain. Sci. Hortic. 2010, 124, 323–330.
  73. Semagn, K.; Bjornstad, A.; Stedje, B.; Bekele, E. Comparison of Multivariate Methods for the Analysis of Genetic Resources and Adaptation in Phytolacca Dodecandra Using RAPD. Theor. Appl. Genet. 2000, 101, 1145–1154.
  74. Nybom, H.; Bartish, I.V. Effects of Life History Traits and Sampling Strategies on Genetic Diversity Estimates Obtained with RAPD Markers in Plants. Perspect. Plant Ecol. Evol. Syst. 2000, 3, 93–114.
  75. Gaudett, M.; Salomon, B.; Sun, G. Molecular Variation and Population Structure in Elymus trachycaulus and Comparison with Its Morphologically Similar E. alaskanus. Plant Syst. Evol. 2005, 250, 81–91.
  76. Persson, H.A.; Lundquist, K.; Nybom, H. RAPD Analysis of Genetic Variation within and among Populations of Turk’s-cap Lily (Lilium martagon L.). Hereditas 1998, 128, 213–220.
  77. McKhann, H.I.; Camilleri, C.; Bérard, A.; Bataillon, T.; David, J.L.; Reboud, X.; Le Corre, V.; Caloustian, C.; Gut, I.G.; Brunel, D. Nested Core Collections Maximizing Genetic Diversity in Arabidopsis thaliana. Plant J. 2004, 38, 193–202.
  78. Ronfort, J.; Bataillon, T.; Santoni, S.; Delalande, M.; David, J.L.; Prosperi, J.M. Microsatellite Diversity and Broad Scale Geographic Structure in a Model Legume: Building a Set of Nested Core Collection for Studying Naturally Occurring Variation in Medicago truncatula. BMC Plant Biol. 2006, 6, 1–13.
  79. Gailing, O.; Nelson, C.D. Genetic Variation Patterns of American Chestnut Populations at EST-SSRs. Botany 2017, 95, 799–807.
  80. Duan, H.; Cao, S.; Zheng, H.; Hu, D.; Lin, J.; Cui, B.; Lin, H.; Hu, R.; Wu, B.; Sun, Y. Genetic Characterization of Chinese Fir from Six Provinces in Southern China and Construction of a Core Collection. Sci. Rep. 2017, 7, 1–10.
  81. Liu, X.; Cai, Q.; Ma, L.; Wu, C.; Lu, X.; Ying, X.; Fan, Y. Strategy of Sampling for Pre-Core Collection of Sugarcane Hybrid. Acta Agron. Sin. 2009, 35, 1209–1216.
  82. Xue, H.; Yu, X.; Fu, P.; Liu, B.; Zhang, S.; Li, J.; Zhai, W.; Lu, N.; Zhao, X.; Wang, J. Construction of the Core Collection of Catalpa fargesii f. Duclouxii (Huangxinzimu) Based on Molecular Markers and Phenotypic Traits. Forests 2021, 12, 1518.
  83. Sa, K.J.; Kim, D.M.; Oh, J.S.; Park, H.; Hyun, D.Y.; Lee, S.; Rhee, J.H.; Lee, J.K. Construction of a Core Collection of Native Perilla Germplasm Collected from South Korea Based on SSR Markers and Morphological Characteristics. Sci. Rep. 2021, 11, 1–13.
  84. Wang, X.; Cao, Z.; Gao, C.; Li, K. Strategy for the Construction of a Core Collection for Pinus yunnanensis Franch. to Optimize Timber Based on Combined Phenotype and Molecular Marker Data. Genet. Resour. Crop Evol. 2021, 68, 3219–3240.
  85. Agrama, H.A.; Yan, W.; Lee, F.; Fjellstrom, R.; Chen, M.-H.; Jia, M.; McClung, A. Genetic Assessment of a Mini-core Subset Developed from the USDA Rice Genebank. Crop Sci. 2009, 49, 1336–1346.
  86. Gong, F.; Geng, Y.; Zhang, P.; Zhang, F.; Fan, X.; Liu, Y. Genetic Diversity and Structure of Core Collection of Huangqi (Astragalus) Developed by Genomic Simple Sequence Repeat Markers. Genet. Resour. Crop Evol. 2022.
Figure 1. Map showing the geographic locations of the collection sites of the 17 C. hystrix populations sampled. The red dots in the figure represent the 17 sampling sites. The Chinese characters in the picture represent six provinces. The lower right corner represents the South China Sea Islands, which are an integral part of China.
Figure 2. The STRUCTURE estimation of the number of populations for K values ranging from 1 to 10. The K value with the highest delta K represents the suggested cluster number.
Figure 3. Phylogenetic tree of the accession with three different colors indicating three groups obtained from the STRUCTURE analysis result. Red, green and blue represent Q1, Q2 and Q3 respectively. The vertical coordinate of each subgroup indicates the membership coefficients for each C. hystrix accession.
Figure 4. Principal component analysis (PCA) of 232 C. hystrix elite germplasms by https://www.bioladder.cn/web/#/chart/62. PCA plot of the first three components (PC1, PC2, and PC3) of the 232 germplasms.
Figure 5. Phylogenetic tree of all 232 C. hystrix elite germplasms based on the SSRs built by the neighbor-joining method in PowerMarker 3.25 software. I, II and III represent the three categories of clustering, respectively.
Figure 6. Percentage of trait differences between the core collections and the initial collection obtained by different combinations. B1 and B2 represent Euclidean distance and Mahalanobis distance, respectively; C1, C2, and C3 represent the unweighted pair-group average method, Ward’s method, and mediate distance method in the systematic clustering, respectively. D1, D2, and D3 represent the random sampling, deviation sampling, and preferred sampling methods, respectively. Seven sampling ratios of 10%, 15%, 20%, 25%, 30%, 35%, and 40% were set. MD, VD, CR, and VR denote the percentage of the mean difference, the percentage of the variance difference, the coincidence rate of the range, and the rate of variation in the coefficient of variation, respectively. * represents the maximum for each sampling ratio in the figure.
Figure 7. Principal coordinate distribution of the C. hystrix core collection and original collection.
Table 1. Details of sampling points in the 17 province areas of C. hystrix.

| Site | Group Number | Number of Clones | Longitude and Latitude | Clone Number |
| --- | --- | --- | --- | --- |
| Guangxi PuBei | P1 | 12 | 109.55° E 22.27° N | A1~A7, F1~F2 |
| Guangxi BoBai | P2 | 26 | 109.98° E 22.27° N | B1-1~B10, C8~C9, E12~E16 |
| Guangxi RongXian | P3 | 15 | 110.53° E 22.87° N | C1~C7-1, C33, J28~J32 |
| Guangxi PingXiang | P4 | 9 | 106.75° E 22.11° N | D1~D8, I39 |
| Guangxi DongLan | P5 | 9 | 107.36° E 24.53° N | E1~E10 |
| Guangdong XinYi | P6 | 17 | 110.90° E 22.36° N | G1~G14 |
| Guangdong GaoZhou | P7 | 18 | 110.50° E 21.54° N | H1~H20 |
| Guangdong LuHe | P8 | 24 | 115.65° E 23.30° N | I1~I10, I20~I35, 30, 55 |
| Guangdong ShiXing | P9 | 8 | 114.08° E 24.78° N | I11~I18 |
| Fujian JinShan | P10 | 23 | 117.37° E 24.52° N | J1~J19 |
| Fujian HuaFeng | P11 | 5 | 117.53° E 25.02° N | J20~J21-1 |
| Fujian GaoChe | P12 | 9 | 117.39° E 24.31° N | J22~J27 |
| Hunan JiangHua | P13 | 2 | 111.79° E 24.97° N | K1~K2 |
| Hainan LeDong | P14 | 15 | 109.17° E 18.73° N | L1~L18 |
| Hainan ChangJiang | P15 | 12 | 109.03° E 19.25° N | L20~L33 |
| Yunnan JingHong | P16 | 15 | 100.79° E 22.00° N | M1~M15 |
| Yunnan SiMao | P17 | 13 | 101.00° E 22.79° N | M16~M27, N1 |
| Total | | 232 | | 232 |
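Table 2. Genetic diversity parameters of the 32 SSR loci.

| Locus | Na | Ne | Ho | He | PIC | I | Fst | Nm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SSR01 | 11 | 7.901 | 0.935 | 0.862 | 0.910 | 2.165 | 0.056 | 4.200 |
| SSR02 | 8 | 5.894 | 0.945 | 0.811 | 0.866 | 1.867 | 0.074 | 3.119 |
| SSR03 | 10 | 7.128 | 0.856 | 0.839 | 0.915 | 2.045 | 0.086 | 2.661 |
| SSR04 | 15 | 10.931 | 0.915 | 0.894 | 0.958 | 2.469 | 0.068 | 3.438 |
| SSR05 | 10 | 6.565 | 0.882 | 0.828 | 0.881 | 1.984 | 0.068 | 3.453 |
| SSR06 | 10 | 6.928 | 0.741 | 0.832 | 0.903 | 2.028 | 0.085 | 2.684 |
| SSR07 | 10 | 6.495 | 0.835 | 0.822 | 0.888 | 2.008 | 0.073 | 3.169 |
| SSR08 | 10 | 7.128 | 0.833 | 0.812 | 0.895 | 2.018 | 0.112 | 1.990 |
| SSR09 | 12 | 9.013 | 0.955 | 0.867 | 0.936 | 2.275 | 0.074 | 3.119 |
| SSR10 | 8 | 3.832 | 0.780 | 0.704 | 0.744 | 1.597 | 0.094 | 2.408 |
| SSR11 | 9 | 5.202 | 0.736 | 0.733 | 0.833 | 1.756 | 0.138 | 1.561 |
| SSR12 | 12 | 8.983 | 0.966 | 0.858 | 0.940 | 2.248 | 0.087 | 2.621 |
| SSR13 | 10 | 6.631 | 0.955 | 0.824 | 0.890 | 2.001 | 0.085 | 2.681 |
| SSR14 | 12 | 9.039 | 0.917 | 0.858 | 0.937 | 2.227 | 0.085 | 2.674 |
| SSR15 | 8 | 4.148 | 0.775 | 0.735 | 0.769 | 1.620 | 0.070 | 3.325 |
| SSR16 | 10 | 7.294 | 0.805 | 0.840 | 0.919 | 2.066 | 0.089 | 2.551 |
| SSR17 | 10 | 5.742 | 0.873 | 0.808 | 0.851 | 1.924 | 0.060 | 3.924 |
| SSR18 | 8 | 5.380 | 0.809 | 0.803 | 0.842 | 1.831 | 0.068 | 3.420 |
| SSR19 | 10 | 6.745 | 0.851 | 0.825 | 0.885 | 2.012 | 0.079 | 2.924 |
| SSR20 | 13 | 9.786 | 0.850 | 0.882 | 0.948 | 2.350 | 0.073 | 3.191 |
| SSR21 | 9 | 6.158 | 0.909 | 0.822 | 0.877 | 1.919 | 0.076 | 3.051 |
| SSR22 | 12 | 8.673 | 0.817 | 0.854 | 0.932 | 2.218 | 0.090 | 2.533 |
| SSR23 | 8 | 4.943 | 0.793 | 0.765 | 0.830 | 1.740 | 0.095 | 2.387 |
| SSR24 | 11 | 7.346 | 0.942 | 0.849 | 0.916 | 2.100 | 0.076 | 3.055 |
| SSR25 | 12 | 8.840 | 0.827 | 0.863 | 0.941 | 2.239 | 0.083 | 2.780 |
| SSR26 | 10 | 5.715 | 0.893 | 0.801 | 0.859 | 1.922 | 0.074 | 3.121 |
| SSR27 | 8 | 4.519 | 0.753 | 0.748 | 0.805 | 1.664 | 0.088 | 2.591 |
| SSR28 | 13 | 8.364 | 0.795 | 0.841 | 0.920 | 2.201 | 0.079 | 2.925 |
| SSR29 | 11 | 7.897 | 0.990 | 0.849 | 0.904 | 2.145 | 0.066 | 3.538 |
| SSR30 | 10 | 6.848 | 0.902 | 0.835 | 0.889 | 2.032 | 0.064 | 3.632 |
| SSR31 | 11 | 7.722 | 0.866 | 0.841 | 0.918 | 2.113 | 0.086 | 2.656 |
| SSR32 | 14 | 9.880 | 0.847 | 0.873 | 0.943 | 2.345 | 0.075 | 3.079 |
| Mean | 10 | 7.115 | 0.861 | 0.824 | 0.889 | 2.035 | 0.081 | 2.948 |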
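Table 3. Genetic diversity parameters of the 17 populations.

| Population | Na | Ne | Ho | He | I |
| --- | --- | --- | --- | --- | --- |
| Pop1 | 11 | 8.067 | 0.904 | 0.858 | 2.200 |
| Pop2 | 15 | 9.519 | 0.896 | 0.876 | 2.395 |
| Pop3 | 12 | 8.473 | 0.877 | 0.868 | 2.267 |
| Pop4 | 9 | 6.835 | 0.923 | 0.835 | 2.023 |
| Pop5 | 9 | 6.516 | 0.902 | 0.836 | 1.986 |
| Pop6 | 12 | 8.097 | 0.859 | 0.856 | 2.217 |
| Pop7 | 13 | 8.910 | 0.885 | 0.866 | 2.306 |
| Pop8 | 14 | 8.669 | 0.853 | 0.871 | 2.324 |
| Pop9 | 8 | 5.972 | 0.887 | 0.815 | 1.886 |
| Pop10 | 13 | 7.639 | 0.866 | 0.848 | 2.197 |
| Pop11 | 5 | 3.634 | 0.867 | 0.691 | 1.346 |
| Pop12 | 9 | 6.452 | 0.855 | 0.824 | 1.972 |
| Pop13 | 3 | 2.821 | 0.844 | 0.609 | 1.025 |
| Pop14 | 11 | 6.972 | 0.816 | 0.833 | 2.103 |
| Pop15 | 11 | 7.524 | 0.797 | 0.846 | 2.120 |
| Pop16 | 11 | 7.123 | 0.766 | 0.831 | 2.079 |
| Pop17 | 11 | 7.730 | 0.840 | 0.851 | 2.156 |
| Mean | 10 | 7.115 | 0.861 | 0.824 | 2.035 |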
Table 4. AMOVA of the original population.

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Estimated Variance | Percentage of Variation (%) | Fst | p |
|---|---|---|---|---|---|---|---|
| Among Populations | 16 | 771.821 | 48.239 | 1.338 | 4% | 0.042 | <0.001 |
| Within Populations | 215 | 6500.756 | 30.236 | 30.236 | 96% | | |
| Total | 231 | 7272.578 | 31.576 | 31.574 | 100% | | |
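The entries in Table 4 are linked by simple arithmetic, which the following minimal sketch makes explicit. It assumes that the reported Fst is the among-population share of the total estimated variance and that each mean square is the sum of squares divided by its degrees of freedom. The function and variable names are illustrative and are not taken from the software used in the study.

```python
# Values copied from Table 4 (AMOVA of the original population).
SUM_OF_SQUARES = {"among": 771.821, "within": 6500.756}
DEGREES_OF_FREEDOM = {"among": 16, "within": 215}
ESTIMATED_VARIANCE = {"among": 1.338, "within": 30.236}


def mean_square(sum_of_squares: float, degrees_of_freedom: int) -> float:
    """Mean square = sum of squares / degrees of freedom."""
    return sum_of_squares / degrees_of_freedom


def fixation_index(var_among: float, var_within: float) -> float:
    """Among-population variance as a fraction of the total variance.

    Assumption: this ratio corresponds to the Fst column of Table 4.
    """
    return var_among / (var_among + var_within)


ms_among = mean_square(SUM_OF_SQUARES["among"], DEGREES_OF_FREEDOM["among"])
ms_within = mean_square(SUM_OF_SQUARES["within"], DEGREES_OF_FREEDOM["within"])
fst = fixation_index(ESTIMATED_VARIANCE["among"], ESTIMATED_VARIANCE["within"])
pct_among = 100 * fst
pct_within = 100 - pct_among

print(f"MS among = {ms_among:.3f}, MS within = {ms_within:.3f}")
print(f"Fst = {fst:.3f}; among = {pct_among:.0f}%, within = {pct_within:.0f}%")
```

Under these assumptions the sketch reproduces the tabulated mean squares, the Fst of 0.042, and the 4% versus 96% split of the variance.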
Table 5. Percentage of trait differences between the core and original collection under different sampling strategies.

Sampling Ratio | 10% | 15% | 20% | 25% | 30% | 35% | 40%
Core Collection | B2C3D2 | B2C2D3 | B2C3D3 | B2C3D3 | B1C1D2 | B2C3D3 | B1C2D2 | B1C2D2 | B1C2D2 | B2C1D2
MD (%) | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
CR (%) | 93.4255 | 100.000 | 100.000 | 100.000 | 91.4811 | 100.000 | 94.3496 | 94.5046 | 94.5046 | 98.7887
VD (%) | 66.6667 | 60.0000 | 66.6667 | 53.3333 | 66.6667 | 46.6667 | 53.3333 | 46.6667 | 33.3333 | 40.0000
VR (%) | 141.037 | 136.418 | 136.026 | 128.870 | 125.758 | 122.641 | 120.205 | 118.457 | 115.168 | 113.5148
Numbers | 21 | 32 | 42 | 53 | 64 | 73 | 84
The meanings of B1, B2, C1, C2, C3, D1, D2, and D3 in the table are given in Figure 6.
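To illustrate how two of the Table 5 indicators can be computed, the sketch below assumes the definitions commonly used in core-collection evaluation: CR as the mean, over traits, of the core range divided by the original range, and VR as the mean, over traits, of the core coefficient of variation divided by the original coefficient of variation, both expressed as percentages. These definitions are not spelled out in the table itself, and the trait names and values are made up for illustration only. MD and VD depend on per-trait significance tests and are not shown.

```python
from statistics import mean, stdev


def coincidence_rate(original: dict, core: dict) -> float:
    """CR (%): mean ratio of core range to original range across traits."""
    ratios = [
        (max(core[trait]) - min(core[trait]))
        / (max(original[trait]) - min(original[trait]))
        for trait in original
    ]
    return 100 * mean(ratios)


def variable_rate(original: dict, core: dict) -> float:
    """VR (%): mean ratio of core CV to original CV across traits."""

    def coefficient_of_variation(values):
        return stdev(values) / mean(values)

    ratios = [
        coefficient_of_variation(core[trait])
        / coefficient_of_variation(original[trait])
        for trait in original
    ]
    return 100 * mean(ratios)


# Hypothetical toy data: two traits measured on five accessions,
# with a three-accession subset that keeps both extremes of each trait.
original = {
    "height": [1.0, 2.0, 3.0, 4.0, 5.0],
    "diameter": [10.0, 12.0, 14.0, 16.0, 18.0],
}
core = {
    "height": [1.0, 3.0, 5.0],
    "diameter": [10.0, 14.0, 18.0],
}

cr = coincidence_rate(original, core)
vr = variable_rate(original, core)
print(f"CR = {cr:.2f}%, VR = {vr:.2f}%")
```

Because the toy subset retains the minimum and maximum of every trait, CR equals 100%, while VR exceeds 100% because the subset is relatively more variable than the full set. The same pattern (CR at or near 100%, VR above 100%) appears in the Table 5 values.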
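Table 6. Genetic parameters of the original and core collections.

| Software | Core Collection | Sample Number (No.) | Allele Number (Na) | Effective Number of Alleles (Ne) | Observed Heterozygosity (Ho) | Expected Heterozygosity (He) | Shannon Information Index (I) | Polymorphism Information Content (PIC) |
|---|---|---|---|---|---|---|---|---|
| | Original collection | 232 | 26.188 | 11.565 | 0.863 | 0.897 | 2.660 | 0.8890 |
| Core Finder | Mc1 | 158 | 26.156 | 11.826 | 0.863 | 0.899 | 2.687 | 0.8916 |
| Core Hunter | H-10 | 23 | 15.000 ** | 9.974 | 0.777 ** | 0.887 | 2.436 | 0.8770 |
| Core Hunter | H-15 | 35 | 17.531 ** | 10.684 | 0.784 ** | 0.896 | 2.537 | 0.8875 |
| Core Hunter | H-20 | 46 | 18.719 ** | 11.046 | 0.792 ** | 0.898 | 2.572 | 0.8901 |
| Core Hunter | H-25 | 58 | 20.031 ** | 11.234 | 0.798 ** | 0.899 | 2.603 | 0.8912 |
| Core Hunter | H-30 | 70 | 21.375 ** | 11.439 | 0.811 * | 0.900 | 2.629 | 0.8924 |
| Core Hunter | H-35 | 81 | 21.938 ** | 11.379 | 0.820 * | 0.900 | 2.631 | 0.8920 |
| Core Hunter | H-40 | 93 | 22.719 * | 11.411 | 0.826 | 0.900 | 2.641 | 0.8920 |
| Core Hunter | H-45 | 104 | 22.875 * | 11.481 | 0.833 | 0.900 | 2.644 | 0.8921 |
| Core Hunter | H-50 | 116 | 23.188 * | 11.479 | 0.837 | 0.899 | 2.644 | 0.8912 |
| Core Hunter | H-55 (Mc2) | 128 | 23.594 | 11.483 | 0.840 | 0.899 | 2.645 | 0.8909 |
| Core Hunter | H-60 | 139 | 24.219 | 11.572 | 0.839 | 0.899 | 2.656 | 0.8916 |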
* p ≤ 0.05 and ** p ≤ 0.01 for the difference between a core subset and the original collection (independent t-test).
Table 7. Results of the t-test for the diversity parameters of the core and original collections.

| Germplasm | N | Na | Ne | Ho | He | I | PIC |
|---|---|---|---|---|---|---|---|
| Original Collection | 232 | 26.188 | 11.565 | 0.863 | 0.897 | 2.660 | 0.8890 |
| Mc1 | 158 (68.1%) | 26.156 (99.9%) | 11.826 (102%) | 0.863 (100%) | 0.899 (100%) | 2.687 (101%) | 0.8916 (100%) |
| Mc2 | 128 (55.2%) | 23.594 (90.1%) | 11.483 (99.3%) | 0.840 (97.3%) | 0.899 (100%) | 2.645 (99.4%) | 0.8909 (100%) |
| t1 | | 0.98 | 0.83 | 0.97 | 0.84 | 0.75 | 0.84 |
| t2 | | 0.08 | 0.94 | 0.85 | 0.21 | 0.87 | 0.87 |
N is the total number of samples; values in parentheses are percentages of the corresponding value in the original collection; t1 is the t-test value for core collection Mc1 versus the original collection, and t2 is the t-test value for core collection Mc2 versus the original collection.
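The percentages shown in parentheses in Table 7 can be recomputed directly from the tabulated values. The sketch below is a minimal check that assumes each percentage is the core-collection value divided by the original-collection value. The dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Values copied from Table 7.
ORIGINAL = {"N": 232, "Na": 26.188, "Ne": 11.565, "Ho": 0.863,
            "He": 0.897, "I": 2.660, "PIC": 0.8890}
CORES = {
    "Mc1": {"N": 158, "Na": 26.156, "Ne": 11.826, "Ho": 0.863,
            "He": 0.899, "I": 2.687, "PIC": 0.8916},
    "Mc2": {"N": 128, "Na": 23.594, "Ne": 11.483, "Ho": 0.840,
            "He": 0.899, "I": 2.645, "PIC": 0.8909},
}


def retention(core: dict, original: dict) -> dict:
    """Return each core parameter as a percentage of the original value."""
    return {name: 100 * core[name] / original[name] for name in original}


retained = {label: retention(values, ORIGINAL) for label, values in CORES.items()}

for label, percentages in retained.items():
    summary = ", ".join(f"{name} {pct:.1f}%" for name, pct in percentages.items())
    print(f"{label}: {summary}")
```

Run on the Table 7 values, this gives 68.1% of the samples and 99.9% of Na for Mc1, and 55.2% of the samples and 90.1% of Na for Mc2, matching the tabulated figures.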
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Li, N.; Yang, Y.; Xu, F.; Chen, X.; Wei, R.; Li, Z.; Pan, W.; Zhang, W. Genetic Diversity and Population Structure Analysis of Castanopsis hystrix and Construction of a Core Collection Using Phenotypic Traits and Molecular Markers. Genes 2022, 13, 2383. https://doi.org/10.3390/genes13122383

Chicago/Turabian Style

Li, Na, Yuanmu Yang, Fang Xu, Xinyu Chen, Ruiyan Wei, Ziyue Li, Wen Pan, and Weihua Zhang. 2022. "Genetic Diversity and Population Structure Analysis of Castanopsis hystrix and Construction of a Core Collection Using Phenotypic Traits and Molecular Markers" Genes 13, no. 12: 2383. https://doi.org/10.3390/genes13122383
