3.1. Sample Size and Power Analysis
In México, studies using genomic information from large animal populations are scarce. Usually, the main restriction is the availability of economic resources for animal genotyping. The genetic improvement infrastructure of developing countries is usually less robust than that of developed countries, whose genomic databases include millions of animals [35].
In previous studies carried out with the same population used in our study, the genomic information of 300 animals was used to find candidate markers for diseases [36] and meat quality [37]. For this study, we used intragenic markers (±25 kb), and only 150 animals with marker information for candidate genes associated with resistance and susceptibility to mastitis were found.
The sample size used in our study, evaluated with the pwr.chisq.test function of the pwr package [38], was sufficient to detect significant differences at a significance level of 0.05 with an effect size of 0.4. The power of 0.87 obtained for the χ² test in the present study is adequate to obtain conclusive, reliable results [39]. In other studies of genetic diversity and structure, smaller sample sizes have been used and conclusive results have been reached: 6 to 23 [40], 7 to 58 [41], 13 to 38 [42], and 71 to 167 [43].
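As we understand the pwr package, pwr.chisq.test obtains power from the noncentral chi-square distribution with noncentrality parameter N·w², where w is Cohen's effect size. The sketch below reproduces that calculation in plain Python so that no statistics library is needed. The degrees of freedom of the test are not reported in this section, so `df` is an assumption the reader must supply; the function names are hypothetical.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a  # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-16 * total and n < 100_000:
        n += 1
        term *= x / (a + n)
        total += term
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))


def chi2_cdf(x: float, df: float) -> float:
    """Central chi-square CDF."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def ncx2_cdf(x: float, df: float, ncp: float) -> float:
    """Noncentral chi-square CDF as a Poisson(ncp/2) mixture of central CDFs."""
    if ncp <= 0.0:
        return chi2_cdf(x, df)
    half = ncp / 2.0
    j_max = int(half + 10.0 * math.sqrt(half) + 50)  # truncate the Poisson tail
    total = 0.0
    for j in range(j_max + 1):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * chi2_cdf(x, df + 2 * j)
    return total


def chi2_ppf(p: float, df: float) -> float:
    """Chi-square quantile by bisection on chi2_cdf."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chisq_power(w: float, n: int, df: int, alpha: float = 0.05) -> float:
    """Power of a chi-square test with effect size w and n observations."""
    critical = chi2_ppf(1.0 - alpha, df)  # rejection threshold under H0
    return 1.0 - ncx2_cdf(critical, df, n * w * w)  # P(reject | ncp = n w^2)
```

For example, `chisq_power(0.4, 150, df)` can be evaluated for several candidate values of `df`; power decreases as `df` grows for a fixed sample size and effect size, which is why the degrees of freedom must be known before the reported power can be checked.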
3.3. Population Genetic Structure
Table 6 shows the results for He and Ho, as well as those of a t-test to determine whether the differences were statistically significant. The decision rule was to not reject the null hypothesis of no difference between He and Ho when the p-value of the t-test was greater than the Bonferroni-corrected significance level. The |He − Ho| differences are shown in the last column; all were non-significant, with the exception of the difference for SM.
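The per-locus quantities behind Table 6 and the decision rule can be sketched as follows. This is a minimal illustration, assuming biallelic SNPs with genotypes coded AA/AB/BB, He computed as 2pq without a small-sample correction, and one t-test p-value per trait; the function names and the p-values in the usage line are hypothetical, and the t-test itself is not reimplemented here.

```python
from typing import Dict, Sequence

# Additive coding: number of B alleles carried by each genotype.
B_COUNT = {"AA": 0, "AB": 1, "BB": 2}


def b_allele_frequency(genotypes: Sequence[str]) -> float:
    """Frequency of allele B at one biallelic SNP."""
    return sum(B_COUNT[g] for g in genotypes) / (2 * len(genotypes))


def expected_heterozygosity(genotypes: Sequence[str]) -> float:
    """He = 1 - p^2 - q^2 = 2pq (no small-sample correction)."""
    q = b_allele_frequency(genotypes)
    return 2.0 * q * (1.0 - q)


def observed_heterozygosity(genotypes: Sequence[str]) -> float:
    """Ho = proportion of animals that are heterozygous (AB)."""
    return sum(g == "AB" for g in genotypes) / len(genotypes)


def bonferroni_decisions(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """True means H0 (He = Ho) is NOT rejected for that trait."""
    threshold = alpha / len(p_values)  # Bonferroni-corrected significance level
    return {trait: p > threshold for trait, p in p_values.items()}
```

Usage with made-up p-values: `bonferroni_decisions({"RM": 0.2, "SM": 0.001})` compares each p-value against 0.05/2 and flags only SM as significant.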
The Ho estimates obtained in our study for mastitis (average 0.45 ± 0.076) are above those obtained for breeds derived from the original Braunvieh but below those reported for Bos taurus breeds with some adaptation to the tropics. In Colombia, a study of Creole cattle [44] reported an average Ho of 0.66, with the Costeño con Cuernos breed having the lowest value (0.635) and Casareño the highest (0.733). These Ho estimates are quite different from those reported by [45], who found an average Ho of 0.35 ± 0.167 in populations related to Braunvieh.
This suggests that the Mexican Braunvieh population has increased its genetic diversity for the candidate genes studied, as reflected in a higher number of heterozygous individuals. Exposure for over a century to climates different from their native one, in the tropical systems of Mexico, could have changed Ho with respect to original Braunvieh populations [13,45]. On the other hand, the non-significant differences between He and Ho for most of the traits (Table 6) are similar to the results obtained in other studies, where neither the breed nor the environment changed the non-rejection of the null hypothesis He = Ho for the χ² test [13,44,46].
Figure 1 shows that the absolute differences at the locus level are small, consistent with the overall result. The exception was SM, whose loci jointly tended to exceed 0.03; it is therefore the only trait with a significant difference. In the bar plots for the remaining traits, some loci were close to zero and others were above 0.05, but the overall estimate was not large enough to provide evidence of a significant difference.
The F_IT estimate obtained coincides with those observed in semi-specialized production systems. The overall average F_IT was 0.017 ± 0.043 across all the traits associated with mastitis. This value is similar to those found for Sahiwal (0.013 ± 0.109), Gyr (0.013 ± 0.106), and Guernsey (0.02 ± 0.217) [47]. In contrast, the F_IT for the tropical breeds Landim, Angole, and Tete [48] was 70% higher than that found in our study, and the average for Colombian Creole breeds was 0.09 [44]. The first group of breeds was maintained in semi-specialized systems [47], whereas Landim, Angole, Tete, and the Colombian Creole breeds are non-specialized. This difference in production system could explain why smaller breeds without adequate genetic control present higher inbreeding coefficients.
Figure 2 shows that, for traits RM, SM, and RBM, most of the loci had positive F_IT values. In contrast, the loci for RSM and RCM mostly had negative values; that is, there were no signs of inbreeding. This could be due to the specificity of the last two traits, whose candidate genes involve DNA segments that have not undergone direct selection.
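The sign convention used in the two paragraphs above can be made concrete with a minimal sketch. It assumes the per-locus estimator F = 1 − Ho/He within a single population; the study's software may use a different estimator, and the function names are hypothetical.

```python
import statistics
from typing import List, Tuple


def f_it(ho: float, he: float) -> float:
    """Fixation index for one locus: F = 1 - Ho/He.

    Positive values indicate a heterozygote deficit (consistent with inbreeding);
    negative values indicate a heterozygote excess.
    """
    if he <= 0.0:
        raise ValueError("He must be positive (F is undefined for monomorphic loci)")
    return 1.0 - ho / he


def summarize_f(loci: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-locus F over (Ho, He) pairs."""
    values = [f_it(ho, he) for ho, he in loci]
    return statistics.mean(values), statistics.stdev(values)
```

For instance, a locus with Ho = 0.4 and He = 0.5 gives F = 0.2 (heterozygote deficit), while Ho = 0.6 and He = 0.5 gives F = −0.2, the pattern described for most RSM and RCM loci.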
Genes whose symbols start with CXCL (chemokine (C-X-C motif) ligand) are involved in the immune response [49]. An in vivo study demonstrated that CXCL genes, particularly CXCL8 and CXCL10, are key in the inflammatory process of the mammary gland after bacteria enter through the cow's teat [50,51]. In our study, 67% of the SNP markers for RCM belong to the genes CXCL1 and CXCL8. The low F_IT values for this trait can be explained by the fact that current improvement programs for the Braunvieh population under study include no selection criteria based on resistance to mastitis.
For RSM, 40% of the markers belong to EDN2 and HIVEP3, candidate genes associated with immune processes such as intracellular and sequential signaling of immunoglobulin receptors. Because of their immunological nature, these genes have not been directly selected; however, at the farm level, there is growing interest in improving animals for genes associated with preventable diseases, such as paratuberculosis and clinical mastitis [8,52].
The distribution of animals by their F_IT value is shown in Figure 3. In general, animals fall in the F_IT range of 0 to 0.3. An F_IT of 0.50 (50%) corresponds to an inbred animal produced by two parents with a 100% genetic relationship. Depending on the trait, the proportion of highly inbred animals varied from 1.3% to 13.3%. SM was the trait with the most highly inbred animals, and RSM the one with the fewest.
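The 50% reference value follows from a standard result of quantitative genetics (not a formula stated in this section): the inbreeding coefficient of an animal X equals half the additive genetic relationship a_SD between its sire S and dam D. A short worked form:

```latex
F_X = \tfrac{1}{2}\, a_{SD}
\qquad\Longrightarrow\qquad
a_{SD} = 1:\; F_X = \tfrac{1}{2}(1) = 0.50,
\qquad
a_{SD} = 0.5 \text{ (non-inbred full sibs)}:\; F_X = 0.25
```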
Although up to 13.3% of animals have undesirable F_IT values, the estimate obtained in our study (0.017 ± 0.043) is similar to those reported for other cattle breeds in specialized and semi-specialized production systems. Although some animals had high inbreeding coefficients, the average F_IT in our study is lower than those reported by [53] for Bos taurus breeds (0.071), Brown Swiss (0.071), Braunvieh (0.059), Original Braunvieh (0.023), Holstein (0.057), and Simmental (0.028).
3.4. Clustering Using Hierarchical Methods and K-Means Algorithms
The optimal k, chosen to balance the silhouette width and the mean squared error, converged to the same number of clusters for both algorithms in each trait. However, better silhouette values were obtained with the K-means algorithm: RM, 0.08; SM, 0.17; RBM, 0.17; RSM, 0.12; and RCM, 0.39. For the first four traits, the silhouette values were low, but none were negative, which is a good indicator of the suitability of the method. RCM had the widest silhouette.
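The silhouette criterion can be sketched in a few lines. Values near 1 mean an animal is much closer to its own cluster than to the nearest other cluster, and negative values suggest misassignment, which is why non-negative averages support the clustering. This is a minimal sketch, assuming Euclidean distance on numerically coded genotypes; the distance actually used in the study is not stated here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def silhouette_widths(points: Sequence[Sequence[float]], labels: List[int]) -> List[float]:
    """Silhouette width s(i) = (b - a) / max(a, b) for every point.

    a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster.
    Requires at least two clusters.
    """
    n = len(points)
    clusters = set(labels)
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            widths.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return widths


def mean_silhouette(points: Sequence[Sequence[float]], labels: List[int]) -> float:
    """Average silhouette width over all points."""
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)
```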
The average proportion of between-group variability relative to the total variability was also obtained: RM, 34.9%; SM, 65.5%; RBM, 66.9%; RSM, 54.7%; and RCM, 77.3%. The RCM values for both variability and silhouette were the best of all the traits, possibly because of the small number of markers for this trait (6), which were also highly informative.
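The proportion of between-group variability reported above corresponds, under the usual K-means decomposition, to the between-cluster sum of squares divided by the total sum of squares (BSS/TSS). A minimal sketch, assuming Euclidean geometry on numerically coded genotypes; the names are hypothetical.

```python
from typing import Dict, List, Sequence


def _centroid(rows: List[Sequence[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def between_to_total_ratio(points: List[Sequence[float]], labels: List[int]) -> float:
    """Between-cluster sum of squares divided by total sum of squares (BSS/TSS)."""
    grand = _centroid(points)
    tss = sum(_sq_dist(p, grand) for p in points)  # total variability
    groups: Dict[int, List[Sequence[float]]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    wss = 0.0  # within-cluster variability
    for members in groups.values():
        c = _centroid(members)
        wss += sum(_sq_dist(p, c) for p in members)
    return (tss - wss) / tss  # BSS = TSS - WSS
```

A ratio close to 1 means that almost all the variation among animals is explained by cluster membership.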
The circular dendrograms in Figure 4 were generated to visualize the groups formed by the Ward hierarchical method. The groups created with this algorithm were very similar to those created with K-means. The value of k, the number of individuals per group, and the representative animals with their genotypic profiles are shown in Figure 5. The representative animal is the animal closest to the centroid of its cluster; its genotypic profile was obtained to visualize the main differences among group patterns.
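Selecting the representative animal as described above can be sketched as follows. The sketch assumes genotypes are coded 0/1/2 (AA/AB/BB) and that "closest" means Euclidean distance to the cluster mean; both are assumptions, and the names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

DECODE = {0: "AA", 1: "AB", 2: "BB"}  # assumed additive coding of genotypes


def representatives(
    points: List[Sequence[int]], labels: List[int], ids: List[str]
) -> Dict[int, Tuple[str, List[str]]]:
    """For each cluster, return the animal closest to the centroid and its genotype profile."""
    groups: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    result = {}
    for lab, members in groups.items():
        # Cluster centroid: mean coded genotype at each locus.
        centroid = [sum(col) / len(members) for col in zip(*(points[i] for i in members))]
        best = min(members, key=lambda i: math.dist(points[i], centroid))
        result[lab] = (ids[best], [DECODE[g] for g in points[best]])
    return result
```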
Ho and F_IT are not related because, unlike inbreeding, heterozygosity is not correlated with the markers [54], even though Ho values close to 0.50 were found for all the traits. The genotypic profiles show groups of animals with a low frequency of genotype AB (Figure 5). The traits with the lowest F_IT values present high numbers of homozygous animals. This could be explained by the low genetic relationship among the animals for these loci, that is, genotypes AA or BB for the same locus, as seen in Table 6. One of these traits (SM) presents a significant He − Ho difference. This means a decrease in the expected value for heterozygosity, which will be reflected in a greater number of homozygous individuals.
The trait with the greatest number of heterozygous animals was RM. Groups III, VII, VIII, and XI have most of their loci with genotype AB. RM is multifactorial and therefore includes candidate genes associated with productive, reproductive, adaptive, and immunological traits. This would lead to high genetic diversity, in contrast with the traits RSM and RCM, which, in our study, are influenced only by immunological candidate genes of a specific order whose diversity is limited. According to [55], these types of genes tend to be found in homozygous genotypes in wildlife, given that AB genotypes are associated with protein abnormalities that increase susceptibility to pathogenic diseases.
For SM, there are well-defined groups of animals whose loci are mostly heterozygous. Groups II, IV, V, XV, and XVII have at least 55% of their loci with genotype AB. The groups with the fewest heterozygous loci were III, XVIII, XIX, and XX, with up to 18% AB genotypes. As observed for RM, there is high genetic diversity in the population for this trait; the reason could be the presence of genes influencing traits that are indirectly related to mastitis, and not just loci affecting the immune system [49,55].
For RBM, the proportion of heterozygous loci in the groups formed by the K-means algorithm ranged from 29% (groups II, VI, IX, and XX) to 57% (groups I, III, VIII, and XII). Diversity was high because only 33% of the genes (IL1A and IL1RN) were directly related to immune functions in the animal, while the rest were related to metabolic processes [20,21,22].
3.5. Principal Components Analysis (PCA)
The PCA was useful for observing the genetic diversity in the groups of animals generated with K-means. The first two dimensions of the PCA explained 22.21% (RM), 42.1% (SM), 40.55% (RBM), 29.68% (RSM), and 58.9% (RCM) of the variation in the markers (Figure 6). These results were higher than those reported in similar studies [40,41], in which the first two dimensions explained 10.9% and 8.79% of the variation, respectively.
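The "percentage of variation explained by the first two dimensions" is each leading eigenvalue of the marker covariance matrix divided by the sum of all eigenvalues (the trace). The dependency-free sketch below obtains the leading eigenvalues by power iteration with deflation. It assumes a covariance-based PCA on numerically coded genotypes; if the study standardized the markers (a correlation-based PCA), the percentages would differ. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def covariance_matrix(data: List[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix of the columns (markers) of data."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def top_eigenvalues(matrix: List[List[float]], k: int, iters: int = 1000, seed: int = 0) -> List[float]:
    """Largest k eigenvalues of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    p = len(matrix)
    m = [row[:] for row in matrix]
    values = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T M v (v has unit length)
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
        values.append(lam)
        # Hotelling deflation removes the component just found
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return values


def variance_explained(data: List[Sequence[float]], k: int = 2) -> List[float]:
    """Share of total variance carried by each of the first k principal components."""
    cov = covariance_matrix(data)
    total = sum(cov[i][i] for i in range(len(cov)))  # trace = sum of all eigenvalues
    return [lam / total for lam in top_eigenvalues(cov, k)]
```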
For RM and RSM, which include a larger number of the loci under study, more eigenvalues are needed to reach at least 80% of the variability than for the other traits. Nevertheless, the first two dimensions capture a high percentage of the variation, as is evident in the PCA plots, where the K-means groups are clearly differentiated, particularly for RCM.
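The 80% criterion amounts to counting how many of the largest eigenvalues are needed before their cumulative share of the total reaches the threshold. A minimal sketch; the function name and the example eigenvalues are hypothetical.

```python
from typing import Sequence


def components_for_threshold(eigenvalues: Sequence[float], threshold: float = 0.80) -> int:
    """Smallest number of leading components whose cumulative variance share >= threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)
```

With made-up eigenvalues `[4, 3, 2, 1]`, the cumulative shares are 0.4, 0.7, and 0.9, so three components are needed; traits with many loci tend to spread variance over more eigenvalues and therefore need more components.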
For RM, the genes ARHGAP10 and BDH2 (SNPs 2 and 3, respectively) contributed the most to the analysis; however, the relationship between them is negative (Figure 7). This means that an animal carrying genotype AA at one of these genes tends to carry BB at the other, or vice versa. There is no known linkage or genetic correlation between these genes, given that they are on different chromosomes and affect unrelated traits: ARHGAP10 is associated with intramuscular fat formation [56] and BDH2 with feed intake [57].
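In a PCA biplot of well-represented variables, arrows pointing in opposite directions roughly correspond to a negative correlation between the coded genotypes. The sketch below computes that correlation directly, assuming additive 0/1/2 coding of AA/AB/BB; the coding and the names are assumptions rather than details taken from this section.

```python
import math
from typing import Sequence

B_COUNT = {"AA": 0, "AB": 1, "BB": 2}  # additive coding (assumed)


def genotype_correlation(snp_x: Sequence[str], snp_y: Sequence[str]) -> float:
    """Pearson correlation between two SNPs coded as counts of allele B."""
    x = [B_COUNT[g] for g in snp_x]
    y = [B_COUNT[g] for g in snp_y]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

When animals that are AA at one SNP are BB at the other and vice versa, the correlation is −1, the extreme case of the negative relationship described for ARHGAP10 and BDH2.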
Notably, for RM, the highly inbred animal 16 is among the 10% of animals that contributed most (Figure 7). This can probably be explained by its position in the biplot, which indicates that it is most closely associated with the gene TBC1D8, which contributed little to the analysis. The same holds for SM, with animal 143 and the marker for the gene ITPK1.
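How an animal can rank among the top contributors while lying close to a weakly contributing marker follows from how individual contributions are usually defined. Assuming the common convention (for example, that of the FactoMineR package) with equal weights 1/n for the n animals, and writing y_id for the coordinate of animal i on dimension d, the contribution depends only on how far the animal lies along that dimension:

```latex
\mathrm{ctr}_d(i) = 100 \times \frac{\tfrac{1}{n}\, y_{id}^{2}}{\lambda_d},
\qquad
\lambda_d = \frac{1}{n} \sum_{i'=1}^{n} y_{i'd}^{2},
\qquad
\sum_{i=1}^{n} \mathrm{ctr}_d(i) = 100
```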
For SM, the markers for the genes ANKRD33B, CTNND2, GRIA3, and SIDT1 contributed the most to the PCA (Figure 6). The gene pairs CTNND2 with SIDT1 and ANKRD33B with GRIA3 had a positive relationship (Figure 7) even though they are on different chromosomes. This may be a particular feature of the genotypic profile for mastitis in this Braunvieh population.
For RBM, the markers of the genes IL1A and ITGB3 showed a relationship similar to that between ARHGAP10 and BDH2 for RM (Figure 6): they were negatively related according to the biplot (Figure 7). Likewise, they are on different chromosomes, even though both directly influence immune system function [20,21].
The marker BovineHD0300029925 (10) of HIVEP3 (Figure 6 and Figure 7) contributed the most to the PCA for RSM. This SNP is positively related to two other intragenic HIVEP3 markers: BovineHD0300029904 (9) and BovineHD0300029955 (11). For RCM, most of the markers contributed strongly, namely the SNPs for the genes CXCL8, SEL1L, and STAT4, which are directly associated with the immune system [49].