The Complete Mitochondrial Genome of Northeast Asian Rove Beetle, Lordithon arcuatus (Solsky, 1871) and Performance of Site-Specific Mixture Models in Building the Mitogenomic Phylogeny of Staphylinidae (Insecta: Coleoptera)

Ji, Qiao-Qiao; Sun, Yi-Nuo; Lü, Liang; Zhao, Tian-You; Wu, Dong-Hui

doi:10.3390/d15050588

by Qiao-Qiao Ji¹,²,†, Yi-Nuo Sun³,†, Liang Lü³, Tian-You Zhao³,* and Dong-Hui Wu¹,⁴,⁵,⁶,*

¹ State Key Laboratory of Black Soils Conservation and Utilization, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China

² University of Chinese Academy of Sciences, Beijing 100049, China

³ Key Laboratory of Animal Physiology, Biochemistry and Molecular Biology of Hebei Province, College of Life Sciences, Hebei Normal University, Shijiazhuang 050024, China

⁴ Key Laboratory of Vegetation Ecology, Ministry of Education, Northeast Normal University, Changchun 130024, China

⁵ State Environmental Protection Key Laboratory of Wetland Ecology and Vegetation Restoration, School of Environment, Northeast Normal University, Changchun 130117, China

⁶ Jilin Provincial Key Laboratory of Animal Resource Conservation and Utilization, Northeast Normal University, Changchun 130117, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Diversity 2023, 15(5), 588; https://doi.org/10.3390/d15050588

Submission received: 30 November 2022 / Revised: 18 April 2023 / Accepted: 20 April 2023 / Published: 23 April 2023

(This article belongs to the Special Issue Mega-Diversity of Beetle Species—Perspective on Taxonomy, Systematics, Morphological Evolution and Zoogeographical Patterns of Coleoptera (Arthropoda, Insecta))

Simple Summary

This paper describes the first complete mitochondrial genome of an Asian Lordithon species (Coleoptera, Staphylinidae). The mitochondrial genome of Lordithon arcuatus (Solsky, 1871) is 18,290 bp long. Maximum likelihood and Bayesian phylogenetic analyses using 68 staphylinid taxa revealed that the mycetoporine representatives constituted a stable and fully supported clade. In addition, we evaluated the performance of mixture models in constructing the mitochondrial tree of staphylinids, and our findings suggest that the class-unlinked heterotachy models, despite having the lowest AIC or BIC value, may produce deceptive results.

Abstract

Lordithon species are typically mushroom-dwelling rove beetles that devour maggots. This study presents the mitogenome of a Lordithon arcuatus specimen that was procured from Changbai Mountain in the Jilin Province of China. The mitogenome is 18,290 bp long and comprises 13 protein-coding genes, 22 tRNAs, and 2 rRNAs. The base composition of the mitogenome is as follows: A = 38.80%, T = 37.93%, G = 8.94%, and C = 14.32%. Maximum likelihood and Bayesian phylogenetic trees were constructed using 68 representative staphylinid species, which showed that Lordithon, Bolitobius, and Ischnosoma form a stable and fully supported Mycetoporinae clade, whereas there was no consensus regarding the relationships among Tachyporinae taxa. Additionally, the performance of site-specific mixture models for inferring the phylogeny of staphylinids using mitogenomic data was assessed. The results suggest that heterotachy models should be used with caution, as they may result in incorrect topology with delusive precedence in AIC- or BIC-based model selection.

Keywords:

mitogenome; phylogenetics; heterogeneity models; rove beetles; Mycetoporinae; Lordithon

1. Introduction

The staphylinid genus Lordithon Thomson comprises 134 species worldwide, including 73 Palaearctic species [1], and is distributed throughout the temperate regions of all continents except Antarctica [2]. Twelve species have been documented in China, the majority of which are found in northern provinces [3]. The current mitogenomic data for this genus available in GenBank come from a European species, L. exoletus (Erichson, 1839) (accession number: KX087309). Here, we present new mitogenome data for an Asian species, Lordithon arcuatus (Solsky, 1871). It is widespread across Northeast Asia, Siberia, and Russia’s Far East [4]. In China, this species has so far been recorded only from Jilin Province [3]. Both Ban et al. [4] and our collection show that L. arcuatus is a mountainous species and can be captured using pitfall traps, flight-interception traps, and sifting from mushrooms. However, it was not present in our funnel traps for canopy-dwelling insects at the same plots [5]. Its association with mushrooms is not evidence of fungivory. Instead, we observed some individuals eating maggots that were present in mushrooms, as summarized by Campbell [2]. In addition, previous studies with different data and models revealed a number of variations in the topologies of staphylinid phylogeny [6,7,8]. In particular, site heterogeneity of sequences, i.e., the assumption that each alignment site has undergone its own evolutionary process and thus has a specific rate or composition [9,10,11,12,13], has been identified as a significant contributor to estimation errors in beetle phylogenetics [14,15]. Therefore, the influence of datasets and models in phylogenetic inference also needs to be addressed, especially the performance of the mixture models, which were created to handle the site heterogeneity of sequences.

2. Material and Methods

2.1. Sample Collection, DNA Extraction, and Mitogenome Sequencing

The specimen for sequencing was collected in Antu County, Jilin Province, China (42.1075° N, 128.0956° E, 1400 m, July 2019). It was identified according to Li et al. [3] and Ban et al. [4]. Whole-genome DNA was isolated from the forebody using a modified CTAB (pH 8.0)-based DNA extraction protocol described in Zhao et al. [7]. The genomic DNA library was constructed using an NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, Ipswich, MA, USA), following the manufacturer’s recommendations. Then, the DNA library was sequenced, and about 20 Gb of 150 bp paired-end clean reads were generated on the Illumina NovaSeq 6000 platform (Novogene, Beijing, China); these data are deposited in the GenBank-SRA database (accession number: SRR24124725). The voucher specimen was preserved in 100% ethanol in a −20 °C freezer at Hebei Normal University.

2.2. Mitogenome Assembly, Annotation, and Bioinformatic Analysis

MitoZ v2.4 [16] was used to assemble the mitogenome, annotate protein-coding genes (PCGs) and ribosomal RNAs (rRNAs), and draw a map of the mitochondrial genome. The annotation of tRNAs and the prediction of their secondary structures were carried out on the MITOS Web Server (http://mitos2.bioinf.uni-leipzig.de/index.py), accessed on 14 October 2021 [17]. Geneious Prime v.2021.2.2 [18] was subsequently used to confirm the PCG boundaries by identifying the open reading frame (ORF) and calculating the sequence similarity with related species.

The base composition, amino acid usage, and relative synonymous codon usage (RSCU) were calculated using a Python script based on the CAI module [19]. Strand asymmetry was calculated from the base composition as AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) [20]. The complete mitogenome sequence of L. arcuatus with full annotation has been deposited in GenBank (accession number: OK501152).
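The skew formulas can be checked against the whole-mitogenome base composition reported in the abstract. The following is a minimal sketch rather than the script used in the study; the function name `skew` is an illustrative choice.

```python
def skew(x: float, y: float) -> float:
    """Return (x - y) / (x + y) for two base frequencies (e.g., A and T)."""
    total = x + y
    if total == 0:
        raise ValueError("both frequencies are zero; skew is undefined")
    return (x - y) / total


# Whole-mitogenome base composition (%) as reported in the abstract.
composition = {"A": 38.80, "T": 37.93, "G": 8.94, "C": 14.32}

at_skew = skew(composition["A"], composition["T"])
gc_skew = skew(composition["G"], composition["C"])

print(f"AT skew = {at_skew:.2f}, GC skew = {gc_skew:.2f}")
```

Rounded to two decimals, these values agree with the whole-sequence skews of 0.01 and −0.23 given in Section 3.1.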

2.3. Phylogenetic Analysis

In total, 68 representative species were sampled for phylogenetic analysis (Supplementary Material Table S1), comprising 4 from Mycetoporinae, 3 from Tachyporinae, 60 from various other rove beetle subfamilies, and 1 from Leiodidae, Sciodrepoides watsoni, as an outgroup. The mitochondrial sequences were retrieved using PhyloSuite v1.2.2 [21] and individually aligned with MAFFT v7.313 [22] using the iterative refinement methods E-INS-i for nucleotide alignment and L-INS-i for ribosomal RNA alignment. Gene alignments were concatenated into a matrix using PhyloSuite v1.2.2.

Seven datasets (matrices) were used in the phylogenetic analyses: (1) AA: the amino acid sequence of 13 PCGs (3524 sites, 13 partitions); (2) P2: the 2nd position of PCG nucleotide sequences (3405 sites, 13 partitions); (3) P12: the united 1st and 2nd positions of PCGs (6372 sites, 13 partitions); (4) P123: the united 1st, 2nd, and 3rd positions of PCGs (8783 sites, 13 partitions); (5) P2R: the 2nd position of PCGs, concatenated with 2 rRNAs (4852 sites, 15 partitions); (6) P12R: the united 1st and 2nd positions of PCGs, concatenated with 2 rRNAs (7819 sites, 15 partitions); (7) P123R: the united 1st, 2nd, and 3rd positions of PCGs, concatenated with 2 rRNAs (10230 sites, 15 partitions). For each dataset, ambiguous sites and poorly aligned positions were eliminated separately using BMGE v1.12 (m = DNAPAM100:2 for nucleotide sequences, m = BLOSUM90 for amino acid sequences, h = 0.4 for all) [23]. Three partition schemes were applied to all seven datasets: (1) non-partitioned (NP, all loci were analyzed as a single partition); (2) fully partitioned (FP, each locus was a distinct partition with its own substitution matrix); (3) merge-partitioned (MP). ModelFinder [24] was used to identify the partitioning schemes and select the substitution models based on the Bayesian Information Criterion (BIC). In addition, two types of four-class heterotachy models [13], class-linked (+H4) and class-unlinked (*H4), were applied with the selected substitution model to the NP scheme of all datasets; two empirical profile mixture models [20], mtART+F+R7+C60 and C60+FO+R4 (alias for POISSON+G+FMIX), were applied to the AA dataset (Table 1). IQ-TREE v2.1.2 [25] was used to build the maximum likelihood (ML) trees. The best-resulting tree for each combination of dataset, partitioning scheme, and model was automatically generated through 4 independent runs, and the nodal support values (BS) were calculated using 1000 replicates of the Ultrafast Bootstrap [26]. Nodes with a BS value of 100 were considered fully supported, 95–99 strongly supported, 90–94 moderately supported, and <90 unsupported. The partition files and data matrices in Phylip format are available in Supplementary Materials.
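To illustrate how the P2, P12, and P123 matrices differ, the sketch below extracts selected codon positions from an in-frame nucleotide sequence. It is only an illustration under stated assumptions: the helper `codon_positions` and the nine-base toy sequence are hypothetical, and the study itself assembled its matrices with PhyloSuite.

```python
from typing import Iterable


def codon_positions(seq: str, positions: Iterable[int]) -> str:
    """Keep only the requested codon positions (1, 2, or 3) of an in-frame sequence."""
    keep = sorted(set(positions))
    if not keep or any(p not in (1, 2, 3) for p in keep):
        raise ValueError("positions must be drawn from 1, 2, 3")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Walk codon by codon and copy the selected positions in order.
    return "".join(seq[i + p - 1] for i in range(0, len(seq), 3) for p in keep)


# Hypothetical toy alignment row (three codons: ATG, TTA, GGA).
toy = "ATGTTAGGA"

p2 = codon_positions(toy, [2])           # 2nd positions only
p12 = codon_positions(toy, [1, 2])       # 1st and 2nd positions
p123 = codon_positions(toy, [1, 2, 3])   # all positions

print(p2, p12, p123)
```

Alignment gap characters are treated like any other symbol here, so the function can be applied to each row of a codon-aligned matrix.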

As an alternative method, Bayesian inference (BI) under the CAT-GTR+G4 model was applied to suppress potential artifacts due to, for example, compositional heterogeneity across sites [27,28]. Three MCMC sampling chains, each with 8000 generations, were implemented in PhyloBayes MPI 1.9 [29]. After discarding the initial 2000 generations as burn-in, 2 chains attained convergence (maxdiff < 0.3), with an acceptable effective sample size (minimum effsize > 50), as suggested in the PhyloBayes tutorial. The programs “bpcomp” and “tracecomp” in the PhyloBayes MPI package were used to construct the consensus tree and diagnose the chain convergence. The nodal support was quantified using marginal posterior probability (PP).
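The acceptance rule stated above (maxdiff < 0.3 and minimum effsize > 50) can be written as a small check. This is a sketch only: the function name and the example diagnostics are hypothetical and are not values from this study; in practice the numbers would be read from the bpcomp and tracecomp output.

```python
from typing import Sequence


def chains_acceptable(
    maxdiff: float,
    effsizes: Sequence[float],
    maxdiff_limit: float = 0.3,
    effsize_limit: float = 50.0,
) -> bool:
    """Apply the thresholds quoted in the text to a pair of compared chains."""
    if not effsizes:
        raise ValueError("at least one effective sample size is required")
    return maxdiff < maxdiff_limit and min(effsizes) > effsize_limit


# Hypothetical diagnostics, for illustration only.
ok_run = chains_acceptable(maxdiff=0.12, effsizes=[310.0, 95.0, 61.0])
bad_run = chains_acceptable(maxdiff=0.45, effsizes=[310.0, 95.0, 61.0])

print(ok_run, bad_run)
```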

3. Results and Discussion

3.1. Genome Organization and Base Composition

The complete mitogenome of L. arcuatus (18,290 bp) (Figure 1) contains 13 PCGs (COX1–3, ND1–6, ND4L, CYTB, ATP6, and ATP8), 22 tRNA genes, and 2 rRNA genes (16S rRNA, or l-rRNA; and 12S rRNA, or s-rRNA). Neither MITOS nor Geneious, however, managed to recognize the control region. Notably, 23 of these 37 genes are located on the forward strand, while the rest are on the reverse strand (Supplementary Material Table S2). The longest gap (81 bp) was found between tRNA-Arg and tRNA-Asn. The AT content of the whole mitochondrial sequence is 76.72%, and the proportions of the four bases are as follows: A, 38.80%; T, 37.93%; C, 14.32%; and G, 8.94%. The AT and GC skews of the entire sequence are 0.01 and −0.23, respectively; it tends to be negative in PCGs but positive in tRNA and rRNA genes (Table 2).

3.2. Protein-Coding Genes and Codon Usage

The total length of PCGs is 11,161 bp, which accounts for 61.02% of the mitogenome. The longest single PCG is nad5 (ND5), while the shortest is ATP8. Four of the PCGs were identified on the reverse strand (nad1, nad4, nad4L, and nad5) (Supplementary Material Table S2). Three PCGs (cox2, nad4, and nad5) end with an incomplete stop codon (T/TA). The AT and GC skews of PCGs are shown in Supplementary Material Table S1. Among the codon positions, the second position exhibits the lowest AT skew, indicating that the proportion of T is greater than that of A.

The relative synonymous codon usage (RSCU) of the L. arcuatus mitogenome is shown in Table 3 and Figure 2. The third codon position contains 86.10% AT, which is considerably higher than at the first and second positions. The codons and amino acids TTA (4.05, Leu), TCT (2.443, Ser), CGA (2.327, Arg), AGA (2.156, Ser), and GGA (2.062, Gly) correspond to the top five RSCU values.
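RSCU is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below applies that definition to a single six-codon leucine family; the codon counts are hypothetical, and the function is a stand-in rather than the CAI-based script used in the study.

```python
from collections import defaultdict
from typing import Dict


def rscu(codon_counts: Dict[str, int], codon_to_aa: Dict[str, str]) -> Dict[str, float]:
    """Observed codon count divided by the mean count of its synonymous family."""
    family_total = defaultdict(int)
    family_size = defaultdict(int)
    for codon, aa in codon_to_aa.items():
        family_total[aa] += codon_counts.get(codon, 0)
        family_size[aa] += 1

    values = {}
    for codon, aa in codon_to_aa.items():
        if family_total[aa] == 0:
            values[codon] = 0.0  # amino acid absent: no usage to compare
        else:
            expected = family_total[aa] / family_size[aa]
            values[codon] = codon_counts.get(codon, 0) / expected
    return values


# Six synonymous leucine codons (TTA, TTG, CTN).
leu_family = {c: "Leu" for c in ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")}
# Hypothetical counts, for illustration only.
toy_counts = {"TTA": 8, "TTG": 2, "CTT": 1, "CTA": 1}

values = rscu(toy_counts, leu_family)
print(values)
```

Within a family the RSCU values sum to the number of synonymous codons, so a six-codon family such as Leu can reach values well above 2, as TTA does in this mitogenome.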

3.3. Ribosomal RNAs and Transfer RNAs

The two rRNA genes of L. arcuatus are located on the reverse strand, with lengths of 795 bp (12S rRNA) and 1270 bp (16S rRNA). The AT content is 76.86% (12S rRNA) and 82.05% (16S rRNA), while the AT skews are 0.04 (12S rRNA) and −0.03 (16S rRNA) (Table 2). The lengths of the 22 tRNA genes range from 62 to 70 bp, with an AT content of 77.87%. Most tRNAs have the characteristic clover-leaf secondary structure, except for tRNA-Ser1 (Figure 3). Fourteen of them lie on the forward strand, while the rest are on the reverse strand.

3.4. Phylogenetic Results

Thirty-seven ML trees and one BI tree were constructed (see the trees in Supplementary Materials). For each dataset, trees were obtained under homogeneous models with the three partitioning schemes and under mixture models with the NP scheme; among these, the tree with the lowest BIC value (extracted from the IQ-TREE output files) was selected as the final ML tree (Table 1). The tree under the NP-schemed mtART+C60 profile mixture model was chosen as the final tree for the amino acid dataset, a result also reported in previous mitogenome-based phylogenetic research (e.g., Zhao et al. [6]). In contrast, the trees under the heterotachy models (+H4 and *H4), which account for rate variation among sites and lineages, had worse BIC scores. The final trees for the P2 and P2R datasets were built using a four-class partitioning scheme with separate models (different substitution matrices among partitions). For the remaining four datasets (P12, P12R, P123, and P123R), trees under the four-class-unlinked GTR models (GTR+F+R*H4) were ranked highest based on their BIC values. Other than the profile mixture and heterotachy models, the partitioned models (the MP scheme with partition-specific substitution models) performed the best when reconstructing trees using mitogenomic data.
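The AIC and BIC values used for this ranking follow the standard definitions, AIC = −2 ln L + 2k and BIC = −2 ln L + k ln n, where k is the number of free parameters and n is the sample size. The sketch below reproduces the first row of Table 1 (AA matrix, mtART+F+R6+C60) under two assumptions that are not stated in the source: n is taken as the number of alignment sites (3524), and k = 222 is inferred from the reported AIC rather than read from the IQ-TREE output.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike Information Criterion."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion with sample size n."""
    return -2.0 * log_lik + k * math.log(n)


log_lik = -123042.14  # ln(Lik) from Table 1, AA matrix, mtART+F+R6+C60
n_sites = 3524        # assumption: sample size = number of alignment sites
k_params = 222        # assumption: inferred from the reported AIC, not stated in the source

aic_value = aic(log_lik, k_params)
bic_value = bic(log_lik, k_params, n_sites)

print(f"AIC = {aic_value:.2f}, BIC = {bic_value:.2f}")
```

Because the BIC penalty grows with ln n while the AIC penalty does not, parameter-rich models such as the class-unlinked heterotachy models are penalized more heavily by BIC than by AIC on the same matrix.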

Different datasets yielded various topologies, but few of them were strongly supported (Figure 4, Figure 5 and Figure 6). Additionally, a clear “regression” is that the supra-subfamily-level relationships shown in the present trees were completely uncertain and radically different from those in the corresponding trees built from the same data type and partition scheme in our previous study [6], in which datasets with similar but slightly larger taxon sampling were used. Since our workflow had not changed, we hypothesize that this is largely attributable to the change in outgroups.

Specifically, the various phylogenetic trees consistently showed that L. arcuatus and three additional mycetoporine species form a stable and fully supported clade (BS = 100, PP = 1): Bolitobius castaneus + (Ischnosoma splendidum + (L. arcuatus + L. exoletus)) (Figure 4, Figure 5, Figure 6 and Figure 7). This node is 100% supported in most of the final trees, with the exception of the GTR+F+R7*H4 tree of the P123 dataset, where its support was only 88 (Figure 6D). However, in line with [30], Tachyporinae, which was believed to be closely related to Mycetoporinae, appeared to be a polyphyletic group in the results from the AA, P2, P12, and P2R datasets or to be clustered together with weaker branch support in the other datasets (P123 and P12R datasets with BS < 90 in the partitioned model, P123R dataset with BS = 95 in the partitioned model and BS = 90 in the class-linked heterotachy model) (Figure 4, Figure 5, Figure 6 and Figure 7). As indicated in previous studies [7,8], the sequence of Tachinus subterraneus (KX087351) herein remains a “tricky outlier” whose position varied among trees and which was isolated from its taxonomic affiliation in most trees. Intriguingly, it is included in the Tachyporinae clade with very strong support (PP = 0.99) in our BI tree under the CAT-GTR+G4 model, which appears to indicate an LBA artifact caused by site-compositional heterogeneity. However, the tree shown in Figure 1 of the study by Song et al. [8] was also constructed using mitogenomic AA data and a site-heterogeneous CAT model. Hence, further investigation is required to determine whether it is caused by the site-compositional heterogeneity or simply by the identity of the sequence. A more practical alternative is to use new data from the same species or genus, as Tachinus is a typical Tachyporinae genus based on both morphological characteristics and independent molecular phylogenetic studies [7,30]. Although Yamamoto [31] taxonomically divided the former “Tachyporinae” (including Mycetoporinae) based on morphological phylogeny, the accurate position and relationship of the associated taxa remain unresolved. Phylogenetic analyses with more precisely identified data and a larger sample size are required for convincing conclusions.

Staphylininae is well defined as a monophyletic group based on its pupal and larval characteristics [32], although different studies with different data supported this to varying degrees [6,7,28,29,30,31,32], including our previous results using amino acid sequences of the complete mitogenome, which fully supported its monophyly [6]. In the present study, however, we found that heterotachy models, particularly the class-unlinked models, tend to promote the artifact of staphylinine separation, i.e., the staphylinine clade is interrupted by other taxa or split into distant parts in five of the seven datasets (Figure 6); the two exceptions are the trees built under the GTR+F+R5*H4 model using the P12 (Figure 6C) or P12R matrix (Figure 6F). In contrast, only one or two such cases could be found under partitioned models or class-linked heterotachy models (Figure 4 and Figure 5). In addition, the P2 dataset appears to be more susceptible to this artifact. As the heterotachy models were used with only four classes and only applied to NP-schemed datasets, in which the model parameters were universally shared across the dataset, we hypothesize that this issue could be alleviated by incorporating more heterogeneity into the model, e.g., a higher number of classes, partitioning schemes, and branch-length models across loci or partitions [33].

The position of Neophoninae is another issue worthy of consideration. This monotypic subfamily has been confirmed to be the sister group of Dasycerinae and Pselaphinae with either complete or very strong statistical support [6,7,30]. Our results generated using partitioned or profile mixture models (including the CAT-GTR+G4 model using the Bayesian method) perfectly recovered its sibling relationship with Pselaphinae (Figure 4 and Figure 7), whereas those generated with heterotachy models using nucleotide datasets indicated an isolated Neophoninae branch against all the other subfamilies (Figure 5 and Figure 6). That being said, the majority of our results endorse a tremendously basal position for Neophoninae, which is consistent with our previous study using mitogenomic data [6] but contradicts the multi-gene phylogenies (with nuclear genes), in which Neophoninae appeared to be a rather derived lineage in staphylinid trees [7,30].

4. Conclusions

Our research investigated the mitogenome of a Lordithon arcuatus specimen acquired from Northeast China and its phylogenetic relationship with three mycetoporine species. ML and BI analyses provide strong support for the conclusion that they form a monophyletic group. Comparing different models for amino acid and nucleotide mitogenomic data, we found that the profile mixture model plus the empirical matrix, which performs approximately as well as the site-heterogeneous CAT-GTR+G4 model, proves the most appropriate for amino acid data, whereas partitioned models provide a better and more reliable fit than heterotachy models for the nucleotide data of staphylinid mitogenomes, despite the fact that heterotachy models are more likely to be superior in AIC or BIC rankings.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/d15050588/s1, Table S1: Sequences used in phylogenetic analysis, including taxonomic information, GenBank accession number, length, and AT content; Datasets: The partition files and data matrices in Phylip format; Trees: All of Maximum Likelihood (ML) and Bayesian (BI) trees; Table S2: Mitogenomic organization of L. arcuatus. A positive value in the “Gap or Overlap” column indicates the length of the gap between the current gene and the next; a negative value implies overlap.

Author Contributions

Conceptualization, T.-Y.Z. and D.-H.W.; methodology, Y.-N.S. and T.-Y.Z.; formal analysis, Y.-N.S.; investigation, L.L.; fieldwork and resources, Q.-Q.J.; data curation, L.L.; writing—original draft preparation, Y.-N.S.; writing—review and editing, Q.-Q.J. and L.L.; visualization, Y.-N.S.; supervision, D.-H.W. and L.L.; project administration, T.-Y.Z.; funding acquisition, D.-H.W. and L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Fundamental Resources Investigation Program of China, grant number 2018FY100300; the Hebei Normal University Start-up Funds, grant number L2018B13; and the China Scholarship Council, grant number 202104910331.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors; sequencing data have been deposited in the GenBank database.

Acknowledgments

We thank Changbai Mountain National Reserve, China, for providing permission for field collection. Additionally, we are deeply grateful to the editor and the five anonymous reviewers for their thoughtful comments and valuable feedback, which have significantly improved the quality of our manuscript and enriched our work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Schülke, M.; Smetana, A. Staphylinidae. In Hydrophiloidea – Staphylinoidea; Löbl, I., Löbl, D., Eds.; Revised and Updated; Brill: Leiden, The Netherlands, 2015; pp. 304–1134.
2. Campbell, J.M. A Revision of the Genus Lordithon Thomson of North and Central America (Coleoptera: Staphylinidae). Mem. Entomol. Soc. Can. 1982, 114, 5–116.
3. Li, L.; Hu, J.Y.; Peng, Z.; Tang, L.; Yin, Z.W.; Zhao, M.J. Catalogue of Chinese Coleoptera Volume 3 – Staphylinidae; Science Press: Beijing, China, 2019.
4. Ban, Y.-G.; Jeong, W.-J.; Ahn, K.-J. Taxonomy of Korean Lordithon Thomson (Coleoptera: Staphylinidae: Tachyporinae). J. Asia-Pac. Biodivers. 2019, 12, 545–557.
5. Zheng, G.; Li, S.; Yang, X. Spider Diversity in Canopies of Xishuangbanna Rainforest (China) Indicates an Alarming Juggernaut Effect of Rubber Plantations. For. Ecol. Manag. 2015, 338, 200–207.
6. Zhao, T.-Y.; He, L.; Xu, X.; Chen, Z.-N.; Gao, Y.-Y.; Lü, L. The First Mitochondrial Genome of Creophilus Leach and Platydracus Thomson (Coleoptera: Staphylinidae: Staphylinini) and Phylogenetic Implications. Zootaxa 2022, 5099, 179–200.
7. Zhao, T.-Y.; Zhang, C.-J.; Lü, L. Comparative Description of the Mitochondrial Genome of Scaphidium formosanum Pic, 1915 (Coleoptera: Staphylinidae: Scaphidiinae). Zootaxa 2021, 4941, 487–510.
8. Song, N.; Zhai, Q.; Zhang, Y. Higher-Level Phylogenetic Relationships of Rove Beetles (Coleoptera, Staphylinidae) Inferred from Mitochondrial Genome Sequences. Mitochondrial DNA Part A DNA Mapp. Seq. Anal. 2021, 32, 98–105.
9. Yang, Z. Maximum Likelihood Phylogenetic Estimation from DNA Sequences with Variable Rates over Sites: Approximate Methods. J. Mol. Evol. 1994, 39, 306–314.
10. Lopez, P.; Casane, D.; Philippe, H. Heterotachy, an Important Process of Protein Evolution. Mol. Biol. Evol. 2002, 19, 1–7.
11. Quang, L.S.; Gascuel, O.; Lartillot, N. Empirical Profile Mixture Models for Phylogenetic Reconstruction. Bioinformatics 2008, 24, 2317–2323.
12. Wang, H.-C.; Minh, B.Q.; Susko, E.; Roger, A.J. Modeling Site Heterogeneity with Posterior Mean Site Frequency Profiles Accelerates Accurate Phylogenomic Estimation. Syst. Biol. 2018, 67, 216–235.
13. Crotty, S.; Minh, B.Q.; Bean, N.G.; Holland, B.R.; Tuke, J.; Jermiin, L.S.; Von Haeseler, A. GHOST: Recovering Historical Signal from Heterotachously Evolved Sequence Alignments. Syst. Biol. 2020, 69, 249–264.
14. Vasilikopoulos, A.; Balke, M.; Kukowka, S.; Pflug, J.M.; Martin, S.; Meusemann, K.; Hendrich, L.; Mayer, C.; Maddison, D.R.; Niehuis, O.; et al. Phylogenomic Analyses Clarify the Pattern of Evolution of Adephaga (Coleoptera) and Highlight Phylogenetic Artefacts Due to Model Misspecification and Excessive Data Trimming. Syst. Entomol. 2021, 46, 991–1018.
15. Cai, C.; Tihelka, E.; Giacomelli, M.; Lawrence, J.F.; Ślipiński, A.; Kundrata, R.; Yamamoto, S.; Thayer, M.K.; Newton, A.F.; Leschen, R.A.B.; et al. Integrated Phylogenomics and Fossil Data Illuminate the Evolution of Beetles. R. Soc. Open Sci. 2022, 9, 211771.
16. Meng, G.; Li, Y.; Yang, C.; Liu, S. MitoZ: A Toolkit for Animal Mitochondrial Genome Assembly, Annotation and Visualization. Nucleic Acids Res. 2019, 47, e63.
17. Bernt, M.; Donath, A.; Jühling, F.; Externbrink, F.; Florentz, C.; Fritzsch, G.; Pütz, J.; Middendorf, M.; Stadler, P.F. MITOS: Improved De Novo Metazoan Mitochondrial Genome Annotation. Mol. Phylogenet. Evol. 2013, 69, 313–319.
18. Kearse, M.; Moir, R.; Wilson, A.; Stones-Havas, S.; Cheung, M.; Sturrock, S.; Buxton, S.; Cooper, A.; Markowitz, S.; Duran, C.; et al. Geneious Basic: An Integrated and Extendable Desktop Software Platform for the Organization and Analysis of Sequence Data. Bioinformatics 2012, 28, 1647–1649.
19. Lee, B.D. Python Implementation of Codon Adaptation Index. J. Open Source Softw. 2018, 3, 905.
20. Perna, N.; Kocher, T. Patterns of Nucleotide Composition at Fourfold Degenerate Sites of Animal Mitochondrial Genomes. J. Mol. Evol. 1995, 41, 353–358.
21. Zhang, D.; Gao, F.; Jakovlić, I.; Zhou, H.; Zhang, J.; Li, W.X.; Wang, G.T. PhyloSuite: An Integrated and Scalable Desktop Platform for Streamlined Molecular Sequence Data Management and Evolutionary Phylogenetics Studies. Mol. Ecol. Resour. 2020, 20, 348–355.
22. Katoh, K.; Standley, D.M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 2013, 30, 772–780.
23. Criscuolo, A.; Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): A New Software for Selection of Phylogenetic Informative Regions from Multiple Sequence Alignments. BMC Evol. Biol. 2010, 10, 210.
24. Kalyaanamoorthy, S.; Minh, B.Q.; Wong, T.K.F.; Von Haeseler, A.; Jermiin, L.S. ModelFinder: Fast Model Selection for Accurate Phylogenetic Estimates. Nat. Methods 2017, 14, 587–589.
25. Minh, B.Q.; Schmidt, H.A.; Chernomor, O.; Schrempf, D.; Woodhams, M.D.; von Haeseler, A.; Lanfear, R.; Teeling, E. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 2020, 37, 1530–1534.
26. Hoang, D.T.; Chernomor, O.; Von Haeseler, A.; Minh, B.Q.; Vinh, L.S. UFBoot2: Improving the Ultrafast Bootstrap Approximation. Mol. Biol. Evol. 2018, 35, 518–522.
27. Lartillot, N.; Philippe, H. A Bayesian Mixture Model for Across-Site Heterogeneities in the Amino-Acid Replacement Process. Mol. Biol. Evol. 2004, 21, 1095–1109.
28. Lartillot, N.; Brinkmann, H.; Philippe, H. Suppression of Long-Branch Attraction Artefacts in the Animal Phylogeny Using a Site-Heterogeneous Model. BMC Evol. Biol. 2007, 7, S4.
29. Lartillot, N.; Rodrigue, N.; Stubbs, D.; Richer, J. PhyloBayes MPI: Phylogenetic Reconstruction with Infinite Mixtures of Profiles in a Parallel Environment. Syst. Biol. 2013, 62, 611–615.
30. Yamamoto, S. Tachyporinae Revisited: Phylogeny, Evolution, and Higher Classification Based on Morphology, with Recognition of a New Rove Beetle Subfamily (Coleoptera: Staphylinidae). Biology 2021, 10, 323.
31. Mckenna, D.D.; Farrell, B.D.; Caterino, M.S.; Farnum, C.W.; Hawks, D.C.; Maddison, D.R.; Seago, A.E.; Short, A.E.Z.; Newton, A.F.; Thayer, M.K. Phylogeny and Evolution of Staphyliniformia and Scarabaeiformia: Forest Litter as a Stepping Stone for Diversification of Nonphytophagous Beetles. Syst. Entomol. 2015, 40, 35–60.
32. Lü, L.; Cai, C.-Y.; Zhang, X.; Newton, A.F.; Thayer, M.K.; Zhou, H.-Z. Linking Evolutionary Mode to Palaeoclimate Change Reveals Rapid Radiations of Staphylinoid Beetles in Low-Energy Conditions. Curr. Zool. 2021, 66, 435–444.
33. Duchêne, D.; Tong, K.J.; Foster, C.S.P.; Duchêne, S.; Lanfear, R.; Ho, S.Y.W. Linking Branch Lengths across Sets of Loci Provides the Highest Statistical Support for Phylogenetic Inference. Mol. Biol. Evol. 2020, 37, 1202–1210.

Figure 1. Gene map of the mitochondrial genome of L. arcuatus. The blue bar diagram of the inner circle shows the GC content of every 50 sites; the red circle represents 50% GC content, and each black concentric circle represents a 5% increment. The outer circle shows the distribution of the genes; the genes inside are arranged clockwise (forward strand).

Figure 2. Relative synonymous codon usage (RSCU) of L. arcuatus. The codon families are listed in alphabetical order beneath the horizontal axis. The asterisk represents stop codons.

Figure 3. Secondary structure of tRNAs in the mitogenome of L. arcuatus. The red bases on the anticodon arm form the anticodon. The number after the amino acid abbreviation distinguishes tRNAs with different anticodons.

Figure 4. Maximum likelihood trees built under the best model selected according to BIC rankings: (A) AA matrix with mtART+C60 profile mixture model; (B) AA matrix with C60 profile mixture model; (C) P2 matrix with MP scheme; (D) P12 matrix with MP scheme; (E) P123 matrix with MP scheme; (F) P2R matrix with MP scheme; (G) P12R matrix with MP scheme; (H) P123R matrix with MP scheme. The color of the circle at each node represents the bootstrap value (those with BS < 90 are not shown here). Branch length scales are located at the lower left of the trees. The asterisk indicates the species whose mitogenome was sequenced for this study. The “tricky outlier” Tachinus subterraneus is indicated by a black frame.

Figure 5. Maximum likelihood trees built with class-linked heterotachy models: (A) AA matrix with mtART+F+H4 model; (B) P2 matrix with TVM+F+R4+H4 model; (C) P12 matrix with GTR+F+R5+H4 model; (D) P123 matrix with GTR+F+R7+H4 model; (E) P2R matrix with GTR+F+R5+H4 model; (F) P12R matrix with GTR+F+R5+H4 model; (G) P123R matrix with GTR+F+R7+H4 model. Coloring, annotation, and legends are the same as in Figure 4. The “tricky outlier” Tachinus subterraneus is indicated by a black frame.

Figure 6. Maximum likelihood trees built with class-unlinked heterotachy models: (A) AA matrix with mtART+F*H4 model; (B) P2 matrix with TVM+F+R4*H4 model; (C) P12 matrix with GTR+F+R5*H4 model; (D) P123 matrix with GTR+F+R7*H4 model; (E) P2R matrix with GTR+F+R5*H4 model; (F) P12R matrix with GTR+F+R5*H4 model; (G) P123R matrix with GTR+F+R7*H4 model. Coloring, annotation, and legends are the same as in Figure 4. The “tricky outlier” Tachinus subterraneus is indicated by a black frame.

Figure 7. Bayesian consensus tree under the CAT-GTR+G4 model for the non-partitioned AA dataset, without clade collapse. Nodes with PP < 0.97 are not shown. Coloring, annotation, and legends are the same as in Figure 4. The tip of Tachinus subterraneus is indicated by a black frame.

Table 1. Information on the ML trees, including different datasets, number of sites, partitioning schemes, substitution model, log-likelihood, AIC (Akaike Information Criterion), and BIC (Bayesian Information Criterion). FP: edge-unlinked full partitioning scheme; MP: merged and edge-unlinked partitioning scheme; NP: non-partitioning scheme (treating the entire sequence as a single locus); t: the number of partitions.

Matrix (Sites)	Partition Scheme (t)	Model	ln(Lik)	AIC	BIC
AA (3524)	NP(1)	mtART+F+R6+C60	−123,042.14	246,528.29	247,897.44
	MP(3)	-	−125,256.50	250,952.99	252,309.81
	NP(1)	C60+FO+R4	−125,549.07	251,534.13	252,878.61
	NP(1)	mtART+F+R6	−125,798.97	251,921.94	252,921.05
	FP(13)	-	−125,464.80	251,525.59	253,363.46
	NP(1)	mtART+F+R6*H4	−125,594.16	252,296.33	255,713.04
	NP(1)	mtART+F+R6+H4	−125,594.17	252,296.34	255,713.05
P2 (3405)	MP(4)	-	−38,619.95	77,611.90	78,752.64
	NP(1)	TVM+F+R4	−39,019.88	78,331.76	79,227.18
	FP(13)	-	−38,491.25	77,536.50	79,235.34
	NP(1)	TVM+F+R4*H4	−37,929.36	76,966.72	80,364.41
	NP(1)	TVM+F+R4+H4	−38,646.27	78,376.53	81,700.62
P2_rRNA (4852)	MP(4)	-	−62,520.48	125,426.95	126,678.97
	FP(15)	-	−62,357.30	125,322.60	127,294.70
	NP(1)	GTR+F+R5	−63,912.06	128,122.13	129,088.71
	NP(1)	GTR+F+R5*H4	−62,228.67	125,573.34	129,193.17
	NP(1)	GTR+F+R5+H4	−63,529.66	128,145.33	131,667.85
P12 (6372)	NP(1)	GTR+F+R5*H4	−98,269.24	197,654.49	201,426.38
	MP(5)	-	−100,360.79	201,131.57	202,517.30
	FP(13)	-	−100,335.13	201,218.25	203,070.40
	NP(1)	GTR+F+R5	−101,659.33	203,616.66	204,623.85
	NP(1)	GTR+F+R5+H4	−101,062.88	203,211.75	206,882.25
P12_rRNA (7819)	NP(1)	GTR+F+R5*H4	−122,181.55	245,479.10	249,365.19
	MP(6)	-	−124,234.00	248,903.99	250,422.21
	FP(15)	-	−124,177.93	248,963.86	251,081.01
	NP(1)	GTR+F+R5	−126,121.29	252,540.58	253,578.26
	NP(1)	GTR+F+R5+H4	−125,512.24	252,110.49	255,892.11
P123 (8783)	NP(1)	GTR+F+R7*H4	−211,109.53	423,335.05	427,286.01
	MP(4)	-	−215,623.52	431,663.04	433,135.80
	FP(13)	-	−215,506.09	431,692.18	434,099.57
	NP(1)	GTR+F+R7	−218,658.81	437,623.62	438,706.95
	NP(1)	GTR+F+R7+H4	−218,140.69	437,367.38	441,212.13
P123_rRNA (10230)	NP(1)	GTR+F+R7*H4	−235,257.15	471,630.31	475,666.37
	MP(5)	-	−239,647.82	479,741.65	481,354.62
	FP(15)	-	−239,509.47	479,754.95	482,416.72
	NP(1)	GTR+F+R7	−243,413.52	487,133.03	488,239.70
	NP(1)	GTR+F+R7+H4	−242,877.46	486,840.93	490,768.49
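
The AIC and BIC columns of Table 1 can be related to the log-likelihood column with a short calculation. The sketch below assumes the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of free parameters and n is taken to be the number of alignment sites. Neither formula nor the choice of n is stated in the table itself, and the function names are illustrative only.

```python
import math

def free_parameters(ln_lik: float, aic: float) -> int:
    """Recover the number of free parameters k from AIC = 2k - 2 ln L."""
    return round((aic + 2.0 * ln_lik) / 2.0)

def bic(ln_lik: float, k: int, n_sites: int) -> float:
    """BIC = k ln(n) - 2 ln L, with n assumed to be the number of sites."""
    return k * math.log(n_sites) - 2.0 * ln_lik

# Values copied from the first AA row of Table 1 (NP(1), mtART+F+R6+C60).
ln_lik, aic_value, n_sites = -123042.14, 246528.29, 3524
k = free_parameters(ln_lik, aic_value)
print(k, round(bic(ln_lik, k, n_sites), 2))
```

Under these assumptions the recomputed BIC for this row agrees with the tabulated 247,897.44 to within rounding error, which suggests that the sample size used for BIC is the number of sites in each matrix.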

Table 2. Base composition and skewness of L. arcuatus.

Location	A	T	C	G	Total	AT%	AT Skew	GC Skew
Whole mitochondrial genome	7096	6937	2620	1636	18,290	76.72	0.01	−0.23
Protein-coding genes (PCGs)	3560	4736	1487	1378	11,161	74.33	−0.14	−0.04
1st codon	1282	1298	474	668	3722	69.32	−0.01	0.17
2nd codon	758	1756	700	506	3720	67.58	−0.40	−0.16
3rd codon	1520	1682	313	204	3719	86.10	−0.05	−0.21
PCGs of the forward strand	2242	2725	1104	794	6865	72.35	−0.10	−0.16
1st codon	825	687	364	413	2289	66.06	0.09	0.06
2nd codon	471	1040	474	303	2288	66.04	−0.38	−0.22
3rd codon	946	998	266	78	2288	84.97	−0.03	−0.55
PCGs of the reverse strand	1318	2011	383	584	4296	77.49	−0.21	0.21
1st codon	457	611	110	255	1433	74.53	−0.14	0.40
2nd codon	287	716	226	203	1432	70.04	−0.43	−0.05
3rd codon	574	684	47	126	1431	87.91	−0.09	0.46
tRNAs	561	558	140	178	1437	77.87	0.00	0.12
tRNAs of the forward strand	360	348	105	100	913	77.55	0.02	−0.02
tRNAs of the reverse strand	201	210	35	78	524	78.44	−0.02	0.38
rRNAs	825	828	142	269	2065	80.05	0.00	0.31
l-rRNA	507	535	77	151	1270	82.05	−0.03	0.32
s-rRNA	318	293	65	118	795	76.86	0.04	0.29
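
The derived columns of Table 2 can be recomputed from the raw base counts. The sketch below assumes the conventional definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), which the table does not spell out. The function name and the optional `total` argument are illustrative; `total` is included because in a few rows (whole genome, rRNAs, s-rRNA) the Total column is one larger than the sum of the four counts.

```python
from typing import Optional

def composition(a: int, t: int, c: int, g: int, total: Optional[int] = None) -> dict:
    """Return AT%, AT skew, and GC skew, rounded to two decimals."""
    if total is None:
        total = a + t + c + g  # fall back to the sum of the four base counts
    return {
        "AT%": round(100.0 * (a + t) / total, 2),
        "AT skew": round((a - t) / (a + t), 2),
        "GC skew": round((g - c) / (g + c), 2),
    }

# Protein-coding genes (PCGs) row of Table 2.
pcgs = composition(a=3560, t=4736, c=1487, g=1378)
# s-rRNA row, using the tabulated Total of 795.
s_rrna = composition(a=318, t=293, c=65, g=118, total=795)
print(pcgs)
print(s_rrna)
```

A negative AT skew means T outnumbers A, and a negative GC skew means C outnumbers G, so the PCG row indicates a T-rich and slightly C-rich coding sequence overall.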

Table 3. Relative synonymous codon usage (RSCU) of L. arcuatus. The asterisk represents stop codons.

Codon	Amino Acid	Count	RSCU	Codon	Amino Acid	Count	RSCU	Codon	Amino Acid	Count	RSCU
TTT	Phe	310	1.662	CCA	Pro	48	1.455	AGT	Ser	25	0.599
TTC	Phe	63	0.338	CCG	Pro	4	0.121	AGC	Ser	3	0.072
TTA	Leu	376	4.05	CAT	His	54	1.521	AGA	Ser	90	2.156
TTG	Leu	28	0.302	CAC	His	17	0.479	AGG	Ser	6	0.144
TCT	Ser	102	2.443	CAA	Gln	59	1.873	GTT	Val	63	1.546
TCC	Ser	24	0.575	CAG	Gln	4	0.127	GTC	Val	11	0.27
TCA	Ser	80	1.916	CGT	Arg	15	1.091	GTA	Val	75	1.84
TCG	Ser	4	0.096	CGC	Arg	2	0.145	GTG	Val	14	0.344
TAT	Tyr	126	1.527	CGA	Arg	32	2.327	GCT	Ala	85	2.012
TAC	Tyr	39	0.473	CGG	Arg	6	0.436	GCC	Ala	23	0.544
TAA	*	7	1.4	ATT	Ile	367	1.812	GCA	Ala	57	1.349
TAG	*	3	0.6	ATC	Ile	38	0.188	GCG	Ala	4	0.095
TGT	Cys	31	1.824	ATA	Met	221	1.713	GAT	Asp	56	1.697
TGC	Cys	3	0.176	ATG	Met	37	0.287	GAC	Asp	10	0.303
TGA	Trp	88	1.778	ACT	Thr	90	1.905	GAA	Glu	62	1.632
TGG	Trp	11	0.222	ACC	Thr	17	0.36	GAG	Glu	14	0.368
CTT	Leu	73	0.786	ACA	Thr	78	1.651	GGT	Gly	48	0.99
CTC	Leu	15	0.162	ACG	Thr	4	0.085	GGC	Gly	5	0.103
CTA	Leu	61	0.657	AAT	Asn	173	1.73	GGA	Gly	100	2.062
CTG	Leu	4	0.043	AAC	Asn	27	0.27	GGG	Gly	41	0.845
CCT	Pro	64	1.939	AAA	Lys	86	1.623
CCC	Pro	16	0.485	AAG	Lys	20	0.377
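
The RSCU values in Table 3 can be reproduced from the codon counts. The sketch below assumes the usual definition, in which a codon's observed count is divided by the mean count of all synonymous codons for the same amino acid. The function name and the two dictionaries are illustrative and cover only the Phe and Leu families from the table.

```python
from collections import defaultdict

def rscu(counts: dict, codon_to_aa: dict) -> dict:
    """RSCU = count / (family total / number of synonymous codons)."""
    family_total = defaultdict(int)
    family_size = defaultdict(int)
    for codon, n in counts.items():
        aa = codon_to_aa[codon]
        family_total[aa] += n
        family_size[aa] += 1
    result = {}
    for codon, n in counts.items():
        aa = codon_to_aa[codon]
        result[codon] = round(n * family_size[aa] / family_total[aa], 3)
    return result

# Counts copied from Table 3 for the Phe (2 codons) and Leu (6 codons) families.
counts = {"TTT": 310, "TTC": 63,
          "TTA": 376, "TTG": 28, "CTT": 73, "CTC": 15, "CTA": 61, "CTG": 4}
codon_to_aa = {"TTT": "Phe", "TTC": "Phe",
               "TTA": "Leu", "TTG": "Leu", "CTT": "Leu",
               "CTC": "Leu", "CTA": "Leu", "CTG": "Leu"}
values = rscu(counts, codon_to_aa)
print(values)
```

Values above 1 mark codons used more often than expected under uniform synonymous usage. The strong preference for TTA among the six Leu codons is an example of the A/T-rich codon bias visible throughout the table.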

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
