# Using MOEA with Redistribution and Consensus Branches to Infer Phylogenies

^{1}

^{2}

^{*}

## Abstract

**:**

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Selection Operator-Redistribution

#### 2.2. Consensus

#### 2.3. MOEA-RC

- Step 1.
- Population initiation: Two options can be chosen to initialize population: (1) using molecular sequences, which have been pre-processed, such as alignment, to randomly build N phylogenetic trees; and (2) using user given trees.
- Step 2.
- Evaluation: Evaluate trees by MP and ML and update reference point at the same time.
- Step 3.
- Redistribution: As described in Section 2.1, separate the population into several sub-populations according to Equation (2) and Sort trees in each sub-population by their fitness computed by Equation (3).
- Step 4.
- Computation of consensus: Generate consensus branches for each sub-population as described above.
- Step 5.
- Generation of offspring: Randomly select two individuals from the current population and do a crossover operation to generate two new solutions. Many crossover operators are available in the literature (Lewis 1998 [13]; Congdon 2002 [34]). We chose Prune–Delete–Graft (PDG), which is described by Lewis (1998) [13], because it has been successfully applied under different criteria. Then, use the nearest neighbor interchange (NNI) [35] to mutate the two new solutions. Repeat this step until N new solutions have been generated.
- Step 6.
- Merging: Merge the offspring with the current population.
- Step 7.
- Repeat: Return to Step 2 and repeat until the stop condition is reached. The stop condition in this paper is the maximum number of evaluation times set by the user. The stop condition in this paper is a given specific number of generations.

## 3. Results

^{2}), thus MOEA-RC will not take more time to run than NSGA-II while N ≥ n. Table 3 shows MP, ML and execution time of MOEA-RC, MOEA/D, NSGA-II and PhyloMOEA on three datasets. In this table, MP and ML are the best results of 10 independent runs of each program and run time is average run time of 10 independent runs. As we can see, MOEA-RC has the best result among these algorithms. In terms of run time, when the number of taxon (molecular sequence) is smaller than the size of the population, the run time of MOEA-RC is less than the others. This advantage reduces when the number of taxon increases. However, MOEA-RC’s actual run time is still not longer than the others.

## 4. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

- Saitou, N.; Nei, M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol.
**1987**, 4, 406–425. [Google Scholar] [PubMed] - Gascuel, O. BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol.
**1997**, 14, 685–695. [Google Scholar] [CrossRef] [PubMed] - Felsenstein, J. Inferring Phylogenies; Sinauer Associates: Sunderland, MA, USA, 2004; Volume 2. [Google Scholar]
- Jukes, T.H.; Cantor, C.R.; Munro, H. Evolution of protein molecules. In Mammalian Protein Metabolism; Elsevier: New York, NY, USA, 1969; pp. 21–132. [Google Scholar]
- Hasegawa, M.; Kishino, H.; Yano, T.-A. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol.
**1985**, 22, 160–174. [Google Scholar] [CrossRef] [PubMed] - Tamura, K.; Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol.
**1993**, 10, 512–526. [Google Scholar] [PubMed] - Tavaré, S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math. Life Sci.
**1986**, 17, 57–86. [Google Scholar] - Cancino, W.; Delbem, A. A multi-criterion evolutionary approach applied to phylogenetic reconstruction. In New Achievements in Evolutionary Computation; InTech: Rijeka, Croatia, 2010. [Google Scholar]
- Lemmon, A.R.; Milinkovitch, M.C. The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation. Proc. Natl. Acad. Sci. USA
**2002**, 99, 10516–10521. [Google Scholar] [CrossRef] [PubMed] - Chor, B.; Tuller, T. Maximum likelihood of evolutionary trees: Hardness and approximation. Bioinformatics
**2005**, 21, i97–i106. [Google Scholar] [CrossRef] [PubMed] - Felsenstein, J. PHYLIP: Phylogeny Inference Package. Cladistics
**1989**, 5, 164–166. [Google Scholar] - Strimmer, K.; Von Haeseler, A. Quartet puzzling: A quartet maximum-likelihood method for reconstructing tree topologies. Mol. Biol. Evol.
**1996**, 13, 964–969. [Google Scholar] [CrossRef] - Lewis, P.O. A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data. Mol. Biol. Evol.
**1998**, 15, 277–283. [Google Scholar] [CrossRef] [PubMed] - Matsuda, H. Construction of phylogenetic trees from amino acid sequences using a genetic algorithm. Genome Inform.
**1995**, 6, 19–28. [Google Scholar] - Moilanen, A. Searching for most parsimonious trees with simulated evolutionary optimization. Cladistics
**1999**, 15, 39–50. [Google Scholar] [CrossRef] - Santander-Jiménez, S.; Vega-Rodríguez, M.A. On the design of shared memory approaches to parallelize a multiobjective bee-inspired proposal for phylogenetic reconstruction. Inf. Sci.
**2015**, 324, 163–185. [Google Scholar] [CrossRef] - Goloboff, P.A.; Farris, J.S.; Nixon, K.C. TNT, a free program for phylogenetic analysis. Cladistics
**2008**, 24, 774–786. [Google Scholar] [CrossRef] - Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics
**2014**, 30, 1312–1313. [Google Scholar] [CrossRef] [PubMed] - Minh, B.Q.; Vinh, L.S.; Von Haeseler, A.; Schmidt, H.A. pIQPNNI: Parallel reconstruction of large maximum likelihood phylogenies. Bioinformatics
**2005**, 21, 3794–3796. [Google Scholar] [CrossRef] [PubMed] - Ronquist, F.; Teslenko, M.; Van Der Mark, P.; Ayres, D.L.; Darling, A.; Höhna, S.; Larget, B.; Liu, L.; Suchard, M.A.; Huelsenbeck, J.P. MrBayes 3.2: Efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol.
**2012**, 61, 539–542. [Google Scholar] [CrossRef] [PubMed] - Zwickl, D.J. Genetic Algorithm Approaches for the Phylogenetic Analysis of Large Biological Sequence Datasets under the Maximum Likelihood Criterion; University of Texas at Austin: Austin, TX, USA, 2006. [Google Scholar]
- Helaers, R.; Milinkovitch, M.C. MetaPIGA v2.0: Maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics. BMC Bioinform.
**2010**, 11, 379. [Google Scholar] [CrossRef] [PubMed] - Macey, J.R. Plethodontid salamander mitochondrial genomics: A parsimony evaluation of character conflict and implications for historical biogeography. Cladistics
**2005**, 21, 194–202. [Google Scholar] [CrossRef] - Burleigh, J.G.; Mathews, S. Assessing systematic error in the inference of seed plant phylogeny. Int. J. Plant Sci.
**2007**, 168, 125–135. [Google Scholar] [CrossRef] - Schmidt, O.; Drake, H.L.; Horn, M.A. Hitherto unknown [Fe-Fe]-hydrogenase gene diversity in anaerobes and anoxic enrichments from a moderately acidic fen. Appl. Environ. Microbiol.
**2010**, 76, 2027–2031. [Google Scholar] [CrossRef] [PubMed] - Poladian, L.; Jermiin, L. Multi-objective evolutionary algorithms and phylogenetic inference with multiple data sets. Soft Comput.
**2006**, 10, 359–368. [Google Scholar] [CrossRef] - Coello, C.C.; Dhaenens, C.; Jourdan, L. Advances in Multi-Objective Nature Inspired Computing; Springer: Berlin/Heidelberg, Germany, 2009; Volume 272. [Google Scholar]
- Ishibuchi, H.; Nojima, Y.; Doi, T. Comparison between single-objective and multi-objective genetic algorithms: Performance comparison and performance measures. In Proceedings of the 2006 IEEE International Conference on Evolutionary Computation, Vancouver, BC, Canada, 16–21 July 2006; pp. 1143–1150. [Google Scholar]
- Deb, K. A fast elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput.
**2000**, 6, 182–197. [Google Scholar] [CrossRef] - Santander-Jiménez, S.; Vega-Rodríguez, M.A. A hybrid approach to parallelize a fast non-dominated sorting genetic algorithm for phylogenetic inference. Concurr. Comput.
**2015**, 27, 702–734. [Google Scholar] [CrossRef] - Santander-Jiménez, S.; Vega-Rodríguez, M.A. Parallel multiobjective metaheuristics for inferring phylogenies on multicore clusters. IEEE Trans. Parallel Distrib. Syst.
**2015**, 26, 1678–1692. [Google Scholar] [CrossRef] - Zhang, Q.; Li, H. MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition. IEEE Trans. Evol. Comput.
**2007**, 11, 712–731. [Google Scholar] [CrossRef] - Jansson, J.; Shen, C.; Sung, W.-K. Improved algorithms for constructing consensus trees. JACM
**2016**, 63, 28. [Google Scholar] [CrossRef] - Congdon, C.B. Gaphyl: An evolutionary algorithms approach for the study of natural evolution. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, New York, NY, USA, 9–13 July 2002; Morgan Kaufmann Publishers Inc.: Burlington, MA, USA, 2002; pp. 1057–1064. [Google Scholar]
- Waterman, M.S.; Smith, T.F. On the similarity of dendrograms. J. Theor. Biol.
**1978**, 73, 789–800. [Google Scholar] [CrossRef] - Ingman, M.; Gyllensten, U. mtDB: Human Mitochondrial Genome Database, a resource for population genetics and medical sciences. Nucleic Acids Res.
**2004**, 34, 749–751. [Google Scholar] [CrossRef] [PubMed] - Chase, M.W.; Soltis, D.E.; Olmstead, R.G.; Morgan, D.; Les, D.H.; Mishler, B.D.; Duvall, M.R.; Price, R.A.; Hills, H.G.; Qiu, Y.-L. Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Ann. Mo. Bot. Gard.
**1993**, 80, 528–580. [Google Scholar] [CrossRef]

**Figure 1.**Illustration of redistribution. Rectangles marked W1–Wn are populations for different weight vectors. m is the number of defaults of all the individuals. The number of individual cycles is denoted by 2 × m. m number of grey cycles are eliminated in each iteration of redistribution.

**Figure 2.**Pareto Front (PF) of the four algorithms in the three datasets at 1000 evaluation: (

**a**) performances on rbcl_55; (

**b**) performances on mtDNA_186; and (

**c**) performances on ZILLA_500.

**Figure 3.**Change of ML and MP of the algorithms with various multiples of evaluation on rbcl_55: (

**a**) Maximum Parsimony; and (

**b**) Maximum Likelihood.

**Figure 4.**Change of MP and ML of the algorithms with various multiples of evaluation on mtDNA_186: (

**a**) Maximum Parsimony; and (

**b**) Maximum Likelihood.

Dataset | Number of Sequence | Nucleotides per Sequence | Data Description |
---|---|---|---|

rbcl_55 | 55 | 1314 | rbcl plastid gene [13] |

mtDNA_186 | 186 | 16,608 | Human mitochondrial DNA [36] |

ZILLA_500 | 500 | 759 | 500 rbcL sequences from plant plastids [37] |

Parameter | Value |
---|---|

Population size | 100 |

Generations | 100 |

Selection method | Binary tournament [9] |

Crossover method | Prune-Delete-Graft [13] |

Crossover probability | 0.8 |

Mutation method | NNI [35] |

Mutation probability | 0.2 |

Substitution model | GTR [7] |

**Table 3.**The comparison of several multi-objective algorithms on three real-world datasets. The ML and MP in this table is the best value after 100 generations of each algorithm. The mark of (+) means the value is better than the other algorithms.

Algorithm | Metrics | Dataset | ||
---|---|---|---|---|

rbcl_55 | mtDNA_186 | ZILLA_500 | ||

MOEA-RC | ML | −22,156.4 (+) | −39,927.6 (+) | −84,633.1 (+) |

MP | 4977 (+) | 2460 (+) | 17,186 (+) | |

Run time | 1059.80 | 66,215.13 | 13,016.22 | |

MOEA/D | ML | −22,169.6 | −39,937.5 | −84,715.9 |

MP | 4979 | 2461 | 17,194 | |

Run time | 1302.64 | 65,620.20 | 13,082.54 | |

NSGA-II | ML | −22,193.3 | −39,942.5 | −84,719.4 |

MP | 4979 | 2463 | 17,192 | |

Run time | 1287.07 | 66,010.94 | 12,812.62 | |

PhyloMOEA | ML | −2200.1 | −39,938.2 | −84,704.6 |

MP | 4982 | 2461 | 17,191 | |

Run time | 2163.4 | >24 h | >24 h | |

MOEA-R | ML | −22,244.7 | −39,984.6 | −84,714.6 |

MP | 4979 | 2464 | 17,196 | |

Run time | 1061.13 | 65,174.37 | 12,442.53 |

**Table 4.**The comparison of MOEA-RC and several phylogenetic software packages on three real-world datasets. The mark of (+) means the value is better than the other algorithms.

Metrics | Algorithm | Dataset | ||
---|---|---|---|---|

rbcl_55 | mtDNA_186 | ZILLA_500 | ||

MP | MOEA-RC | 4977 (+) | 2460 | 17,186 |

DNAPARS | 4984 | 2438 (+) | 17,055 (+) | |

MEGA 7 | 4978 | 2457 | 17,077 | |

ML | MOEA-RC | −22,156.4 (+) | −39,927.6 (+) | −84,633.1 (+) |

Raxml | −22,188.2 | −40,921.0 | −95,577.0 | |

MEGA 7 | −22,205.2 | −40,957.0 | −95,834.0 |

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Min, X.; Zhang, M.; Yuan, S.; Ge, S.; Liu, X.; Zeng, X.; Xia, N.
Using MOEA with Redistribution and Consensus Branches to Infer Phylogenies. *Int. J. Mol. Sci.* **2018**, *19*, 62.
https://doi.org/10.3390/ijms19010062

**AMA Style**

Min X, Zhang M, Yuan S, Ge S, Liu X, Zeng X, Xia N.
Using MOEA with Redistribution and Consensus Branches to Infer Phylogenies. *International Journal of Molecular Sciences*. 2018; 19(1):62.
https://doi.org/10.3390/ijms19010062

**Chicago/Turabian Style**

Min, Xiaoping, Mouzhao Zhang, Sisi Yuan, Shengxiang Ge, Xiangrong Liu, Xiangxiang Zeng, and Ningshao Xia.
2018. "Using MOEA with Redistribution and Consensus Branches to Infer Phylogenies" *International Journal of Molecular Sciences* 19, no. 1: 62.
https://doi.org/10.3390/ijms19010062