In this section, we present two new heuristic algorithms for the MaxRTC problem.
4.1. FastTree
The first heuristic algorithm takes a bottom-up greedy approach. It employs only a simple data structure and runs faster than the previously known algorithms. Let R(T) denote the set of all triplets consistent with a given tree, T. R(T) is called the reflective triplet set of T. It forms a minimally dense triplet set and represents T uniquely [17]. Now, we define the closeness of a pair, {i,j}. The closeness of the pair, {i,j}, denoted ${C}_{i,j}$, is the number of triplets of the form ij|k in a triplet set. Clearly, for any tree, T, with n species, the closeness of a pair of cherry species equals $n-2$, which is the maximum in R(T). The reason is that every pair of cherry species forms a triplet with every other species.
Now, suppose we contract every pair of cherry species, {i,j}, to its parent, ${p}_{ij}$, and then update R(T) as follows. For each contracted cherry, {i,j}, we remove the triplets of the form ij|k from R(T) and replace i and j with ${p}_{ij}$ in the remaining triplets. The updated set, ${R}^{\prime}\left({T}^{\prime}\right)$, is the reflective triplet set of the new tree, ${T}^{\prime}$. Observe that, for cherries of the form $\{{p}_{ij},k\}$ in ${T}^{\prime}$, ${C}_{i,k}$ and ${C}_{j,k}$ equal $n-3$ in R(T). Similarly, for cherries of the form $\{{p}_{ij},{p}_{kl}\}$ in ${T}^{\prime}$, ${C}_{i,k}$, ${C}_{j,k}$, ${C}_{i,l}$ and ${C}_{j,l}$ equal $n-4$ in R(T).
This observation forms the main idea of the first heuristic algorithm. We first compute the closeness of every pair of species by visiting the triplets. Sorting the pairs by closeness then gives the reconstruction order of the tree. This routine outputs the unique tree, T, for any given reflective triplet set, R(T). However, the input triplet set is not always a reflective triplet set, so the order produced by sorting may not be the right reconstruction order. If the loss of triplets is uniformly distributed, it will not affect the reconstruction order. An approximate remedy for the general case is to refine the closeness: for every visited triplet of the form ij|k, we reduce the closeness of the pairs {i,k} and {j,k}. Thus, if the pair {i,j} is actually a cherry, the probability of choosing {i,k} or {j,k} before {i,j} because of triplet loss is reduced. We call this algorithm FastTree. See Algorithm 1 for the complete procedure.
Algorithm 1 FastTree
 1: Initialize a forest, F, consisting of n one-node trees labeled by species.
 2: for each triplet of the form ij|k do
 3:     ${C}_{i,j}$ := ${C}_{i,j}$ + 1
 4:     ${C}_{i,k}$ := ${C}_{i,k}$ − 1
 5:     ${C}_{j,k}$ := ${C}_{j,k}$ − 1
 6: end for
 7: Create a list, L, of pairs of species.
 8: Sort L according to the refined closeness of pairs with a linear-time sorting algorithm.
 9: while |L| > 0 do
10:     Remove the pair, {i,j}, with maximum ${C}_{i,j}$.
11:     if i and j are not in the same tree then
12:         Add a new node and connect it to the roots of the trees containing i and j.
13:     end if
14: end while
15: if F has more than one tree then
16:     Merge trees in any order, until there is only one tree.
17: end if
18: return the tree in F
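The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: a triplet ij|k is passed as the tuple `(i, j, k)`, trees are returned as nested tuples, and the names `fast_tree`, `key`, `find` and `join` are hypothetical. For brevity, Step 8 uses Python's built-in comparison sort rather than the linear-time sort assumed in the analysis, so the sketch does not attain the bound of Theorem 1.

```python
from itertools import combinations


def fast_tree(species, triplets):
    """Sketch of Algorithm 1; a triplet (i, j, k) stands for ij|k."""
    index = {s: n for n, s in enumerate(species)}

    def key(a, b):
        # Canonical (ordered) representation of the unordered pair {a, b}.
        return (a, b) if index[a] < index[b] else (b, a)

    # Step 1: forest of one-node trees, tracked by a disjoint-set structure.
    parent = {s: s for s in species}
    size = {s: 1 for s in species}
    subtree = {s: s for s in species}  # set representative -> nested tuple

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def join(a, b):
        ra, rb = find(a), find(b)
        if size[ra] < size[rb]:  # union by size
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]
        # Step 12: new node whose children are the two old roots.
        subtree[ra] = (subtree.pop(ra), subtree.pop(rb))

    # Steps 2-6: refined closeness.
    c = {pair: 0 for pair in combinations(species, 2)}
    for i, j, k in triplets:
        c[key(i, j)] += 1
        c[key(i, k)] -= 1
        c[key(j, k)] -= 1

    # Steps 7-8: pairs in decreasing order of refined closeness.
    order = sorted(c, key=c.get, reverse=True)

    # Steps 9-14: merge the trees of each pair that is still separated.
    for i, j in order:
        if find(i) != find(j):
            join(i, j)

    # Steps 15-17: merge any trees that are left, in arbitrary order.
    roots = list(subtree)
    for r in roots[1:]:
        join(roots[0], r)
    return subtree[find(species[0])]


# Reflective triplet set of the hypothetical tree (((a, b), c), (d, e)).
triplets = [("a", "b", "c"), ("a", "b", "d"), ("a", "b", "e"),
            ("a", "c", "d"), ("a", "c", "e"), ("b", "c", "d"),
            ("b", "c", "e"), ("d", "e", "a"), ("d", "e", "b"),
            ("d", "e", "c")]
print(fast_tree(list("abcde"), triplets))
# ((('a', 'b'), 'c'), ('d', 'e'))
```

Because the list in this sketch contains every pair of species, the loop of Steps 9–14 already joins all trees, and Steps 15–17 act only as a safeguard.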

Theorem 1. FastTree runs in $O(m+\alpha \left(n\right){n}^{2})$ time.
Proof. Initializing the forest in Step 1 takes $O(n)$ time. Steps 2–6 take $O(m)$ time. We know that the closeness is an integer value between 0 and $n-2$. Thus, we can employ a linear-time sorting algorithm [18]. There are $O({n}^{2})$ possible pairs; therefore, Step 8 takes $O({n}^{2})$ time. Similarly, the while loop in Step 9 executes $O({n}^{2})$ iterations. Each removal in Step 10 can be done in $O(1)$ time. By employing the optimal data structures used for disjoint-set unions [18], the amortized time complexity of Steps 11 and 12 is $O(\alpha(n))$, where $\alpha(n)$ is the inverse of the function $f(n)=A(n,n)$, and A is the well-known fast-growing Ackermann function. Furthermore, Step 16 takes $O(n\alpha(n))$ time. Hence, the running time of FastTree is $O(m+\alpha(n){n}^{2})$. ☐
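The linear-time sorting step used in the proof can be illustrated with a bucket sort over integer closeness values. This is a sketch under assumptions: the function name `bucket_sort_pairs` and the dictionary input are hypothetical, and the buckets span the observed minimum and maximum so that negative refined values are handled. If the input triplets are distinct, each third species changes the refined closeness of a pair by at most +1 or −2, so the value range is $O(n)$ and the sort costs $O({n}^{2})$ for $O({n}^{2})$ pairs.

```python
def bucket_sort_pairs(closeness):
    """Return the pairs ordered by decreasing integer closeness."""
    if not closeness:
        return []
    lo, hi = min(closeness.values()), max(closeness.values())
    buckets = [[] for _ in range(hi - lo + 1)]  # one bucket per value
    for pair, value in closeness.items():
        buckets[value - lo].append(pair)
    # Read the buckets from the largest value down to the smallest.
    return [pair for bucket in reversed(buckets) for pair in bucket]


example = {("a", "b"): 3, ("a", "c"): 1, ("a", "d"): -3, ("d", "e"): 3}
print(bucket_sort_pairs(example))
# [('a', 'b'), ('d', 'e'), ('a', 'c'), ('a', 'd')]
```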
Since $A(4,4)={2}^{{2}^{{2}^{65536}}}$, $\alpha(n)$ is less than four for any practical input size, n. In comparison to the fast version of Aho et al.’s algorithm, FastTree employs a simpler data structure, and in comparison to Aho et al.’s original algorithm, it has a lower time complexity. Yet, the most important advantage of FastTree over Aho et al.’s algorithm is that it does not get stuck when no tree is consistent with the input triplets; it still outputs a proper tree whose clusters are very similar to those of the real network. The tree in Figure 2 is the output of FastTree on a dense set of triplets based on the yeast, Cryptococcus gattii, data. No tree is consistent with the whole triplet set; however, Van Iersel et al. [19] presented a level-2 network consistent with the set (see Figure 3). This set is available online [20]. In comparison to BPMR and BPMF, FastTree runs much faster for large sets of triplets and species. However, for highly sparse triplet sets, the output of FastTree may satisfy considerably fewer triplets than the tree constructed by BPMF or BPMR.
Figure 2.
Output of FastTree for a dense triplet set of the yeast, Cryptococcus gattii, data.
4.2. BPMTR
Before explaining the second heuristic algorithm, we need to review BPMF [11] and BPMR [16]. BPMF utilizes a bottom-up approach similar to hierarchical clustering. Initially, there are n trees, each containing a single node representing one of the n given species. In each iteration, the algorithm computes a function, called $e\_score$, for each combination of two trees. Then, the two trees with the maximum $e\_score$ are merged into a single tree by adding a new node as the common parent of the selected trees. Wu [11] introduced six alternatives for computing the $e\_score$ using combinations of w, p and t (see Table 1). However, in each run, one of the six alternatives must be used. In the function $e\_score({C}_{1},{C}_{2})$, w is the number of triplets satisfied by merging ${C}_{1}$ and ${C}_{2}$, which is the number of triplets of the form ij|k in which i is in ${C}_{1}$, j is in ${C}_{2}$ and k is neither in ${C}_{1}$ nor in ${C}_{2}$. The value of p is the number of triplets that are in conflict with merging ${C}_{1}$ and ${C}_{2}$. It is the number of triplets of the form ij|k in which i is in ${C}_{1}$, k is in ${C}_{2}$ and j is neither in ${C}_{1}$ nor in ${C}_{2}$. The value of t is the total number of triplets of the form ij|k in which i is in ${C}_{1}$ and j is in ${C}_{2}$. Wu compared BPMF with One-Leaf-Split and Min-Cut-Split and showed that BPMF works better on randomly generated triplet sets. He also pointed out that none of the six alternatives of $e\_score$ is significantly better than the others.
Figure 3.
A level-2 network for a dense triplet set of the yeast, Cryptococcus gattii, data.
Table 1.
The six alternatives of e_score.
If-Penalty  Ratio Type: none  Ratio Type: w + p  Ratio Type: t
False  w  w/(w + p)  w/t
True  w − p  (w − p)/(w + p)  (w − p)/t
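The six alternatives in Table 1 can be written as one small function. The sketch below reads the definitions of w, p and t literally from the text and applies them symmetrically in ${C}_{1}$ and ${C}_{2}$, which is an assumption. The names `count_wpt` and `e_score`, the parameters `if_penalty` and `ratio_type`, the values `None`, `"w+p"` and `"t"`, and the convention of returning 0 for an empty denominator are all hypothetical choices made for illustration. A triplet ij|k is passed as `(i, j, k)`.

```python
def count_wpt(c1, c2, triplets):
    """Count w, p and t for clusters c1, c2; (i, j, k) stands for ij|k."""
    def side(x):
        return 1 if x in c1 else 2 if x in c2 else 0

    w = p = t = 0
    for i, j, k in triplets:
        si, sj, sk = side(i), side(j), side(k)
        if {si, sj} == {1, 2}:
            t += 1      # i and j lie in different clusters
            if sk == 0:
                w += 1  # k is outside both: merging satisfies ij|k
        elif sk != 0 and {si, sj} == {0, 3 - sk}:
            p += 1      # k meets i or j while the other one stays outside
    return w, p, t


def e_score(c1, c2, triplets, if_penalty=False, ratio_type=None):
    """Evaluate one of the six alternatives of Table 1."""
    w, p, t = count_wpt(c1, c2, triplets)
    numerator = w - p if if_penalty else w
    if ratio_type is None:
        return numerator
    denominator = w + p if ratio_type == "w+p" else t
    # Returning 0 for an empty denominator is an assumption of this sketch.
    return numerator / denominator if denominator else 0.0


sample = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "d"), ("c", "d", "a")]
print(count_wpt({"a"}, {"b"}, sample))  # (2, 1, 2)
print(e_score({"a"}, {"b"}, sample, if_penalty=True, ratio_type="t"))  # 0.5
```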
Maemura et al. [16] introduced a modified version of BPMF, called BPMR, that outperforms BPMF. BPMR works very similarly to BPMF, except for an additional reconstruction step. Suppose ${T}_{x}$ and ${T}_{y}$ are the two trees with the maximum $e\_score$ at some iteration and are selected to be merged into a new tree. Merging ${T}_{x}$ and ${T}_{y}$ satisfies some triplets, but puts some other triplets in conflict. Without loss of generality, suppose ${T}_{x}$ has two subtrees, namely the left subtree and the right subtree. In addition, suppose there is a triplet, ij|k, in which i is in the left subtree of ${T}_{x}$, k is in the right subtree of ${T}_{x}$ and j is in ${T}_{y}$. Observe that merging ${T}_{x}$ and ${T}_{y}$ makes this triplet inconsistent. However, swapping ${T}_{y}$ with the right subtree of ${T}_{x}$ satisfies this triplet, while some other triplets become inconsistent. It is possible that the tree resulting from this swap satisfies more triplets than the primary tree. This is the main idea behind BPMR. In BPMR, in addition to the regular merging of ${T}_{x}$ and ${T}_{y}$, ${T}_{y}$ is swapped with the left and the right subtree of ${T}_{x}$, and ${T}_{x}$ is swapped with the left and the right subtree of ${T}_{y}$. Finally, among these five topologies, we choose the one that satisfies the most triplets.
Suppose the left subtree of ${T}_{x}$ also has two subtrees. Swapping ${T}_{y}$ with one of these subtrees would probably satisfy new triplets, while some old ones would become inconsistent. There are examples in which this swap results in a tree that satisfies more triplets. This forms our second heuristic idea: swapping ${T}_{y}$ with every subtree of ${T}_{x}$ should be checked, and ${T}_{x}$ should likewise be swapped with every subtree of ${T}_{y}$. At every iteration of BPMF, after choosing the two trees that maximize the $e\_score$, the algorithm tests every possible swap of these two trees with subtrees of each other and then chooses the tree that is consistent with the most triplets. We call this algorithm BPMTR (Best Pair Merge with Total Reconstruction). See Algorithm 2 for the details of BPMTR.
Algorithm 2 BPMTR
 1: Initialize a set, T, consisting of n one-node trees labeled by species.
 2: while |T| > 1 do
 3:     Find and remove two trees, ${T}_{x}$, ${T}_{y}$, with maximum $e\_score$.
 4:     Create a new tree, ${T}_{merge}$, by adding a common parent to ${T}_{x}$ and ${T}_{y}$.
 5:     ${T}_{best}$ := ${T}_{merge}$
 6:     for each subtree ${T}_{sub}$ of ${T}_{x}$ do
 7:         Let ${T}_{swapped}$ be the tree constructed by swapping ${T}_{sub}$ with ${T}_{y}$.
 8:         if the number of triplets consistent with ${T}_{swapped}$ is larger than the number of triplets consistent with ${T}_{best}$ then
 9:             ${T}_{best}$ := ${T}_{swapped}$
10:         end if
11:     end for
12:     for each subtree ${T}_{sub}$ of ${T}_{y}$ do
13:         Let ${T}_{swapped}$ be the tree constructed by swapping ${T}_{sub}$ with ${T}_{x}$.
14:         if the number of triplets consistent with ${T}_{swapped}$ is larger than the number of triplets consistent with ${T}_{best}$ then
15:             ${T}_{best}$ := ${T}_{swapped}$
16:         end if
17:     end for
18:     Add ${T}_{best}$ to T.
19: end while
20: return the tree in T
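Algorithm 2 can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the authors' implementation: trees are nested tuples whose leaf labels are not tuples, a triplet ij|k is the tuple `(i, j, k)`, and all function names are hypothetical. The score used is the w − p alternative of Table 1, read symmetrically in the two clusters. Consistency is recounted from root-to-leaf paths instead of constant-time LCA queries, so the sketch does not attain the bound of Theorem 2.

```python
from itertools import chain, combinations


def leaves(tree):
    """Leaf labels of a nested-tuple binary tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return [x for child in tree for x in leaves(child)]


def leaf_paths(tree, prefix=()):
    """Map each leaf to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for idx, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (idx,)))
    return paths


def lca_depth(a, b):
    """Depth of the lowest common ancestor, from two root-to-leaf paths."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


def consistent_count(tree, triplets):
    """Number of triplets (i, j, k), meaning ij|k, consistent with tree."""
    paths = leaf_paths(tree)
    count = 0
    for i, j, k in triplets:
        if i in paths and j in paths and k in paths:
            if lca_depth(paths[i], paths[j]) > lca_depth(paths[i], paths[k]):
                count += 1
    return count


def swaps(tx, ty):
    """Yield each tree built from (tx, ty) by swapping ty with a proper subtree of tx."""
    def replaced(tree):
        if isinstance(tree, tuple):
            left, right = tree
            yield (ty, right), left
            yield (left, ty), right
            for new_left, removed in replaced(left):
                yield (new_left, right), removed
            for new_right, removed in replaced(right):
                yield (left, new_right), removed
    for new_tx, removed in replaced(tx):
        yield (new_tx, removed)


def w_minus_p(c1, c2, triplets):
    """One e_score alternative (If-Penalty true, no ratio): w - p."""
    def side(x):
        return 1 if x in c1 else 2 if x in c2 else 0
    score = 0
    for i, j, k in triplets:
        si, sj, sk = side(i), side(j), side(k)
        if {si, sj} == {1, 2} and sk == 0:
            score += 1  # satisfied by merging c1 and c2
        elif sk != 0 and {si, sj} == {0, 3 - sk}:
            score -= 1  # in conflict with merging c1 and c2
    return score


def bpmtr(species, triplets, e_score=w_minus_p):
    forest = list(species)                                     # Step 1
    while len(forest) > 1:                                     # Step 2
        tx, ty = max(combinations(forest, 2),                  # Step 3
                     key=lambda pair: e_score(set(leaves(pair[0])),
                                              set(leaves(pair[1])), triplets))
        forest.remove(tx)
        forest.remove(ty)
        best = (tx, ty)                                        # Steps 4-5
        best_count = consistent_count(best, triplets)
        for candidate in chain(swaps(tx, ty), swaps(ty, tx)):  # Steps 6-17
            count = consistent_count(candidate, triplets)
            if count > best_count:
                best, best_count = candidate, count
        forest.append(best)                                    # Step 18
    return forest[0]                                           # Step 20


# Reflective triplet set of the hypothetical tree (((a, b), c), (d, e)).
triplets = [("a", "b", "c"), ("a", "b", "d"), ("a", "b", "e"),
            ("a", "c", "d"), ("a", "c", "e"), ("b", "c", "d"),
            ("b", "c", "e"), ("d", "e", "a"), ("d", "e", "b"),
            ("d", "e", "c")]
tree = bpmtr(list("abcde"), triplets)
print(consistent_count(tree, triplets))  # 10
```

Only triplets whose three species all lie in the candidate tree are counted. This suffices for comparing the candidates, because every candidate topology in one iteration has the same leaf set.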

Theorem 2. BPMTR runs in $O\left(m{n}^{3}\right)$ time.
Proof. Step 1 takes $O(n)$ time. In Step 2, T initially contains n clusters, and in each iteration, two clusters merge into one. Hence, the while loop in Step 2 executes $O(n)$ iterations. In Step 3, $e\_score$ is computed for every subset of T of size two. By applying Bender and Farach-Colton’s preprocessing algorithm [21], which runs in $O(n)$ time for a tree with n nodes, every LCA query can be answered in $O(1)$ time. Therefore, the consistency of a triplet with a cluster can be checked in $O(1)$ time. Since there are m triplets, Step 3 takes $\binom{|T|}{2}O(m)$ time. In Steps 5, 9 and 15, ${T}_{best}$ is a pointer that stores the best topology found so far during each iteration of the while loop, so each of these steps takes $O(1)$ time. The complexity analyses of the loops in Steps 6–11 and 12–17 are similar, so it is enough to consider one. Every rooted binary tree with n leaves has $O(n)$ internal nodes, so the total number of swaps in Step 7 for any two clusters is at most $O(n|T|)$. In Step 8, computing the number of triplets consistent with ${T}_{swapped}$ takes no more than $O(m)$ time. Steps 4, 7 and 18 are implementable in $O(1)$ time. Accordingly, the running time of Steps 2–19 would be:
$\sum_{|T|=2}^{n}\left[\binom{|T|}{2}O(m)+O(n|T|)\,O(m)\right]=O\left(m{n}^{3}\right)$
Step 20 takes $O\left(1\right)$ time. Hence, the time complexity of BPMTR is $O\left(m{n}^{3}\right)$. ☐
We tested BPMTR over randomly generated triplet sets with n = 15, 20 species and m = 500, 1,000 triplets. We ran hundreds of experiments for each combination of n and m. The results in Table 2 indicate that BPMTR outperforms BPMR. However, among these hundreds of tests, there were a few cases in which BPMR performed better than BPMTR. For n = 30 and m = 1,000, BPMTR satisfied more triplets in 62 of one hundred randomly generated triplet sets. In 34 triplet sets, BPMR and BPMTR had the same results, and in four triplet sets, BPMR satisfied more triplets.
Table 2.
Performance results of Best Pair Merge with Total Reconstruction (BPMTR) in comparison to Best Pair Merge with Reconstruction (BPMR).
No. of species and triplets  % better results  % worse results
n = 20, m = 500  29%  0.0%
n = 20, m = 1,000  37%  1%
n = 30, m = 500  61%  3%
n = 30, m = 1,000  62%  4%