# Phylogenetic Analysis of HIV-1 Genomes Based on the Position-Weighted K-mers Method

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Complete Genome Datasets

#### 2.2. The Measure of Position-Weighted K-mers

#### 2.3. Distance Calculations

#### 2.4. Selection of the k Value

#### 2.5. Accuracy Test of the Phylogenetic Tree Based on the Robinson–Foulds Distance and Robustness Test Using the Modified Bootstrap Method

## 3. Results

#### 3.1. Subtyping of HIV-1 Based on PWkmer Feature for Complete Genome Sequences

#### 3.2. Application of Our Method on Other Datasets

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Conflicts of Interest

## References

**Figure 1.** Trend of the scoring-scheme score, $\mathrm{score}(k)$, against $k$. Red circles show the score of the HIV dataset and blue dots the score of the HEV dataset for each value of $k$.

**Figure 2.** Subtyping of HIV based on the position-weighted k-mer (PWkmer) feature of whole genome sequences. The Neighbor-Joining (NJ) tree of the 44 HIV whole genomes is constructed from the PWkmer feature distance matrix ($k=8$).
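The Figure 2 tree starts from a pairwise distance matrix over k-mer feature vectors, and Table 2 names the Manhattan distance as the metric. The sketch below shows only that generic step, under stated assumptions. It counts overlapping k-mers with a sliding window and converts the counts to relative frequencies, which is assumed here so that genomes of different lengths are comparable. It then fills a symmetric Manhattan (L1) distance matrix. It does **not** reproduce the position weighting that defines PWkmer. The names `kmer_profile`, `manhattan` and `distance_matrix` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_profile(seq: str, k: int) -> dict:
    """Relative frequency of every overlapping k-mer in seq.

    Assumption: plain sliding-window counts normalised to frequencies;
    the paper's position weighting is NOT applied here.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def manhattan(p: dict, q: dict) -> float:
    """L1 distance between two sparse profiles; absent k-mers count as 0."""
    return sum(abs(p.get(km, 0.0) - q.get(km, 0.0)) for km in p.keys() | q.keys())


def distance_matrix(seqs: dict, k: int):
    """Return (names, symmetric matrix) of pairwise Manhattan distances."""
    names = list(seqs)
    profiles = {name: kmer_profile(seqs[name], k) for name in names}
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = manhattan(profiles[names[i]], profiles[names[j]])
        dist[i][j] = dist[j][i] = d
    return names, dist
```

For example, `distance_matrix({"a": "ACGT", "b": "ACGA"}, 2)` puts 2/3 (up to floating-point rounding) off the diagonal. The two profiles share AC and CG and differ only in GT versus GA, each with frequency 1/3.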

**Figure 3.** Subtyping of HIV based on alignment of whole genome sequences. The NJ tree of the 44 HIV whole genomes is constructed with ClustalX.
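Figures 2 and 3 are both Neighbor-Joining (NJ) trees. The sketch below illustrates how an NJ tree follows from any symmetric distance matrix. It implements the standard Saitou–Nei procedure. At each step it joins the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other clusters. It repeats this until three clusters remain, then joins them at a central node. This is not the ClustalX implementation or the authors' code. The function name is hypothetical, ties are broken by list order, and branch lengths are rounded to four significant digits in the Newick output.

```python
def neighbor_joining(names, dist):
    """Unrooted NJ tree (Saitou-Nei) as a Newick string with branch lengths."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)
    # Dict-of-dicts copy so that new clusters can be added by key.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r(i): sum of distances from i to every other node.
        r = {i: sum(d[i][j] for j in nodes if j != i) for i in nodes}
        # Choose the pair with the smallest Q value (first one wins on ties).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4g},{j}:{lj:.4g})"
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Join the last three clusters at a single central node.
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"
```

On the five-taxon textbook matrix with rows a to e equal to `[0,5,9,9,8]`, `[5,0,10,10,9]`, `[9,10,0,8,7]`, `[9,10,8,0,3]` and `[8,9,7,3,0]`, `neighbor_joining(list("abcde"), D)` returns `(d:2,e:1,(c:4,(a:2,b:3):3):2);`.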

**Figure 5.** The modified bootstrap consensus tree for Figure 2, based on 100 replicates.

**Table 1.** The 44 HIV whole genomes of the HIV dataset.

No. | Accession | Subtype | Length (bp) | Country |
---|---|---|---|---|
1 | U51190 | A1 | 8999 | Uganda |
2 | AF004885 | A1 | 9160 | Kenya |
3 | AF069670 | A1 | 8813 | Somalia |
4 | AF484509 | A1 | 8807 | Uganda |
5 | AF286237 | A2 | 9060 | Cyprus |
6 | AF286238 | A2 | 8972 | DRC |
7 | AY173951 | B | 8996 | Thailand |
8 | AY331295 | B | 8834 | USA |
9 | AY423387 | B | 9359 | Netherlands |
10 | K03455 | B | 9719 | France |
11 | AF146728 | B | 8887 | Australia |
12 | AF067155 | C | 9002 | India |
13 | AY772699 | C | 9011 | South Africa |
14 | U46016 | C | 9031 | Ethiopia |
15 | U52953 | C | 8959 | Brazil |
16 | AY371157 | D | 8379 | Cameroon |
17 | K03454 | D | 9176 | DRC |
18 | U88824 | D | 8952 | Uganda |
19 | AF005494 | F1 | 8968 | Brazil |
20 | AF075703 | F1 | 8925 | Finland |
21 | AF077336 | F1 | 8903 | Belgium (DRC) |
22 | AJ249238 | F1 | 8614 | France |
23 | AF377956 | F2 | 8782 | Cameroon |
24 | AJ249236 | F2 | 8555 | Cameroon |
25 | AJ249237 | F2 | 8589 | Cameroon |
26 | AY371158 | F2 | 8349 | Cameroon |
27 | AF061641 | G | 9047 | Finland (Kenya) |
28 | AF061642 | G | 9074 | Sweden (DRC) |
29 | AF084936 | G | 9707 | Belgium (DRC) |
30 | AF005496 | H | 8953 | Central African Republic |
31 | AF190127 | H | 9056 | Belgium |
32 | AF190128 | H | 9707 | Belgium |
33 | AF082394 | J | 8943 | Sweden |
34 | AF082395 | J | 8953 | Sweden |
35 | AJ249235 | K | 8600 | DRC |
36 | AJ249239 | K | 8604 | Cameroon |
37 | AJ006022 | N | 9182 | Cameroon |
38 | AJ271370 | N | 9045 | Cameroon |
39 | AY532635 | N | 8938 | Cameroon |
40 | AJ302647 | O | 9829 | Senegal |
41 | AY169812 | O | 9110 | Cameroon |
42 | L20571 | O | 9793 | Cameroon |
43 | L20587 | O | 9754 | Cameroon |
44 | AF447763 | CPZ | 9326 | Tanzania |

**Table 2.** Robinson–Foulds distances between the phylogenetic trees reconstructed by our method with the Manhattan distance at $k = 2, 3, \dots, 10$ and the tree reconstructed by ClustalX on the HIV dataset.

Dataset | k = 2 | k = 3 | k = 4 | k = 5 | k = 6 | k = 7 | k = 8 | k = 9 | k = 10 |
---|---|---|---|---|---|---|---|---|---|
HIV | 74 | 54 | 38 | 26 | 20 | 14 | 10 | 12 | 14 |
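Table 2 compares trees by their Robinson–Foulds (RF) distance. One common definition counts the non-trivial bipartitions (splits) that occur in exactly one of the two trees. The sketch below computes that count for unrooted trees written as nested Python tuples of leaf names. The tuple representation and the function names `leaves`, `splits` and `robinson_foulds` are assumptions for illustration, not the tooling used in the paper.

```python
def leaves(tree):
    """Leaf set of a tree given as nested tuples of leaf-name strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the unrooted tree, in canonical form."""
    taxa = leaves(tree)
    ref = min(taxa)  # reference taxon used to pick one side of each split
    out = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        # Canonical side: the half of the bipartition without the reference taxon.
        side = taxa - clade if ref in clade else clade
        # Skip trivial splits (a single leaf versus the rest) and the root itself.
        if 1 < len(side) < len(taxa) - 1:
            out.add(side)
        return clade

    walk(tree)
    return out


def robinson_foulds(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same leaf set")
    return len(splits(t1) ^ splits(t2))
```

Because splits ignore where the tuple nesting places the root, `(("a","b"),("c",("d","e")))` and `((("a","b"),"c"),("d","e"))` are at distance 0. Replacing the inner pair `("a","b")` with `("a","c")` gives a distance of 2. Under this convention, two fully resolved trees on the 44 taxa can differ by at most 2(44 - 3) = 82. That bound puts the minimum of 10 at k = 8 in Table 2 in context.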

