HCV Database
PhyloPlace Documentation


Overview of the PhyloPlace Service

The Phylogenetic Placement (PhyloPlace) services provided here include Pairwise Distance and Branching Index analyses. Both tools take a single query sequence in FASTA format, align it with reference sequences, and analyze the resulting alignment with PAUP*. Pairwise Distance uses uncorrected distances, while Branching Index uses branch lengths from neighbor-joining trees built with the F84 model (Felsenstein, 1984) and the BioNJ algorithm (Gascuel, 1997).

Pairwise Distance. This analysis summarizes the distribution of pairwise distances among aligned sequences. For n sequences, there are n(n-1)/2 pairwise comparisons. When this analysis is selected, a menu appears with options for which pairs of sequences to compare. The default option, "Type and Subtype", plots the distances for your sequence in the context of distances between Types, within Types, between Subtypes, and within Subtypes.
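As a rough illustration of the quantities described above, the following Python sketch computes uncorrected distances for every pair in a small alignment. It is not the service's implementation (which uses PAUP*): the function names, the toy sequences, and the choice to skip sites where either sequence has a gap or ambiguous character are all assumptions made for this example.

```python
from itertools import combinations

# Assumption: sites with any of these characters are skipped for that pair.
SKIPPED_CHARS = {"-", "?", "N"}


def p_distance(seq_a, seq_b):
    """Uncorrected distance: fraction of compared sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in SKIPPED_CHARS or b in SKIPPED_CHARS:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def all_pairwise_distances(aligned):
    """Map each unordered pair of names to its uncorrected distance."""
    return {
        (a, b): p_distance(aligned[a], aligned[b])
        for a, b in combinations(sorted(aligned), 2)
    }


# Toy alignment, invented for illustration only.
alignment = {
    "ref_1a": "ACGTACGTAC",
    "ref_1b": "ACGTACGTTC",
    "ref_2a": "ACGAACCTTC",
    "query": "ACGTAC-TAC",
}
distances = all_pairwise_distances(alignment)

# Four sequences give 4 * 3 / 2 = 6 pairwise comparisons.
n = len(alignment)
assert len(distances) == n * (n - 1) // 2
```

Sorting the names before pairing keeps the dictionary keys predictable, so each pair appears exactly once.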

Distances are shown as a histogram, with the number of sequence pairs (y-axis) as a function of distance (x-axis). This distribution typically has three peaks (cf. Van Regenmortel 2007), which correspond to distances (1) within the same subtype, (2) between subtypes of the same genotype, and (3) between genotypes. The query sequence is compared with each reference sequence, and the position of the resulting distances in the histogram indicates how closely the query is related to sequences in the reference set.

Example of pairwise distance analysis results

The example above shows results from a "Type and Subtype" pairwise distance (P-dist) analysis. Note the trimodal distribution of distances between Types (blue); within Types, between Subtypes (yellow); and within Types, within Subtypes (green). Distances associated with the query sequence are colored red.
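The three categories of distances described above can be sketched in Python as follows. This is a minimal illustration, not the service's code: the (genotype, subtype) label format, the function names, the 0.05 bin width, and the distance values are all invented for the example.

```python
from collections import Counter, defaultdict


def pair_category(label_a, label_b):
    """Classify a pair by its (genotype, subtype) labels, e.g. ("1", "a")."""
    if label_a[0] != label_b[0]:
        return "between genotypes"
    if label_a[1] != label_b[1]:
        return "between subtypes"
    return "within subtype"


def histogram(values, bin_width=0.05):
    """Count values per bin; bin i covers [i * bin_width, (i + 1) * bin_width)."""
    return Counter(int(v / bin_width) for v in values)


# Hypothetical (label, label, distance) triples, not real HCV data.
pairs = [
    (("1", "a"), ("1", "a"), 0.04),
    (("1", "a"), ("1", "a"), 0.06),
    (("1", "a"), ("1", "b"), 0.22),
    (("1", "a"), ("2", "a"), 0.33),
    (("1", "b"), ("2", "a"), 0.31),
]

# Group distances by category, then bin each group separately,
# which is what gives the histogram its separately colored peaks.
by_category = defaultdict(list)
for label_a, label_b, dist in pairs:
    by_category[pair_category(label_a, label_b)].append(dist)

histograms = {cat: histogram(vals) for cat, vals in by_category.items()}
```

Distances involving the query sequence could be handled the same way by collecting them in a separate group before binning.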


Branching Index. This approach quantifies relatedness to known clades as a ratio of branch lengths at the point where your sequence connects to the reference tree (Wilbe et al., 2003). Values range from 0 (unrelated) to 1 (perfectly related) and are compared with a threshold to infer whether the degree of relatedness is significant.

A Branching Index profile slides overlapping windows along the sequence. Each window is 400 nt long, and successive windows are offset by 80 nt. A minimum sequence length of 200 nt is required. The analysis can take considerable time to complete, and longer sequences take longer. The result is a profile of Branching Index values along the length of the query sequence. Line color indicates the predicted taxon, and a horizontal line separates significant (above) from insignificant (below) results (Wilbe et al., 2003; Hraber et al., submitted).
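The window layout described above can be sketched in Python. The 400 nt window, 80 nt step, and 200 nt minimum come from the text; everything else is an assumption for illustration, including the function name, the use of a single window for sequences shorter than 400 nt, and the decision to emit only full-length windows.

```python
def sliding_windows(seq_len, window=400, step=80, min_len=200):
    """Return (start, end) coordinates, 0-based and end-exclusive."""
    if seq_len < min_len:
        raise ValueError(f"sequence must be at least {min_len} nt long")
    if seq_len <= window:
        # Assumption: a short sequence is analyzed as one window.
        return [(0, seq_len)]
    # Assumption: only full-length windows are emitted, so a few
    # trailing positions may not be covered by the last window.
    return [
        (start, start + window)
        for start in range(0, seq_len - window + 1, step)
    ]


windows = sliding_windows(1000)
```

Each window would then be analyzed separately, and the resulting Branching Index values plotted against window position to form the profile.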

Example of Branching Index (BI) output

The example above illustrates results from Branching Index analysis of a genome sequence with accession number AY651061. Line color depicts the most closely related subtype clade in a phylogenetic tree. Putative recombination breakpoints are found where the value of the BI function is minimal. Multiple breakpoints between subtypes 1a and 1c are evident as alternating peaks in BI values that correspond to different subtypes.
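One simple way to locate such minima in a profile is sketched below in Python. This is a heuristic written for illustration, not the method used by the service: the function name, the (position, subtype, BI) tuple layout, and the rule of keeping only local minima flanked by different subtypes are assumptions. The profile values are invented and are not the results for AY651061.

```python
def candidate_breakpoints(profile):
    """Find interior local minima of BI flanked by different subtypes.

    profile: list of (position, subtype, bi) tuples ordered by position.
    Returns a list of (position, left_subtype, right_subtype).
    """
    found = []
    for i in range(1, len(profile) - 1):
        _, left_subtype, left_bi = profile[i - 1]
        position, _, bi = profile[i]
        _, right_subtype, right_bi = profile[i + 1]
        is_minimum = bi < left_bi and bi <= right_bi
        if is_minimum and left_subtype != right_subtype:
            found.append((position, left_subtype, right_subtype))
    return found


# Hypothetical profile: window midpoint, predicted subtype, BI value.
profile = [
    (200, "1a", 0.90),
    (280, "1a", 0.80),
    (360, "1a", 0.40),
    (440, "1c", 0.70),
    (520, "1c", 0.90),
    (600, "1c", 0.85),
    (680, "1c", 0.50),
    (760, "1a", 0.45),
    (840, "1a", 0.80),
]

breakpoints = candidate_breakpoints(profile)
```

Because windows overlap, a position reported this way is only approximate, to within roughly the step between windows.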

References

Felsenstein J. (1984) Distance methods for inferring phylogenies: a justification. Evolution 38:16-24.

Gascuel O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol 14:685-695.

Hraber P, Kuiken C, Waugh M, Geer S, Bruno W, Leitner T. (submitted) Automatic classification of HCV and HIV-1 sequences with the branching index.

Van Regenmortel MHV. (2007) Virus species and virus identification: past and current controversies. Infect Genet Evol 7:133-144.

Wilbe K, Salminen M, Laukkanen T, McCutchan F, Ray SC, Albert J, Leitner T. (2003) Characterization of novel recombinant HIV-1 genomes using the branching index. Virology 316:116-125.




Questions or comments? Contact us at hcv-info@lanl.gov