Overview of the PhyloPlace Service
The Phylogenetic Placement (phyloplace) services provided here include
Pairwise Distance and Branching Index analyses. Both
tools start with a single sequence in FASTA format. The
sequence is aligned with reference sequences and then analyzed
accordingly. Both analysis methods use PAUP*. Pairwise
Distance uses uncorrected distances, while Branching Index
uses branch lengths from neighbor-joining trees built with the BioNJ
algorithm (Gascuel, 1997) from F84 distances (Felsenstein, 1984).
Pairwise Distance. This analysis summarizes the distribution of
pairwise distances among aligned sequences. For n sequences, there are
n(n-1)/2 pairwise comparisons. When this analysis is selected, a menu
appears with options for which pairs of sequences to compare. The
default option is "Type and Subtype", which plots your sequence in the
context of distances between Types, within Types but between Subtypes,
and within Subtypes.
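
As a rough illustration of the pair count and of an uncorrected
distance, the sketch below computes the fraction of differing sites for
every pair in a small alignment. It is not the PAUP* implementation
used by the service. The sequence names, the toy alignment, and the
rule of skipping columns that contain a gap or ambiguity code are all
assumptions made for the example.

```python
from itertools import combinations

VALID = set("ACGT")

def p_distance(seq_a, seq_b):
    """Uncorrected (p) distance: differing sites / compared sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    # Assumption: skip columns with a gap or ambiguity code in either sequence.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in VALID and b in VALID]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def pairwise_distances(aligned):
    """Return {(name_i, name_j): distance} for every unordered pair."""
    return {(i, j): p_distance(aligned[i], aligned[j])
            for i, j in combinations(aligned, 2)}

# Hypothetical toy alignment: three references plus one query.
aligned = {
    "ref1": "ACGTACGT",
    "ref2": "ACGTACGA",
    "ref3": "AC-TACGT",
    "query": "TCGTACGT",
}
distances = pairwise_distances(aligned)
n = len(aligned)
print(len(distances), "pairs for", n, "sequences")  # n(n-1)/2 pairs
```

With four sequences the dictionary holds six entries, one per unordered
pair, which matches n(n-1)/2.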
Distances are shown as a histogram, with the number of sequence pairs
(y-axis) plotted as a function of distance (x-axis). This
distribution typically has three peaks (cf. Van Regenmortel 2007).
The three peaks correspond to distances (1) within the same subtype,
(2) between subtypes of the same genotype, and (3) between genotypes.
This method indicates how closely the query sequence is related to
sequences in the reference set. In each case, the query sequence is
compared with the reference sequences, and the resulting distances are
added to the histogram.
The example above shows results from "Type and Subtype" P-dist
analysis. Note the tri-modal distribution of distances between Types
(blue); within Types, between Subtypes (yellow); and within Types,
within Subtypes (green). Distances associated with the query sequence
are colored red.
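
The three distance categories and the histogram binning described
above can be sketched as follows. This is only an illustration under
stated assumptions: each sequence is assumed to carry a hypothetical
(type, subtype) label, and the bin width of 0.05 is arbitrary. Neither
is taken from the service itself.

```python
from collections import Counter

def pair_category(label_a, label_b):
    """Classify a pair of (type, subtype) labels into one of three groups."""
    type_a, subtype_a = label_a
    type_b, subtype_b = label_b
    if type_a != type_b:
        return "between types"
    if subtype_a != subtype_b:
        return "within type, between subtypes"
    return "within subtype"

def histogram(distances, bin_width=0.05):
    """Count distances per bin; bin k covers [k*bin_width, (k+1)*bin_width)."""
    counts = Counter(int(d / bin_width) for d in distances)
    return {k: counts[k] for k in sorted(counts)}

# Hypothetical labels and distances, for illustration only.
print(pair_category(("1", "a"), ("1", "a")))  # within subtype
print(pair_category(("1", "a"), ("1", "b")))  # within type, between subtypes
print(pair_category(("1", "a"), ("2", "a")))  # between types
print(histogram([0.02, 0.03, 0.12, 0.31]))    # {0: 2, 2: 1, 6: 1}
```

Plotting one such histogram per category, plus one for the pairs that
involve the query, would give the kind of colored overlay described
above.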
Branching Index. This approach quantifies relatedness to
known clades as a ratio of branch lengths at the point where your
sequence connects to the reference tree (Wilbe et al., 2003). Values
range from 0 (unrelated) to 1 (perfectly related) and are compared with
a threshold to infer whether the degree of relatedness is significant.
A Branching Index profile slides overlapping windows over the
sequence. Each window is 400 nt long, and the window advances 80 nt
between successive positions. A minimum sequence length of 200 nt is
required. The analysis can take considerable time to complete,
particularly for long sequences. The result is a profile of branching
index values over the extent of the query sequence. Line color
indicates the predicted taxon, and a horizontal line separates
significant (above) from insignificant (below) results (Wilbe et al.,
2003; Hraber et al., submitted).
The example above illustrates results from Branching Index analysis of
a genome sequence with accession number AY651061. Line color depicts
the most closely related subtype clade in a phylogenetic tree.
Putative recombination breakpoints are found where the value of the BI
function is minimal. Multiple breakpoints between subtypes 1a and 1c
are clearly evident as alternating peaks in BI values that correspond
to different subtypes.
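
The windowing scheme (400 nt windows advanced 80 nt at a time, with a
200 nt minimum) and the idea of locating breakpoints at minima of the
profile can be sketched as below. This is a minimal sketch and not the
service's code. Treating a sequence shorter than one window as a single
window, dropping a trailing partial window, and using simple local
minima as candidate breakpoints are all assumptions, and the profile
values are invented for illustration.

```python
def window_coordinates(seq_len, window=400, step=80, min_len=200):
    """Return (start, end) pairs, 0-based and end-exclusive."""
    if seq_len < min_len:
        raise ValueError(f"sequence must be at least {min_len} nt")
    if seq_len <= window:
        # Assumption: a short sequence is analyzed as one window.
        return [(0, seq_len)]
    # Assumption: a trailing partial window is dropped.
    return [(s, s + window) for s in range(0, seq_len - window + 1, step)]

def local_minima(values):
    """Indices whose value is below the left neighbor and not above the right."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] <= values[i + 1]]

windows = window_coordinates(1000)
print(len(windows), windows[0], windows[-1])

# Hypothetical branching index values, one per window.
profile = [0.9, 0.8, 0.3, 0.7, 0.9, 0.4, 0.85]
for i in local_minima(profile):
    start, end = windows[i]
    print(f"candidate breakpoint near window {start}-{end} (BI={profile[i]})")
```

Because consecutive windows overlap by 320 nt, a minimum found this way
localizes a candidate breakpoint only to the neighborhood of a window,
not to a single position.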
References
Felsenstein J. (1984) Distance methods
for inferring phylogenies: a justification. Evolution
38:16-24.
Gascuel O. (1997) BIONJ: an improved
version of the NJ algorithm based on a simple model of sequence
data. Mol Biol Evol 14:685-695.
Hraber P, Kuiken C, Waugh M, Geer S,
Bruno W, Leitner T. (submitted) Automatic classification of HCV and
HIV-1 sequences with the branching index.
Van Regenmortel MHV. (2007) Virus species
and virus identification: past and current controversies.
Infection, Genetics, and Evolution, 7:133-144.
Wilbe K, Salminen M, Laukkanen T,
McCutchan F, Ray SC, Albert J, Leitner T. (2003) Characterization of
novel recombinant HIV-1 genomes using the branching
index. Virology 316:116-125.