Predicting demographic group structures based on DNA sequence data.
Molecular Biology and Evolution 2003;20(7):1168-1180.
Anderson JP, Learn GH, Rodrigo AG, He X, Wang Y, Weinstock H, Kalish ML,
Robbins KE, Hood L, Mullins JI.
Abstract
The ability to infer relationships between groups of sequences, either by searching
for their evolutionary history or by comparing their sequence similarity,
can be a crucial step in hypothesis testing. Interpreting relationships of
human immunodeficiency virus type 1 (HIV-1) sequences can be challenging
because of their rapidly evolving genomes, but it may also lead to a better
understanding of the underlying biology. Several studies have focused on
the evolution of HIV-1, but there is little information to link sequence
similarities and evolutionary histories of HIV-1 to the epidemiological information
of the infected individual. Our goal was to correlate patterns of HIV-1 genetic
diversity with epidemiological information, including risk and demographic
factors. These correlations were then used to predict epidemiological information
through analyzing short stretches of HIV-1 sequence. Using standard phylogenetic
and phenetic techniques on 100 HIV-1 subtype B sequences, we were able to
show some correlation between the viral sequences and the geographic area
of infection and the risk factor of men who engage in sex with men. To help identify
more subtle relationships between the viral sequences, we applied multidimensional
scaling (MDS). That method identified statistically significant
correlations between the viral sequences and the risk factors of men who
engage in sex with men and individuals who engage in sex with injection drug
users or use injection drugs themselves. Using tree construction, MDS, and
newly developed likelihood assignment methods on the original 100 samples
we sequenced, and also on a set of blinded samples, we were able to predict
demographic/risk group membership at a rate statistically better than expected by
chance alone. Such methods may make it possible to identify viral variants
belonging to specific demographic groups by examining only a small portion
of the HIV-1 genome. Such predictions of demographic epidemiology based on
sequence information may become valuable in assigning different treatment
regimens to infected individuals.
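
As a rough illustration of the MDS step mentioned above, the following minimal sketch embeds a handful of aligned sequences in two dimensions using classical (Torgerson) MDS. It is not the authors' pipeline: the abstract does not say which distance measure or MDS variant was used, so the uncorrected p-distance, the power-iteration eigensolver, the toy sequences, and all function names (p_distance, double_center, top_eigenpair, classical_mds) are assumptions made for this example only.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def double_center(d):
    """Return B = -1/2 * J * D^2 * J, the Gram matrix used by classical MDS."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]  # row means equal column means (symmetric)
    total = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
            for i in range(n)]


def top_eigenpair(m, iters=2000):
    """Largest algebraic eigenvalue and unit eigenvector of a symmetric matrix.

    Runs power iteration on m + shift*I, where shift bounds every |eigenvalue|,
    so that a large negative eigenvalue cannot dominate the iteration.
    """
    n = len(m)
    shift = max(sum(abs(x) for x in r) for r in m)
    v = [1.0 / (i + 1) for i in range(n)]  # deterministic, non-constant start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    value = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
    return value, v


def classical_mds(d, dims=2):
    """Embed points with pairwise distances d into `dims` dimensions."""
    b = double_center(d)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        value, vec = top_eigenpair(b)
        if value <= 1e-12:  # no positive variance left to embed
            break
        scale = math.sqrt(value)
        for i in range(n):
            coords[i][k] = scale * vec[i]
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords


# Toy data (hypothetical, not from the study): two groups of two sequences.
seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCAAAA", "CCCCCCAAAT"]
dist = [[p_distance(s, t) for t in seqs] for s in seqs]
coords = classical_mds(dist, dims=2)
for s, (x, y) in zip(seqs, coords):
    print(s, round(x, 3), round(y, 3))
```

In this toy case the first MDS axis separates the two groups of sequences; whether a risk or demographic label correlates with such an axis would still need a separate statistical test, which this sketch does not attempt.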