HIV Databases
HIV sequence database



Geography Search Explanation

HIV Database Geography

Background

The HIV geography tool shows the geographic distribution of HIV-1 sequences and their subtypes. The distributions shown are based on the same information that is available from the HIV sequence search interface. The maps are derived from the database and are updated nightly. Maps were created using GMT (Generic Mapping Tools) and iGMT (Interactive Mapping of Geoscientific Datasets).

Data do not represent population frequency!

Be careful not to draw conclusions about the epidemiology of HIV-1 from the subtype distribution presented here. The data stored in the database are taken from publications in the literature, and in general there is no epidemiological framework; the database is just a listing of available sequences and subtypes, and the maps are a summary of that listing. The geographic distribution shown on the maps is based on the country of sequence isolation. While these data are not particularly helpful for epidemiological purposes, one can get a sense of how intensively a region has been studied and a rough indication of the subtype distributions.

An example of how the sequences may not reflect subtypes in a given country is the following:

Imagine a scenario where HMA (heteroduplex mobility assay) revealed 90% subtype A and 10% other subtypes in a particular study. The laboratory may have sequenced only the 10% non-A samples to confirm their subtype, and thus the sequences from this study would not reflect the distribution of subtypes circulating in the study population. There has been a tendency in many studies to sequence what is most unusual, and this scenario is not uncommon.

On the other hand, subtype C dominates the South African epidemic, and almost all sequences from South Africa are subtype C, so this impression is conveyed appropriately by the map. In summary, one should simply interpret these maps with caution.
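
The HMA scenario above can be worked through numerically. The following is a minimal sketch, not part of the tool itself: the function name, the study size of 100 samples, and the per-subtype sequencing fractions are all hypothetical values chosen to match the 90%/10% example.

```python
def apparent_distribution(true_counts, fraction_sequenced):
    """Return the subtype shares seen among sequenced samples only.

    true_counts: samples per subtype in the study (hypothetical numbers).
    fraction_sequenced: share of each subtype's samples that was sequenced.
    """
    sequenced = {
        subtype: count * fraction_sequenced.get(subtype, 0.0)
        for subtype, count in true_counts.items()
    }
    total = sum(sequenced.values())
    if total == 0:
        return {}  # nothing was sequenced, so nothing reaches the database
    return {subtype: n / total for subtype, n in sequenced.items()}


# Assumed study: 100 samples, 90 typed as subtype A and 10 as other by HMA.
true_counts = {"A": 90, "other": 10}

# Assumed lab practice: only the non-A samples are sequenced for confirmation.
database_view = apparent_distribution(true_counts, {"A": 0.0, "other": 1.0})
print(database_view)
```

Under these assumptions the sequences contributed by the study are entirely non-A, even though 90% of the study population carried subtype A.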

Examples

The following are examples of the kinds of searches you can do using this interface.

Example 1. What are some CRFs that commonly occur in sequences from Brazil?

Select HIV-1/Brazil and click the “Show recombinant sequences” button. The resulting pie chart shows just the CRF and other recombinant sequences sampled in Brazil. To see what subtypes are represented in the “Other” category, click the corresponding pie slice, and you will see them listed.

Example 2. Are non-A subtypes of HIV-2 more common among sequences from Asia or Africa?

This question requires two separate searches: one for HIV-2/Asia, and one for HIV-2/Africa. The resulting pie charts can be downloaded and compared. However, it is important to note that small sample sizes can be a problem. At the time of this writing, there are only 182 HIV-2 sequences available from Asia, and only 3 of them are non-A subtype.
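
To give a rough sense of how little 3 sequences out of 182 can support, the sketch below computes the non-A share together with a Wilson score interval. The choice of the Wilson interval and the function name are assumptions made for illustration; the tool itself reports only counts. The interval reflects sampling noise alone and does not correct for the selection biases described above.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Counts quoted in the text: 3 non-A sequences among 182 HIV-2 sequences from Asia.
non_a, all_asia = 3, 182
low, high = wilson_interval(non_a, all_asia)
print(f"non-A share: {non_a / all_asia:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```

The same function can be applied to the counts from the HIV-2/Africa search to see whether the two intervals overlap before reading anything into a difference between the pie charts.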

Example 3. Do all countries in Asia have similar distributions of HIV-1 sequence subtypes in the database?

Select HIV-1/Asia and “Show all”. The resulting map shows Asia with a separate pie chart for each country. It is clear at a glance that there are vast differences in the subtype distributions of sequences collected from Asian countries. You can further click on each country to view and download its subtype distribution individually.

Example 4. What are the rarest HIV-1 sequence subtypes ever sampled in Sweden?

This is a search that would be difficult using the regular Search Interface, but easy here. Select HIV-1/Sweden and “Show all”. In the resulting pie chart, click on the “Other” slice. The resulting list shows the subtypes of the sequences that were too rare to display as separate pie slices.

Additional limitations and details

There is redundancy in the information that can be extracted by the Geography Search and the regular Search Interface. For example, if you want to extract all subtype B sequences from Africa, you could use either interface. The main reason that you may prefer to do the search using the Geography Search is the graphic presentation of all subtypes. The graphic output may provide some interesting insights that you would not notice in the lists obtained from the sequence Search Interface. However, if the objective is only to extract sequences of a single subtype, without any interest in the geographic distribution of that subtype or the representation of that subtype relative to others, then the regular sequence search interface may serve you better.

The results from this tool need to be interpreted with care: it is easy to overlook the sampling biases that can distort the frequencies of sequences in the database relative to those in the population. Do not draw conclusions about the epidemiology of HIV-1 from the subtype distribution presented here. The data stored in the database are taken from publications in the literature, and in general there is no epidemiological framework; the database is just a listing of available sequences. Many studies focus on rarer subtypes and recombinants, which tends to cause overrepresentation of such sequences. Furthermore, the distribution shown on the maps is based only on the country of sequence isolation, which is not always the country of infection. While the results of this tool are not particularly helpful for epidemiological purposes, one can still use the tool to get a sense of how intensively a region has been studied and a rough indication of the subtype distributions.

The data generated by this search are only as good as the annotation of the sequences in the database. There can be errors in the classification of sequences, so all results should be interpreted carefully. When examining rare subtypes, it may be worthwhile to verify the correct subtyping of specific sequences. Furthermore, not all the sequences in the HIV sequence database have an assigned subtype. At the time of this writing, approximately 15% of all sequences in the database have no annotation of subtype. Sequences where the subtype field is blank are not reported in the output from this tool.

Small sample sizes can also be a problem with this tool. In countries where there are few subtyped sequences, one or a few studies (whatever their objectives may have been) determine the output. This is particularly true for HIV-2, for which far fewer sequences are available than for HIV-1.

This tool lumps together sequences that are annotated with certain sub-subtypes. For example, the sequences listed as subtype “A” from this tool include all sequences annotated as “A”, “A1”, or “A2”. Sequences listed as subtype “F” from this tool include all sequences annotated as “F”, “F1”, or “F2”. However, sub-subtypes that are part of recombinants are not lumped together. For example, “BF”, “BF1”, and “BF2” are each graphed separately.
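
The lumping rule can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the mapping table contains only the A and F examples named above (the full list of lumped sub-subtypes is not given here), and the function names are hypothetical. Blank subtype annotations are skipped, matching the statement that such sequences are not reported.

```python
from collections import Counter

# Assumed mapping, limited to the examples given in the text.
SUB_SUBTYPE_PARENT = {"A1": "A", "A2": "A", "F1": "F", "F2": "F"}


def lump_subtype(annotation):
    """Return the label a sequence is counted under, or None if it is blank."""
    label = annotation.strip()
    if not label:
        return None  # no subtype annotation: excluded from the output
    # Only exact sub-subtype labels are lumped; recombinant labels such as
    # "BF1" are not in the table and therefore stay separate.
    return SUB_SUBTYPE_PARENT.get(label, label)


def subtype_counts(annotations):
    """Count sequences per reported subtype label."""
    counts = Counter()
    for annotation in annotations:
        label = lump_subtype(annotation)
        if label is not None:
            counts[label] += 1
    return counts


example = ["A", "A1", "A2", "F1", "BF", "BF1", "BF2", "", "C"]
counts = subtype_counts(example)
print(counts)
```

In this example the three A-family annotations are counted together as “A”, while “BF”, “BF1”, and “BF2” each keep their own count and the blank annotation is dropped.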

Additional resources

Other sites that provide useful information on HIV prevalence by geographic region (though not by subtype) are the website of the US Census Bureau and the UNAIDS AIDS Epidemic Update.

last modified: Thu Jun 5 08:51 2008


Questions or comments? Contact us at seq-info@lanl.gov.