HIV Databases
HIV sequence database



HIV and SIV Nomenclature

Overview

Currently defined subtypes are:

Nomenclature within the HIV-1 M Group

M group Subtypes

The HIV-1 M group subtypes are phylogenetically associated groups, or clades, of HIV-1 sequences, and are labeled A1, A2, B, C, D, F1, F2, G, H, J and K (A1 and A2, and F1 and F2, are sub-subtypes of subtypes A and F, respectively). Throughout their genomes, the sequences within any one subtype or sub-subtype are more similar to each other than to sequences from other subtypes. These subtypes represent different lineages of HIV and have some geographical associations.

Although the subtyping system has many ambiguities, it describes genetic clustering patterns and provides a useful way of organizing viruses by genetic similarity. Because the subtypes were originally defined from fragments of the HIV-1 genome only, in some cases no intact prototype sequence was available (for example, the former subtypes E and I, both of which are now defined as circulating recombinant forms). In these cases there was scant information to distinguish parental from recombinant forms of the virus, and the nomenclature was controversial. In addition, viruses within one subtype may be evolving at different rates, and rates may also differ between subtypes, especially in some limited regions of the genome. As new sequences and better analysis tools become available, some of these ambiguities may be resolved, and the subtyping nomenclature is itself evolving as we learn more.

We attempt to keep our database up to date with the consensus of the HIV research community as we perceive it, so the nomenclature in the database is not static. Each year we gather a set of subtype reference sequences: complete (or near-complete) genomes considered representative of all the subtypes and circulating recombinant forms of the HIV-1 M group, together with isolates from HIV-1 groups N and O. Subtype Reference Alignments of each gene, and of the complete genomes, of these reference sequences are also available.

We have created a web-accessible program called SUDI for testing whether a newly sequenced, non-recombinant genome fits the criteria of a "new subtype". The tool is designed to be used after phylogenetic analyses of subgenomic regions, and other methods such as bootscanning or RIP (the Recombinant Identification Program), have established that the genome is equidistant from the currently defined subtypes over its entire length, i.e. that it is not a recombinant of existing subtypes or CRFs.

M Group Recombinants and Circulating Recombinant Forms

All retroviruses have a propensity to recombine with other relatively closely related retroviruses, and HIVs and SIVs are no exception. The viral genome is packaged as two copies of single-stranded RNA (ssRNA, not to be confused with double-stranded RNA). If a cell is infected with two different viral genomes (from the same or different strains), the odds are good that some of the virions it produces will package one copy from each of the two viruses. When such a virion infects a new cell, the viral reverse transcriptase can engage in "template switching", hopping from one of the co-packaged genomes to the other during reverse transcription. If the two strains belong to different subtypes of the HIV-1 M group, the result can be a mosaic genome composed of regions from each of the two subtypes.
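As a rough illustration of how template switching yields a mosaic genome, the following Python sketch copies two aligned parental sequences and randomly hops between them. It is a toy model, not a description of how the HIV Database or any of its tools works: the helper `template_switch` and its `switch_prob` and `seed` parameters are hypothetical, the parents are assumed to be pre-aligned strings of equal length, and switching is assumed to be equally likely at every position, which ignores the sequence-similarity requirements of real template switching.

```python
import random


def template_switch(parent_a: str, parent_b: str, switch_prob: float, seed: int = 0):
    """Toy model of reverse-transcriptase template switching (hypothetical helper).

    Copies a mosaic genome position by position, starting on parent_a and
    hopping to the other co-packaged template with probability switch_prob
    before each position after the first. Returns the mosaic sequence and the
    positions at which a switch occurred (the recombination breakpoints).
    """
    # Assumption: the two parental genomes are already aligned to equal length.
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    rng = random.Random(seed)  # seeded so runs are reproducible
    templates = (parent_a, parent_b)
    current = 0  # index of the template currently being copied
    mosaic = []
    breakpoints = []
    for i in range(len(parent_a)):
        # Assumption: switching is uniform along the genome; real switching
        # depends on local sequence similarity, which is not modeled here.
        if i > 0 and rng.random() < switch_prob:
            current = 1 - current
            breakpoints.append(i)
        mosaic.append(templates[current][i])
    return "".join(mosaic), breakpoints


if __name__ == "__main__":
    # With switch_prob=1.0 the copy hops to the other template at every position after the first.
    print(template_switch("AAAA", "bbbb", 1.0))  # ('AbAb', [1, 2, 3])
```

Because the example parents are uniform strings, each breakpoint shows up directly as a change of letter in the output. With real sequences the breakpoints are not labeled, which is why methods such as bootscanning or RIP are needed to infer them.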

Inter-subtype recombinant genomes are common, but many of them are found only in the single dually (or multiply) infected patient in which they arose. If an inter-subtype recombinant virus is transmitted from one patient to others and becomes one of the circulating strains in the HIV epidemic, it can be classified as a "circulating recombinant form" (CRF).

The circulating recombinant forms are labeled with numbers rather than letters, and numbered in the order in which they were first adequately described in the peer-reviewed literature.

How the HIV database classifies sequences

The classification and naming of sequences, CRFs, and recombinants is fairly complicated. We have a separate page to explain how the HIV Database Classifies Subtypes.

 

HIV-1 N and O Groups

HIV-1 groups M, N and O, as well as chimpanzee and gorilla SIVs, are all part of the same SIVcpzPtt radiation within the primate lentiviruses. Group M is the "main" group of viruses in the global HIV-1 pandemic; it contains the multiple subtypes and recombinant forms described above and is thought to have originated in a cross-species transmission from chimpanzees to humans. Group N is a very distinctive form of the virus that has been identified in only a few individuals in Cameroon. It is sometimes referred to as "Not-M, Not-O" or as the "new" group, and it is also thought to have originated in a zoonotic transmission from chimpanzees. HIV-1 group O, sometimes called the "outlier" group, contains very diverse viruses, like group M, but is still found relatively rarely. It is thought to have originated in a transmission to humans from wild gorilla populations (Van Heuverswyn, Nature 444:164 (2006)). After each cross-species transmission event, diversification within the group began once the transmitted virus started to expand in the human population.

 

HIV-2

HIV-2 is very distinct from HIV-1. While HIV-1 is most closely related to SIVs from chimpanzees, HIV-2 is closely related to SIVs isolated from sooty mangabeys. No sooty mangabey virus with a sequence falling within the HIV-2 A, B, C, F or G clades (formerly referred to as "subtypes", now referred to as "groups") has yet been found, but sooty mangabey viruses very similar to HIV-2 sequences have been sequenced within the D and E clades. It thus appears that each group of HIV-2 represents at least one separate sooty mangabey-to-human transmission event.

 

SIVs

Simian immunodeficiency viruses are very diverse: their genomic sequences are far more diverse than the genomes of the hosts that carry them. In general, it seems that each simian species known to carry an immunodeficiency virus carries its own clade of virus; exceptions to this general rule are believed to provide evidence of cross-species transmission events, both in the wild and in captivity.

Developing a biologically relevant and human-friendly nomenclature system for the HIVs and SIVs is an ongoing process: the system must change as we learn more about the viruses, and proposed changes must be communicated to the research community so that old names can be replaced with new ones without too much confusion. For the time being, the nomenclature, and the fields used to store it in the HIV Sequence Database, are not always biologically relevant. For example, the use of the subtype field in our database to organize clades of HIV-1, HIV-2 and various SIVs does not imply that the subtypes of HIV-1 are equivalent to the groups of HIV-2, nor that the common ancestor lived in the same species in each case.

Currently, there are numerous inconsistencies in the nomenclature and storage format in our database. For example, the subtypes of African green monkey SIVs (VERVET, TANTALUS, GRIVET and SABAEUS) are so named because each was isolated from a different subspecies of African green monkey, but the chimpanzee SIVs, which also come from different subspecies of chimpanzees (Pan troglodytes troglodytes and Pan troglodytes schweinfurthii), are all listed as subtype CPZ. We maintain a list of non-human primate species from which lentiviruses have been isolated (see Overview of Subtypes of Primate Immunodeficiency Viruses).

The subtype listed for these viruses is the species in which the virus was first isolated, or in which it was first passaged. For example, the SIV-SMM-PBj viruses have been extensively passaged in rhesus macaques, but they were all derived from SIV-SMM-SMM9, which was first isolated from a sooty mangabey. The SIV-STM-STM virus was first isolated from a stump-tailed macaque in 1989, eight years after other stump-tailed macaques were imported to the Yerkes primate center from a California primate center in 1981. Sooty mangabeys were not housed at the Yerkes center, and an earlier outbreak of immunodeficiency at the California center (in 1977) had given rise to the series of viruses from which the entry with accession number X60667 derives.

last modified: Tue Apr 22 12:25 2008


Questions or comments? Contact us at seq-info@lanl.gov.