Currently defined subtypes are:
Subtypes and sub-subtypes of the HIV-1 M group are thought to have diverged in humans, following a single chimpanzee-to-human transmission event. Circulating Recombinant Forms (CRFs) are recombinant HIV-1 genomes that have infected three or more persons who are not epidemiologically related, so they can be assumed to make an epidemiologically relevant contribution to the HIV-1 M group epidemic. The nomenclature for HIV-1 subtypes and CRFs was revised in the fall of 1999; the results of that revision were published as the HIV-1 Nomenclature Proposal 1999. We maintain a set of HIV-1 Subtype Reference Alignments: sets of complete genomes of the "reference strains" of HIV-1, with representatives of each subtype and CRF. These sets are updated yearly.
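The CRF criterion above ("three or more persons who are not epidemiologically related") can be made concrete with a small sketch. It assumes a hypothetical data model that is not part of the HIV database: epidemiological links are recorded as pairs of patient IDs, and two patients count as related if any chain of links connects them. Counting the unlinked clusters then tells us whether the genome has been seen in at least three unrelated persons.

```python
"""Sketch: does a recombinant genome meet the CRF "three unlinked persons" rule?

Hypothetical model (an assumption, not the database's procedure): links are
(patient, patient) pairs, and patients joined by any chain of links are related.
"""


def count_unlinked_clusters(patients, links):
    """Return the number of epidemiologically unlinked clusters (union-find)."""
    parent = {p: p for p in patients}

    def find(p):
        # Walk to the cluster root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters
    return len({find(p) for p in patients})


def meets_crf_criterion(patients, links, minimum=3):
    """True if the genome was found in >= `minimum` mutually unlinked persons."""
    return count_unlinked_clusters(patients, links) >= minimum


# Hypothetical example: four patients, P1 and P2 are known to be linked.
patients = ["P1", "P2", "P3", "P4"]
links = [("P1", "P2")]
print(meets_crf_criterion(patients, links))  # clusters {P1, P2}, {P3}, {P4}
```

Treating linkage as transitive is a design choice of this sketch: it is the conservative reading, since a chain of transmissions still makes two persons epidemiologically related.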
Subtypes within the HIV-1 O group are not yet clearly defined. The diversity of sequences within the HIV-1 O group is nearly as great as the diversity of sequences in the HIV-1 M group, but a phylogenetic analysis of the gag and env genes does not reveal clades of virus as clearly as the clades detected in the HIV-1 M group.
Subtypes within the HIV-1 N group are not yet clearly defined. Very few isolates have been identified and sequenced.
As of September 2001, the clades formerly known as HIV-2 subtypes are now known as groups. The HIV Nomenclature Committee made this change because sequences from these clades are nearly as distant from one another as are sequences from the M, N and O groups of HIV-1, and because both the sequence diversity and the epidemiology of HIV-2 suggest that each clade resulted from a separate sooty mangabey-to-human transmission event. For groups D and E at least, the HIV-2 strains found within a geographic region are documented to be more similar to SIV-SMMs from that region than to HIV-2 from other regions or groups.
Lentiviruses have now been isolated from many different non-human primate species, all with natural ranges on the African continent. New World and Asian primates have not been found to be naturally infected with lentiviruses. Because only a few viral isolates and sequences have been obtained from each non-human primate species, the lentivirus "species" is currently stored in the "subtype" field of the HIV database.
M group Subtypes
The HIV-1 M group subtypes are phylogenetically associated groups, or clades, of HIV-1 sequences, and are labeled A1, A2, B, C, D, F1, F2, G, H, J and K. Throughout their genomes, the sequences within any one subtype or sub-subtype are more similar to each other than to sequences from other subtypes. These subtypes represent different lineages of HIV, and have some geographical associations. Although there are many ambiguities in the subtyping system, it describes genetic clustering patterns and provides a useful system for organizing viruses by genetic similarity. Because the subtypes were originally defined using only fragments of the HIV-1 genome, in some cases no intact prototype sequence was available (for example, the former subtypes E and I, both of which are now defined as circulating recombinant forms). In these situations there was scant information to differentiate between parental and recombinant forms of the virus, and the nomenclature was controversial. Additionally, viruses within one subtype may be evolving at different rates, and rates may also differ between subtypes, especially in some limited regions of the genome. As new sequences and better analysis tools become available, some of these ambiguities may be resolved, and the subtyping nomenclature system is itself evolving as we learn more. We attempt to keep our database up to date with the HIV research community consensus as we perceive it; therefore the nomenclature in the database is not static. Each year we gather a set of subtype reference sequences that are considered representative complete (or near-complete) genomes of all of the subtypes and circulating recombinant forms of the HIV-1 M group, along with isolates from the HIV-1 N and O groups. Subtype Reference Alignments of each gene, and of the complete genomes of these subtype reference sequences, are also available.
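The idea that sequences within a subtype are more similar to each other than to other subtypes can be illustrated with the simplest possible similarity measure, the p-distance (fraction of differing aligned sites). This is only a sketch: it assumes the sequences are already aligned to equal length with "-" marking gaps, and the three toy "references" are invented 12-site strings whose labels merely mimic subtype names. Real subtype assignment relies on phylogenetic analysis against the reference alignments, not on this shortcut.

```python
"""Sketch: assign a query to its nearest reference by p-distance.

Assumptions: sequences are pre-aligned to equal length, '-' marks a gap, and
the toy references are hypothetical, not real HIV-1 subtype references.
"""


def p_distance(seq_a, seq_b):
    """Fraction of differing sites, skipping columns where either has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differing / compared


def closest_reference(query, references):
    """Return (label, distance) for the reference nearest to the query."""
    return min(
        ((label, p_distance(query, ref)) for label, ref in references.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical 12-site toy alignment.
references = {
    "A1": "ATGGCACTAGCT",
    "B": "ATGACACTGGCA",
    "C": "TTGGCGCTAGCA",
}
query = "ATGGCACTAGCA"
print(closest_reference(query, references))  # nearest: A1 (1 of 12 sites differ)
```

A single whole-sequence distance like this cannot detect mosaic genomes, which is one reason the subtype definitions above are stated "throughout their genomes".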
We have created a web-accessible program called SUDI for testing whether a newly sequenced, non-recombinant genome fits the criteria of a "new subtype". This tool is designed to be used after phylogenetic analyses of subgenomic regions, and other methods such as bootscanning or RIP, have been used to determine that the genome is equidistant from the currently defined subtypes over its entire length, i.e. that it is not a recombinant of existing subtypes or CRFs.
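To make the prerequisite check concrete ("not a recombinant of existing subtypes or CRFs"), here is a simplified sliding-window scan: for each window along the genome it records which reference is nearest, and flags positions where that nearest reference changes. This is a plain distance-based stand-in written for illustration; it is NOT the SUDI, bootscanning or RIP algorithm. It assumes aligned, ungapped, equal-length sequences, and the two references are invented toy strings.

```python
"""Sketch: sliding-window nearest-reference scan to flag possible recombinants.

A distance-only stand-in, NOT SUDI, bootscanning or RIP. Assumes aligned,
ungapped, equal-length sequences; the toy references are hypothetical.
"""


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def window_nearest(query, references, window, step):
    """Return (window_start, nearest_reference_label) for each window."""
    calls = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        segment = query[start:end]
        # Ties go to whichever label comes first in `references`.
        nearest = min(
            references,
            key=lambda label: p_distance(segment, references[label][start:end]),
        )
        calls.append((start, nearest))
    return calls


def breakpoints(calls):
    """Window starts at which the nearest reference changes."""
    return [
        start
        for (_, previous), (start, current) in zip(calls, calls[1:])
        if current != previous
    ]


references = {
    "A1": "ACGTACGTACGTACGTACGT",
    "B": "TGCATGCATGCATGCATGCA",
}
# Hypothetical query: first half copied from "A1", second half from "B".
query = references["A1"][:10] + references["B"][10:]

calls = window_nearest(query, references, window=5, step=5)
print(calls)
print("possible recombinant" if breakpoints(calls) else "one nearest reference throughout")
```

A switch in the nearest reference marks a candidate recombinant; the absence of a switch is only consistent with, not proof of, a non-recombinant genome, which is why the text calls for several independent methods before a new subtype is considered.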
M Group Recombinants and Circulating Recombinant Forms
All retroviruses have a propensity to recombine with other relatively closely related retroviruses, and HIVs and SIVs are no exception. The viral genome is packaged as two copies of single-stranded RNA (ss-RNA, not to be confused with ds-RNA), and if a cell is infected with two different viral genomes (from the same or different strains), the odds are good that some virions will package one copy from each of those two viruses. If the two strains belong to different subtypes of the HIV-1 M group, the result can be a mosaic genome composed of regions from each of the two subtypes. This happens because, after the co-packaged genomes enter a new cell, the viral reverse transcriptase engages in "template switching" during reverse transcription, hopping from one of the packaged genomes to the other.
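The template-switching mechanism described above can be pictured with a toy simulation: copying proceeds site by site along one of two co-packaged genomes and, with a small probability at each site, hops to the other template. This is a cartoon under stated assumptions (equal-length aligned parents and a uniform, hypothetical per-site switch probability); real reverse transcription copies RNA into DNA, and switching is not uniform along the genome. The parental "genomes" are strings of the letters A and B so the origin of each segment of the mosaic is visible.

```python
"""Toy model of reverse-transcriptase template switching (a cartoon, not biology).

Assumptions: the two co-packaged genomes are aligned and equal-length, and a
switch happens at each site with a uniform, hypothetical probability.
"""

import random


def template_switch(genome_a, genome_b, switch_rate, seed=None):
    """Copy from one of two templates, hopping to the other at random.

    At every site after the first, with probability `switch_rate`, copying
    moves to the other template before that site is copied.
    Returns the mosaic genome and the sites where switches happened.
    """
    if len(genome_a) != len(genome_b):
        raise ValueError("toy model assumes aligned, equal-length genomes")
    rng = random.Random(seed)
    templates = (genome_a, genome_b)
    current = rng.randrange(2)  # start on either packaged genome
    mosaic, switches = [], []
    for i in range(len(genome_a)):
        if i > 0 and rng.random() < switch_rate:
            current = 1 - current
            switches.append(i)
        mosaic.append(templates[current][i])
    return "".join(mosaic), switches


# Letters mark which parent each site came from (hypothetical subtypes A and B).
parent_a = "A" * 40
parent_b = "B" * 40
mosaic, switches = template_switch(parent_a, parent_b, switch_rate=0.05, seed=7)
print(mosaic)
print("template switches at sites:", switches)
```

With real parental sequences, a switch is only detectable where the two parents differ, which is why recombination between closely related strains of the same subtype is much harder to see than inter-subtype recombination.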
Inter-subtype recombinant genomes are common, but many of them are found only in the single dually infected (or multiply infected) patient in whom they arose. If an inter-subtype recombinant virus is transmitted from that patient to others and becomes one of the circulating strains in the HIV epidemic, it can be classified as a "circulating recombinant form (CRF)".
The circulating recombinant forms are labeled with numbers rather than letters, and numbered in the order in which they were first adequately described in the peer-reviewed literature.
How the HIV database classifies sequences
The classification and naming of sequences, CRFs, and recombinants is fairly complicated; a separate page explains how the HIV Database classifies subtypes.
HIV-1 groups M, N and O, as well as the chimpanzee and gorilla SIVs, are all part of the same SIVcpzPtt radiation within the primate lentiviruses. Group M is the "main" group of viruses in the global HIV-1 pandemic; it contains the multiple subtypes and recombinant forms described above and is thought to have originated in a cross-species transmission from chimpanzees to humans. Group N is a very distinctive form of the virus that has been identified in only a few individuals in Cameroon. N is sometimes read as "Not-M, Not-O" and sometimes as the "new" group, and it too is thought to have originated in a chimpanzee zoonosis. HIV-1 group O, sometimes referred to as the "outlier" group, contains very diverse viruses, like group M, but is still relatively rarely found. It is thought to have originated in a transmission to humans from wild gorilla populations (Van Heuverswyn, Nature 444:164 (2006)). Intra-group diversification begins once the transmitted virus starts to expand in the human population after each interspecies transfer event.
HIV-2 is very distinct from HIV-1. While HIV-1 is most closely related to SIVs from chimpanzees, HIV-2 is closely related to SIVs isolated from sooty mangabeys. No sooty mangabey virus with a sequence falling within the HIV-2 A, B, C, F or G clades (formerly referred to as "subtypes", now referred to as "groups") has yet been found, but within the D and E clades, sooty mangabey viruses have been sequenced that are very similar to HIV-2 sequences. It thus appears that each group of HIV-2 represents at least one separate sooty mangabey-to-human transmission event.
Simian immunodeficiency viruses are very diverse. Their genomic sequences are far more diverse than the genomes of the hosts that carry them. In general, each simian species known to carry an immunodeficiency virus seems to carry its own clade of virus, and exceptions to this general rule are believed to provide evidence of cross-species transmission events, both in the wild and in captivity.
Developing a biologically relevant and human-friendly nomenclature system for the HIVs and SIVs is an ongoing process: as we learn more about the viruses, we need time to inform the research community of proposed changes in nomenclature, so that old names can be replaced with new ones without too much confusion. For the time being, the nomenclature and the fields used to store it in the HIV Sequence Database are not always biologically relevant. For example, the use of the subtype field in our database to organize clades of HIV-1, HIV-2 and various SIVs does not imply that the subtypes of HIV-1 are equivalent to the groups of HIV-2, nor that the common ancestor lived in the same species in each case.
Currently, there are numerous inconsistencies in the nomenclature and storage format in our database. For example, the subtypes of African green monkey SIVs (VERVET, TANTALUS, GRIVET and SABAEUS) are so named because each was isolated from a different subspecies of African green monkey, but the chimpanzee SIVs, which also come from different subspecies of chimpanzees (Pan troglodytes troglodytes and Pan troglodytes schweinfurthii), are all listed as subtype CPZ. We maintain a list of non-human primate species from which lentiviruses have been isolated (see Overview of Subtypes of Primate Immunodeficiency Viruses).
The chimpanzee sequences are currently grouped in one subtype (CPZ), but they come from at least two different chimpanzee subspecies. The subspecies of the common chimpanzee are Pan troglodytes troglodytes, Pan troglodytes schweinfurthii, Pan troglodytes verus and Pan troglodytes vellerosus; the pygmy chimpanzee, or bonobo (Pan paniscus), is a separate species. The SIV-CPZ-US, SIV-CPZ-CAM3, SIV-CPZ-CAM5 and SIV-CPZ-GAB genomes are all derived from Pan troglodytes troglodytes. The SIV-CPZ-ANT genome is from a Pan troglodytes schweinfurthii. The paper that described the sequencing of the SIV-CPZ-US genome has a good discussion of the subspecies of chimpanzees, their viruses and their geographic ranges. In addition, papers published in 2006 describing SIV from gorillas and SIV from Cameroonian chimpanzees provided yet more evidence of the relationships of the HIV-1 M, N and O groups to primate lentiviruses.
The SIV-AGMs are subdivided into subtypes based on the subspecies of African green monkey: GRIVET (Cercopithecus aethiops aethiops), VERVET (Cercopithecus aethiops pygerythrus), TANTALUS (Cercopithecus aethiops tantalus) and SABAEUS (Cercopithecus aethiops sabaeus). The BABOON subtype viruses were isolated from wild-caught chacma baboons (Papio ursinus) and yellow baboons (Papio hamadryas cynocephalus) that had been infected in the wild (in South Africa and Tanzania, respectively) with SIVs that cluster with the vervet African green monkey SIVs.
The SIVs isolated from sooty mangabeys and macaque species are all known to be derived from sooty mangabey viruses, because wild macaques have been extensively studied and found to be seronegative. The macaques became infected via contact with sooty mangabeys in captivity, mostly in primate centers in the USA. There are currently three major lineages of these viruses with sequences in the database: 1) SIV-MAC-251 and viruses known to be derived from SIV-MAC-251, SIV-MAC142 and SIV-MNE-MNE. 2) SIV-SMM-SMM9 and viruses derived from SIV-SMM-SMM9 (notably the PBj series), SIV-SMM-F236 and SIV-SMM-PGM. 3) SIV-STM-STM. It is clear from the intermingling of sequences from sooty mangabeys and macaques that several cross-species transmissions took place relatively recently, at least in the latter half of the 20th century.
The subtype listed for these viruses is the species from which the virus was first isolated, or in which it was first passaged. For example, the SIV-SMM-PBj viruses have been extensively passaged in rhesus macaques, but they were all derived from SIV-SMM-SMM9, which was first isolated from a sooty mangabey. The SIV-STM-STM virus was first isolated from a stump-tailed macaque in 1989, eight years after other stump-tailed macaques were imported to the Yerkes primate center from a California primate center in 1981. Sooty mangabeys were not housed at the Yerkes center, and a previous outbreak (in 1977) of immunodeficiency at the California center had given rise to the series of viruses to which the entry with accession number X60667 belongs.