Data Dictionary

HIV Sequence Database Structure


  A  B  C  D  E  F  G  H 
 1  Serial number Table name Table definition Field name Field definition Source Field type * Details
 2  1 Sequence sample(SSAM) Information about sample from which sequence originated SE id(SSAM) Integer sequence ID DB-generated Int  
 3        PAT id(SSAM) Integer patient ID DB-generated Int  
 4        Name Sequence name Curation or GB record Text  
 5        Locus name Locus name GB record Text  
 6        Isolate name Isolate name GB record Text  
 7        Clone name Clone name GB record Text  
 8        Georegion Geographical region of sample origin Generated from Country CV Africa,
Sub-Saharan Africa,
Central America,
Former USSR,
Middle East,
North America,
South America
 9        Country Sampling country Curation or GB record CV ISO 2-letter country codes
 10        Sampling city Sampling city or region Curation or GB record Text  
 11        Sampling year Sampling date (only month and year will be displayed) Curation or GB record Date  
 12        Sampling year upper Sampling year upper bound if range Curation or GB record Date  
 13        Patient Age Patient age in days at time of sampling Curation Int  
 14        Patient health Health status at time of sampling Curation CV acute infection,
 15        Organism Virus species GB record CV HIV-1,
synthetic DNA
 16        Subtype Subtype clade of virus Curation or GB record CV  
 17        Phenotype Syncytium inducing phenotype Curation CV SI,
 18        Coreceptor Co-receptor(s) used Curation Text Space separated list of co-receptors
 19        Sample tissue Sample tissue ("body part") from which virus was derived Curation or GB record CV List available in Main Search Interface (under "More sequence information")
 20        Culture method Was virus cultured before isolation? Curation CV Cultured,
 21        Molecule type Where the virus was isolated from Curation or GB record CV DNA,
 22        Drug naive Was patient treated before sample was taken? Curation Boolean Yes,
 23        Problematic Is there a problem with the sequence? Curation or DB-generated Int / CV N: Non-ACTG characters,
C: Contaminant,
H: Hypermutant,
S: Synthetic,
D: Deletion,
T: Tiny,
R: Reverse complement
 24        Viral load HIV viral load at time of sample Curation Int  
 25        CD4 count CD4 count at time of sample Curation Int  
 26        CD8 count CD8 count at time of sample Curation Int  
 27        Days from infection Number of days between time of infection and time sample was taken Curation Int  
 28        Days from seroconversion Number of days between patient’s seroconversion and day sample was taken Curation Int  
 29        Days from first sample Number of days between first sample and current sample Curation Int  
 30        Sequencing method Denotes if the sample was cloned or sequenced directly Curation CV Clone,
 31        Amplification strategy Denotes how the sample was amplified before sequencing Curation CV bulk,
limiting dilution PCR
 32        Fiebig stage The stage of early HIV infection Curation CV Stages described in Search Help
 33        Annotated Denotes if the record has ever been manually curated Curation Boolean True, false
 34        Days from treatment start Number of days between treatment start and sample date Curation Int  
 35        Days from treatment end Number of days between treatment end and sample date Curation Int  
 36  2 Patient(PAT) Information about patient PAT id Integer patient ID DB-generated Int  
 37        Patient code Code or name for patient in publication Curation Text  
 38        Patient sex Patient sex Curation CV M or F
 39        Risk factor Probable route of infection Curation CV SB: bisexual,
PB: blood transfusion,
EX: experimental,
PH: hemophiliac,
SH: heterosexual,
SW: sex worker,
SG: homosexual,
SU: sexual undescribed,
PI: IV drug user,
SM: male sex with male,
MB: mother-baby,
NO: nosocomial,
OT: other,
NR: not recorded
 40        Infection country Infection country if different from sampling country Curation CV ISO 2-letter country codes
 41        Infection city Infection city or region if different from sampling city or region Curation Text  
 42        Infection year Infection date (only month and year displayed) Curation Date  
 43        Patient comment Comments about patient Curation Text  
 44        HLA type Any information about the patient's HLA types Curation Text  
 45        Project Project or cohort in which the patient is enrolled Curation CV List available in Main Search Interface (under "Patient information")
 46        Patient ethnicity Ethnicity of patient Curation CV  
 47        Progression Rate of progression of the patient Curation CV EC: elite controller,
LTNP: long-term non-progressor,
SP: slow progressor,
RP: rapid progressor,
P: progressor
 48        # of patient seqs # of sequences linked to patient DB-generated Int To find patients who have more than N sequences
 49        # of patient timepoints # of timepoints available from this patient DB-generated Int To find patients with longitudinal data
 50        Host species SIV host species Curation Text  
 51  3 Accession(SA)   SE id(SA) Integer sequence ID DB-generated Int  
 52        Accession GenBank Accession GB record Text  
 53        GI number GI number GB record Int  
 54        Version Version name GB record Text  
 55  4 Map Image(MI) Information about sequence co-ordinates Map image(SE id) Integer sequence ID DB-generated Int These coordinates are the HXB2 or Mac239 coordinates.
 56        MI start Start position Imported Int  
 57        MI stop Stop position Imported Int  
 58  5 Sequence Map(SM) Information about sequence co-ordinates SE id(SM) Integer sequence ID DB-generated Int System of internal database coordinates; the SM fields are not included in the Advanced Search
 59        SM start Start position Imported Int  
 60        SM stop Stop position Imported Int  
 61  6 Sequence Entry(SE) Information about a sequence obtained from GenBank SE id Integer sequence ID DB-generated Int  
 62        Sequence length Number of nucleotides GB record Int  
 63        GB comment Comment from GB GB record Text  
 64        DB comment Comment from HIV DB staff Curation Text  
 65        Sequence Actual sequence GB record Text  
 66        GB create date GB create date GB record Date  
 67        GB update date GB update date GB record Date  
 68  7 Publication Links(SPL) Information to link publication and sequence SE id(SPL) Integer sequence ID DB-generated Int  
 69        PUB id(SPL) Integer publication ID DB-generated Int  
 70        Publication number Publication number DB-generated Int  
 71  8 Publication(PUB) Information about publication that describes sequence PUB id(SPL) Integer publication ID DB-generated Int  
 72        Pubmed ID Pubmed ID of a published paper Curation or GB record Int  
 73        Title Title of the publication GB record Text  
 74        Journal Journal name GB record Text  
 75        Consortium Consortium name GB record Text  
 76  9 Person(PER) Information about authors listed on publication PER id Integer person ID DB-generated Int  
 77        Last name Last name of the author GB record Text  
 78  10 Author(AU) Information to link publication and author PUB id(AU) Integer publication ID DB-generated Int  
 79        PER id(AU) Integer person ID DB-generated Int  
 80        Author number Author number DB-generated Int  
 81  11 Sequence Entry Feature(SEF) Information about a sequence entry feature SE id(SEF) Integer sequence ID DB-generated Int  
 82        Feature type(SEF) Sequence entry feature type GB record CV  
 83        Description(SEF) Sequence entry feature description GB record Text  
 84        PUB id(SEF) Integer publication ID DB-generated Int  
 85  12 Location(LOC) Information about a feature that has a location in the sequence LOC id Integer location ID DB-generated Int  
 86        SE id(LOC) Integer sequence ID DB-generated Int  
 87        Feature type(LOC) Location feature type GB record CV  
 88        Description(LOC) Location description GB record Text  
 89  13 Sequence Feature(SF) Information about a sequence feature LOC id(SF) Integer location ID DB-generated Int  
 90        Feature type(SF) Sequence feature type GB record CV  
 91        Feature value(SF) Sequence feature value GB record Text  
 92  14 Cluster(CLU) Information about a cluster, which is a group of epidemiologically linked patients CLU id(CLU) Integer cluster ID DB-generated Int  
 93        Cluster name Name assigned to each linked cluster of patients Curation Text List available in Main Search Interface (under "Patient information", "More patient information")
 94        Cluster description Comments describing cluster Curation Text  
 95        PUB id(CLU) Integer publication ID associated with cluster DB-generated Int  
 96  15 Cluster Link(CPL) Information to link cluster to patients CLU id(CPL) Integer cluster ID DB-generated Int  
 97        PAT id(CPL) Integer patient ID DB-generated Int  
 98        Cluster transmission type Mode(s) of viral transmission among patients in cluster Curation Text SB: bisexual,
PB: blood transfusion,
EX: experimental,
PH: hemophiliac,
SH: heterosexual,
SG: homosexual,
SU: sexual undescribed,
PI: IV drug user,
SM: male sex with male,
MB: mother-baby,
NO: nosocomial,
OT: other,
NR: not recorded

* (CV = Code Value)
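
The dictionary above implies how the tables relate: Sequence sample (SSAM) carries both an SE id and a PAT id, Accession (SA) shares the SE id, and Publication Links (SPL) ties an SE id to a PUB id. The Python sketch below illustrates those links with an in-memory SQLite database. It is only an illustration under stated assumptions: the snake_case table and column names, the function names, and every demo row (patient codes, accessions, title) are hypothetical and are not taken from the HIV Sequence Database. Only the Risk factor code values are copied from the table above.

```python
import sqlite3

# Code values copied from the "Risk factor" row of the data dictionary.
RISK_FACTOR = {
    "SB": "bisexual", "PB": "blood transfusion", "EX": "experimental",
    "PH": "hemophiliac", "SH": "heterosexual", "SW": "sex worker",
    "SG": "homosexual", "SU": "sexual undescribed", "PI": "IV drug user",
    "SM": "male sex with male", "MB": "mother-baby", "NO": "nosocomial",
    "OT": "other", "NR": "not recorded",
}


def build_demo_db():
    """Create a tiny in-memory database that mimics five of the tables.

    Table and column names are hypothetical snake_case stand-ins for the
    dictionary's field names; all rows are invented demo data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pat  (pat_id INTEGER PRIMARY KEY, patient_code TEXT,
                           risk_factor TEXT);
        CREATE TABLE ssam (se_id INTEGER PRIMARY KEY,
                           pat_id INTEGER REFERENCES pat(pat_id),
                           name TEXT, country TEXT);
        CREATE TABLE sa   (se_id INTEGER REFERENCES ssam(se_id),
                           accession TEXT);
        CREATE TABLE pub  (pub_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE spl  (se_id INTEGER, pub_id INTEGER,
                           publication_number INTEGER);
    """)
    conn.executemany("INSERT INTO pat VALUES (?, ?, ?)",
                     [(1, "P-demo-1", "SH"), (2, "P-demo-2", "MB")])
    conn.executemany("INSERT INTO ssam VALUES (?, ?, ?, ?)",
                     [(10, 1, "demo_seq_A", "US"), (11, 2, "demo_seq_B", "ZA")])
    conn.executemany("INSERT INTO sa VALUES (?, ?)",
                     [(10, "DEMO0001"), (11, "DEMO0002")])
    conn.execute("INSERT INTO pub VALUES (?, ?)", (100, "Demo publication"))
    conn.executemany("INSERT INTO spl VALUES (?, ?, ?)",
                     [(10, 100, 1), (11, 100, 1)])
    return conn


# SE id joins sample, accession and publication link; PAT id joins the patient.
QUERY = """
SELECT sa.accession, ssam.country, pat.patient_code, pat.risk_factor, pub.title
FROM ssam
JOIN pat ON pat.pat_id = ssam.pat_id
JOIN sa  ON sa.se_id   = ssam.se_id
JOIN spl ON spl.se_id  = ssam.se_id
JOIN pub ON pub.pub_id = spl.pub_id
ORDER BY sa.accession, spl.publication_number
"""


def sequences_with_context(conn):
    """Return one dict per sequence/publication pair, decoding the risk factor."""
    rows = conn.execute(QUERY).fetchall()
    return [
        {
            "accession": accession,
            "country": country,
            "patient_code": patient_code,
            # Unknown codes are passed through unchanged rather than guessed.
            "risk_factor": RISK_FACTOR.get(code, code),
            "publication": title,
        }
        for accession, country, patient_code, code, title in rows
    ]
```

Calling `sequences_with_context(build_demo_db())` returns one record per sequence and linked publication, with the two-letter risk factor code replaced by its description. The same lookup-dictionary approach would apply to the other coded fields, such as Problematic or Progression.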