HIV sequence database

Advanced search interface help

This interface dynamically reads the schema of the database and generates a graphical overview of the tables and fields. You can use this overview to generate your own custom-made search interface (click here to return to Advanced Search). You can get information and examples of the content of the tables and fields by mousing over or clicking on the table names. For specific information about the contents of each field, see Regular Search Help.

Advantages of advanced search interface

Advanced search allows additional types of searches not possible on the regular interface. As one example, you could restrict your search to include only samples from non-drug-naive patients.
You can search and view data from all available fields, including multiple fields that appear under "Other Fields" in the regular interface.
You can specifically select sequences that have a "null" value (no data) in specific fields. If you want to exclude such sequences, include the "Problematic" field in your search.

Limitations of advanced search interface

The search output does not directly interface with TreeBuilder.
Options for searching for a specific genomic region are limited (as explained below).
Problematic sequences are not removed by default.

Other differences from the standard search interface

Searches are limited to 10,000 results. If your search produces more than this, it will fail. Restrict your search to produce fewer results.
Queries are case-sensitive. If you unexpectedly get 0 hits, try uppercasing the query and/or adding * to it. Adding * turns the search into a case-insensitive wildcard search for most fields. (The text fields "Sequence" and "Sequence comment" cannot be made case-insensitive.) The * wildcard expands only at its own position, so to find text that may be surrounded by unknown text, use *query*.
The options for doing a genomic region search are limited in this interface. For example, to retrieve all RT sequences, you need to find the HXB2 RT coordinates (see the HIV Gene Map), and then search for HXB2_start < 2550 and HXB2_stop > 3869. This search returns only sequences that span the whole region. Fragments that lie within that range are not included; you would need to do separate searches for any smaller fragments you might want within the range.
With this interface you cannot search for sequences that have more than a certain number of nucleotides between two genome coordinates. The first and second halves of such a search are each possible on their own, but not in combination, because this interface does not handle sequential searches.
When you generate your customized search interface, all fields are pre-filled with "ANY". This is the equivalent of entering "*"; ANY will display what is entered in that field (even if blank), but will not restrict the search. If you remove the word "ANY", you will restrict the search to entries that have no data in the field (see next point).
This interface will allow you to search specifically for records that have no value entered in the field. In the pulldown menu fields, select "NULL". For other fields, simply blank the box or enter "NULL". The result will be restricted to only those sequences that have no data in that field. At present, there is no simple method for restricting the search to exclude the null set (i.e., include only non-null entries).
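The matching rules and the region condition described in this section can be modelled roughly as in the Python sketch below. It is only an illustration of the behavior as stated on this page, not the database's actual code. The function names, the text_field parameter, and the use of None for "no data" are assumptions made for the example; how the "Sequence" and "Sequence comment" fields treat * beyond staying case-sensitive is also assumed.

```python
import re


def field_matches(query, value, text_field=False):
    """Rough model of how a single search field is described as behaving.

    query      -- what is entered in the search box
    value      -- the stored value, or None when the record has no data
    text_field -- assumption: True for "Sequence" / "Sequence comment",
                  which cannot be made case-insensitive
    """
    if query in ("ANY", "*"):
        # ANY (equivalent to *) displays the field but never restricts the search.
        return True
    if query in ("", "NULL"):
        # A blank box or NULL selects only records with no data in the field.
        return value is None
    if value is None:
        return False
    if "*" in query:
        # Each * expands only at its own position; the rest must match literally.
        pattern = ".*".join(re.escape(part) for part in query.split("*"))
        flags = re.DOTALL if text_field else re.DOTALL | re.IGNORECASE
        return re.fullmatch(pattern, value, flags=flags) is not None
    # Without *, the comparison is exact and case-sensitive.
    return query == value


def covers_region(hxb2_start, hxb2_stop, region_start=2550, region_stop=3869):
    """Mirror the search 'HXB2_start < 2550 and HXB2_stop > 3869' from the text.

    The defaults are the RT coordinates quoted on this page.
    """
    return hxb2_start < region_start and hxb2_stop > region_stop


if __name__ == "__main__":
    print(field_matches("usa", "USA"))                     # False: case-sensitive
    print(field_matches("usa*", "USA"))                    # True: * makes it case-insensitive
    print(field_matches("brazil*", "Sao Paulo, Brazil"))   # False: * expands only at the end
    print(field_matches("*brazil*", "Sao Paulo, Brazil"))  # True
    print(covers_region(2000, 4000))                       # True: spans the whole region
    print(covers_region(2600, 3800))                       # False: fragment inside the region
```

The last two calls show why fragments inside a region are missed: a sequence is kept only when it starts before and ends after the region.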

A short description of the tables

Accession (SA) only contains accession numbers. One sequence can have multiple accession numbers that are distinguished by their order, although usually only the first is shown.
Sequence sample (SSAM) contains information about the sample the sequence was derived from (although this is not entirely consistent; some of the information in this table, such as phenotype, is sequence-, not sample-specific).
Patient (PAT) contains non-time-dependent information about the patient (age and health information, for example, are stored in Sequence Sample because they vary over time).
Map image (MI) contains the location information needed to generate the little graphic images that show the location on the sequence relative to the complete genome.
Sequence map (SM) contains the start and stop coordinates of the sequence relative to HXB2.
Sequence entry (SE) contains the sequence itself and its length.
Publication link (SPL) is a link table between the Sequence Sample and Publication tables.
Publication (PUB) contains the publication information, both any published papers about the sequence and the original submission information.
Person (PER) contains a list of authors' last names and initials. There are often duplicate entries for the same person, one with one initial, the other with all initials.
Author (AU) contains the list of authors for every publication, including the author order.
Sequence entry feature (SEF) contains the information from the GenBank ORGANISM, DEFINITION, SOURCE, BASE COUNT, KEYWORDS and REFERENCE lines.
Location (LOC) contains additional GenBank entry features like CDS and source.
Sequence feature (SF) contains the qualifier information. The qualifiers (/note, /isolate, /organism, /protein, etc.) are stored in the SF_featureType field, their content in the SF_featureValue field.
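As a quick reference, the table abbreviations listed above can be kept in a small lookup. The Python sketch below assumes that field names carry their table abbreviation as a prefix before an underscore, as SF_featureType and SF_featureValue do; that convention is inferred from those two names only and may not hold for every field. The names TABLES and table_for_field are made up for this example.

```python
# Table abbreviations and names as listed in the descriptions above.
TABLES = {
    "SA": "Accession",
    "SSAM": "Sequence sample",
    "PAT": "Patient",
    "MI": "Map image",
    "SM": "Sequence map",
    "SE": "Sequence entry",
    "SPL": "Publication link",
    "PUB": "Publication",
    "PER": "Person",
    "AU": "Author",
    "SEF": "Sequence entry feature",
    "LOC": "Location",
    "SF": "Sequence feature",
}


def table_for_field(field_name):
    """Return the table name for a prefixed field name, or None if unknown.

    Assumption: the text before the first underscore is the table abbreviation.
    """
    prefix, separator, _ = field_name.partition("_")
    if not separator:
        return None
    return TABLES.get(prefix)


if __name__ == "__main__":
    print(table_for_field("SF_featureType"))  # Sequence feature
    print(table_for_field("HXB2_start"))      # None: HXB2 is not a table abbreviation
```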

About the number after the field names

-1: text field
2, 4: integer field
11: date field
12: variable length string field
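If you process field listings outside the interface, the codes above can be decoded with a simple lookup. The Python sketch below only restates the list on this page; the names FIELD_TYPE_CODES and describe_field_type are made up for the example, and any code not listed here is reported as unknown rather than guessed.

```python
# Type codes and their meanings, exactly as listed above.
FIELD_TYPE_CODES = {
    -1: "text field",
    2: "integer field",
    4: "integer field",
    11: "date field",
    12: "variable length string field",
}


def describe_field_type(code):
    """Return the description for a field type code shown after a field name."""
    return FIELD_TYPE_CODES.get(code, "unknown type code")


if __name__ == "__main__":
    for code in (-1, 2, 4, 11, 12, 99):
        print(code, describe_field_type(code))
```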

Links

Advanced Search Interface
Regular Search Interface
Regular Search Interface Help

last modified: Mon Oct 1 16:32 2007
