Advanced search interface help
This interface dynamically reads the database schema and generates a
graphical overview of the tables and fields. You can use
this overview to build your own custom search
interface (see Links below to return to Advanced Search). To get information about the tables and fields, and examples of their content,
mouse over or click on the table names. For specific information about the contents of each field,
see Regular Search Help.
Advantages of advanced search interface
- Advanced search allows additional types of searches not possible on the regular interface. As one example,
you could restrict your search to include only samples from non-drug-naive patients.
- You can search and view data from all available fields, including multiple fields that appear under "Other Fields" in the regular interface.
- You can specifically select sequences that have a "null" value (no data) in specific fields. If you want to exclude such sequences, include the "Problematic" field in your search.
Limitations of advanced search interface
- The search output does not directly interface with
TreeBuilder.
- Options for searching for a specific genomic region are
limited (as explained below).
- Problematic sequences are not removed by default.
Other differences from the standard search interface
- Searches are limited to 10,000 results. If your search produces more than this, it will fail. Restrict your search to produce fewer results.
- Queries are case-sensitive. If you unexpectedly get 0 hits, try
uppercasing the query and/or adding * to it. Adding * turns the search
into a case-insensitive wildcard search for most fields. (The text fields "Sequence" and "Sequence comment"
cannot be made case-insensitive.) The * wildcard expands only at the position
where it is placed, so to find a bit of text that could be in the middle of unknown
text, use *query*.
- The options for doing a genomic region search are limited in this interface.
For example, to retrieve all RT sequences, you need to find the HXB2 RT coordinates
(see the HIV Gene Map), and then search for HXB2_start
<2550 and HXB2_stop >3869. This search returns only sequences that span the whole region; fragments
within that range are not included, and you would need to do separate searches for any smaller fragments you might want within the range.
- With this interface you cannot do a search for sequences that have more
than a certain number of nucleotides between two genome
coordinates. The first and second half are possible, but not the combination,
since this interface does not handle sequential searches.
- When you generate your customized search interface, all fields are pre-filled with "ANY". This is the equivalent
of entering "*"; ANY will display what is entered in that field (even if blank), but will not restrict
the search. If you remove the word "ANY", you will restrict the search to entries that have no data in the field (see next point).
- This interface will allow you to search specifically for records that have no value entered in the field.
In the pulldown menu fields, select "NULL".
For other fields, simply blank the box or enter "NULL".
The result will be restricted to only those sequences that have no data in that field. At present, there is no simple method for restricting the search to exclude the null set (i.e., include only non-null entries).
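The matching rules described in this list can be summarized in a short Python sketch. This is only an illustration of the behavior described above, not the database's actual implementation: the function names `field_matches` and `spans_region` are hypothetical, a record with no data in a field is represented by `None`, and the exception for the "Sequence" and "Sequence comment" fields is ignored.

```python
import re


def field_matches(entered, value):
    """Return True if a stored field value satisfies the text typed in a search box.

    `entered` is the text in the box; `value` is the stored value,
    or None when the record has no data in that field.
    """
    # "ANY" (the pre-filled default) and a bare "*" never restrict the search.
    if entered in ("ANY", "*"):
        return True
    # A blank box or "NULL" selects only records with no data in the field.
    if entered in ("", "NULL"):
        return value is None
    # Any other query cannot match a record that has no data.
    if value is None:
        return False
    if "*" in entered:
        # Each * expands only where it is placed, and a query containing *
        # is compared case-insensitively.
        pattern = ".*".join(re.escape(part) for part in entered.split("*"))
        flags = re.IGNORECASE | re.DOTALL
        return re.fullmatch(pattern, value, flags=flags) is not None
    # Without a wildcard the comparison is exact and case-sensitive.
    return entered == value


def spans_region(hxb2_start, hxb2_stop, region_start=2550, region_stop=3869):
    """Mirror the RT example: HXB2_start <2550 and HXB2_stop >3869."""
    return hxb2_start < region_start and hxb2_stop > region_stop
```

Under these assumptions, `field_matches("env*", "partial ENV gene")` is false because the wildcard only extends the end of the query, while `field_matches("*env*", "partial ENV gene")` is true. Likewise, `spans_region(2600, 3000)` is false: a fragment lying inside the RT range is not returned by the region search.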
A short description of the tables
- Accession (SA) only contains accession numbers. One sequence can
have multiple accession numbers that are distinguished by their order,
although usually only the first is shown.
- Sequence sample (SSAM) contains information about the sample the sequence
was derived from (although this is not entirely consistent; some of the
information in this table, such as phenotype, is sequence-, not
sample-specific).
- Patient (PAT) contains non-time-dependent information about the
patient (age and health information, for example, are stored in Sequence Sample because they vary
over time).
- Map image (MI) contains the location information needed to
generate the little graphic images that show the location on the sequence
relative to the complete genome.
- Sequence map (SM) contains the start and stop coordinates of the
sequence relative to HXB2.
- Sequence entry (SE) contains the sequence itself and its length.
- Publication link (SPL) is a link table between the Sequence Sample and
Publication tables.
- Publication (PUB) contains the publication information, both any
published papers about the sequence and the original submission information.
- Person (PER) contains a list of authors' last names and initials.
There are often duplicate entries for the same person, one with one
initial, the other with all initials.
- Author (AU) contains a list of authors for every publication that
specifies the author order.
- Sequence entry feature (SEF) contains the information from the
GenBank ORGANISM, DEFINITION, SOURCE, BASE COUNT, KEYWORDS and REFERENCE
lines.
- Location (LOC) contains additional GenBank entry features like
CDS and source.
- Sequence feature (SF) contains the qualifier information. The
qualifiers (/note, /isolate, /organism, /protein, etc.) are stored in
the SF_featureType field, their content in the SF_featureValue field.
About the number after the field names
- -1: text field
- 2, 4: integer field
- 11: date field
- 12: variable length string field
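If you process field listings in a script, the codes above can be kept in a small lookup table. The following Python sketch uses only the codes and descriptions listed here; the names `FIELD_TYPE_CODES` and `describe_field_type` are hypothetical, and any code not in the list is reported as unknown rather than guessed.

```python
# Type codes shown after field names, with the descriptions given above.
FIELD_TYPE_CODES = {
    -1: "text field",
    2: "integer field",
    4: "integer field",
    11: "date field",
    12: "variable length string field",
}


def describe_field_type(code):
    """Return the description for a type code, or a marker for unlisted codes."""
    return FIELD_TYPE_CODES.get(code, "unknown type code")
```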
Links
Advanced Search Interface
Regular Search Interface
Regular Search Interface Help
last modified: Mon Oct 1 16:32 2007