
Searching Tips

The Trace Archive stores the raw data obtained from the sequencing machines. None of these data have been trimmed to remove vector sequence or low-quality regions.

What can be obtained from the Trace Archive

How to retrieve the data from the Archive

Type a query, using the syntax described below, into the query text box at the top of the Main and Obtaining Data pages of the Trace Archive, and press Submit to retrieve the data. If no field name is specified, the query is assumed to be a TI number.

How to view the obtained data

Querying the Trace Archive takes you to a results page. By default, this page lists all of the FASTA files that satisfy the query. You can choose other views from the pull-down menu next to the Show button.

A combined view of the FASTA sequence and its quality scores is available when the "in color" box is checked.

To navigate the chromatogram, either use the left mouse button or click on a particular region of the small image at the top of each trace.

How to retrieve Statistics

Besides the data, every request also returns the number of records that match the query.


There are three different ways to view Statistics

The Statistics web pages let you create queries that select traces by CENTER_NAME, SPECIES_CODE, TRACE_TYPE_CODE, or STRATEGY, or by combinations of these fields.

The first view of the Detailed by Query page shows statistics for submitting centers, ordered by count (descending).

The result page shows the statistics together with a "Link to this Result Set", which you can use later to get the same statistics for the data current at that time.

You can also add further conditions to the statistics query. Use the Condition builder to construct a valid request.

How to BLAST

BLAST databases have been built from the archive so that users can perform sequence-based queries. Currently, the BLAST databases are updated weekly. Mechanisms are being implemented to exclude contaminants from these databases.

How to download from the Archive

How to download large data sets

The number of records that can be obtained in a single request is limited, currently to 40,000. To download more records, you need to split the download into several requests. Although it is generally possible to download all the data you need with a browser, the best approach is to use our Perl script query_tracedb. After copying the script, remember to make it executable (for example, chmod +x query_tracedb).
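The number of requests you need is the record count divided by the per-request limit, rounded up, since a partially filled last page still costs a request. A minimal shell sketch of that arithmetic, using the count from the example that follows (122116) as a stand-in value:

```shell
#!/bin/sh
# Record count as reported by: query_tracedb "query count ..."
# (122116 is the example value used on this page; substitute your own).
count=122116
# Maximum number of records per request (currently 40,000).
page_size=40000
# Ceiling division: a partially filled last page still needs a request.
pages=$(( (count + page_size - 1) / page_size ))
echo "$pages requests needed (page_number 0 to $(( pages - 1 )))"
```

Pages are numbered from 0, so the last page_number to request is one less than the number of pages.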

Every record in the archive is assigned a unique identifier, the TI (trace identifier). Downloading therefore takes two steps: first obtain all identifiers that match your query, then use them to retrieve the actual data you need. Here is how this works on a real example (note that this page is static, so the numbers shown in the example may not reflect the current state of the archive):

  1. The first step is to count all available records:
    query_tracedb "query count species_code='AEDES AEGYPTI'"
    122116
  2. A simple calculation (122,116 records divided by 40,000 per request, rounded up) shows that we need 4 requests to retrieve all records, so let's obtain the identifiers page by page. Note that the identifiers are returned in network byte order (big-endian):
    query_tracedb "query page_size 40000 page_number 0 binary species_code='AEDES AEGYPTI'" > page1.bin
    query_tracedb "query page_size 40000 page_number 1 binary species_code='AEDES AEGYPTI'" > page2.bin
    ...
    query_tracedb "query page_size 40000 page_number 3 binary species_code='AEDES AEGYPTI'" > page4.bin
  3. You can now retrieve the data in the submission format (a gzipped tarball). The "0b" marker indicates that the identifiers that follow are in binary format.
    (echo -n "retrieve_tgz all 0b"; cat page1.bin) | query_tracedb > data1.tgz
    ...
    (echo -n "retrieve_tgz all 0b"; cat page4.bin) | query_tracedb > data4.tgz
    The commands above retrieve all files for these traces: FASTA, quality scores, chromatograms in SCF format, mate_pairs, and ancillary files.
  4. Note: steps 2 and 3 can be combined into a single command:
    (echo -n "retrieve_tgz all 0b"; query_tracedb "query page_size 40000 page_number 0 binary species_code='AEDES AEGYPTI'") | query_tracedb > data1.tgz
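The four steps above can be wrapped in a loop so that the page count does not have to be worked out by hand. The sketch below is a hypothetical helper (download_all is not part of query_tracedb); it uses only the query_tracedb commands shown on this page, assumes the script is executable and on your PATH, and assumes "query count" prints a bare number as in step 1.

```shell
#!/bin/sh
# Hypothetical helper: download every page of a query as data1.tgz, data2.tgz, ...
# ASSUMPTION: query_tracedb is executable and on PATH, and "query count"
# prints just the number of matching records.
download_all() {
    query=$1                # e.g. "species_code='AEDES AEGYPTI'"
    page_size=40000         # current per-request limit
    # Step 1: count the matching records.
    count=$(query_tracedb "query count $query") || return 1
    # Ceiling division gives the number of pages to request.
    pages=$(( (count + page_size - 1) / page_size ))
    p=0
    while [ "$p" -lt "$pages" ]; do
        # Steps 2 and 3 combined: the binary TIs for one page are piped
        # straight into retrieve_tgz after the "0b" marker.
        {
            printf '%s' "retrieve_tgz all 0b"
            query_tracedb "query page_size $page_size page_number $p binary $query"
        } | query_tracedb > "data$(( p + 1 )).tgz" || return 1
        p=$(( p + 1 ))
    done
}
```

For the example above, download_all "species_code='AEDES AEGYPTI'" would produce data1.tgz through data4.tgz. printf is used instead of echo -n because it behaves the same way in every POSIX shell. You can list the contents of a downloaded archive with tar -tzf data1.tgz.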

For more information, run 'query_tracedb help' to list the available data formats and 'query_tracedb usage' to see usage examples.

If you only need to save the TI numbers for future reference, you may prefer to obtain them in text form:

query_tracedb "query page_size 40000 page_number 0 text species_code='AEDES AEGYPTI'" > page1.txt
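If you have already saved the binary identifier pages from step 2 (page1.bin and so on), they can also be turned into text locally. The sketch below is a hypothetical helper (decode_tis) built on the standard od and awk tools. It assumes each TI is stored as a fixed-width unsigned big-endian integer; the width is not stated on this page, so it is passed in as an argument and should be confirmed (for example, with 'query_tracedb help') before you rely on it.

```shell
#!/bin/sh
# Hypothetical helper: print the TIs in a binary page file, one per line.
# ASSUMPTION: each TI is an unsigned big-endian integer of fixed width
# ($2 bytes); confirm the width before using this on real data.
decode_tis() {
    file=$1
    width=$2
    # od prints every byte as an unsigned decimal (-v disables "*" folding
    # of repeated lines); awk rebuilds each value, most significant byte first.
    od -An -v -t u1 "$file" | awk -v w="$width" '
        {
            for (i = 1; i <= NF; i++) {
                v = v * 256 + $i
                if (++n == w) {
                    # awk numbers are doubles: exact up to 2^53.
                    printf "%.0f\n", v
                    v = 0
                    n = 0
                }
            }
        }'
}
```

Usage: decode_tis page1.bin WIDTH > page1.txt, with WIDTH replaced by the confirmed number of bytes per TI.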