Searching Tips
The Trace Archive stores the raw data obtained from the sequencing machines. None of this data has been trimmed for vector sequences or quality scores.
What can be obtained from the Trace Archive
- Traces
- Basecalls (in FASTA format)
- Quality Scores
- Ancillary Data
- Statistics
How to retrieve the data from the Archive
Type a query, as described below, into the Query Text Box at the top of the Main and Obtaining Data Trace Archive web pages, then press Submit to retrieve the data. If no field name is specified, the value is treated as a TI number.
- Query by Ti
- All traces deposited in this archive are assigned a TI number. These numbers can be used to query the database as a list:
1,2,3
as a range:
1-10
or as a combination of both:
1,2,5-10
Such requests may also use a different format:
TI in (1-5,10,11)
This format allows complex requests:
TI in (1-10000) and CENTER_NAME='WIBR'
- Query by Field
- Traces with specific characteristics can be queried using the Searchable Fields.
- To build a request quickly, use the new Query Builder, which is activated by clicking on "use Query Builder".
- The value NULL may also be used for comparison. It represents unfilled data.
- For example, all traces from a particular
genome center can be retrieved using the following query:
CENTER_NAME='WIBR'
- More complex queries:
CENTER_NAME='WIBR' and SPECIES_CODE='MUS MUSCULUS'
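The query forms described above can be assembled programmatically. The following is a minimal Python sketch, not part of the Trace Archive tools: the helper names `expand_ti_spec` and `build_query` are hypothetical, and rejecting values that contain a single quote is an assumption, since this page does not describe how quotes are escaped.

```python
import re


def expand_ti_spec(spec):
    """Expand a TI list/range such as '1,2,5-10' into a sorted list of integers."""
    tis = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        range_match = re.fullmatch(r"(\d+)\s*-\s*(\d+)", part, flags=re.ASCII)
        if range_match:
            start, end = int(range_match.group(1)), int(range_match.group(2))
            if start > end:
                raise ValueError("range start exceeds end: %r" % part)
            tis.update(range(start, end + 1))
        elif re.fullmatch(r"\d+", part, flags=re.ASCII):
            tis.add(int(part))
        else:
            raise ValueError("not a TI or TI range: %r" % part)
    return sorted(tis)


def build_query(ti_spec=None, **fields):
    """Join an optional 'TI in (...)' condition and FIELD='VALUE' conditions with 'and'."""
    conditions = []
    if ti_spec:
        expand_ti_spec(ti_spec)  # validate only; the archive accepts the compact form
        conditions.append("TI in (%s)" % ti_spec)
    for name, value in fields.items():
        if "'" in value:
            # Assumption: quote escaping is not documented here, so refuse such values.
            raise ValueError("value must not contain a single quote: %r" % value)
        conditions.append("%s='%s'" % (name.upper(), value))
    return " and ".join(conditions)
```

For example, `build_query("1-10000", center_name="WIBR")` returns the string `TI in (1-10000) and CENTER_NAME='WIBR'`, which can be pasted into the Query Text Box.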
How to view the obtained data
Querying the Trace Archive will send you to a results page. The default view for this page is a listing of all of the FASTA files that satisfy the query. You can choose alternative views using the pull-down menu next to the Show button.
- View the FASTA File (choose FASTA) - DEFAULT view
- View the Quality Scores (choose Quality)
- View the Chromatogram (choose Trace)
- View the Ancillary Information (choose Info)
A simultaneous view of the FASTA sequence and quality scores is available when the "in color" box is checked.
To navigate the chromatogram, use the left mouse button or click on a particular region of the small image at the top of each trace.
How to retrieve Statistics
Every request yields, besides the data, the number of records that match the query.
There are three different ways to view Statistics:
- Graph: visualizes the growth of the received data.
- Detailed by Query: lets you create your own statistical query.
- Reports: provides cumulative trace accumulation.
The Statistics web pages let you create queries that select traces by CENTER_NAME, SPECIES_CODE, TRACE_TYPE_CODE, or STRATEGY, or by combinations of these fields.
The first view of the Detailed by Query page shows statistics for submitting centers ordered by count (descending).
- To change a column name, use the pull-down menu.
- To change the sorting order or to add or remove columns, use the buttons on the right.
- To retrieve new statistics, use the refresh button under the Request section.
The result page will show the statistics and a "Link to this Result Set" for retrieving the same statistics against current data in the future.
You can also add conditions to the statistics. Use the Condition builder to create a correct request.
How to BLAST
A BLASTable database has been developed to allow users to perform sequence-based queries. Currently, the BLAST databases are updated weekly. Mechanisms are being implemented to exclude contaminants from these databases. Click here to BLAST.
How to download from the Archive
- The following data can be included in the tar file and downloaded to your local computer by using the "Save" button (the number of traces must be fewer than 40,000). The resulting file can be large, so please be practical in your selections.
- FASTA File
- Quality Score File
- SCF/RCF file – this is the actual trace
- Info-XML – ancillary information in XML format
- Mate Pair information
Choose all of these files, or eliminate the ones you do not want. Note that special software is needed to view the actual trace file on your local machine. The package is available for download from the public ftp site as a Java applet. It consists of a ready-to-use compiled Java application and the sources of the viewer. To run the standalone Java application, a Java runtime of version 1.8 or higher must be accessible from your computer. Then pick a TI of interest and supply it as a parameter to the application:
java -jar trace.jar trace=TI
- Retrieving data from FTP
- The FTP site has ready-to-download FASTA, quality score (in FASTA format), clip value, and ancillary information (in XML format) files, grouped by species.
- For automated bulk downloads you can use the query_tracedb Perl script, which provides almost all of the functions available in the "trace.cgi" application.
How to download large data sets
The number of records that can be obtained in a single request is limited; currently the limit is 40,000. To download more records, you need to place several requests. Although it is generally possible to download all needed data with a browser, the best approach is to use our Perl script query_tracedb. After copying this script, don't forget to make it executable.
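Because each request returns at most 40,000 records, the number of requests follows from the record count by rounding up. The following Python sketch shows that arithmetic and prints the matching query_tracedb command lines, using the command syntax from the example on this page. The helper names `pages_needed` and `page_commands` are hypothetical, and the sketch only builds command strings; it does not contact the archive.

```python
PAGE_SIZE = 40000  # per-request limit stated on this page


def pages_needed(total_records, page_size=PAGE_SIZE):
    """Return how many requests are needed to cover total_records (ceiling division)."""
    if total_records < 0 or page_size <= 0:
        raise ValueError("total_records must be >= 0 and page_size must be > 0")
    return (total_records + page_size - 1) // page_size


def page_commands(condition, total_records, page_size=PAGE_SIZE):
    """Build one query_tracedb command line per page; page numbers start at 0."""
    commands = []
    for page in range(pages_needed(total_records, page_size)):
        query = "query page_size %d page_number %d binary %s" % (page_size, page, condition)
        commands.append('query_tracedb "%s" > page%d.bin' % (query, page + 1))
    return commands


if __name__ == "__main__":
    # 122116 is the record count used in the static example on this page.
    for line in page_commands("species_code='AEDES AEGYPTI'", 122116):
        print(line)
```

With the count of 122116 records used in the example on this page, `pages_needed(122116)` returns 4.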
Every record in the archive is assigned a unique identifier, the TI. You therefore first need to obtain all identifiers that match your query; using these identifiers, you can then retrieve the actual data. Let's see how this works on a real example (please note that this page is static, and the numbers shown in the example may not reflect the current status of the archive):
- The first step is to count all available records:
query_tracedb "query count species_code='AEDES AEGYPTI'"
122116
- A simple calculation shows that to retrieve all records we will need to make
at least 4 requests, so let's obtain the identifiers. Please note that the identifiers are in
network (BIG ENDIAN) format:
query_tracedb "query page_size 40000 page_number 0 binary species_code='AEDES AEGYPTI'" > page1.bin
query_tracedb "query page_size 40000 page_number 1 binary species_code='AEDES AEGYPTI'" > page2.bin
...
query_tracedb "query page_size 40000 page_number 3 binary species_code='AEDES AEGYPTI'" > page4.bin
- You can now retrieve the data in the submission form (tarball). The "0b" marker indicates that the data that follow are in binary format.
(echo -n "retrieve_tgz all 0b"; cat page1.bin) | query_tracedb > data1.tgz
...
(echo -n "retrieve_tgz all 0b"; cat page4.bin) | query_tracedb > data4.tgz
The above will retrieve all files from the archive: fasta, quality scores, chromatograms in scf format, mate_pairs, and ancillary files.
- *Note: steps 2 and 3 can be done at the same time:
(echo -n "retrieve_tgz all 0b"; query_tracedb "query page_size 40000 page_number 0 binary species_code='AEDES AEGYPTI'") | query_tracedb > data1.tgz
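To inspect a binary identifier page such as page1.bin, or to see what the piped request in the steps above consists of, a small Python sketch may help. The helper names `decode_tis` and `build_retrieve_request` are hypothetical. This page states only that identifiers are big-endian; the integer width is not given here, so the 8-byte default below is an assumption that should be checked against real output before relying on it.

```python
REQUEST_PREFIX = b"retrieve_tgz all 0b"  # request text taken from the steps above


def decode_tis(data, width=8):
    """Decode a byte string of fixed-width big-endian unsigned integers into TIs.

    Assumption: width=8 bytes per identifier; the source only says 'big endian'.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if len(data) % width != 0:
        raise ValueError("data length %d is not a multiple of %d" % (len(data), width))
    return [int.from_bytes(data[i:i + width], "big") for i in range(0, len(data), width)]


def build_retrieve_request(page_bytes):
    """Mirror (echo -n "retrieve_tgz all 0b"; cat pageN.bin): prefix, then raw bytes."""
    return REQUEST_PREFIX + page_bytes


if __name__ == "__main__":
    # Synthetic stand-in for the contents of a page file, built under the same assumption.
    sample = (1).to_bytes(8, "big") + (258).to_bytes(8, "big")
    print(decode_tis(sample))
    print(len(build_retrieve_request(sample)))
```

If the decoded numbers look implausible for your data, the assumed width is the first thing to revisit, for example by calling `decode_tis(data, width=4)`.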
For more information, run 'query_tracedb help' for available data formats and 'query_tracedb usage' for usage examples.
If you need to save only TI numbers for future reference, you might want to obtain them in text form: