


HIV Sequence Quality Control (QC)

Purpose: This tool has two objectives: (1) to examine sets of HIV-1 nucleotide sequences for quality control problems, and (2) to prepare HIV-1 sequence sets, together with related data, for submission to GenBank.

Details for QC analysis: This tool runs a set of tests that help you find problems with your sequences. To run it, upload a set of HIV-1 DNA sequences as a FastA file, aligned or unaligned, containing any number of sequences of any length. Results will be returned to you by e-mail. The results include the following analyses: subtyping (using RIP 3.0), a neighbor-joining tree of each sequence with subtype references, analysis of the number of stop codons and frameshifts (using GeneCutter), and analysis of hypermutation (using HyperMut). For the sequence set as a whole, the output also includes a neighbor-joining tree of all sequences in the set and a complete GeneCutter alignment output. The results are presented in a table, with links to more detail for each analysis. See Sample Output.
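Before uploading, it can be useful to check locally that the FastA file parses cleanly. The following is a minimal sketch in Python and is not part of the QC tool: the function names `parse_fasta` and `precheck` are hypothetical, and the set of accepted characters (IUPAC nucleotide codes plus `-` for alignment gaps) is an assumption about what a nucleotide FastA file normally contains, not a statement of what the tool itself accepts.

```python
import io

# Assumed alphabet: IUPAC nucleotide codes plus "-" for alignment gaps.
IUPAC_NT = set("ACGTURYSWKMBDHVN-")


def parse_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def precheck(handle):
    """Return (name, ungapped_length, sorted_unexpected_chars) per sequence."""
    report = []
    for name, seq in parse_fasta(handle):
        seq = seq.upper()
        unexpected = sorted(set(seq) - IUPAC_NT)
        ungapped_length = len(seq.replace("-", ""))
        report.append((name, ungapped_length, unexpected))
    return report


# Usage with a small in-memory example; replace with open("myfile.fasta").
example = io.StringIO(">seq1\nACGT-ACGT\n>seq2\nACGTXZ\nNNAC\n")
for row in precheck(example):
    print(row)
```

A sequence that reports unexpected characters or a very short ungapped length is worth inspecting before submission, since some of the QC analyses may not work on very short sequences.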

Details for preparing GenBank submissions: This tool can also be used to prepare sequences for GenBank submission. This step is not required if you only want to do the QC analysis. To prepare a GenBank submission, you must first run the QC tool as detailed above. The output page you receive contains a link labeled "Create GenBank Submission". Following this link takes you to a form where you can fill in various data fields with information about the sequences (sampling year, clade, clinical parameters, etc.). Your data will then be returned to you in a format ready for GenBank deposit.

Limitations: For very short sequences, some of the QC analyses may not work, but the tool can still be used to prepare the sequences and related data for GenBank entry. If your FastA file contains sequence fragments from different regions of the genome (for example, 20 gag sequences and 20 env sequences), the file will work for most of the QC analyses, but the tree intended to compare all sequences together will not be informative. In such cases, you should either run the tool separately for each genomic region (one analysis for the gag sequences and one for the env sequences), or run the set together here and perform separate tree analyses on your own (see Neighbor TreeMaker).
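If you choose to run the tool separately for each genomic region, the mixed file first has to be split. The sketch below shows one way to do that in Python. It rests on a loud assumption: that each sequence header contains the region name (such as `gag` or `env`) somewhere in its text. That naming convention, the `REGIONS` tuple, and the function name `split_by_region` are all hypothetical; if your headers do not carry region labels, this approach will not work and the sequences will land in the `unassigned` group.

```python
import io
from collections import defaultdict

# Hypothetical region labels expected to appear in the sequence headers.
REGIONS = ("gag", "pol", "env")


def split_by_region(handle, regions=REGIONS):
    """Group FASTA records by the first region label found in each header.

    Returns a dict mapping region name to FASTA text. Records whose header
    matches no label are collected under "unassigned".
    """
    groups = defaultdict(list)
    current = None
    for line in handle:
        if line.startswith(">"):
            header = line.lower()
            current = next((r for r in regions if r in header), "unassigned")
        if current is not None and line.strip():
            groups[current].append(line.rstrip("\n"))
    return {region: "\n".join(lines) + "\n" for region, lines in groups.items()}


# Usage with a small in-memory example; replace with open("mixed.fasta").
mixed = io.StringIO(">s1_gag\nACGT\n>s2_env\nGGCC\n>s3\nTT\n")
for region, text in sorted(split_by_region(mixed).items()):
    print(region)
    print(text, end="")
```

Each value in the returned dictionary can be written to its own file and uploaded as a separate analysis. Matching on a substring of the header is deliberately simple; a header that happens to contain a region name for another reason would be misassigned, so the groups should be checked by eye.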

Input
Upload your sequence set
Enter your e-mail address

 

Related Links:
Sequence Quality Control Tutorial

 

 

last modified: Tue Nov 27 08:33 2007


Questions or comments? Contact us at seq-info@lanl.gov.

 
Operated by Los Alamos National Security, LLC, for the U.S. Department of Energy's National Nuclear Security Administration
Copyright © 2005-2006 LANSLLC All rights reserved | Disclaimer/Privacy
