HIV Databases
HIV sequence database



Gap Strip/Squeeze Alignments

Purpose: This tool takes a nucleotide alignment and deletes columns that contain an "intolerable" number of gaps. You can set the gap tolerance to any value between 0% and 100%. A value of 0% deletes every column that contains even a single gap ("gap stripping"), while a value of 100% deletes only columns that consist entirely of gaps ("gap squeezing"). See examples of usage.
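The tolerance rule can be illustrated with a minimal Python sketch. It is not the tool's actual code. The function name gap_filter is hypothetical, and the boundary behaviour is an assumption chosen to match the description above: a column is removed when its percentage of gaps exceeds the tolerance, and columns made up entirely of gaps are always removed.

```python
def gap_filter(seqs, tolerance_pct, gap_chars="-"):
    """Return the aligned sequences with gap-heavy columns removed.

    Hypothetical helper: the removal rule below is an assumption,
    not the documented behaviour of the web tool.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in an alignment must have the same length")
    gaps = set(gap_chars)
    keep = []
    for col in range(length):
        n_gaps = sum(1 for s in seqs if s[col] in gaps)
        pct = 100.0 * n_gaps / len(seqs)
        # Assumed rule: drop the column if it is over tolerance,
        # and always drop a column that is nothing but gaps.
        if pct > tolerance_pct or n_gaps == len(seqs):
            continue
        keep.append(col)
    return ["".join(s[c] for c in keep) for s in seqs]


alignment = ["AC-GT-", "A--GT-", "ACCGT-"]
stripped = gap_filter(alignment, 0)     # gap stripping
squeezed = gap_filter(alignment, 100)   # gap squeezing
halfway = gap_filter(alignment, 50)     # intermediate tolerance
```

In this toy alignment, stripping keeps only the three gap-free columns, squeezing removes only the final all-gap column, and a 50% tolerance additionally removes the column in which two of the three sequences have a gap.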

How to use: Upload your alignment file in the space provided. The program will automatically identify any standard format. Each sequence must have an associated name, so you cannot submit raw sequence files. Also specify the gap character if it is not a dash (-). If you want to specify more than one gap character, enter the characters as a list with no breaks of any kind between the characters. Finally, indicate the desired gap tolerance and specify the options.
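As an illustration of the gap-character field, the sketch below treats the entry as an unbroken string in which every character counts as a gap. The names parse_gap_characters and gap_counts are hypothetical, and falling back to a dash when the field is empty is an assumption rather than documented behaviour.

```python
def parse_gap_characters(field="-"):
    """Turn the unbroken gap-character entry into a set of characters."""
    if any(ch.isspace() for ch in field):
        raise ValueError("enter gap characters with no spaces or breaks between them")
    # Assumption: an empty field means the default dash.
    return set(field) or {"-"}


def gap_counts(seqs, gap_chars):
    """Count, for each column, how many sequences have a gap character there."""
    return [sum(1 for s in seqs if s[i] in gap_chars) for i in range(len(seqs[0]))]


gaps = parse_gap_characters("-~")                     # two gap characters, no separator
counts = gap_counts(["A-CG", "A~CG", "ACC-"], gaps)
```

Here the second column has two gaps (one of each character) and the last column has one, so both gap characters are counted in the same way.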

Alternative tools: We provide 3 mini-tools that perform 3 separate functions, shown here. See examples of usage.

Gapstreeze Tool: The form below can be used to perform these 3 functions separately or in combination. See examples of usage. The program should automatically identify any standard format. If you are submitting raw sequences, make sure each sequence is on one line and is separated from the next sequence by a carriage return.

Also specify the gap character if it is not a dash (-). If you want to specify more than one gap character, enter the characters as a list with no breaks or punctuation of any kind between the characters. You can also specify ordinary letters as gap characters. This is useful if, for example, you want to remove all columns containing IUPAC ambiguity codes (e.g., R and Y) from your alignment, thereby preserving only columns with ATGCU.

Next, adjust the Tolerance value. By default, only columns that are entirely gaps are removed.

If you select the "Show deleted columns" box, your output will include the first sequence in your alignment, with marks showing the columns that were deleted, followed by the stripped alignment. The "Preserve deleted columns" button, when selected, replaces the removed columns in your alignment with "#" symbols, so that on the results page you can preview which columns were removed from the alignment. The results page then offers an option to squeeze out those columns.

Finally, if the sequences in your alignment are codon-aligned nucleotides, you can choose to remove columns in groups of three whenever any one of the three columns that make up a codon exceeds the gap-tolerance value. To do this, check the "codons" box and specify the reading frame of your alignment.
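The preview and codon options can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation. The function names are hypothetical. The reading frame is assumed to be 1, 2, or 3, giving the 1-based column where the first complete codon starts. Columns outside complete codons are assumed to be judged individually, and the removal rule (over tolerance, or entirely gaps) is the same assumption used for single columns.

```python
def flag_columns(seqs, tolerance_pct, gap_chars="-", codons=False, frame=1):
    """Return one boolean per column, True where the column would be removed."""
    n, length = len(seqs), len(seqs[0])
    gaps = set(gap_chars)
    flags = []
    for col in range(length):
        n_gaps = sum(1 for s in seqs if s[col] in gaps)
        # Assumed rule: over tolerance, or entirely gaps.
        flags.append(100.0 * n_gaps / n > tolerance_pct or n_gaps == n)
    if codons:
        # Assumed frame convention: codons start at column frame - 1 (0-based).
        for start in range(frame - 1, length - 2, 3):
            if any(flags[start:start + 3]):
                flags[start:start + 3] = [True, True, True]
    return flags


def mask_columns(seqs, flags, symbol="#"):
    """Replace flagged columns with a marker so they can be previewed."""
    return ["".join(symbol if f else ch for ch, f in zip(s, flags)) for s in seqs]


def squeeze_columns(seqs, flags):
    """Remove flagged columns entirely."""
    return ["".join(ch for ch, f in zip(s, flags) if not f) for s in seqs]


aln = ["ATGC-AGGT", "ATGCCAGGT", "ATGC-AGGT"]
codon_flags = flag_columns(aln, 0, codons=True, frame=1)
preview = mask_columns(aln, codon_flags)      # flagged codon shown as "###"
final = squeeze_columns(aln, codon_flags)     # flagged codon removed

# Ordinary letters can be treated as gaps, e.g. the ambiguity codes R and Y.
ambig_flags = flag_columns(["ARGT", "ACGT"], 0, gap_chars="-RY")
```

With codon mode on, the single gapped column in the middle codon causes all three of its columns to be masked and then squeezed out, which keeps the remaining sequence in frame.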

Input
Paste your alignment here

[Sample Input]
Or upload your alignment:

Options

Gap character(s)
Gap tolerance (%)
Show deleted gaps
Preserve codons (delete gaps in triplets)
Reading frame of alignment

last modified: Mon Apr 14 11:08 2008


Questions or comments? Contact us at seq-info@lanl.gov.