HIV sequence database

ElimDupes: Duplicate Sequence Removal

Purpose: Given an alignment or a set of unaligned nucleotide or protein sequences, this tool compares the sequences and eliminates any duplicates or very similar sequences, producing a set of unique sequences.

Details: By default, the program removes all non-letter characters from the sequences, converts all letters to uppercase, and considers as a "duplicate" any sequence that is a subsequence of a longer sequence (e.g., the sequence ATG is a duplicate of the sequence CATGCC). These three default behaviors can be modified by changing the first three options shown below. In the fourth option, you can choose to restore any gaps or non-uppercase characters that were present in the input. In the fifth option, you can set the percent similarity above which sequences are treated as duplicates and removed. The final option provides a way to automatically analyze your input sequences as a series of sequence groups. The results page summarizes the duplicate and unique sequence sets and allows you to view and download the resulting unique sequences file and the duplicate sequences file.
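
The three default behaviors described above can be illustrated with a short Python sketch. This is not the tool's actual code: the function names (normalize, elim_dupes), the parameter names, and the longest-first comparison order are assumptions made for illustration, and "subsequence" is read here as a contiguous stretch of the longer sequence, as in the ATG/CATGCC example.

```python
import re


def normalize(seq, strip_nonletters=True, uppercase=True):
    """Apply the first two default options to one sequence string."""
    if strip_nonletters:
        # Drop gaps, digits, whitespace and any other non-letter character.
        seq = re.sub(r"[^A-Za-z]", "", seq)
    if uppercase:
        seq = seq.upper()
    return seq


def elim_dupes(records, strip_nonletters=True, uppercase=True,
               subsequences=True):
    """Split (name, sequence) records into unique and duplicate sets.

    Hypothetical helper: returns (unique, duplicates), where unique is a
    list of (name, normalized_sequence) in input order and duplicates is
    a list of (duplicate_name, name_of_kept_sequence).
    """
    normalized = [(name, normalize(seq, strip_nonletters, uppercase))
                  for name, seq in records]
    # Visit longer sequences first (an assumption), so a shorter sequence
    # is always compared against the longer ones already kept.
    order = sorted(range(len(normalized)),
                   key=lambda i: -len(normalized[i][1]))
    kept = []
    duplicates = []
    for i in order:
        name, seq = normalized[i]
        match = None
        for j in kept:
            other = normalized[j][1]
            # Exact match, or contiguous subsequence if that option is on.
            if seq == other or (subsequences and seq in other):
                match = j
                break
        if match is None:
            kept.append(i)
        else:
            duplicates.append((name, normalized[match][0]))
    kept.sort()  # report unique sequences in their input order
    return [normalized[i] for i in kept], duplicates


if __name__ == "__main__":
    sample = [("a", "CATGCC"), ("b", "atg"), ("c", "C-A-T-G-C-C"), ("d", "GGG")]
    unique, dupes = elim_dupes(sample)
    print(unique)  # [('a', 'CATGCC'), ('d', 'GGG')]
    print(dupes)   # [('c', 'a'), ('b', 'a')]
```

With subsequences=False, the sequence "b" (ATG) would be kept as unique, since it is no longer matched against the longer CATGCC. The sketch does not cover restoring the original sequences, the percent-similarity threshold, or the group analysis.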

For more details, see ElimDupes Explanation.

last modified: Fri May 30 11:28 2008

Remove extraneous characters from sequences: yes / no
Make all letters uppercase: yes / no
Consider subsequences as duplicates: yes / no
Restore original sequences in output: yes / no
Eliminate sequences more similar than: (enter a percentage)
To analyze input by groups: enter number of leading digits

Paste your sequences here [Sample Input]
or upload your file