HIV sequence database

ElimDupes

Duplicate Sequence Removal

Purpose: Given an alignment or set of unaligned nucleotide or protein sequences, this tool compares the sequences and eliminates any duplicates or very similar sequences, thus producing a set of unique sequences.

Details: By default, the program removes all non-letter characters from the sequences, converts all letters to uppercase, and treats as a "duplicate" any sequence that is a subsequence of a longer sequence (e.g., the sequence ATG is a duplicate of the sequence CATGCC). These three default behaviors can be changed with the first three options shown below. The fourth option lets you restore any gaps or non-uppercase characters that were present in the input. The fifth option sets the similarity threshold above which sequences are removed. Checking the "sequences are aligned" box speeds up the program dramatically; if the box is unchecked, the job runs in the background and the results are emailed to you. The final option provides a way to analyze your input automatically as a series of sequence groups. The results page summarizes the duplicate and unique sequence sets and lets you view and download the resulting unique sequences file and the duplicate sequences file.
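The three default behaviors described above can be illustrated with a short Python sketch. This is not the ElimDupes source code; the function names (`normalize`, `elim_dupes`), the parameter names, and the choice to compare longer sequences first are assumptions made for illustration only. "Subsequence" is taken here to mean a contiguous stretch, as in the ATG / CATGCC example.

```python
import re


def normalize(seq, strip_non_letters=True, uppercase=True):
    """Apply the two cleanup defaults: drop non-letters, then uppercase."""
    if strip_non_letters:
        seq = re.sub(r"[^A-Za-z]", "", seq)
    if uppercase:
        seq = seq.upper()
    return seq


def elim_dupes(records, strip_non_letters=True, uppercase=True, subsequences=True):
    """Split (name, sequence) records into unique and duplicate lists.

    Returns (unique, duplicates). `unique` holds (name, cleaned_sequence)
    pairs; `duplicates` holds (name, name_of_kept_sequence) pairs.
    """
    cleaned = [(name, normalize(seq, strip_non_letters, uppercase))
               for name, seq in records]
    # Hypothetical design choice: visit longer sequences first, so a short
    # sequence is always checked against the longer ones that may contain it.
    # sorted() is stable, so sequences of equal length keep their input order.
    ordered = sorted(cleaned, key=lambda rec: len(rec[1]), reverse=True)

    unique, duplicates = [], []
    for name, seq in ordered:
        match = None
        for kept_name, kept_seq in unique:
            if seq == kept_seq or (subsequences and seq in kept_seq):
                match = kept_name
                break
        if match is None:
            unique.append((name, seq))
        else:
            duplicates.append((name, match))
    return unique, duplicates


if __name__ == "__main__":
    sample = [("a", "CATGCC"), ("b", "atg"), ("c", "CAT-GCC"), ("d", "GGGT")]
    unique, duplicates = elim_dupes(sample)
    print("unique:", unique)
    print("duplicates:", duplicates)
```

With the sample input, "b" (atg) and "c" (CAT-GCC) are reported as duplicates of "a", because after cleanup "c" is identical to "a" and "b" is contained in it. Passing `subsequences=False` keeps "b" as a unique sequence.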

Note: This program works on an alignment. If your sequences are NOT aligned, please uncheck the box at the bottom of the Input block. The program then has to add an alignment step, which slows it down dramatically.
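One reason aligned input is cheaper to process is that similarity between two aligned sequences can be computed column by column. The sketch below shows one way a percent-similarity cutoff could work on aligned sequences. The page does not define how ElimDupes measures similarity, so everything here is an assumption: the names `percent_identity` and `drop_similar`, the use of "-" as the gap character, the rule that columns gapped in both sequences are skipped, and the policy of keeping the first sequence encountered.

```python
def percent_identity(a, b):
    """Percent of compared alignment columns in which a and b agree.

    Assumes both sequences come from the same alignment (equal length).
    Columns where both sequences have a gap ("-") are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    # Two sequences with no comparable columns are treated as identical.
    return 100.0 * matches / compared if compared else 100.0


def drop_similar(records, threshold):
    """Remove sequences more than `threshold` percent identical to a kept one.

    Returns (kept, removed). `kept` holds (name, sequence) pairs in input
    order; `removed` holds (name, name_of_kept_sequence) pairs.
    """
    kept, removed = [], []
    for name, seq in records:
        hit = None
        for kept_name, kept_seq in kept:
            if percent_identity(seq, kept_seq) > threshold:
                hit = kept_name
                break
        if hit is None:
            kept.append((name, seq))
        else:
            removed.append((name, hit))
    return kept, removed


if __name__ == "__main__":
    aligned = [("s1", "ACGTACGTAC"), ("s2", "ACGTACGTAT"), ("s3", "TTTTACGTAC")]
    kept, removed = drop_similar(aligned, threshold=85)
    print("kept:", [name for name, _ in kept])
    print("removed:", removed)
```

In this example "s2" differs from "s1" in one of ten columns (90% identical) and is removed at a threshold of 85, while "s3" (70% identical to "s1") is kept.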

For more details, see ElimDupes Explanation.


last modified: Mon Mar 28 12:11 2011

Questions or comments? Contact us at seq-info@lanl.gov.


Remove extraneous characters from sequences: yes / no
Make all letters uppercase: yes / no
Consider subsequences as duplicates: yes / no
Restore original sequences in output: yes / no
Eliminate sequences more similar than (%):
To analyze input by groups, enter number of leading digits:

Paste your sequences here [Sample Input]
or upload your file
Uncheck if your sequences are not aligned (this will make the program MUCH slower!)