RepeatMasker on Helix
RepeatMasker screens DNA sequences for repetitive elements and low-complexity sequences.
A detailed annotation is produced that identifies all of the repetitive
elements in a query sequence. RepeatMasker
is commonly employed prior to searching a database because it produces
a modified version of the query sequence in which all the annotated
repeats and low-complexity sequences have been masked (default: replaced
by N's). Without RepeatMasker, database searches can provide misleading
results because almost 50% of a human genomic DNA sequence consists
of repetitive or low-complexity sequences. [RepeatMasker website]
Sequence comparisons in RepeatMasker are performed by Cross_Match. Comparisons are made to curated databases of repetitive element families derived from RepBase or RepBase Update.
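Because masking replaces annotated repeats with N's by default, one quick sanity check on a masked sequence is to measure how much of it consists of N's. The following is a minimal sketch, not part of RepeatMasker itself; the function name and the example record are hypothetical. Any N's that were already present in the input as ambiguous bases are counted too, so the result is an upper bound on the masked fraction.

```python
def masked_fraction(fasta_text: str) -> float:
    """Return the fraction of sequence characters that are N (masked)."""
    total = 0
    masked = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += line.upper().count("N")
    # An input with no sequence characters is reported as 0.0.
    return masked / total if total else 0.0


# Hypothetical masked record, for illustration only.
example = ">seq1 hypothetical masked record\nACGTNNNNACGT\nNNNNACGTACGT\n"
print(f"{masked_fraction(example):.2%}")  # prints 33.33%
```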
Features:
- Screens DNA sequences for repetitive elements including small RNA pseudogenes, Alus, LINEs, SINEs, LTR elements, and others.
- Produces a table annotating the masked sequences and a table that identifies families of repetitive elements in the query sequence.
- Mask repetitive and low-complexity sequences prior to database searches.
- Use any size query sequence.
- Helpful in designing primers or oligonucleotide probes from sequence data.
- Test for primate or rodent DNA contamination.
- Remove E. coli transposon or insertion element sequences from a DNA sequence.
- Limit masking to low-complexity DNA, Alus, interspersed repeats, or non-RNA sequences.
- Set an upper limit for the level of divergence of a match in order to restrict masking to young insertion elements.
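The masking restrictions and the divergence limit in the list above are selected with command-line switches. The sketch below only assembles an argument list and does not run anything. The switch names (-noint, -alu, -nolow, -norna, -div) are assumptions based on commonly documented RepeatMasker options and should be confirmed against the output of 'repeatmasker -help' for the installed version; the dictionary keys and the function name are hypothetical.

```python
from typing import List, Optional

# Assumed mapping from the restrictions named in the feature list to
# RepeatMasker switches; verify each one with 'repeatmasker -help'.
RESTRICTION_FLAGS = {
    "low_complexity_only": "-noint",  # do not mask interspersed repeats
    "alus_only": "-alu",              # mask only Alus (primate DNA)
    "interspersed_only": "-nolow",    # do not mask low-complexity or simple repeats
    "skip_small_rna": "-norna",       # do not mask small RNA (pseudo)genes
}


def build_command(fasta_path: str,
                  restriction: Optional[str] = None,
                  max_divergence: Optional[float] = None) -> List[str]:
    """Assemble an argument list for a RepeatMasker run without executing it."""
    cmd = ["repeatmasker"]  # command name as typed at the Helix prompt
    if restriction is not None:
        cmd.append(RESTRICTION_FLAGS[restriction])
    if max_divergence is not None:
        # Assumed meaning of -div: mask only repeats whose divergence from
        # the consensus is below this percentage.
        cmd += ["-div", str(max_divergence)]
    cmd.append(fasta_path)
    return cmd


print(" ".join(build_command("a00006.fasta", "alus_only", max_divergence=10)))
# prints: repeatmasker -alu -div 10 a00006.fasta
```

Returning a list rather than a single string means the result could be passed directly to subprocess.run without shell quoting concerns.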
Running RepeatMasker on Helix
RepeatMasker accepts input sequences in FASTA format only. Sequences in other formats can be converted using the EMBOSS 'seqret' function, as in the example below. At the Helix prompt, type 'repeatmasker' with no parameters to get a brief help page, or 'repeatmasker -help' for detailed help.
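Since only FASTA input is accepted, it can be useful to check a file before submitting it. The following is a rough heuristic sketch, not a full format validator; the function name is hypothetical. It only confirms that the first non-blank line is a '>' header and that at least one sequence line follows.

```python
def looks_like_fasta(text: str) -> bool:
    """Heuristic check: a '>' header first, then at least one sequence line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2 or not lines[0].startswith(">"):
        return False
    # At least one non-header line must follow the first header.
    return any(not ln.startswith(">") for ln in lines[1:])


print(looks_like_fasta(">a00006 example\nACGTACGT\n"))        # prints True
print(looks_like_fasta("LOCUS       A00006\nORIGIN\nacgt\n"))  # prints False
```

A file that fails this check, such as a GenBank-format record, would need to be converted with 'seqret' first.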
Version
Type 'repeatmasker' with no parameters to see the currently installed version of RepeatMasker, along with a brief help page.
Sample session (user input follows each prompt):
helix% emboss
[...]
helix% seqret
Reads and writes (returns) sequences
Input (gapped) sequence(s): a00006.gb_pat
output sequence(s) [a00006.fasta]:
helix% repeatmasker a00006.fasta
RepeatMasker version open-3.1.9
Search engine: Crossmatch
analyzing file a00006.fasta
Checking for E. coli insertion elements
identifying simple repeats in batch 1 of 1
identifying full-length ALUs in batch 1 of 1
identifying full-length interspersed repeats in batch 1 of 1
identifying remaining ALUs in batch 1 of 1
identifying most interspersed repeats in batch 1 of 1
identifying long interspersed repeats in batch 1 of 1
identifying ancient repeats in batch 1 of 1
identifying retrovirus-like sequences in batch 1 of 1
identifying tough LINE1s in batch 1 of 1
identifying more simple repeats in batch 1 of 1
identifying low complexity regions in batch 1 of 1
processing output:
cycle 1 cycle 2 cycle 3 cycle 4 cycle 5 cycle 6 cycle 7 cycle 8 cycle 9 cycle 10
masking
done
helix%