Scientific Supercomputing at the NIH

RepeatMasker on Helix
RepeatMasker screens DNA sequences for repetitive elements and low-complexity sequences. It produces a detailed annotation that identifies every repetitive element in the query sequence, along with a modified version of the query in which all annotated repeats and low-complexity regions are masked (by default, replaced with N's). RepeatMasker is commonly run before a database search, because almost 50% of human genomic DNA consists of repetitive or low-complexity sequence, and unmasked repeats can produce misleading matches. [RepeatMasker website]
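
To make the masking step concrete, here is a toy shell sketch. It is not RepeatMasker's own code: the function name mask_region and the coordinates are hypothetical, chosen only for illustration. It replaces a 1-based, inclusive range of a sequence string with N's, which mirrors the default masking behavior described above.

```shell
# Toy illustration only: replace positions FIRST..LAST (1-based, inclusive)
# of a sequence string with N's. mask_region is a hypothetical helper,
# not part of RepeatMasker.
mask_region() {
    awk -v dna="$1" -v first="$2" -v last="$3" 'BEGIN {
        masked = ""
        # Build one N for every position in the masked range.
        for (i = first; i <= last; i++) masked = masked "N"
        # Keep the bases before and after the range unchanged.
        print substr(dna, 1, first - 1) masked substr(dna, last + 1)
    }'
}

# Mask positions 3 through 6 of a 10-base sequence.
mask_region ACGTACGTAC 3 6    # prints ACNNNNGTAC
```

In a real run, the ranges come from RepeatMasker's own annotation of the query rather than from hand-picked coordinates.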

RepeatMasker performs its sequence comparisons with Cross_Match, against curated databases of repetitive element families derived from RepBase or RepBase Update.

Running RepeatMasker on Helix

RepeatMasker accepts input sequences in FASTA format only. Sequences in other formats can be converted with the EMBOSS 'seqret' program, as in the sample session below. At the Helix prompt, type 'repeatmasker' with no parameters to get a brief help page; 'repeatmasker -help' prints detailed help.

Version

Type 'repeatmasker' with no parameters to see the currently installed version of RepeatMasker, along with a brief help page.

Sample session (user input is the text typed after each prompt):

	  
helix% emboss
[...]
helix% seqret
Reads and writes (returns) sequences
Input (gapped) sequence(s): a00006.gb_pat
output sequence(s) [a00006.fasta]: 
helix% repeatmasker  a00006.fasta 
RepeatMasker version open-3.1.9
Search engine: Crossmatch

analyzing file a00006.fasta

Checking for E. coli insertion elements
identifying simple repeats in batch 1 of 1
identifying full-length ALUs in batch 1 of 1
identifying full-length interspersed repeats in batch 1 of 1
identifying remaining ALUs in batch 1 of 1
identifying most interspersed repeats in batch 1 of 1
identifying long interspersed repeats in batch 1 of 1
identifying ancient repeats in batch 1 of 1
identifying retrovirus-like sequences in batch 1 of 1
identifying tough LINE1s in batch 1 of 1
identifying more simple repeats in batch 1 of 1
identifying low complexity regions in batch 1 of 1

processing output: 
cycle 1 
cycle 2 
cycle 3 
cycle 4 
cycle 5 
cycle 6 
cycle 7 
cycle 8 
cycle 9 
cycle 10 
masking
done

helix%
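
The sample session above can also be scripted. The sketch below is intended to be saved as a POSIX sh script, not typed at the helix% prompt. It assumes the EMBOSS tools are already available (as after the 'emboss' step in the session) and that seqret takes its input and output through the -sequence and -outseq qualifiers instead of prompting, writing FASTA by default. The helper names is_fasta and mask_file are hypothetical. The script checks whether a file already looks like FASTA, converts it with seqret if it does not, and then runs repeatmasker on the result.

```shell
# is_fasta FILE: succeed if the first non-blank line starts with '>'.
# This is a rough heuristic, not a full format validator.
is_fasta() {
    first_line=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case "$first_line" in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# mask_file FILE: convert FILE to FASTA with seqret if needed,
# then run repeatmasker on the FASTA file.
mask_file() {
    input=$1
    if is_fasta "$input"; then
        fasta=$input
    else
        # Swap the last extension for .fasta (a00006.gb_pat -> a00006.fasta).
        fasta=${input%.*}.fasta
        seqret -sequence "$input" -outseq "$fasta" || return 1
    fi
    repeatmasker "$fasta"
}

# Example call, commented out so the functions can be sourced safely:
# mask_file a00006.gb_pat
```

Checking the format first avoids running seqret on a file that is already in FASTA format, and stops before repeatmasker if the conversion fails.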

Documentation

repeatmasker.help