CGAP       the Cancer Genome Anatomy Project

Extract SAGE Tags From Sequence Files

What the SAGE Tag Extraction Tool Can Do

The tag extraction tool allows you to extract 10-bp or 17-bp SAGE tags from sequence files that you upload on this page. You may request that linker-similar tags be removed from the results; for this option you may use your own list of linker-similar tags or use the default lists. The tool returns the list of extracted tags along with a report on the process. The tool also allows you to extract 10-bp tags from a list of 17-bp tags by taking the first 10 base pairs of each 17-bp tag and then collating the results.

1. Extract Tags From Sequence Files

1. Prepare a compressed file containing all your sequences in FASTA format; each sequence file must have the extension '.seq' before compression. The only accepted compression formats are (1) zip files produced by WinZip on Windows and (2) .zip or .gz files produced on Unix/Linux systems. You do not need a separate file for each FASTA sequence: a single '.seq' file may contain multiple FASTA sequences, and you may submit multiple '.seq' files that each contain multiple FASTA sequences. If you are submitting multiple '.seq' files from a Unix/Linux machine, first use tar to combine them into a single file, then compress it (xxx.tar.zip or xxx.tar.gz). Only WinZip zip files from Windows and tar.zip or tar.gz files from Unix/Linux are processed.
2. Enter the name of the compressed file containing your sequence file(s) or use the "Browse" button to locate the file in a local directory.

3. Choose one of the following options: specify your own linker-similar sequences, use the default linker-similar sequences, or do not exclude any linker-similar sequences. The default linker-similar lists contain every tag that is a one-bp substitution, insertion, or deletion variant of TCCCTATTAA and TCCCCGTACA (short SAGE), or of TCGGACGTACATCGTTA and TCGGATATTAAGCCTAG (long SAGE).

Don't exclude linker-similar sequences
Use a default list: the default long linker-similar list or the default short linker-similar list
Use your own list (uncompressed): enter the name of the linker file or use the "Browse" button to locate the linker file in a local directory
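
To make the definition of the default lists concrete, the following Python sketch enumerates every one-bp substitution, insertion, and deletion variant of a linker sequence. It is an illustration only, not the tool's own code: the function name one_bp_variants is hypothetical, and the sketch leaves insertion and deletion variants at their natural lengths (11 and 9 bases for a 10-bp linker), because this page does not say how the tool adjusts them to the tag length.

```python
BASES = "ACGT"

# Short SAGE linker sequences quoted in step 3 of this page.
SHORT_LINKERS = ("TCCCTATTAA", "TCCCCGTACA")


def one_bp_variants(linker):
    """Return every sequence exactly one edit away from `linker`.

    Hypothetical helper: covers single-base substitutions, insertions,
    and deletions. Variants are not padded or trimmed to tag length.
    """
    variants = set()
    n = len(linker)

    # Substitutions: replace each position with each of the other three bases.
    for i in range(n):
        for base in BASES:
            if base != linker[i]:
                variants.add(linker[:i] + base + linker[i + 1:])

    # Insertions: insert each base at every possible position.
    for i in range(n + 1):
        for base in BASES:
            variants.add(linker[:i] + base + linker[i:])

    # Deletions: drop each position in turn.
    for i in range(n):
        variants.add(linker[:i] + linker[i + 1:])

    return variants


def linker_similar_set(linkers):
    """Union of the one-bp variants of several linker sequences."""
    result = set()
    for linker in linkers:
        result |= one_bp_variants(linker)
    return result
```

A list built this way could be written one sequence per line and supplied through the "Use your own list" option, assuming that is the layout the tool expects for a linker file; the page does not specify the file format.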
 
4. Select the extracted tag length.

5. Enter the trim sequence length.

6. Enter the maximum ditag length.

7. Enter the minimum ditag length.

8. Click the "Extract Tags" button.
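
The archive described in step 1 can also be built with a short script instead of the command-line tar utility. The sketch below uses Python's standard tarfile module to pack every '.seq' file in a directory into a single tar.gz file. The helper name bundle_seq_files and the directory and archive names in the comment are placeholders, not part of the tool.

```python
import tarfile
from pathlib import Path


def bundle_seq_files(seq_dir, archive_path):
    """Pack every '.seq' file in `seq_dir` into one gzip-compressed tar file.

    Hypothetical helper. Returns the names of the files that were added,
    in sorted order. Files are stored without their directory prefix.
    """
    seq_files = sorted(Path(seq_dir).glob("*.seq"))
    if not seq_files:
        raise ValueError("no '.seq' files found in " + str(seq_dir))

    # Mode "w:gz" writes a tar archive compressed with gzip (xxx.tar.gz).
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for seq_file in seq_files:
            tar.add(str(seq_file), arcname=seq_file.name)

    return [seq_file.name for seq_file in seq_files]


# Example call (placeholder paths):
# bundle_seq_files("my_sage_library", "my_sage_library.tar.gz")
```

The resulting file is the one to select with the "Browse" button in step 2.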



2. Extract Short Tags From Long Tags

1. Prepare a file containing long tags with their frequencies. Each line in the file must contain one tag and its numeric frequency, separated by a tab. Do not compress the file.
2. Enter the name of the file containing your list of long tags and frequencies, or use the "Browse" button to locate the file in a local directory.

3. Click the "Extract Tags" button.
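
The short-tag extraction can be pictured with the following Python sketch, which reads lines in the tab-separated format from step 1, keeps the first 10 bases of each long tag, and adds up the frequencies of long tags that share the same 10-bp prefix. Treating "collating" as summing frequencies is an assumption made here, and the function name collapse_long_tags is hypothetical; the tool's actual implementation is not shown on this page.

```python
from collections import Counter


def collapse_long_tags(lines, short_len=10):
    """Collapse long SAGE tags to short tags and sum their frequencies.

    Hypothetical helper. Each input line holds a tag and its numeric
    frequency separated by a tab; blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        tag, freq = line.split("\t")
        # Keep only the first `short_len` bases of the long tag.
        counts[tag[:short_len]] += int(freq)
    return counts


example = [
    "ACGTACGTACGTACGTA\t3",
    "ACGTACGTACTTTTTTT\t2",
    "TTTTTTTTTTGGGGGGG\t5",
]
short_counts = collapse_long_tags(example)
# The first two long tags share the prefix ACGTACGTAC, so their
# frequencies (3 and 2) are combined into a single short-tag count of 5.
```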


If you have any questions, comments, or need information about CGAP, please contact the NCI CGAP Help Desk.