
1. How do I run multiple mRNAs against a single genomic sequence?
2. Why didn't I get any results?
3. What does the 'divergent sequences' checkbox do?
4. What happens when I pick "large intron sizes"?
5. Why is the organism that the genomic sequence is from important?
6. What do the minimum percent identity and minimum length boxes do?
7. How is Spidey different from BLAST 2 Sequences?
8. Does Spidey look at annotation in GenBank to help arrive at a gene model?
9. What does the moltype warning mean?
10. Where did the name 'Spidey' come from? Who did the logo?

If you have a question that you don't see answered here, or if you have a comment or otherwise want to contact the author (Sarah Wheelan), please send an e-mail to the Help Desk.

How do I run multiple mRNAs against a single genomic sequence?

The entry in the mRNA textbox can be a single FASTA sequence, a single gi or accession, a list of FASTA sequences, or a list of gis or accessions. When multiple mRNAs are aligned to a single genomic sequence, the results page will start with a list of the mRNAs that were run so that you can easily jump down to see the results. You cannot mix FASTA sequences with gis or accessions.
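The input rules can be illustrated with a minimal sketch. It assumes one gi (digits only) or one accession (optionally with a version, such as NM_000546.6) per line. The function name and the ID pattern are hypothetical; they are not Spidey's own parser.

```python
import re

# Hypothetical pattern: a gi (all digits) or an accession such as U12345 or NM_000546.6.
ID_PATTERN = re.compile(r"^(\d+|[A-Z]{1,2}_?\d+(\.\d+)?)$")


def classify_mrna_entry(text: str) -> tuple[str, int]:
    """Return ("fasta", n) or ("ids", n) for the mRNA box; reject mixed input."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty mRNA entry")
    headers = [ln for ln in lines if ln.startswith(">")]
    ids = [ln for ln in lines if ID_PATTERN.match(ln)]
    if headers and ids:
        # FASTA records and gis/accessions cannot be combined in one entry.
        raise ValueError("cannot mix FASTA sequences with gis or accessions")
    if headers:
        if not lines[0].startswith(">"):
            raise ValueError("FASTA entry must start with a '>' header line")
        return "fasta", len(headers)  # one mRNA per FASTA record
    if len(ids) == len(lines):
        return "ids", len(ids)  # one mRNA per gi or accession
    raise ValueError("unrecognized mRNA entry")
```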

When you have multiple mRNAs, you can elect to produce a multiple alignment of those mRNAs against the genomic sequence. If the mRNAs overlap in any region, the multiple alignment will appear at the bottom of the results page. If the mRNAs do not overlap, you will see an error message.
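Whether a multiple alignment can be built depends on whether the mRNAs' genomic spans overlap. A minimal sketch of that test, assuming each mRNA's alignment is summarized as one (start, end) span on the genomic sequence and that "overlap" means at least two spans share a position (the function name is hypothetical):

```python
def spans_overlap(spans: list[tuple[int, int]]) -> bool:
    """True if any two genomic spans (1-based, inclusive) share a position."""
    ordered = sorted(spans)  # sort by start coordinate
    # After sorting, if any pair overlaps, some adjacent pair overlaps too,
    # so checking neighbours is enough.
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 <= end1:
            return True
    return False
```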

Why didn't I get any results?

There are a few reasons why you might not get any results. Check the following:
1. Double-check the accessions or gis you entered to make sure that the sequences are supposed to align. You may even want to use BLAST 2 Sequences to confirm that your sequences are related.
2. Make sure that your mRNA sequences do not consist solely of repeats, which are masked out before alignment. You can use the RepeatMasker web server to check your sequences.
3. Set the minimum percent identity and minimum percent length cutoffs to zero, so that you will see every alignment Spidey can produce, no matter how poor its quality.
4. If you are attempting an interspecies alignment (mRNA and genomic sequences from different species), make sure that the divergent sequences checkbox is checked.
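To see how much of an mRNA is left to align after repeat masking, you can compute the masked fraction. This is a minimal sketch, assuming masked bases appear as N or X characters or as lowercase letters; the function is hypothetical and not part of Spidey or RepeatMasker.

```python
def masked_fraction(seq: str) -> float:
    """Fraction of residues that are masked (N/X or lowercase), ignoring non-letters."""
    residues = [c for c in seq if c.isalpha()]
    if not residues:
        return 1.0  # nothing to align at all
    masked = sum(1 for c in residues if c in "NnXx" or c.islower())
    return masked / len(residues)
```

A value at or near 1.0 means there is essentially nothing left for the alignment to anchor on.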

There are also a few non-biological reasons why you might not get any results: the program may have crashed, hit its CPU limits, or otherwise quit unexpectedly. If you keep getting blank results pages, first try running the standalone Spidey executable, available for download from the Spidey home page, and if that also fails, please contact the author.

What does the 'divergent sequences' checkbox do?

If you align sequences from two different species (or finished mRNAs to draft sequence) using the standard parameters, you will usually get many, many short exons. To force the alignments to be longer and to tolerate more gaps and mismatches (as you will see in interspecies comparisons), the BLAST parameters need to be changed: checking this box lowers the mismatch penalty and the gap opening and extension penalties. This encourages BLAST to merge nearby alignments into a single exon; we find that with these parameters, the gene models look much more realistic.
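Why lower penalties help can be shown with a little arithmetic. The sketch scores one aligned segment with a match reward, a mismatch penalty, and affine gaps (each gap costs gap_open + gap_extend × length). Both parameter sets are purely illustrative assumptions, not the values Spidey actually uses.

```python
def segment_score(matches, mismatches, gap_lengths, reward, penalty, gap_open, gap_extend):
    """Score an aligned segment: reward matches, penalize mismatches and affine gaps."""
    score = matches * reward - mismatches * penalty
    for length in gap_lengths:
        score -= gap_open + gap_extend * length
    return score


# Illustrative parameter sets only; these are NOT Spidey's actual values.
STRINGENT = dict(reward=1, penalty=3, gap_open=5, gap_extend=2)
RELAXED = dict(reward=1, penalty=1, gap_open=2, gap_extend=1)

# 75 matches, 25 mismatches and two 2-base gaps, as in a cross-species exon.
stringent_score = segment_score(75, 25, [2, 2], **STRINGENT)  # -18
relaxed_score = segment_score(75, 25, [2, 2], **RELAXED)      # 42
```

Under the stringent scheme the stretch scores below zero, so it tends to be broken into short pieces; under the relaxed scheme it stays positive and can survive as one exon.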

What happens when I pick "large intron sizes"?

Spidey tries to keep its models compact, so it has a maximum allowed intron size (two, actually: a smaller limit for internal introns and a larger one for the first and last introns). In some cases, though, the correct model contains an intron longer than Spidey's maximum, so Spidey cannot arrive at that model. If you know that the introns are especially large (longer than 35 kb for internal introns or 100 kb for terminal introns), you may want to try the large intron mode.
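If you already have a candidate gene model, you can check whether any intron exceeds these sizes. A minimal sketch, assuming the default limits are the 35 kb and 100 kb thresholds quoted in this answer (1 kb = 1,000 bases) and that exons are given as 1-based, inclusive genomic coordinates in ascending order; the function name is hypothetical.

```python
INTERNAL_LIMIT = 35_000   # assumed internal-intron limit, in bases
TERMINAL_LIMIT = 100_000  # assumed limit for the first and last introns


def oversized_introns(exons: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Return (intron number, length, limit) for each intron longer than its limit."""
    introns = [(end1 + 1, start2 - 1) for (_, end1), (start2, _) in zip(exons, exons[1:])]
    last = len(introns) - 1
    flagged = []
    for i, (start, end) in enumerate(introns):
        length = end - start + 1
        # First and last introns get the larger terminal limit.
        limit = TERMINAL_LIMIT if i in (0, last) else INTERNAL_LIMIT
        if length > limit:
            flagged.append((i + 1, length, limit))
    return flagged
```

Any flagged intron suggests that the large intron mode may be worth trying.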

Why is the organism that the genomic sequence is from important?

Since the genomic sequence is the one that has the splice sites, Spidey needs to know what organism the genomic sequence is from so that it can use the correct splice site matrices. Spidey has splice site matrices for vertebrates, Drosophila, C. elegans, and plants.
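Splice site matrices are usually position weight matrices: each position around the splice junction has a log-odds score for each base, and a candidate site's score is the sum of those scores. The sketch builds such a matrix from a few made-up donor sites (three exonic bases followed by the first six intron bases, starting with the canonical GT). The sites, the names, and the single "vertebrate" entry are hypothetical, not Spidey's real matrices.

```python
import math

BASES = "ACGT"

# Hypothetical toy donor sites; NOT real training data.
TOY_DONORS = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGG"]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds position weight matrix: one {base: score} dict per position."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(sites) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score_site(pwm, seq):
    """Sum per-position scores; ambiguous bases such as N score 0."""
    return sum(col.get(base, 0.0) for col, base in zip(pwm, seq.upper()))


# One matrix per organism group (hypothetical; only one shown here).
SPLICE_MATRICES = {"vertebrate": build_pwm(TOY_DONORS)}
```

Choosing the wrong organism means scoring candidate sites with a matrix trained on a different set of splice sites.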

What do the minimum percent identity and minimum length boxes do?

The minimum percent identity and mRNA length coverage are cutoffs that Spidey uses to evaluate the final alignments. If the percent identity of the final alignment is lower than the specified cutoff, or if the percent of the mRNA's length that is covered by the alignment is lower than the percent length chosen, the final alignment will not be reported.
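A minimal sketch of the two cutoffs, assuming percent identity is identical columns divided by all aligned columns, and coverage is aligned mRNA bases divided by the mRNA's full length. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SplicedAlignment:
    mrna_id: str
    mrna_length: int         # full length of the mRNA, in bases
    aligned_mrna_bases: int  # mRNA bases covered by the exon alignments
    identities: int          # identical aligned columns
    columns: int             # all aligned columns, including mismatches and gaps

    @property
    def percent_identity(self) -> float:
        return 100.0 * self.identities / self.columns if self.columns else 0.0

    @property
    def percent_coverage(self) -> float:
        return 100.0 * self.aligned_mrna_bases / self.mrna_length if self.mrna_length else 0.0


def is_reported(aln: SplicedAlignment, min_identity: float, min_coverage: float) -> bool:
    """An alignment below either cutoff is not reported."""
    return aln.percent_identity >= min_identity and aln.percent_coverage >= min_coverage
```

With both cutoffs at zero, every alignment passes, which is why zero is the setting to use when troubleshooting missing results.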

How is Spidey different from BLAST 2 Sequences?

Spidey is specialized for doing spliced alignments. After an initial BLAST search, Spidey sorts through the hits to find a set of alignments that are consistent and that cover the mRNA pretty well. Then, Spidey tries its hardest to align any pieces of the mRNA that aren't yet aligned -- to do this, it uses low-stringency BLAST as well as another type of local alignment program. Finally, Spidey uses splice matrices to adjust the boundaries of the exon alignments so that they do not overlap or underlap and so that they are adjacent to good splice sites.
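One common way to pick a consistent set of hits is colinear chaining: keep the highest-scoring set of hits that appear in the same order, without overlap, on both the mRNA and the genomic sequence. The sketch below is a generic dynamic-programming chainer swapped in for illustration; this FAQ does not say how Spidey actually sorts its hits. It assumes both sequences are on the same strand with 1-based, inclusive coordinates.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    q_start: int  # mRNA coordinates
    q_end: int
    s_start: int  # genomic coordinates, same strand as the mRNA
    s_end: int
    score: float


def best_chain(hits: list[Hit]) -> list[Hit]:
    """Highest-scoring chain of hits that are colinear and non-overlapping on both sequences."""
    if not hits:
        return []
    hits = sorted(hits, key=lambda h: (h.q_start, h.s_start))
    best = [h.score for h in hits]  # best chain score ending at each hit
    prev = [-1] * len(hits)         # back-pointer to the previous hit in that chain
    for j, hj in enumerate(hits):
        for i in range(j):
            hi = hits[i]
            if hi.q_end < hj.q_start and hi.s_end < hj.s_start and best[i] + hj.score > best[j]:
                best[j] = best[i] + hj.score
                prev[j] = i
    j = max(range(len(hits)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(hits[j])
        j = prev[j]
    return chain[::-1]
```

The quadratic loop is fine for the handful of hits a single mRNA usually produces.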

Does Spidey look at annotation in GenBank to help arrive at a gene model?

Short answer: no. Longer answer: the strength of the evidence for intron and exon features annotated in GenBank varies widely. Many of these features were predicted by programs rather than found through experimental work, so using them would make Spidey inherit the quirks of other alignment programs. Also, since Spidey was designed to take FASTA sequences as input, it has to be able to align sequences blindly anyway.

What does the moltype warning mean?

If you input two accessions or gis, the sequences are retrieved from GenBank, and their annotations can be examined. Spidey checks whether both sequences are annotated as the same molecule type (for example, both genomic or both mRNA). Since Spidey is designed for mRNA-to-genomic alignments, it may give strange-looking output for genomic-to-genomic alignments or for other combinations of molecule types. The warning simply tells you that you are trying to do something Spidey was not designed to do.
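A minimal sketch of such a check. It assumes the annotated types are read as strings in the style of the INSDC /mol_type qualifier ("mRNA", "genomic DNA"), and it warns for any pairing other than mRNA against genomic DNA. The function name and message are hypothetical.

```python
from typing import Optional

# Assumed annotation values for the one pairing Spidey is designed for.
EXPECTED = ("mRNA", "genomic DNA")


def moltype_warning(mrna_moltype: str, genomic_moltype: str) -> Optional[str]:
    """Return a warning unless an mRNA is being aligned to genomic DNA."""
    if (mrna_moltype, genomic_moltype) == EXPECTED:
        return None
    return (f"moltype warning: aligning {mrna_moltype} to {genomic_moltype}; "
            "Spidey is designed for mRNA-to-genomic alignments")
```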

Where did the name 'Spidey' come from? Who did the logo?

Everyone wants to know this one . . . I have to admit that the choice was almost an accident. I had written some functions and needed to save the file, so I had to call it something. I was listening to Moxy Früvous' song 'Spiderman' on my headphones, and 'Spidey' just seemed right somehow, so I figured it would be a good temporary name until I could think up something catchier. After a while I liked 'Spidey' too much to give it up. I like to think of the program as a spider, picking its way around a web of sticky and non-sticky strands and being agile and smart enough not to get stuck (most of the time, anyway).

The spider logo was done by my husband, Brian Greenlee, who is a graphic designer. I suppose the spider is also a tribute to one of my favorite musicians, John Entwistle, bassist for The Who on songs like '5:15' and author of 'My Wife' and, of course, 'Boris the Spider'.

Thanks to Bhanu Rajput and Kim Pruitt for their terrific comments and suggestions and for their insightful questions.
