ProSplign is a global alignment tool developed by Dr. Boris Kiryutin. It produces accurate spliced alignments and can locate alignments of distantly related proteins with low similarity. The ProSplign algorithm is an integral component of the NCBI Genome Annotation Pipeline (Gnomon), which has been used to annotate many important genomes, including a wide range of plant and animal species (such as human, mouse, and cow). The Sea Urchin Genome Sequencing center used the Pipeline for sequence analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, published in Science in 2006. Integrating ProSplign into the genome annotation pipeline significantly improved annotation quality over previously available methods. Owing to this success, the method was used to annotate Tribolium castaneum (Nature, 2008), taurine cattle (Science, 2009), Acyrthosiphon pisum (PLoS Biology, 2010), Nasonia (Science, 2010), and many other genomes.
ProSplign is also a central part of the automatic pipeline for influenza virus genomes, an important component of the Influenza Genome Sequencing Project. Sponsored by the National Institutes of Health, the Influenza Project is an international collaboration of critical importance to public health. It has already led to multiple new discoveries about the recent evolution and pathogenesis of influenza, published in leading journals including Journal of Virology, PLoS Biology, and Nature. For other questions, comments, or bug reports, please contact the NCBI help desk.
ProSplign is a utility for computing alignments of proteins to genomic nucleotide sequence; these alignments can include eukaryotic splicing. At the heart of the program is a global alignment algorithm that explicitly accounts for introns and splice signals. This algorithm is what makes ProSplign accurate in determining splice sites and tolerant of sequencing errors. ProSplign uses BLAST hits to identify possible locations of genes and their duplications on genomic sequences, and then to speed up the core dynamic programming. This web site is a single-point source of information on ProSplign, the tool for computing protein-to-genomic alignments that attempt to account for mRNA splicing. ProSplign was developed with the following goals in mind:
ProSplign is used to compute transcript alignments as part of the NCBI Genome Annotation Pipeline. ProSplign is available in several ways. There is no online version; you must download and install the console version, which is available for major platforms (and possibly for a few platforms not listed; please request). Because ProSplign is part of the NCBI C++ Toolkit, you can also link to it from your own applications in a portable way. Finally, ProSplign is available as a plugin for the NCBI Genome Workbench.
Reference: ProSplign - Protein to Genomic Alignment Tool. B. Kiryutin, A. Souvorov, T. Tatusova. Manuscript in preparation.
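The global alignment described above explicitly accounts for introns and splice signals. The toy dynamic program below illustrates that idea. It is a minimal sketch built on assumptions of our own, not ProSplign's: introns occur only between whole codons and must begin with GT and end with AG, there are no frameshifts, and scoring is plain identity match/mismatch with flat gap and intron penalties. The function name `spliced_align` and all parameter values are hypothetical. ProSplign's actual scoring and intron model are not reproduced here.

```python
"""Toy spliced protein-to-genomic global alignment (illustrative only).

Hypothetical simplifications, NOT ProSplign's actual model:
- introns fall only between whole codons (no split codons, no frameshifts),
- an intron must start with GT, end with AG, and pays a flat penalty,
- amino acids score `match` if identical and `mismatch` otherwise.
"""
from itertools import product

NEG_INF = float("-inf")

# Standard genetic code; product() enumerates codons in TCAG order.
_AAS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AAS)}


def spliced_align(protein, genome, match=2, mismatch=-1, gap=-2,
                  intron=-3, min_intron=4):
    """Globally align `protein` to `genome`, allowing GT...AG introns.

    Returns (score, exons): exons are 0-based, half-open genomic
    intervals separated by introns ([] if no alignment exists).
    """
    n, m = len(protein), len(genome)
    H = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # best scores
    back = [[None] * (m + 1) for _ in range(n + 1)]  # traceback moves
    H[0][0] = 0
    donors = [k for k in range(m - 1) if genome[k:k + 2] == "GT"]

    for i in range(n + 1):
        for j in range(m + 1):
            best, move = H[i][j], back[i][j]
            if i > 0 and j >= 3:  # codon genome[j-3:j] aligned to protein[i-1]
                aa = CODON_TABLE.get(genome[j - 3:j])
                cand = H[i - 1][j - 3] + (match if aa == protein[i - 1] else mismatch)
                if cand > best:
                    best, move = cand, ("codon", i - 1, j - 3)
            if i > 0 and H[i - 1][j] + gap > best:  # residue with no codon
                best, move = H[i - 1][j] + gap, ("gap", i - 1, j)
            if j >= 3 and H[i][j - 3] + gap > best:  # codon with no residue
                best, move = H[i][j - 3] + gap, ("gap", i, j - 3)
            if j >= 2 and genome[j - 2:j] == "AG":  # intron genome[k:j]
                for k in donors:
                    if j - k >= min_intron and H[i][k] + intron > best:
                        best, move = H[i][k] + intron, ("intron", i, k)
            H[i][j], back[i][j] = best, move

    if H[n][m] == NEG_INF:
        return NEG_INF, []
    exons, i, j, end = [], n, m, m
    while back[i][j] is not None:
        kind, pi, pj = back[i][j]
        if kind == "intron":
            exons.append((j, end))  # exon to the right of this intron
            end = pj
        i, j = pi, pj
    exons.append((j, end))
    return H[n][m], [e for e in reversed(exons) if e[1] > e[0]]
```

For example, aligning the protein `MK` to `"ATG" + "GTCCCCAG" + "AAA"` places an intron between the two codons and reports the exons `(0, 3)` and `(11, 14)`. The inner loop over every donor site is deliberately naive. It shows why unrestricted dynamic programming becomes expensive on long genomic sequences.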
Binaries (updated 07/09/07)
Pre-built executables are available for the following platforms:
If you need an executable for a platform not listed above, please contact us.
Sources
ProSplign is included in the NCBI C++ Toolkit. Download and unzip the Toolkit, then, depending on your system (Windows or Unix-like), go to ncbi_cxx\src\algo\align\prosplign\demo\ or ncbi_cxx/src/algo/align/prosplign/demo/. You can also browse the Toolkit's code with the source browser; search for the C++ symbol CProSplign to go directly to the ProSplign sources. For details on how to configure and build the Toolkit, please consult the NCBI C++ Toolkit Book.
Using the console version
Algorithmic details
ProSplign works with input sequences on a pairwise basis; in other words, exon/intron structures are determined independently for each query and subject. The dynamic programming alone is accurate in determining splice junctions but computationally expensive. Also, if copies of a gene lie on the same genomic sequence and strand, applying it directly may produce incorrect results by connecting exons from different copies. Thus, for every input query/subject pair, it is important to first localize genes on the genomic sequence, which ProSplign does with an algorithm that compartmentizes the BLAST hits. The compartmentization step starts by computing protein-to-genomic BLAST hits, which give initial insight into the structure of compartments. The hits are separated into two same-strand sets, and compartments are then identified within each strand. To do so, we formally define the optimization problem in terms of genomic sequence coverage and solve it with a dynamic programming algorithm whose running time is short compared to the core dynamic programming described above.
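The page does not spell out the coverage-based optimization, so the code below is only a hypothetical stand-in for the compartmentization step. It splits the hits by strand. Within each strand it repeatedly finds the collinear chain of hits with the largest genomic coverage (an O(n²) dynamic program), emits that chain as a compartment, and removes its hits. It stops when the best remaining chain covers fewer than `min_coverage` bases. The names `Hit`, `compartmentize`, `max_gap`, and `min_coverage`, and their values, are illustrative; they are not ProSplign API or defaults. The sketch does show why a second copy of a gene ends up in its own compartment: its hits restart near the beginning of the protein, so they cannot extend the first copy's chain.

```python
"""Hypothetical compartmentization by iterative max-coverage chaining.

This is NOT ProSplign's formulation; it only illustrates grouping
BLAST hits into per-strand, collinear compartments scored by coverage.
"""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    q_start: int  # protein coordinates, 0-based, half-open
    q_end: int
    g_start: int  # genomic coordinates, 0-based, half-open, g_start < g_end
    g_end: int
    strand: str   # "+" or "-"


def _oriented(hit: Hit) -> Tuple[int, int]:
    """Genomic interval, mirrored on '-' so it grows along the protein."""
    if hit.strand == "+":
        return hit.g_start, hit.g_end
    return -hit.g_end, -hit.g_start


def _best_chain(hits: List[Hit], max_gap: int) -> Tuple[int, List[Hit]]:
    """Collinear chain with maximum total genomic coverage.

    Chained hits never overlap in the genome, so summing their
    lengths gives the chain's genomic coverage.
    """
    if not hits:
        return 0, []
    hits = sorted(hits, key=lambda h: _oriented(h)[0])
    score, prev = [], []
    for i, h in enumerate(hits):
        gs, ge = _oriented(h)
        best, best_j = ge - gs, -1  # chain consisting of this hit alone
        for j in range(i):
            p = hits[j]
            _, pe = _oriented(p)
            collinear = pe <= gs and p.q_end <= h.q_start
            if collinear and gs - pe <= max_gap and score[j] + (ge - gs) > best:
                best, best_j = score[j] + (ge - gs), j
        score.append(best)
        prev.append(best_j)
    i = max(range(len(hits)), key=score.__getitem__)
    total, chain = score[i], []
    while i != -1:
        chain.append(hits[i])
        i = prev[i]
    return total, chain[::-1]


def compartmentize(hits: List[Hit], max_gap: int = 10_000,
                   min_coverage: int = 30) -> List[List[Hit]]:
    """Split hits by strand, then peel off best chains as compartments."""
    compartments = []
    for strand in "+-":
        remaining = [h for h in hits if h.strand == strand]
        while remaining:
            coverage, chain = _best_chain(remaining, max_gap)
            if coverage < min_coverage:
                break
            compartments.append(chain)
            used = {id(h) for h in chain}
            remaining = [h for h in remaining if id(h) not in used]
    return compartments
```

Extracting one best chain at a time keeps the sketch short. It is a greedy simplification and does not jointly optimize all compartments.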
Frequently Asked Questions
Q: Why am I getting "Unable to locate XXX" exceptions?
Q: What does the 'No compartment found' log file message mean? What is a compartment?
For questions on how to build the NCBI C++ Toolkit and ProSplign, please write to