Genome Informatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VII 
January 12-16, 1999  Oakland, CA


108. Sensitive Detection of Distant Protein Relationships Using Hidden Markov Model Alignment 

Xiaobing Shi and David J. States 
Washington University in St. Louis, St. Louis, Missouri 
states@ibc.wustl.edu 

Hidden Markov models are statistical models of the primary structure of a sequence family. In this poster, an algorithm to align hidden Markov models (HMMs) of protein sequences is presented along with its software implementation. Aligning HMMs provides a way to compare sequence families. Compared with pairwise sequence alignment, HMM alignment is more sensitive in identifying relationships between sequence families and requires less computation. Our algorithm uses dynamic programming to identify similarities between two HMMs. Two scoring algorithms are used: the local alignment algorithm, which identifies the most similar segments of the two HMMs, and the "glocal" alignment algorithm, which aligns the entire length of one HMM to a similar segment of the other model. 
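
The difference between the two scoring modes can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so the sketch assumes that each HMM is reduced to its match-state emission distributions, that two columns are scored by a log-odds co-emission measure, and that gaps carry a linear penalty. The names column_score, align_models, gap, and mode are hypothetical.

```python
import math

def column_score(p, q, background):
    """Assumed column score: log-odds that two match states emit the same
    residue, relative to the background distribution."""
    odds = sum(pi * qi / bi for pi, qi, bi in zip(p, q, background))
    return math.log(max(odds, 1e-12))  # floor avoids log(0) for disjoint columns

def align_models(model_a, model_b, background, gap=-1.0, mode="local"):
    """Dynamic-programming score for two models given as lists of emission
    distributions. mode="local" scores the most similar pair of segments;
    mode="glocal" aligns all of model_a to a segment of model_b."""
    n, m = len(model_a), len(model_b)
    # H[i][j]: best score of an alignment ending at column i of A, column j of B
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    if mode == "glocal":
        # Every column of A must be used, so skipping columns of A costs a gap.
        for i in range(1, n + 1):
            H[i][0] = i * gap
        # H[0][j] stays 0: leading columns of B are skipped for free.
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = column_score(model_a[i - 1], model_b[j - 1], background)
            cell = max(H[i - 1][j - 1] + s,  # align the two columns
                       H[i - 1][j] + gap,    # column of A against a gap
                       H[i][j - 1] + gap)    # column of B against a gap
            if mode == "local":
                cell = max(cell, 0.0)        # a local alignment may restart anywhere
                best = max(best, cell)
            H[i][j] = cell
    if mode == "glocal":
        return max(H[n])  # trailing columns of B are skipped for free
    return best
```

In this sketch the only differences between the two modes are the boundary conditions and the zero floor: the local mode may start and stop anywhere in both models, whereas the glocal mode charges for every unused column of the first model and none of the second.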

We have developed software to perform the alignment and set up a website that lets users run it over the Internet. Besides accepting HMMs that users enter or upload, the website can build HMMs from user-supplied raw sequences or multiple alignments. All HMMs in the Pfam database are also available for alignment on the website. We also provide a method to generate and then align two random HMMs; the resulting score can be used to assess the significance of an HMM alignment score. 
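
One way to use random-model scores for significance is sketched below. The abstract does not say how the random HMMs are generated or how significance is computed from their score, so everything here is an assumption: random columns are drawn from a symmetric Dirichlet distribution, many random pairs are scored to form a null distribution, and an empirical p-value is reported. The names random_model, null_scores, empirical_p_value, and toy_score are hypothetical, and toy_score is only a stand-in for a real HMM alignment scorer.

```python
import random

def random_model(length, alphabet_size, rng, alpha=1.0):
    """Random model: each column is drawn from a symmetric Dirichlet(alpha)
    by normalizing independent gamma draws (an assumed null model)."""
    model = []
    for _ in range(length):
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(alphabet_size)]
        total = sum(draws)
        model.append([d / total for d in draws])
    return model

def null_scores(score_fn, len_a, len_b, alphabet_size, trials, seed=0):
    """Score `trials` pairs of random models with the supplied scorer."""
    rng = random.Random(seed)
    return [score_fn(random_model(len_a, alphabet_size, rng),
                     random_model(len_b, alphabet_size, rng))
            for _ in range(trials)]

def empirical_p_value(observed, nulls):
    """Fraction of null scores at least as large as the observed score,
    with a +1 correction so the estimate is never exactly zero."""
    exceed = sum(1 for s in nulls if s >= observed)
    return (exceed + 1) / (len(nulls) + 1)

def toy_score(model_a, model_b):
    """Stand-in scorer (NOT an HMM alignment): sum of column dot products
    along the main diagonal. Replace with a real alignment score."""
    return sum(sum(x * y for x, y in zip(p, q))
               for p, q in zip(model_a, model_b))

if __name__ == "__main__":
    nulls = null_scores(toy_score, len_a=20, len_b=20,
                        alphabet_size=20, trials=500, seed=1)
    print(empirical_p_value(observed=2.0, nulls=nulls))
```

The random models should match the lengths of the two models being compared, since alignment scores generally grow with model length.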

We have used this software to align all pairs of HMMs in the Pfam database, and the results reveal some interesting relationships between existing protein families that have not previously been recognized. For example, the high HMM local alignment score between the Sodium:solute symporter family (SSF) and the Amino acid permease family suggests that these two families are closely related. Other examples include the Tropomyosin family and the Filament family, and the GerE family and the sigma70 family.


 