The Pfam database is a large collection of protein
families, each represented by multiple sequence alignments
and hidden Markov models (HMMs).
Proteins are generally composed of one or more functional regions,
commonly termed domains. Different combinations of domains give
rise to the diverse range of proteins found in nature. The identification
of domains that occur within proteins can therefore provide insights
into their function.
There are two components to Pfam: Pfam-A and Pfam-B. Pfam-A
entries are high quality, manually curated families. Although these
Pfam-A entries cover a large proportion of the sequences in the
underlying sequence database, to give more comprehensive
coverage of known proteins we also generate a supplement using the
ADDA database. These automatically generated entries are called
Pfam-B. Although of lower quality, Pfam-B families can be
useful for identifying functionally conserved regions when no Pfam-A
entries are found.
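
As a rough illustration of how the two components might be used together, the sketch below prefers curated Pfam-A matches for a protein and falls back to Pfam-B matches only when no Pfam-A entry is found. The Match class, the preferred_matches function and the family names are hypothetical and are not part of any Pfam software or data release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One family match on a protein sequence (hypothetical structure)."""
    family: str  # placeholder family name, not a real Pfam identifier
    source: str  # "Pfam-A" (curated) or "Pfam-B" (automatically generated)
    start: int   # first residue covered by the match
    end: int     # last residue covered by the match


def preferred_matches(matches):
    """Return the Pfam-A matches if there are any, otherwise the Pfam-B ones."""
    curated = [m for m in matches if m.source == "Pfam-A"]
    if curated:
        return curated
    return [m for m in matches if m.source == "Pfam-B"]


# Made-up matches for illustration only.
mixed = [
    Match("curated_family_1", "Pfam-A", 10, 120),
    Match("auto_family_7", "Pfam-B", 130, 200),
]
only_auto = [Match("auto_family_7", "Pfam-B", 130, 200)]

print(preferred_matches(mixed))
print(preferred_matches(only_auto))
```

Note that this simple rule discards Pfam-B matches whenever any Pfam-A match exists, even if they cover a different region of the protein; a per-region fallback would be an equally reasonable design.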
Pfam also generates higher-level groupings of related families, known as
clans. A clan is a collection of Pfam-A entries which are
related by similarity of sequence, structure or profile-HMM.
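
The relationship between families and clans can be pictured as a simple grouping. The sketch below is a minimal illustration, assuming each Pfam-A family belongs to at most one clan; the group_by_clan function and all family and clan names are hypothetical placeholders, not real Pfam identifiers.

```python
from collections import defaultdict


def group_by_clan(family_to_clan):
    """Invert a family -> clan mapping into clan -> sorted list of families.

    Families whose clan is None (not assigned to any clan) are skipped.
    """
    clans = defaultdict(list)
    for family, clan in family_to_clan.items():
        if clan is not None:
            clans[clan].append(family)
    return {clan: sorted(members) for clan, members in clans.items()}


# Made-up membership table for illustration only.
membership = {
    "family_a": "clan_x",
    "family_b": "clan_x",
    "family_c": None,  # a family that is not part of any clan
}

print(group_by_clan(membership))
```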