Protein Sci. 2001 September; 10(9): 1881–1886.
PMCID: PMC2253204
Circularly permuted proteins in the protein structure database
Jongsun Jung and Byungkook Lee
Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892-4255, USA
Reprint requests to: Dr. Byungkook Lee, Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bldg. 37, Room 4B15, 37 Convent Drive MSC 4255, Bethesda, MD 20892-4255, USA; e-mail: bk/at/nih.gov ; fax:(301) 402-1344.
Received February 7, 2001; Revised June 14, 2001; Accepted June 14, 2001.
Abstract
Some proteins are homologous to others after their sequence is circularly permuted. A few such proteins have been recognized, mainly by sequence comparison, but also by comparing their three-dimensional structures. Here we report the result of a systematic search for all protein pairs in the SCOP 90% id domain database that become structurally superimposable when the sequence of one of the pairs is circularly permuted. Using a reasonable set of criteria, we find that 47% of all protein domains are superimposable to at least one other protein domain in the database after their sequence is circularly permuted. Many of these are symmetric proteins, which superimpose to another protein both with and without a circular permutation of the sequence. However, 412 of the total 3035 domains are nonsymmetric, and these become structurally superimposable to another protein only after a circular permutation of the sequence. These include most known and many previously undetected circularly permuted proteins with remote homology.
Keywords: Circular permutation, protein structure, structure alignment, gene duplication
 
Proteins have been circularly permuted artificially to study their folding and stability or, in protein engineering, to move the N or C terminus to another position in the structure (Heinemann and Hahn 1995a; Baird et al. 1999; McWherter et al. 1999; Nakamura and Iwakura 1999; Iwakura et al. 2000). Circularly permuted proteins also occur in nature. Lindqvist and Schneider (1997) reviewed some eight naturally circularly permuted proteins that were known by 1997, but at least six more (Garcia-Vallve et al. 1998; Murzin 1998; Castillo et al. 1999; Jeltsch 1999; Polekhina et al. 1999; Jung and Lee 2000) have been reported since then. Circularly permuted proteins can arise from a posttranslational modification (Carrington et al. 1985; Bowles et al. 1986), but a majority probably arose from gene duplication (Luger et al. 1989; Ponting and Russell 1995; Jeltsch 1999) or exon shuffling (Doolittle 1987; Gilbert 1987) events. Natural circularly permuted proteins occur in a variety of organisms, including viruses, bacteria, plants, and higher animals. They are mostly β-sheet and α/β proteins, but saposins (Ponting and Russell 1995; Liepinsh et al. 1997) are α-helical proteins. In most known cases, the N and C termini are close to each other (Thornton and Sibanda 1983), but we have found in this work many examples in which the two termini are not close together. Methods for detecting repeated sequence segments and circularly permuted proteins in sequence databases have been reported recently (Marcotte et al. 1999; Uliel et al. 1999). Here we report the results of a systematic search for protein pairs that have similar structures but whose structural alignment requires circular permutation of one of the sequences.
Results and Discussion

There are more than 10,000 entries in the protein structure databank (Berman et al. 2000), which consist of more than 16,000 domains according to the manual SCOP domain parsing result (Murzin et al. 1995). We selected 3035 protein domains from the SCOP domain database, version 1.41, that were at least 40 residues long and had 90% or less sequence identity between any pair of them. Attempts were made to structurally align all pairs of these domains both with and without circularly permuting one of the sequences. Two structures are said to be structurally related when they are sufficiently similar that the structural alignment produces a sufficiently large number of aligned pairs of residues (see Materials and Methods). Of the 9.2 million (3035 × 3035) possible pairs, 136,975 pairs met the criteria for a structural relation when neither sequence was permuted (unpermuted alignment), and 48,016 pairs met the criteria when one of the two sequences was circularly permuted (permuted alignment). The pairs in the latter set are said to be CP related.

The automatic procedure found most known CP relations, including those between plant lectins (Cunningham et al. 1979), bacterial glucanases (Heinemann and Hahn 1995b), (β/α)8 barrel proteins (Sergeev and Lee 1994; Jia et al. 1996; Macgregor et al. 1996; Garcia-Vallve et al. 1998), the C2 domain proteins (Nalefski and Falke 1996), ferredoxins (Jung and Lee 2000), flavin-binding β-barrel domains (Murzin 1998), the six-stranded double-ψ β-barrels (Castillo et al. 1999), and the DNA and other methyltransferases (Jeltsch 1999). Some new examples of CP-related protein pairs are shown in Figure 1. When a protein has a symmetric structure, it aligns to itself and to other structurally similar proteins both with and without circular permutation of its sequence. One can use this property to identify symmetric structures. Therefore, we operationally define a protein to be symmetric if it is related to another protein both with and without circular permutation and if the two alignments are judged to be distinct (see Materials and Methods). One feature that can be noted from the structures shown in Figure 1 is that the N and C termini are far apart in many of the structures. The proximity of the N and C termini is therefore not a prerequisite for circular permutation.

Fig. 1.
Molscript images of some symmetric (A) and nonsymmetric (B) circularly permuted protein pairs. Each structure is made of two parts, colored red and blue. For each pair, similarly colored parts match structurally, red to red and blue to blue. The red part (more ...)

Individual structural relations are shown in Figure 2. The numbers of relations between proteins that belong to the same or different fold, superfamily, and family, according to the SCOP classification, are shown in Figure 3. The unpermuted relations (blue and green dots in Fig. 2) are mostly between proteins in the same superfamilies (Fig. 3), indicating that our criteria for structural similarity roughly match the criteria used for the manual SCOP superfamily classification. Many relations do connect different classes (blue dots outside of the boxes in Fig. 2), but most of these involve protein domains that are small α-helical pieces or small α + β motifs, which resemble a part of many larger proteins. Most of the symmetric CP relations (green dots in Fig. 2) occur within the same SCOP folds and superfamilies (Fig. 3), but many nonsymmetric CP relations (red dots in Fig. 2) connect proteins in different superfamilies and folds (Fig. 3).

Fig. 2.
Unpermuted and circularly permuted structural relations among the 3035 protein domains. The x- and y-axes represent the proteins sorted according to the SCOP classification number. The chain of seven boxes along the diagonal indicates the seven classes—α, (more ...)
Fig. 3.
Number of unpermuted, CP, symmetric CP, and nonsymmetric CP structural relations. The dotted and solid gray areas indicate the number of relations in which the related pair belongs, respectively, to the same and different class, fold, superfamily, or (more ...)

The number of proteins that bear a relation with another protein is listed in Table 1. Also listed are the number of families, superfamilies, folds, and classes, as defined by SCOP, which these proteins represent. Obviously the precise numbers given in the table depend on the criteria used to judge structural similarity (see Materials and Methods). The fact that structural similarity depends on an ultimately arbitrary choice of a cutoff value is somewhat unsatisfactory. However, the situation is similar in the case of the detection of sequence homology, where a similarly arbitrary cutoff value for the e-score is commonly used. The z-score that we used in this work and the e-score are closely related, being precisely interconvertible when the score distribution is Gaussian for random matches. We made numerous spot checks by visual inspection of superimposed structures and confirmed to our satisfaction that in all cases we concur with the judgment made by the automatic procedure concerning the structural similarity or the lack thereof.
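
The relation between the z-score and an e-score noted above can be illustrated with a minimal sketch. It assumes that random match scores are Gaussian and that the e-score is the expected number of random matches at or above a given z-score, that is, the number of comparisons multiplied by the upper-tail probability. The function names and the choice of 3035 comparisons in the usage lines are illustrative assumptions, not part of the procedure used in this work.

```python
import math
from statistics import NormalDist


def z_to_evalue(z: float, n_comparisons: int) -> float:
    """Expected number of random matches with z-score >= z.

    Assumes (hypothetically) standard-normal random scores, so the
    upper-tail probability is 0.5 * erfc(z / sqrt(2)).
    """
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return n_comparisons * tail


def evalue_to_z(evalue: float, n_comparisons: int) -> float:
    """Inverse of z_to_evalue under the same Gaussian assumption."""
    tail = evalue / n_comparisons
    if not 0.0 < tail < 1.0:
        raise ValueError("evalue / n_comparisons must lie strictly between 0 and 1")
    # By symmetry of the normal distribution, P(Z >= z) = P(Z <= -z).
    return -NormalDist().inv_cdf(tail)


# Usage: one query domain compared against 3035 domains at the z > 5.0 cutoff.
e = z_to_evalue(5.0, 3035)
z = evalue_to_z(e, 3035)
```

Under these assumptions the two scores carry the same information, so a cutoff on one is equivalent to a cutoff on the other for a fixed number of comparisons.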

Table 1.
Distribution of structurally related protein domains in different structural types

It can be seen from Table 1 that 47% (1433 of 3035) of the protein domains have a CP relation with at least one other known protein domain and that such proteins are not restricted to a few special folds; circularly permuted proteins occur in all structural classes and in about half (226 of 446) of all known folds. In the SCOP classification, more than one-third of the protein domains belong to the 15 largest folds (1068 out of 3035). There is at least one circularly permuted protein in each of these 15 folds and, on average, 44% of the proteins are permuted in a given fold. It has long been recognized that many multidomain proteins were generated by different combinations of a small number of domains (Patthy 1993). The finding that a large number of protein domains have circular permutation relations with other protein domains indicates that individual domains themselves are also made from a combination of smaller units.

Some 71% of the circularly permuted proteins (1025 of 1433) have symmetric structures. The number of symmetric proteins detected here is therefore 34% of the total number of proteins. These structures might have arisen from ancient gene duplication events (Lang et al. 2000). Marcotte et al. (1999) reported that duplicated gene segments occur in 14% of all protein sequences and more than 20% of all eukaryotic proteins. These must reflect relatively recent gene duplication events because they were detected by sequence homology. In the case of the symmetric structural domains detected here, the sequence homology is generally low; only 91 of the 34,581 symmetric circularly permuted pairs have >30% sequence identity between them. If the symmetry has indeed arisen from gene duplication events, therefore, most of them must be ancient events. Alternatively, one cannot rule out the possibility that at least some of these structures arose without a gene duplication event (convergent evolution).

Materials and Methods

Finding circularly permuted alignment
A protein sequence was circularly permuted by choosing a cut position and then renumbering the residues, starting from the carboxy side of the cut position, proceeding forward to the C terminus of the protein, continuing from the N terminus, and finishing at the amino side of the cut position. The cut position was initially chosen to be the middle of the sequence (Fig. 4). The structure of the permuted protein was then aligned to another protein, the sequence of which was not permuted, using the recently described structure–structure alignment program SHEBA (Jung and Lee 2000). This structural alignment procedure preserves connectivity, so that two structures that are identical except for the numbering of the residues are considered distinct. A new cut position was then determined from the structural alignment. Let $n_a$ be the number of residues that are matched in the first half (the half that contains the original C terminus) and $n_b$ the number of residues that are matched in the second half of the permuted protein. The new cut position is chosen to be next to the last residue matched in the second half if $n_a > n_b$, or just before the first residue matched in the first half if $n_a \le n_b$. Circular permutation using this new cut position increases the number of matched residues in the structural superposition.
Fig. 4.
Circularly permuted structural alignment procedure. N and C indicate the original N and C termini. (a) The two protein sequences to be aligned are shown as parallel arrows, not yet aligned. The second sequence will be permuted. The short vertical line (more ...)
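
The renumbering and the cut-position update described above can be sketched as follows. This is an illustration under stated assumptions, not the SHEBA implementation: indices are 0-based, `matched` is a hypothetical list of positions (in the permuted numbering) of residues of the permuted protein that the structural alignment matched, and the cut is left unchanged when the relevant part has no matched residue, a case the text does not address.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def circular_permute(residues: Sequence[T], cut: int) -> List[T]:
    """Renumber residues so the chain starts at original index `cut`.

    The result runs from `cut` to the C terminus, then from the
    N terminus to `cut - 1`.
    """
    if not 0 <= cut < len(residues):
        raise ValueError("cut must be a valid residue index")
    return list(residues[cut:]) + list(residues[:cut])


def update_cut(n: int, cut: int, matched: Sequence[int]) -> int:
    """Return a new cut position (original numbering) from one alignment.

    n       -- length of the permuted protein
    cut     -- current cut position in the original numbering
    matched -- permuted-numbering positions of matched residues
    """
    first_len = n - cut  # first part holds the original C terminus
    first = [p for p in matched if p < first_len]
    second = [p for p in matched if p >= first_len]
    n_a, n_b = len(first), len(second)
    if n_a > n_b:
        if not second:
            return cut  # assumption: nothing to anchor on, keep the cut
        # Cut just after the last matched residue of the second part.
        return (max(second) - first_len) + 1
    if not first:
        return cut  # assumption: nothing to anchor on, keep the cut
    # Cut just before the first matched residue of the first part.
    return cut + min(first)


# Usage: start at the middle of an 8-residue chain.
chain = list("ABCDEFGH")
permuted = circular_permute(chain, 4)
new_cut = update_cut(len(chain), 4, [1, 2, 3, 4])
```

In the usage lines, three residues match in the first part and one in the second, so the cut moves to just after the single matched residue of the second part; the unmatched tail of the second part is thereby moved to the front of the chain, where it can extend the match.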

Criteria for a structural relation
A structural alignment between two proteins, a and b, gives the match score $m_{ab}$, which is the fraction of matched residues in protein a. For each protein a, the mean match score $m_a$ of the random distribution was computed by averaging $m_{ab}$ over all proteins b that are structurally unrelated to a (those with $m_{ab} < 40\%$). The root-mean-square deviation $\sigma_a$ of $m_{ab}$ about $m_a$ was also computed. The match scores were then converted to the z-score $z_{ab}$, defined as $z_{ab} = (m_{ab} - m_a)/\sigma_a$. For the straight (unpermuted) structural alignment, a pair of proteins was considered to be structurally related when $z_{ab}$ was >5.0. This z-score cutoff value is the same as that used previously for clustering protein structures into groups of similar structures (Jung and Lee 2000). This particular value was chosen primarily because the number of multimember clusters reached a plateau of maximum value at this cutoff value. Two proteins were considered to be related by circular permutation (CP related) if $z_{a'b}$ was >5.0, where a′ is the permuted protein, and if the numbers of matched residues in the C- and N-terminal parts of the permuted protein were both >10% of the total number of matched residues for the protein pair.
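
The criteria above can be written out as a short sketch. The function names and input layout are hypothetical; match scores are given as fractions (0.40 for 40%), the deviation is taken as the population root-mean-square deviation about the mean, and the total number of matched residues is assumed to be the sum of the two terminal parts.

```python
import math
from typing import List, Sequence


def z_scores(match_scores: Sequence[float], unrelated_below: float = 0.40) -> List[float]:
    """Convert the match scores of one protein a against all b into z-scores.

    The background (random) distribution is estimated from the scores
    below `unrelated_below`, as described in the text.
    """
    background = [m for m in match_scores if m < unrelated_below]
    if len(background) < 2:
        raise ValueError("need at least two unrelated proteins for a background")
    mean = sum(background) / len(background)
    sigma = math.sqrt(sum((m - mean) ** 2 for m in background) / len(background))
    if sigma == 0.0:
        raise ValueError("background scores have zero spread")
    return [(m - mean) / sigma for m in match_scores]


def is_cp_related(z: float, n_c_part: int, n_n_part: int,
                  z_cutoff: float = 5.0, min_fraction: float = 0.10) -> bool:
    """Apply the CP criteria to a permuted alignment.

    n_c_part, n_n_part -- matched residues in the C- and N-terminal
    parts of the permuted protein.
    """
    total = n_c_part + n_n_part
    if total == 0:
        return False
    return (z > z_cutoff
            and n_c_part > min_fraction * total
            and n_n_part > min_fraction * total)


# Usage: three unrelated domains and one strong match.
zs = z_scores([0.10, 0.20, 0.30, 0.90])
related = is_cp_related(zs[3], n_c_part=30, n_n_part=50)
```

The 10% condition keeps an alignment in which nearly all matched residues fall on one side of the cut from being counted as a permutation, since such an alignment is essentially an unpermuted one.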

Criteria for distinct alignment
Two alignments were judged to be distinct if the mean alignment shift per residue, Δr (Jung and Lee 2000), was greater than 5 positions between the two alignments.
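
A minimal sketch of this test, together with the operational definition of symmetry given in Results and Discussion, follows. The precise definition of Δr is given in Jung and Lee (2000) and is not reproduced here; the sketch assumes, purely for illustration, that Δr is the mean absolute difference in the aligned partner index over residues that are aligned in both alignments. The dictionary representation and function names are likewise hypothetical.

```python
from typing import Dict


def mean_alignment_shift(aln1: Dict[int, int], aln2: Dict[int, int]) -> float:
    """Assumed stand-in for the mean alignment shift per residue.

    Each alignment maps a residue index of protein a (original
    numbering) to its aligned residue index in protein b.
    """
    common = aln1.keys() & aln2.keys()
    if not common:
        # Assumption: alignments sharing no residues count as distinct.
        return float("inf")
    return sum(abs(aln1[i] - aln2[i]) for i in common) / len(common)


def are_distinct(aln1: Dict[int, int], aln2: Dict[int, int],
                 min_shift: float = 5.0) -> bool:
    """Two alignments are distinct if the mean shift exceeds 5 positions."""
    return mean_alignment_shift(aln1, aln2) > min_shift


def is_symmetric(related_unpermuted: bool, related_permuted: bool,
                 unpermuted_aln: Dict[int, int], permuted_aln: Dict[int, int]) -> bool:
    """Related both with and without permutation, via distinct alignments."""
    return (related_unpermuted and related_permuted
            and are_distinct(unpermuted_aln, permuted_aln))


# Usage: a 20-residue toy protein aligned in register and shifted by half its length.
straight = {i: i for i in range(20)}
rotated = {i: (i + 10) % 20 for i in range(20)}
symmetric = is_symmetric(True, True, straight, rotated)
```

The distinctness test prevents a permuted alignment that merely reproduces the unpermuted one, apart from a few residues near the cut, from being taken as evidence of symmetry.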

Acknowledgments

This study used the high-performance computational capabilities of the Biowulf Cluster at the Center for Information Technology, National Institutes of Health.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

Notes
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1101/
References
  • Baird, G.S., Zacharias, D.A., and Tsien, R.Y. 1999. Circular permutation and receptor insertion within green fluorescent proteins. Proc. Natl. Acad. Sci. USA 96: 11241–11246.
  • Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.
  • Bowles, D.J., Marcus, S.E., Pappin, D.J., Findlay, J.B., Eliopoulos, E., Maycox, P.R., and Burgess, J. 1986. Posttranslational processing of concanavalin A precursors in jackbean cotyledons. J. Cell Biol. 102: 1284–1297.
  • Carrington, D.M., Auffret, A., and Hanke, D.E. 1985. Polypeptide ligation occurs during post-translational modification of concanavalin A. Nature 313: 64–67.
  • Castillo, R.M., Mizuguchi, K., Dhanaraj, V., Albert, A., Blundell, T.L., and Murzin, A.G. 1999. A six-stranded double-ψ β barrel is shared by several protein superfamilies. Structure Fold Des. 7: 227–236.
  • Cunningham, B.A., Hemperly, J.J., Hopp, T.H., and Edelman, G.E. 1979. Favin versus concanavalin A: Circularly permuted amino acid sequences. Proc. Natl. Acad. Sci. USA 76: 3218–3222.
  • Doolittle, W.F. 1987. What introns have to tell us: Hierarchy in genome evolution. Cold Spring Harbor Symp. Quant. Biol. 52: 907–913.
  • Garcia-Vallve, S., Rojas, A., Palau, J., and Romeu, A. 1998. Circular permutants in β-glucosidases (family 3) within a predicted double-domain topology that includes a (β/α)8-barrel. Proteins 31: 214–223.
  • Gilbert, W. 1987. The exon theory of genes. Cold Spring Harbor Symp. Quant. Biol. 52: 901–905.
  • Heinemann, U. and Hahn, M. 1995a. Circular permutation of polypeptide chains: Implications for protein folding and stability. Prog. Biophys. Mol. Biol. 64: 121–143.
  • Heinemann, U. and Hahn, M. 1995b. Circular permutations of protein sequence: Not so rare? Trends Biochem. Sci. 20: 349–350.
  • Iwakura, M., Nakamura, T., Yamane, C., and Maki, K. 2000. Systematic circular permutation of an entire protein reveals essential folding elements. Nat. Struct. Biol. 7: 580–585.
  • Jeltsch, A. 1999. Circular permutations in the molecular evolution of DNA methyltransferases. J. Mol. Evol. 49: 161–164.
  • Jia, J., Huang, W., Schorken, U., Sahm, H., Sprenger, G.A., Lindqvist, Y., and Schneider, G. 1996. Crystal structure of transaldolase B from Escherichia coli suggests a circular permutation of the α/β barrel within the class I aldolase family. Structure 4: 715–724.
  • Jung, J. and Lee, B. 2000. Protein structure alignment using environmental profiles. Protein Eng. 13: 535–543.
  • Lang, D., Thoma, R., Henn-Sax, M., Sterner, R., and Wilmanns, M. 2000. Structural evidence for evolution of the β/α barrel scaffold by gene duplication and fusion. Science 289: 1546–1550.
  • Liepinsh, E., Andersson, M., Ruysschaert, J.M., and Otting, G. 1997. Saposin fold revealed by the NMR structure of NK-lysin. Nat. Struct. Biol. 4: 793–795.
  • Lindqvist, Y. and Schneider, G. 1997. Circular permutations of natural protein sequences: Structural evidence. Curr. Opin. Struct. Biol. 7: 422–427.
  • Luger, K., Hommel, U., Herold, M., Hofsteenge, J., and Kirschner, K. 1989. Correct folding of circularly permuted variants of a βα-barrel enzyme in vivo. Science 243: 206–210.
  • Macgregor, E.A., Jespersen, H.M., and Svensson, B. 1996. A circularly permuted α-amylase-type α/β-barrel structure in glucan-synthesizing glucosyltransferases. FEBS Lett. 378: 263–266.
  • Marcotte, E.M., Pellegrini, M., Yeates, T.O., and Eisenberg, D. 1999. A census of protein repeats. J. Mol. Biol. 293: 151–160.
  • McWherter, C.A., Feng, Y., Zurfluh, L.L., Klein, B.K., Baganoff, M.P., Polazzi, J.O., Hood, W.F., Paik, K., Abegg, A.L., Grabbe, E.S., et al. 1999. Circular permutation of the granulocyte colony-stimulating factor receptor agonist domain of myelopoietin. Biochemistry 38: 4564–4571.
  • Murzin, A.G. 1998. Probable circular permutation in the flavin-binding domain. Nat. Struct. Biol. 5: 101.
  • Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequence and structures. J. Mol. Biol. 247: 536–540.
  • Nakamura, T. and Iwakura, M. 1999. Circular permutation analysis as a method for distinction of functional elements in the M20 loop of Escherichia coli dihydrofolate reductase. J. Biol. Chem. 274: 19041–19047.
  • Nalefski, E.A. and Falke, J.J. 1996. The C2 domain calcium-binding motif: Structural and functional diversity. Protein Sci. 5: 2375–2390.
  • Patthy, L. 1993. Modular design of proteases of coagulation, fibrinolysis, and complement activation: Implications for protein engineering and structure–function studies. Methods Enzymol. 222: 10–21.
  • Polekhina, G., Board, P.G., Gali, R.R., Rossjohn, J., and Parker, M.W. 1999. Molecular basis of glutathione synthetase deficiency and a rare gene permutation event. EMBO J. 18: 3204–3213.
  • Ponting, C.P. and Russell, R.B. 1995. Swaposins: Circular permutations within genes encoding saposin homologues. Trends Biochem. Sci. 20: 179–180.
  • Sergeev, Y. and Lee, B. 1994. Alignment of β-barrels in (β/α)8 proteins using hydrogen bonding pattern. J. Mol. Biol. 244: 168–182.
  • Thornton, J.M. and Sibanda, B.L. 1983. Amino and carboxy-terminal regions in globular proteins. J. Mol. Biol. 167: 443–460.
  • Uliel, S., Fliess, A., Amir, A., and Unger, R. 1999. A simple algorithm for detecting circular permutations in proteins. Bioinformatics 15: 930–936.