Copyright © 2008 Schleiss et al; licensee BioMed Central Ltd.
Analysis of the nucleotide sequence of the guinea pig cytomegalovirus (GPCMV) genome
Mark R Schleiss: schleiss/at/umn.edu; Alistair McGregor: mcgre077/at/umn.edu; K Yeon Choi: choix207/at/umn.edu; Shailesh V Date: date.shailesh/at/gene.com; Xiaohong Cui: xcui/at/vcu.edu; Michael A McVoy: mmcvoy/at/vcu.edu
Received October 15, 2008; Accepted November 12, 2008.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract In this report we describe the genomic sequence of guinea pig cytomegalovirus (GPCMV) assembled from a tissue culture-derived bacterial artificial chromosome clone, plasmid clones of viral restriction fragments, and direct PCR sequencing of viral DNA. The GPCMV genome is 232,678 bp, excluding the terminal repeats, and has a GC content of 55%. A total of 105 open reading frames (ORFs) of > 100 amino acids with sequence and/or positional homology to other CMV ORFs were annotated. Positional and sequence homologs of human cytomegalovirus open reading frames UL23 through UL122 were identified. Homology with other cytomegaloviruses was most prominent in the central ~60% of the genome, with divergence of sequence and lack of conserved homologs at the respective genomic termini. Of interest, the GPCMV genome was found in many cases to bear stronger phylogenetic similarity to primate CMVs than to rodent CMVs. The sequence of GPCMV should facilitate vaccine and pathogenesis studies in this model of congenital CMV infection.
Findings Guinea pig cytomegalovirus (GPCMV) serves as a useful model of congenital infection, due to the ability of the virus to cross the placenta and infect the fetus in utero [1-3]. This model is well-suited to vaccine studies for prevention of congenital cytomegalovirus (CMV) infection, a major public health problem and a high-priority area for new vaccine development [4]. However, an impediment to studies in this model has been the lack of detailed DNA sequence data. Although a number of reports have identified specific gene products or clusters of genes [5-11], to date a full genomic sequence has not been available. We recently reported the construction and preliminary sequence map of a GPCMV bacterial artificial chromosome (BAC) clone maintained in E. coli [12,13], and this clone was used as an initial template for sequence analysis of the full GPCMV genome. BAC DNA was purified using Clontech's NucleoBond® Plasmid Kits as described previously [14] and both strands were sequenced using an ABI PRISM® 377 DNA Sequencer, with primers synthesized, as needed, to 'primer-walk' the nucleotide sequence. In parallel, Hind III- and EcoR I-digested fragments were gel-purified and cloned into pUC and pBR322-based vectors as previously described [15]. Plasmid sequences were determined from overlapping Hind III and EcoR I fragments using the map coordinates originally described by Gao and Isom [16]. These sequences were compared to the BAC sequence to facilitate assembly of a full-length contiguous sequence. Since the cloning of the BAC in E. coli involved insertion of BAC origin sequences into the Hind III "N" region of the viral genome, sequence obtained from this specific restriction fragment cloned in pBR322 was utilized for assembly of the final contiguous sequence; analysis of this sequence confirmed that there were no adventitious deletions in the Hind III "N" region generated during the original BAC cloning process. 
Since a deletion in the Hind III "D" region occurred during cloning of the GPCMV BAC in E. coli [17], DNA sequence from a plasmid containing the full-length Hind III "D" fragment was similarly obtained, and used for assembly of the final contiguous sequence. The GPCMV genomic sequence has been deposited with GenBank (Accession Number FJ355434). Sequence analysis of GPCMV revealed a genome length of 232,678 bp with a GC content of 55%. This value is in agreement with the value of 54.1% determined previously by CsCl buoyant density centrifugation [18]. A total of 326 open reading frames (ORFs) were identified that were capable of encoding proteins of ≥ 100 amino acids (aa). Where a predicted ORF overlapped substantially with an adjacent or complementary GPCMV ORF that appeared to encode a gene product highly conserved in other cytomegaloviruses, it was analyzed further only if its overlap with the conserved ORF was < 60%. ORFs of ≥ 100 aa that were homologous to those encoded by other CMVs, with an e-value of < 0.1, were identified based on comparisons performed using NCBI Blast (blastall program, version 2.2.16). Of the ORFs so identified, 104 had sequence and/or positional homology to one or more ORFs encoded by human (HCMV), murine (MCMV), rat (RCMV), rhesus (RhCMV), chimpanzee (CCMV), or tupaia herpesvirus (THV) cytomegaloviruses (Table 1). Of note, homologs of HCMV ORFs UL23 through UL122 were identified [19]. For ease of nomenclature, we have designated these ORFs using an upper case "GP" prefix (GP23 through GP122). ORFs with homologs in other CMVs that do not correspond to HCMV UL23 through UL122 have been designated with a lower case "gp" prefix. Homologs of HCMV UL41a (69 aa; gp38.2), UL51 (99 aa; GP51), and UL91 (87 aa; GP91) were annotated in these initial analyses, based primarily on positional, and not sequence, homology to the respective HCMV ORFs.
Three ORFs, homologs of MHC class I genes known to be encoded by multiple other CMVs (gp 147–149, Table 1), were also identified. One ORF, gp1 (homolog of CC chemokines), did not have a positional or sequence homolog when compared to other CMVs, but was included in the annotation because of its previous molecular characterization [9]. Including ORFs with mapped exons, the total number of ORFs annotated in this preliminary analysis was 105 (Table 1).
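The two summary statistics reported above (GC content and the count of ORFs capable of encoding ≥ 100 aa) can be illustrated with a minimal sketch. This is not the authors' pipeline, which is not described at this level of detail; the function names are hypothetical, and the sketch assumes that an ORF runs from an ATG to the next in-frame stop codon of the standard genetic code, scanned in all six reading frames.

```python
# Illustrative sketch only; gc_content, reverse_complement and find_orfs
# are hypothetical helper names, not part of any published GPCMV workflow.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=100):
    """Scan six reading frames for ATG-to-stop ORFs of at least min_aa codons.

    Returns tuples (strand, frame, start, end, aa_length). Coordinates are
    0-based and refer to the strand being scanned, so minus-strand
    coordinates are positions on the reverse complement. The assumption
    that ORFs begin at ATG is ours, not a statement from the text.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    aa_len = (i - start) // 3  # codons before the stop
                    if aa_len >= min_aa:
                        orfs.append((strand, frame, start, i + 3, aa_len))
                    start = None
    return orfs
```

Applied to a genome-length sequence, `gc_content` would return a fraction comparable to the reported 55%, and `len(find_orfs(genome))` would give a candidate count whose exact value depends on the ORF definition assumed here.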
A map of the GPCMV genome illustrating the relative positions of these ORFs is shown in Fig. 1. ORFs that represent homologs of the individual exons of spliced HCMV genes, in particular UL89 (terminase) and UL112/UL113 (replication accessory protein), are annotated separately. The splice junction for the GP89 mRNA was predicted based on comparisons to other CMVs. For the UL112/113 region, further studies will be required to map the precise splicing patterns of the putative transcripts encoded by this region of the GPCMV genome. Similarly, the ORF encoding the sequence homolog of the HCMV IE transactivator, UL122, has been annotated without regard to the splicing events previously shown to take place in this region of the genome [20]; further analyses of cDNA from this and other GPCMV genome regions of IE transcription, including those encoded in the Hind III "D" region of the genome, will likely result in annotation of multiple heretofore unidentified ORFs. A comprehensive table of all ORFs > 25 aa and their homology to other CMV genomes is provided in additional files 1 and 2. As RNA analyses are completed, the total number of annotated GPCMV ORFs is expected to increase.
The schematic representation of GPCMV ORFs shown in Fig. 1 highlights several gene families of interest. Of particular importance to vaccine studies in the guinea pig model are conserved homologs of the ORFs encoding the major envelope glycoproteins gB, gH/gL/gO, and gM/gN. These glycoproteins are important determinants of humoral immune responses in the setting of CMV infection, and serve as potential subunit vaccine candidates. Of these, the gB homolog has been demonstrated to confer protection against congenital GPCMV infection in subunit vaccine studies [21-23]. Homologs of putative HCMV immune modulation genes, including G-protein coupled receptors and major histocompatibility class I homologs, were also identified [24]. Also of interest was the presence of multiple US22 gene family homologs, heavily clustered near the rightward terminus of the GPCMV genome. These ORFs predict protein products that are analogous to the MCMV dsRNA-binding proteins M142 and M143, which have been shown to inhibit dsRNA-activated antiviral pathways [25,26]. Members of this family have also been implicated in macrophage tropism in MCMV [27]. Our sequence analysis also confirmed the findings of Liu and Biegalke [8] that the GPCMV genome does not encode a positional homolog of the antiapoptotic HCMV UL36 gene [28]. However, an ORF with homology to R36, which encodes the presumed RCMV cell death suppressor, was identified (gp29.1, Table 1). Further studies will be required to determine whether this putative gene supplies a UL36-like function. It was also of interest to note the presence of ORFs that have apparent homology to the MCMV M129-133 region. This region has positional homologs in human and primate CMVs [29-31], but is absent in THV [32]. Recently, it was determined that passage of GPCMV in cultured fibroblasts promotes the deletion of a ~1.6-kb locus containing potential positional homologs of this gene cluster.
The presence of this 1.6 kb locus was found by Inoue and colleagues to be associated with an enhanced pathogenesis of GPCMV in vivo [33]. We independently confirmed the presence of this locus and its sequence in our salivary gland-derived viral stocks, and have included this sequence in our GenBank annotation (Accession Number FJ355434). Further studies will be required to fully annotate the transcripts encoded by this region of the GPCMV genome. Interestingly, the original GPCMV BAC clone that we sequenced was derived using GPCMV viral DNA obtained after long-term tissue culture passage of ATCC 2122 viral stock, and not surprisingly this BAC was found to lack the 1.6 kb virulence locus [12]. Subsequently, PCR and preliminary sequencing of a more recently obtained GPCMV BAC clone with an excisable origin of replication [17] revealed that the 1.6-kb sequence was retained in this clone. The apparent modifications of this locus that occur following viral passage on fibroblast cells are reminiscent of the mutations and deletions that occurred during fibroblast-passage of HCMV [34] and rhesus CMV [35]. The congruence of these events suggests that the selective pressures that promote mutational inactivation of genes in this region may be similar across viral species. Additional analyses, including sequencing of a full-length GPCMV genome derived from replicating virus in vivo, will be required to determine what other deletions or mutations are present in genomes from tissue culture-passaged viruses. Since additional ORFs are likely to be identified by these analyses, we have annotated the first ORF identified in the BAC sequence to the right of this 1.6 kb region as gp138 (Fig. 1), to allow for ease of nomenclature as ORFs in this virulence locus are better characterized. 
Application of other genome sequence analysis methods, including identification of small or overlapping genes and further assessment of mRNA splicing or unconventional translation signals, will likely result in identification of other putative ORFs in future studies [36]. Comparisons of GPCMV ORFs with sequences from other CMV genomes yielded interesting results. ORF translations were compared with all proteins from the six sequenced CMV genomes (HCMV, MCMV, RCMV, RhCMV, THV, and CCMV), and hits with e-values less than 1e-5 were aligned individually for each protein, using both ClustalW (version 1.82; [37]) and Muscle (version 3.6; [38]). The alignments were then used to generate neighbor-joining trees using JalView. Clustal trees for glycoproteins B (GP55) and N (GP73) are shown in Fig. 2, with distance scores indicated. Overall, comparison of the various glycoproteins (gB, gM, gH, and gO) yielded similar phylogenies, with GPCMV glycoproteins generally appearing closer to primate CMVs than to rodent CMVs [39], except for the gN homolog, which appears closer to rodent CMVs. ClustalW and Muscle comparisons of GPCMV ORFs with homologous ORFs from the other sequenced CMVs are provided in additional file 3.
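The e-value screening step that precedes alignment can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes the BLAST hits were written in the standard 12-column tab-delimited format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), which the text does not state, and `filter_hits` is a hypothetical helper name.

```python
# Illustrative sketch only; assumes 12-column tab-delimited BLAST output.
def filter_hits(lines, max_evalue=1e-5):
    """Return (query, subject, evalue) for hits with evalue below max_evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        raw = fields[10]
        if raw.startswith("e"):
            raw = "1" + raw  # defensive: treat an abbreviated "e-12" as 1e-12
        evalue = float(raw)
        if evalue < max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits
```

Calling `filter_hits(lines)` applies the 1e-5 cutoff used for the phylogenetic comparisons, while `filter_hits(lines, max_evalue=0.1)` corresponds to the more permissive cutoff used for the initial annotation.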
In summary, the complete DNA sequence of GPCMV was determined, using a combination of sequencing of BAC DNA, viral DNA, and cloned Hind III and EcoR I fragments. These analyses identified conserved ORFs found in all mammalian CMVs, as well as novel genes apparently unique to GPCMV. These similarities underscore the usefulness of the guinea pig model, with positive translational implications for development and testing of CMV intervention strategies in humans. Further characterization of the GPCMV genome should facilitate ongoing vaccine and pathogenesis studies in this uniquely useful small animal model of congenital CMV infection.
Competing interests The authors declare that they have no competing interests. SVD is an employee of Genentech Corporation.
Authors' contributions MRS cloned viral fragments, performed sequence analysis, analyzed the data and prepared the communication. AM and XC cloned the GPCMV BACs. AM cloned individual genes for sequence analysis. AM, XC, and KYC performed sequence analysis, participated in data analysis, and helped in preparation of the communication. MAM cloned viral DNA fragments, performed sequence analysis, participated in BAC cloning, and aided in preparation of the communication. SVD performed comparative genomic analyses and aided in the preparation of the communication.
Additional file 1 ORFs of ≥ 25 aa (tab A), 50 aa (tab B), or 100 aa (tab C) with Blast analysis against other sequenced CMV genomes; e-value cutoff of 0.1. (671K)
Additional file 2 ORFs of ≥ 25 aa (tab A), 50 aa (tab B), or 100 aa (tab C) with Blast analysis against other sequenced CMV genomes; e-value cutoff of 1e-5. (459K)
Additional file 3 Phylogenetic trees for glycoproteins gB, gH, gO, gL, gM and gN, IRS 1–3 family, and GP116 (functional homolog of UL119; Fc receptor/immunoglobulin binding domains). Alignments generated using both ClustalW and Muscle, as described in the text. (149K)
Acknowledgements Grant support was provided by NIH grants HD044864-01 and HD38416-01 (to MRS) and R01AI46668 (to MAM). The authors acknowledge helpful discussions and input from Becket Feierbach (Genentech, Inc.). The authors also acknowledge the technical contributions of Yonggen Song and the gift of the Hind III "D" plasmid from HC Isom, Penn State University.
References