
Center for Eukaryotic Structural Genomics (CESG)


PI: John Markley, Ph.D., University of Wisconsin-Madison

Characterization of ORF Expression via DNA Chip Technology

CESG has carried out a complete analysis of the transcriptome from Arabidopsis thaliana T87 suspension cells (akin to plant stem cells). The objective was to test whether a high-throughput method could be developed for comparing experimental results on mRNA levels with an available genome sequence. We used reverse transcriptase to create ssDNA complementary to the mRNA isolated from T87 cells; this ssDNA served as the template for standard PCR-based poly dT/biotin labeling prior to hybridization to the DNA chips. For the chips, we custom-designed three high-density oligonucleotide arrays and prepared them with a maskless array synthesizer (technology developed at UW-Madison in the Sussman/Cerrina laboratories and commercialized by NimbleGen Systems, Inc., Madison, WI). (1) The first array consisted of six 60mers complementary to different segments of the 3′ end of each gene, together with six 60mer mismatch oligos to correct for background. (2) The second array consisted of ten 24mers complementary to different segments of the 3′ end of each gene, together with ten 24mer mismatch oligos to correct for background. (3) The third array was a whole-genome tiling chip containing one 37mer complementary to a portion of each 60-base-pair segment across the entire 120-megabase A. thaliana genome. Several hybridization stringencies were tested, and the results were compared with the dataset of CESG’s PCR trials from this cDNA pool: 1,200 total, with 700 successful and 500 unsuccessful. The results showed that DNA chips could provide an excellent way of deciding which genes can be extracted from a cDNA pool by PCR. In addition, tandem mass spectrometry was used to identify proteins present at high levels in the suspension cells, and these results showed a good correlation between mRNA levels and protein concentration in the plant cells.
Our experience with the Arabidopsis tiling arrays now provides an excellent platform for quantifying eukaryotic mRNA in other multicellular organisms. Such studies should help streamline cloning efforts and assist in determining, prior to isolation of cDNA, whether the computer-annotated intron/exon boundaries are fact or fiction. A manuscript describing this work is in preparation.
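
The array designs described above imply some simple probe-count arithmetic, sketched here in Python. This is only an illustration built from the figures quoted in the text (oligo counts per gene, the 60-base-pair segment size, the 37mer probe length, the 120-megabase genome, and the 700/1,200 PCR outcome). The function names are hypothetical, and the one-probe-per-segment rounding for a partial trailing segment is an assumption, not a description of the actual chip layout.

```python
def probes_per_gene(perfect_match_oligos: int) -> int:
    """Probes per gene when each perfect-match oligo has one mismatch partner."""
    return 2 * perfect_match_oligos


def tiling_probe_count(genome_bp: int, segment_bp: int) -> int:
    """One probe per segment (assumption: a partial last segment also gets one)."""
    return -(-genome_bp // segment_bp)  # ceiling division


GENOME_BP = 120_000_000   # 120-megabase A. thaliana genome (from the text)
SEGMENT_BP = 60           # one probe per 60-base-pair segment
PROBE_BP = 37             # tiling probes are 37mers

array1_per_gene = probes_per_gene(6)    # six 60mers + six mismatches = 12
array2_per_gene = probes_per_gene(10)   # ten 24mers + ten mismatches = 20
array3_total = tiling_probe_count(GENOME_BP, SEGMENT_BP)  # 2,000,000 probes

# Fraction of each 60-bp segment directly covered by its 37mer probe.
tiling_coverage = PROBE_BP / SEGMENT_BP

# Fraction of PCR trials from this cDNA pool that succeeded (700 of 1,200).
pcr_success_rate = 700 / 1200

if __name__ == "__main__":
    print(f"Array 1: {array1_per_gene} probes per gene")
    print(f"Array 2: {array2_per_gene} probes per gene")
    print(f"Array 3: {array3_total:,} tiling probes")
    print(f"Tiling coverage per segment: {tiling_coverage:.1%}")
    print(f"PCR success rate: {pcr_success_rate:.1%}")
```

Under these assumptions the tiling chip carries about two million probes, each covering a little over 60% of its segment.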

Information Management and Process Pipelining

Users interact with CESG’s laboratory management system (‘Sesame’) through a series of web-based Java applet-applications designed to organize data generated by projects in structural genomics, structural biology, and shared laboratory resources. Sesame allows collaborators on a given project to enter, process, view, and extract relevant data, regardless of location, so long as web access is available. Sesame is a multi-tier system whose data reside in an Oracle relational database. It serves as a digital laboratory notebook and allows users to attach numerous files and images. Sesame can launch computations that utilize either local computers or distributed computer clusters. The system can print and read barcodes relevant to various parts of the pipeline, and it can create reports and output data as XML files. Full details about Sesame can be found at http://www.sesame.wisc.edu/. Sesame generates CESG’s weekly contributions to TargetDB and is capable of generating the full set of data to be deposited in the new PEPCdb. Sesame modules currently deployed (for molecular interaction screening) and under development with separate funding (for mass spectrometry proteomics, small molecule screening, and metabolomics) promise to enable seamless integration of structural genomics data with a larger domain of genome-driven biology. Sesame is now in use by the Center for Structural Genomics of Pathogenic Protozoa, the National Magnetic Resonance Facility at Madison, the University of Wisconsin-Madison Molecular Interactions Facility, and BioMagResBank, and it is under consideration by other structural genomics centers and facilities.

Head-to-Head Comparison of Protein Production from E. coli Cells and Wheat Germ Cell-Free Extracts

CESG has carried out a detailed comparison of its protein production pipelines based on E. coli cells and on wheat germ cell-free translation. The E. coli portion of the experiment made use of CESG's standard maltose-binding protein fusion containing a (His)6-tag and a TEV protease cleavage site. All targets found to be produced as a soluble, cleaved product were prepared with uniform [15N]-labeling. The cell-free part of the experiment compared two constructs for each of the 96 targets: one with a non-cleavable N-terminal (His)6-tag, and one with a cleavable (PreScission™ Protease, Amersham Biosciences) N-terminal GST-tag. These constructs were first screened on a small scale (50 µL) to determine the level of protein produced and its solubility. Targets that produced soluble protein were then produced as [U-15N]-proteins in larger-scale (4 to 12 mL) cell-free translation reaction mixtures that contained 15N-labeled amino acids. The proteins produced by each method were analyzed by 1H-15N correlation NMR spectroscopy for folding, aggregation state, and stability. This project, which was completed in August 2004, will provide a rich source of information; we are only beginning to mine the results to learn what they tell us about these different approaches. When the success rates of individual steps, supplies, and labor are taken into account, the costs of making labeled proteins for NMR structure determinations by the two platforms (E. coli cells and wheat germ cell-free) are equivalent. The potential advantage of the cell-free approach is that, in this study, nearly twice as many protein targets yielded samples suitable for NMR structure determination when prepared by the cell-free method as when prepared in E. coli cells. However, when the E. coli cell-based pipeline works, the yields of labeled proteins are higher.

Integrated Robotic Crystallization and Optimization

CESG has developed a tightly coupled system for initial screening and optimization of crystallization conditions that utilizes a uniform set of stock solutions and methods for robotic handling. Outcomes of initial crystallization screening experiments are currently recorded in CrystalScore™ databases and in the ‘Well’ module of the Sesame LIMS. The Well module increasingly serves as the hub of a data- and robot-rich environment. Well is used to describe all crystallization optimization experiments. Additionally, it writes CrystalScore template files describing crystallization conditions for the CrystalScore imaging system. Well also automatically generates control files for operation of tasks on the Tecan Genesis and C-250 robotic crystallization systems using our established, versatile, and extensible rack, screen, droplet, and plate table framework. Future releases of the Well module will enable mass import of images and scores from the CrystalScore and CrystalFarm™ imaging systems, closing the loop in the flow of information from screen descriptions, to generating physical screens, to associating those screens with images, and finally to returning scores to Well. The result is a system that will reduce inconsistencies between initial screening and larger crystal development and should increase both the success rate and the efficiency of developing crystals suitable for structure determination.

NMR Spectroscopy

Sample preparation and labeling. Stimulated by a cooperative agreement with Cell-Free Sciences (Yokohama, Japan) and Ehime University (Matsuyama, Japan), CESG’s wheat germ cell-free protein production pipeline is quickly becoming the default method for screening targeted ORFs coding for proteins <20 kDa for protein production and solubility, for preparing [15N]-labeled proteins to determine their suitability for NMR structural studies, and for preparing [13C,15N]-labeled proteins for structure determinations. Five structures have been solved with proteins made by this approach. CESG is evaluating the replacement of manual screening by automated screening with a Cell-Free Sciences GeneDecoder 1000™ robot and is routinely using a Cell-Free Sciences Protemist™ robot for large-scale protein production. These two robotic systems are the first of their kind installed outside Japan. CESG has prepared its first protein labeled with ‘SAIL’ (stereo-array isotope labeled) amino acids. The SAIL approach, which requires cell-free protein production, promises to speed up the determination of high-quality structures of smaller proteins (those <20 kDa) and to enable high-throughput determination of structures of proteins up to 35 kDa.

Software. In collaboration with the National Magnetic Resonance Facility at Madison, CESG has developed a software package (‘PISTACHIO’) for the automated assignment of NMR spectral data (with a probabilistic analysis of the results for both backbone and side-chain atoms). Starting from recent CESG peak lists previously analyzed by hand, the software assigned signals from 94-100% of the backbone residues with >95% confidence and with 90-100% agreement with hand assignments; side-chain assignments agreed 86-95% with hand assignments. A second software package (‘PECANS’) determines the secondary structure of the protein from the assigned chemical shifts and protein sequence with an accuracy of 90% for structured regions (as determined from a database of 800 proteins with NMR assignments and known three-dimensional structures). PECANS and PISTACHIO are available for use from the CESG web site.
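
As an illustration of how an agreement figure like those quoted above could be computed, the following Python sketch compares an automated assignment with a hand assignment. It is not PISTACHIO code: the data layout (a mapping from residue number and atom name to a peak identifier) and the function name are assumptions made for this example only, and the sample values are invented.

```python
from typing import Dict, Tuple

# Hypothetical key: (residue number, atom name), e.g. (12, "CA").
AtomKey = Tuple[int, str]


def percent_agreement(auto: Dict[AtomKey, int], hand: Dict[AtomKey, int]) -> float:
    """Percentage of hand-assigned atoms given the same peak by the automated run.

    Atoms missing from the automated assignment count as disagreements.
    """
    if not hand:
        raise ValueError("hand assignment is empty")
    matches = sum(1 for key, peak in hand.items() if auto.get(key) == peak)
    return 100.0 * matches / len(hand)


# Invented toy data: four atoms assigned by hand, one of which differs
# in the automated result.
hand_assignment = {(1, "N"): 10, (1, "CA"): 11, (2, "N"): 12, (2, "CA"): 13}
auto_assignment = {(1, "N"): 10, (1, "CA"): 11, (2, "N"): 12, (2, "CA"): 99}

if __name__ == "__main__":
    print(f"{percent_agreement(auto_assignment, hand_assignment):.1f}% agreement")
```

Counting unassigned atoms as disagreements is one possible convention; a real evaluation might instead report completeness and agreement separately, as the figures above do.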

NMR data collection. Also in collaboration with the National Magnetic Resonance Facility at Madison, CESG is developing a novel strategy (‘HiFi-NMR’) for speeding up the collection and analysis of NMR data sets needed for NMR assignments and secondary structure determinations. By determining peak positions as part of the data collection of each NMR experiment, the approach is designed to interface directly with PISTACHIO and PECANS.

This page last updated November 19, 2008