
The Midwest Center for Structural Genomics


PI:  Andrzej Joachimiak, Ph.D., Argonne National Laboratory


Better Tools and Better Knowledge for Structural Genomics

Express Primer Application

A web-based tool has been developed to design primers for generating expression clones in both lab-scale and high-throughput projects (http://tools.bio.anl.gov/bioJAVA/jsp/ExpressPrimerTool/). The Express Primer Tool gives the user complete flexibility to specify primer design parameters while minimizing the manual intervention needed to generate a large number of primers for simultaneous amplification of multiple target genes. Because the tool is web-based, multiple researchers can use it by logging onto a web site running on a single server rather than each downloading and installing the application locally. The Express Primer Tool is used by the MCSG and by many outside users.
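The kind of batch primer design described above can be illustrated with a minimal sketch. It is not the Express Primer Tool's actual algorithm, which is not described here. It assumes a simple rule (extend the annealing region from the gene end until a rough Wallace-rule melting temperature reaches a target), and the adapter sequences, function names and default parameters are hypothetical placeholders.

```python
# Illustrative sketch only; not the Express Primer Tool's actual algorithm.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical placeholder adapters; substitute the overhangs the chosen vector requires.
FORWARD_ADAPTER = "NNNNNN"
REVERSE_ADAPTER = "NNNNNN"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def pick_annealing_region(template, target_tm=60, min_len=18, max_len=30):
    """Return the shortest 5' prefix of template whose estimated Tm reaches target_tm."""
    for length in range(min_len, max_len + 1):
        candidate = template[:length]
        if wallace_tm(candidate) >= target_tm:
            return candidate
    # No prefix reached the target; fall back to the longest allowed region.
    return template[:max_len]


def design_primer_pair(gene, **params):
    """Return (forward, reverse) primers: adapter plus gene-specific annealing region."""
    gene = gene.upper()
    forward = pick_annealing_region(gene, **params)
    reverse = pick_annealing_region(reverse_complement(gene), **params)
    return FORWARD_ADAPTER + forward, REVERSE_ADAPTER + reverse


def design_batch(genes, **params):
    """Design primer pairs for many targets at once, keyed by target name."""
    return {name: design_primer_pair(seq, **params) for name, seq in genes.items()}
```

Passing a dictionary of target names and coding sequences to `design_batch` returns one primer pair per target, which is the part of the workflow that benefits most from removing manual steps. A production tool would use a more accurate melting-temperature model and check for hairpins and primer dimers.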

High-Throughput Gene Cloning and Expression for Structural Genomics

We have developed automated methods using microwell plates that enable the production of large numbers of expression clones and, ultimately, purified proteins. This approach uses ligation-independent cloning (LIC) technology with multiple hosts and vectors to increase the probability of obtaining an expression construct that produces soluble protein. The strategy is designed to identify the maximum number of targets suitable for bacterial expression before resorting to more time-consuming and cost-intensive systems. It is used for high-throughput (HTP) cloning at the MCSG, and a similar approach has been adopted by the CESG and SGPP consortia.

High-Throughput Domain Based Cloning Methods

We have developed a high-throughput domain-based cloning and expression approach for high-molecular-weight proteins and putative soluble domains of membrane proteins. Our production pipeline incorporates novel informatics tools to enable primer design for secretory/periplasmic proteins and a tool to facilitate primer design for domain permutations (http://tools.bio.anl.gov/bioJAVA/jsp/ExpressPrimerTool/). The bioinformatics tools are compatible with the plate-based cloning and expression pipeline operating at the MCSG. This approach has been validated by applying plate-based methods to over 200 secretory and membrane protein targets from B. subtilis. The approach is being used at the MCSG.

A Set of New LIC Vectors Designed Specifically for Purification of Proteins for SG

We developed an efficient LIC vector tailored specifically to the constraints of crystallography. The vector, pMCSG7, encodes an N-terminal His-tag followed by a spacer and a TEV protease site that overlaps the LIC site. This design puts the TEV site as close as possible to the start of the cloned protein, and results in only the innocuous amino acid sequence SNA- being appended to target proteins after protease treatment. The vector has been used routinely and has generated over 150 structures deposited in the PDB. We have also constructed a series of pMCSG7 derivatives that append helper peptides or proteins, such as MBP, to the leader sequence of expressed targets. We have transferred these derivatives into a vector with a different origin of replication to allow coexpression of target proteins in a single host, and have made four additional vectors to improve tandem purification of complexes, to aid in robotic screening protocols, or to enhance robotic protein purification. New LIC vectors have been distributed to over 40 laboratories in the US, Canada and Europe.
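The effect of the tag design on the final protein can be shown with a short sketch. It relies on the well-known TEV recognition sequence ENLYFQ, with cleavage after the glutamine. The leader sequence below is an illustrative assumption that matches the description above (His-tag, spacer, TEV site, then SNA) and should be checked against the vector map. The function name and the target sequence are hypothetical.

```python
# TEV protease recognizes ENLYFQ followed by S or G and cuts after the Q.
TEV_SITE = "ENLYFQ"

# Illustrative leader (assumed for this example): His-tag, spacer, TEV site, then SNA.
LEADER = "MHHHHHHSSGVDLGTENLYFQSNA"


def tev_cleave(protein):
    """Split a protein at the first TEV site.

    Returns (tag_fragment, released_protein). If no site is present,
    the tag fragment is empty and the protein is returned unchanged.
    """
    idx = protein.find(TEV_SITE)
    if idx == -1:
        return "", protein
    cut = idx + len(TEV_SITE)
    return protein[:cut], protein[cut:]


# Hypothetical target sequence, used only to show the residual N-terminal residues.
target = "MKTAYIAK"
tag, released = tev_cleave(LEADER + target)
print(released)  # SNAMKTAYIAK
```

Only the three residues SNA remain ahead of the target after cleavage, because the recognition site ends immediately before them.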

LIC Vectors and High-Throughput Cloning Strategies for Membrane Proteins

We have developed an efficient LIC vector for the cloning of membrane protein genes for expression in Rhodobacter. The expression vector is based on the broad-host-range vector pRK404 and utilizes the oxygen-inducible puf promoter. We have recently moved membrane protein gene cloning into high-throughput mode by using the LIC vector with plate-based automated methods for generation of compatible overhangs, annealing, transformation, clone selection, and transfer to a host strain that can mobilize plasmids for transfer during conjugation. Expression from membrane protein constructs generated in this fashion is indistinguishable from that of constructs produced via traditional ligation methodologies. Fusions encoded by the LIC vector facilitate semi-automated purification of membrane proteins via affinity chromatography, thus enabling their rapid, reproducible, and cost-effective production. The approach is used at the MCSG.

Novel Approaches for the High-Throughput Production of Cultures Expressing Recombinant Proteins

We have reduced the effort and time required to produce cultures by adopting commercial 2 L plastic PET bottles as culture vessels. With a large inoculum and antibiotics, sterilization is not needed. Disposal of the bottles eliminates cleaning of culture vessels. The cost and time of producing cultures were reduced more than 2-fold by implementing this approach. The approach is used at the MCSG and has been adopted at CESG and other labs.

Automated Protein Purification for Structural Genomics

We have developed protocols for automated protein purification that yield “structural-biology-grade” proteins. The purification procedure is compatible with the LIC expression system, is highly reproducible, and yields milligram quantities of homogeneous native protein and its selenium derivative. These protocols have been implemented on AKTAexpress, AKTA EXPLORER 3D and AKTA FPLC 3D workstations capable of performing multidimensional chromatography. The automated chromatography has been successfully applied to over 1,000 soluble proteins, membrane-associated proteins and soluble domains of membrane proteins of microbial origin. The approach is used at the MCSG. The AKTAexpress platform is being distributed worldwide.

Semi-Automated Purification of Membrane Proteins

We have developed methods for the efficient purification of integral membrane proteins expressed recombinantly in induced membranes of Rhodobacter. These semi-automated protocols use the AKTA FPLC 3D system for multidimensional chromatography. The methods are compatible with a variety of detergents. Up to 120 mg of crystallization-quality membrane protein can be purified per run. The Rhodobacter membrane protein expression system, coupled with the semi-automation of purification steps, represents an advance toward a strategy for obtaining structures of integral membrane proteins at a more rapid pace. The approach is used at the MCSG.

Automation of Protein Crystallization for the Production of X-ray Quality Crystals

We have developed an automated strategy for protein crystallization using a 96-well format. A set of commercial robotic stations is used for preparing and pipetting solutions, setting up crystallizations and visualizing crystals. For initial crystallization screening, commercially available crystallization formulations and kits are used. The robotic system allows 5760 individual crystallizations to be set up per eight hours. Initial screening is integrated with crystal optimization, which is accomplished by automated generation of custom screens. Component variation, combinatorial optimization and screening of cryo-additives are incorporated into the system. The approach is used at the MCSG.
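Automated generation of a custom optimization screen can be sketched as a grid laid over a 96-well plate. This is a minimal illustration under stated assumptions, not the MCSG system: the two varied components (pH down the rows, precipitant concentration across the columns), the ranges and all function names are hypothetical. The throughput constant simply restates the figure quoted above, since 5760 crystallizations correspond to 60 plates of 96 wells.

```python
from itertools import product
import string


def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop inclusive, rounded to 2 decimals."""
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    return [round(start + i * step, 2) for i in range(n)]


def grid_screen(ph_range, precipitant_range, rows=8, cols=12):
    """Build a rows x cols grid screen: pH varies by row, precipitant by column.

    Returns a dict mapping well names (A1 .. H12) to their conditions.
    """
    ph_values = linspace(ph_range[0], ph_range[1], rows)
    precipitant_values = linspace(precipitant_range[0], precipitant_range[1], cols)
    screen = {}
    for (r, ph), (c, conc) in product(enumerate(ph_values), enumerate(precipitant_values)):
        well = f"{string.ascii_uppercase[r]}{c + 1}"
        screen[well] = {"pH": ph, "precipitant_pct": conc}
    return screen


# 5760 crystallizations per eight hours in 96-well format is 60 plates.
PLATES_PER_EIGHT_HOURS = 5760 // 96

# Hypothetical optimization around an initial hit near pH 7 and 15% precipitant.
screen = grid_screen(ph_range=(6.0, 8.0), precipitant_range=(10.0, 21.0))
print(len(screen), screen["A1"], screen["H12"])
```

A combinatorial screen with more than two components would follow the same pattern with additional axes passed to `product`.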

Adaptation of Robotic Protocols for Use With Detergents and Amphiphiles for Crystallization of Integral Membrane Proteins

We have adapted the protocols used on a commercially available robot to accommodate the addition of detergents and amphiphiles in crystallization trials with membrane proteins. These samples are often difficult to obtain on a large scale, and the methods make efficient use of the available sample by minimizing the amount of protein required for each vapor-diffusion drop. This technology and our methods allow us to scan more quickly the additional parameter space required in membrane protein crystallization. The approach is used at the MCSG.

MCSG Database

The MCSG Structural Genomics Projects Database (SGPDB) is a central database containing the information from all steps of the project that is needed to produce the final results and to coordinate the work. The SGPDB interfaces with the PDB and many public databases. It tracks progress and sends automatic weekly reports by e-mail. The status of all MCSG targets can be accessed remotely in real time through the Internet. The current status of the MCSG pipeline is updated weekly, together with information about all other structural genomics centers. The approach is used at the MCSG.

Gene Cloning and Protein Purification Database

The Gene Cloning & Protein Purification Database (GCPPD) is built on Oracle and Windows platforms and uses web technology. GCPPD is a data repository for all experiment information. It is designed for real-time data gathering, minimal user interaction, and production trend analysis. The database allows monitoring of production processes to ensure that production stays within the specified metrics. The PPD Web-based “Notebook” user interface shows links between File Reports of different workstations: images of chromatograms, electropherograms, etc. can be uploaded and stored in the PPD “Notebook” database. Purification results are fully searchable. The GCPPD communicates directly with the MCSG SGPDB. The approach is used at the MCSG.

Automated Structure Determination at the Synchrotron Beamlines

The new approach integrates data collection, reduction, phasing and model building; it significantly accelerates structure determination and reduces the average number of data sets required for a structure solution. The current system allows reliable determination of the substructure and the handedness of the solution. The substructure can be analyzed with the help of graphical tools, and non-crystallographic symmetry (NCS) can be established by analysis of the substructure. The use of a database technique allows rapid generation of electron density maps during the crystallographic experiment. This approach transforms the on-line result of a diffraction experiment from a set of measured intensities into an interpretable electron density map and, in the case of smaller structures, into a partially built model. The system is interfaced to the MCSG SGPDB, PDB, Swiss-Prot and other generally available databases. The approach is used at the MCSG, and the alpha version is installed at the Structural Biology Center user facility.

Automated Crystallography System (ACrS)

Automated structure determination software, the ACrS, has been developed. The ACrS has been designed to be flexible, to allow straightforward incorporation of new algorithms, to be scalable by being highly distributed, and to be integrated with a database that provides a mechanism for analysis of the structure determination process and deposition of results to the PDB. When a data set is submitted, the present system launches several series of programs that carry out the various computational steps needed for crystallographic structure determination. Each series of programs, or path, obtains information from the database to generate input files, and the results are extracted from log files for incorporation into the database. The system presently initiates three paths for the calculations when a data set is deposited. The ACrS is being tested by investigators from the MCSG, BSGC, CESG, SGPPC pilot projects and the Oxford Protein Production Facility.

PDBsum Analyses of Structures

Each structure is imported into the PDBsum database, and that database's standard analyses are generated and made available as a series of Web pages including diagrams, RasMol scripts, etc. Additionally, the residue conservation scores of the protein’s sequence are calculated and mapped onto the 3D structure. This provides a powerful means of locating the functionally important part(s) of the protein, especially when viewed in conjunction with analyses of the clefts and cavities in the protein’s 3D structure, as computed using the SURFNET algorithm. The software is freely accessible.
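A per-residue conservation score can be illustrated with a small sketch. The scoring method actually used in these analyses is not stated here, so the example assumes one common choice: one minus the normalized Shannon entropy of each alignment column. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column):
    """Score one alignment column: 1.0 if invariant, lower if variable.

    Uses 1 - H / log2(20), where H is the Shannon entropy of the residue
    frequencies in the column. Gap characters ('-') are ignored.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(20)


def conservation_profile(alignment):
    """Return one score per column for a list of equal-length aligned sequences."""
    return [column_conservation(col) for col in zip(*alignment)]


# Toy alignment (hypothetical): column 1 is invariant, column 3 is the most variable.
alignment = ["MKVL", "MKIL", "MRVL", "MKAL"]
for position, score in enumerate(conservation_profile(alignment), start=1):
    print(position, round(score, 3))
```

Scores indexed by alignment position can then be assigned to the corresponding residues of the structure, for example to color a surface by conservation.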

Analysis of Functional Templates

New methods have been developed for searching a 3D structure against databases of 3D structural templates to identify functionally important motifs. The database consists of 189 hand-built active site templates. It is supplemented by new databases of automatically generated templates of ligand- and DNA-binding motifs. A fast search program can scan thousands of templates against a given structure in a matter of seconds. New algorithms have been developed to reliably distinguish true from false template matches. Furthermore, a new technique has been developed that can locate proteins in the PDB that match local regions of the query structure, be they functionally or structurally important regions, and can successfully locate even very distant homologues. The software will become available to the public as part of the ProFunc Server.

The ProFunc Server

A web interface called ProFunc has been designed which, when supplied with a 3D structure, runs a number of analyses, one after another, and combines and abstracts the results to give a summary of the protein’s most likely function. A public ProFunc Server is being established.

Binding Site Analysis

Analysis of the clefts and cavities in the protein’s 3D structure can identify its potential binding sites and can sometimes provide clues to its function. The clefts and cavities in the protein's surface and interior are computed using the SURFNET algorithm, and an analysis of the cleft sizes, the types of residues lining each cleft, and their conservation is tabulated. Visualization of the clefts and their surface properties helps pinpoint the regions that are most likely to be functionally important. The software will become available to the public as part of the ProFunc Server.

Automated Figure Generation

A web-based system has been developed for automatically generating figures for the consortium's standard-format publications, in which Figure 1 is a multiple alignment of the target sequence highlighting the most highly conserved residues, Figure 2 shows a MolScript-style diagram of the protein's 3D structure, and Figure 3 gives a GRASP-like rendering of the protein's surface highlighting the most highly conserved regions. The software will become available to the public as part of the ProFunc Server.

Tar-Get Database - Navigation, Selection, Characterization and Annotation of the Protein Targets for 3D Determination

The MCSG Tar-Get database (http://compbio.mcs.anl.gov/target/) uses an Oracle relational database system that interfaces with the MCSG database. It allows efficient navigation and updating of the targets. The web-based user interface allows interactive analysis of potential targets using a variety of bioinformatics and data visualization tools. The set of interactive tools includes Psort, Gtop, CATH, and TMHMM. The Tar-Get database is updated weekly with similarity information against the PDB and the Protein Target Database at the National Institute of General Medical Sciences (NIGMS). Users are notified of changes in protein target status by an automated email system. The Tar-Get database is open to the public.

Automated Homology Modeling of Relatives

An automated pipeline has been developed that builds homology models for close relatives (≥30% sequence identity) of each target structure solved by the MCSG consortium. These models will be displayed on the GEMMA (genome modeling and model annotation) web site. All sequence relatives of the target structure are identified in GenBank using BLAST. Models are built using COMPOSER, and their quality is assessed using PROSAII. A public GEMMA server is being established.
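The identity cutoff used to select relatives for modeling can be sketched as follows. This is an illustration only, not the pipeline's code: it assumes pairwise alignments are already available (for example from a BLAST search), and it computes identity over aligned, non-gap positions, which is one of several conventions in use. The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over positions where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def select_relatives(alignments, threshold=30.0):
    """Return names of relatives whose identity to the target meets the threshold.

    alignments maps a relative's name to (aligned_target, aligned_relative).
    """
    return [
        name
        for name, (target, relative) in alignments.items()
        if percent_identity(target, relative) >= threshold
    ]


# Hypothetical alignments: rel1 is identical to the target, rel2 shares 2 of 10 residues.
alignments = {
    "rel1": ("ACDEFGHIKL", "ACDEFGHIKL"),
    "rel2": ("ACDEFGHIKL", "ACWWWWWWWW"),
}
print(select_relatives(alignments))  # ['rel1']
```

Only the relatives that pass the cutoff would be handed on to model building and quality assessment.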

Small Scale Automation on the Bench

The interaction between various pieces of lab equipment, a barcode writer, and a Personal Digital Assistant (PDA) through a wireless computer network allows for inexpensive, small-scale automation. The PDA, which has an integrated barcode reader, can communicate with all lab equipment through the 802.11b wireless protocol. For example, production of a stock solution requires only the input of the desired solution type, its molarity and its approximate volume. The information about reagents is downloaded from the lab database, the solution is prepared, and information about the solution produced is uploaded to the same database. The solution is made semi-automatically, with each individual step triggered by a pen-tap on the PDA screen. The final step of this operation is printing a barcode label, which has to be attached manually to the solution container. All data about the solution and the process are stored in a SQL database. This approach not only increases the efficiency of wet-lab procedures but also, more importantly, creates a reliable audit trail, known in the past as a laboratory notebook. The system provides consistency and reproducibility, and it can be applied in a very small laboratory as well as in a large proteomics center.
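The calculation behind the stock solution example is simple: the mass of reagent equals molar mass times molarity times volume. The sketch below shows that step under stated assumptions. The reagent table stands in for the lab database, and the function names and the record layout are hypothetical, not the system's actual schema.

```python
# Molar masses in g/mol; a real system would download these from the lab database.
MOLAR_MASS = {"NaCl": 58.44, "Tris": 121.14}


def grams_required(reagent, molarity, volume_ml):
    """Mass of reagent (g) needed for the given molarity (mol/L) and volume (mL)."""
    return MOLAR_MASS[reagent] * molarity * (volume_ml / 1000.0)


def preparation_record(reagent, molarity, volume_ml):
    """Build the record that would be uploaded after the solution is made."""
    return {
        "reagent": reagent,
        "molarity_M": molarity,
        "volume_ml": volume_ml,
        "mass_g": round(grams_required(reagent, molarity, volume_ml), 3),
    }


# Example request: 500 mL of 0.5 M Tris.
print(preparation_record("Tris", 0.5, 500))
```

Storing such a record for every solution, keyed by the barcode on the container, is what turns the individual steps into an audit trail.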

Automatic Model Building for Low-Resolution Data

The procedure builds the main chain very well even in low-resolution maps. It has been used in the structure solution of five targets. The procedure uses different library fragments for different resolutions of the data, so the optimization of model building differs with the resolution of the data. The main purpose of the system is to optimize model building and initial refinement when only low-resolution data are available.

Development of Database/Expert System

The main issue when incorporating numerical programs into an automated system is finding generally valid rules for using them. The initial rules need to be re-evaluated after extensive numerical experimentation with various data sets. The process of developing such a set of rules (an expert system) will continue through the next grant period. The general rules discussed in the literature and at meetings have all kinds of exceptions, and these will be encountered in practice. Such cases can be handled manually or by programs from outside the system. A basic interface that will simplify such interactions (e.g. format changes) will be developed. However, it is expected that over time these exceptions will become cases covered by specialized rules, and the need for manual steps and external programs will diminish.

This page last updated October 20, 2007