Mysteries of Life:
From Molecules to Mice

Human genome
The human genome is composed of approximately 3 billion base pairs of DNA. Each base pair has the chemical letters A and T (for adenine and thymine), which are colored blue and orange; or it has the chemical letters C and G (for cytosine and guanine), which are colored red and green. Scientists are trying to determine the order of these base pairs in long stretches of human DNA.

Understanding how proteins work is a key to unlocking the secrets of life and health. Nothing happens in our bodies without them. As enzymes, proteins catalyze the living cell's chemistry. As hormones, these molecules regulate the body's development, direct our organs' activities, and organize our thoughts. As antibodies, they defend us against infection, but in their mutant forms or as coats on viruses, they help cause diseases such as sickle-cell anemia, cancer, or AIDS. What makes each protein so specific in its function is its unique shape; protein shapes range from ellipsoids to saucers to dumbbells.

Each type of protein has a highly specific, three-dimensional (3D) structure that determines its biological activity—that is, its function in each body cell. Each protein is a product of a specific gene, so to understand the function of each of the 80,000 to 100,000 genes in the human genome, it helps to know the shapes and activities of the proteins encoded by each gene. To understand how a cell works, it is crucial to know the 3D structures of its proteins.

A protein starts out as a string of amino acids, each drawn from a set of 20 different kinds. The sequence of the amino acids is dictated by the order of the DNA bases in the gene that directs the protein's synthesis. The amino-acid string folds reproducibly to produce the protein's functional 3D shape. It's like bending a flexible wire that connects Ping-Pong balls of different colors into a complex 3D shape that puts balls of certain colors close together.
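The rule that base order dictates amino-acid order is the genetic code: the gene is read three bases (one codon) at a time, and each codon specifies one amino acid or a stop signal. The Python sketch below illustrates that mapping using the standard genetic code. The function name `translate` and the sample sequence are our own illustrations, not ORNL software, and the sketch ignores real-gene complications such as introns and choosing the correct reading frame.

```python
from itertools import product

# Standard genetic code, enumerated in T, C, A, G order for each codon
# position (TTT, TTC, TTA, TTG, TCT, ...). '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino-acid codes.

    Reads codons from the first base, stops at the first stop codon,
    and ignores trailing bases that do not form a complete codon.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon: the protein chain ends here
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # MAIVMGR
```

The 64-entry table is built from a 64-letter string in T, C, A, G order, a compact encoding of the standard code, so each codon-to-amino-acid step is a single dictionary lookup.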

The protein's function depends largely on how it is folded to give it a specific 3D form. For example, folding brings together widely separated amino acids to form an active site—the catalytic region of an enzyme that binds with a biochemical substance to cause a specific activity in the body, such as digestion.
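One way to see how folding builds an active site is to list pairs of amino acids that lie far apart along the chain yet end up close together in space. In this hypothetical sketch each amino acid is reduced to a single 3D point in ångströms, and the cutoffs `min_separation` (positions along the chain) and `max_distance` are illustrative assumptions, not values from the article.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def long_range_contacts(
    coords: List[Point],
    min_separation: int = 5,
    max_distance: float = 8.0,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of amino acids that are at least
    `min_separation` positions apart along the chain but no more than
    `max_distance` apart in 3D space."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= max_distance:
                contacts.append((i, j))
    return contacts


# A toy 8-residue chain that runs out along x and then folds back
# (a hairpin), bringing the two ends of the chain side by side.
hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]
print(long_range_contacts(hairpin))  # [(0, 6), (0, 7), (1, 6), (1, 7)]
```

In the unfolded (straight) chain none of these pairs would be within reach; only the fold brings residues 0 and 1 next to residues 6 and 7.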

ORNL uses several technologies to determine the sequence of bases in genes and the structures of proteins, especially in the mouse (which is related genetically to humans). The ORNL-developed lab on a chip, mass spectrometry, and high-speed sequencing robots are being used to determine the order of bases in DNA sequences thought to contain genes. X-ray crystallography and mass spectrometry are used to decipher the structure of mouse proteins, including those involved in inflammation, a characteristic of diseases found in both mice and humans. Another approach at ORNL is to predict protein structure using computer modeling.


Predicting Protein Shapes

Although the amino-acid sequences of tens of thousands of proteins have been determined, the 3D structures of only about 1500 different proteins are known today. Amino-acid sequencing is a fairly rapid process, whereas determining the 3D structure of a protein is very time consuming and expensive. It can take a year for a crystallographer to determine the structure of a protein. Considerable time and money would be saved if the 3D structure of every protein could be predicted from its amino-acid sequence. Some researchers believe that, by 2005, computer modeling will accurately predict the structures of 75 to 100 unknown protein sequences a day. Drugs designed to block disease-causing proteins by matching their shapes could then be developed more quickly.

The Computational Protein Structure Group in the Computational Biosciences Section of ORNL's Life Sciences Division has developed a suite of computational tools for predicting protein structure. The group, led by Ying Xu, includes Oakley Crawford, Ralph Einstein, Michael Unseren, Dong Xu, and Ge Zhang. Their computer package, called the Protein Structure Prediction and Evaluation Computer Toolkit (PROSPECT), allows a user to predict the detailed 3D structure of an unknown protein, including its shape and the location of each of its amino acids.

Using PROSPECT, the ORNL group has made predictions for all 43 target proteins in an international contest for protein structure predictions, called CASP-3. ORNL placed in the top 5% of about 100 groups worldwide.

Three-dimensional structure of a protein
Using a computer program such as PROSPECT, ORNL researchers can predict the likely three-dimensional structure of a protein from the order of the amino acids in the "target sequence."

One approach the group uses is "protein threading," a term borrowed from embroidery, in which a thread is pulled through a predetermined design. In this case, the thread is a string of amino acids. ORNL scientists computationally superimpose the same amino-acid sequence onto each of 1000 different representative protein structures to find the best fit. For each structure, they calculate whether the amino acids sit at their lowest energy level (where the atoms want to be) and in positions compatible with their neighbors. The representative structure that best fits a target amino-acid sequence is predicted to be the target's approximate structure.
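The threading idea can be sketched in a few lines. This is a deliberately simplified, hypothetical version, not PROSPECT's scoring: each template structure is reduced to a string marking every position as buried ('B') or exposed ('E'), the toy energy penalizes water-avoiding (hydrophobic) amino acids on the surface and water-loving ones in the core, and the sequence slides along each template without gaps. Real threading also scores pairwise contacts between neighbors and allows insertions and deletions.

```python
from typing import Dict, Tuple

# Hypothetical simplification: one-letter codes treated as hydrophobic.
HYDROPHOBIC = set("AVILMFWC")


def position_energy(residue: str, environment: str) -> int:
    """Toy energy: +1 for a hydrophobic residue on the surface or a
    polar residue buried in the core; 0 otherwise."""
    is_hydrophobic = residue in HYDROPHOBIC
    if environment == "E" and is_hydrophobic:
        return 1
    if environment == "B" and not is_hydrophobic:
        return 1
    return 0


def thread(sequence: str, template: str) -> Tuple[int, int]:
    """Slide the sequence along the template without gaps and return
    (lowest energy, offset at which it occurs)."""
    best = None
    for offset in range(len(template) - len(sequence) + 1):
        energy = sum(
            position_energy(res, env)
            for res, env in zip(sequence, template[offset:])
        )
        if best is None or energy < best[0]:
            best = (energy, offset)
    if best is None:
        raise ValueError("template is shorter than the sequence")
    return best


def best_template(sequence: str, templates: Dict[str, str]) -> Tuple[str, int, int]:
    """Thread the sequence through every template long enough to hold it;
    the lowest-energy fit is the predicted fold."""
    scored = [
        (thread(sequence, env), name)
        for name, env in templates.items()
        if len(env) >= len(sequence)
    ]
    (energy, offset), name = min(scored)
    return name, energy, offset


templates = {"fold_a": "EEBBBEE", "fold_b": "BBEEBBE"}
print(best_template("KLVIE", templates))  # ('fold_a', 0, 1)
```

Here `fold_a` wins because the sequence's hydrophobic middle (L, V, I) lands on its buried core with both charged ends (K, E) exposed. The template names and environment strings are invented for illustration.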

"We also use an approach called homology modeling to fine-tune the predicted structure," says Ying Xu. "We computationally ‘tweak' the structure of the new protein by calculating the detailed forces between atoms and making adjustments in the final predicted structure to minimize the atoms' energies."

Research groups from the National Institutes of Health, the Department of Energy's Lawrence Berkeley National Laboratory, Amgen, and Boston University have expressed interest in using PROSPECT in their research and in collaborating with ORNL to further develop the computer toolkit. By folding their ideas together, the collaborators may soon solve a classic problem in biology: predicting a protein's 3D shape from its amino-acid sequence.


Computing the Genome

A team of researchers in Europe spent two years searching for the gene responsible for adrenoleukodystrophy, a disease described in the movie Lorenzo's Oil. The team used the standard experimental techniques of mapping and sequencing: they fragmented the chromosome believed to harbor the gene into ordered pieces of manageable size, ran these fragments through high-throughput sequencing machines, and obtained the order of the chemical bases in the entire chromosome. But they still couldn't find the gene. So in 1995 they e-mailed the sequence data to the Oak Ridge computer running the ORNL-developed program called Gene Recognition and Analysis Internet Link (GRAIL™). Within a couple of minutes, using statistical and pattern-recognition tools, GRAIL™ returned the location of the gene within the sequence.
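GRAIL's actual method, which combined several statistical measures of coding potential using pattern-recognition techniques, is not reproduced here. As a much simpler illustration of scanning raw sequence for gene-like patterns, the sketch below finds open reading frames on one strand: a start codon (ATG) followed in the same frame by a stop codon. The function name and the `min_codons` threshold are hypothetical. Human genes are interrupted by noncoding introns, which is one reason a tool like GRAIL needs richer statistics than this.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) half-open coordinates of open reading frames
    on the given strand: an ATG followed in the same frame by a stop
    codon, at least `min_codons` codons long counting the stop."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # check all three reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next start codon
    return sorted(orfs)


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))  # [(2, 17)]
```

Scanning the reverse-complement strand the same way would cover genes on the other strand of the DNA double helix.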

Biologists increasingly appreciate the ability of computers to find patterns in the flood of data gathered through mapping and sequencing. Over the next four years, approximately 2 million new DNA bases will be sequenced every day. Each day's sequence will represent about 75 to 100 new genes and their respective proteins. This information will be made available immediately on the Internet and in central genome databases.
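As a rough check on these figures (our arithmetic, assuming the stated daily rate holds for every day of the four years), the total output is on the order of the roughly 3 billion bases in the human genome, and 75 to 100 genes per 2 million bases implies about one gene every 20,000 to 27,000 bases:

```latex
\[
2\times10^{6}\,\frac{\text{bases}}{\text{day}} \times 365\,\frac{\text{days}}{\text{year}} \times 4\ \text{years} \approx 2.9\times10^{9}\ \text{bases}
\]
\[
\frac{2\times10^{6}\ \text{bases}}{100\ \text{genes}} = 2.0\times10^{4}\ \frac{\text{bases}}{\text{gene}},
\qquad
\frac{2\times10^{6}\ \text{bases}}{75\ \text{genes}} \approx 2.7\times10^{4}\ \frac{\text{bases}}{\text{gene}}
\]
```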

The genomes of several organisms have been completely sequenced: a methane-producing microorganism from deep-sea volcanic vents, an influenza virus, yeast, and the roundworm C. elegans. The sequencing of other organisms (e.g., the fruit fly) will be completed soon. The three billion links in the human genome chain are expected to be completely sequenced by 2003, and 10,000 of our 80,000 to 100,000 genes will be identified by then. Plans call for the order of DNA bases in the mouse genome to be determined by 2005.

But a complete set of sequence data for any organism may not be very useful to medical researchers, molecular biologists, and environmental scientists without organized and comprehensive computational analysis. Such comprehensive genome analysis is needed to help researchers understand the basic biology of humans, microbes, plants, and other living organisms.

To provide a comprehensive genome-wide analysis of genome sequence data from different organisms and help integrate biological data around a genome-sequence framework, ORNL and a team of researchers at the DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Baylor College of Medicine, Hospital for Sick Children (Toronto), Johns Hopkins University, Washington University, University of California at Santa Cruz, University of Pennsylvania, and the National Center for Genome Resources have constructed a computational resource that uses GRAIL-EXP, GENSCAN, and a suite of other tools to annotate genome sequences. Annotation is the process of organizing biological information and predictions in a sequenced genome framework (e.g., linking what a gene does to its structure). (See http://compbio.ornl.gov/gac/index.shtml).
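Annotation ties predictions and experimental evidence to coordinates on the sequence so they can be retrieved by location. The minimal Python data structure below is a hypothetical sketch of that idea, not the schema used by ORNL or its partners: each feature records a half-open interval, its kind, and the tool or experiment that supports it, and a query returns every feature overlapping a region.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Feature:
    """One annotation attached to genome coordinates (hypothetical schema)."""
    name: str
    kind: str        # e.g. "predicted_gene", "exon"
    start: int       # 0-based, inclusive
    end: int         # 0-based, exclusive
    evidence: str    # tool or experiment that produced the annotation
    attributes: Dict[str, str] = field(default_factory=dict)


class Annotation:
    """Features organized on one sequence, queryable by region."""

    def __init__(self, sequence_id: str):
        self.sequence_id = sequence_id
        self.features: List[Feature] = []

    def add(self, feature: Feature) -> None:
        self.features.append(feature)

    def overlapping(self, start: int, end: int) -> List[Feature]:
        """Features sharing at least one base with [start, end)."""
        return [f for f in self.features if f.start < end and start < f.end]


ann = Annotation("contig_1")
ann.add(Feature("geneA", "predicted_gene", 100, 900, "GRAIL-EXP"))
ann.add(Feature("exon1", "exon", 100, 250, "cDNA alignment"))
print([f.name for f in ann.overlapping(200, 300)])  # ['geneA', 'exon1']
```

A linear scan is fine for a sketch; a genome-scale store would index features by position (for example, with an interval tree) so that region queries do not examine every record.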

The team has developed a plan and has built a first prototype of the needed genome analysis framework and toolset. The prototype can do the following:

  • Retrieve biological data and assemble genomes;
  • Compute genes, proteins, and genome features from sequences and experimental data (e.g., the group that developed PROSPECT is predicting protein structure from amino-acid sequences available on ORNL computers);
  • Compute homology and function among genomes, genes, and gene products (e.g., proteins);
  • Model the three-dimensional structure of gene products; and
  • Link genes and gene products to biological pathways and systems.

"We have made considerable progress in addressing some data management, data storage, and data access issues," says Ed Uberbacher, head of the Computational Biosciences Section in ORNL's Life Sciences Division. "For example, we developed a unique information resource and Web browser called the Genome Channel, which is available on the Internet. It gathers the results from sequencing centers around the world. It provides a fully assembled view of what is known about the human genome and its chromosomes, sequences, and experimentally cloned genes. It also provides information on computationally predicted genes. The Genome Channel is currently being used by the worldwide genome community to identify and predict gene and protein sequences of interest."


Producing and Screening for Mouse Mutations

Ed Michaud
Ed Michaud watches the activity of normal and mutant mice in large beakers at the Mammalian Genetics Section research laboratory.

Because mice and humans are genetically so similar, biologists can study genetic diseases in mice to better understand similar disorders in humans. DOE considers the mouse to be the most important mammalian model organism, and DOE's Human Genome Project has proposed to devote 10% of its efforts in DNA sequencing to the mouse genome. ORNL is playing a major role in determining the functions of mouse genes as a part of the Human Genome Project.

ORNL's Mammalian Genetics Section of the Life Sciences Division, with a large capacity for mouse production and a long history in mouse genetics and mutagenesis, has taken the lead in mouse functional genomics for DOE. The long-term goal of functional genomics at ORNL is to develop and employ the fastest, smartest, cheapest, and most efficient high-throughput methods for generating and analyzing mouse mutations to help discover the functions of all 70,000 to 100,000 mouse genes.

Starting in the late 1940s, ORNL researchers led by Bill and Liane Russell developed mutant strains of mice as they studied the genetic effects of radiation and chemical exposures on the animals. Using these experimentally induced mutations as a starting point, current studies are designed to find not only obvious changes in characteristics (phenotypes), such as altered coat color, but also more subtle disease phenotypes caused by a change in or deletion of a gene (genotype). The mutant stocks generated in ORNL's historic program to assess the genetic risks of exposing mammals to radiation make ideal targets for current mutagenesis efforts. For example, some of these mutant stocks carry a deletion of a known section of a chromosome. When such a deletion is combined with a chemically induced single-gene mutation in the same section of the paired chromosome, the result is a mouse lacking any normal copy of that gene. The resulting disease phenotype reveals the function of the missing gene, and the chromosome deletion identifies the gene's approximate physical location.
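The logic of pairing a deletion with a chemically induced mutation can be sketched in Python. A recessive mutation shows its phenotype only when the deletion on the paired chromosome removes the gene's normal copy, so testing one mutation against a panel of overlapping deletions narrows down where the gene can lie. The deletion names, map coordinates, and outcomes below are hypothetical, and the sketch assumes a fully recessive mutation and candidate positions at whole map units.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Deletion:
    """A chromosome segment missing from one copy of the chromosome.
    Coordinates are in arbitrary map units (hypothetical)."""
    name: str
    start: float
    end: float

    def covers(self, position: float) -> bool:
        return self.start <= position <= self.end


def normal_copies(deletion: Deletion, gene_position: float) -> int:
    """Normal copies of a gene in a mouse carrying `deletion` on one
    chromosome and a recessive point mutation in the gene on the other.
    Inside the deletion no normal copy remains; outside it, one does."""
    return 0 if deletion.covers(gene_position) else 1


def consistent_positions(
    candidates: List[float], panel: List[Deletion], shows_phenotype: List[bool]
) -> List[float]:
    """Keep candidate gene positions that agree with every cross: the
    mutant phenotype appears exactly when the deletion covers the gene."""
    return [
        p for p in candidates
        if all(d.covers(p) == shown for d, shown in zip(panel, shows_phenotype))
    ]


panel = [Deletion("Del1", 2, 6), Deletion("Del2", 4, 9), Deletion("Del3", 5, 10)]
# Hypothetical outcome: phenotype uncovered by Del1 and Del2 but not by Del3.
print(consistent_positions(list(range(11)), panel, [True, True, False]))  # [4]
```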

Eugene Rinchik, one of the staff scientists leading research in this program, focuses on making mutations in mice and then using various techniques to discover the resulting phenotypes. Among the techniques employed, in addition to simple observation of the animals that could carry mutations, are tests for motor ability and behavior, as well as analysis of body fluids and tissues to detect subtle differences. The mice are also scanned in ORNL's newly developed MicroCAT device (see next section) to see internal changes such as fat deposits and enlarged organs. In these ways, mutations that cause, for example, diabetes, obesity, depression, anemia, kidney disease, nervous disorders, or stomach problems may be detected. Once a mutation is confirmed, various molecular mapping techniques are used to isolate the chromosome region and then the actual gene causing the disorder in the mouse. By linking the disorder to the mutated gene, the normal function of the gene can then be deduced. For example, by locating a mutated gene that causes cleft palate in mice, ORNL's Cymbeline Culiat was able to analyze the normal gene that assists the closing of the palate in the developing mouse.

One way to make mutations in single genes is to inject male mice with ethylnitrosourea (ENU), a chemical whose power as a mutagen in mice was discovered by ORNL's Bill Russell in 1979. ENU causes the substitution of one chemical base for another in the DNA of male spermatogonial stem cells, which continuously produce mature sperm. When the ENU-treated male mouse is mated with an untreated female mouse, some offspring may have new mutations. Over the past 10 years, ORNL's Eugene Rinchik and Don Carpenter have isolated 31 new mutations from more than 4500 pedigrees in one large ENU experiment. In a second ENU experiment focusing on a different section of the mouse genome, they have so far isolated 19 new mutations from 1250 pedigrees tested, have mapped their positions on the target mouse chromosome, and have begun cloning the genes responsible for four of the new mutations.
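The recovery rates implied by these numbers can be worked out directly (our arithmetic; because the first experiment screened more than 4500 pedigrees, its rate is an upper bound). The two experiments targeted different sections of the genome, so the comparison between them is only rough:

```latex
\[
\frac{31\ \text{mutations}}{4500\ \text{pedigrees}} \approx 0.0069
\quad(\text{about 1 per 145 pedigrees}),
\qquad
\frac{19\ \text{mutations}}{1250\ \text{pedigrees}} = 0.0152
\quad(\text{about 1 per 66 pedigrees})
\]
```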

Mouse mutations, then, have historically been made by treating live mice with mutagens and breeding offspring to look for mutations. Now, mutations can also be made very efficiently in a culture dish using special cells from early mouse embryos; these embryonic stem cells have not yet differentiated into specific cell types but retain the potential to become any kind of cell in the mouse. After using molecular techniques to replace a particular normal gene with a mutant one, or to produce a deletion or rearrangement of a whole section of chromosome in the embryonic cell, ORNL's Ed Michaud and his colleagues can use the specifically altered cell to produce a live mouse carrying the desired genetic change. If the new mouse shows a mutant trait, such as epileptic seizures, then the engineered genetic change is assumed to have caused it.

The ORNL researchers also have the capability to make different types of mutations in the same gene to see the whole spectrum of functions in which a gene might be involved. Different gene mutations may completely turn the gene off so it produces no protein, lower the quantity of protein the gene produces, or alter the normal structure of the protein, causing a disease or disorder. According to Rinchik, a slightly injured gene resulting in a slightly altered mutant protein may help us understand the origin of a disease, because most human genetic diseases can be tied to a subtle alteration in a gene rather than a complete loss of gene function.

Mice on Rotor-Rod
Normal mice maintain their balance on the rapidly turning Rotor-Rod. Because mice having certain mutations lack the coordination and balance of normal mice, they can be identified in the Rotor-Rod test because they fall off the rotating rod more quickly.

ORNL researchers Dabney Johnson, Karen Goss, Jack Schryver, and Gary Sega have developed high-throughput biochemical and behavioral screening tests for the detection of subtle mutations in mice. These tests are routinely performed on 100 mice per week. For example, one screening test measures how long mice can keep their balance on a rotating dowel rod to assess neuromuscular coordination, while another instrument quantifies the startle response to a sudden sound.
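The article does not say how a mouse is scored as abnormal on the Rotor-Rod. One simple, hypothetical rule is to flag any mouse whose time on the rod falls well below that of normal controls, here more than `k` standard deviations below the control mean. The function name, the threshold `k = 2`, and the times in the example are invented for illustration and are not ORNL data.

```python
from statistics import mean, stdev
from typing import Dict, List


def flag_poor_balance(
    control_latencies: List[float],
    test_latencies: Dict[str, float],
    k: float = 2.0,
) -> List[str]:
    """Flag mice whose time on the rotating rod (seconds) is more than
    k standard deviations below the mean of normal control mice."""
    threshold = mean(control_latencies) - k * stdev(control_latencies)
    return sorted(mouse for mouse, t in test_latencies.items() if t < threshold)


controls = [118.0, 122.0, 120.0, 119.0, 121.0]
candidates = {"m1": 119.0, "m2": 60.0, "m3": 116.0}
print(flag_poor_balance(controls, candidates))  # ['m2', 'm3']
```

A fixed cutoff like this trades sensitivity against false alarms; flagged mice would still need retesting and breeding to confirm a heritable mutation.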

To increase the breadth and accuracy of screening for mutant mouse phenotypes at what Johnson calls the Screenotype Center, ORNL has organized the Tennessee Mouse Genome Consortium (TMGC). The TMGC taps into the expertise of academic and clinical researchers across the state; membership consists of the University of Tennessee at Knoxville, UT-Memphis, St. Jude Children's Research Hospital, Vanderbilt University, and Meharry Medical College. The TMGC participates both in screening mice for new mutations and in more detailed analysis of confirmed mutations. If, for example, a mutant strain has epileptic seizures, ORNL sends mice or samples from mice to consortium members qualified to determine if the cause is neurochemical or neurophysical and if this mouse is a good model for some form of human epilepsy. Currently, consortium members are helping ORNL screen mice for vision and hearing problems, brain and other organ malfunctions, neurotransmitter content in the brain, and the normal production of sperm cells.


MicroCAT "Sees" Hidden Disorders
in Research Mice

A mouse may be able to hide from a cat, but some types of genetic disorders hidden in mice can now be seen by the MicroCAT miniature X-ray computerized tomography (CT) system devised by Mike Paulus, Hamed Sari-Sarraf, and Shaun Gleason, all of the Instrumentation and Controls (I&C) Division. This high-resolution X-ray imaging system, a kind of CT scanner for mice, allows biologists to see a detailed, three-dimensional image of the internal structure of a mouse in just a few minutes. Traditionally, determining if mice carry subtle anatomical disorders has been a slow, labor-intensive, manual process. Now, this new tool greatly cuts the time needed to determine accurately if a mouse has internal malformations not visible upon external inspection. Thus, it may speed the process of finding cures for some human diseases. For example, imaging of specific fat deposits in an anesthetized mouse allows ORNL researchers to track both the accumulation of fat in a mouse that carries mutant genes involved in obesity and the result of dietary or other obesity treatments.

The I&C group is writing software to allow the computer to inspect and analyze the images to alert researchers to possible abnormalities of interest. The MicroCAT tool has already attracted the attention of researchers around the country who would like to image their own research animals using the Oak Ridge prototype.
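As a hypothetical sketch of one measurement such software might make, the Python below estimates fat volume in a 3D scan by counting voxels whose intensity falls in a fat range and multiplying by the volume of one voxel. It assumes the image is calibrated in Hounsfield units, in which fat reads below water (0); the default window of -190 to -30 is one often used for adipose tissue in clinical CT and would need checking against a given micro-CT scanner. The tiny volume in the example is invented.

```python
from typing import Sequence, Tuple

Volume = Sequence[Sequence[Sequence[float]]]  # indexed [slice][row][column]


def fat_volume_mm3(
    volume: Volume,
    voxel_size_mm: Tuple[float, float, float],
    fat_window: Tuple[float, float] = (-190.0, -30.0),
) -> float:
    """Estimate fat volume by counting voxels whose intensity lies inside
    `fat_window` (inclusive) and multiplying by the volume of one voxel."""
    low, high = fat_window
    count = sum(
        1
        for plane in volume
        for row in plane
        for value in row
        if low <= value <= high
    )
    dx, dy, dz = voxel_size_mm
    return count * dx * dy * dz


scan = [
    [[-100.0, 40.0], [-500.0, -60.0]],
    [[1000.0, -100.0], [0.0, 20.0]],
]
print(fat_volume_mm3(scan, (0.5, 0.5, 0.5)))  # 0.375
```

Repeating the measurement on scans taken over time is one way such software could track fat accumulation, or its reversal under treatment, in the same mouse.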


Mouse Gene for Stomach Cancer
Identified at ORNL

In a search for a gene thought to cause some mice to be born deaf, an ORNL researcher determined that the same gene can cause stomach cancer in mice. The discovery could speed up understanding of how both mice and humans get stomach cancer.

The research was performed by Cymbeline (Bem) Culiat, a staff molecular biologist with the Mammalian Genetics Section in ORNL's Life Sciences Division, in collaboration with former ORNL researcher Lisa Stubbs, now with DOE's Lawrence Livermore National Laboratory (LLNL).

Former ORNL biologist Walderico Generoso had induced the deafness mutation, designated 14Gso, in mice by irradiating male mice with X rays and then mating them with untreated female mice. Unlike normal mice, 14Gso mouse pups were not startled by loud noises, their heads persistently bobbed, and they frequently ran in circles in their cage. These behaviors suggested defects in the inner ear, where hearing and balance are controlled. Studies of the inner ear structures of these mutant mice showed they were too defective to allow sounds to be heard.

In an attempt to locate the gene believed responsible for deafness in these mutant mice, Culiat focused on the tips of two of their chromosomes (7 and 10). Through microscope studies of stained chromosomes, ORNL's Nestor Cacheiro found evidence that genes on both tips had been disrupted and their parts exchanged. Culiat began hunting for the deafness gene in the tip of chromosome 7, which is mapped more extensively than chromosome 10 in the mouse.

Bem Culiat
Bem Culiat washes DNA samples of cloned mouse genes isolated and purified from bacterial cultures where multiple copies of the genes are made.

Using various genetic and molecular mapping techniques, Culiat localized the mutated region in chromosome 7 to a DNA segment containing muc2 (intestinal mucin 2), a gene coding for a major protein in the mucus lining of the intestine. A literature search indicated that one end of the protein produced by the human MUC2 gene is very similar to another protein associated with deafness in humans, thereby making muc2 a candidate gene for the inner ear defects observed in 14Gso mice.

"I checked the expression of this muc2 gene in the deaf mice by measuring their levels of RNA, which carry the gene's instructions for synthesizing protein," she says. "The gene is normally expressed in the intestine and kidney, but I found it was overexpressed in the stomach and lungs and showed a loss of expression in kidneys of the mutant mice. In humans, the overexpression of muc2 in the stomach is associated with chronic gastritis leading to gastric lymphomas and adenocarcinomas. Therefore, we predicted the same defects will occur in the mutant mice."

Stomach pathology studies and examination of the gastrointestinal systems of 14Gso mice by Xiaochen Lu, a researcher in Stubbs' LLNL laboratory, showed inflamed stomachs (gastritis), ulcers, and gastric cancer (lymphomas and adenocarcinomas), the same defects found in humans. "This mutant mouse," Culiat says, "is a good mouse model for studying how gastritis progresses to stomach cancer in both mice and humans."

So far examination of the mutant mice has revealed no abnormal expression of muc2 in inner ears. More detailed analysis of this large gene and analysis of the mutated region of mouse chromosome 10 are both needed to confirm or rule out the involvement of muc2 in the inner ear defect of 14Gso mice.

"If muc2 turns out to be the deafness gene in our mutant mice," Culiat says, "then we may be able to determine if there are mutations in this gene in certain groups of deaf people."

Culiat performed most of this research at ORNL as a postdoctoral scientist working with Stubbs. She was supported by the Alexander Hollaender Postdoctoral Fellowship Program of the Oak Ridge Institute for Science and Education.

Certain segments of the gene muc2 have been cloned and sequenced at ORNL. The sequencing and cloning of this very large gene will be completed at LLNL under the direction of Stubbs. The cloning and characterization of the chromosome regions containing the 14Gso mutation are goals of a continuing collaboration between Stubbs and Culiat.

By identifying and characterizing genes and proteins using various technologies and mouse experiments, Oak Ridge researchers are finding clues that could lead to cures for human diseases.

