
A computational analysis of the human genome by ORNL and UT researchers provides insights into what our genes do.

Human Genome Analyzed Using Supercomputer

The human genome has 100,000 genes. One gene makes one protein. Humans and bacteria have entirely different genes.

These common beliefs were shattered earlier this year by the findings of the International Human Genome Sequencing Consortium, which includes the Department of Energy Joint Genome Institute (JGI), to which ORNL contributes computational analysis. On February 15, 2001, three days after a major announcement, the consortium published the paper "Initial Sequencing and Analysis of the Human Genome" in the journal Nature. The paper makes three points that contradict those beliefs: the human genome has "about 30,000 to 40,000 protein-coding genes, only about twice as many as in worm or fly"; each gene codes for an average of three proteins; and hundreds of genes may have been transferred from bacteria to the human genome.

Ed Uberbacher shows a printout, spread over a conference table, that depicts human chromosome 20's genes as bars of different colors and lengths. In the background are genes from microbes. (Photo by Curtis Boles.)

Ed Uberbacher, head of the Computational Biology Section in ORNL's Life Sciences Division, was one of the hundreds of contributors to this landmark paper. He and his ORNL colleagues performed computational analysis and annotation of the DNA data produced by JGI to uncover evidence of the existence of genes about which little or nothing was known.

Uberbacher and his colleagues also performed an analysis of the complete, publicly available, human genome. The analysis, funded by DOE, was performed by ORNL, University of Tennessee, and University of Pennsylvania researchers using three computational methods, the GenBank database, and the IBM RS/6000 SP supercomputer at DOE's Center for Computational Sciences at ORNL. One of the computational methods used was the latest version of the Gene Recognition and Analysis Internet Link (GRAIL), which was developed by Uberbacher and others at ORNL in 1990 and rewritten as GrailEXP for parallel supercomputers.

"We have found experimental and computational evidence for some 35,000 genes," says Uberbacher. "We have also provided information on how many genes are expressed in different tissues and organs of the body. For example, we determined that more than 20,000 genes are expressed in the central nervous system. About 10% of all human genes are expressed only in the brain."

The IBM RS/6000 SP supercomputer at ORNL was used for human genome analysis.
(Photo enhanced by Gail Sweeden.)

The researchers found 728 cell-signaling genes that tell cells when to divide and when to grow. They identified "zinc fingers"—regulatory proteins that bind to DNA bases composing genes to turn them on or off. These cell-signaling genes and zinc fingers are unique to the human genome.

GrailEXP located almost 2600 genes exhibiting "alternative splicing"—the ability to produce two or more proteins by combining the gene's dispersed protein-coding regions (exons) in different ways.

Each human gene contains multiple exons separated by noncoding regions called introns. Cellular machinery called a spliceosome strips out all the introns and joins the exons together. Sometimes certain exons are skipped.

"We found a gene with 10 exons, but in different human tissues different individual exons are not read, so part of the code is left out that directs the cell to make a protein," Uberbacher says. "This gene could have 10 different protein products."
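The exon skipping described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not GrailEXP or any ORNL code: the sequences, the intron text, and the function names are all hypothetical, and it assumes each alternative transcript leaves out exactly one of the 10 exons, which is one reading of how a 10-exon gene could yield 10 products.

```python
# Toy illustration only: sequences, intron text, and names are hypothetical.

EXON_SEQS = [
    "ATGGCC", "AAACCC", "GGGTTT", "CATCAT", "TGCTGC",
    "GAAGAA", "CTGCTG", "ACCACC", "GTTGTT", "TCATAA",
]
INTRON = "gtaagtttttag"  # lower case so introns stand out from exons


def build_gene(exon_seqs, intron):
    """Return (pre_mrna, exons), where exons is a list of (start, end) spans."""
    parts, exons, pos = [], [], 0
    for i, seq in enumerate(exon_seqs):
        if i > 0:
            # Place one intron between consecutive exons.
            parts.append(intron)
            pos += len(intron)
        exons.append((pos, pos + len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), exons


def splice(pre_mrna, exons, skip=frozenset()):
    """Join exons in order, dropping all introns and any skipped exons."""
    return "".join(
        pre_mrna[start:end]
        for i, (start, end) in enumerate(exons)
        if i not in skip
    )


def single_skip_isoforms(pre_mrna, exons):
    """One transcript per exon, each with that single exon left out."""
    return [splice(pre_mrna, exons, {i}) for i in range(len(exons))]


pre_mrna, exons = build_gene(EXON_SEQS, INTRON)
isoforms = single_skip_isoforms(pre_mrna, exons)
print(len(set(isoforms)))  # 10 distinct transcripts for this toy gene
```

In a real gene, a distinct transcript does not always mean a distinct functional protein, because skipping an exon can shift the reading frame or introduce an early stop codon. The sketch counts transcripts only.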

Some of the genes are known, and detailed information on their sequences is found in GenBank. Other genes are less well characterized but are similar to genes found in model organisms, such as the mouse. Still other genes are inferred based on expressed sequence tags (ESTs). An EST is a unique stretch of DNA within a coding region of a gene that can be used to identify full-length genes. ESTs were used in computational predictions to locate additional genes and predict the makeup, structure, and function of the proteins they encode.
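As a rough picture of how an EST can point to a gene's location, the sketch below searches a genomic sequence for exact copies of an EST on either strand. This is a deliberate simplification and an assumption on our part: real gene-finding tools align ESTs with tolerance for sequencing errors and for introns that interrupt the match, and the article does not describe GrailEXP's method at this level. The function names and sequences are hypothetical.

```python
# Toy illustration only: exact-match search stands in for real EST alignment.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_est(genome, est):
    """Return (position, strand) for every exact occurrence of the EST."""
    hits = []
    for strand, query in (("+", est), ("-", reverse_complement(est))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)
    return hits


genome = "TTTTATGGCCAAACGGGG"  # hypothetical genomic fragment
print(locate_est(genome, "ATGGCCAAAC"))  # EST read from the forward strand
print(locate_est(genome, "GTTTGGCCAT"))  # same region, read from the reverse strand
```

Each hit marks a stretch of the genome that is transcribed, which is the evidence used to infer that a gene lies there.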

In addition to genes, the researchers found many DNA sequences that are repeated in the human genome. This "junk" DNA may have a purpose: It lowers the probability that random mutations in DNA strike the coding sections of important genes. "Although the human and mouse genome are about the same size," Uberbacher says, "we found longer stretches of repeated DNA sequences, making up 40 to 48% of the human genome, separating clusters of genes like vast deserts between metropolitan areas."


Related Web sites

ORNL Life Sciences Division
ORNL Life Sciences Division Computational Biology Section
DOE's Center for Computational Sciences
DOE Joint Genome Institute
