Insights from the Human DNA Sequence
Genomics and Its Impact on Science and Society: The Human Genome Project and Beyond
The first panoramic views of the human genetic
landscape have revealed a wealth of information and some early surprises.
Much remains to be deciphered in this vast trove of information; as the consortium
of HGP scientists concluded in their seminal paper, “. . . the more we learn about the human genome, the more there is to explore.” A few highlights from the first publications analyzing the sequence follow.
- The human genome contains 3.2 billion
chemical nucleotide bases (A, C, T, and G).
- The average gene consists of 3000 bases, but
sizes vary greatly, with the largest known human gene being dystrophin
at 2.4 million base pairs.
- Functions are unknown for more than 50%
of discovered genes.
- The human genome sequence is almost (99.9%)
exactly the same in all people.
- About 2% of the genome encodes instructions
for the synthesis of proteins.
- Repeat sequences that do not code for proteins
make up at least 50% of the human genome.
- Repeat sequences are thought to have no direct
functions, but they shed light on chromosome structure and dynamics. Over
time, these repeats reshape the genome by rearranging it, thereby creating
entirely new genes or modifying and reshuffling existing genes.
- The human genome has a much greater portion
(50%) of repeat sequences than the mustard weed (11%), the worm (7%), and
the fly (3%).
- Over 40% of the predicted human proteins share
similarity with fruit-fly or worm proteins.
- Genes appear to be concentrated in random
areas along the genome, with vast expanses of noncoding DNA between.
- Chromosome 1 (the largest human chromosome)
has the most genes (3168), and the Y chromosome has the fewest (344).
- Particular gene sequences have been associated with
numerous diseases and disorders, including breast
cancer, muscle disease, deafness, and blindness.
- Scientists have identified millions of
locations where single-base DNA differences occur in humans.
This information promises to revolutionize
the processes of finding DNA sequences associated
with such common diseases as cardiovascular disease, diabetes, arthritis,
and cancers.
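
To put the percentages in the highlights above on a common scale, the short Python sketch below converts three of them into approximate base counts. It uses only the rounded figures quoted in the list; the constant and function names are illustrative, and the results are back-of-the-envelope estimates rather than measured values.

```python
# Rounded figures quoted in the highlights above.
GENOME_BASES = 3_200_000_000   # 3.2 billion nucleotide bases
CODING_FRACTION = 0.02         # about 2% encodes protein instructions
REPEAT_FRACTION = 0.50         # at least 50% is repeat sequence
SHARED_FRACTION = 0.999        # 99.9% identical in all people


def bases(fraction: float, total: int = GENOME_BASES) -> int:
    """Return the approximate number of bases that a fraction of the genome covers."""
    return round(fraction * total)


coding_bases = bases(CODING_FRACTION)
repeat_bases = bases(REPEAT_FRACTION)
differing_bases = bases(1 - SHARED_FRACTION)

print(f"Protein-coding sequence: about {coding_bases:,} bases")
print(f"Repeat sequence:         at least {repeat_bases:,} bases")
print(f"Differing between people: about {differing_bases:,} bases")
```

On these figures, roughly 64 million bases code for proteins, at least 1.6 billion are repeats, and about 3.2 million positions differ between people, which is in line with the "millions of locations" of single-base differences noted in the list.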
| Organism | Genome Size (Base Pairs) | Estimated Genes |
|---|---|---|
| Human (Homo sapiens) | 3.2 billion | 25,000 |
| Laboratory mouse (M. musculus) | 2.6 billion | 25,000 |
| Mustard weed (A. thaliana) | 100 million | 25,000 |
| Roundworm (C. elegans) | 97 million | 19,000 |
| Fruit fly (D. melanogaster) | 137 million | 13,000 |
| Yeast (S. cerevisiae) | 12.1 million | 6,000 |
| Bacterium (E. coli) | 4.6 million | 3,200 |
| Human immunodeficiency virus (HIV) | 9,700 | 9 |
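
One way to compare the organisms in the table is by gene density, the estimated number of genes per million base pairs. The Python sketch below computes it from the table's rounded figures. The variable and function names are illustrative, and the densities are only as precise as the estimates they come from.

```python
# (organism, genome size in base pairs, estimated genes), as listed in the table.
GENOMES = [
    ("Human", 3_200_000_000, 25_000),
    ("Laboratory mouse", 2_600_000_000, 25_000),
    ("Mustard weed", 100_000_000, 25_000),
    ("Roundworm", 97_000_000, 19_000),
    ("Fruit fly", 137_000_000, 13_000),
    ("Yeast", 12_100_000, 6_000),
    ("E. coli", 4_600_000, 3_200),
    ("HIV", 9_700, 9),
]


def genes_per_megabase(size_bp: int, genes: int) -> float:
    """Return the estimated number of genes per million base pairs."""
    return genes / (size_bp / 1_000_000)


density = {name: genes_per_megabase(size, genes) for name, size, genes in GENOMES}

# Print from the sparsest genome to the densest.
for name, value in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name:<18}{value:>8.1f} genes per million bases")
```

On these figures the human genome is the sparsest in the table, at about 8 genes per million bases, compared with about 250 for mustard weed and several hundred for E. coli. This matches the vast expanses of noncoding DNA noted in the highlights.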
The estimated number of human genes is only one-third as
great as previously thought, although the numbers may be revised as more
computational and experimental analyses are performed.
Scientists suggest that the genetic key to human complexity lies not in
gene number but in how gene parts are used to build different products
in a process called alternative splicing. Other underlying reasons for
greater complexity are the thousands of chemical modifications made to
proteins and the repertoire of regulatory mechanisms controlling these
processes.
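
The idea that one gene can yield several products through alternative splicing can be illustrated with a toy model. The Python sketch below assumes a hypothetical gene with four exons, two of which may be skipped, and lists every transcript that results. The exon names and the function `splice_variants` are invented for illustration. Real splicing is regulated, so not every combination necessarily occurs in a cell.

```python
from itertools import combinations


def splice_variants(exons, skippable):
    """List transcripts formed by keeping or skipping each skippable exon.

    Exon order is preserved; exons not named in `skippable` are always kept.
    """
    candidates = [exon for exon in exons if exon in skippable]
    variants = []
    for count in range(len(candidates) + 1):
        for skipped in combinations(candidates, count):
            variants.append([exon for exon in exons if exon not in skipped])
    return variants


# Hypothetical gene: E1 and E4 are always included, E2 and E3 may be skipped.
exons = ["E1", "E2", "E3", "E4"]
variants = splice_variants(exons, skippable={"E2", "E3"})

for variant in variants:
    print("-".join(variant))
```

In this model, two skippable exons give four distinct transcripts from a single gene, and each additional skippable exon doubles the count. This suggests how a modest number of genes could give rise to a much larger set of products.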