The Emergence of Bioinformatics

There was a time when biology happened mostly in dissection labs, in test tubes, and under microscopes. But the discovery of DNA and the ability to sequence genomes changed all that. Biology evolved into molecular biology: the push to understand biological processes at the level of the chemistry encoded by the nucleotide bases of DNA.


Though biologists quickly developed methods to tag the four bases (A, T, G, and C), reading hundreds of sequences “manually,” each nearly a thousand base pairs long, soon became inefficient, if not impossible. The biologists needed help sifting through the volume of data.

Fortunately, Brookhaven Lab’s resident physicists have plenty of experience dealing with enormous data sets. “In physics experiments, computer programs analyze millions of particle collision events, for example, to find those events that are interesting or significant,” says Brookhaven’s Sean McCorkle. McCorkle, now a bioinformaticist, had designed data-acquisition systems in the Physics Department; he moved to the Biology Department when the need for computer-programming expertise in the life sciences became apparent.

Bioinformaticists like McCorkle handle the nuts and bolts of data — maintaining databases and writing the software to “mine” that data for useful information. The biological information encoded in DNA is a natural for this kind of analysis, McCorkle says, because “DNA is an informational molecule, the computer program of the cell.”

McCorkle also runs “in silico” experiments, or computer simulations. “Before we could do any of these gene-tagging experiments in reality, we ran simulations to prove that they would work,” McCorkle says.

Data mining can also be applied to proteins, the molecules that carry out the work of cells. McCorkle recently collaborated with Brookhaven biochemist John Shanklin on a program designed to identify the parts of proteins most likely to control their functions. The program presents the data in a visually intuitive way.

“With more and more protein-sequence data available, computers become an important tool for sorting the informational ‘wheat’ from the ‘chaff,’” Shanklin says.

Without programs like the ones McCorkle designs, efforts to identify and use relevant biological information would be enormously time-consuming — and expensive.