The Drosophila genome that has stormed in at #1 is a stunning vindication of the whole genome shotgun (WGS) approach to sequencing adopted by J. Craig Venter and Celera Genomics. Instead of the painstaking chromosome-by-chromosome approach of conventional sequencing, WGS blasts the entire genome into small, easily sequenced fragments and then uses powerful computers to assemble the fragments into a complete sequence. In 1998 Venter announced that his private company would use WGS to beat the publicly funded Human Genome Project at its own game. To prove the approach, which skeptics said could never work on a large genome, Celera set its sights on the fruit fly.

The crucial issue for WGS was how it would deal with the long stretches of simple repeated sequences that pepper the eukaryotic genome. These offer the assembly programs plenty of opportunity for ambiguity and mis-assembly. Celera's answer was to create libraries of three different fragment sizes (2, 10 and 150 kb) and sequence the ends of each fragment. The overlapping and oriented fragments of different lengths could then be brought together into increasingly dense and interlinked scaffolds on which the rest of the sequence data could be conveniently and accurately assembled.

Celera was not alone. It collaborated with the Berkeley Drosophila Genome Project (BDGP) and the European Drosophila Genome Project, which, with other researchers, had already created several important genetic resources and sequenced about 29 Mb of the total 180 Mb. These data proved important not only for assembling the short sequences but also as a check that the project was accurate and on track. It was. After a remarkably short time, just six months, the sequence was in.

That created the next big headache: how to identify all the genes and work out what they did, a process called annotation. The project directors decided to mimic the shotgun approach.
Just as they had sequenced the entire genome at once, so they would annotate it in one brain-bursting session. Celera invited 45 experts to its Maryland campus and more or less locked them up with the data and as much computational power and expertise as they needed. Not that the scientists needed much locking up. For 11 days they scanned the sequences looking for genes they knew and genes they didn't. Each night Celera's programmers used the previous day's findings to re-examine the data.

The discoveries of Celera's annotation jamboree filled the best part of Science's issue of 24 March 2000, and results are continuing to emerge. Drosophila contains far fewer genes than the worm Caenorhabditis elegans, about 14,000 compared with the worm's 18,400, despite having ten times as many cells in its body. But Drosophila seems to make up for that by using the same sequence in several ways. The worm has four different myosin genes, while the fly has one gene that can be assembled in many different ways. Vertebrates have three different DNA sequences that produce three different aldolase enzymes; Drosophila has one sequence that can produce three different enzymes. This may be a general phenomenon, with a single Drosophila sequence capable of multiple assemblies, but in other areas Drosophila is overendowed. It has 352 genes for zinc-finger proteins, while Caenorhabditis has only 152. No general theory has yet emerged to account for these differences.

The similarities are important too. Drosophila shares many genes with vertebrates in general and humans in particular. That suggests new ways to understand human diseases. Once the Drosophila homolog of a gene associated with disease in humans has been identified, especially a poorly understood gene, the full arsenal of genetic techniques developed for the fruit fly can be brought to bear on it.
The expression of the gene during development, the impact of both under- and over-expression of the gene, and the other genes it interacts with can all be studied much more easily. Insulin genes, which scientists have long been seeking, are there, as are receptors for many important hormones. Other valuable genes newly revealed by the sequence are homologs of tau and Parkin, which are involved in Parkinsonism, and of the p53 tumor suppressor gene.

Science writer Dr. Jeremy Cherfas
Science Watch® is an editorial component of Essential Science Indicators.
(c) 2008 The Thomson Corporation.