Science Watch® - Tracking Trends and Performance In Basic Research
May/June 1999


High Interest Rate for Riches Stored in GenBank
by Jeremy Cherfas




WHAT'S HOT IN BIOLOGY...

Rank This Period (Jan-Feb 99) | Paper | Citations This Period (Jan-Feb 99) | Rank Last Period (Nov-Dec 98)
1 | S. F. Altschul, et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Res., 25(17):3389-3402, 1 September 1997. [NIH, Bethesda, MD; Pennsylvania St. U., University Park] *XU793 | 144 | 1
2 | F.R. Blattner, et al., "The complete genome sequence of Escherichia coli K-12," Science, 277(5331):1453-74, 5 September 1997. [U. Wisconsin, Madison; U. Michigan Sch. Med., Ann Arbor; FMC Bioproducts, Rockland, ME; U. Natl. Autonoma Mexico, Morelos] *XV429 | 80 | 4
3 | J. Yang, et al., "Prevention of apoptosis by Bcl-2: Release of cytochrome c from mitochondria blocked," Science, 275(5303):1129-32, 21 February 1997. [Emory U., Sch. Med., Atlanta, GA] *WJ503 | 75 | 3
4 | R.M. Kluck, et al., "The release of cytochrome c from mitochondria: a primary site for Bcl-2 regulation of apoptosis," Science, 275(5303):1132-6, 21 February 1997. [La Jolla Inst. Allergy and Immunol., San Diego, CA] *WJ503 | 73 | 2
5 | F. Kunst, et al., "The complete genome sequence of the Gram-positive bacterium Bacillus subtilis," Nature, 390(6657):249-56, 20 November 1997. [46 institutions worldwide] *YG667 | 61 | 9
6 | P. Li, et al., "Cytochrome c and dATP-dependent formation of Apaf-1/Caspase-9 complex initiates an apoptotic protease cascade," Cell, 91(4):479-89, 14 November 1997. [Howard Hughes Med. Inst., U. Texas Southwest. Med. Ctr. Dallas; Thomas Jefferson U., Philadelphia, PA] YG492 | 55 | 6
7 | J.-F. Tomb, et al., "The complete genome sequence of the gastric pathogen Helicobacter pylori," Nature, 388(6642):539-47, 7 August 1997. [6 U.S. and Swedish institutions] *XP722 | 53 | 5
8 | H. Zou, et al., "Apaf-1, a human protein homologous to C. elegans CED-4, participates in cytochrome c-dependent activation of caspase-3," Cell, 90(3):405-13, 8 August 1997. [U. Texas Southwestern Med. Ctr. Dallas; Genentech, South San Francisco, CA] *XQ063 | 44 | 7
9 | M. Enari, et al., "A caspase-activated DNase that degrades DNA during apoptosis, and its inhibitor ICAD," Nature, 391(6662):43-50, 1 January 1998. [Osaka U. Med. Sch., Japan; Kirin Brewery Co., Kanagawa, Japan; Osaka Biosci. Inst., Japan] *YP888 | 43 | 10
10 | D.A. Benson, et al., "GenBank," Nucleic Acids Res., 26(1):1-7, 1 January 1998. [Natl. Library Med., NIH, Bethesda, MD] *YV004 | 39 |

SOURCE: ISI's Hot Papers Database.

   "If you use GenBank as a tool in your published research, we ask that this paper be cited." Are there any molecular biologists who don't use GenBank, the congressionally mandated public database of all known nucleotide and protein sequences? No wonder the 1998 review of GenBank's activities has stormed into the bottom (see paper #10) of an otherwise barely changed Top Ten.

   GenBank is the main product of the National Center for Biotechnology Information (NCBI), which the U.S. Congress created in 1988 to underpin efforts to sequence the genome. It consists of an intricate set of interdependent databases and the tools to interrogate them, with the raw information streaming in from laboratories around the world. (Each day, GenBank in Bethesda talks to the European Bioinformatics Institute and the DNA Data Bank of Japan to ensure that all three of them have the latest versions of the sequences each has collected.)

   Researchers submit sequences to GenBank, which tries to make sense of them. It sorts sequences according to what species they come from, what proteins they code for, what other sequences they resemble, and whether they are unique. All that information is then made available to members of the molecular biology community (and anyone else who cares to point a browser at http://www.ncbi.nlm.nih.gov), who can use it to attempt to understand the endless letters that pour out of DNA sequencers.

   The sheer scale of the enterprise is astounding. The #10 paper by the Dennis A. Benson group–Benson insists on sharing credit, telling Science Watch "it's humbling to be the first author, an honor I got by virtue of the alphabet"–enumerates 690,000 new sequences in the previous year, with more than 1 billion bases from 1.6 million sequences. The database was doubling in size every 18 months, but lately the doubling time has fallen to 15 months. (The whole thing used to be available on a CD-ROM, but ease of access across the Internet, coupled with the fact that the database now spans several CD-ROMs, prompted the NCBI to abandon that format.)
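
   To put the doubling times quoted above on a yearly footing, a doubling time of T months corresponds to an annual growth factor of 2^(12/T). The following is only a rough sketch of that arithmetic, assuming smooth exponential growth; the function name is illustrative and does not come from the GenBank paper.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Growth factor over 12 months for a quantity that doubles
    every `doubling_months` months (assumes steady exponential growth)."""
    if doubling_months <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (12 / doubling_months)


# The two doubling times mentioned in the article.
for months in (18, 15):
    factor = annual_growth_factor(months)
    print(f"doubling every {months} months -> x{factor:.2f} per year")
```

   On that assumption, an 18-month doubling time works out to roughly 59 percent growth per year, and a 15-month doubling time to roughly 74 percent.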

   Lest you think that this is no more than a story of some big machines and clever programs, Benson tells Science Watch that there are highly trained staff, known as sequence annotators, who review the entire input stream of sequence data. "Approximately 20 individuals, with masters and Ph.D.s in molecular biology...are responsible as the gatekeepers for the database, reviewing records and interacting with scientist-submitters."

   What is more, the NCBI is more than just a data warehouse. "A lot of computational biology research goes on here," says Benson. "It's not informatics for just informatics’ sake–we try to be very biologically oriented in the computer work we do." To what extent people use the additional facilities is not yet clear, but something like Entrez, which integrates DNA sequence information with published references, taxonomic information and protein data–all over the Internet–is clearly very handy.

   The #10 paper is actually the second in what has become a series of annual reports from GenBank. The latest, third, version (Nucleic Acids Research, 27:12-17, 1999) reads in part like boilerplate, updated on the fly. Where the 1998 version added 690,000 sequences, by 1999 the figure was 770,000. Likewise, 1999's edition holds 1.6 billion bases from 1.6 million sequences. And the number of species represented jumped from 30,000 to 40,000.

   Only one figure is significantly down: the number of complete genomes currently being sequenced of which GenBank is aware. In 1998 it was 32, but the 1999 paper mentions only 20. As it happens, 10 complete genomes were added in 1998 (compared to two in 1996 and six in 1997), so the drop in organisms being sequenced probably represents a real move away from the smaller "model" genomes and into the "real" genomes of mouse and human. Even now the Delphic injunction to "know thyself" is fully reflected in GenBank's memory; well over half of all the sequences come from our own, singular species.

   The role of GenBank and other sequence databases in the "genome revolution" has been largely ignored. In a sense that is as it should be. Engines that power enquiry are taken for granted no less than engines that power automobiles; as long as they work, who cares exactly how? From time to time, though, it is good to peer beneath the hood and pay tribute to a fine piece of engineering.

   Oh, and there's another complete sequence at #11. It doesn't cite GenBank.

Science writer Dr. Jeremy Cherfas
works with the Biotechnology and Biological Sciences
Research Council of the U.K., Swindon.

Science Watch®, May/June 1999, Vol. 10, No. 3
Citing URL: http://www.sciencewatch.com/may-june99/sw_may-june99_page8.htm
