Science Watch® - Tracking Trends and Performance in Basic Research
July/August 2000



Sense from Sequences: Stephen F. Altschul on Bettering BLAST

Nature has been kind to molecular biologists, or at least relatively so. Although their automated sequencers churn out streams of genetic code by the genome-full, little of the data would make sense without the simple fact that nature finds it easier to conserve genes and proteins than to keep inventing new ones. The best method to identify the function of a gene or a protein is to find a related gene or protein, or an entire family, whose function is already known. That challenge is the kind that computer scientists have wrestled with for decades–identifying similarities in strings of data–and has spawned an entire field of computational molecular biology and a host of computer tools that are racking up citations by the thousands.

Stephen F. Altschul

"PSI-BLAST is like the Model T Ford of this kind of sequence comparison," says Stephen F. Altschul of the National Center for Biotechnology Information in Bethesda, Maryland. "There were a lot of cars before the Model T, and perhaps even better cars, but the Model T was accessible to everyone."

Foremost is a program known as BLAST (for "basic local alignment search tool"), published in 1990 by a collaboration of researchers led by the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland. Now cited more than 10,000 times (see the table on the next page, paper #1), the BLAST paper was the most highly cited paper published in the 1990s and is only in danger of being supplanted in the next decade by the 1997 paper describing the improved version of the program–Gapped BLAST and PSI-BLAST (next page, paper #2). Having enjoyed a long streak as the hottest paper in the biology Top Ten, from the summer of 1998 until its recent "retirement" (after passing Science Watch's two-year age limit on Hot Papers), this report is now at 2,000-plus citations and counting.

The original BLAST program was the brainchild of David Lipman, director of the NCBI, whose name, by virtue of seniority, appeared as the last author on the paper. The first author was Stephen F. Altschul, an NCBI researcher, who says he earned the position because the remaining authors were listed in alphabetical order. He was also first author on the PSI-BLAST paper, although in this case, he says, "I really coordinated the work and originated most of the ideas behind it."

Altschul, 43, graduated summa cum laude in mathematics from Harvard in 1979. After two years teaching in Rome, he returned to the Massachusetts Institute of Technology, where he got interested in sequence comparison, and worked predominantly with Bruce Erickson and Peter Sellers at Rockefeller University. After obtaining his doctorate in mathematics in 1987, Altschul took a post-doc with Lipman at the National Institutes of Health and moved over with him to the National Library of Medicine (NLM) in 1989, when Congress created the NCBI under the umbrella of the NLM. Since 1994, Altschul has been a senior investigator at the NCBI. 

Altschul spoke to Science Watch correspondent Gary Taubes from his office in Bethesda.

SW How exactly did BLAST get started, considering the wide array of collaborators?

Altschul: The work on it really began in the first few weeks we were here at the NCBI. We had a visiting scientist named Gene Myers, who was then at the University of Arizona and is now vice-president of informatics at Celera Genomics. He was working on some ideas about how to do fast sequence comparison, and was talking to David Lipman about it. Combining this with knowledge of some work I was doing at the time with Sam Karlin, of Stanford University, on the statistics of local alignments, David came up with the main algorithmic idea behind BLAST. He hashed it out with Webb Miller, a computer scientist at Penn State; Warren Gish, now at Washington University, did most of the actual implementation and added some important algorithmic ideas as well. In addition to elaborating the statistical issues, I wrote the paper and invented the acronym.

SW What did BLAST offer molecular biologists that FASTA, the existing sequence-comparison program, didn’t?

Altschul: When BLAST first came out, it did two things that FASTA didn’t, and FASTA did one major thing that BLAST didn’t. BLAST ran a lot faster than FASTA–probably three to four times faster. That was one key factor. If you were searching a database, it might take ten minutes with FASTA and two minutes with BLAST. Since then, the times have remained more or less constant, because the size of the databases has grown as the computer speeds have increased.
   The other thing BLAST could give you was the statistics of the sequence comparison. So rather than just ranking things by score and more or less leaving it up to you to figure out when a match was significant and when it wasn’t, BLAST could give you a solid number to tell you which matches were worth looking at. The disadvantage was that the original BLAST didn’t allow gaps in the alignments. It tried to get around this by frequently producing several different alignments for a given pair of sequences. These would be of sections that didn’t include gaps. But FASTA might string them all together by allowing gaps in one sequence or the other. So people tended to use both programs, but they’d likely start with BLAST because it was much faster and was also running on the NCBI machines, making it very convenient to use.
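
As a rough illustration of the two ideas Altschul describes here, gap-free local alignments and a significance number attached to each score, the following Python sketch scans every diagonal of a pairwise comparison for its best-scoring segment and then converts that score into an expected number of chance hits using the Karlin-Altschul form E = K * m * n * exp(-lambda * S). It is not the BLAST implementation: there is no word-hit heuristic, the +1/-1 scoring is a toy scheme, and the constant K is a placeholder chosen only for illustration. The function names are hypothetical.

```python
import math


def best_ungapped_segment(a, b, match=1, mismatch=-1):
    """Return (score, start_a, start_b, length) of the best gap-free local alignment.

    Without gaps, an alignment pairs a[i] with b[i + d] for one fixed offset d,
    so it is enough to find the best-scoring run along each diagonal.
    """
    best = (0, 0, 0, 0)
    for d in range(-(len(a) - 1), len(b)):
        i = max(0, -d)
        j = i + d
        run_score, run_start = 0, i
        while i < len(a) and j < len(b):
            if run_score <= 0:
                # A run that has dropped to zero or below cannot help; restart here.
                run_score, run_start = 0, i
            run_score += match if a[i] == b[j] else mismatch
            if run_score > best[0]:
                best = (run_score, run_start, run_start + d, i - run_start + 1)
            i += 1
            j += 1
    return best


def expect_value(score, m, n, lam, k):
    """Expected number of chance segments scoring at least `score`
    when comparing sequences of lengths m and n."""
    return k * m * n * math.exp(-lam * score)


# For +1/-1 scoring with four equally likely letters, lambda solves
# 0.25 * exp(lam) + 0.75 * exp(-lam) = 1, which gives lam = ln 3.
LAMBDA = math.log(3)
K_PLACEHOLDER = 0.1  # assumption for illustration only, not a calibrated constant

query, subject = "TTACGTGA", "CCACGTCC"
score, start_q, start_s, length = best_ungapped_segment(query, subject)
e_value = expect_value(score, len(query), len(subject), LAMBDA, K_PLACEHOLDER)
print(query[start_q:start_q + length], subject[start_s:start_s + length], score, e_value)
```

A lower expected count means a match is less likely to be a chance hit, which is the kind of "solid number" the interview refers to. Because each segment stays on a single diagonal, two similar regions separated by an insertion or deletion would be reported as separate alignments, which is the limitation of the original program that Altschul goes on to describe.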

SW Does the new version of BLAST take care of the gap problem?

Altschul: Yes. We figured out a way to allow gaps and to speed up the program at the same time. Meanwhile, FASTA, which is the work of Bill Pearson–and also, originally, David Lipman–added statistical analysis, so FASTA produces good statistics now.


Science Watch®, July/August 2000, Vol. 11, No. 4
Citing URL: http://www.sciencewatch.com/july-aug2000/sw_july-aug2000_page3.htm
