Science Watch® - Tracking Trends and Performance in Basic Research
July/August 2000



SW What exactly is PSI-BLAST, which seems to be the primary advance in the heavily cited 1997 paper?

Altschul: That is the most interesting aspect of that paper. It’s an acronym for "Position Specific Iterated" BLAST. And it's really, underneath, a quite different sort of program. Basically if you have a multiple alignment of a number of related sequences, and it's an accurate alignment, PSI-BLAST will make it much easier to find distant relatives. The reason is that you can look in a given column of that alignment and see that a certain residue is very highly conserved. You might see, for instance, that there’s always a glycine at a certain position. Whereas in another position, the original sequence might have glycine, but you see from the multiple alignment that virtually any other amino acid can go there. PSI-BLAST exploits this with a scoring system that, in the position where glycine is completely conserved, gives a very high score for aligning a glycine and a large negative score for everything else. Whereas in the position where there’s great variability, pretty much every residue gets a neutral score. That ends up being much more sensitive for finding related sequences.
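
[Illustration: the position-specific scoring idea described above can be sketched in a few lines of Python. This is a simplified toy, not the scoring scheme PSI-BLAST actually uses: it assumes a gap-free alignment, uniform background frequencies, and a constant pseudocount, and the names `build_pssm`, `score`, and `TOY_ALIGNMENT` are hypothetical. Column 0 of the toy alignment is always glycine; column 1 contains every amino acid once.]

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores (in bits) per alignment column.

    Assumes a gap-free alignment of equal-length sequences over the
    20 standard amino acids.
    """
    n_cols = len(alignment[0])
    total = len(alignment) + pseudocount
    pssm = []
    for col in range(n_cols):
        counts = Counter(seq[col] for seq in alignment)
        column_scores = {}
        for aa in AMINO_ACIDS:
            # The pseudocount is shared out according to the background.
            freq = (counts[aa] + pseudocount * BACKGROUND) / total
            column_scores[aa] = math.log2(freq / BACKGROUND)
        pssm.append(column_scores)
    return pssm


def score(pssm, segment):
    """Score an ungapped segment of the same length as the PSSM."""
    return sum(col[aa] for col, aa in zip(pssm, segment))


# Toy alignment: glycine is fully conserved in column 0,
# while column 1 tolerates any residue.
TOY_ALIGNMENT = ["G" + aa for aa in AMINO_ACIDS]

if __name__ == "__main__":
    pssm = build_pssm(TOY_ALIGNMENT)
    print("conserved column, G:", round(pssm[0]["G"], 2))
    print("conserved column, A:", round(pssm[0]["A"], 2))
    print("variable column,  A:", round(pssm[1]["A"], 2))
```

[In this toy, glycine in the conserved column scores strongly positive, every other residue there scores strongly negative, and every residue in the variable column scores essentially zero, which is the behavior described in the answer above.]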


High-Impact Papers by Stephen F. Altschul,
Published Since 1990
(Ranked by average citations per year)

Rank | Paper | Total citations | Average cites per year
1 | S.F. Altschul, et al., "Basic local alignment search tool," J. Molec. Biol., 215(3):403-10, 1990. | 10,639 | 1,106
2 | S.F. Altschul, et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucl. Acids Res., 25(17):3389-3402, 1997. | 2,166 | 798
3 | K.W. Kinzler, et al., "Identification of FAP locus genes from chromosome 5q21," Science, 253(5020):661-5, 1991. | 909 | 104
4 | G.D. Schuler, S.F. Altschul, D.J. Lipman, "A workbench for multiple alignment construction and analysis," Proteins, 9(3):180-90, 1991. | 671 | 72
5 | S.F. Altschul, et al., "Issues in searching molecular sequence databases," Nature Genetics, 6(2):119-29, 1994. | 325 | 52

SW Is PSI-BLAST the only program with this capability?

Altschul: This idea has actually been around since the mid-1980s, and a lot of people have developed programs based on it. As I’ve said a number of times, PSI-BLAST is like the Model T Ford of this kind of sequence comparison, in that there were a lot of cars before the Model T, and perhaps even better cars, but the Model T was accessible to everyone. There have been a number of programs similar to PSI-BLAST, but they have tended to require a fair amount of expertise to use, and to take a long time to run. What we did with PSI-BLAST is to completely automate the process. With PSI-BLAST, you put in your sequence and run a regular BLAST search, which finds sequences likely to be related. Then the program constructs a multiple alignment, creates a position-specific scoring system from it, and searches the database again for more distantly related sequences. And this can be iterated an arbitrary number of times, with no user intervention.
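
[Illustration: the automated loop described above (search, build a multiple alignment, derive position-specific scores, search again) can be sketched as follows. This is a toy under stated assumptions, not PSI-BLAST itself: sequences are ungapped and of equal length, the first round uses a profile built from the query alone as a stand-in for a regular BLAST search, hits are accepted by a fixed score threshold rather than by statistical significance, and all names and sequences (`iterated_search`, `DATABASE`, and so on) are hypothetical.]

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background


def build_profile(sequences, pseudocount=1.0):
    """Log-odds scores per column from ungapped, equal-length sequences."""
    total = len(sequences) + pseudocount
    profile = []
    for col in range(len(sequences[0])):
        counts = Counter(seq[col] for seq in sequences)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount * BACKGROUND)
                          / total / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return profile


def search(profile, database, threshold):
    """Return names of database sequences scoring at or above threshold."""
    return {
        name for name, seq in database.items()
        if sum(col[aa] for col, aa in zip(profile, seq)) >= threshold
    }


def iterated_search(query, database, threshold, max_rounds=5):
    """Search, rebuild the profile from the hits, and search again."""
    profile = build_profile([query])  # round 0: the query alone
    hits = set()
    for _ in range(max_rounds):
        new_hits = search(profile, database, threshold)
        if new_hits <= hits:  # nothing new was found: converged
            break
        hits |= new_hits
        members = [query] + [database[name] for name in sorted(hits)]
        profile = build_profile(members)
    return hits


QUERY = "GAKWLE"
DATABASE = {
    "close_1": "GARWLD",
    "close_2": "GSKWIE",
    "distant": "GSRWID",    # shares only the conserved G and W with the query
    "unrelated": "MAKCLQ",  # shares three residues, but not the conserved ones
}

if __name__ == "__main__":
    print(sorted(iterated_search(QUERY, DATABASE, 9.0, max_rounds=1)))
    print(sorted(iterated_search(QUERY, DATABASE, 9.0)))
```

[With a single round, only the two close relatives pass the threshold. Once they are folded into the profile, the substitutions they reveal let the distant relative score well on the next round, while the unrelated sequence stays below the threshold.]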

SW Now that you’ve created what is essentially the most useful tool in biology these days, where do you go next?

Altschul: There are now a number of efforts by different groups, including our own, to build databases of multiple alignments or of patterns. So that rather than having to search a complete database of individual sequences, you will search a database of patterns or of domains. The hope is that the universe of protein domains is relatively small–that even though we keep getting more sequences from different organisms, the number of new domains or new patterns is not going to grow much. So rather than comparing your new sequence to all sequences known, you’ll compare it just to a database of domains. A number of people have already created such databases. And we have a recent paper in Bioinformatics on a program to search the sort of pattern that PSI-BLAST generates. [Note: see A.A. Schäffer, et al., "IMPALA: Matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices," Bioinformatics, 15(12):1000-11, 1999.] This program has already been used to analyze chromosome 4 of Arabidopsis.

SW How does this new program work?

Altschul: When you use PSI-BLAST to search a database, it generates Position Specific Scoring Matrices, which can then be built into a database of patterns. Then you just search one of these databases with a new sequence. One of the difficulties in doing this is curating the database. In a regular sequence database, you just keep throwing in new sequences, whereas with one of these pattern databases, you have to periodically go back and redo the patterns and try to consolidate them and so forth. It takes a lot of effort to keep up to date.
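
[Illustration: searching a database of patterns with a new sequence, as described above, might look like the sketch below. It is a toy and not the IMPALA program: it slides each stored matrix along the sequence without gaps and ranks families by raw score, with no statistical evaluation. The family names, alignments, and function names are all hypothetical.]

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds scores per column from a gap-free alignment."""
    total = len(alignment) + pseudocount
    return [
        {
            aa: math.log2((counts[aa] + pseudocount * BACKGROUND)
                          / total / BACKGROUND)
            for aa in AMINO_ACIDS
        }
        for counts in (Counter(seq[col] for seq in alignment)
                       for col in range(len(alignment[0])))
    ]


def best_window_score(pssm, sequence):
    """Best ungapped score of the PSSM over all windows of the sequence."""
    width = len(pssm)
    if len(sequence) < width:
        return float("-inf")
    return max(
        sum(col[aa] for col, aa in zip(pssm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    )


def search_pattern_db(pattern_db, sequence):
    """Return (score, family) pairs, best-scoring family first."""
    return sorted(
        ((best_window_score(pssm, sequence), name)
         for name, pssm in pattern_db.items()),
        reverse=True,
    )


# Hypothetical curated families, each stored as its alignment.
FAMILY_ALIGNMENTS = {
    "family_A": ["GAKW", "GSRW", "GTKW"],
    "family_B": ["CHDE", "CHEE", "CHDD"],
}
# The pattern database holds one matrix per family. Adding a member to a
# family means rebuilding that family's matrix, which is the curation
# cost mentioned in the answer above.
PATTERN_DB = {name: build_pssm(aln) for name, aln in FAMILY_ALIGNMENTS.items()}

if __name__ == "__main__":
    for family_score, family in search_pattern_db(PATTERN_DB, "MMGSKWLL"):
        print(family, round(family_score, 2))
```

[The new sequence is compared with two matrices rather than with every member sequence, which is the saving the pattern-database approach is hoping for.]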

SW Is the NCBI working on such a pattern database?

Altschul: A couple have already been produced as research projects, and Steve Bryant is coordinating an effort to build one that will be maintained over the long term.

SW What’s your long-run prognosis for computational biology? Do you think you’ll still be working on sequence comparison five years from now?

Altschul: It’s not clear to me that I will be. It is a pretty well-plowed field at this point–although I've thought that in the past and people keep finding new things to do. On the other hand, there is a lot of excitement in trying to figure out how to analyze the expression data that is now being generated. And this is a virgin field. There are few really good ideas on how to analyze the data being generated by the new expression chips or expression arrays.

SW What do you mean precisely by the term "expression data"?

Altschul: It's generated to analyze which genes are turned on and off in different cells. There is now technology that allows you to look, for instance, at cancer cells versus normal cells, or normal cells versus cells exposed to a certain drug, and see what genes are expressed–whether and to what degree they are making messenger RNA. You can analyze data for all the genes in the cell simultaneously. There’s a lot of excitement in this area, because by seeing how genes are regulated, you can hope to find which ones are important in different diseases or growth processes. There is a huge amount of expression data that’s going to be flowing from this, and people are just beginning to think about how to analyze it.

SW Have you been actively working on this?

Altschul: I have done some work on it. It is certainly interesting, and whether I get more involved really depends on whether I can come up with some good ideas.


Science Watch®, July/August 2000, Vol. 11, No. 4
Citing URL: http://www.sciencewatch.com/july-aug2000/sw_july-aug2000_page4.htm



(c) 2008 The Thomson Corporation.
Thomson Scientific