What prompts a close look now? This, from the paper at #8: "We found a surprisingly high error rate in Swiss-Prot, where, for example, of the order of 7% of the Gram-positive entries had either wrong cleavage site position and/or wrong annotation of the experimental evidence." One molecular tool dissing another molecular tool, even in the most polite terms, is bound to evoke a fresh response, and so it did.

Four of the 10 most-cited papers are about tools for making sense of molecular data. Sequences are now a commodity, to be obtained as cheaply and quickly as possible. Scientists add value by interpreting the sequence to make biological sense of it.

Michael Zuker’s Mfold web server, at #1, predicts how DNA and RNA will fold up and hybridize, allowing researchers to design antisense sequences that will block particular messages, among other things. Torsten Schwede and his group created SWISS-MODEL at #3, which does a similar kind of job for proteins, predicting the structure of a protein from its amino acid sequence. At #10, Matthew Stephens and Peter Donnelly discuss different statistical methods for predicting the higher-level structure of the chromosomes from knowledge of the detailed sequence. And at #8, Søren Brunak and his colleagues at the Technical University of Denmark and at Stockholm University describe the latest improved version of their system for predicting the presence of genes that code for signal proteins.

Brunak’s group teaches its software, called SignalP, to recognize signal peptides. Show it the sequence of several peptides known to be signals. Let the program work out their salient features. Now show it an unknown sequence and ask SignalP to decide whether it represents a signal or something else. The known signals that make up the lessons are obviously crucial.
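
The teach-then-ask routine just described can be illustrated with a toy sketch. This is not SignalP's actual method, which the article does not detail; it swaps in a much simpler idea, scoring a sequence by how much its amino acid composition resembles the known signals rather than the known non-signals. The function names, the pseudocount, and every training sequence below are made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_log_probs(sequences, pseudocount=1.0):
    """Estimate the log-probability of each residue from example sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq if ch in AMINO_ACIDS)
    # The pseudocount keeps residues never seen in training from scoring -infinity.
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS}


def train(signals, non_signals):
    """The 'lessons': learn residue preferences from labeled examples."""
    return residue_log_probs(signals), residue_log_probs(non_signals)


def log_odds(model, sequence):
    """Sum of per-residue log-odds; positive means 'more like the signals'."""
    signal_lp, other_lp = model
    return sum(signal_lp[ch] - other_lp[ch] for ch in sequence if ch in AMINO_ACIDS)


def looks_like_signal(model, sequence):
    return log_odds(model, sequence) > 0.0


# Hypothetical training data: hydrophobic-rich "signals", charged "non-signals".
signals = ["MKKLLLAALLALAAASA", "MRLLVLLALLAVAGAQA", "MKLALLLLVLAASAHA"]
non_signals = ["MSDEEKRKDDTEQGNSE", "MTEDKKRPSQEDNGTEK", "MGSDRKEEDQTNPKSDE"]

model = train(signals, non_signals)
print(looks_like_signal(model, "MKALLLLALAVLASA"))  # unknown sequence 1
print(looks_like_signal(model, "MDEKRSEEDKTQNGE"))  # unknown sequence 2
```

Everything the scorer knows comes from the two labeled lists, so a mislabeled example in either one skews every later decision.
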
So the Scandinavian team went over a whole slew of purported signal sequences in databases such as Swiss-Prot (which also regularly features in the highly cited list) to make absolutely certain that they were what they claimed to be. Some were not, hence the mention of errors. Not that this is a big deal. Researchers do not get upset at this kind of checking, and the community as a whole gains from a more accurate dataset. It gains, too, from the new version of SignalP, which is now somewhat more adept at spotting signal sequences.

To deal with issues about the accuracy of sequences, and especially of the annotations that make sense of the sequence, several research communities have created curated databases. Human curators check all the data submitted, so what the database lacks in quantity it more than makes up for in quality. But while a web search reveals many such curated databases, where are the peer-reviewed papers describing them? By rights, they ought to be highly cited, but don’t seem to be.

Michael Zuker notes that his #1 paper is "the first and only paper describing in detail what was already a popular web site," originally launched in 1996. He adds that he asks people to cite the article if they publish articles containing useful results obtained on the Mfold web server. The score so far is roughly a citation a day. But researchers are querying the Mfold server about 800,000 times a month, and many of those are student bioinformaticians whose teachers have set them problems to solve. When they are publishing, will they be citing the tools they have used? Or will we all be so habituated to the idea of web servers that make sense of sequences that there will be no point? Indeed, in the rapidly evolving world of web services and web publishing, will citation counts remain the best indication of a result’s importance?

Dr. Jeremy Cherfas is Science Writer at the
Science Watch® is an editorial component of Essential Science Indicators.
(c) 2008 The Thomson Corporation.