GeneWise is a very robust method that has been around for about 10 years
now. We only published it properly about five years ago, in this paper. It
robustly handles many of the scenarios people want to deal with, such as
homology-based gene structure prediction in the presence of draft sequence
data, i.e., where sequencing error occurs at an appreciable rate.
Because it has been around for a while, a lot of groups are comfortable
with its method, and so it also gets used a little out of place; I am
embarrassed to say that one person even runs it "because they like the way
it formats the output" and not for the algorithm. Bioinformatics tools
always end up behaving this way: groups get comfortable with how a program
runs and with its quirks, so once a tool is successful it hangs around for
far longer than you expect. But GeneWise is a fundamentally correct
algorithm (if rather slow).
Would you summarize the significance of your paper in layman's terms?
We present two algorithms in this paper: GeneWise, which predicts gene
structure using similar protein sequences, and Genomewise, which provides a
final gene structure parse across cDNA- and EST-defined spliced structures.
GeneWise is heavily used by the Ensembl annotation system and in many other
places worldwide; Genomewise has never really been used for a serious
length of time.
Finding genes in genomes is a complex task, and one of our most informative
sources of evidence is related genes in other organisms. GeneWise takes a
related gene (in fact, its protein) and places it onto a new genome,
accounting for splicing and sequencing errors. The algorithm was developed
from a principled combination of hidden Markov models (HMMs). It is highly
accurate and can provide complete gene structures when used with the
correct evidence.
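The idea of placing a protein onto genomic DNA while allowing for introns
and sequencing errors can be illustrated with a small dynamic program. The
sketch below is not the GeneWise model: GeneWise is a probabilistic HMM
with trained splice-site and error parameters, whereas this toy uses
made-up integer scores, treats any GT...AG stretch between codons as a
possible intron, and models a sequencing error as a 2- or 4-base "codon".
The function name and all constants are hypothetical and chosen only for
illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Illustrative scores only; these are assumptions, not GeneWise parameters.
MATCH, MISMATCH = 5, -4
INTRON_PENALTY = 4
FRAMESHIFT_PENALTY = 6
MIN_INTRON = 6  # shortest GT...AG stretch accepted as an intron
NEG = float("-inf")


def spliced_align_score(protein, dna):
    """Best score for placing the whole protein somewhere on the DNA."""
    n = len(dna)
    # best[i][j]: best score for protein[:i] aligned so that it ends at DNA
    # position j. Row 0 is all zeros, so the alignment may start anywhere.
    best = [[NEG] * (n + 1) for _ in range(len(protein) + 1)]
    best[0] = [0] * (n + 1)

    for i in range(1, len(protein) + 1):
        aa = protein[i - 1]
        for j in range(n + 1):
            candidates = []
            if j >= 3:  # ordinary codon: 3 bases for one residue
                hit = CODON_TABLE.get(dna[j - 3:j]) == aa
                candidates.append(best[i - 1][j - 3] + (MATCH if hit else MISMATCH))
            for width in (2, 4):  # sequencing error: a base lost or gained
                if j >= width:
                    candidates.append(best[i - 1][j - width] - FRAMESHIFT_PENALTY)
            best[i][j] = max(candidates, default=NEG)

        if i < len(protein):
            # Intron between residue i and i+1: skip dna[k:j], which must
            # start with GT and end with AG. Increasing j lets introns chain.
            for j in range(MIN_INTRON, n + 1):
                if dna[j - 2:j] != "AG":
                    continue
                for k in range(j - MIN_INTRON + 1):
                    if dna[k:k + 2] == "GT":
                        best[i][j] = max(best[i][j], best[i][k] - INTRON_PENALTY)

    return max(best[len(protein)])  # the alignment may end anywhere


# Protein MKL encoded by ATG AAA CTG, with flanking DNA.
unspliced = "CC" + "ATGAAACTG" + "TAA"
spliced = "CC" + "ATGAAA" + "GTGAGTCCCCTTTTAG" + "CTG" + "TAA"
one_base_lost = "CC" + "ATG" + "AA" + "CTG" + "TAA"

print(spliced_align_score("MKL", unspliced))
print(spliced_align_score("MKL", spliced))
print(spliced_align_score("MKL", one_base_lost))
```

In this toy, the spliced sequence scores lower than the unspliced one by
exactly the intron penalty, and the sequence with a lost base still aligns
at the cost of one frameshift. A real tool would replace the integer scores
with log-probabilities from trained models and would recover the exon
coordinates by traceback, which is omitted here.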
How did you become involved in this research and were there any particular
problems encountered along the way?
I was originally trained (in my undergraduate days) as a biochemist, but
moved quickly into bioinformatics. I published my first set of programs
(Pairwise and Searchwise) in 1994, while I was an undergraduate at Oxford,
and GeneWise is really these programs written correctly.
I benefitted greatly from a year at Adrian Krainer's lab at Cold Spring
Harbor Laboratory (CSHL), time spent with Toby Gibson (at EMBL), and also
at Iain Campbell's lab at Oxford. I did my Ph.D. with Richard Durbin at the
Sanger Institute, and have collaborated with him since that time.
In 2000, I joined the European Bioinformatics Institute (EBI) as a Team
Leader, and I am now a Senior Scientist in the European Molecular Biology
Laboratory (EMBL, the parent organization of EBI) and part of the EBI
Senior Management. I am best known as the head of the EBI side of the
Ensembl project, and my group has recently merged with Rolf Apweiler's
group, spanning DNA and protein sequence data. My own research continues to
be focused on algorithms in bioinformatics.
Where do you see your research leading in the future?
GeneWise is progressively being replaced by more modern schemes, in
particular Exonerate, from Guy Slater in my group. Exonerate can do most of
what GeneWise does, but 1,000-fold faster. However, there is a lot of
resistance, for understandable reasons, to moving away from code that works
in robust pipelines, which is why GeneWise keeps on being used.
Ewan Birney, Ph.D.
Head of Nucleotide Data
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton, Cambridge, UK