Geoffrey J. Barton talks with
ScienceWatch.com and answers a few questions about
this month's New Hot Paper in the field of Computer
Science.
Article Title: Jalview Version 2 - a multiple sequence
alignment editor and analysis workbench
Authors: Waterhouse, AM;Procter, JB;Martin, DMA;Clamp,
M;Barton, GJ
Journal: BIOINFORMATICS, Volume: 25, Issue: 9,
Page: 1189-1191
Year: MAY 1 2009
* Univ Dundee, Sch Life Sci Res, Coll Life Sci, Dow St,
Dundee DD1 5EH, Scotland.
* Univ Dundee, Sch Life Sci Res, Coll Life Sci, Dundee
DD1 5EH, Scotland.
* Broad Inst, Cambridge, MA 02142 USA.
Why do you think your paper is highly
cited?
The paper describes the latest version of the
Jalview multiple sequence alignment editor and
analysis workbench. Jalview is one of the most powerful tools available
for manipulating sequence alignments and integrating annotations from
biological databases around the world. As a consequence, many thousands
of scientists make use of Jalview in their daily work.
The Jalview software is installed on over 20,000 computers worldwide and is
also available as an applet that is installed on over 100,000 web pages
including those run by major international databases of sequence alignments
such as Pfam. Because analysis of alignments with Jalview is often of
key importance to a scientific publication, authors cite the paper
describing Jalview.
Sequences of DNA, RNA, and proteins are the fundamental currency of modern
biological and medical research. Sequences link the different levels of the
biological hierarchy, from gene to three-dimensional structure.
Multiple Sequence Alignments (MSAs) arrange similar sequences in a
table that highlights which amino acids or nucleotides are common across
all sequences. For proteins, MSAs permit the identification of features
shared between species and of functionally important amino acids.
MSAs provide the basis for a spectrum of computational methods, including
the prediction of protein secondary structure and solvent accessibility,
functional sites, and interaction sites. MSAs are also the essential first
step in studying molecular evolution and are core to the identification of
genomic rearrangements.
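
To make the idea of an alignment "table" concrete, here is a minimal Python sketch. It is not Jalview code, and the three sequences and their names are invented for illustration. It assumes the sequences are already aligned (equal length, with '-' marking a gap) and simply reports which columns carry the same residue in every sequence.

```python
from collections import Counter

# Hypothetical toy alignment: names and sequences are made up; '-' is a gap.
alignment = {
    "seq_human": "MKV-LAG",
    "seq_mouse": "MKVDLAG",
    "seq_yeast": "MRV-LSG",
}


def column_conservation(rows):
    """For each column, return (most common symbol, fraction of rows with it)."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for column in zip(*rows):  # zip(*rows) walks the table column by column
        symbol, count = Counter(column).most_common(1)[0]
        result.append((symbol, count / len(column)))
    return result


def fully_conserved_columns(rows):
    """0-based indices of columns where every row has the same non-gap residue."""
    return [
        i
        for i, (symbol, fraction) in enumerate(column_conservation(rows))
        if fraction == 1.0 and symbol != "-"
    ]


rows = list(alignment.values())
conserved = set(fully_conserved_columns(rows))

# Print the alignment with a '*' under each fully conserved column.
for name, seq in alignment.items():
    print(f"{name:<10} {seq}")
print(f"{'':<10} " + "".join("*" if i in conserved else " " for i in range(len(rows[0]))))
```

Real tools use more refined conservation measures that account for the physicochemical similarity of amino acids. The exact-match test here is only the simplest possible stand-in.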
In journal publications, MSAs provide a convenient way to display common
features and complex annotations relating to sequences and their functions.
Although there are many programs that generate multiple alignments from
unaligned sequences, none give a perfect result in all circumstances.
Jalview provides a convenient way to generate alignments by a variety of
methods and then to edit them to correct errors or choose the most
informative subsets of sequences.
Does it describe a new discovery, methodology, or
synthesis of knowledge?
The paper describes significant updates to the Jalview system. Updates
include more sophisticated editing functions and new visualization methods,
the ability to call computer-intensive alignment and analysis methods on
remote servers from within the program, and access to over 50
different types of annotation provided by computer servers worldwide. These
new enhancements, together with updates to core file management and format
conversions, have made the program useful to a larger number of potential
users.
Would you summarize the significance of your paper
in layman's terms?
Medical and biological research is producing enormous quantities of data
about the DNA and protein molecules that make up all living organisms and
are central to understanding function and disease. While generating data is
getting easier and cheaper, the sheer volume makes it difficult for
scientists to visualize, edit, and analyze.
This paper describes a powerful tool for visualizing, aligning, and
analyzing very large sets of DNA and protein sequences. It is effectively a
specialized word processor, web-browser and desktop publishing package for
sequences rolled into one.
The Jalview software described in this paper makes it easier to carry out
common analyses on biological sequences, but most importantly makes
possible analyses that would otherwise be too difficult or impossible to
do.
How did you become involved in this research, and
were there any problems along the way?
By the mid-1990s we had worked for more than 10 years on the generation and
analysis of protein multiple sequence alignments. Visualization of
alignments was always a problem, so we first developed a program,
ALSCRIPT (1), that allowed flexible annotation of a static alignment.
However, to work more efficiently, we required an interactive tool to edit
alignments and display the results of some of our other techniques such as
AMAS (2) sub-family analyses and JPred (3) secondary structure predictions.
Jalview was developed as a successor to ALSCRIPT, although ALSCRIPT still
has a strong following. The main problem we encountered was the
continuation of funding for Jalview. However, recent initiatives by the UK
Biotechnology and Biological Sciences Research Council (BBSRC) have been
friendly to the project. Thanks to BBSRC, core funding is now secure until
2014.
Where do you see your research leading in the
future?
Next-generation sequencing technology has become available over the last
three years and has led to an explosion in the volume of sequence data
available. This volume of data presents significant challenges for
alignment visualization and analysis. Already, some protein families have
over 100,000 members and this will be the norm for most families of
proteins within the next five years.
Accordingly, we will be developing Jalview to work efficiently with such
large sequence families as well as longer sequences such as complete
genomes. We will also be making it easier for other scientists to add new
features to the program that are specific to their needs.
Do you foresee any social or political
implications for your research?
Sequence analysis is central to all modern biological research, whether in
agriculture, biotechnology, or the study and treatment of human disease.
Jalview is in use daily by scientists working in all these fields and,
since it makes it possible for them to work more efficiently, has direct
impact on the many social and political issues that their research
influences.
Geoff Barton, Ph.D.
The Barton Group
Professor of Bioinformatics
College of Life Sciences
University of Dundee
Scotland, UK
References:
1. Barton, GJ, "ALSCRIPT - A Tool to Format
Multiple Sequence Alignments," Prot. Eng. 6, 37-40,
1993.
2. Livingstone, CD and Barton, GJ, "Protein
Sequence Alignments: A Strategy for the Hierarchical Analysis of
Residue Conservation," Comp. Appl. Bio. Sci. 9, 745-56,
1993.
3. Cole, C, Barber, JD and Barton, GJ, "The
JPRED 3 Secondary structure prediction server," Nucleic Acids
Research, doi: 10.1093/nar/gkn238, 2008.
KEYWORDS: SECONDARY STRUCTURE; STRUCTURAL BIOLOGY; PROTEIN SEQUENCES;
ACCURACY; TOOLS; SYSTEM.