Minoru Kanehisa talks with
ScienceWatch.com and answers a few questions about
this month's Emerging Research Front in the field of
Computer Science.
Field: Computer Science
Article: Prediction of protein subcellular locations by support vector
machines using compositions of amino acids and amino acid pairs
Authors: Park, KJ; Kanehisa, M
Journal: BIOINFORMATICS, 19 (13): 1656-1663 SEP 1 2003
Kyoto Univ, Chem Res Inst, Bioinformat Ctr, Kyoto 6110011,
Japan.
(addresses may have been truncated)
Why do you think your paper is highly
cited?
First, I think our laboratory has established a tradition. We formulated
the problem of predicting protein subcellular locations from amino acid
sequence information in the early 1990s. It was Kenta Nakai’s thesis
work reported in two papers, which are still among the most cited papers in
my laboratory. This time, Keun-Joon Park has made an elaborate extension in
his thesis work. I am happy to hear that his paper is also highly cited.
Second, the prediction of subcellular locations is becoming more important
in practice. It is a hot area right now, with more people involved
and more papers being published.
Does it describe a new discovery, methodology, or
synthesis of knowledge?
This represents a synthesis of knowledge taking place within my laboratory.
Around 1990-1992 Kenta Nakai, who is now Professor in the Human Genome
Center of the University of Tokyo, developed the original method to predict
protein subcellular locations and the PSORT program. Then, in 2001-2002,
Jean-Philippe Vert, who is a mathematician and is now Director of the
Center for Computational Biology in the Ecole des Mines de Paris, stayed in
my lab as a postdoctoral fellow and introduced kernel methods, including
support vector machines, to various biological problems. It was therefore
quite natural for Keun-Joon Park, who is now at the Korea National
Institute of Health, to combine both works. His paper resulted from the
culture of my lab where people from different backgrounds intermingle.
Would you summarize the significance of your paper in
layman’s terms?
"The power of bioinformatics is
the ability to integrate different types of data and
knowledge at different levels: molecular, cellular, and
organism levels."
The Human Genome Project was completed in 2003, but better sequencing
technologies still continue to be developed, enabling, for example, the
sequencing of any individual genome. However, the DNA sequence information
alone does not tell much about the phenotype, particularly as regards
health conditions. Computational technologies need to be developed to fill
the gap between sequence information and higher-level information in the
cell or the organism.
This paper presents a new method for linking the amino acid sequence of a
protein, which is encoded in the DNA sequence of a gene, to the subcellular
location where the protein is transported after biosynthesis. Proteins are
the main players involved in various cellular functions, and subcellular
location is the key information required for the understanding of such
functions.
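To make the idea of "compositions of amino acids and amino acid pairs" from the paper's title concrete, here is a minimal sketch of how a sequence could be turned into a fixed-length numeric vector. It is an illustration under stated assumptions, not the method as published: the function names are hypothetical, only adjacent residue pairs are counted, and non-standard residue codes are simply dropped.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def _clean(seq):
    # Assumption: drop anything that is not a standard residue code.
    # (A simplification: residues on either side of a dropped
    # character are then treated as adjacent.)
    return "".join(c for c in seq.upper() if c in AMINO_ACIDS)


def amino_acid_composition(seq):
    """Return 20 fractions: the share of each residue type in seq."""
    seq = _clean(seq)
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def pair_composition(seq):
    """Return 400 fractions: the share of each ordered adjacent pair."""
    seq = _clean(seq)
    total = len(seq) - 1  # number of adjacent pairs
    if total <= 0:
        return [0.0] * len(PAIRS)
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(total):
        counts[seq[i:i + 2]] += 1
    return [counts[p] / total for p in PAIRS]


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + pair_composition(seq)
```

Vectors of this kind, one per protein with a known location, are the sort of input a support vector machine classifier can be trained on. The choice of kernel and the handling of multiple locations are left open here.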
How did you become involved in this research and were
any particular problems encountered along the way?
This research is part of our effort to develop computational methods for
analyzing biological sequence data, which I began a long time ago, in the
early 1980s, when I was working in the US and was involved in the
establishment of the GenBank database.
We took up the particular problem of subcellular location prediction
around 1990 because we were interested in identifying characteristic
subsequence patterns, called motifs, and relating them to specific protein
functions. In fact, sorting motifs are like tags which identify appropriate
subcellular locations.
At that time, I was hoping to catalog all motifs and all motif-functional
relationships. However, motifs represent sites of interactions with other
molecules. I was soon more interested in developing computational methods
for protein interaction networks and pathways, as exemplified by our Kyoto
Encyclopedia of Genes and Genomes (KEGG) database project. I feel a bit
regretful for not being as active in the area of protein motifs.
Where do you see your research leading in the
future?
I would like to comment not only on the protein sorting problem, but also
on bioinformatics technology development in general. The power of
bioinformatics is the ability to integrate different types of data and
knowledge at different levels: molecular, cellular, and organism levels.
Computational methods are already well developed for molecular-level
information, are being actively developed for cellular-level information,
and still need to be developed for organism-level information. We are
definitely moving towards organism-level information, especially in the
understanding of the causes and links to human diseases.
Do you foresee any social or political implications for
your research?
Generally speaking, we used to develop bioinformatics resources and
technologies just for the basic sciences. However, as we move towards
higher-level information, we are getting closer to the real world. The
results of our research projects, especially the
KEGG project, are made available through our
website. We are making this site more accessible to
the general public, integrating research-oriented information and more
practical dialogue about diseases, drugs, and environmental compounds.
Minoru Kanehisa
Director and Professor, Bioinformatics Center
Institute for Chemical Research, Kyoto University
Uji, Kyoto, Japan