Archive ScienceWatch



Minoru Kanehisa answers a few questions about this month's Emerging Research Front in the field of Computer Science.
Field: Computer Science
Article: Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs
Authors: Park, KJ; Kanehisa, M
Journal: BIOINFORMATICS, 19 (13): 1656-1663 SEP 1 2003
Kyoto Univ, Chem Res Inst, Bioinformat Ctr, Kyoto 6110011, Japan.
(addresses may have been truncated)

Why do you think your paper is highly cited?

First, I think our laboratory has established a tradition. We formulated the problem of predicting protein subcellular locations from amino acid sequence information in the early 1990s. That was Kenta Nakai’s thesis work, reported in two papers that are still among the most cited from my laboratory. This time, Keun-Joon Park has made an elaborate extension in his thesis work. I am happy to hear that his paper is also highly cited.

Second, the prediction of subcellular locations is becoming practically more important. It is a hot area right now where more people are involved and more papers are being published.

Does it describe a new discovery, methodology, or synthesis of knowledge?

This represents a synthesis of knowledge taking place within my laboratory. Around 1990-1992 Kenta Nakai, who is now Professor in the Human Genome Center of the University of Tokyo, developed the original method to predict protein subcellular locations and the PSORT program. Then, in 2001-2002, Jean-Philippe Vert, who is a mathematician and is now Director of the Center for Computational Biology in the Ecole des Mines de Paris, stayed in my lab as a postdoctoral fellow and introduced kernel methods, including support vector machines, to various biological problems. It was therefore quite natural for Keun-Joon Park, who is now at the Korea National Institute of Health, to combine both works. His paper resulted from the culture of my lab where people from different backgrounds intermingle.

Would you summarize the significance of your paper in layman’s terms?

The Human Genome Project was completed in 2003, but better sequencing technologies still continue to be developed, enabling, for example, the sequencing of any individual genome. However, the DNA sequence information alone does not tell much about the phenotype, particularly as regards health conditions. Computational technologies need to be developed to fill the gap between sequence information and higher-level information in the cell or the organism.

This paper presents a new method for linking the amino acid sequence of a protein, which is encoded in the DNA sequence of a gene, to the subcellular location where the protein is transported after biosynthesis. Proteins are the main players involved in various cellular functions, and subcellular location is the key information required for the understanding of such functions.
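As a rough illustration of the kind of input the paper's title refers to, the sketch below turns a protein sequence into a numeric vector of amino acid composition plus amino acid pair composition. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `gap` parameter, and the normalization to fractions are all hypothetical choices made here for clarity.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues in seq (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def pair_composition(seq, gap=0):
    """Fraction of each ordered residue pair (400 values).

    `gap` is an assumed, illustrative parameter: the number of
    positions skipped between the two residues of a pair
    (0 means adjacent residues).
    """
    residues = "".join(r for r in seq.upper() if r in AMINO_ACIDS)
    step = gap + 1
    total = len(residues) - step  # number of pairs in the sequence
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(max(total, 0)):
        counts[residues[i] + residues[i + step]] += 1
    return [counts[p] / total if total > 0 else 0.0 for p in PAIRS]


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + pair_composition(seq)
```

A vector like this, computed for proteins of known location, is the sort of fixed-length input that a support vector machine classifier can be trained on; the choice of classifier settings is outside what this sketch covers.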

How did you become involved in this research and were any particular problems encountered along the way?

This research is part of our effort to develop computational methods for analyzing biological sequence data, which I began in the early 1980s, when I was working in the US and was involved in establishing the GenBank database.

We took up the particular problem of subcellular location prediction around 1990 because we were interested in identifying characteristic subsequence patterns, called motifs, and relating them to specific protein functions. In fact, sorting motifs are like tags that identify the appropriate subcellular locations.

At that time, I was hoping to catalog all motifs and all motif-functional relationships. However, motifs represent sites of interactions with other molecules. I was soon more interested in developing computational methods for protein interaction networks and pathways, as exemplified by our Kyoto Encyclopedia of Genes and Genomes (KEGG) database project. I feel a bit regretful for not being as active in the area of protein motifs.

Where do you see your research leading in the future?

I would like to comment not only on the protein sorting problem, but also on bioinformatics technology development in general. The power of bioinformatics is the ability to integrate different types of data and knowledge at different levels: molecular, cellular, and organism levels.

Computational methods are already well developed for molecular-level information, are being actively developed for cellular-level information, and still need to be developed for organism-level information. We are definitely moving towards organism-level information, especially in understanding the causes of human diseases and the links to them.

Do you foresee any social or political implications for your research?

Generally speaking, we used to develop bioinformatics resources and technologies just for the basic sciences. However, as we move towards higher-level information, we are getting closer to the real world. The results of our research projects, especially the KEGG project, are made available through our website. We are making this site more accessible to the general public, integrating research-oriented information and more practical dialogue about diseases, drugs, and environmental compounds.

Minoru Kanehisa
Director and Professor, Bioinformatics Center
Institute for Chemical Research, Kyoto University
Uji, Kyoto, Japan

2008 : February 2008 : Minoru Kanehisa