Archive ScienceWatch

Ramanathan Sowdhamini, Ponnuthurai Nagaratnam Suganthan, Ganesan Pugalenthi, Ke Tang & Govindaraju Archunan talk with ScienceWatch.com and answer a few questions about this month's New Hot Paper in the field of Computer Science.
Article Title: A machine learning approach for the identification of odorant binding proteins from sequence-derived properties
Authors: Pugalenthi, G;Tang, K;Suganthan, PN;Archunan, G;Sowdhamini, R
Journal: BMC BIOINFORMATICS, Volume: 8, Article no.: 351, Published: SEP 19 2007
* Natl Ctr Biol Sci, UAS GKVK Campus,Bellary Rd, Bangalore 560065, Karnataka, India.
* Natl Ctr Biol Sci, Bangalore 560065, Karnataka, India.
* Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore.
* Univ Sci & Technol China, Dept Comp Sci & Technol, NICAL, Hefei 230026, Anhui, Peoples R China.
* Bharathidasan Univ, Dept Anim Sci, Tiruchchirappalli 620024, India.

 Why do you think your paper is highly cited?

Olfaction is an essential mechanism in vertebrates and insects for behavioral response, survival, and reproduction. Odorant binding proteins (OBPs) play a major role in olfaction and are thought to act as carriers, transporting odorants from the environment to the nasal epithelium in vertebrates and to the sensillar lymph in insects.


In addition, OBPs are reported to be involved in other functions, such as odorant recognition and the deactivation of odorants. The presence of OBPs in the non-sensory tissues of insects suggests that they also have non-sensory roles. Over the past three decades, OBPs have received much attention due to their multifunctional nature, diversity in sequence, and three-dimensional structure.

There are probably four main reasons for the high citation rate.

  1. Identification of OBPs is a challenging task, since OBPs show very low sequence similarity between species, or even within the same species.

  2. Our work describes a methodology that is new to OBP identification, the "regularized least squares classifier" (RLSC), and the prediction performance was examined using a leave-one-out cross-validation technique.

  3. The features used in this study are new and contribute significantly to the classification. These features can also be used in other, similar sequence-based prediction work.

  4. Our method is quick and quite useful for the identification of new OBPs drawn from sequence databases.
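
For readers unfamiliar with the classifier named in point 2, its textbook form can be sketched as follows. This is the standard kernel formulation of RLSC, not necessarily the exact variant, kernel, or regularization setting used in the paper; the symbols K, λ, c, and G are notation assumed here. Given n training sequences with feature vectors x_i and labels y_i = +1 (OBP) or -1 (non-OBP), and a kernel matrix K_ij = K(x_i, x_j):

```latex
\begin{align}
  f^{*} &= \arg\min_{f \in \mathcal{H}_K}
           \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2
           + \lambda \lVert f \rVert_K^2, \\
  f^{*}(x) &= \sum_{i=1}^{n} c_i\, K(x, x_i),
           \qquad c = (K + \lambda n I)^{-1} y, \\
  y_i - f^{(-i)}(x_i) &= \frac{y_i - f^{*}(x_i)}{1 - G_{ii}},
           \qquad G = K\,(K + \lambda n I)^{-1}.
\end{align}
```

A new sequence x is assigned the class sign(f*(x)). The last line is the usual shortcut for leave-one-out cross-validation with a squared loss: the prediction error for sample i when it is held out, written f^(-i), follows from a single fit on all n samples, so the classifier does not have to be retrained n times.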

 Does it describe a new discovery, methodology, or synthesis of knowledge?

OBPs are very diverse in sequence as well as in structure. For example, vertebrate OBPs share an 8-stranded β-barrel, whereas insect OBPs contain an alpha-helical barrel and six highly conserved cysteines. Search methods based on sequence similarity, such as PSI-BLAST and HMMs, fail because OBPs share so little sequence similarity. Thus, identification of OBPs is a difficult task and requires an efficient method that works irrespective of sequence similarity.

Our RLSC approach extracted knowledge from protein sequences in the form of features for machine learning algorithms. Each protein sequence was represented by 1,463 feature indices. The features used in this study differ from those in other sequence-based prediction work. Generally, the frequencies of dipeptides and tripeptides are computed over the 20 amino acids. In our study, we employed a new approach to derive such combinations. First, the 20 amino acids were reduced to 11 groups on the basis of their physico-chemical properties. The dipeptide and tripeptide combinations were then derived from these 11 groups. This also reduces the dimensionality of the feature space.
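
As an illustration of the reduced-alphabet idea, here is a minimal Python sketch. It rests on two assumptions that are not stated in the interview: the 11 groups in GROUPS are a hypothetical grouping chosen only for the example (the actual groups are not listed here), and the feature vector is taken to be the frequencies of single groups, group pairs, and group triplets. That reading is at least consistent with the stated count, since 11 + 11² + 11³ = 1,463. The names GROUPS, FEATURE_KEYS, and reduced_features are likewise made up for this sketch.

```python
from itertools import product

# ASSUMPTION: illustrative grouping only; the interview does not list the 11 groups.
GROUPS = ["G", "A", "VILM", "FWY", "ST", "NQ", "DE", "KR", "H", "C", "P"]

# One label per group (a..k) and a lookup from residue to group label.
LABELS = "abcdefghijk"
RESIDUE_TO_GROUP = {
    aa: LABELS[i] for i, members in enumerate(GROUPS) for aa in members
}

# Every 1-, 2-, and 3-letter word over the 11 labels, in a fixed order.
FEATURE_KEYS = [
    "".join(word) for k in (1, 2, 3) for word in product(LABELS, repeat=k)
]


def reduced_features(sequence):
    """Return group 1-, 2-, and 3-mer frequencies for a protein sequence.

    Residues outside the 20 standard amino acids (e.g. 'X') are dropped
    before counting, which joins their neighbours in the reduced string.
    """
    reduced = "".join(
        RESIDUE_TO_GROUP[aa] for aa in sequence.upper() if aa in RESIDUE_TO_GROUP
    )
    counts = dict.fromkeys(FEATURE_KEYS, 0.0)
    for k in (1, 2, 3):
        n_windows = len(reduced) - k + 1
        if n_windows <= 0:
            continue  # sequence too short for words of this length
        for i in range(n_windows):
            # Normalise by the number of windows so each k-block sums to 1.
            counts[reduced[i:i + k]] += 1.0 / n_windows
    return [counts[key] for key in FEATURE_KEYS]


if __name__ == "__main__":
    vector = reduced_features("MKVLAGHWSTC")
    print(len(vector))  # length of the feature vector
```

With the full 20-letter alphabet the same construction would give 20 + 400 + 8,000 = 8,420 features, which is the dimensionality reduction the answer refers to.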

From the computer science viewpoint, RLSC is a well-established technique whose history can be traced back to the early years of the 20th century. This work also shows how synthesis of knowledge from different areas (in our case, biology and computer science) can promote scientific research.

 Would you summarize the significance of your paper in layman's terms?

OBPs are important elements in the olfaction mechanism. The biological functions of OBPs are still unclear, and more OBP sequences are required for a complete understanding of that mechanism. This approach can be used to annotate and identify new OBPs from sequence databases. Since our approach does not depend on sequence similarity, it is very useful for identifying OBPs that have very poor sequence similarity. Our methodology can speed up the process of OBP annotation and subsequent relevant discoveries. The features used for the classification can be applied to other sequence-based prediction work.

 How did you become involved in this research, and were there any problems along the way?

Ramanathan Sowdhamini's group has a long-standing interest in distant relationships, superfamilies, and the development of sequence search strategies. We became involved through a continuing interest in olfaction and OBPs that we share with Professor Govindaraju Archunan's laboratory in the Animal Science Department of Bharathidasan University. Furthermore, Ganesan Pugalenthi's visit to Professor Ponnuthurai Nagaratnam Suganthan's laboratory, which has strong expertise in machine learning approaches, brought the field of machine learning closer to the detection of odorant binding proteins.

Yes, feature selection and improving the accuracy were quite challenging. The RLSC technique was quite novel for this application, and the detected gene products had to be carefully validated. We did not have role models in any of the three areas; these were some of the problems we faced.

 Where do you see your research leading in the future?

Until now, identification of OBPs has been carried out using sequence similarity search methods such as PSI-BLAST and HMMs. Since OBPs are very dissimilar in sequence, successful identification can be achieved only by using methods that do not rely on sequence similarity. This work reports a well-established RLSC method that is more accurate, fast, and efficient for OBP identification. Considering the biological implications of OBPs, we believe more research attention will be directed towards the identification of OBPs and of new OBP families. Newer methods for mathematically representing protein sequences may be proposed to further increase prediction accuracy.

 Do you foresee any social or political implications for your research?

A complete understanding of OBPs and their role in the olfaction mechanism requires more sequence data. Our work speeds up the OBP annotation process, which may help the medical community, particularly those working in the field of olfaction.

Insects present numerous unsolved mysteries, such as metamorphosis, cyclical parthenogenesis, social behaviors, and a complicated chemical communication system. Some insect species, such as mosquitoes, plant hoppers, and aphids, may cause damage to human health or agricultural products, whereas others, such as the honeybee and the silkworm, are valuable agricultural or industrial resources for human life. A better understanding of odorant/pheromone binding proteins and their role in the chemical communication of insects is helpful not only for developing efficient pest control strategies, but also for the progress of basic research in the life sciences. Knowledge of odorant binding and transport mechanisms is also useful in olfactotherapy.

Prof. Ramanathan Sowdhamini
Computational Biology group
National Centre for Biological Sciences,
Tata Institute of Fundamental Research (TIFR)
UAS-GKVK campus
Bangalore, India

Prof. Ponnuthurai Nagaratnam Suganthan
Machine Learning Lab
School of Electrical and Electronic Engineering,
Nanyang Technological University
Singapore

Ganesan Pugalenthi
Research Staff
Machine Learning Lab
School of Electrical and Electronic Engineering
Nanyang Technological University
Singapore

Dr. Ke Tang
Nature Inspired Computation and Applications Laboratory (NICAL)
Department of Computer Science and Technology
University of Science and Technology of China
Hefei, Anhui, China

Dr. Govindaraju Archunan
Center for Pheromone Technology
Department of Animal Science
Bharathidasan University
Tiruchirappalli, Tamil Nadu, India

KEYWORDS: SUPPORT VECTOR MACHINES; AMINO-ACID-COMPOSITION; SUBCELLULAR LOCATION PREDICTION; PHEROMONE-BINDING; ENSEMBLE CLASSIFIER; PATTERN-RECOGNITION; MULTIPLE SITES; DATABASE; CLONING; EXPRESSION.


2009 : March 2009 - New Hot Papers : Ramanathan Sowdhamini, Ponnuthurai Nagaratnam Suganthan, Ganesan Pugalenthi, Ke Tang & Govindaraju Archunan