James Cole on a Tool for the Taxonomic Classification of rRNA Genes
Emerging Research Fronts Commentary, December 2011
Article: Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy
Authors: Wang, Q; Garrity, GM; Tiedje, JM; Cole,
James Cole talks with ScienceWatch.com and answers a few questions about this month's Emerging Research Front paper in the field of Microbiology.
Why do you think your paper is highly cited?
The paper presented the right tool at the right time, as emerging high-throughput sequencing technologies started to be applied to rRNA sequencing. These new technologies were able to produce the large numbers of rRNA sequences needed to address important questions in microbial ecology, but few tools were able to handle the increased numbers of sequences.
Does it describe a new discovery, methodology, or synthesis of knowledge?
It describes a new tool, the Ribosomal Database Project's (RDP) Naïve Bayesian Classifier, which employs a machine learning classification method that was well known in computer science but not commonly applied in the field of microbiology at the time.
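To make the idea of a naïve Bayesian sequence classifier concrete, here is a minimal sketch. It is not the RDP Classifier's implementation: the class name `NaiveBayesTaxonClassifier`, the word size of 4, the smoothing constants, the uniform prior over taxa, and the toy sequences are all assumptions chosen for illustration. Each sequence is broken into overlapping words, and a query is assigned to the taxon whose training sequences make its words most probable.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Return the set of overlapping words of length k found in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayesTaxonClassifier:
    """Toy naive Bayes classifier over sequence words (hypothetical, for illustration)."""

    def __init__(self, k=4):
        self.k = k
        # taxon -> number of training sequences seen for that taxon
        self.n_seqs = defaultdict(int)
        # taxon -> word -> number of training sequences containing the word
        self.word_counts = defaultdict(lambda: defaultdict(int))

    def train(self, labeled_seqs):
        """labeled_seqs is an iterable of (taxon, sequence) pairs."""
        for taxon, seq in labeled_seqs:
            self.n_seqs[taxon] += 1
            for word in kmers(seq, self.k):
                self.word_counts[taxon][word] += 1

    def log_score(self, taxon, seq):
        """Sum of log word probabilities, treating words as independent."""
        m = self.n_seqs[taxon]
        counts = self.word_counts[taxon]
        # Assumed smoothing: (count + 0.5) / (m + 1) keeps unseen words above zero.
        return sum(
            math.log((counts.get(word, 0) + 0.5) / (m + 1))
            for word in kmers(seq, self.k)
        )

    def classify(self, seq):
        """Return the best-scoring taxon, assuming a uniform prior over taxa."""
        return max(self.n_seqs, key=lambda taxon: self.log_score(taxon, seq))


# Made-up training data; real use would train on curated reference sequences.
training = [
    ("GenusA", "ACGTACGTACGTACGT"),
    ("GenusA", "ACGTACGTACGAACGT"),
    ("GenusB", "TTGGCCAATTGGCCAA"),
    ("GenusB", "TTGGCCAATTGGCCTA"),
]
clf = NaiveBayesTaxonClassifier(k=4)
clf.train(training)
print(clf.classify("ACGTACGTACGT"))
```

Because the model is just a table of word counts per label, training it on a different set of labeled sequences is all it takes to point the same code at another taxonomy or another gene.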
Would you summarize the significance of your paper in layman's terms?
The ribosomal RNA gene is the most commonly used taxonomic marker gene in microbiology. Our work introduced a fast and accurate tool to automate the taxonomic classification of rRNA genes. It can handle the high throughput of modern sequencing technologies, which is needed to answer important questions in microbial ecology.
How did you become involved in this research, and how would you describe the particular challenges, setbacks, and successes that you've encountered along the way?
RDP has provided data and analysis tools for researchers worldwide since 1992. We initially developed the RDP Classifier to handle the increasing number of public sequences in the RDP monthly release. As sequencing technology improved, and as rRNA sequencing became a more established standard in bacterial characterization, the rate of rRNA sequencing outpaced our ability to process the sequences with the tools available at that time.
The tool was originally designed only for the 16S rRNA gene, but we have successfully expanded this tool to other genes. Since the tool can be easily retrained using a different taxonomic hierarchy and different types of molecular sequences, biologists have started to adapt the tool for their own needs.
Where do you see your research leading in the future?
Ever-newer sequencing methods produce more sequence data at lower cost, creating new analysis challenges. We are working to extend this tool and to apply additional machine learning methods to more genes of special interest to microbial ecologists.
Do you foresee any social or political implications for your research?
Our tools are being used by scientists involved in areas as diverse as the environmental response to climate change and the role of commensal microbes in human health. We hope our tools will be able to play a small part in solving these important problems.
Center for Microbial Ecology
Michigan State University
East Lansing, MI, USA
KEYWORDS: NAÏVE BAYESIAN CLASSIFIER, rRNA SEQUENCES, BACTERIAL TAXONOMY, MICROBIAL COMMUNITIES, DATABASE PROJECT, GENE SEQUENCE, ALICYCLOBACILLUS, INFORMATION, DIVERSITY, PROPOSAL, SYSTEM, rDNA.