Darren Martin on RDP3, a Program for Analyzing Recombination
Fast Breaking Papers Commentary, December 2011
Article: RDP3: a flexible and fast computer program for analyzing recombination
Authors: Martin, DP; Lemey, P; Lott, M; Moulton, V; Posada, D; Lefeuvre, P
Darren Martin talks with ScienceWatch.com and answers a few questions about this month's Fast Breaking Paper in the field of Computer Science.
Why do you think your paper is highly cited?
It describes the latest version of a nucleotide sequence analysis program that was already quite popular. This is the reason that the paper has had so many early citations (i.e., people were using the program before the paper was even published).
Does it describe a new discovery, methodology, or synthesis of knowledge?
The program described in the paper implements a variety of new sequence analysis methods but in 95% of cases these are not the reason the paper is being cited. It is being cited primarily because people are using the computer program for the older methods it implements. Lots of people find my program easier to use than other available programs that do approximately the same thing.
Would you summarize the significance of your paper in layman's terms?
The computer program it describes has, in a very no-fuss way, made available a variety of reasonably robust and useful methods for detecting and characterizing genetic recombination events in nucleotide sequences. When such recombination events go unaccounted for in nucleotide sequence analyses, they can in many cases cause these analyses to yield misleading results. The computer program has a number of different tools for trimming down nucleotide sequence datasets so that they will contain only minimal traces of recombination.
So besides the program being useful to people who are simply interested in tracing patterns of nucleotide sequence transfer between organisms (which is, in essence, used to trace past patterns of sexual and para-sexual reproduction amongst these organisms), it also appeals to people who are only interested in recombination to the point that they would like to remove all traces of it from their datasets.
How did you become involved in this research, and how would you describe the particular challenges, setbacks, and successes that you've encountered along the way?
In ~1997 I was doing my Ph.D. on the evolution of a virus that, it was believed, might be capable of recombination. I found that there were no user-friendly tools for analyzing this important evolutionary process, so I decided to make one as a hobby. I completed the first version of a user-friendly recombination detection program (called RDP) in 2000 and, following continual work on the program since that time, have published two further updates: one in 2005 and another in 2010.
The hardest challenge was getting the initial 2000 paper describing the software published as an application note in Bioinformatics. As an enthusiastic amateur I lacked credibility in the field, and it took a couple of back-and-forths between myself and the reviewers, a lot of head-scratching, and a few heated arguments with the editor of Bioinformatics before it was finally accepted.
Getting the first paper published is also almost certainly my biggest success. Besides providing some much-needed credibility and emboldening me to carry on developing the program, the recognition it provided spawned numerous very productive collaborations and ensured continuous funding of my work throughout the 2000s.
Where do you see your research leading in the future?
I will obviously continue developing my software and supporting it into the foreseeable future. I would like to make it work on much larger datasets. At the moment the upper cap is ~50 megabases of sequence, but I'd like to increase this at least 100-fold, to the point where it can be used to analyze full eukaryotic chromosomes. I'm also developing new tests that will hopefully be applied to figuring out the underlying causes of non-random recombination breakpoint distributions.
Do you foresee any social or political implications for your research?
Not directly. There are, however, loads of indirect implications, particularly in areas such as figuring out how novel diseases suddenly emerge from nature, why disease resistance suddenly breaks down in certain crop species, why drug-resistant viruses and bacteria evolve, how viruses like HIV are able to persistently evade our immune systems, and how to design vaccines, drug treatment strategies, and transgenic virus-resistant crops that will better endure pressures from constantly evolving pathogens.
Computational Biology Department
Institute of Infectious Diseases and Molecular Medicine
University of Cape Town
Cape Town, South Africa
KEYWORDS: RDP3, RECOMBINATION, SEQUENCE ALIGNMENTS, VIRUS, IDENTIFICATION, ALGORITHM, PATTERNS, GENOMES, MODEL.