David Posada: Working With Sequence Data

Emerging Research Fronts Commentary, June 2011

David Posada

Article: jModelTest: Phylogenetic model averaging


Authors: Posada, D
Journal: MOL BIOL EVOL, 25 (7): 1253-1256, JUL 2008
Addresses: Univ Vigo, Fac Biol, Dept Genet Bioquim & Inmunol, Vigo 36310, Spain.

David Posada talks with ScienceWatch.com and answers a few questions about this month's Emerging Research Front paper in the field of Biology & Biochemistry.


SW: Why do you think your paper is highly cited?

Basically because it describes a useful computer application.

SW: Does it describe a new discovery, methodology, or synthesis of knowledge?

In fact it announces a new version of the program ModelTest, published in 1998, and cited more than 10,000 times since then. The main purpose of this program is to facilitate the statistical selection of models of nucleotide substitution, which are used in phylogenetics to calculate probabilities of change in multiple sequence alignments. Nowadays, model selection is routine in phylogenetic inference, and a number of people choose this software to carry out this task.

SW: Would you summarize the significance of your paper in layman's terms?

Figure 1:

Screenshot of the results for the Akaike Information Criterion (AIC) selection.

This paper describes a computer program which helps identify which model best explains change through time in a set of DNA sequences. Such models are used to estimate evolutionary relationships among species (i.e., phylogenetic trees), and associated divergence times, corrected genetic distances, or to test, for example, whether different lineages or sequences evolve at the same rate.

SW: How did you become involved in this research, and how would you describe the particular challenges, setbacks, and successes that you've encountered along the way?

I started working on phylogenetic model selection during my Ph.D. thesis with Keith A. Crandall at Brigham Young University. Since then I have explored different methodological aspects of this whole process and potential applications. I could not say I have encountered particular challenges along the way. jModelTest is a straightforward piece of code, and in fact most of the hard work is carried out by an external application, PhyML, written by Stephane Guindon and Olivier Gascuel.

SW: Where do you see your research leading in the future?

Like most people working with sequence data, I am changing scale, from working with a few genetic regions to dealing with multigene datasets and/or whole genomes. That is, I am moving from phylogenetics to phylogenomics, which at the same time implies understanding how genomes change through time. The massive amount of data coming from next-generation sequencing (NGS) techniques is already providing many opportunities to reformulate old questions and to come up with new ones. At the same time, handling all these data will be very challenging, and we will have to come up with new, more efficient methods.

SW: Do you foresee any social or political implications for your research?

Not really. This is basic science and it will be useful only for other scientists interested in or using tools from molecular evolution and phylogenetics.

David Posada
Facultad de Biología
Campus Universitario
Universidade de Vigo
Vigo, Spain


ADDITIONAL INFORMATION:

KEYWORDS: MODEL SELECTION, LIKELIHOOD RATIO TESTS, AIC, BIC, PERFORMANCE-BASED SELECTION, STATISTICAL PHYLOGENETICS, NUCLEOTIDE SEQUENCES, MAXIMUM LIKELIHOOD, EVOLUTIONARY DISTANCES, INFORMATION CRITERION, MITOCHONDRIAL DNA, SELECTION, SUBSTITUTION, UNCERTAINTY, PERFORMANCE, ASSUMPTIONS.

 

Figure 1:

Screenshot of the results for the Akaike Information Criterion (AIC) selection. In the model names (first column), +I indicates a proportion of invariable sites and +G indicates gamma-distributed rate variation among sites. Other columns are -lnL: negative log likelihood; K: number of estimated parameters; AIC: Akaike Information Criterion; delta: AIC difference; weight: AIC weight; cumWeight: cumulative AIC weight.
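
The columns described in the caption can be reproduced with a few lines of arithmetic. The sketch below is only an illustration using the standard AIC definitions (AIC = 2K + 2(-lnL), delta = AIC minus the smallest AIC, weight = exp(-delta/2) normalized to sum to 1); it is not jModelTest's own code. The function name `aic_table` and the likelihood and parameter values are hypothetical, chosen only to show the calculation.

```python
import math


def aic_table(models):
    """Build an AIC table from (name, neg_lnL, k) tuples.

    neg_lnL is the negative log likelihood (-lnL) and k the number of
    estimated parameters (K). Rows are returned sorted by increasing AIC.
    """
    # AIC = 2K - 2 lnL, written here in terms of -lnL.
    scored = [(name, neg_lnl, k, 2.0 * k + 2.0 * neg_lnl)
              for name, neg_lnl, k in models]
    scored.sort(key=lambda row: row[3])

    best_aic = scored[0][3]
    # Relative likelihood of each model given the data: exp(-delta / 2).
    rel = [math.exp(-0.5 * (aic - best_aic)) for _, _, _, aic in scored]
    total = sum(rel)

    rows, cumulative = [], 0.0
    for (name, neg_lnl, k, aic), r in zip(scored, rel):
        weight = r / total
        cumulative += weight
        rows.append({
            "model": name, "-lnL": neg_lnl, "K": k, "AIC": aic,
            "delta": aic - best_aic, "weight": weight,
            "cumWeight": cumulative,
        })
    return rows


# Hypothetical values, not taken from any real alignment or from Figure 1.
example = [("JC", 2500.0, 1), ("HKY+G", 2450.0, 6), ("GTR+I+G", 2448.0, 10)]
rows = aic_table(example)

for row in rows:
    print(f"{row['model']:<8} {row['-lnL']:>8.1f} {row['K']:>3d} "
          f"{row['AIC']:>8.1f} {row['delta']:>6.1f} "
          f"{row['weight']:>7.4f} {row['cumWeight']:>7.4f}")
```

With these made-up numbers, the model with the lowest -lnL (GTR+I+G) is not the one selected, because its extra parameters cost more in the AIC than the gain in likelihood.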

 
