In the ScienceWatch.com interview below,
Dr. Stephen Goff talks about his paper, "A draft
sequence of the rice genome (Oryza sativa L. ssp.
japonica)" (Goff SA, et al., Science
296[5565]: 92-100, 5 April 2002). Currently, this
paper is the #2 Highly Cited Paper in the field of
Plant & Animal Science in Essential Science
Indicators℠ from Thomson Scientific, with
949 citations to date. Dr. Goff’s record in the
database includes 17 papers cited a total of 1,474
times between January 1, 1997 and October 31, 2007. Dr.
Goff is a Senior Fellow at Syngenta Biotechnology Inc.
in Research Triangle Park, North Carolina.
Would you please sum up your 2002 Science
paper, "A draft sequence of the rice genome (Oryza sativa L. ssp.
japonica)," for our readers?
Syngenta’s 2002 Science paper on the draft sequence of the
rice genome describes the creation and analysis of a DNA sequence dataset
covering more than 99% of the rice genome. It also details the assembly
of that dataset into mapped sequences representing the majority of
protein-coding genes. Rice was the first crop species to be sequenced to this
degree. The approach the authors used demonstrated that lower-cost "draft"
sequencing methods are applicable to relatively large genomes.
Another group, the Beijing Genomics & Bioinformatics Institute (BGI),
used a similar approach to create a draft sequence of another rice
variety; that sequence was published in the same issue of Science. The
International Rice Genome Sequencing Project (IRGSP, an international
effort being executed in the public sector) was projected to be complete by
2008 and was using an approach that yielded more complete and more
accurate coverage of the genome.
What were the major findings and implications from this
paper?
There were three major findings from this publication. The first concerned
the technical approach chosen. Before this research project was executed
and published, it wasn’t clear whether a random-fragment "draft"
sequencing method could be used successfully on such a large genome. Prior
to this effort, the so-called "shotgun sequencing method" had only been
applied to smaller genomes, like those of single-celled organisms. This
publication showed that the majority of a genome,
regardless of size, could be sequenced and mapped using a random-fragment
approach at significant cost savings compared with map-based approaches. Such
random-fragment approaches are now the most commonly used method for
sequencing large plant genomes.
The second major finding was that the gene number was considerably
lower than initially estimated for rice. Original estimates were based on
the limited amount of genome sequence information generated by the IRGSP on
rice chromosome 1. The predicted gene numbers reported in preliminary
project summaries on the internet were as high as 90,000. Our analysis
predicted the gene number to fall between 32,000 and 50,000 genes,
depending on the informatics criteria and confidence levels used. IRGSP also
lowered its estimate from 90,000+ genes in its early publications
to roughly 50,000 to 60,000 genes, and lower still as more data became
available and the final map-based sequence was published. Jeff Bennetzen’s
group later reported that this discrepancy in gene number was due to
counting repetitive elements as genes: some specific repetitive elements can
pick up portions of genes and amplify them during transposition.
The third important finding of this research was the similarity between the
classification of rice genes and that of Arabidopsis genes. The
two plants had a very similar distribution of genes across functional
classes. In addition, the classification of genes encoding transcription
factors, the proteins that regulate gene expression, was very similar between
Arabidopsis and rice, even though it was distinct from that of other
sequenced species such as nematodes, fruit flies, and yeast.
Our analysis suggested that very similar genes would be found in all cereals,
and further suggested that the extent of colinearity of the cereal genomes
would be quite high, as predicted earlier from genomic mapping efforts. The
similarity between the genes of various cereal species has since been
supported by the fact that many genes, as well as regulatory regions,
retain their function when transferred between related crops.
What are the applications for rice genome data?
Corn, wheat, and rice represent approximately 70% of total global crop
production, with over half a billion tons of each crop produced annually.
Needless to say, these crops are very important to humanity. Given a growing
human population, the yield of these crops will need to be improved on an
ever-decreasing area of cultivable land. Currently, annual yield gains for
corn are around 1-2%, and these gains haven’t changed
significantly for several decades despite higher levels of investment and
manpower.
All cereals, including corn, wheat, rice, and barley, share a common ancestor
and are therefore closely related in the genes that control traits of
interest to producers and consumers. The rice genome sequence will help
elucidate gene function in all cereals and will facilitate more rapid adoption
of molecular breeding technology. Together these advances should help maintain
or accelerate the yield gains needed to keep up with the growing population
and with the demand for crop products in renewable energy applications.
How was this paper received by the community?
The community’s response to this publication was mixed. The
rice genome sequence data from the Syngenta project was not immediately
released to the public but was instead donated to the IRGSP, allowing the
public project to be accelerated and released as a single higher-coverage
assembly. The sequence was also provided to the BGI in Beijing at a later
date to allow them to analyze the assembly, compare the japonica
rice variety to the indica variety, and publish their results
(along with a deposit of all Syngenta raw data into the GenBank trace file
database). Individual academic researchers could gain access to the
Syngenta sequence assembly under a specific agreement.
Some academic researchers would have preferred that the entire sequence
be submitted immediately to GenBank. Others understood the value of an
approach in which a private company provided data and funding that
accelerated the availability of both draft and final
sequence data. In the end, the IRGSP effort was both altered to produce a
draft sequence and accelerated to completion several years ahead of the
planned completion date of 2008.
Both the Syngenta draft japonica sequence and the BGI
indica draft sequence were reported to be incomplete in the final
map-based sequence of the rice genome published by the IRGSP (Matsumoto T,
et al., "The map-based sequence of the rice genome," Nature 436[7052]:
793-800, 11 August 2005). This claim has since been refuted by a recent
study comparing the draft sequence with the map-based sequence. Both the
random-fragment and map-based sequencing approaches have specific inherent
disadvantages. However, the speed and cost-efficiency of the
random-fragment sequencing approach have made it the method of choice
for all new large genome projects since the completion of the human,
Arabidopsis, and rice genomes.
What initially sparked your interest in this line of
research?
The main driving force behind Syngenta’s interest in this project was
the lack of genome sequence information for basic crops at the time this
project was started in 1998. Even though the public rice genome project had
been initiated in the early 1990s, only 0.5% of the genome had been
released to public databases at the time this project was started, and even
less for corn and wheat. The desire to adopt molecular breeding technology
and enable reverse genetics approaches in cereals drove this project into a
high-priority position. Rice has a much smaller genome than corn and a far
smaller one than wheat, so it was chosen as the model
crop species for cereal genomics efforts. The commercial applications,
however, were mainly intended for corn.
Where have you taken this work since this paper? Where
do you see this research going in the next 10 years?
This work has been used within Syngenta to identify candidate genes and
molecular markers useful for commercial crop enhancement. It has also
allowed us to gain experience in cereal genomics with a smaller genome
model. The rice genome work helped Syngenta scientists develop specific
gene expression microarrays that have been used for hundreds of experiments
internally and with academic collaborators looking at development and
responses to the environment in various cereals of commercial interest.
Over the next 10 years the completed rice genome sequence (both the
Syngenta project as well as the public projects) will be used to help
identify genes from other cereals and validate how these genes function.
Because the corn genome sequence is expected to be released in February 2008,
the conserved regions of rice and corn will help in identifying the
complete set of genes present in these important cereal crops.
The conserved non-coding regions will be identified through comparative
genomics approaches. It is likely that the knowledge of these conserved
regions will change our understanding of what a plant gene really is, just
as studies of the human genome have done for mammals. The genome sequences
will also enable efficient whole genome expression technologies, enhanced
molecular breeding for improved traits, and more efficient forward and
reverse genetics. Genome sequences will serve a foundational role in many new
approaches toward elucidating the role of genes in development and
environmental responses. Determining the precise function of any given gene
will remain very challenging and will require a large research effort,
but the genome sequence will certainly facilitate that
effort.
Stephen A. Goff
Senior Syngenta Fellow
Syngenta Biotechnology Inc.
Research Triangle Park, NC, USA
Dr. Stephen Goff's most-cited paper with 949 cites to date:
Goff SA, et al., “A draft sequence of the rice genome (Oryza sativa L. ssp.
japonica),” Science 296(5565): 92-100, 5 April 2002.
Source: Essential Science Indicators℠ from Thomson Scientific.