WUSTL’s Richard K. Wilson on Genome Sequences and Their Payoffs
Scientist Interview: July 2011
A pufferfish and a platypus. Diatoms. Bacteria that cause tuberculosis, cholera, typhus, the flu, anthrax, and ulcers, to name but a few. E. coli, not surprisingly. A host of parasites and a score of viruses. The chimp, the orangutan, the cow, the chicken, the guinea pig, and the giant panda. Rats, roundworms, and baker’s yeast. The black cottonwood tree, the grapevine, the onion, and corn. Mosquitoes and, of course, the fruit fly. A standard poodle and a boxer named Tasha. The domestic cat and the African elephant. And the human.
Since geneticists started sequencing whole genomes in the 1990s, the number of fully sequenced genomes seems to be climbing exponentially, driven in large part by an equally exponential drop in cost. The original Human Genome Project cost some $3 billion to complete. Today whole human genomes can be sequenced for less than 1/10,000th of that amount, while bacterial genomes can be sequenced, and have been, by a suitably motivated graduate student.
With this technological revolution driving a scientific revolution, researchers have learned an enormous amount about the nature of living things and the evolutionary forces underlying health and disease. One researcher who has helped to lead the way in both these revolutions is Richard K. Wilson, director of The Genome Institute at Washington University in St. Louis. Wilson has been a major player in genetic analysis for a quarter-century, since he first started working with Leroy Hood, then at Caltech, on the first machines to automate what was then merely the art of sequencing genes.
According to Essential Science Indicators℠ from Clarivate, Wilson currently ranks among the top ten most-cited authors in molecular biology & genetics, based on papers published over the last decade. Reflecting more recent work, his contribution to eight Hot Papers over the last two years resulted in his inclusion in this publication’s latest annual listing of "hot" authors earlier this year (March/April 2011). In all, Wilson is the co-author of five papers with more than 1,000 citations each, including, not surprisingly, the 2001 Nature report on "The initial sequencing and analysis of the human genome" (E.S. Lander, et al., 409[6822]: 860-921, 2001), which has since garnered well over 8,000 citations. (See also the adjoining table of comparatively recent papers.)
Wilson, 52, received his B.A. degree in microbiology from Miami University in Ohio in 1981 and his doctorate in chemistry and biochemistry five years later from the University of Oklahoma. For the next four years he worked with Leroy Hood as a research fellow before moving to Washington University, where in 1993 he became co-Director of the Genome Sequencing Center and in 2002, director of The Genome Institute and a professor of genetics at the university.
Can you compare the vision you had of the future of gene sequencing in the mid-1990s—when you were busy on the then six-year project of sequencing the C. elegans genome—to how it’s panned out since then?
I think the vision actually goes back to around 1989, when we started thinking about what we were going to learn if we sequenced the human genome. We knew back then that if you were interested in a particular disease (for example, cystic fibrosis, which Francis Collins had spent a lot of time on), it would be a decade-long enterprise and cost millions and millions of dollars to identify the gene. But we also knew that if we had an encyclopedia of the genome, it would give us economies of scale and would obviate the need for all the really expensive, time-consuming gene searches. And it has.
So if you just look at cancer genomics now, not only can we sequence entire human genomes, but we have the reference genome, and we’re able essentially to compare a patient’s genome against that reference and find all the genes that differ. We can compare a tumor and normal tissue, to find what mutations have occurred in the tumor that might be relevant to the course of the disease. So, in terms of what we had hoped to accomplish, that’s spot-on. And cancer is just one example. We’d figured that for any of the other human diseases for which there are causative genes, we would be able to use the reference sequence to look for and discover mutations in those, and that’s what’s happening.
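Editor's illustration: the tumor-versus-normal comparison Wilson describes can be sketched in a few lines of Python. This is a toy sketch, not the institute's pipeline. It assumes the sequences are already aligned and of equal length, so it detects only single-base substitutions. The names `find_variants` and `somatic_variants` and the ten-base "chromosome" are hypothetical, invented for the example.

```python
from typing import Dict, Set, Tuple

# (chromosome, 1-based position, reference base, sample base)
Variant = Tuple[str, int, str, str]


def find_variants(reference: Dict[str, str], sample: Dict[str, str]) -> Set[Variant]:
    """Return single-base differences between a sample and the reference.

    Simplifying assumption: each sample sequence is already aligned to the
    reference and has the same length, so insertions and deletions are ignored.
    """
    variants: Set[Variant] = set()
    for chrom, ref_seq in reference.items():
        sample_seq = sample[chrom]
        for pos, (ref_base, sample_base) in enumerate(zip(ref_seq, sample_seq), start=1):
            if ref_base != sample_base:
                variants.add((chrom, pos, ref_base, sample_base))
    return variants


def somatic_variants(
    reference: Dict[str, str], normal: Dict[str, str], tumor: Dict[str, str]
) -> Set[Variant]:
    """Variants seen in the tumor but not in the same patient's normal tissue."""
    return find_variants(reference, tumor) - find_variants(reference, normal)


# Toy data (hypothetical): one short "chromosome".
reference = {"chr1": "ACGTACGTAC"}
normal = {"chr1": "ACGTACGTTC"}  # inherited difference at position 9 (A to T)
tumor = {"chr1": "ACCTACGTTC"}   # same inherited difference plus a new one at position 3 (G to C)

somatic = somatic_variants(reference, normal, tumor)
print(sorted(somatic))
```

Subtracting the normal-tissue differences removes what the patient was born with and leaves only the changes acquired by the tumor. Real analyses also have to handle sequencing errors, insertions, deletions, and rearrangements, none of which this sketch attempts.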
Are there ways in which you’ve been disappointed by the progress, or by a shortfall in what you hoped to learn?
Highly Cited Papers by Richard K. Wilson and Colleagues, Published Since 2005 (Listed by citations)
Rank | Paper | Citations |
---|---|---|
1 | D. Altshuler, et al., "A haplotype map of the human genome," Nature, 437(7063): 1299-1320, 2005. | 2,345 |
2 | ENCODE Project Consortium ( E. Birney, et al.), "Identification and analysis of functional elements in 1% of the human genome," Nature, 447(7146): 799-816, 2007. | 1,066 |
3 | T. S. Mikkelsen, et al., "Initial sequence of the chimpanzee genome and comparison with the human genome," Nature, 437(7055): 69-87, 2005. | 696 |
4 | A. Siepel, et al., "Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes," Genome Res., 15(8): 1034-50, 2005. | 656 |
5 | L. Chin, et al., "Comprehensive genomic characterization defines human glioblastoma genes and core pathways," Nature, 455(7216): 1061-8, 2008. | 540 |
SOURCE: Thomson Reuters Web of Science®.
I would say no. There are a lot of people who want to take shots at the human genome sequence and say it hasn’t paid off. I would disagree. I can think of a lot of examples of diseases that we are now able to better treat, better diagnose and, in some cases, cure, because of lessons that have come out of the genome sequences. No, I’m not disappointed at all. The one thing that’s disappointed me, and maybe I’m just a stodgy academician about it, is my feeling that a lot of people have gotten fame and fortune from genome sequencing, which has tainted it a little bit. But for the most part, for biology and biomedical research, it has really paid off.
What’s more, we spent a decade evolving the technology and the computer tools to be able to complete the genome sequence the first time. And that spawned a whole lot of additional technology development that brought us to the point we are now, where we really can sequence human genomes in a month at a pittance, compared to the former cost. And the genome project should get credit for that.
What are the two or three most interesting genomes you’ve sequenced in your lab and, for lack of a better word, what are the "coolest" things you’ve learned?
The human genome has been our bread and butter, and our major focus now is on human health and disease. That reference sequence, which we continue to try to make perfect to this day, has really been where all the great stuff has come from. And there are a couple of other landmark genomes we’ve been involved in—the mouse genome, for instance, which has been very useful in allowing us to look at some of the genes we’ve discovered in the human genome and see exactly what happens when those genes are mutated. That’s been very powerful.
But the two I would put under the "coolest" classification are the chimpanzee and the platypus. The chimpanzee probably speaks for itself. You want to understand the genome that is evolutionarily closest to ours, and we found a lot of things we weren’t surprised to find and a few that we were. We’re still trying to explain just exactly how close the chimpanzee genome is to ours.
What surprised you about the chimp genome?
Well, there are regions of the human genome that are missing in the chimp, and regions of the chimp genome that are missing in the human. If you just compare the genome sequences that are shared, they’re 99% similar. But if you take into account bits of the human that are missing relative to the chimp and vice-versa, they’re only 97% similar.
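Editor's illustration: the gap between the two figures comes down to which positions are counted. The Python sketch below makes that concrete under stated assumptions. It takes two sequences that have already been aligned, with `-` marking a stretch present in one species and missing in the other. The function name `identity_stats` and the toy sequences are hypothetical, and this is not the method used in the published chimpanzee analysis.

```python
from typing import Tuple


def identity_stats(aligned_a: str, aligned_b: str) -> Tuple[float, float]:
    """Return (identity over shared columns, identity over all columns).

    Assumes a pairwise alignment of equal length in which '-' marks a gap
    and no column is a gap in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0  # columns where both species have the same base
    shared = 0   # columns where both species have a base at all
    total = len(aligned_a)

    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # present in one genome, missing in the other
        shared += 1
        if a == b:
            matches += 1

    return matches / shared, matches / total


# Toy alignment (hypothetical): ten shared columns with one mismatch,
# plus two bases unique to each sequence.
human = "ACGTACGTACGG--"
chimp = "ACGTACGTAT--CC"

shared_identity, overall_identity = identity_stats(human, chimp)
print(f"shared regions only: {shared_identity:.1%}")
print(f"counting missing regions: {overall_identity:.1%}")
```

Counting only the columns both sequences share gives the higher number. Counting the missing stretches as differences lowers it, which is the same effect Wilson describes, at toy scale.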
Are these mostly coding or non-coding regions of the genome?
A little bit of both. We’re still trying to understand what these mean. We know about some important differences, biologically, between ourselves and chimps. Chimps don’t get epithelial cancer quite so easily, and so we’re still trying to figure out exactly what bits of the genome may or may not be involved in that. And chimps don’t get Alzheimer’s disease and other sorts of neurological disorders that we’re subject to. So there’s stuff to look at there.
And the platypus—what did you learn from that genome?
That’s a fun story. After the first English explorer found platypuses and sent a couple of specimens back to London, nobody believed they were real. They thought they were a taxidermist’s creation: bits of reptile and bits of bird and bits of mammal all put together. The interesting thing is that the genome essentially reflects that. You’ve got bits of sequence that look reptilian, and other bits that look very much like bird genes, some of which are involved in egg laying. And then, of course, you have genes that are mammalian. One of the most curious things about platypuses is that, instead of having two sex chromosomes like pretty much every other mammal, they have ten. They can have ten X chromosomes or they can have five X’s and five Y’s. Their Y chromosomes are much larger and more complex, in terms of gene content, than those of humans.
How do you now make sense of the relatively small number of genes in the human genome? This was considered something of a revelation, if not a disappointment, when the human genome was first sequenced.
We had this idea beforehand that the human genome would contain about 100,000 genes, and that was satisfying to people. Then we sequenced the worm and found that it only had about 20,000 or 25,000 genes. After the human genome was sequenced, the number of genes kept dropping and dropping. Right now we’re at somewhere around 21,000 or 22,000. People get all worked up about that: "How can we have fewer genes than a worm or a lower mammal?" And the answer is that basically the definition of "gene" has evolved along with the genome. Human genes are much more complex, so we really don’t need to have 100,000. We do just fine, and we can be much more complex organisms, with 25,000 genes than a worm that might have the same number.
"There are a lot of people who want to take shots at the human genome sequence and say it hasn’t paid off," says Richard K. Wilson, of Washington University, St. Louis. "I would disagree."
In a lesser mammal, there might be a single gene for one particular function. A higher mammal, meanwhile, might have multiple copies of that same gene, with each being slightly different. These genes have been duplicated and saved over millions of years of evolution. So there might be a gene that is similar to something in a lesser mammal but with added bits of coding regions, more exons, and it can splice into multiple forms, therefore obviating the need for several copies or several genes.
Your institute is involved in quite a few major sequencing projects, including the 1000 Genomes Project Consortium. What is the purpose of that project, and what other projects do you have going?
There are indeed quite a few large projects. One is the Cancer Genome Atlas Project, sequencing tumor genomes. Another is called the Human Microbiome Project, which essentially aims to use genomics to look at the population of bacteria in various bodily systems like the GI tract, and to examine how that population changes in health, in disease, in situations where nutrition is poor, in folks who are taking particular medications, etc.
The idea with the 1000 Genomes Project Consortium is that once we finished the first of the human genomes, we knew we had to look at many other human genomes to get some idea of the variation that exists in the population. If you’re going to start looking for genes that might represent heritable things like susceptibility to cancer, you need to know what the baseline is, what normal is. So the idea is to look across many ethnic groups, in many people, and sequence a thousand individual genomes. If we do that, we can start to develop that baseline catalog of variations. That’s the beginnings of it.
How would you describe what you’ve learned by sequencing cancer genomes, and has this led to clinical advances?
We’re developing this collection of little success stories. So, although we’d love to be able to say that we’ve looked at leukemia, breast cancer, lung cancer, prostate cancer, etc., and that there’s one overriding message that’s going to be beneficial for all types of cancer, we’re not there yet. Rather, all these different types of cancer have their own little secrets that we’re starting to learn.
If I had to pick one particularly significant finding, it would be in the leukemia work we’ve done so far (e.g., E.R. Mardis, et al., "Recurring mutations found by sequencing an acute myeloid leukemia genome," New Engl. J. Med., 361[11]: 1058-66, 2009). Perhaps because we started with leukemia, it’s the farthest along. We’ve been able to look at the most patients, more than for any other cancer type so far. For example, we know that for about 60% of the acute leukemia patients who come into the clinic, you can perform cytogenetics—i.e., you can look, under the microscope, at the chromosomes in their cancer cells—and there are no indicators as to whether they have a good-outcome or a poor-outcome case of disease. And therefore oncologists treat all of them with essentially the same chemotherapy regimen. But as an oncologist, you’d like to know if you’re going to be able to successfully treat the cancer with chemotherapy drugs alone, or if you need to go to the more expensive and risky option of bone marrow transplantation.
And the indication now is that the mutations we’ve discovered in a few key genes will tell us if the patient has a particularly high-risk form of leukemia. Simply by sequencing those genes in every new patient, we’ll be able to make a treatment decision available early, and the patient can either continue treatment with chemo drugs or take the big step to a bone marrow transplant. That’s where you want to be. Eventually you’d like to be able to deliver a targeted drug, developed for each different type of mutation. We’re just at the tip of the iceberg on this.
Last question: have you had your own genome sequenced?
I have not. I’m a bit resistant to it. Right now, I think there are only two reasons to have your genome sequenced. One is that you want to be able to brag about it, which I don’t. And the other is you have cancer and you’re trying to find some clue with which your oncologist might better treat your disease. We now know enough about many different types of cancer that there are actionable findings that can come from genome sequencing. There’s a lot of work to do, but it’s really promising.