One of the concerns with the microarray revolution is that researchers will generate reams of data but little understanding. How do you deal with that possibility?
Most-Cited Papers by Patrick O. Brown Published Since 1995
(Ranked by total citations)
| Rank | Paper | Total Citations |
| 1 | M. Schena, et al., "Quantitative monitoring of gene expression patterns with a complementary DNA microarray," Science, 270(5235):467-70, 20 October 1995. | 1,013 |
| 2 | J.L. DeRisi, V.R. Iyer, P.O. Brown, "Exploring the metabolic and genetic control of gene expression on a genomic scale," Science, 278(5338):680-6, 24 October 1997. | 872 |
| 3 | M.B. Eisen, et al., "Cluster analysis and display of genome-wide expression patterns," Proc. Natl. Acad. Sci. USA, 95(25):14863-8, 8 December 1998. | 662 |
| 4 | J. DeRisi, et al., "Use of cDNA microarray to analyse gene expression patterns in human cancer," Nature Genetics, 14(4):457-60, December 1996. | 514 |
| 5 | A.A. Alizadeh, et al., "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling," Nature, 403(6769):503-11, 3 February 2000. | 458 |
| 6 | M. Schena, et al., "Parallel human genome analysis: Microarray-based expression monitoring of 1,000 genes," Proc. Natl. Acad. Sci. USA, 93(20):10614-9, 1 October 1996. | 439 |
Brown: More data is good. I'm going to go off on a short tangent to make my point. If you look at the names of genes in the yeast genome, what you find is that the genes are given names based on whatever genetic screen was used when the gene was first discovered. So we have genes that, in retrospect, have completely ludicrous names because they happen to correspond to what the experimenter was looking for the first time someone saw a mutant phenotype for these genes. The yeast genome is rife with that kind of phenomenon. It's almost laughable. But it was perfectly sensible at the time. A simple experiment may seem easier to understand because its interpretation is framed in a simple, but completely misleading, way by the question we thought we were asking in the design of the experiment. So we're just recognizing that a lot of what we see in a given gene-expression experiment is likely to be misunderstood, or misinterpreted, if we only look at it in terms of the model that we had going into the experiment. But the important thing is we're doing it—we're collecting data in a uniform, systematic way, documenting it carefully, and interpreting it as thoughtfully as we can. And the broad vision is that by taking this kind of systematic approach, we're continually building our understanding and our ability to make sense of each successive experiment that we do.
Potential problem number two: with inexpensive microarrays proliferating, it seems likely that a lot of the data generated will be rife with artifacts, and that a lot of quick, dirty, and wrong papers will be published.
Brown: The studies that really have lasting value tend to be the ones where the approach taken at the outset was systematic and the kind of data collected was broader than might have been dictated by a particular narrow question being asked, so that they provide a systematic framework for interpreting the results and recognizing potential artifacts.
How do you calibrate all the data so that what one laboratory produces can be fit into the mass of data generated by all the labs?
Brown: That’s a significant problem at many levels, and not only at the very basic level of the actual gene-expression measurements themselves and how you standardize them and get them in a common quantitative framework. At least there are some natural ways of addressing that problem. The bigger challenge comes with the efforts to develop gene-expression databases that can serve the entire scientific community in some useful way. There has never really been any obvious incentive or real tradition for standardizing the way people describe a lot of very basic biological phenomena. If you sequence a genome, it’s sufficient to say, in effect, "This is the strain of the organism, and it is available at this repository, if anyone wants to confirm the sequence." It doesn’t matter what medium you grew it in. There aren't that many parameters that are really important to how that sequence manifests itself, or to characterizing it in such a way that you can really properly interpret the data.
When you're talking about gene-expression data, however, you have to worry not only about the identity of the organism, but everything about the identity of the particular cell or tissue and its situation at the time you take the snapshot of gene expression. Its entire history is potentially relevant to understanding and interpreting what you see. The problem is, how do we come up with a systematic description of all the parameters that we must take into account? That's a tough problem, and it's something we really need to address. We have to develop some standardization of the vocabulary, and this isn’t just a semantic problem; it’s also a problem that comes from the limits of the knowledge we would need to properly specify all the things that can influence what we see in the gene-expression program. I'm not saying standards should be imposed, but it's an issue that the community has to figure out.
Lately, you've been working on developing protein microarrays. Are you optimistic that the challenges can be overcome?
Brown: The simplest thing we should be able to achieve with protein microarrays is to have a tool that allows us to make quantitative measurements of abundance of all the proteins in a sample. The catch is that for a DNA or RNA molecule, if you want a specific binding reagent for assaying abundance, you just need the complementary sequence and, presto, you're done. For a protein, it's not so simple. There's no comparable magic trick that you can do to get it. It's much more of an ad hoc problem for each protein. At the moment we're just using this old-fashioned solution that nature has arrived at with billions of years of evolution: in other words, making high-affinity specific binding reagents for a protein by using the immune system and raising mono-specific antibodies. That’s just one approach, though, and there are certainly others. One thing you can say for certain is it's absolutely, completely doable, and the ability to make quantitative measurements of thousands of proteins in a single sample will be a relatively routine tool in laboratories and the clinic. We have a decent solution for every technical piece of the problem. That's not to say we have an optimal solution or that implementing it is trivial, but we don't need a new physical law or some fundamental new technology to know that we can make this happen.
Science Watch®, May/June 2002, Vol. 13, No. 3
Citing URL: http://www.sciencewatch.com/may-june2002/sw_may-june2002_page4.htm