ISB’s Ruedi Aebersold on the Challenges of Proteomics
There are biological systems that are inspiring in their complexity,
and there are systems that not only inspire awe but threaten to surpass
the pale of the imagination. The proteome of even single-cell organisms
would fall into the latter category, which leaves an insufficiency of
words to describe the complexity of the human proteome. Nonetheless,
biologists now talk about deciphering this constantly changing,
near-infinite complexity of interrelating proteins to be found in human
cells the way they talked about decoding the human genome a couple of
decades ago. And just as they needed new technologies to elucidate the
sequence of human DNA, they will assuredly need new technological
capabilities to decipher the secrets encoded in our proteome.

"I think
that within five years we will be able to do clinical studies
using proteomic approaches," says Ruedi Aebersold,
currently of the Institute for Systems Biology, Seattle,
Washington.
Photo by Manuello Paganelli. © All rights reserved.

This effort has made proteomics the newest buzzword in molecular
biology, with entire companies and academic institutions springing up to
take on the challenge. At the leading edge of this endeavor is Swiss
biologist Ruedi Aebersold of the Institute for Systems Biology in
Seattle, Washington. Last year, in one bimonthly update of the Hot
Papers database, Aebersold fielded six hot reports published over the
preceding two years, and in the last decade he has published 25 papers
that have each been cited more than 100 times. He is a double threat,
with as many high-impact articles characterizing the nature of proteins
themselves as articles describing the technologies that make the
elucidation of protein sequence, structure, and function possible. His Nature
paper, "Molecular characterization of mitochondrial
apoptosis-inducing factor," has received more than 900
citations in barely five years, while his seminal paper
"Quantitative analysis of complex protein mixtures using
isotope-coded affinity tags," published in Nature Biotechnology,
has racked up more than 600 (see table).
Aebersold, 50, received his undergraduate degree in biology from the
University of Basel in Switzerland in 1979 and his Ph.D. in biology from
the same institution four years later. Through 1988, he worked in Leroy
Hood’s laboratory at the California Institute of Technology, and then
spent five years at the University of British Columbia in Vancouver as
an assistant professor and senior investigator at the Biomedical
Research Center. In 1993, he joined the University of Washington, where
he was a full professor and associate director of the Science and
Technology Center for Molecular Biotechnology until 2000, when he became
a co-founder of the Institute for Systems Biology. Later in 2004 he will
move to the Swiss Federal Institute of Technology in Zurich.
Aebersold spoke to
Science Watch from his office at the ISB.
You
first started sequencing proteins 20 years ago at Caltech. What did you
see as your goal at the time?
The goal was always to sequence one protein that
was isolated to virtual homogeneity and to do that at ever-increasing
sensitivity and precision. This was the mindset that always applied. You
would isolate a protein, purify it, and then sequence it. And we learned
to do it better and better. Eventually, around 1990, mass spectrometry
became really useful for the analysis of proteins. That was due to a
number of innovations—in particular, the development of ionization
sources, which was actually awarded the Nobel Prize in chemistry in
2002. So all of a sudden we could generically ionize relatively large
molecules such as peptides and proteins. There was also a convergence at
the time of various advances having to do with our ability to analyze
data: a convergence of computer science, engineering, physical
chemistry, and instrumentation.
For
the uninitiated, could you explain how mass spectrometry works and why
ionization is important?
The simplest possible way to think of mass
spectrometry is as a balance that measures the mass of a molecule in an
ionized form. So you have to attach a proton or an electron to your
protein or peptide, then bring it into a gas phase in a high vacuum.
This is what this ionization method allowed us to do. You can’t read
out the sequence of a peptide from the intact ion, but advanced
instruments developed during that time in different laboratories gave us
the ability to fragment a peptide ion. Basically, you accelerate the
protein or peptide ion and run it into a wall of gas. This shatters
the protein or peptide into fragments, then you use the mass
spectrometer to measure all the masses of these fragments. From these
masses we can then figure out the amino acid sequence of the peptide.
How
do you go from masses to protein sequences?
It’s via a somewhat lucky coincidence that
these peptide ions don’t fragment at random. They always fragment with
a certain order, and that order is reasonably well understood. It’s
not like when you throw a piece of glass against the wall and it
fragments randomly. If you were to fragment the same peptide 1,000
times, you would get essentially the same fragments each time. Once you
understand the rules of this fragmentation process, you can start to
piece together the amino acid sequence of the peptide from the masses of
the fragment ions.
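
As an illustration of the principle described in this answer, the following minimal Python sketch reads a sequence back from the mass differences between consecutive fragment peaks. It is a simplified illustration, not the software used in Aebersold's laboratory. It assumes an idealized, complete ladder of singly charged b-type fragment ions with no noise or missing peaks. The function names `b_ion_ladder` and `read_sequence` are hypothetical, and only a subset of amino acids is included. Leucine and isoleucine have identical masses and cannot be told apart this way, so only leucine is listed.

```python
# Monoisotopic residue masses in daltons (a subset of the standard amino acids).
# Isoleucine is omitted because it has exactly the same mass as leucine.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}
PROTON = 1.00728  # mass of the proton that carries the charge


def b_ion_ladder(peptide):
    """Simulate singly charged b-ion m/z values for every prefix of a peptide."""
    ladder, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(round(total, 4))
    return ladder


def read_sequence(ladder, tol=0.01):
    """Recover residues from the mass gaps between consecutive ladder peaks."""
    peaks = [PROTON] + sorted(ladder)
    sequence = []
    for lower, upper in zip(peaks, peaks[1:]):
        gap = upper - lower
        matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol]
        # An unmatched or ambiguous gap is reported as "?" rather than guessed.
        sequence.append(matches[0] if len(matches) == 1 else "?")
    return "".join(sequence)


if __name__ == "__main__":
    ladder = b_ion_ladder("SAMPLER")
    print(ladder)
    print(read_sequence(ladder))
```

Real spectra contain several ion series, gaps, and noise, which is why practical tools rely on scoring and statistics rather than a simple lookup of this kind.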
Can
you give us an idea of how protein sequencing alone has advanced since
the mid-1980s when you started at this game?
If you were to sequence a protein using the
chemical methods developed in the 1980s, it would take about a day to
generate a sequence. With the mass spectrometer and today’s
technology, we can do about one per second. Once we had the mass
spectrometry methods better established, we realized we didn’t have to
sequence proteins one at a time but could work with large numbers of
proteins simultaneously.
When
did the idea of sequencing every protein in the human body start
percolating into molecular biology as a potential reality?
High-Impact Papers by Ruedi Aebersold, Published Since 1999
(Ranked by total citations)

| Rank | Paper | Citations |
| 1 | S.A. Susin, et al., "Molecular characterization of mitochondrial apoptosis-inducing factor," Nature, 397(6718): 441-6, 1999. | 950 |
| 2 | S.P. Gygi, et al., "Quantitative analysis of complex protein mixtures using isotope-coded affinity tags," Nature Biotechnology, 17(10): 994-9, 1999. | 615 |
| 3 | S.P. Gygi, et al., "Correlation between protein and mRNA abundance in yeast," Mol. Cell. Biol., 19(3): 1720-30, 1999. | 531 |
| 4 | T. Ideker, et al., "Integrated genomic and proteomic analyses of a systematically perturbed metabolic network," Science, 292(5518): 929-34, 2001. | 295 |
| 5 | S.P. Gygi, et al., "Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology," Proc. Natl. Acad. Sci. USA, 97(17): 9390-5, 2000. | 272 |

That actually goes way back to the 1970s. It was
an idea a little ahead of its time, pushed by Leigh Anderson and Norman
Anderson, father and son. They were advocating a concept they called the
"human protein index," and they wanted to essentially
establish a database that would describe every human protein. It didn’t
happen, simply because it wasn’t feasible at the time. It still hasn’t
happened, but now it’s getting feasible.
Does
it help to have the human DNA sequence available?
Absolutely. Now, rather than having to do a de
novo sequence for every protein every time, you can take the
information obtained from the mass spectrometer and correlate it with
the human-genome database, using database search algorithms. Without
these sequence databases, the field of proteomics would still be in the
dark ages.
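
The database correlation described in this answer can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions, not an actual search engine. It matches only an intact peptide mass, whereas real tools also score fragment spectra. It assumes the proteins were digested with trypsin, which cleaves after lysine (K) or arginine (R) unless proline (P) follows. The function names and the two-entry `TOY_DATABASE` are hypothetical and invented for the example.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(protein):
    """Cut after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def search(observed_mass, database, tol=0.01):
    """Return (protein_id, peptide) pairs whose predicted mass fits the measurement."""
    hits = []
    for protein_id, sequence in database.items():
        for peptide in tryptic_peptides(sequence):
            if abs(peptide_mass(peptide) - observed_mass) <= tol:
                hits.append((protein_id, peptide))
    return hits


# Hypothetical sequences, invented for illustration only.
TOY_DATABASE = {"protA": "MKTAYLAKQR", "protB": "GSAMPLERVDK"}

if __name__ == "__main__":
    measured = peptide_mass("TAYLAK") + 0.002  # simulate a small measurement error
    print(search(measured, TOY_DATABASE))
```

The point of the sketch is that no sequence has to be worked out from scratch. The measurement only needs to be specific enough to pick the right entry out of a list of predicted candidates.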
There
seems to be a wide spectrum of definitions for the term "proteomics."
How do you define it?
All of the definitions floating around center on
one idea, which is to identify and analyze all the proteins expressed by
a cell or a tissue. For me, the key to proteomics is the systematic
analysis of proteins present in a sample. Rather than picking out one
protein or a few, you want to systematically measure and analyze all
proteins. In our group we want to do these measurements quantitatively.
We can then use the quantitative results to detect the differences in
various proteomes—for instance, between the proteome of a healthy cell
and a diseased cell, or an activated cell and a resting cell. And to do
that we need very accurate quantification of all the proteins in the
cell.
Where
do isotope-coded affinity tags come into this?
Well, with the mass spectrometer we are very good
at identifying the proteins present and we can do it in a very short
time, maybe a few seconds per protein. What we’re not as good at is
providing quantitative information on this protein, unless we apply a
certain trick. This trick is the isotope-coded affinity tag, and it’s
basically what my colleagues and I contributed to the field. The idea is
to generate two molecules that are chemically the same but which can be
differentiated based on their mass. This is easily done by incorporating
heavy stable isotopes in one molecule and light isotopes in the other.
So we generate two molecules that are effectively indistinguishable by
their polarity or anything else, except their mass. Then, since the
molecules are in every other respect identical, we can assume that the
signal detected from the heavy or light form of the particular molecule
in the mass spectrometer is a true representation of its abundance. This
trick is called stable isotope dilution, and it allows us to turn the
mass spectrometer into a quantitative device. And that gives us a
generic method to study the change in protein profile in proteomes of
different cells or tissues.
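
A minimal Python sketch of the quantification step described in this answer follows. It is an illustration under simplifying assumptions, not the published ICAT analysis software. It assumes singly charged ions, one tag per peptide, and a heavy/light mass difference of about 8.05 Da. That `shift` value is an assumed parameter for the example, as are the peak list and the function name `pair_ratios`.

```python
def pair_ratios(peaks, shift=8.05, tol=0.02):
    """Find light/heavy peak pairs separated by `shift` and report heavy/light ratios.

    `peaks` is a list of (m/z, intensity) tuples. Because the two forms are
    chemically identical, the intensity ratio is taken as the abundance ratio.
    """
    peaks = sorted(peaks)
    results = []
    for mz_light, intensity_light in peaks:
        for mz_heavy, intensity_heavy in peaks:
            is_pair = abs((mz_heavy - mz_light) - shift) <= tol
            if is_pair and intensity_light > 0:
                ratio = round(intensity_heavy / intensity_light, 3)
                results.append((mz_light, ratio))
    return results


if __name__ == "__main__":
    # Hypothetical peak list: two tagged peptide pairs and one unpaired peak.
    spectrum = [
        (1000.50, 200.0), (1008.55, 400.0),  # heavy form twice as abundant
        (1200.60, 90.0), (1208.65, 30.0),    # heavy form one third as abundant
        (1500.00, 10.0),                     # no partner, so it is ignored
    ]
    for mz, ratio in pair_ratios(spectrum):
        print(f"light peak at m/z {mz}: heavy/light ratio = {ratio}")
```

In this toy spectrum the first peptide comes out twice as abundant in the heavy-labeled sample and the second three times less abundant, which is the kind of relative change between two cell states that the method is meant to reveal.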
What
are the major bottlenecks hindering the advance of proteomics?
Right now every technique has its bottlenecks, or
its limitations. One obvious one at the moment is in our capacity to
analyze data. We can now generate huge amounts of data, and currently
there is an enormous challenge to figure out how to actually analyze
this data and generate real biological insights. That’s not to say
that we don’t need better and faster ways to actually acquire the
data, but in practical terms the major bottlenecks today are related to
data analysis. The whole idea behind the Institute for Systems Biology
is to create an environment where computer scientists and biologists and
the people who collect data can work closely together, so they can
develop the necessary analytical tools that will help interpret the data
and help put them in biological context. I think that’s absolutely
necessary and, actually, it’s been very successful so far.
What
do you think are realistic accomplishments that we can expect from this
proteomics/systems-biology effort in the short term, say, the next five
years?
Well, in five years I hope that we will be able
to very easily and very quickly and with very good precision analyze
proteomes of any complexity. I think that within five years, for
instance, we will be able to do clinical studies using proteomic
approaches. Say you go out and collect blood samples or serum samples or
spinal fluid samples or tissue samples from biopsies of a large number
of patients and controls, and then you go through the samples and
extract new insights into which proteins might be of diagnostic value
for a particular disease. I think that’s entirely feasible and will
probably happen within the next few years. That’s a near-term goal and
it is achievable.
Another goal that I think is not quite so near-term but is certainly
achievable or, at least, worth the attempt, is to use proteomics to
eventually get enough information that we can understand how different
biological processes are controlled within the cell and how they
interconnect with each other. Eventually I think we will have sufficient
data that we can construct computer models of how cells operate.
Are
you talking about one pathway within a cell, or all pathways?
All pathways. That is really an explicit goal of
proteomics. That’s what we mean when we talk about systems biology.
The idea is to go beyond one pathway to study the interconnectedness of
different pathways concurrently active in a cell. There is a lot of
evidence indicating that cells don’t operate in isolated pathways and
processes, but that all processes are interlinked and co-regulated. For
example, it makes absolutely no sense for a cell to start dividing if it
doesn’t have sufficient energy to do that, and doesn’t have
sufficient molecules, say, to build new DNA strands, which are a
requirement for cell division. So cell division is coupled to
availability of energy, to availability of precursors to build DNA,
proteins, and other biopolymers, etc. So the cell has evolved checkpoints to
make sure a particular status has been reached, and only then will it
proceed. This is a clear indication that processes that we traditionally
studied as isolated, independent pathways are actually all connected and
interdependent in a living cell. That’s what systems biology and
proteomics and also genomics are trying to address: Can we get a global
overview that shows us all this interconnectivity and dependency?
The problem, of course, is the data-analysis challenge that I
mentioned, which is enormous. Then there are the specific challenges
solely related to proteins. So far I’ve just discussed the analysis of
proteins as polypeptides. But proteins are actually more than that. The
essence of a protein is a polypeptide chain. But these are modified and
processed in complicated ways as they are synthesized. We are just
starting to see techniques that allow us to also systematically study
the processing and modification events to which proteins are subjected—glycosylation,
phosphorylation, lipid attachments to the proteins. More than 200
different types of modifications have been described in the literature.
It will be extremely important to study these modifications on something
like a proteomic scale and in a quantitative way. How do these
modifications change over the lifetime of the proteins and how do they
affect their function? Methods to ask such questions are just beginning
to appear.
Then proteins interact with other molecules, with DNA, with other
proteins and lipids. There is a huge amount of important information
contained in these interaction maps and protein networks, and, again,
techniques to systematically analyze these interactions are just
beginning to be developed or to become useful. And so there are enormous technical
challenges that will have to be solved in this field.
It
sounds like the more you achieve, the greater the data-analysis problems
you create. Is that accurate?
Yes, data analysis will get infinitely more
complicated. Somehow we have to figure out how to take the data that
result from all these different types of measurements and integrate them
together, and that’s largely an unsolved problem. On the other hand,
the most fun thing about this work is that we are constantly doing
things that have never been done before or that we never thought were
possible.
Science Watch®, May/June 2004, Vol. 15, No. 3
Citing URL: http://www.sciencewatch.com/may-june2004/sw_may-june2004_page3.htm