Michael Elad Talks About the field of Sparse Approximation

Fast Breaking Commentary, October 2010

Michael Elad

Article: From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images


Authors: Bruckstein, AM;Donoho, DL;Elad, M
Journal: SIAM REV, Volume: 51, Issue: 1, Page: 34-81, Year: MAR 2009
* Technion Israel Inst Technol, Dept Comp Sci, IL-32000 Haifa, Israel.
* Technion Israel Inst Technol, Dept Comp Sci, IL-32000 Haifa, Israel.
* Stanford Univ, Dept Stat, Stanford, CA 94305 USA.

Michael Elad talks with ScienceWatch.com and answers a few questions about this month's Fast Breaking Paper paper in the field of Mathematics.


SW: Why do you think your paper is highly cited?

There are several reasons for the high-citation count we get.

First and foremost, this paper is a review, providing a wide-angle view of a rapidly growing field of research, called "Sparse Approximation" or "Sparse and Redundant Representation Modeling." This topic has grown dramatically in the past decade, with many papers offering surprising theoretical results, interesting numerical algorithms, and many applications in various disciplines that benefit from this theory. Therefore, there are a lot of researchers, all around the world, who are directly interested in this field, and need this review paper to give them better context to their research work.

Second, the paper proposes an appealing and compelling story about the path from theory to applications in this field. We show that a long series of seemingly different papers in this arena are in fact all tied, serving one clear objective, which is modeling of low-dimensional data embedded in high-dimensional space.

An illustration of the performance obtained using this model for compression of face images - Work done by Ori Bryt and Michael Elad. The results obtained with the JPEG and JPEG-2000 algorithms are compared to the ones obtained using trained dictionaries and sparse approximation. The numbers appearing on each picture are the (per pixel root-mean-squared-) compression-decompression errors.
An illustration of the performance obtained using this model for compression of face images - Work done by Ori Bryt and Michael Elad. The results obtained with the JPEG and JPEG-2000 algorithms are compared to the ones obtained using trained dictionaries and sparse approximation. The numbers appearing on each picture are the (per pixel root-mean-squared-) compression-decompression errors.

View this and two additional images with descriptions in the tabs below.

Third, in 2006, a topic called "Compressive-Sampling" or "Compressed-Sensing" (CS) emerged from the above-mentioned field. It was conceived in parallel by David Donoho | see also (Stanford), and by Emmanuel Candes (Stanford, previously CalTech) and Terrence Tao (UCLA, Field's medalist). There is a lot of confusion between the general "Sparse Approximation" theory and CS that suggests a new sampling paradigm, as the two build on the same axiomatic assumptions about sparsity of data to be handled. CS has contributed a lot to the general interest in Sparse Approximation, which may also explain the high citations we get.

One last thing, and this has to be said: One of the authors of the paper is David Donoho. Prof. Donoho is a worldwide leading mathematician and statistician, and his work is followed closely by many researchers all around the world.

SW: Does it describe a new discovery, methodology, or synthesis of knowledge?

As a review work, the paper mentions and refines many recent discoveries (both theoretical and applicative) and puts order to them. The novelty of this paper is in the way these are connected to tell one coherent story about this field. This is important for researchers, as it clarifies the research activity covered by hundreds of papers, showing major trends and ideas.

SW: Would you summarize the significance of your paper in layman’s terms?

In a nutshell: You are given a set of 100 "numbers" obtained from some data source—it could be a piece of an image, a segment of an audio track, financial or traffic information, etc. The main idea in this field is the understanding or assumption that such set of numbers (and many like it from the same source) do not fill the 100-dimensional space. Rather, this data tends to concentrate on a much lower-dimensional manifold embedded in the 100-dimensional space.

It turns out that if we know this manifold shape, many tasks can be solved efficiently. These include cleaning of noise, filling-in missing values (interpolating), recovering the data form various degradations, compressing this information, recognition, detection of anomalies, separation, classification, and more. In the past two decades there is a growing interest in "manifold learning," i.e., methods that aim to grasp some knowledge about the data manifold from examples.

The field of "Sparse Approximation" offers a mathematical model that tries to describe this low-dimensional data structure. The model suggests a use of a dictionary—a set of prototype data vectors, each of length 100 in our case—which we refer to as "atoms." The model assumption is that every data vector from the source could, in principle, be described as a linear combination of FEW atoms from the dictionary. This way, we provide a "chemistry" interpretation for the data—the data vector is considered as a molecule, and it is decomposed into its few building fundamental elements.

The field of "Sparse Approximation" deals with ways to perform atom decomposition, namely finding the atoms building the data vector. This seemingly simple problem is in fact NP-Hard, and approximation algorithms are used to obtain possible solutions. One of the most surprising things in the field of "sparse approximation" is the ability to claim that such approximation algorithms are guaranteed to find the ideal decomposition, provided that there are only few atoms involved in the mixture.

Another key problem dealt with in this field is the identification of the dictionary from a large corpus data set, also known as dictionary learning. Beyond these, there is the obvious interest in ways to harness all this into applications in data, signal, and image processing.

SW: How did you become involved in this research, and how would you describe the particular challenges, setbacks, and successes that you've encountered along the way?

"...this paper is a review one, providing a wide-angle view of a rapidly growing field of research, called "Sparse Approximation" or "Sparse and Redundant Representation Modeling".

In the year 2000, David Donoho came to visit the Technion (Haifa, Israel) and gave a lecture on his findings in this emerging field. I could not attend this talk, but a good friend and close colleague of mine, Freddy Bruckstein (a co-author) attended it and told me about it with a lot of excitement. Freddy felt that one of the questions posed in the lecture was not answered well, and he suggested that we work on it.

At the time, I was working in a high-tech company, managing a team of 25 researchers and engineers (it was three years after my Ph.D. graduation). Still, I kept my interest in basic research and allocated time for it, and we started working together on this problem.

Half a year later, we had a new and exciting result, and we submitted it for publication. This was the first among many papers that I wrote in this field. In parallel, I felt that the work in the industry did not appeal to me anymore, and I decided to go for a post-doc, with the (overly optimistic) hope to take an academic position at the Technion later on.

Naturally, I contacted David Donoho and expressed my interest to work with him. I spent two amazing and very productive years in Stanford University, working with David Donoho, Gene Golub, and Peyman Milanfar (UC Santa Cruz), and much of what I am today as a researcher started back then, in that post-doc period.

SW: Where do you see your research leading in the future?

The knowledge accumulated so far is amazing, given the fact that this field is so young. I believe that there are three main tracks in which this field will probably expand in the new future:

1. Improving the model in order to better describe a given data set. Among other things, this will require introduction of connections between the atoms as part of the model, the ability to handle very high-dimensional data using multi-scale techniques. It is also very likely that we will see new and alternative models emerging, completing with the existing ones.

2. Dictionary learning is a prominent part of using this model. There are various dictionary learning algorithms, and much work is needed to theoretically understand their behavior and guarantees. On the more practical side, new algorithms will have to be developed to handle high-dimensional data. I expect a convergence between this task and wavelet theory and harmonic analysis.

3. New applications are entering this field every day, and with them, new theoretical and numerical ideas are brought to this field. In the past few years we see a use of sparse approximation in medical imaging, computer-vision, and machine-learning. This trend is important and will play a vital role in expanding this field and its effect.End

Michael Elad, Professor
Computer Science Department
The Technion - Israel Institute of Technology
Haifa, Israel


ADDITIONAL INFORMATION:

  • View a August 2006 Emerging Research Front commentary, and a March 2002 interview with David Donoho.

KEYWORDS: ROBUST UNCERTAINTY PRINCIPLES; LARGE UNDERDETERMINED SYSTEMS; MINIMAL L(1)-NORM SOLUTION; NONLINEAR APPROXIMATION; GREEDY ALGORITHMS; WAVELET-DOMAIN; LEAST-SQUARES; BASIS PURSUIT; OVERCOMPLETE REPRESENTATIONS; REDUNDANT REPRESENTATIONS.

 
Click the tab(s) above to view fgures and descriptions.

Figure 1:

This figures illustrates how the model would operate on image patches - every given patch is assumed to be a linear combination of few (three in this case) atom-patches. A dictionary of 256 atoms is used to describe every given patch this way.

Figure 1: This figures illustrates how the model would operate on image patches - every given patch is assumed to be a linear combination of few (three in this case) atom-patches. A dictionary of 256 atoms is used to describe every given patch this way.

Figure 2:

The field of "Sparse & Redundant Representation Modeling" is based on a simple underdetermined linear system of equations. The columns of D are the atoms, the vector x stands for the given data, and we are interested to find the sparsest possible solution to this system, i.e., the ones with the fewest non-zeros. This problem is NP-hard in general, but there are approximation algorithms with provable performance guarantees.

Figure 2: The field of "Sparse & Redundant Representation Modeling" is based on a simple underdetermined linear system of equations. The columns of D are the atoms, the vector x stands for the given data, and we are interested to find the sparsest possible solution to this system, i.e., the ones with the fewest non-zeros. This problem is NP-hard in general, but there are approximation algorithms with provable performance guarantees.

Figure 3:

An illustration of the performance obtained using this model for compression of face images - Work done by Ori Bryt and Michael Elad. The results obtained with the JPEG and JPEG-2000 algorithms are compared to the ones obtained using trained dictionaries and sparse approximation. The numbers appearing on each picture are the (per pixel root-mean-squared-) compression-decompression errors.

Figure 3: An illustration of the performance obtained using this model for compression of face images - Work done by Ori Bryt and Michael Elad. The results obtained with the JPEG and JPEG-2000 algorithms are compared to the ones obtained using trained dictionaries and sparse approximation. The numbers appearing on each picture are the (per pixel root-mean-squared-) compression-decompression errors.

 

   |   BACK TO TOP