## Michael Elad Talks About the field of Sparse Approximation

#### Fast Breaking Commentary, October 2010

Article: From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images
Authors: Bruckstein, AM;Donoho, DL; |

Michael Elad talks with ScienceWatch.com and answers a few questions about this month's Fast Breaking Paper paper in the field of Mathematics.

**Why do you think your paper is highly
cited?**

There are several reasons for the high-citation count we get.

First and foremost, this paper is a review, providing a wide-angle view of a rapidly growing field of research, called "Sparse Approximation" or "Sparse and Redundant Representation Modeling." This topic has grown dramatically in the past decade, with many papers offering surprising theoretical results, interesting numerical algorithms, and many applications in various disciplines that benefit from this theory. Therefore, there are a lot of researchers, all around the world, who are directly interested in this field, and need this review paper to give them better context to their research work.

Second, the paper proposes an appealing and compelling story about the path from theory to applications in this field. We show that a long series of seemingly different papers in this arena are in fact all tied, serving one clear objective, which is modeling of low-dimensional data embedded in high-dimensional space.

An illustration of the performance obtained using this
model for compression of face images - Work done by Ori Bryt and Michael
Elad. The results obtained with the JPEG and JPEG-2000 algorithms are
compared to the ones obtained using trained dictionaries and sparse
approximation. The numbers appearing on each picture are the (per pixel
root-mean-squared-) compression-decompression errors.

View this and two additional images with descriptions
in the tabs below.

Third, in 2006, a topic called "Compressive-Sampling" or "Compressed-Sensing" (CS) emerged from the above-mentioned field. It was conceived in parallel by David Donoho | see also (Stanford), and by Emmanuel Candes (Stanford, previously CalTech) and Terrence Tao (UCLA, Field's medalist). There is a lot of confusion between the general "Sparse Approximation" theory and CS that suggests a new sampling paradigm, as the two build on the same axiomatic assumptions about sparsity of data to be handled. CS has contributed a lot to the general interest in Sparse Approximation, which may also explain the high citations we get.

One last thing, and this has to be said: One of the authors of the paper is David Donoho. Prof. Donoho is a worldwide leading mathematician and statistician, and his work is followed closely by many researchers all around the world.

**Does it describe a new discovery, methodology, or
synthesis of knowledge?**

As a review work, the paper mentions and refines many recent discoveries (both theoretical and applicative) and puts order to them. The novelty of this paper is in the way these are connected to tell one coherent story about this field. This is important for researchers, as it clarifies the research activity covered by hundreds of papers, showing major trends and ideas.

**Would you summarize the significance of your paper
in layman’s terms?**

In a nutshell: You are given a set of 100 "numbers" obtained from some data source—it could be a piece of an image, a segment of an audio track, financial or traffic information, etc. The main idea in this field is the understanding or assumption that such set of numbers (and many like it from the same source) do not fill the 100-dimensional space. Rather, this data tends to concentrate on a much lower-dimensional manifold embedded in the 100-dimensional space.

It turns out that if we know this manifold shape, many tasks can be solved efficiently. These include cleaning of noise, filling-in missing values (interpolating), recovering the data form various degradations, compressing this information, recognition, detection of anomalies, separation, classification, and more. In the past two decades there is a growing interest in "manifold learning," i.e., methods that aim to grasp some knowledge about the data manifold from examples.

The field of "Sparse Approximation" offers a mathematical model that tries to describe this low-dimensional data structure. The model suggests a use of a dictionary—a set of prototype data vectors, each of length 100 in our case—which we refer to as "atoms." The model assumption is that every data vector from the source could, in principle, be described as a linear combination of FEW atoms from the dictionary. This way, we provide a "chemistry" interpretation for the data—the data vector is considered as a molecule, and it is decomposed into its few building fundamental elements.

The field of "Sparse Approximation" deals with ways to perform atom decomposition, namely finding the atoms building the data vector. This seemingly simple problem is in fact NP-Hard, and approximation algorithms are used to obtain possible solutions. One of the most surprising things in the field of "sparse approximation" is the ability to claim that such approximation algorithms are guaranteed to find the ideal decomposition, provided that there are only few atoms involved in the mixture.

Another key problem dealt with in this field is the identification of the dictionary from a large corpus data set, also known as dictionary learning. Beyond these, there is the obvious interest in ways to harness all this into applications in data, signal, and image processing.

**How did you become involved in this research, and
how would you describe the particular challenges, setbacks, and
successes that you've encountered along the way?**

"...this paper is a review one, providing a wide-angle view of a rapidly growing field of research, called "Sparse Approximation" or "Sparse and Redundant Representation Modeling".

In the year 2000, David Donoho came to visit the Technion (Haifa, Israel) and gave a lecture on his findings in this emerging field. I could not attend this talk, but a good friend and close colleague of mine, Freddy Bruckstein (a co-author) attended it and told me about it with a lot of excitement. Freddy felt that one of the questions posed in the lecture was not answered well, and he suggested that we work on it.

At the time, I was working in a high-tech company, managing a team of 25 researchers and engineers (it was three years after my Ph.D. graduation). Still, I kept my interest in basic research and allocated time for it, and we started working together on this problem.

Half a year later, we had a new and exciting result, and we submitted it for publication. This was the first among many papers that I wrote in this field. In parallel, I felt that the work in the industry did not appeal to me anymore, and I decided to go for a post-doc, with the (overly optimistic) hope to take an academic position at the Technion later on.

Naturally, I contacted David Donoho and expressed my interest to work with him. I spent two amazing and very productive years in Stanford University, working with David Donoho, Gene Golub, and Peyman Milanfar (UC Santa Cruz), and much of what I am today as a researcher started back then, in that post-doc period.

**Where do you see your research leading in the
future?**

The knowledge accumulated so far is amazing, given the fact that this field is so young. I believe that there are three main tracks in which this field will probably expand in the new future:

1. Improving the model in order to better describe a given data set. Among other things, this will require introduction of connections between the atoms as part of the model, the ability to handle very high-dimensional data using multi-scale techniques. It is also very likely that we will see new and alternative models emerging, completing with the existing ones.

2. Dictionary learning is a prominent part of using this model. There are various dictionary learning algorithms, and much work is needed to theoretically understand their behavior and guarantees. On the more practical side, new algorithms will have to be developed to handle high-dimensional data. I expect a convergence between this task and wavelet theory and harmonic analysis.

3. New applications are entering this field every day, and with them, new theoretical and numerical ideas are brought to this field. In the past few years we see a use of sparse approximation in medical imaging, computer-vision, and machine-learning. This trend is important and will play a vital role in expanding this field and its effect.

**Michael Elad, Professor**

**Computer Science Department**

**The Technion - Israel Institute of Technology**

**Haifa, Israel**

ADDITIONAL INFORMATION:

- View a August 2006 Emerging Research Front commentary, and a March 2002 interview with David Donoho.

KEYWORDS: ROBUST UNCERTAINTY PRINCIPLES; LARGE UNDERDETERMINED SYSTEMS; MINIMAL L(1)-NORM SOLUTION; NONLINEAR APPROXIMATION; GREEDY ALGORITHMS; WAVELET-DOMAIN; LEAST-SQUARES; BASIS PURSUIT; OVERCOMPLETE REPRESENTATIONS; REDUNDANT REPRESENTATIONS.

**Figure 1:**

**Figure 1:** This figures illustrates how the model would
operate on image patches - every given patch is assumed to be a linear
combination of few (three in this case) atom-patches. A dictionary of 256
atoms is used to describe every given patch this way.

**Figure 2:**

**Figure 2:** The field of "Sparse & Redundant
Representation Modeling" is based on a simple underdetermined linear system
of equations. The columns of **D** are the atoms, the vector x
stands for the given data, and we are interested to find the
**sparsest** possible solution to this system, i.e., the ones
with the fewest non-zeros. This problem is NP-hard in general, but there
are approximation algorithms with provable performance guarantees.

**Figure 3:**

**Figure 3:** An illustration of the performance obtained
using this model for compression of face images - Work done by Ori Bryt and
Michael Elad. The results obtained with the JPEG and JPEG-2000 algorithms
are compared to the ones obtained using trained dictionaries and sparse
approximation. The numbers appearing on each picture are the (per pixel
root-mean-squared-) compression-decompression errors.