Monday, January 05, 2009

CS: A small discussion with Dror Baron


Following up on the previous entry, I asked Dror Baron a few questions about how the matrices used in that paper compare to other sparse measurement matrices:
In your paper, you mention "similar matrices were also proposed for CS by Berinde and Indyk [34] (as featured in this wiki).", can you enlighten me about the actual differences between the two approaches? Piotr also indicated to me that their table featured here contained all the bounds for sparse recovery. How do your results fit in that table?
Dror responded with:
...First, let me tell you about LDPC codes. These channel codes come very close to Shannon's channel capacity. They have had tremendous theoretical and practical impact, and are used in all kinds of wireless systems. The goal is to reliably communicate a message, which consists of bits, from the encoder to the decoder. The encoder multiplies the input (a bit-vector) by a binary matrix that is dominated by zeros and has a small percentage of ones (over the binary field, only 0 and 1 values are allowed). The sparsity of the matrix is very useful for fast encoding. In the decoder, LDPC codes construct a bipartite graph that relates the "measurements" (channel output) and the estimated transmitted bits. Messages are iteratively relayed between the two parts of the graph, where each message corresponds to the marginal distribution of one of the nodes. This technique is called "belief propagation" and is used in our current work.
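[Note from Igor: for readers who want to see what the encoding step described above looks like in code, here is a minimal sketch in Python. The matrix sizes, the number of ones per column and the random construction are toy assumptions made purely for illustration; actual LDPC codes use carefully designed, much larger matrices, and this is not code from Dror's paper.]

```python
# Minimal sketch of LDPC-style encoding: multiply a bit-vector by a sparse
# binary matrix over the binary field (mod-2 arithmetic).
import numpy as np

rng = np.random.default_rng(0)

n_bits = 16          # length of the message (illustrative)
n_checks = 8         # number of parity checks / "measurements" (illustrative)
ones_per_column = 3  # the matrix is dominated by zeros

# Build a sparse 0/1 matrix: each column gets a few ones at random positions.
H = np.zeros((n_checks, n_bits), dtype=np.uint8)
for col in range(n_bits):
    rows = rng.choice(n_checks, size=ones_per_column, replace=False)
    H[rows, col] = 1

x = rng.integers(0, 2, size=n_bits, dtype=np.uint8)  # message bits

# Encoding is a matrix-vector product over GF(2); the sparsity of H keeps it cheap.
y = (H @ x) % 2
print(y)
```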

My research at Rice was heavily influenced by information theory:

1. Distributed CS: our models contain information theoretic concepts such as "sparsity rate" (the number of large coefficients scales with the size of the problem) and "measurement rate" (the number of measurements also scales linearly).

2. CS reconstruction: we identified LDPC codes as something that we could emulate for CS. First we introduced "sudocodes" for decoding strictly sparse signals from noiseless measurements; in that paper, the CS encoding matrix is an LDPC matrix. Next, in the current paper (preliminary results appeared in a technical report from 2006) we consider more general signal and noise models and use belief propagation for decoding.

3. Bounds on the number of measurements ("Measurements vs bits" paper, [Note from Igor: the video presenting this work is here]) use the source-channel separation principle of information theory.

Indyk-Berinde also used an LDPC encoding matrix, but the decoding technique is different. Their papers in early 2008 use ell_1; their more recent papers use other techniques too. Wainwright, Wei, and Ramchandran also use similar encoding matrices. These matrices may have poor RIP performance, yet they nonetheless reconstruct well and can be used in very fast algorithms.
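[Note from Igor: here is a hedged sketch of the kind of sparse 0/1 measurement matrix being discussed, applied to a strictly sparse real-valued signal. The sizes N, M, K and the column weight R are made up for illustration, and no decoder (belief propagation, ell_1 or otherwise) is shown; this is not the exact construction used in any of the papers above.]

```python
# Sketch: CS-style measurement with a sparse binary matrix (LDPC-like).
import numpy as np

rng = np.random.default_rng(1)

N = 1000   # signal length (illustrative)
M = 200    # number of measurements (illustrative)
K = 10     # number of nonzero coefficients (strictly sparse signal)
R = 8      # nonzeros per column, playing the role of O(log N)

# Sparse 0/1 measurement matrix: R ones in each column.
A = np.zeros((M, N))
for col in range(N):
    A[rng.choice(M, size=R, replace=False), col] = 1.0

# Strictly sparse signal with K random spikes.
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.standard_normal(K)

# Measurements; the sparse matrix makes this product very cheap.
y = A @ x
```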

Regarding the table of Berinde, Gilbert, Indyk, Karloff and Strauss, I can definitely make the following statements:

a. Random (poor; see remark later)
b. Length: K*logN (good)
c. Encoder: N*logN (good)
d. Update time: each column contains R=O(logN) nonzero entries, and so an update is O(logN) (good).
e. Recovery: N*log^2(N) (excellent)
f. Approx: I don't know if I can make such statements. Our Bayesian inference is not a geometric approach. That said, there might be geometric properties, because the Gaussian priors imply quadratic metrics. But a serious answer would require careful consideration.

Note that I believe that R=O(log(N/K)) can be used in our paper, thus providing (b) length K*log(N/K) (excellent); (c) encoder N*log(N/K) (excellent); and (d) update R=O(log(N/K)) (excellent).
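[Note from Igor: a small sketch of point (d) above, assuming the same kind of toy sparse 0/1 matrix as in the earlier sketch: because each column carries only R nonzero entries, changing a single signal coefficient means updating only R of the M measurements. The constants below are illustrative, not from the paper.]

```python
# Sketch of the O(log N) update time: with R nonzeros per column, changing one
# signal entry requires touching only R measurements, not all M of them.
import numpy as np

rng = np.random.default_rng(2)
M, N, R = 200, 1000, 8                      # illustrative sizes; R plays O(log N)

A = np.zeros((M, N))
for col in range(N):
    A[rng.choice(M, size=R, replace=False), col] = 1.0

x = np.zeros(N)
x[rng.choice(N, size=10, replace=False)] = rng.standard_normal(10)
y = A @ x

# Change a single entry of the signal.
j, delta = 42, 0.5
rows = np.flatnonzero(A[:, j])              # exactly R row indices
y[rows] += A[rows, j] * delta               # O(R) update instead of recomputing A @ x
x[j] += delta

assert np.allclose(y, A @ x)                # sanity check on the incremental update
```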

My main comment relates to the comparison between deterministic and random settings.

Take a look at our Figure 4 and how our technique reconstructs better than ell_1 and CoSaMP. Reasonable people use the information they have at their disposal to improve their performance. A deterministic approach isn't good if its reconstruction quality is poor. Our Bayesian ("random") approach may do a bit worse if it uses a wrong model (see our Figure 6). A broader perspective is important; there is no single answer to this. When you have statistical information about your signal, you will seriously consider a Bayesian approach, and our algorithm is good in that category.
I then pressed with:
You said: "...These matrices may have poor RIP performance, yet they nonetheless reconstruct well and can be used in very fast algorithms...". In the work of Berinde, Gilbert, Indyk, Karloff and Strauss, they define the RIP-1 condition as opposed to RIP-2 for these sparse matrices. Irrespective of having this property, I am personally wondering how the Juditsky-Nemirovski (JN) condition fares in providing some measure of how good those matrices are.

Dror kindly responded with:

...It seems that they are examining properties for a matrix to be good with ell_1 reconstruction. ell_1 was the first breakthrough in CS recovery, but it is not necessarily the best way to do things. While it is important to develop an understanding of why ell_1 works, it is even more important to develop algorithms that work even better. For example, I like papers like Wakin, Boyd and Candes (weighted ell_1); it gives better MMSE performance with less computation. Of course, within the Bayesian setting I like our CS-BP too...
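[Note from Igor: for the curious, here is a minimal sketch of the reweighted ell_1 idea mentioned above, written with cvxpy. The problem sizes, the weight update rule with a small eps, and the number of iterations are all illustrative assumptions on my part, not the authors' code.]

```python
# Sketch of reweighted ell_1 minimization: repeatedly solve a weighted ell_1
# problem, with weights ~ 1/(|x_i| + eps) taken from the previous iterate.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
M, N, K = 40, 100, 5                          # illustrative sizes

A = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
y = A @ x_true

w = np.ones(N)                                # first pass is plain ell_1
eps = 1e-3
for _ in range(4):                            # a few reweighting iterations
    x = cp.Variable(N)
    prob = cp.Problem(cp.Minimize(cp.sum(cp.multiply(w, cp.abs(x)))),
                      [A @ x == y])
    prob.solve()
    x_hat = x.value
    w = 1.0 / (np.abs(x_hat) + eps)           # large weights push small entries to zero

print(np.linalg.norm(x_hat - x_true))         # reconstruction error
```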


Thanks Dror!

Credit Photo: NASA/JPL/Space Science Institute. Saturn on Jan 2, 2009.
