## Thursday, July 31, 2008

### Hyperspectral Non Negative Tensor Factorization and Dictionary Learning

Here is a use of Non Negative Tensor Factorization that does Dictionary learning. The dictionary is then compared to actual measurements performed by a hyperspectral telescope in Maui.

Tensor Methods for Hyperspectral Data Processing: A Space Object Identification Study

Qiang Zhang, Han Wang, Robert J. Plemmons, and V. Paul Pauca

An important problem arising in hyperspectral image data applications is to identify materials present in the object or scene being imaged and to quantify their abundance in the mixture. Due to the increasing quantity of data usually encountered in hyperspectral datasets, effective data compression is also an important consideration. In this paper, we develop novel methods based on tensor analysis that focus on all three of these goals: material identification, material abundance estimation, and data compression. Test results are reported for Space Object Identification (SOI).

Some part of this work was mentioned and explained before here. Not bad uh!

## Wednesday, July 30, 2008

### If you are looking at the Moon from this point of view, you are in a lot of trouble!

The whole video made by snapshot of Deep Impact/Epoxi looking back at the Earth-Moon system over a small period of time can be found here. Enjoy!

Via the Planetary Society blog.

Video Credit: Donald J. Lindler, Sigma Space Corporation and NASA/JPL-Caltech/GSFC/UMD

CREDITS: We extend special thanks to the EPOXI flight team at NASA's Jet Propulsion Laboratory for operation of the spacecraft, for design and implementation of the Earth observing sequences, and for downlink of the data. We also thank Lucy McFadden and Tilak Hewagama (UMD) and Timothy Livengood (USRA), for image deconvolutions and color co-registration on preliminary renderings of the lunar transit data.

## Monday, July 28, 2008

### CS: A 3D Illumination Compressive Sensing Camera System ?

Instead of using one photodiode in the illumination based Compressive Sensing Camera, what about using two or more single photodiodes at different locations so that they provide a 3D description of the Stanford Bunny being illuminated ? Since we are solving for A x_1 = b_1 for the first photodiode and A x_2 = b_2 for the second photodiode, then one can also solve for A x_1 = b_1 and A (x_2 - x_1) = (b_2 - b_1) and expect a much faster resolution for the second problem as x_2 - x_1 is likely to be very sparse as mentioned in Yin Zhang's presentation featured here.

## Friday, July 25, 2008

### CS: Exotic Imaging Sampling, Amazing Shots

As some of you may have noted, the images used in this blog are generally coming from the space exploration. While we go about daily earth life, we generally do not realize that some robot is doing something for us in other worlds at the exact same moment. And some times, some of these shots are breathtaking. But I am also interested in exotic sampling schemes (see CS: Coded Mask Imagers: What are they good for ? The George Costanza "Do the Opposite" Sampling Scheme) in hardware, mostly because they are certainly conducive to irregular sampling and maybe even Compressive Sampling.

Before Phoenix landed on Mars, it was caught landing by the Mars Reconnaissance Orbiter's HiRISE camera (more here). Emily Lakdawalla has a detailed explanation on how this amazing shot was taken. But more importantly, she and Timothy Reed provide a good description of the HiRISE FPA.

The HiRISE focal plane array is composed of a series of small array (small rows featured in the photo above) sensor that use the movement of the satelitte to aid in the pushbroom sampling technique. The interesting thing is the use of different arrays when only one would be OK.

At a meeting last april, Gwendoline Blanchet of CNES presented a similar set up of the SPOT 5 satellite camera where two arrays have been built with a similar displacement. Even though there is no need for these two arrays up there, it is a nice capability to know about in case one would want to try some different sampling scheme. And for what I hear, they are looking into it.

In a different direction is the commercial use of stereoscopic technology as mentioned in Making a Modern 3-D, Movie Journey to the Center of the Earth 3-D shows off Hollywood's most advanced technology by Kate Greene who features the 3d PACE cameras used in the latest 3D movie. A nice introduction to the constraints of commercial entities. I note the following:

For all the apparent trouble of making a 3-D movie, the industry is investing in the technology. Lowry notes that this year, there have been six 3-D movies, and next year, about 17 3-D releases are planned. "It's in a state of growth that's quite remarkable," he says.

Credit Photo: NASA / JPL / UA / courtesy of Timothy Reed (HiRISE Optical Integration Team) via the Planetary Society Blog.

## Thursday, July 24, 2008

### Going to the Moon...not landing on the Moon.

You have until tomorrow to go here and leave your name so that it leaves with the Lunar Reconnaissance Orbiter.
The spacecraft will then orbit the Moon but will not land on it. If History is any indication, at end of life, it will be de-orbited and crashed on the Moon like we did with the Lunar Prospector nine years ago. For those concerned with ETs putting their hands on the list, the list of names will probably not survive.

## Wednesday, July 23, 2008

### The Center of Attention.

Using Wordle, it looks like RIC and Recovery are central here.

## Tuesday, July 22, 2008

### CS: Gravimetric Detection by Compressed Sensing

Here is an inverse problem using sparsity as a constraint alone: Gravimetric Detection by Compressed Sensing by Marina Meila, Caren Marzban, Ulvi Yurtsever. The beginning of the article goes with:
This paper sets forth a novel approach for solving geospatial inversion problems using cutting edge techniques from statistics. Assume for instance that we want to obtain the mass density function in a given domain of interest from gravitational field measurements on the boundary of that domain. This is a well-studied and difficult problem. In all except a few special cases, the inverse problem has multiple solutions, and additional constraints (physical, or problem related) are needed to regularize it and to select a single, plausible solution.

## Monday, July 21, 2008

### CS: Learning Sparse Representations for Audiovisual Signals

Gianluca Monaci
and Friedrich Sommer, submitted a poster at Computational and Systems Neuroscience 2008 (COSYNE08) conference entitled: Learning Sparse Representations for Audiovisual Signals. They try to build a model of the early integration of cross modal fusion of audio and visual signals.

## Thursday, July 17, 2008

### Compressed Sensing Community: Expanding and Accelerating our Current Understanding

A while back, a discussion went on with one of you. Let me just cut and paste the important part of the exchange:

I was thinking about a more formal software and empirical performance database or repository. You already have quite a collection of software and matrices. Perhaps it is time to begin to formalize, as a community, a suite of experiments/tests upon which to evaluate these algorithms, matrices, etc. I don't know what kind of computing and server resources you have but if you're interested, I'd like to explore doing this.

After I suggested that Sparco was beginning to do some of the things the reader mentioned (this was a little earlier than June), here is what the reader responded:

As for a repository, framework, whatever, I think that Sparco is a good start but it's skewed a bit towards matlab, specific problem instances, and specific recovery tasks. Also, it doesn't serve as a repository for community knowledge. What if I want to know how a certain measurement matrix construction works for all k-sparse signals with a certain noise model, with a certain recovery algorithm? What if I want a more global view than just recovery of a standard image? Furthermore, I think that we need an independent third-party to "host" such a venture .... Everyone reads your blog (whether they admit it or not) but you don't have a research agenda or yourself to promote :) (at least not in the way that others do...) I do think that Sparco is a good start if we add the ability to

1. generate your own signal types
2. generate your own measurement matrices
3. execute your own reconstruction algorithms
4. specify, then record and disseminate, your own experiments
5. gather, analyze, and mine the data from the above different "axes"

Perhaps even a further fork into different application areas---imaging being quite different from 1d problems.
It may seem I am a neutral third party, but I am not really. The emphasis of this blog is mostly about understanding CS better and enabling others to know more about it in a speedy fashion. For researchers that are publishing, this blog should also be the location of the most up-to-date information. I agree with the reader that in order to deal with what I would call the combinatorial amount of possibilities for CS (as mentioned here), we probably need to have either SPARCO or a SPARCO like capability to take a stab at the most promising possibilities. Michael Friedlander has done an excellent job at differentiating SPARCO and SPGL1 for instance and has made SPARCO as agnostic as possible when it comes to using a different reconstruction algorithm. In the past two months, I actually think that SPARCO has gone in the direction the reader asked. For the moment, I have not seen yet is the implementation of the combinatorial algorithms. I am being told the reason is because they are written in C. I am sure that can be taken care of using some matlab wrapping.

Finally, let us also remind ourselves that some of the readers are students that have plenty of times in their hands and want to contribute to the subject without being necessarily in the locations where CS papers are being written or with advisors who know much about CS. Even if their contribution is writing a SPARCO script to do the equivalent of comparison between different reconstruction capabilities, I would be absolutely glad to advertize their work as much as possible.

Any thoughts ?

## Wednesday, July 16, 2008

I am probably going to be unreachable internet-wise for the next two weeks. If for some reason you have a paper/ talk /video you would like to have featured here, please update your webpages and the robots should do the detecting. I have written some entries that should appear automatically over the next two weeks.

Here is a blog of interest: The Alternative Scientist is a blog
about alternative career options for scientists. There are many career paths for a scientist in addition to the traditional tenure track, and the goal of this blog is to provide a forum for open and honest discussions about the various possibilities.

## Tuesday, July 15, 2008

### CS: a Video in Portuguese, a Summer School in French, a Tutorial in Switzerland, an MMDS talk and better 3d reconstruction.

Found on the internets:

A video entitled: Compressive Sampling: A new Paradigm in Signal Acquisition in Portuguese, actually it is Reconstrução de imagens subamostradas (compressed sensing) by Mário Figueiredo. It was added to the video section on Compressed Sensing.

A Summer school in France: École d'Été annuelle en traitement du signal et des images in the village of Peyresq will feature a presentation by Stéphane Chrétien entitled: Un decodeur 'two stage' pour le compressed sensing ( A two-stage decoder for Compressed Sensing).

Résumé : Le probleme de compressed sensing de Candes et Tao consiste a retrouver un signal qui admet une representation parcimonieuse dans un dictionnaire connu au moyen d'un nombre limite de mesures. Le decodeur de plus employe actuellement est la relaxation l1 du probleme combinatoire NP-difficile consistant a chercher le vecteur le plus sparse satisfaisant un systeme de contraintes affines. Nous proposons d'etudier une nouvelle relaxation Two Stage tres efficace pour le probleme de compressed sensing. Cette relaxation est aussi performante en pratique que l'approche dite Reweighted l1 de Candes, Wakin et Boyd (Journal of Fourier Analysis and Applications, a paraitre) mais a l'avantage d'etre gouvernee par un parametre de reglage a l'interpretation "physique" claire et tres facile a choisir. Si le temps le permet, on montrera comment obtenir des relaxations plus fines encore par dualite lagrangienne.

It looks like an extension of his paper: An Alternating l_1 Relaxation for compressed sensing.

At EUSIPCO, there will be a tutorial session. On Monday, August 25th from 2:30 pm to 5:50 pm, one of them will feature: Theory and Applications of Compressive Sensing by Richard Baraniuk. The smmary reads:

Sensors, cameras, and imaging systems are under increasing pressure to accommodate ever larger and higher-dimensional data sets; ever faster capture, sampling, and processing rates; ever lower power consumption; communication over ever more difficult channels; and radically new sensing modalities. The foundation of today's digital data acquisition systems is the Shannon/Nyquist sampling theorem, which asserts that to avoid losing information when digitizing a signal or image, one must sample at least two times faster than the signal's bandwidth, at the so-called Nyquist rate. Unfortunately, the physical limitations of current sensing systems combined with inherently high Nyquist rates impose a performance brick wall to a large class of important and emerging applications. In digital image and video cameras, for instance, the Nyquist rate is so high that too many samples result, making compression by algorithm like JPEG or MPEG a necessity prior to storage or transmission. In imaging systems (medical scanners and radars) and high-speed analog-to-digital converters, increasing the sampling rate is very expensive or detrimental to a patient's health.

Compressive Sensing is a new approach to data acquisition in which analog signals are digitized for processing not via uniform sampling but via measurements using more general, even random, test functions. In stark contrast with conventional wisdom, the new theory asserts that one can combine "low-rate sampling" with digital computational power for efficient and accurate signal acquisition. Compressive sensing systems directly translate analog data into a compressed digital form; all we need to do is "decompress" the measured data through an optimization on a digital computer. The implications of compressive sensing are promising for many applications and enable the design of new kinds of analog-to-digital converters, cameras, and imaging systems.

This tutorial will overview the theory of compresive sensing, point out the important role played by the geometry of high-dimensional vector spaces, and discuss how the ideas can be applied in next-generation acquisition devices. Particular topics include sparse signal representations, convex optimization, random projections, the restricted isometry principle, the Johnson-Lindenstrauss lemma, Whitney's embedding theorem for manifolds, and applications to imaging systems, sensor networks, and analog-to-digital converters.

Here is a new technique competing with Make3D previously mentioned here before. It is featured in Closing the Loop on Scene Interpretation by Derek Hoiem, Alexei A. Efros, Martial Hebert. Here is a 3D reconstruction video comparing it with the older Photo Pop-up and Make3D. Unlike Make3D, the software is not available, so we can't really check the issue of out-of-this world renderings.

A new presentation at MMDS 2008. Workshop on Algorithms for Modern Massive Data Sets just showed up. It is that of Inderjit Dhillon entitled Rank Minimization via Online Learning.

Credit: NASA/JPL-Caltech/University of Arizona/Texas A&M. Poppy fields, sol 49.

## Thursday, July 10, 2008

### CS: Restricted isometry constants where lp-minimization can fail for p above 0 and less or equal to 1, and a talk at Rice.

We investigate conditions under which the solution of an underdetermined linear system with minimal ℓp norm, $0 \lt p \le 1$, is guaranteed to be also the sparsest one. Our results highlight the pessimistic nature of sparse recovery analysis when recovery is predicted based on the restricted isometry constants (RIC) of the associated matrix. We construct matrices with RIC δ2m arbitrarily close to 1/√2 ≈ 0.717 where sparse recovery with p = 1 fails for at least one m-sparse vector. This indicates that there is limited room for improving over the best known positive results of Foucart and Lai (mentioned here), which guarantee that ℓ1-minimisation recovers all m-sparse vectors for any matrix with $\delta_{2m} \lt 2 (3 -\sqrt{2})/7 = 0.4531$ . Another consequence of our construction is that recovery conditions expressed uniformly for all matrices in terms of RIC must require that all 2m-column submatrices are extremely well conditioned (condition numbers less than 2.5). In contrast, we also construct matrices with δ2m arbitrarily close to one and δ2m+1 = 1 where ℓ1-minimisation succeeds for any m-sparse vector. This illustrates the limits of RIC as a tool to predict the behaviour of ℓ1 minimisation. These constructions are a by-product of tight conditions for ℓp recovery ($0 \le p \le 1$) with matrices of unit spectral norm, which are expressed in terms of the minimal singular values of 2m-column submatrices. The results show that, compared to ℓ1-minimisation, ℓp-minimisation recovery failure is only slightly delayed in terms of the RIC values. Furthermore in this case the minimisation is nonconvex and it is important to consider the specific minimisation algorithm being used. We show that when ℓp optimisation is attempted using an iterative reweighted ℓ1 scheme, failure can still occur for $\delta_2m$ arbitrarily close to 1/√2.

Remi Gribonval also sent me the summarizing comments on the subject:
The main contributions are as follows:

1) Weakness of the prediction of l1 recovery in terms of RIC

We highlight the pessimistic nature of the RIC to predict sparse recovery by constructing matrices with RIC delta_2m ~ 0.717 where l1 minimization fails to recover at least one m-sparse vector, as well as tight frames with RIC delta_2m ~ 1 where l1 is nevertheless always successful.

2) Tight characterizations of lp recovery

Furthermore, we obtain tight characterizations of the success of lp recovery, $0 \lt p \le1$, in terms of asymetric versions of the RIC. The results show that lp-failure is only slightly delayed compared to l1 failure.

3) Iterative reweighted l1

Last, we prove that when an iterative re-weighted l1 scheme is used to attempt to solve the non-convex lp-minimization problem, failure can still occur for an RIC arbitrarily close to 0.717.

While this may seem in contradiction with empirical evidence indicating the substantial benefits of iterative re-weighted l1 over plain l1, this suggests that the typical performance of these algorithms is not fully captured by a worst case analysis based on the RIC, and a more subtle coefficient dependent analysis is required to fully understand it.

Presentation at Rice (Thursday, July 17, 2008 4:00 PM to 5:00 PM 1049 Duncan Hall, Rice University, 6100 Main St, Houston, Texas) by Rachel Ward entitled: Cross Validation in Compressed Sensing via the Johnson Lindenstrauss Lemma.

Compressed Sensing decoding algorithms aim to reconstruct an unknown N dimensional vector x from m with an assumed sparsity constraint on x. All algorithms presently are iterative in nature, producing a sequence of approximations $( s_1, s_2, ... s_j, ...)$ until a certain algorithm-specific stopping criterion is reached at iteration J, at which point the estimate x* = $s_J$ is returned as an approximation to x. In many algorithms, the error || x - x* || measured in the Hilbert norm is bounded above by a function of the error between x and the best k-term approximation to x. However, as x is unknown, such estimates provide no numerical bounds on the error.

In this talk, we demonstrate that tight numerical upper and lower bounds on the error ||x - s_j || for $j <= p$ iterations of a compressed sensing decoding algorithm are attainable by simply reserving 4 log p of the original m measurements to cross validate the sequence of approximants s_j obtained from the remaining measurements, using as underlying tool the Johnson-Lindenstrauss lemma. As a result, a numerical upper bound on the error between x and the best k-term approximation to x can be estimated with almost no cost. Our observation has applications outside of Compressed Sensing as well.

Credit: NASA/JPL-Caltech/University of Arizona/Texas A&M. Before Scraping Wonderland as seen from Phoenix, Sol 45.

### CS: Efficient Compressed Sensing using Lossless Expander Graphs with Fast Bilateral Quantum Recovery Algorithm

Here is a new twist. Using Quantum Computing algorithm to speed up CS measurements and reconstruction. It is featured in the following article:

Efficient Compressed Sensing using Lossless Expander Graphs with Fast Bilateral Quantum Recovery Algorithm by Sina Jafarpour. The abstract reads:

Compressed Sensing is a novel approach to bypass the Nyquist sampling limits whenever the signals are sparse, and to compress them simultaneously. In this paper, improving our previous results, we will propose a compressed sensing algorithm based on the high-quality lossless unbalanced vertex expander graphs, with a fast and simple quantum decoding algorithm. Exploiting the unique neighborhood property of the lossless expander graphs in combination with the efficient quantum algorithm for finding distinct elements among a set we will prove the validity and efficiency of the algorithm and will show how the combination of the lossless expander graphs and quantum recovery algorithm leads to a general compressed sensing framework which is as good as the previous results in terms of the sketch size, encoding time, update time, and having explicit construction; furthermore, it is superior to the previous recovery algorithms in both the recovery time and simplicity of implementation. Finally we will show how the algorithm can be modified to be robust for a well-structured family of almost sparse signals. The robust algorithm will first find the position of the largest elements of the signal, and then using this information finds efficiently an explicit sparse approximation for the original signal.

According to Sina's website, we are going to see more on that idea with "Quantum Computers can make Compressed Sensing Faster" a paper under preparation.

## Tuesday, July 08, 2008

### CS: MMDS papers, a CS page, A2I Converters, CS in Space.

I just stumbled upon the MMDS 2008. Workshop on Algorithms for Modern Massive Data Sets that took place at Stanford University on June 25–28, 2008. All the abstracts are located here. All the presentations are here. Among the ones that caught my interest:

Piotr Indyk has a presentation entitled Sparse recovery using sparse random matrices/ Or: Fast and Effective Linear Compression where he presents a joint work with: Radu Berinde, Anna Gilbert, Piotr Indyk, Howard Karloff, Milan Ruzic and Martin Strauss that was partially shown in Anna Gilbert's presentation at the L1 meeting at Texas A&M). The interesting new part of this presentation is the recent result with Milan Ruzic where it is shown that if a measurement matrix follows RIP-1, then OMP converges. The previous results was on LP-decoding only working. The abstract reads:

Over the recent years, a new *linear* method for compressing high-dimensional data (e.g., images) has been discovered. For any high-dimensional vector x, its *sketch* is equal to Ax, where A is an m x n matrix (possibly chosen at random). Although typically the sketch length m is much smaller than the number of dimensions n, the sketch contains enough information to recover an *approximation* to x. At the same time, the linearity of the sketching method is very convenient for many applications, such as data stream computing and compressed sensing.

The major sketching approaches can be classified as either combinatorial (using sparse sketching matrices) or geometric (using dense sketching matrices). They achieve different trade-offs, notably between the compression rate and the running time. Thus, it is desirable to understand the connections between them, with the ultimate goal of obtaining the "best of both worlds" solution.

In this talk we show that, in a sense, the combinatorial and geometric approaches are based on different manifestations of the same phenomenon. This connection will enable us to obtain several novel algorithms and constructions, which inherit advantages of sparse matrices, such as lower sketching and recovery times.

Also, Anna Gilbert had a presentation entitled Combinatorial Group Testing in Signal Recovery. The abstract reads:

Traditionally, group testing is a design problem. The goal is to construct an optimally efficient set of tests of items such that the test results contains enough information to determine a small subset of items of interest. It has its roots in the statistics community and was originally designed for the Selective Service to find and to remove men with syphilis from the draft. It appears in many forms, including coin-weighing problems, experimental designs, and public health. We are interested in both the design of tests and the design of an efficient algorithm that works with the tests to determine the group of interest. Many of the same techniques that are useful for designing tests are also used to solve algorithmic problems in analyzing and in recovering statistical quantities from streaming data. I will discuss some of these methods and briefly discuss several recent applications in high throughput drug screening.
Joint work with Radu Berinde, Piotr Indyk, Howard Karloff, Martin Strauss, Raghu Kainkaryam and Peter Woolf.

By the way, it's not a rat, it's a Chef.

Tong Zhang, An Adaptive Forward/Backward Greedy Algorithm for Learning Sparse Representations (the technical report is here: Forward-Backward Greedy Algorithm for Learning Sparse Representations).

Consider linear least squares regression where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. This problem is NP-hard. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever necessary. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory.

Nir Ailon, Efficient Dimension Reduction. The abstract reads:
The Johnson-Lindenstrauss dimension reduction idea using random projections was discovered in the early 80's. Since then many "computer science friendly" versions were published, offering constant factor but no big-Oh improvements in the runtime. Two years ago Ailon and Chazelle showed a nontrivial algorithm with the first asymptotic improvement, and suggested the question: What is the exact complexity of J-L computation from d dimensions to k dimensions? An O(d log d) upper bound is implied by A-C for k up to d^{1/3} (in the L2 to L2) case. In this talk I will show how to achieve this bound for k up to d^{1/2} combining techniques from the theories of error correcting codes and probability in Banach spaces. This is based on joint work with Edo Liberty.

Yoram Singer, Efficient Projection Algorithms for Learning Sparse Representations from High Dimensional Data.
Many machine learning tasks can be cast as constrained optimization problems. The talk focuses on efficient algorithms for learning tasks which are cast as optimization problems subject to L1 and hyper-box constraints. The end result are typically sparse and accurate models. We start with an overview of existing projection algorithms onto the simplex. We then describe a linear time projection for dense input spaces. Last, we describe a new efficient projection algorithm for very high dimensional spaces. We demonstrate the merits of the algorithm in experiments with
large scale image and text classification.

Kenneth Clarkson, Tighter Bounds for Random Projections of Manifolds (this is the report). The abstract reads:
The Johnson-Lindenstrauss random projection lemma gives a simple way to reduce the dimensionality of a set of points while approximately preserving their pairwise Euclidean distances. The most direct application of the lemma applies to a finite set of points, but recent work has extended the technique to affine subspaces, smooth manifolds, and sets of bounded doubling dimension; in each case, a projection to a sufficiently large dimension k implies that all pairwise distances are approximately preserved with high probability. Here the case of random projection of a smooth manifold (submanifold of R^m) is considered, and a previous analysis is sharpened, giving an upper bound for k that depends on the surface area, total absolute curvature, and a few other quantities associated with the manifold, and in particular does not depend on the ambient dimension m or the reach of the manifold.

Sanjoy Dasgupta, Random Projection Trees and Low Dimensional Manifolds. The abstract reads:
The curse of dimensionality has traditionally been the bane of nonparametric statistics (histograms, kernel density estimation, nearest neighbor search, and so on), as reflected in running times and convergence rates that are exponentially bad in the dimension. This problem is all the more pressing as data sets get increasingly high dimensional. Recently the field has been rejuvenated substantially, in part by the realization that a lot of real-world data which appears high-dimensional in fact has low "intrinsic" dimension in the sense of lying close to a low-dimensional manifold. In the past few years, there has been a huge interest in learning such manifolds from data, and then using the learned structure to transform the data into a lower dimensional space where standard statistical methods generically work better.

I'll exhibit a way to benefit from intrinsic low dimensionality without having to go to the trouble of explicitly learning its fine structure. Specifically, I'll present a simple variant of the ubiquitous k-d tree (a spatial data structure widely used in machine learning and statistics) that is provably adaptive to low dimensional structure. We call this a "random projection tree" (RP tree).

Along the way, I'll discuss different notions of intrinsic dimension -- motivated by manifolds, by local statistics, and by analysis on metric spaces -- and relate them. I'll then prove that RP trees require resources that depend only on these dimensions rather than the dimension of the space in which the data happens to be situated. This is work with Yoav Freund (UC San Diego).

Lars Kai Hansen, Generalization in High-Dimensional Matrix Factorization. The abstract reads:
While the generalization performance of high-dimensional principal component analysis is quite well understood, matrix factorizations like independent component analysis, non-negative matrix factorization, and clustering are less investigated for generalizability. I will review theoretical results for PCA and heuristics used to improve PCA test performance, and discuss extensions to high-dimensional ICA, NMF, and clustering.

Holly Jin (at LinkedIn !), Exploring Sparse NonNegative Matrix Factorization. The abstract reads:
We explore the use of basis pursuit denoising (BPDN) for sparse nonnegative matrix factorization (sparse NMF). A matrix A is approximated by low-rank factors UDV', where U and V are sparse with unit-norm columns, and D is diagonal. We use an active-set BPDN solver with warm starts to compute the rows of U and V in turn. (Friedlander and Hatz have developed a similar formulation for both matrices and tensors.) We present computational results and discuss the benefits of sparse NMF for some real matrix applications. This is joint work with Michael Saunders.

Compressed Counting and Stable Random Projections by Ping Li. The abstract reads:
The method of stable random projections has become a popular tool for dimension reduction, in particular, for efficiently computing pairwise distances in massive high-dimensional data (including dynamic streaming data) matrices, with many applications in data mining and machine learning such as clustering, nearest neighbors, kernel methods etc.. Closely related to stable random projections, Compressed Counting (CC) is recently developed to efficiently compute Lp frequency moments of a single dynamic data stream. CC exhibits a dramatic improvement over stable random projections when p is about 1. Applications of CC include estimating entropy moments of data streams and statistical parameter estimations in dynamic data using low memory.

Ronald Coifman, Diffusion Geometries and Harmonic Analysis on Data Sets. The abstract reads:
We discuss the emergence of self organization of data, either through eigenvectors of affinity operators, or equivalently through a multiscale ontology. We illustrate these ideas on images and audio files as well as molecular dynamics.

Thomas Blumensath and Mike Davies have set up a larger web page on Compressed Sensing and their attendant research in that field.

This one is a little old but it just came out on my radar screen and it is not on the Rice page yet. It features an A2I paper by Sami Kirolos, Tamer Ragheb, Jason Laska, Marco F. Duarte, Yehia Massoud and Richard Baraniuk entitled Practical Issues in Implementing Analog-to-Information Converters. The abstract reads:

The stability and programmability of digital signal processing systems has motivated engineers to move the analog-to-digital conversion (ADC) process closer and closer to the front end of many signal processing systems in order to perform as much processing as possible in the digital domain. Unfortunately, many important applications, including radar and communication systems, involve wideband signals that seriously stress modern ADCs; sampling these signals above the Nyquist rate is in some cases challenging and in others impossible. While wideband signals by definition have a large bandwidth, often the amount of information they carry per second is much lower; that is, they are compressible in some sense. The first contribution of this paper is a new framework for wideband signal acquisition purpose-built for compressible signals that enables sub-Nyquist data acquisition via an analog-to-information converter (AIC). The framework is based on the recently developed theory of compressive sensitng in which a small number of non-adaptive, randomized measurements are sufficient to reconstruct compressible signals. The second contribution of this paper is an AIC implementation design and study of the tradeoffs and nonidealities introduced by real hardware. The goal is to identify and optimize the parameters that dominate the overall system performance.
Finally, CS in Space

The eighth annual NASA Earth Science Technology Conference (ESTC2008) was held June 24-26. 2008, at the University of Maryland University College and showcased a wide array of technology research and development related to NASA's Earth science endeavors. The papers are here. Of particular interest is this presentation:

Novel Distributed Wavelet Transforms and Routing Algorithms for Efficient Data Gathering in Sensor Webs. Antonio Ortega, G. Shen S. Lee S.W. Lee S. Pattem A. Tu B. Krishnamachari, M. Cheng S. Dolinar A. Kiely M. Klimesh, H. Xie
on page 13, there is : Compressed Sensing for Sensor Networks.

The GRETSI conference occured a year ago. This is one of the papers (in French with an English abstract) entitled Quelques Applications du Compressed Sensing en Astronomie, David Mary and Olivier Michel. The abstract reads:
We investigate in this communication how recently introduced “ Compressed Sensing” methods can be applied to some important problems in observational Astronomy. The mathematical background is first outlined. Some examples are then described in stellar variability and in image reconstruction for space-based observations. We finally illustrate the interest of such techniques for direct imaging through stellar interferometry.

Nous proposons dans cette communication d'évaluer le potentiel des méthodes de « Compressed Sensing » récemment introduites, à travers leur application à quelques problèmes importants en astrophysique observationelle. Après avoir rapidement décrit les bases mathématiques de ces approches, des exemples sont développés dans la cadre de la variabilité stellaire, de la reconstruction d'images satellitaire et enfin dans le cadre plus prospectif des futures possibilités offertes par les grands projets d'interféromètres à multiples bases pour l'imagerie directe dans le plan de Fourier.

## Monday, July 07, 2008

### CS: A Necessary and Sufficient Condition for Measurement Matrices in CS, Italian translation, a connection to EPH ?, a course, three meetings.

Here is another presentation from the Rice group by Richard Baraniuk. It is pretty much standard except that this slide reminded me of the set-up of a recent video (the counter is now at 251 from 52 initially!). The Rice repository also had a new addition this week-end.

Weiyu Xu, Babak Hassibi, Compressed sensing over the Grassmann manifold: A unified analytical framework. The abstract reads:

It is well known that compressed sensing problems reduce to finding the sparse solutions for large under-determined systems of equations. Although finding the sparse solutions in general may be computationally difficult, starting with the seminal work of [2], it has been shown that linear programming techniques, obtained from an l1-norm relaxation of the original non-convex problem, can provably find the unknown vector in certain instances. In particular, using a certain restricted isometry property, [2] shows that for measurement matrices chosen from a random Gaussian ensemble, l1 optimization can find the correct solution with overwhelming probability even when the support size of the unknown vector is proportional to its dimension. The paper [1] uses results on neighborly polytopes from [6] to give a “sharp” bound on what this proportionality should be in the Gaussian case. In this paper we shall focus on finding sharp bounds on the recovery of “approximately sparse” signals (also possibly for noisy measurements). While the restricted isometry property can be used to study the recovery of approximately sparse signals (and also in the presence of noisy measurements), the obtained bounds can be quite loose. On the other hand, the neighborly polytopes technique which yields sharp bounds in the ideal case cannot be generalized to approximately sparse signals. In this paper, starting from a necessary and sufficient condition for achieving a certain signal recovery accuracy, using high-dimensional geometry, we give a unified null-space Grassmannian angle-based analytical framework for compressive sensing. This new framework gives sharp quantitative tradeoffs between the signal sparsity and the recovery robustness of the l1 optimization for approximately sparse signals. As it will turn out, the neighborly polytopes result of [1] for ideally sparse signals can be viewed as a special case of ours. Our result concerns fundamental properties of linear subspaces and so may be of independent mathematical interest.

This is an important paper as it produces a sufficient and necessary condition for a measurement matrix to enable the recovery of a sparse solution with an L1 recovery method. We talked before how RIP was just a sufficient condition and how it was not optimal in guiding the design of certain systems. Let us hope there is an easy way to check this new condition out.

Recoiling from reading this entry, an Italian reader sent me a translation for CS in Italian:

so I'm currently putting down on paper some ideas on translations of the keywords in Compressive Sampling, "Campionamento Compressivo", Compressed/compressive Sensing, "Acquisizione Compressiva". I think the best translation is actually "Acquisizione Compressiva"

They talk about Nuit Blanche/The Big Picture and the List:

A course is offered in the statistics department at Oxford entitled:Compressive sampling for variable selection taught by Nicolai Meinshausen. The course description is as follows:
In recent years, compressive sampling (or compressed sensing), invented by Emanuel Candes, Terence Tao and David Donoho, has attracted a lot of attention, mostly in the signal processing community. The basic idea is that a set of n measurements / samples of a response variables (e.g. n pixels of a picture) can be projected into a lower-dimensional space with dimension m much less n, by taking for example random Gaussian projections of the original samples, while the original image can still be re-constructed with these m much less than n measurements if and only if the original signal (e.g. the image) has a sparse representation in terms of a suitable basis (e.g. wavelet basis for images).
In this project, it is of interest to apply the idea of compressive sampling to variable selection in regression problems with many predictor variables, using the standard framework of compressive sampling, but examining the effect of the projected dimension m (the regularisation) on predictive performance of the chosen model and on the reliability with which a correct sparse regression model can be identified. Data from biological and physical problems are available.

This is interesting: Jeremie Bigot made a a presentation (in French) on Problemes Sparses en statistique. As I went through the goals of the Research Group (GdR) called "Méthodes d'Analyse Stochastique pour les COdes et Traitements NUMériques", I could not help but notice the connection to another subject I mentioned earlier: The Experimental Probabilistic Hypersurface. More on that later.

Three meetings: Two upcoming and one past:

Frames for the finite world: Sampling, coding and quantization, August 18 to 22 at the American Institute of Mathematics, Palo Alto, California. It is organized by:

This workshop, sponsored by AIM and the NSF, will be devoted to frame theory in finite dimensions with an emphasis on compressive sensing and quantization theory for frames. Recent years have seen significant advances in a number of subjects in signal processing and information theory in which frame theory has played a central role. Some of the most important of these "finite world" applications are analog-to-digital (A/D) conversion, coding theory, and compressive sensing. While these subjects fit well within electrical engineering, several key contributions have been made by mathematicians who have strong interest in real-world applications, resulting in fruitful interdisciplinary research collaborations.

A frame is a collection of vectors in a Hilbert space allowing for a stable linear decomposition of any element in the space, similar to a basis expansion. However, a frame is not required to be linearly independent, and consequently, expansion coefficients can be highly non-unique. This redundancy is advantageous for applications in which additional constraints need to be imposed, such as the set of coefficients being sparse or quantized. The nature of such constraints also establishes a venue for the design of specific frames. This workshop will be concerned with the construction, properties, and applications of frames in finite dimensions.

The workshop will focus on the following specific topics:

• Construction of good deterministic frames (measurement matrices) for compressive sensing.
• Quantization theory for frames: rate-distortion theory, coarse quantization, redundancy vs. robustness, A/D conversion with compressive sensing.

The workshop will differ from typical conferences in some regards. Participants will be invited to suggest open problems and questions before the workshop begins, and these will be posted on the workshop website. These include specific problems on which there is hope of making some progress during the workshop, as well as more ambitious problems which may influence the future activity of the field. Lectures at the workshop will be focused on familiarizing the participants with the background material leading up to specific problems, and the schedule will include discussion and parallel working sessions.

The deadline to apply for support to participate in this workshop has passed.

International Symposium on Low Power Electronics and Design 2008, National Science Seminar Complex, Indian Institute of Science, Bangalore, India, August 11-13, 2008. A presentation entitled Noninvasive Leakage Power Tomography of. Integrated Circuits by Compressive Sensing will be featured.

International Conference on Interdisciplinary Mathematical and Statistical Techniques - IMST 2008 / FIM XVI on May 16-18, 2008. A presentation entitled: "Probabilistic existence theorems for group testing with lies" by Anatoly Zhigljavsky. The abstract reads:

For a wide range of search problems with lies the upper bounds for the length of optimal non-adaptive algorithms are derived. The method of deriving these bounds is probabilistic and is therefore not constructive; it does not provide the way to construct the optimal algorithms. The class of search problems considered in the presentation include many known combinatorial group testing problems such as testing in binary, additive and multiaccess channel models.

For several general models we show that the leading asymptotic term for the number of tests does not depend on the number of lies and is thus the same as for the zero-lie case. However, the other terms in the asymptotic upper bounds depend on the number of lies and substantially influence the upper bounds in the nonasymptotic situation.

At the end of the talk, a connection will be established to the so-called 'compressed sensing', where non-adaptive random designs are used for compressing information.

## Thursday, July 03, 2008

### CS: Toeplitz CS Matrices, Better IHT, ICML 08 papers, Trace Norm Minimization, Shift Invariant Sparse Coding, Matrix Factorization

Deterministic Designs with Deterministic Guarantees: Toeplitz Compressed Sensing Matrices, Sequence Designs and System Identification by Venkatesh Saligrama. The abstract reads:

In this paper we present a new family of discrete sequences having ``random like'' uniformly decaying auto-correlation properties. The new class of infinite length sequences are higher order chirps constructed using irrational numbers. Exploiting results from the theory of continued fractions and diophantine approximations, we show that the class of sequences so formed has the property that the worst-case auto-correlation coefficients for every finite length sequence decays at a polynomial rate. These sequences display doppler immunity as well. We also show that Toeplitz matrices formed from such sequences satisfy restricted-isometry-property (RIP), a concept that has played a central role recently in Compressed Sensing applications. Compressed sensing has conventionally dealt with sensing matrices with arbitrary components. Nevertheless, such arbitrary sensing matrices are not appropriate for linear system identification and one must employ Toeplitz structured sensing matrices. Linear system identification plays a central role in a wide variety of applications such as channel estimation for multipath wireless systems as well as control system applications. Toeplitz matrices are also desirable on account of their filtering structure, which allows for fast implementation together with reduced storage requirements.
Found on Thomas Blumensath's site :

Thomas Blumensath and Mike Davies; Iterative Hard Thresholding for Compressed Sensing. [This revised version is now available with an improvement of sqrt(2) in terms of the required isometry constant. The uniform bounds are now better for Iterative Hard Thresholding than for CoSaMP. The version on arXiv: arXiv:0805.0510v1 [cs.IT] has not yet been updated.]

Jort Gemmeke mentioned to me three papers related to CS featured at ICML 2008 (which starts next week). I have covered them already I think, so I'll just cut and paste the links and abstracts:

• Multi-Task Compressive Sensing with Dirichlet Process Priors by Yuting Qi, Dehong Liu, David Dunson, Lawrence Carin

Abstract
Compressive sensing (CS) is an emerging field that, under appropriate conditions, can significantly reduce the number of measurements required for a given signal. In many applications, one is interested in multiple signals that may be measured in multiple CS-type measurements, where here each signal corresponds to a sensing 'task'. In this paper we propose a novel multitask compressive sensing framework based on a Bayesian formalism, where a Dirichlet process (DP) prior is employed, yielding a principled means of simultaneously inferring the appropriate sharing mechanisms as well as CS inversion for each task. A variational Bayesian (VB) inference algorithm is employed to estimate the full posterior on the model parameters.
• Compressed Sensing and Bayesian Experimental Design by Matthias W. Seeger, Hannes Nickisch.

Abstract
We relate compressed sensing (CS) with Bayesian experimental design and provide
a novel efficient approximate method for the latter, based on expectation propagation. In a large comparative study about linearly measuring natural images, we show that the simple standard heuristic of measuring wavelet coefficients top-down systematically outperforms CS methods using random measurements; the sequential projection optimisation approach of (Ji & Carin, 2007) performs even worse. We also show that our own approximate Bayesian method is able to learn measurement filters on full images efficiently which outperform the wavelet heuristic. To our knowledge, ours is the first successful attempt at “learning compressed sensing” for images of realistic size. In contrast to common CS methods, our framework is not restricted to sparse signals, but can readily be applied to other notions of signal complexity or noise models. We give concrete ideas how
our method can be scaled up to large signal representations.

Also of interest in the following talk: Large Scale Approximate Inference and Experimental Design for Sparse Linear Models by Matthias Seeger.

Hannes Nickisch has developed FWTN, a Fast Wavelet-Transformation for D dimensional data in L levels. C-code including Matlab MEX file and Matlab Demo-Script.

• Sparse Bayesian Nonparametric Regression by Francois Caron, Arnaud Doucet.
Abstract
One of the most common problems in machine learning and statistics consists of esti-
mating the mean response X¯ from a vector of observations y assuming y = X¯ + "
where X is known, ¯ is a vector of parameters of interest and " a vector of stochastic
errors. We are particularly interested here in the case where the dimension K of ¯ is
much higher than the dimension of y. We propose some flexible Bayesian models which can yield sparse estimates of ¯. We show that as K ! 1 these models are closely related to a class of Levy processes. Simulations demonstrate that our models outperform significantly a range of popular alternatives.

• Finally three other papers grabbed my attention, the first two are not direclty CS related:

Column subset selection, matrix factorization, and eigenvalue optimization by Joel Tropp. The abstract reads:

Given a fixed matrix, the problem of column subset selection requests a column submatrix that has favorable spectral properties. Most research from the algorithms and numerical linear algebra communities focuses on a variant called rank-revealing QR, which seeks a well-conditioned collection of columns that spans the (numerical) range of the matrix. The functional analysis literature contains another strand of work on column selection whose algorithmic implications have not been explored. In particular, a celebrated result of Bourgain and Tzafriri demonstrates that each matrix with normalized columns contains a large column submatrix that is exceptionally well conditioned. Unfortunately, standard proofs of this result cannot be regarded as algorithmic. This paper presents a randomized, polynomial-time algorithm that produces the submatrix promised by Bourgain and Tzafriri. The method involves random sampling of columns, followed by a matrix factorization that exposes the well-conditioned subset of columns. This factorization, which is due to Grothendieck, is regarded as a central tool in modern functional analysis. The primary novelty in this work is an algorithm, based on eigenvalue minimization, for constructing the Grothendieck factorization. These ideas also result in a novel approximation algorithm for the (1; 1) norm of a matrix, which is generally NP-hard to compute exactly. As an added bonus, this work reveals a surprising connection between matrix factorization and the famous maxcut semidefinite program.

and

Consistency of Trace Norm Minimization by Francis Bach. The abstract reads:
Regularization by the sum of singular values, also referred to as the trace norm, is a popular technique for estimating low rank rectangular matrices. In this paper, we extend some of the consistency results of the Lasso to provide necessary and sufficient conditions for rank consistency of trace norm minimization with the square loss. We also provide an adaptive version that is rank consistent even when the necessary condition for the non adaptive version is not fulfilled.
Finally, of interest with regards to dictionary building:

Shift Invariant Sparse Coding of Image and Music Data by Morten Mørup, Mikkel Schmidt, Lars Kai Hansen. The abstract reads:

When analyzing multi-media data such as image and music it is useful to extract higher-level features that constitute prominent signatures of the data. We demonstrate how a 2D shift invariant sparse coding model is capable of extracting such higher level features forming so-called icon alphabets for the data. For image data the model is able to find high-level prominent features while for music the model is able to extract both the harmonic structure of instruments as well as indicate the scores they play. We further demonstrate that non-negativity constraints are useful since they favor part based representation. The success of the model relies in finding a good value for the degree of sparsity. For this, we propose an ‘L-curve’-like argument and use the sparsity parameter that maximizes the curvature in the graph of the residual sum of squares plotted against the number of non-zero elements of the sparse code. Matlab implementation of the algorithm is available for download.
Non-Negative Shift Invariant Sparse Coding algorithm is here and is featured in the dictionary section of the big picture.

Also Trac Tran's presentation at Simon Frasier University entitled: Fast Efficient and Practical Algorithms for Compressed Sensing.

Credit: NASA/JPL-Caltech/University of Arizona/Texas A&M. Peter Pan print as seen from Phoenix, Sol 36.

## Tuesday, July 01, 2008

### CS: Coded Mask Imagers: What are they good for ? The George Costanza "Do the Opposite" Sampling Scheme.

I have mentioned exotic sampling schemes before in photography and other areas. These exotic sampling techniques are generally devised to obtain newer sorts of information (hyperspectral, 3-d,....). One of these exotic means of collecting images is coded aperture. How are Compressive Sensing and Coded Aperture related in a simple way ? They differ in the reconstruction scheme: traditional coded aperture rely on a linear technique and no assumption of sparsity where CS uses a nonlinear technique and assume sparsity of the signal. But then, why has traditional coded aperture been successful in some niche areas ? Gerry Skinner, a specialist in coded aperture used for X-ray observatories in orbiting satellites, provides the beginning of an answer in Coded Mask Imagers: when to use them, and when not ? The following slide is extraordinarily expressive: (CM4 in this presentation)

In the context of Compressive Sensing, projecting a delta function onto a widespread function is natural whereas it is totally counter-intuitive in a normal setting. Yet it works! When reading about the "worst" Point Spread Function imaginable, one cannot but be reminded of George Constanza's "doing the opposite" in an unforgetable Seinfeld episode:

To use the wording used in Optics, when imaging stars, it looks as though designing a Compressive Sensing system is equivalent to designing a point spread function that is incoherent with the target of interest. The rest of the paper reads as if it were a CS paper except for the reconstruction aspect of it. In Compressed Sensing, the assumption is that we are looking at a sparse signal whereas in a normal setting, no such assumption is made The reason coded aperture has had continued success in Astronomy stems from the fact that the scene of interest is sparse in point-like sources (no extended objects) and these sources can be considered at infinity. In other words, the signal is sparse in the physical world already. It is embedded in the problem statement. Therefore, it seems likely that most algorithms developed for theses cases do in fact parallel those in compressed sensing (the one mentioned here seems to implement some sort of matching pursuit algorithm). For show, it interesting to see how rich the imaging systems have been:

Let us note also, that the arrows in the graph above show the movement of certain detectors. They are in effect using the movements of the sensor/spacecraft as a way to perform time multiplexing. When put in the context of compressed sensing, we should see this presentation as a 40-year jump start in our understanding of what to expect from these systems.

This lessons learned can be mapped to Compressive Sensing within the context of:
• low light flux
• far field approximation
• point like sources (diracs)

Obviously we need to find the same sets of rules for close field of view situations (nuclear medicine) and/or high fluxes and extended sources. In turn, we also need to understand how applying other type of mask properties (such as the Restricted Isometry Property) improve X-ray astronomy. We have already seen that Compressed Sensing is an enabling technology in traditional astronomy by providing a simple and efficient time coding of star field data (example of Herschel compression scheme using CS). Can CS provide a similar jump in improvement of Coded Aperture processing by a modified use of the mask or the time encoding?

Curiously, the reason coded aperture has not been more extensively used may have to do with the slow realization that most of our surrounding world is sparse, albeit not sparse in point-like features. Until we figured that out in the mid-1990's, we could only conceive coded aperture to be a viable option when we had not other way to go about it. This is pretty clear in the case of nuclear medicine, where particle/rays cannot be focused.

Finally, it is interesting to note that because star shining is a low flux problem and that spreading light is not efficient because you are drowning your compressed signal in the noise (recall that only the star-point-like-diracs is incoherent with the mask, not the background noise), recent development goes into avoiding Coded Aperture altogether by developing focusing lenses for gamma rays using Fresnel like lenses for gamma rays:

And so the question is: Could CS reconstruction techniques improve current coded aperture imaging to the point where this newer focusing technology is made "obsolete" by an improvement in the algorithm development of current coded masks ? What about raising the ability to provide data from current orbiting spacecrafts that have low ranking in NASA's latest review ? or provide better time modulating schemes [1] in nuclear medicine ?

Acknowledgment: Thanks to Ben Kowash (via Glenn Knoll) for pointing to some of the references above.

References:
[1] Digital Tomographic Imaging with Time-Modulated Pseudorandom Coded Aperture and Anger Camera by Kenneth F. Koral, W. Leslie Rogers, and Glenn F. Knoll, Journal of Nuclear Medicine, Volume 16, Number 5, 1974.