## Tuesday, September 30, 2014

### Ce Soir: Machine Learning Meetup Hors-Série #1 (season 2): Datajournalisme

c

Thanks to Numa and HopWorkChrystèle, Franck and I will be hosting a "Hors-Série" of our Machine Learning Meetup on the theme of Data Journalism. At least the first guest will speak in English while the others are expected to speak french (slides ought to be in english). The video of the meetup is at the top of this blog entry and it should start around 6:50/6:55PM Paris time (CET).

Video of the Meetup in French (mostly)

Invités confirmés :

Animation de la discussion : Chrystèle Bazin, consultante indépendante et rédactrice pour différents médias numériques.

Avec le soutien actif de Numa et de HopWork

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Map Estimation for Bayesian Mixture Models with Submodular Priors ( and slides Fourth Cargese Workshop on Combinatorial Optimization )

We saw submodularity here before, take for instance Francis Bach's slides (page 99 and up) on Structured sparsity through convex optimization , where he mentions the use of submodularity to find new regularizers).  Here is another way of using submodularity in compressive sensing: Map Estimation for Bayesian Mixture Models with Submodular Priors by Marwa El Halabi, Luca Baldassarre and Volkan Cevher
We propose a Bayesian approach where the signal structure can be represented by a mixture model with a submodular prior. We consider an observation model that leads to Lipschitz functions. Due to its combinatorial nature, computing the maximum a posteriori estimate for this model is NP-Hard, nonetheless our converging majorization-minimization scheme yields approximate estimates that, in practice, outperform state-of-the-art modular prior. We consider an observation model that leads to Lipschitz functions. Due to its combinatorial nature, computing the maximum a posteriori estimate for this model is NP-Hard, nonetheless our converging majorization-minimization scheme yields approximate estimates that, in practice, outperform state-of-the-art methods

why submodularity is a big deal, from the paper:

Submodularity is considered the discrete equivalent of convexity in the sense that submodular function minimization (SFM) admits efficient algorithms, with best known complexity of O(N5T+N6), where T is the function evaluation complexity [11]. In practice, however, the minimum-norm point algorithm is usually used whenever, the minimum-norm point algorithm is usually used, which commonly runs in O(N2), but has no known complexity [12]. Furthermore, for certain functions which are “graph representable” [13, 14], SFM is equivalent to the minimum s-t cut on an appropriate graph G(V,E), with time complexity 1 O(|E|min{|V|2/3,|E|1/2}) [15].

Relevant to the theme of submodularity are last year's Fourth Cargese Workshop on Combinatorial Optimization

### Naor:

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Monday, September 29, 2014

### Image Classification with A Deep Network Model based on Compressive Sensing

Using compressive sensing to build the first layers of a neural network, and more importantly thinking of iterations of different reconstruction solvers as equilvant to layers of neural network, we are getting there.

To simplify the parameter of the deep learning network, a cascaded compressive sensing model "CSNet" is implemented for image classification. Firstly, we use cascaded compressive sensing network to learn feature from the data. Secondly, CSNet generates the feature by binary hashing and block-wise histograms. Finally, a linear SVM classifier is used to classify these features. The experiments on the MNIST dataset indicate that higher classification accuracy can be obtained by this algorithm.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Friday, September 26, 2014

### Compressive Earth Observatory: An Insight from AIRS/AMSU Retrievals (and a comment)

This is exciting but I have included my thoughts below:

Compressive Earth Observatory: An Insight from AIRS/AMSU Retrievals by Ardeshir Mohammad Ebtehaj, Efi Foufoula-Georgiou, Gilad Lerman, Rafael Luis Bras
We demonstrate that the global fields of temperature, humidity and geopotential heights admit a nearly sparse representation in the wavelet domain, offering a viable path forward to explore new paradigms of sparsity-promoting assimilation and compressive retrieval of spaceborne earth observations. We illustrate this idea using retrieval products of the Atmospheric Infrared Sounder (AIRS) and Advanced Microwave Sounding Unit (AMSU) on board the Aqua satellite. The results reveal that the sparsity of the fields of temperature and geopotential height is relatively pressure-independent while atmospheric humidity fields are typically less sparse at higher pressures. Using the sparsity prior, we provide evidence that the global variability of these land-atmospheric states can be accurately estimated from space in a compressed form, using a small set of randomly chosen measurements/retrievals.

From the paper we can see:

In other words, our sensing matrices are obtained from an identity matrix in which we have randomly eliminated 55% and 65% of its rows, respectively. In this case, it is easy to show that the sensing matrix has the RIP property for which the CS in (5) can lead to an accurate and successful recovery (see, Section B in Appendix).

so it looks like inpainting and from the conclusion, we have:

While progress has been made recently in developing sparse digital image acquisition in visible bands [33], development of sparse-remote-sensing instruments for earth observations from space in microwave and infrared wavelengths remains an important challenge in the coming years. However, our results suggest that, even under the current sensing protocols, transmitting, storing, and processing only a few randomly chosen pixel-samples of the primary land-atmospheric states can be advantageously exploited for a speedy reconstruction of the entire sensor’s field of view with a notable degree of accuracy. The implications of such a capability cannot be overstated for real-time tracking and data assimilation of extreme land-atmospheric phenomena in global early warning systems.
I like the fact that the findings put the current remote sensing systems within the larger view on how to do sampling and how future instruments might be an extension of that through the use of compressive sensing.

There is the potential for impressionable kids to think that compressive sensing and inpainting might generally be the same as the title of this article on Wired might imply. It is a dichotomy that is difficult to communicate and to this day, we still have people mixing up compressive sensing and inpainting. The idea is very difficult to eradicate that in order to perform compressive sensing, you generally oversample the signal so that measurements are actually redundant. Inpainting on the other hand makes it look like we are getting something for nothing (by avoiding sampling in some part of the signal - an image in general - we can recover the full image), i.e. the algorithm somehow discovers something that was never sensed in the first place. This is not what we have here.

These two views can be coincident or blurred only if you show that the field being measured is actually spread or incoherent with the measurement system being used (here it is point like). This is why the authors of the paper spend a large part of the paper showing that the field is part of a dictionary (low frequency wavelets) and that sampling at specific locations is an adequate incoherent measurement system....at that scale.

I wish the authors had written a sentence on that because there are a lot of impressionable kids on the interwebs.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Thursday, September 25, 2014

### Alternating proximal gradient method for sparse nonnegative Tucker decomposition - implementation -

Multi-way data arises in many applications such as electroencephalography (EEG) classification, face recognition, text mining and hyperspectral data analysis. Tensor decomposition has been commonly used to find the hidden factors and elicit the intrinsic structures of the multi-way data. This paper considers sparse nonnegative Tucker decomposition (NTD), which is to decompose a given tensor into the product of a core tensor and several factor matrices with sparsity and nonnegativity constraints. An alternating proximal gradient method (APG) is applied to solve the problem. The algorithm is then modified to sparse NTD with missing values. Per-iteration cost of the algorithm is estimated scalable about the data size, and global convergence is established under fairly loose conditions. Numerical experiments on both synthetic and real world data demonstrate its superiority over a few state-of-the-art methods for (sparse) NTD from partial and/or full observations. The MATLAB code along with demos are accessible from the author's homepage.
The implementation is on Yangyang Xu's  page.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Wednesday, September 24, 2014

### Stochastic Coordinate Coding (SCC) - implementation -

As examplified by this work, genomics is bound to have a profound effect on designing new algorithms.

Stochastic Coordinate Coding and Its Application for Drosophila Gene Expression Pattern Annotation by Binbin Lin, Qingyang Li, Qian Sun, Ming-Jun Lai, Ian Davidson, Wei Fan, Jieping Ye

\textit{Drosophila melanogaster} has been established as a model organism for investigating the fundamental principles of developmental gene interactions. The gene expression patterns of \textit{Drosophila melanogaster} can be documented as digital images, which are annotated with anatomical ontology terms to facilitate pattern discovery and comparison. The automated annotation of gene expression pattern images has received increasing attention due to the recent expansion of the image database. The effectiveness of gene expression pattern annotation relies on the quality of feature representation. Previous studies have demonstrated that sparse coding is effective for extracting features from gene expression images. However, solving sparse coding remains a computationally challenging problem, especially when dealing with large-scale data sets and learning large size dictionaries. In this paper, we propose a novel algorithm to solve the sparse coding problem, called Stochastic Coordinate Coding (SCC). The proposed algorithm alternatively updates the sparse codes via just a few steps of coordinate descent and updates the dictionary via second order stochastic gradient descent. The computational cost is further reduced by focusing on the non-zero components of the sparse codes and the corresponding columns of the dictionary only in the updating procedure. Thus, the proposed algorithm significantly improves the efficiency and the scalability, making sparse coding applicable for large-scale data sets and large dictionary sizes. Our experiments on Drosophila gene expression data sets show that the proposed algorithm achieves one or two orders of magnitude speedup compared to the state-of-art sparse coding algorithm.

The attendant implementation of SCC is on Jieping Ye's software page.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Monday, September 22, 2014

### Book / Proceedings / Lecture Notes: Statistical physics, Optimization, Inference and Message-Passing algorithms

Last year, there was an "Ecole de Physique" in Cargese on Statistical physics, Optimization, Inference and Message-Passing algorithms. Florent Krzakala, one of the organizers, mentioned earlier today that they had released the lecture notes of the talks. They are all on this page

# Proceeding of all lectures

Called "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms" and edited by F. Krzakala, F. Ricci-Tersenghi, L. Zdeborova, R. Zecchina, E. W. Tramel, L. F. Cugliandolo. A book containing the proceeding of all lectures is in preparation at Oxford University Press. A preliminary version of some chapters can be found online on Arxiv:

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Sunday, September 21, 2014

### Unfiltered Access

It happened last season. When Franck and I decided to program Season 1 of the Paris Machine Learning meetup, we initially thought the whole thing would run out pretty quickly. Indeed, our interest was to get people to talk about algorithms as opposed to specific web/langage technologies as is usually the case in technical meetups these days.

Over the course of the Parisian Winter, we both realized that people who came to the meetups were not scared by difficult subjects. In fact, the feedback suggested attendees were sick of dumbed down stories on important topics. While the presentations were probably less heavy than usual academic talks, quite a few academics even told us they liked the format. Deep down, it really was a moment of nostalgia. Those instances reminded people of how great they were when they were learning in their earlier years. They remembered how exceptional those moments were, they craved for these unique times. The meetups brought back that message: You are not stupid, you are decrypting the world and you are not alone.

Over the Parisian Spring, we realized something important. Some of the best people in the world could not come to Paris. Not that they did not want to, but, much like long distance runners, most have reached the Zone and they don't want to deviate from it for some limited exposure. We realized that there had to be a better way, so we started with Skype or hangouts to get them to speak to our audience for 10 to 15 minutes remotely.

Another realization crystalized at meetup #10. I had seen a video of Brenda McCowan who was talking about what her group was doing in Dolphin communication - a faraway concern to people who's livelihood is to predict the next click. Putting this in context of our meetup's theme, that work is hard Unsupervised Learning, it's the stuff of exploration, think Champollion but with animals and no Rosetta Stone: we had to get her on the meetup schedule. Because Brenda is very busy and because this was probably an unexpected invitation from some crazy French guys, she kindly accepted a five minutes Q&A from our crowd on Skype after we had locally played the video that had been recorded earlier by the good folks at NIMBioS. The untold promise was that it would be one of the least impactful event in her professional life: answer technical questions for five minutes from folks 8,000 miles away who had just seen her earlier academic pitch. In the end, the exchange lasted longer.

I am sure I am not the only person in the audience who realized this but for the first time in our lives, when we were asking questions to Dr. Brenda McCowan, we were becoming actors of a really technical National Geographic feature film... "sans" the post-production , "sans" the dumbing down of a traditional filtered Q&A, "sans" the delicate narrative of a show. We, the meetup audience, whose inner 9-year old kids had given up on our own dreams of talking to Flipper when we decided to be serious and eventually have jobs, realized we were having a technically insightful conversation with the one of the few people on Earth who had the interest, knowledge and means of potentially understanding Dolphin communication. For a moment, more than a hundred "wiser" 9-year old kids were back in business.

Props to NIMBioS, Brenda McCowan, meetup.com , Skype, TheFamilyHopWork, DojoCrea, for making this unique moment happen.

In light of the success of the Paris Machine Learning Applications Group, I just started two new meetups in Paris:
Credit Photo: Hans Hillewaert - The Rosetta Stone in the British Museum.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Saturday, September 20, 2014

### Saturday Morning Videos: Random Functions for Dependence and Component Analysis, Randomized Nonlinear Component Analysis and the Automatic Statistician

Random Functions for Dependence and Component Analysis by David Lopez-Paz. The slides are here.

ICML: Randomized Nonlinear Component Analysis by David Lopez-Paz; Suvrit Sra; Alex Smola; Zoubin Ghahramani; Bernhard Schoelkopf
Joint Talk: Automatic Statistician - the big picture & Automatic construction and natural language description of nonparametric regression models by Zoubin Ghahramani and James LLoyd. The slides are here.

Other talks and videos of the First MSR-MLG Joint Meeting  organized by David Lopez-Paz, Sebastian Nowozin, Zoubin Ghahramani at Microsoft Research Cambridge, 2014

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Friday, September 19, 2014

### The Randomized Causation Coefficient - implementation -

We are interested in learning causal relationships between pairs of random variables, purely from observational data. To effectively address this task, the state-of-the-art relies on strong assumptions regarding the mechanisms mapping causes to effects, such as invertibility or the existence of additive noise, which only hold in limited situations. On the contrary, this short paper proposes to learn how to perform causal inference directly from data, and without the need of feature engineering. In particular, we pose causality as a kernel mean embedding classification problem, where inputs are samples from arbitrary probability distributions on pairs of random variables, and labels are types of causal relationships. We validate the performance of our method on synthetic and real-world data against the state-of-the-art. Moreover, we submitted our algorithm to the ChaLearn's "Fast Causation Coefficient Challenge" competition, with which we won the fastest code prize and ranked third in the overall leaderboard.

The code is on David Lopez-Paz's page.

Also relevant: The Randomized Dependence Coefficient by David Lopez-Paz, Philipp Hennig and Bernhard Schölkopf. Attendant code is also on David Lopez-Paz's page.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Randomized Kernels, Randomized Solvers, Random features and more

Here are a slew of very interesting papers connecting to the earlier Hardware Based Stochastic Gradient Descent and much earlier random features:

High-performance Kernel Machines with Implicit Distributed Optimization and Randomization by Vikas Sindhwani, Haim Avron
Complex machine learning tasks arising in several domains increasingly require "big models" to be trained on "big data". Such models tend to grow with the complexity and size of the training data, and do not make strong parametric assumptions upfront on the nature of the underlying statistical dependencies. Kernel methods constitute a very popular, versatile and principled statistical methodology for solving a wide range of non-parametric modelling problems. However, their storage requirements and high computational complexity poses a significant barrier to their widespread adoption in big data applications. We propose an algorithmic framework for massive-scale training of kernel-based machine learning models. Our framework combines two key technical ingredients: (i) distributed general purpose convex optimization for a class of problems involving very large but implicit datasets, and (ii) the use of randomization to significantly accelerate the training process as well as prediction speed for kernel-based models. Our approach is based on a block-splitting variant of the Alternating Directions Method of Multipliers (ADMM) which is carefully reconfigured to handle very large random feature matrices only implicitly, while exploiting hybrid parallelism in compute environments composed of loosely or tightly coupled clusters of multicore machines. Our implementation supports a variety of machine learning tasks by enabling several loss functions, regularization schemes, kernels, and layers of randomized approximations for both dense and sparse datasets, in a highly extensible framework. We study the scalability of our framework on both commodity clusters as well as on BlueGene/Q, and provide a comparison against existing sequential and parallel libraries for such problems.

Scalable Kernel Methods via Doubly Stochastic Gradients by Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina Balcan, Le Song
The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems. Or have we simply not tried hard enough for kernel methods? Here we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Our approach relies on the fact that many kernel methods can be expressed as convex optimization problems, and we solve the problems by making two unbiased stochastic approximations to the functional gradient, one using random training points and another using random functions associated with the kernel, and then descending using this noisy functional gradient. We show that a function produced by this procedure after t iterations converges to the optimal function in the reproducing kernel Hilbert space in rate O(1/t), and achieves a generalization performance of O(1/t√). This doubly stochasticity also allows us to avoid keeping the support vectors and to implement the algorithm in a small memory footprint, which is linear in number of iterations and independent of data dimension. Our approach can readily scale kernel methods up to the regimes which are dominated by neural nets. We show that our method can achieve competitive performance to neural nets in datasets such as 8 million handwritten digits from MNIST, 2.3 million energy materials from MolecularSpace, and 1 million photos from ImageNet.

Large-scale randomized-coordinate descent methods with non-separable linear constraints by Sashank Reddi, Ahmed Hefny, Carlton Downey, Avinava Dubey, Suvrit Sra
We develop randomized (block) coordinate descent (CD) methods for linearly constrained convex optimization. Unlike most CD methods, we do not assume the constraints to be separable, but let them be coupled linearly. To our knowledge, ours is the first CD method that allows linear coupling constraints, without making the global iteration complexity have an exponential dependence on the number of constraints. We present algorithms and analysis for four key problem scenarios: (i) smooth; (ii) smooth + nonsmooth separable; (iii) asynchronous distributed; and (iv) stochastic. We illustrate empirical behavior of our algorithms by simulation experiments.

Compact Random Feature Maps by Raffay Hamid, Ying Xiao, Alex Gittens, Dennis DeCoste
Kernel approximation using randomized feature maps has recently gained a lot of interest. In this work, we identify that previous approaches for polynomial kernel approximation create maps that are rank deficient, and therefore do not utilize the capacity of the projected feature space effectively. To address this challenge, we propose compact random feature maps (CRAFTMaps) to approximate polynomial kernels more concisely and accurately. We prove the error bounds of CRAFTMaps demonstrating their superior kernel reconstruction performance compared to the previous approximation schemes. We show how structured random matrices can be used to efficiently generate CRAFTMaps, and present a single-pass algorithm using CRAFTMaps to learn non-linear multi-class classifiers. We present experiments on multiple standard data-sets with performance competitive with state-of-the-art results.

Random Laplace Feature Maps for Semigroup Kernels on Histograms Jiyan Yang, Vikas Sindhwani, Quanfu Fan, Haim Avron

With the goal of accelerating the training and test ing complexity of nonlinear kernel methods, several recent papers have proposed explicit embeddings of the input data into low-dimensional feature spaces, where fast linear methods can instead be used to generate approximate solutions. Analogous to random Fourier feature maps to approximate shift-invariant kernels, such as the Gaussian kernel, on Rd, we develop a new randomized technique called random Laplace features, to approximate a family of kernel functions adapted to the semigroup structure of Rd+. This is the natural algebraic structure on the set of histograms and other non-negative data representations. We provide theoretical results on the uniform convergence of random Laplace features. Empirical analyses on image classification and surveillance event detection tasks demonstrate the attractiveness of using random Laplace features relative to several other feature maps proposed in the literature.

Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels by Jiyan Yang, Vikas Sindhwani, Haim Avron, Michael Mahoney

We consider the problem of improving the efficiency of randomized Fourier feature maps to accelerate training and testing speed of kernel methods on large datasets. These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e.g., Gaussian kernel). In this paper, we propose to use Quasi-Monte Carlo (QMC) approximations instead where the relevant integrands are evaluated on a low-discrepancy sequence of points as opposed to random point sets as in the Monte Carlo approach. We derive a new discrepancy measure called box discrepancy based on theoretical characterizations of the integration error with respect to a given sequence. We then propose to learn QMC sequences adapted to our setting based on explicit box discrepancy minimization. Our theoretical analyses are complemented with empirical results that demonstrate the effectiveness of classical and adaptive QMC techniques for this problem.

From Kernel Machines to Ensemble Learning by Chunhua Shen, Fayao Liu

Ensemble methods such as boosting combine multiple learners to obtain better prediction than could be obtained from any individual learner. Here we propose a principled framework for directly constructing ensemble learning methods from kernel methods. Unlike previous studies showing the equivalence between boosting and support vector machines (SVMs), which needs a translation procedure, we show that it is possible to design boosting-like procedure to solve the SVM optimization problems. In other words, it is possible to design ensemble methods directly from SVM without any middle procedure. This finding not only enables us to design new ensemble learning methods directly from kernel methods, but also makes it possible to take advantage of those highly-optimized fast linear SVM solvers for ensemble learning. We exemplify this framework for designing binary ensemble learning as well as a new multi-class ensemble learning methods.  Experimental results demonstrate the flexibility and usefulness of the proposed framework.
Credits: ESA/Rosetta/MPS for OSIRIS Team MPS/UPD/LAM/IAA/SSO/INTA/UPM/DASP/IDA
Date: 15 September 2014
Satellite: Rosetta
Depicts: Philae's primary landing site

## Wednesday, September 17, 2014

### Ce Soir: Paris Machine Learning Meetup #1, Season 2, A New Beginning (Snips, Nomo, Clustaar and more)

Once again with Franck, we have decided to do a second season of the pretty successul season 1 of the Paris Machine Learning Meetup. We have about 1200 members and growing. The twitter hashtag for tonight's meetup is #MLParis.
The YouTube page is here, while the Google Hangout Event page is here.

Resources for the Paris Machine Learning group other than ones on Meetup.com include the LinkedIn group, Google+ group and the Archives of the meetup.
A big thank you to DojoEvents for hosting us.

Here is the tentative list of talks and abstracts for tonight's meetup:

+ Franck Bardol, Igor Carron, Introduction, What's New....

Lightning talk (5 minutes)

+ Jean-Baptiste Tien, Criteo, Update on the Kaggle Criteo contest ( remote lightning talk from Palo Alto)

Full Talks (15-20 minutes)

+ Maël Primet, Snips,

Machine Learning for Context-Awareness

+ Cédric Coussinet:

Langage Nomo
Le projet nomoSeed propulse le langage nomo pour concevoir des systèmes complexes temps-réel avec des capacités d'apprentissage incrémental. Le langage nomo permet de définir une base de règles toujours prêtes à réagir à tout événement et pouvant s’adapter et créer des règles en fonction des interactions constantes avec l'environnement. http://nomoseed.org

+ Philippe Duhamel & Nicolas Chollet (www.clustaar.com)

Extract Consumer Insight from Seach Engine Queries
Abstract:  150 billion searches are made in Google each month on a global basis. As an increasing number of cases indicates, analyzing the terms people search for in Google offers a new understanding of how people and consumers behave and express their needs and concerns. In this presentation, we show how algorithm-based and semantical search term analysis can bring a powerful and relevant market insights to brands and organizations.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Multiresolution Matrix Factorization

You probably recall the idea expressed in Sunday Morning Insight on Matrix Factorizations and the Grammar of Life, where one would iterate on several matrix factorization in order to provide a convinving decomposition of a data matrix. The idea is similar to diffusion wavelets and more recent  signal processing on graphs. Here is a new paper on the matter:

Multiresolution Matrix Factorization by Risi Kondor, Nedelina Teneva and Vikas Garg

Abstract: The types of large matrices that appear in modern Machine Learning problems often have complex hierarchical structures that go beyond what can be found by traditional linear algebra tools, such as eigendecompositions. Inspired by ideas from multiresolution analysis, this paper introduces a new notion of matrix factorization that can capture structure in matrices at multiple different scales. The resulting Multiresolution Matrix Factorizations (MMFs) not only provide a wavelet basis for sparse approximation, but can also be used for matrix compression (similar to Nystrom approximations) and as a prior for matrix completion.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Tuesday, September 16, 2014

### Hardware Based Stochastic Gradient Descent

Starting at 52 minutes and 10 seconds of this video, Ali Rahimi explains how one can use a noisy evaluation of the gradient in a gradient descent algorithm in order to minimize a convex function (most relaxations featured in current Advanced Matrix Factorization techniques and some instances of Compressive Sensing fall in that category). The interesting part of this is that the noise comes from the errors made by the, now stochastic, Floating Point Unit.  From the paper below:

Unlike the traditional setting for stochastic gradient descent, where stochasticity arises because the gradient direction is computed from a random subset of a dataset, here the processor itself is the source of stochasticity. We call our approach application robustification.

There have been several attempts at correcting process variation induced errors by identifying and masking these errors at the circuit and architecture level. These approaches take up valuable die area and power on the chip. As an alternative, we explore the feasibility of an approach that allows these errors to occur freely, and handle them in software, at the algorithmic level. In this paper, we present a general approach to converting applications into an error tolerant form by recasting these applications as numerical optimization problems, which can then be solved reliably via stochastic optimization. We evaluate the potential robustness and energy benefits of the proposed approach using an FPGA-based framework that emulates timing errors in the floating point unit (FPU) of a Leon3 processor. We show that stochastic versions of applications have the potential to produce good quality outputs in the face of timing errors under certain assumptions. We also show that good quality results are possible for both intrinsically robust algorithms as well as fragile applications under these assumptions.
Of note, these two theses:

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Monday, September 15, 2014

### Accelerating Random Forests in Scikit-Learn

Accelerating Random Forests in Scikit-Learn by Gilles Louppe

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### CSjob: Associate professor in signal processing with special emphasis on compressive sensing, Aalborg, Denmark

Found on the interweb:

# Associate professor in signal processing with special emphasis on compressive sensing (42102)

At the Faculty of Engineering and Science, Department of Electronic Systems, Signal and Informations Processing section (SIP), a position as associate professor in signal processing with special emphasis on compressive sensing is open for appointment from 1 October 2014 or soon hereafter. The position is for a period of four years. The Department of Electronic Systems is one of the largest departments at Aalborg University with a total of more than 300 employees. The department is internationally recognized in particular for its contributions within Information and Communication Technology (ICT). The research and teaching of the Department of Electronic Systems focus on electronic engineering and the activity areas are organized in five sections (http://www. es.aau.dk/research/sections/) The department also hosts some larger research centers (http://www. es.aau.dk/research/centres/) (with great international scope). The department focuses on maintaining a close interplay with the university’s surroundings - locally, nationally and internationally – as well as producing unique basic research and educating talented and creative engineers. The department collaborates with leading ICT researchers all over the world.

## Job description

4 years position as associate professor in signal processing with special emphasis on compressive sensing

For a project supported by the Danish Research Councils and Aalborg University we need an associate professor for compressive sensing applied to Atomic Force Microscopy. Significant teaching (up to around 50% average over the four years) must be expected. This can be at bachelor, master and PhD level. Documented pedagogical skills at university level is mandatory.

Competences:
- Must have master and PhD degrees in signal processing or similar.
- Experience in developing high quality computational software for open source projects    (also willing to make software publicly available as demanded by the research project).
- Must be able to document research experience in compressive sensing – including non-ideal effects.
- Experience in Python programming for computational science.
- Documented pedagogical competences at university level (preferably in mathematical modeling, Python programming, compressive sensing, scientific computing and related areas).
- Experience in developing new courses – preferably both at bachelor, master and PhD level.
- Knowledge in Atomic Force Microscopy (AFM) is an advantage.

You may obtain further professional information from Professor, Dr.Techn. Torben Larsen, phone: +45 20206856 or e-mail: tl@es.aau.dk.

Qualification requirements:

The level of qualification for Associate Professors shall correspond to the level, which can be achieved on the basis of the appointment as Assistant Professor, but may be achievable in other ways. The appointment presupposes that the applicant can demonstrate original scientific production at an international level as well as documented teaching qualifications.

Appointment to the position requires that both research and teaching qualifications are at the requested level. The two qualifications will be given equal and principal priority in the overall assessment.

The application must contain the following:
• A motivated text wherein the reasons for applying, qualifications in relation to the position, and intentions and visions for the position are stated.
• A current curriculum vitae.
• Copies of relevant diplomas (Master of Science and PhD).
• Scientific qualifications. A complete list of publications must be attached with an indication of the works the applicant wishes to be considered. You may attach up to 10 publications.
• Teaching qualifications described in the teaching portfolio.  If this is not enclosed the applicant must include an explanation for its absence.
• Dissemination qualifications, including participation on committees or boards, participation in organisations and the like.
• Additional qualifications in relation to the position.
• References/recommendations.
• Personal data.
The applications are only to be submitted online by using the "Apply online" button below.

An assessment committee will assess all candidates.

Information regarding guidelines, ministerial circular in force, teaching portfolio and procedures can be seen here.

## Agreement

Employment is in accordance with the Ministerial Order on the Appointment of Academic Staff at Universities (the Appointment Order) and the Ministry of Finance’s current Job Structure for Academic Staff at Universities. Employment and salary are in accordance with the collective agreement for state-employed academics or the collective agreement for academics under the Danish Society of Engineers’ (IDA) and the Danish Association of Chartered Surveyors’ (DDL) negotiation areas.

## Vacancy number

42102

15/09/2014

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Thesis: Randomized Algorithms For Large-Scale Strongly Overdetermined Linear Regression Problems - Xiangrui Meng

In the era of big data, distributed systems built on top of clusters of commodity hardware provide cheap and reliable storage and scalable data processing. With cheap storage, instead of storing only currently relevant data, most people choose to store data as much as possible, expecting that its value can be extracted later. In this way, exabytes (1018) of data are being created on a daily basis. However, extracting value from big data requires scalable implementation of advanced analytical algorithms beyond simple data processing, e.g., regression analysis and optimization. Many traditional methods are designed to minimize floating-point operations, which is the dominant cost of in-memory computation on a single machine. In a distributed environment, load balancing and communication including disk and network I/O can easily dominate computation. These factors greatly increase the complexity and challenge the way of thinking in the design of distributed algorithms. Randomized methods for big data analysis have received a great deal of attention in recent years because they are generally faster and simpler to implement than traditional methods, it is easier to distribute the work, and they sometimes have theoretically provable performance. In this work, we are most interested in random projection and random sampling algorithms for 2 regression and its robust alternative, 1 regression, with strongly rectangular data. Random projection and random
sampling are used to create preconditioned systems that are easier to solve or sub-sampled problems that provide relative-error approximations. Our main result shows that in near input-sparsity time and only a few passes through the data we can obtain a good approximate solution, with high probability. Our theory holds for general p ∈ [1, 2], and thus we formulate our results in p.
In the first chapter, we introduce p regression problems and p-norm conditioning, as well as traditional solvers for p regression problems and how they are affected by the condition number.
The second chapter describes the solution framework, where we discuss how ellipsoidal rounding and subspace embedding are connected to p regression and develop faster rounding and embedding algorithms via random projection and random sampling. Chapter 3 describes a parallel solver named LSRN for strongly over- or under-determined linear least squares (2 regression), and Chapter 4 establishes the theory for p subspace embedding and its application to p regression.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.