Friday, April 28, 2017

Thesis: Randomized Algorithms for Large-Scale Data Analysis by Farhad Pourkamali-Anaraki

Image 1

Stephen just sent me the following:

Hi Igor, 
It's a pleasure to write to you again and announce the graduation of my PhD student Farhad Pourkamali-Anaraki.

It contains a lot of good things, some published some not. In particular (see attached image 1) he has great work on a 1-pass algorithm for K-means that seems to be one of the only 1-pass algorithms to accurately estimate cluster centers (implementation at https://github.com/stephenbeckr/SparsifiedKMeans ), and also has very recent work on efficient variations of the Nystrom method for approximating kernel matrices that seems to give the high-accuracy of the clustered Nystrom method at a fraction of the computational cost (see image 2). 
Best,
Stephen

Image 2




Thanks Stephen, but I think the following paper also does 1-pass K-means (Keriven N., Tremblay N., Traonmilin Y., Gribonval R., "Compressive K-means" and its implementation SketchMLbox: A MATLAB toolbox for large-scale mixture learning), even though the construction seems different. Both of these implementations will be added to the Advanced Matrix Factorization Jungle page.
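For readers who want to play with the kernel side of this, here is a minimal sketch of the plain Nyström approximation with uniform column sampling. It is not the clustered or efficient variants studied in the thesis, only the baseline they improve upon; the RBF kernel, sample size and rank below are illustrative choices.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def nystrom(X, m, gamma=1.0, seed=0):
    """Rank-m Nystrom approximation K ~= C W^+ C^T from m uniformly sampled columns."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)   # n x m block of the kernel matrix
    W = C[idx, :]                      # m x m block on the sampled points
    return C, np.linalg.pinv(W)

# Tiny demo: compare the approximation against the exact kernel matrix.
X = np.random.default_rng(1).normal(size=(500, 10))
C, W_pinv = nystrom(X, m=50, gamma=0.1)
K_approx = C @ W_pinv @ C.T
K_exact = rbf_kernel(X, X, gamma=0.1)
print("relative error:", np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact))
```

The printed relative error gives a quick sense of the accuracy/rank trade-off that the thesis studies with smarter landmark choices.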

Anyway, congratulations Dr. Pourkamali-Anaraki !
Randomized Algorithms for Large-Scale Data Analysis by Farhad Pourkamali-Anaraki. The abstract reads:

Massive high-dimensional data sets are ubiquitous in all scientific disciplines. Extracting meaningful information from these data sets will bring future advances in fields of science and engineering. However, the complexity and high-dimensionality of modern data sets pose unique computational and statistical challenges. The computational requirements of analyzing large-scale data exceed the capacity of traditional data analytic tools. The challenges surrounding large high-dimensional data are felt not just in processing power, but also in memory access, storage requirements, and communication costs. For example, modern data sets are often too large to fit into the main memory of a single workstation and thus data points are processed sequentially without a chance to store the full data. Therefore, there is an urgent need for the development of scalable learning tools and efficient optimization algorithms in today’s high-dimensional data regimes.

A powerful approach to tackle these challenges is centered around preprocessing high-dimensional data sets via a dimensionality reduction technique that preserves the underlying geometry and structure of the data. This approach stems from the observation that high-dimensional data sets often have intrinsic dimension which is significantly smaller than the ambient dimension. Therefore, information-preserving dimensionality reduction methods are valuable tools for reducing the memory and computational requirements of data analytic tasks on large-scale data sets.

Recently, randomized dimension reduction has received a lot of attention in several fields, including signal processing, machine learning, and numerical linear algebra. These methods use random sampling or random projection to construct low-dimensional representations of the data, known as sketches or compressive measurements. These randomized methods are effective in modern data settings since they provide a non-adaptive data-independent mapping of high-dimensional data into a low-dimensional space. However, such methods require strong theoretical guarantees to ensure that the key properties of original data are preserved under a randomized mapping.

This dissertation focuses on the design and analysis of efficient data analytic tasks using randomized dimensionality reduction techniques. Specifically, four efficient signal processing and machine learning algorithms for large high-dimensional data sets are proposed: covariance estimation and principal component analysis, dictionary learning, clustering, and low-rank approximation of positive semidefinite kernel matrices. These techniques are valuable tools to extract important information and patterns from massive data sets. Moreover, an efficient data sparsification framework is introduced that does not require incoherence and distributional assumptions on the data. A main feature of the proposed compression scheme is that it requires only one pass over the data due to the randomized preconditioning transformation, which makes it applicable to streaming and distributed data settings.

The main contribution of this dissertation is threefold: (1) strong theoretical guarantees are provided to ensure that the proposed randomized methods preserve the key properties and structure of high-dimensional data; (2) tradeoffs between accuracy and memory/computation savings are characterized for a large class of data sets as well as dimensionality reduction methods, including random linear maps and random sampling; (3) extensive numerical experiments are presented to demonstrate the performance and benefits of our proposed methods compared to prior works.
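To make the notion of a non-adaptive, data-independent sketch from the abstract concrete, here is a minimal random projection example; the Gaussian map and the dimensions are illustrative choices, not the specific constructions analyzed in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5000, 300           # n points in ambient dimension d, sketch dimension k
X = rng.normal(size=(n, d))

# Data-independent sketch: a single Gaussian map scaled by 1/sqrt(k)
S = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ S                          # low-dimensional sketch, computable in one pass over X

# Check how well pairwise distances are preserved (Johnson-Lindenstrauss-type behavior)
i, j = 3, 17
orig = np.linalg.norm(X[i] - X[j])
proj = np.linalg.norm(Y[i] - Y[j])
print(f"original distance {orig:.2f}, sketched distance {proj:.2f}, ratio {proj / orig:.3f}")
```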

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Wednesday, April 26, 2017

ICLR2017, third and last day.

This is the last day of ICLR 2017. The meeting is being featured live on Facebook at: https://www.facebook.com/iclr.cc/ . If you want to say hi, I am around, and we're hiring.


Morning Session – Session Chair: Slav Petrov
7.30 – 9.00 Registration
9.00 - 9.40 Invited talk 1: Regina Barzilay
9.40 - 10.00 Contributed talk 1: Learning End-to-End Goal-Oriented Dialog
10.00 - 10.20 Contributed talk 2: Multi-Agent Cooperation and the Emergence of (Natural) Language
10.20 - 10.30 Coffee Break
10.30 - 12.30 Poster Session 1 (Conference Papers, Workshop Papers)
12.30 - 14.30 Lunch provided by ICLR

Afternoon Session – Session Chair: Navdeep Jaitly
14.30 - 15.10 Invited talk 2: Alex Graves
15.10 - 15.30 Contributed Talk 3: Making Neural Programming Architectures Generalize via Recursion - BEST PAPER AWARD
15.30 - 15.50 Contributed Talk 4: Neural Architecture Search with Reinforcement Learning
15.50 - 16.10 Contributed Talk 5: Optimization as a Model for Few-Shot Learning
16.10 - 16.30 Coffee Break
16.30 - 18.30 Poster Session 2 (Conference Papers, Workshop Papers)






Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Tuesday, April 25, 2017

#ICLR2017 Tuesday Afternoon Program

 
ICLR 2017 continues this afternoon in Toulon. There will be a blog post for each half day featuring direct links to papers from the OpenReview section. The meeting will be featured live on Facebook at: https://www.facebook.com/iclr.cc/ . If you want to say hi, I am around, and we're hiring.
 
14.00 - 16.00 Poster Session 2 (Conference Papers, Workshop Papers)
16.00 - 16.15 Coffee Break
16.15 - 17.00 Invited talk 2: Riccardo Zecchina
17.00 - 17.20 Contributed Talk 3: Learning to Act by Predicting the Future
17.20 - 17.40 Contributed Talk 4: Reinforcement Learning with Unsupervised Auxiliary Tasks
17.40 - 18.00 Contributed Talk 5: Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
18.00 - 18.10 Group photo at the Stade Félix Mayol
19.00 - 24.00 Gala dinner offered by ICLR

C1: Sigma Delta Quantized Networks 
( code)
C2: Paleo: A Performance Model for Deep Neural Networks
C3: DeepCoder: Learning to Write Programs
C4: Topology and Geometry of Deep Rectified Network Optimization Landscapes
C5: Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights
C6: Learning to Perform Physics Experiments via Deep Reinforcement Learning
C7: Decomposing Motion and Content for Natural Video Sequence Prediction
C8: Calibrating Energy-based Generative Adversarial Networks
C9: Pruning Convolutional Neural Networks for Resource Efficient Inference
C10: Incorporating long-range consistency in CNN-based texture generation
( code )
C11: Lossy Image Compression with Compressive Autoencoders
C12: LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation
C13: Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
C14: Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data
C15: Mollifying Networks
C16: beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
C17: Categorical Reparameterization with Gumbel-Softmax
C18: Online Bayesian Transfer Learning for Sequential Data Modeling
C19: Latent Sequence Decompositions
C20: Density estimation using Real NVP
C21: Recurrent Batch Normalization
C22: SGDR: Stochastic Gradient Descent with Restarts
C23: Variable Computation in Recurrent Neural Networks
C24: Deep Variational Information Bottleneck
C25: SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
C26: TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency
C27: Frustratingly Short Attention Spans in Neural Language Modeling
C28: Offline Bilingual Word Vectors, Orthogonal Transformations and the Inverted Softmax
C29: Learning a Natural Language Interface with Neural Programmer
C30: Designing Neural Network Architectures using Reinforcement Learning
C31: Metacontrol for Adaptive Imagination-Based Optimization (spaceship dataset )
C32: Recurrent Environment Simulators
C33: EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

W1: Lifelong Perceptual Programming By Example
W2: Neu0
W3: Dance Dance Convolution
W4: Bit-Pragmatic Deep Neural Network Computing
W5: On Improving the Numerical Stability of Winograd Convolutions
W6: Fast Generation for Convolutional Autoregressive Models
W7: The Preimage of Rectifier Network Activities
W8: Training Triplet Networks with GAN
W9: On Robust Concepts and Small Neural Nets
W10: Pl@ntNet app in the era of deep learning
W11: Exponential Machines
W12: Online Multi-Task Learning Using Biased Sampling
W13: Online Structure Learning for Sum-Product Networks with Gaussian Leaves
W14: A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Samples
W15: Compositional Kernel Machines
W16: Loss is its own Reward: Self-Supervision for Reinforcement Learning
W17: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
W18: Precise Recovery of Latent Vectors from Generative Adversarial Networks
W19: Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization (code)
 
 
 
 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

#ICLR2017 Tuesday Morning Program

 
 
 
So ICLR 2017 continues today in Toulon. There will be a blog post for each half day featuring direct links to papers from the OpenReview section. The meeting will be featured live on Facebook at: https://www.facebook.com/iclr.cc/ . If you want to say hi, I am around, and we're hiring.


7.30 – 9.00 Registration
9.00 - 9.40 Invited talk 1: Chloé-Agathe Azencott
9.40 - 10.00 Contributed talk 1: Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data - BEST PAPER AWARD
10.00 - 10.20 Contributed talk 2: Learning Graphical State Transitions
10.20 - 10.30 Coffee Break
10.30 - 12.30 Poster Session 1 (Conference Papers, Workshop Papers)


 Conference posters (1st floor)
 
C1: DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning (code)
C2: A Self-Attentive Sentence Embedding
C3: Deep Probabilistic Programming
C4: Lie-Access Neural Turing Machines
C5: Learning Features of Music From Scratch
C6: Mode Regularized Generative Adversarial Networks
C7: End-to-end Optimized Image Compression (web)
C8: Variational Recurrent Adversarial Deep Domain Adaptation
C9: Steerable CNNs
C10: Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning (code)
C11: PixelVAE: A Latent Variable Model for Natural Images
C12: A recurrent neural network without chaos
C13: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
C14: Tree-structured decoding with doubly-recurrent neural networks
C15: Introspection: Accelerating Neural Network Training By Learning Weight Evolution
C16: Hyperband: Bandit-Based Configuration Evaluation for Hyperparameter Optimization (page)
C17: Quasi-Recurrent Neural Networks (Keras)
C18: Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain
C19: A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
C20: Trusting SVM for Piecewise Linear CNNs
C21: Maximum Entropy Flow Networks
C22: The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
C23: Unrolled Generative Adversarial Networks
C24: A Simple but Tough-to-Beat Baseline for Sentence Embeddings (blog entry)
C25: Query-Reduction Networks for Question Answering (code)
C26: Machine Comprehension Using Match-LSTM and Answer Pointer (code)
C27: Words or Characters? Fine-grained Gating for Reading Comprehension
C28: Dynamic Coattention Networks For Question Answering (code)
C29: Multi-view Recurrent Neural Acoustic Word Embeddings
C30: Episodic Exploration for Deep Deterministic Policies for StarCraft Micromanagement
C31: Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning
C32: Generalizing Skills with Semi-Supervised Reinforcement Learning
C33: Improving Policy Gradient by Exploring Under-appreciated Rewards
 
3rd Floor
 
W1: Programming With a Differentiable Forth Interpreter
W2: Unsupervised Feature Learning for Audio Analysis
W3: Neural Functional Programming
W4: A Smooth Optimisation Perspective on Training Feedforward Neural Networks
W5: Synthetic Gradient Methods with Virtual Forward-Backward Networks
W6: Explaining the Learning Dynamics of Direct Feedback Alignment
W7: Training a Subsampling Mechanism in Expectation
W8: Deep Kernel Machines via the Kernel Reparametrization Trick
W9: Encoding and Decoding Representations with Sum- and Max-Product Networks
W10: Embracing Data Abundance
W11: Variational Intrinsic Control
W12: Fast Adaptation in Generative Models with Generative Matching Networks
W13: Efficient variational Bayesian neural network ensembles for outlier detection
W14: Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols
W15: Adaptive Feature Abstraction for Translating Video to Language
W16: Delving into adversarial attacks on deep policies
W17: Tuning Recurrent Neural Networks with Reinforcement Learning
W18: DeepMask: Masking DNN Models for robustness against adversarial samples
W19: Restricted Boltzmann Machines provide an accurate metric for retinal responses to visual stimuli

 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Monday, April 24, 2017

#ICLR2017 Monday Afternoon Program

 
ICLR 2017 is taking place in Toulon this week. There will be a blog post for each half day featuring direct links to papers and attendant code, if there is any. The meeting will be featured live on Facebook at: https://www.facebook.com/iclr.cc/ . If you want to say hi, I am around.
 
Afternoon Session – Session Chair: Joan Bruna (sponsored by Baidu)
14.30 - 15.10 Invited talk 2: Benjamin Recht
15.10 - 15.30 Contributed Talk 3: Understanding deep learning requires rethinking generalization - BEST PAPER AWARD
15.30 - 15.50 Contributed Talk 4: Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15.50 - 16.10 Contributed Talk 5: Towards Principled Methods for Training Generative Adversarial Networks
16.10 - 16.30 Coffee Break
16.30 - 18.20 Poster Session 2 (Conference Papers, Workshop Papers)
18.20 - 18.30 Group photo at stadium attached to Neptune Congress Center.
 
C1: Neuro-Symbolic Program Synthesis
C2: Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy (code)
C3: Trained Ternary Quantization (code)
C4: DSD: Dense-Sparse-Dense Training for Deep Neural Networks (code)
C5: A Compositional Object-Based Approach to Learning Physical Dynamics (code, project site)
C6: Multilayer Recurrent Network Models of Primate Retinal Ganglion Cells
C7: Improving Generative Adversarial Networks with Denoising Feature Matching (chainer implementation)
C8: Transfer of View-manifold Learning to Similarity Perception of Novel Objects
C9: What does it take to generate natural textures?
C10: Emergence of foveal image sampling from learning to attend in visual scenes
C11: PixelCNN++: A PixelCNN Implementation with Discretized Logistic Mixture Likelihood and Other Modifications
C12: Learning to Optimize
C13: Do Deep Convolutional Nets Really Need to be Deep and Convolutional?
C14: Optimal Binary Autoencoding with Pairwise Correlations
C15: On the Quantitative Analysis of Decoder-Based Generative Models (evaluation code)
C16: Adversarial machine learning at scale
C17: Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks
C18: Capacity and Learnability in Recurrent Neural Networks
C19: Deep Learning with Dynamic Computation Graphs  (TensorFlow code)
C20: Exploring Sparsity in Recurrent Neural Networks
C21: Structured Attention Networks (code)
C22: Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning
C23: Variational Lossy Autoencoder
C24: Learning to Query, Reason, and Answer Questions On Ambiguous Texts
C25: Deep Biaffine Attention for Neural Dependency Parsing
C26: A Compare-Aggregate Model for Matching Text Sequences (code)
C27: Data Noising as Smoothing in Neural Network Language Models
C28: Neural Variational Inference For Topic Models
C29: Bidirectional Attention Flow for Machine Comprehension (code, page)
C30: Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
C31: Stochastic Neural Networks for Hierarchical Reinforcement Learning
C32: Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning (video)
C33: Third Person Imitation Learning
 
W1: Audio Super-Resolution using Neural Networks (code)
W2: Semantic embeddings for program behaviour patterns
W3: De novo drug design with deep generative models : an empirical study
W4: Memory Matching Networks for Genomic Sequence Classification
W5: Char2Wav: End-to-End Speech Synthesis
W6: Fast Chirplet Transform Injects Priors in Deep Learning of Animal Calls and Speech
W7: Weight-averaged consistency targets improve semi-supervised deep learning results
W8: Particle Value Functions
W9: Out-of-class novelty generation: an experimental foundation
W10: Performance guarantees for transferring representations (presentation, video)
W11: Generative Adversarial Learning of Markov Chains
W12: Short and Deep: Sketching and Neural Networks
W13: Understanding intermediate layers using linear classifier probes
W14: Symmetry-Breaking Convergence Analysis of Certain Two-layered Neural Networks with ReLU nonlinearity
W15: Neural Combinatorial Optimization with Reinforcement Learning (TensorFlow code)
W16: Tactics of Adversarial Attacks on Deep Reinforcement Learning Agents
W17: Adversarial Discriminative Domain Adaptation (workshop extended abstract)
W18: Efficient Sparse-Winograd Convolutional Neural Networks
W19: Neural Expectation Maximization 

 
 
 
 
 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

#ICLR2017 Monday Morning Program

 
So ICLR 2017 is taking place in Toulon this week. There will be a blog post for each half day featuring direct links to papers from the OpenReview section. The meeting will be featured live on Facebook at: https://www.facebook.com/iclr.cc/ . If you want to say hi, I am around.

Monday April 24, 2017

Morning Session – Session Chair: Dhruv Batra

7.00 - 8.45 Registration
8.45 - 9.00 Opening Remarks
9.00 - 9.40 Invited talk 1: Eero Simoncelli
9.40 - 10.00 Contributed talk 1: End-to-end Optimized Image Compression
10.00 - 10.20 Contributed talk 2: Amortised MAP Inference for Image Super-resolution
10.20 - 10.30 Coffee Break
10.30 - 12.30 Poster Session 1

C1: Making Neural Programming Architectures Generalize via Recursion (slides, code, video)
C2: Learning Graphical State Transitions (code)
C3: Distributed Second-Order Optimization using Kronecker-Factored Approximations
C4: Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes
C5: Neural Program Lattices
C6: Diet Networks: Thin Parameters for Fat Genomics
C7: Unsupervised Cross-Domain Image Generation  (TensorFlow implementation )
C8: Towards Principled Methods for Training Generative Adversarial Networks
C9: Recurrent Mixture Density Network for Spatiotemporal Visual Attention
C10: Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer (PyTorch code)
C11: Pruning Filters for Efficient ConvNets
C12: Stick-Breaking Variational Autoencoders
C13: Identity Matters in Deep Learning
C14: On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
C15: Recurrent Hidden Semi-Markov Model
C16: Nonparametric Neural Networks
C17: Learning to Generate Samples from Noise through Infusion Training
C18: An Information-Theoretic Framework for Fast and Robust Unsupervised Learning via Neural Population Infomax
C19: Highway and Residual Networks learn Unrolled Iterative Estimation
C20: Soft Weight-Sharing for Neural Network Compression (Tutorial)
C21: Snapshot Ensembles: Train 1, Get M for Free
C22: Towards a Neural Statistician
C23: Learning Curve Prediction with Bayesian Neural Networks
C24: Learning End-to-End Goal-Oriented Dialog
C25: Multi-Agent Cooperation and the Emergence of (Natural) Language
C26: Efficient Vector Representation for Documents through Corruption ( code)
C27: Improving Neural Language Models with a Continuous Cache
C28: Program Synthesis for Character Level Language Modeling
C29: Tracking the World State with Recurrent Entity Networks (TensorFlow implementation)
C30: Reinforcement Learning with Unsupervised Auxiliary Tasks (blog post, an implementation )
C31: Neural Architecture Search with Reinforcement Learning (slides, some implementation of appendix A)
C32: Sample Efficient Actor-Critic with Experience Replay
C33: Learning to Act by Predicting the Future
 
 
 
 
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Saturday, April 22, 2017

Sunday Morning Insight: "No Need for the Map of a Cat, Mr Feynman" or The Long Game in Nanopore Sequencing.



About 5 weeks ago, we wondered how we could tell if the world was changing right before our eyes ? Well, this is happening: instance #2 just got more real:


Nanopore sequencing is a promising technique for genome sequencing due to its portability, ability to sequence long reads from single molecules, and to simultaneously assay DNA methylation. However until recently nanopore sequencing has been mainly applied to small genomes, due to the limited output attainable. We present nanopore sequencing and assembly of the GM12878 Utah/Ceph human reference genome generated using the Oxford Nanopore MinION and R9.4 version chemistry. We generated 91.2 Gb of sequence data (~30x theoretical coverage) from 39 flowcells. De novo assembly yielded a highly complete and contiguous assembly (NG50 ~3Mb). We observed considerable variability in homopolymeric tract resolution between different basecallers. The data permitted sensitive detection of both large structural variants and epigenetic modifications. Further we developed a new approach exploiting the long-read capability of this system and found that adding an additional 5x-coverage of "ultra-long" reads (read N50 of 99.7kb) more than doubled the assembly contiguity. Modelling the repeat structure of the human genome predicts extraordinarily contiguous assemblies may be possible using nanopore reads alone. Portable de novo sequencing of human genomes may be important for rapid point-of-care diagnosis of rare genetic diseases and cancer, and monitoring of cancer progression. The complete dataset including raw signal is available as an Amazon Web Services Open Dataset at: https://github.com/nanopore-wgs-consortium/NA12878.
Here is some context:

And previously on Nuit Blanche:
 
Credit: NASA, JPL




Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Friday, April 21, 2017

Random Feature Expansions for Deep Gaussian Processes / AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models - implementation -

[I will be at ICLR next week, let's grab some coffee if you are there]



Random Feature Expansions for Deep Gaussian Processes by Kurt Cutajar, Edwin V. Bonilla, Pietro Michiardi, Maurizio Filippone
The composition of multiple Gaussian Processes as a Deep Gaussian Process (DGP) enables a deep probabilistic nonparametric approach to flexibly tackle complex machine learning problems with sound quantification of uncertainty. Existing inference approaches for DGP models have limited scalability and are notoriously cumbersome to construct. In this work, we introduce a novel formulation of DGPs based on random feature expansions that we train using stochastic variational inference. This yields a practical learning framework which significantly advances the state-of-the-art in inference for DGPs, and enables accurate quantification of uncertainty. We extensively showcase the scalability and performance of our proposal on several datasets with up to 8 million observations, and various DGP architectures with up to 30 hidden layers.
A python / TensorFlow implementation can be found here: https://github.com/mauriziofilippone/deep_gp_random_features
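For context, the building block behind these expansions is the classic random Fourier feature approximation of a shift-invariant kernel (Rahimi and Recht). The sketch below shows that single-layer approximation only, not the authors' DGP construction or their TensorFlow code; the lengthscale and feature count are arbitrary.

```python
import numpy as np

def rff(X, num_features, lengthscale=1.0, seed=0):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Omega = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ Omega + b)

# Compare the feature-based kernel estimate with the exact RBF kernel on a toy data set.
X = np.random.default_rng(1).normal(size=(100, 5))
Phi = rff(X, num_features=2000, lengthscale=2.0)
K_approx = Phi @ Phi.T
K_exact = np.exp(-((X[:, None, :] - X[None, :, :])**2).sum(-1) / (2 * 2.0**2))
print("max abs error:", np.abs(K_exact - K_approx).max())
```

In the paper, layers of this kind are stacked and the spectral frequencies are treated variationally, which is what makes the DGP tractable at scale.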

We investigate the capabilities and limitations of Gaussian process models by jointly exploring three complementary directions: (i) scalable and statistically efficient inference; (ii) flexible kernels; and (iii) objective functions for hyperparameter learning alternative to the marginal likelihood. Our approach outperforms all previously reported GP methods on the standard MNIST dataset; performs comparatively to previous kernel-based methods using the RECTANGLES-IMAGE dataset; and breaks the 1% error-rate barrier in GP models using the MNIST8M dataset, showing along the way the scalability of our method at unprecedented scale for GP models (8 million observations) in classification problems. Overall, our approach represents a significant breakthrough in kernel methods and GP models, bridging the gap between deep learning approaches and kernel machines.
and here is a recent presentation by one of the authors: "Practical and Scalable Inference for Deep Gaussian Processes"

Wednesday, April 19, 2017

Stochastic Gradient Descent as Approximate Bayesian Inference

[I will be at ICLR next week, let's grab some coffee if you are there]


The recent Distill publication on Why Momentum Really Works by Gabriel Goh does provide some insight into why gradient descent might work. Overviews such as the ones listed below also help:
But today, we have an additional insight into the mapping of SGD to Bayesian inference: Stochastic Gradient Descent as Approximate Bayesian Inference by Stephan Mandt, Matthew D. Hoffman, and David M. Blei
Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD to best match the stationary distribution to a posterior, minimizing the Kullback-Leibler divergence between these two distributions. (2) We demonstrate that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models. (3) We also propose SGD with momentum for sampling and show how to adjust the damping coefficient accordingly. (4) We analyze MCMC algorithms. For Langevin Dynamics and Stochastic Gradient Fisher Scoring, we quantify the approximation errors due to finite learning rates. Finally (5), we use the stochastic process perspective to give a short proof of why Polyak averaging is optimal. Based on this idea, we propose a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler.
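As a toy illustration of point (1), here is what treating constant-step-size SGD iterates as approximate posterior samples looks like on a small Bayesian linear regression problem. The step size, minibatch size and burn-in below are ad hoc choices; the paper is precisely about how to tune them so the stationary distribution best matches the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
noise_var = 0.25
y = X @ w_true + np.sqrt(noise_var) * rng.normal(size=n)

# Constant-step-size SGD on the negative log posterior of Bayesian linear regression
# (Gaussian likelihood, Gaussian prior). Iterates after burn-in are kept as
# approximate posterior samples.
w = np.zeros(d)
step, batch, prior_prec = 1e-3, 32, 1.0
samples = []
for t in range(20000):
    idx = rng.choice(n, size=batch, replace=False)
    grad = (n / batch) * X[idx].T @ (X[idx] @ w - y[idx]) / noise_var + prior_prec * w
    w -= step * grad / n              # step applied to the gradient of the *average* loss
    if t > 5000:                      # discard burn-in
        samples.append(w.copy())

samples = np.array(samples)
print("approximate posterior mean:", np.round(samples.mean(0), 3))
print("true weights:              ", np.round(w_true, 3))
```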




Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation


[Personal message: I will be at ICLR next week, let's grab some coffee if you are there]
Yue just sent me the following:
Dear Igor,

I hope all is well.

We recently posted a paper on arXiv on analyzing the exact asymptotic performance of a popular spectral initialization method for various nonconvex signal estimation problems (such as phase retrieval). We think you and readers of your blog might be interested in this research.

The paper can be found here:

https://arxiv.org/abs/1702.06435

Best regards,
Yue
Thanks Yue, two phase transitions ! I like it: Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation by Yue M. Lu, Gen Li
We study a spectral initialization method that serves a key role in recent work on estimating signals in nonconvex settings. Previous analysis of this method focuses on the phase retrieval problem and provides only performance bounds. In this paper, we consider arbitrary generalized linear sensing models and present a precise asymptotic characterization of the performance of the method in the high-dimensional limit. Our analysis also reveals a phase transition phenomenon that depends on the ratio between the number of samples and the signal dimension. When the ratio is below a minimum threshold, the estimates given by the spectral method are no better than random guesses drawn from a uniform distribution on the hypersphere, thus carrying no information; above a maximum threshold, the estimates become increasingly aligned with the target signal. The computational complexity of the method, as measured by the spectral gap, is also markedly different in the two phases. Worked examples and numerical results are provided to illustrate and verify the analytical predictions. In particular, simulations show that our asymptotic formulas provide accurate predictions for the actual performance of the spectral method even at moderate signal dimensions.
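For a feel of the method being analyzed, here is a minimal sketch of the vanilla spectral initializer for real-valued phase retrieval, with the simplest preprocessing T(y) = y (trimming or optimized preprocessing functions do better); the dimensions are illustrative, and the paper's results concern the exact asymptotics of the cosine similarity printed at the end as the sampling ratio m/n crosses the two thresholds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 1200                     # signal dimension and number of measurements (m/n = 6)
x = rng.normal(size=n)
x /= np.linalg.norm(x)               # unit-norm target signal
A = rng.normal(size=(m, n))          # Gaussian sensing vectors a_i as rows
y = (A @ x) ** 2                     # phaseless (intensity) measurements

# Spectral initialization: leading eigenvector of D = (1/m) sum_i T(y_i) a_i a_i^T,
# here with the simple choice T(y) = y.
D = (A.T * y) @ A / m
eigvals, eigvecs = np.linalg.eigh(D)
x_hat = eigvecs[:, -1]               # eigenvector of the largest eigenvalue

# Cosine similarity with the true signal (the global sign is unrecoverable by design).
print("cosine similarity:", abs(x_hat @ x))
```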



Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Tuesday, April 18, 2017

MLHardware: Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent / WRPN: Training and Inference using Wide Reduced-Precision Networks

( Personal message: I will be at ICLR next week, let's grab some coffee if you are there. )


As ML is becoming more and more important, the hardware architecture on which it runs needs to change as well. These changes are, in turn, wholly dependent on a number of trade-offs. Today, we have two such studies: one on quantization issues in neural networks, and another on the influence of low precision on Stochastic Gradient Descent (something we have already seen for gradient descent).

For computer vision applications, prior works have shown the efficacy of reducing the numeric precision of model parameters (network weights) in deep neural networks but also that reducing the precision of activations hurts model accuracy much more than reducing the precision of model parameters. We study schemes to train networks from scratch using reduced-precision activations without hurting the model accuracy. We reduce the precision of activation maps (along with model parameters) using a novel quantization scheme and increase the number of filter maps in a layer, and find that this scheme compensates or surpasses the accuracy of the baseline full-precision network. As a result, one can significantly reduce the dynamic memory footprint, memory bandwidth, computational energy and speed up the training and inference process with appropriate hardware support. We call our scheme WRPN - wide reduced-precision networks. We report results using our proposed schemes and show that our results are better than previously reported accuracies on ILSVRC-12 dataset while being computationally less expensive compared to previously reported reduced-precision networks.
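As a toy illustration of the reduced-precision theme (not the authors' WRPN scheme itself), here is a generic uniform k-bit quantizer applied to a tensor of ReLU activations; the per-tensor scaling to [0, 1] is a simplifying assumption.

```python
import numpy as np

def quantize(x, bits):
    """Uniform quantization of values clipped to [0, 1] onto 2**bits levels."""
    levels = 2 ** bits - 1
    x = np.clip(x, 0.0, 1.0)
    return np.round(x * levels) / levels

# Example: quantize a tensor of ReLU activations to 4 bits.
rng = np.random.default_rng(0)
activations = np.maximum(rng.normal(size=(8, 16)), 0.0)     # ReLU output
act_scaled = activations / (activations.max() + 1e-8)       # simple per-tensor scaling to [0, 1]
act_q = quantize(act_scaled, bits=4)
print("distinct levels used:", len(np.unique(act_q)))
print("mean abs quantization error:", np.abs(act_scaled - act_q).mean())
```

The WRPN observation is that the accuracy lost to this kind of activation quantization can be recovered by widening the layers, at a net saving in memory bandwidth.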


Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue for the foreseeable future, it is important to study techniques that can make it run fast on parallel hardware. In this paper, we provide the first analysis of a technique called BUCKWILD! that uses both asynchronous execution and low-precision computation. We introduce the DMGC model, the first conceptualization of the parameter space that exists when implementing low-precision SGD, and show that it provides a way to both classify these algorithms and model their performance. We leverage this insight to propose and analyze techniques to improve the speed of low-precision SGD. First, we propose software optimizations that can increase throughput on existing CPUs by up to 11×. Second, we propose architectural changes, including a new cache technique we call an obstinate cache, that increase throughput beyond the limits of current-generation hardware. We also implement and analyze low-precision SGD on the FPGA, which is a promising alternative to the CPU for future SGD systems.
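And on the low-precision SGD side, here is a toy sketch of one ingredient, stochastic (unbiased) rounding of the weights to a fixed-point grid; it is not the BUCKWILD! system or its DMGC model, and the grid spacing and step size below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, scale):
    """Round x to the fixed-point grid with spacing `scale`, rounding up or down
    with probability proportional to the distance (unbiased in expectation)."""
    scaled = x / scale
    low = np.floor(scaled)
    p_up = scaled - low
    return (low + (rng.random(x.shape) < p_up)) * scale

# Low-precision SGD on least squares: the weights live on a coarse fixed-point grid.
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

scale = 2.0 ** -6                    # fixed-point grid spacing (the low-precision resolution)
w = np.zeros(d)
for t in range(5000):
    i = rng.integers(n)
    grad = (X[i] @ w - y[i]) * X[i]
    w = stochastic_round(w - 0.01 * grad, scale)

print("error vs. true weights:", np.linalg.norm(w - w_true))
```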



Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Monday, April 17, 2017

F2F: A Library For Fast Kernel Expansions

( Personal message: I will be at ICLR next week, let's grab some coffee if you are there. )

F2F is a C++ library for large-scale machine learning. It contains a CPU optimized implementation of the Fastfood algorithm, that allows the computation of approximated kernel expansions in loglinear time. The algorithm requires to compute the product of Walsh-Hadamard Transform (WHT) matrices. A cache friendly SIMD Fast Walsh-Hadamard Transform (FWHT) that achieves compelling speed and outperforms current state-of-the-art methods has been developed. F2F allows to obtain non-linear classification combining Fastfood and a linear classifier.
I am told by one of the authors that the library should be out at some point.
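In the meantime, here is a deliberately simplified and slow reference sketch of the Fastfood feature map the abstract refers to, written with a dense Hadamard matrix from SciPy; the row-rescaling matrix of the full construction is omitted, and the whole point of F2F, a fast SIMD FWHT, is not attempted here.

```python
import numpy as np
from scipy.linalg import hadamard

def fastfood_transform(x, sigma=1.0, seed=0):
    """Simplified Fastfood product (approximately a Gaussian matrix times x),
    written with a dense Hadamard matrix for clarity; d must be a power of 2.
    The row-rescaling matrix S of the full construction is omitted for brevity."""
    d = len(x)
    rng = np.random.default_rng(seed)
    H = hadamard(d) / np.sqrt(d)                 # orthonormal Walsh-Hadamard matrix
    B = rng.choice([-1.0, 1.0], size=d)          # random sign flips
    P = rng.permutation(d)                       # random permutation
    G = rng.normal(size=d)                       # diagonal Gaussian scaling
    v = H @ (B * x)                              # H B x
    v = v[P]                                     # Pi H B x
    v = H @ (G * v)                              # H G Pi H B x
    return v / sigma

# Random features: phi(x) = sqrt(2/d) * cos(V x + b), approximating an RBF kernel.
d = 256
rng = np.random.default_rng(1)
x = rng.normal(size=d)
b = rng.uniform(0, 2 * np.pi, size=d)
phi = np.sqrt(2.0 / d) * np.cos(fastfood_transform(x, sigma=2.0) + b)
print(phi.shape, np.linalg.norm(phi))
```

The fast version never forms H explicitly: each Hadamard product is an O(d log d) in-place FWHT, which is exactly the kernel that F2F optimizes.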



Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Understanding Shallower Networks

We may need to understand Deep Neural Networks, but we also need to understand shallower networks.




Empirical risk minimization (ERM) is ubiquitous in machine learning and underlies most supervised learning methods. While there has been a large body of work on algorithms for various ERM problems, the exact computational complexity of ERM is still not understood. We address this issue for multiple popular ERM problems including kernel SVMs, kernel ridge regression, and training the final layer of a neural network. In particular, we give conditional hardness results for these problems based on complexity-theoretic assumptions such as the Strong Exponential Time Hypothesis. Under these assumptions, we show that there are no algorithms that solve the aforementioned ERM problems to high accuracy in sub-quadratic time. We also give similar hardness results for computing the gradient of the empirical loss, which is the main computational burden in many non-convex learning tasks.



Remarkable success of deep neural networks has not been easy to analyze theoretically. It has been particularly hard to disentangle relative significance of architecture and optimization in achieving accurate classification on large datasets. On the flip side, shallow methods have encountered obstacles in scaling to large data, despite excellent performance on smaller datasets, and extensive theoretical analysis. Practical methods, such as variants of gradient descent used so successfully in deep learning, seem to perform below par when applied to kernel methods. This difficulty has sometimes been attributed to the limitations of shallow architecture.
In this paper we identify a basic limitation in gradient descent-based optimization in conjunctions with smooth kernels. An analysis demonstrates that only a vanishingly small fraction of the function space is reachable after a fixed number of iterations drastically limiting its power and resulting in severe over-regularization. The issue is purely algorithmic, persisting even in the limit of infinite data.
To address this issue, we introduce EigenPro iteration, based on a simple preconditioning scheme using a small number of approximately computed eigenvectors. It turns out that even this small amount of approximate second-order information results in significant improvement of performance for large-scale kernel methods. Using EigenPro in conjunction with stochastic gradient descent we demonstrate scalable state-of-the-art results for kernel methods on a modest computational budget.
Finally, these results indicate a need for a broader computational perspective on modern large-scale learning to complement more traditional statistical and convergence analyses. In particular, systematic analysis concentrating on the approximation power of algorithms with a fixed computation budget will lead to progress both in theory and practice.
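To get a rough feel for the EigenPro idea (dampen the top eigendirections of the kernel operator so that a much larger step size becomes admissible), here is a small full-batch sketch in which the eigenvectors are computed exactly; the actual method uses approximate eigenvectors from a subsample together with stochastic gradients, so treat this only as an illustration of the preconditioner.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1, 1, size=(n, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=n)

# RBF kernel matrix
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 0.5)

# EigenPro-style preconditioner: shrink the top-q eigendirections of K/n down to lambda_{q+1}.
q = 20
eigvals, eigvecs = np.linalg.eigh(K / n)
lam, V = eigvals[::-1], eigvecs[:, ::-1]          # sort eigenpairs in descending order
P = np.eye(n) - V[:, :q] @ np.diag(1 - lam[q] / lam[:q]) @ V[:, :q].T

# Preconditioned fixed-point iteration for kernel least squares f = K @ alpha.
alpha = np.zeros(n)
step = 1.0 / lam[q]                               # far larger than the ~1/lam[0] allowed without P
for _ in range(200):
    residual = (K @ alpha - y) / n                # gradient direction of the average square loss
    alpha -= step * (P @ residual)

print("training MSE:", np.mean((K @ alpha - y) ** 2))
```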


Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
