Thursday, February 16, 2017

Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks / BranchHull: Convex bilinear inversion from the entrywise product of signals with known signs

Ali just sent me the following:

Hi Igor,

Hope all is going well. Just wanted to share two papers: one on a convex way of addressing bilinear inverse problems, and another on pruning deep neural networks in a convex way.
All the best,
-Ali

Cool, thanks Ali!

BranchHull: Convex bilinear inversion from the entrywise product of signals with known signs
We consider the bilinear inverse problem of recovering two vectors, x and w, in R^L from their entrywise product. For the case where the vectors have known signs and belong to known subspaces, we introduce the convex program BranchHull, which is posed in the natural parameter space and does not require an approximate solution or initialization in order to be stated or solved. Under the structural assumptions that x and w are members of known K- and N-dimensional random subspaces, we prove that BranchHull recovers x and w up to the inherent scaling ambiguity with high probability whenever L ≳ K + N. This program is motivated by applications in blind deconvolution and self-calibration.
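To make the setup in this abstract concrete, here is a minimal NumPy sketch of the measurement model and of the scaling ambiguity it mentions. It does not implement BranchHull itself. The dimensions, the Gaussian bases `B` and `C`, the coefficient vectors `h` and `m`, and the scale `c` are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, N = 40, 5, 6                 # hypothetical sizes, chosen so that L > K + N

B = rng.standard_normal((L, K))    # assumed basis for the subspace containing x
C = rng.standard_normal((L, N))    # assumed basis for the subspace containing w
h = rng.standard_normal(K)         # coefficients of x in B
m = rng.standard_normal(N)         # coefficients of w in C

x, w = B @ h, C @ m
y = x * w                          # entrywise product: the only data observed
signs_x = np.sign(x)               # sign side information, assumed known

# Scaling ambiguity: (c*h, m/c) produces the same measurements for any c > 0,
# and a positive scale leaves the sign pattern unchanged.
c = 3.7
y_scaled = (B @ (c * h)) * (C @ (m / c))
same_measurements = bool(np.allclose(y, y_scaled))
same_signs = bool(np.array_equal(np.sign(B @ (c * h)), signs_x))
print(same_measurements, same_signs)
```

Since every pair (c·h, m/c) with c > 0 explains the data equally well, recovery can only ever be claimed up to that scale factor, which is what the abstract's guarantee says.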

Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks by Alireza Aghasi, Nam Nguyen, Justin Romberg
Model reduction is a highly desirable process for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affect the prediction accuracy and model variance. Net-Trim is a layer-wise convex framework to prune (sparsify) deep neural networks. The method is applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. The basic idea is to retrain the network layer by layer keeping the layer inputs and outputs close to the originally trained model, while seeking a sparse transform matrix. We present both the parallel and cascade versions of the algorithm. While the former enjoys computational distributability, the latter is capable of achieving simpler models. In both cases, we mathematically show a consistency between the retrained model and the initial trained network. We also derive the general sufficient conditions for the recovery of a sparse transform matrix. In the case of standard Gaussian training samples of dimension N being fed to a layer, and s being the maximum number of nonzero terms across all columns of the transform matrix, we show that O(s log N) samples are enough to accurately learn the layer model.
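As a rough illustration of the layer-wise idea, the sketch below checks whether a candidate weight matrix `U` lies in one possible convex surrogate for "the ReLU layer output stays close to the original". The specific constraint set is an assumption made for illustration and is not stated in the abstract: match the original output where it is positive, and keep the pre-activation non-positive where it is zero. The names `in_convex_set` and `l1_norm`, the tolerance `eps`, and the sizes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def l1_norm(U):
    """Sparsity-promoting objective a pruning program could minimize."""
    return float(np.abs(U).sum())

def in_convex_set(U, X, Y, eps):
    """Check an assumed convex surrogate for ||relu(U.T @ X) - Y||_F <= eps.

    Where Y > 0 the linear output U.T @ X must match Y within eps (Frobenius);
    where Y == 0 the linear output must be non-positive so that ReLU zeroes it.
    Both conditions are convex in U.
    """
    Z = U.T @ X
    active = Y > 0
    fits = np.linalg.norm((Z - Y)[active]) <= eps
    stays_off = np.all(Z[~active] <= 0.0)
    return bool(fits and stays_off)

rng = np.random.default_rng(1)
N, M, P = 8, 4, 50                  # input dim, output dim, samples (hypothetical)
X = rng.standard_normal((N, P))     # layer inputs, one sample per column
W = rng.standard_normal((N, M))     # stand-in for the originally trained weights
Y = relu(W.T @ X)                   # original layer outputs

original_is_feasible = in_convex_set(W, X, Y, eps=0.0)
flipped_is_feasible = in_convex_set(-W, X, Y, eps=0.0)
print(original_is_feasible, flipped_is_feasible, l1_norm(W))
```

Any `U` passing this check also satisfies the original non-convex bound, because ReLU is 1-Lipschitz and maps non-positive values to zero. A pruning step would then look for the feasible `U` with the smallest `l1_norm`, which requires a convex solver and is left out here.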
