A Data and Platform-Aware Framework For Large-Scale Machine Learning (pdf is here) by Azalia Mirhoseini
Over the last decade, significant strides have been made in parallel research domains to address the challenges of big data, including efficient algorithmic solutions, advanced computing architectures, and high-performance computing engines. My thesis bridges these domains to provide a framework for scalable machine learning that holistically takes into account the data geometry, the learning algorithms, and the properties of the underlying computing fabrics. Several classes of fast-growing data, including image and video content, contain non-sparse dependencies, i.e., a large number of non-zeros in their data correlation matrix. Such dependencies severely degrade the performance of most contemporary learning algorithms, because these algorithms typically require iterative updates on the correlation matrix to converge. The challenges are exacerbated when the iterative updates have to be applied to data that is distributed across multiple processing nodes. In distributed settings, the dense structure leads to increased iterative computation and inter-node communication, which can quickly become untenable for larger datasets.

I have developed RankMap, the first domain-specific data and platform-aware solution to enable highly efficient learning on large, non-sparse datasets in distributed settings. The key idea is that, despite the apparent density and dimensionality, in many datasets the information may lie on a single lower-dimensional subspace or a union of such subspaces. I use this property to transform the dense but structured data into a new sparse domain where distributed computing becomes much more efficient. The transformed data becomes the product of a low-dimensional dictionary matrix and a large sparse coefficient matrix. The sparsity property is then exploited to create an efficient data partitioning and execution flow for the iterative algorithms.
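To make the payoff of this factorization concrete, here is a minimal sketch (illustrative only, not RankMap's implementation) of the core idea: when a dense data matrix A factors as a small dense dictionary D times a sparse coefficient matrix C, the matrix-vector product at the heart of an iterative update can be computed as D @ (C @ x), touching far fewer entries than A @ x. The dimensions, density, and cost model below are assumptions chosen for illustration.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Hypothetical setup: a dense n x d data matrix A that secretly lies near a
# low-dimensional subspace, so A = D @ C with D a small dense dictionary
# (n x L) and C a sparse coefficient matrix (L x d).
n, d, L = 1000, 800, 20
D = rng.standard_normal((n, L))
C = sparse.random(L, d, density=0.1, random_state=0, format="csr")
A = D @ C.toarray()          # the dense view: n*d stored entries

x = rng.standard_normal(d)

# One update step, e.g. the matrix-vector product inside a power iteration:
y_dense = A @ x              # ~ n*d multiplies on the dense data
y_fact = D @ (C @ x)         # ~ nnz(C) + n*L multiplies on the sparse form

# The two paths agree numerically, but the factorized one is far cheaper.
assert np.allclose(y_dense, y_fact)
dense_cost = n * d
fact_cost = C.nnz + n * L
print(f"dense cost ~ {dense_cost}, factorized cost ~ {fact_cost}")
```

Because every iteration repeats this product, the per-iteration savings compound over the run of the algorithm, which is where the reported runtime and memory gains come from.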
RankMap leverages the degrees of freedom in the sparse decomposition to tailor the transformation to the target computing platform. By defining a platform-specific computing metric that measures the overall computational and communication cost, RankMap is able to fine-tune the transformation to trade off overheads in the computing platform. For example, on a platform with slower links between nodes, a communication-minimizing transformation might be favorable over one that minimizes the number of FLOPs. My framework has enabled graph-parallel distributed engines, which heavily rely on sparse connections, to be applied to dense but structured data. In particular, I have developed an efficient mapping of the iterative learning algorithms on the transformed data that conforms to the vertex-centric computing format of GraphLab, a highly successful graph-based distributed machine learning engine. We have implemented RankMap in C++/MPI, as well as a GraphLab module; both are now available as open-source APIs. We have also developed a hardware-accelerated, high-throughput version of RankMap that is customizable, resource-aware, and supports streaming input data. RankMap is a generic framework that can be applied to a vast range of problems, from cone optimization to regularized loss minimization, such as support vector machines (SVMs), ridge regression, and lasso. Evaluations of L1-regularization, the power method, and SVM algorithms on datasets that contain billions of non-zeros demonstrate that RankMap can achieve up to two orders of magnitude improvement in runtime, energy, and memory usage simultaneously compared to state-of-the-art methods.
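The platform-aware trade-off described above can be sketched as a toy cost model (an assumption for illustration, not RankMap's published metric): each candidate factorization D (n x L) times C (with nnz non-zeros) costs nnz + n*L multiplies per iteration, plus communication that is modeled, hypothetically, as every pair of the p nodes exchanging the L-dimensional intermediate vector. The candidate factorizations and unit costs below are made up to show how the best choice flips with link speed.

```python
# Toy platform-aware cost model (an illustrative assumption, not RankMap's
# actual metric): score a candidate factorization by its per-iteration
# arithmetic plus the inter-node communication it induces.

def iteration_cost(n, L, nnz, p, flop_cost, msg_cost):
    flops = nnz + n * L                # multiplies through C, then through D
    messages = p * (p - 1) * L         # hypothetical all-to-all of L values
    return flop_cost * flops + msg_cost * messages

n, p = 10_000, 16
# Hypothetical candidates: a communication-minimizing factorization
# (small L, denser C) vs. a FLOP-minimizing one (larger L, sparser C).
candidates = {
    "comm-min": (10, 2_000_000),       # (L, nnz)
    "flop-min": (100, 100_000),
}

def best(msg_cost):
    # Pick the candidate with the lowest modeled per-iteration cost.
    return min(candidates,
               key=lambda k: iteration_cost(n, *candidates[k], p,
                                            flop_cost=1.0, msg_cost=msg_cost))

print("fast links ->", best(msg_cost=1.0))    # flop-min
print("slow links ->", best(msg_cost=100.0))  # comm-min
```

With cheap messages the FLOP-minimizing factorization wins; once each message is expensive, the communication-minimizing one does, mirroring the slow-link example in the text.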
RankMap: A Platform-Aware Framework for Distributed Learning from Dense Datasets by Azalia Mirhoseini, Eva L. Dyer, Ebrahim M. Songhori, Richard Baraniuk, Farinaz Koushanfar
This paper introduces RankMap, a platform-aware end-to-end framework for efficient execution of a broad class of iterative learning algorithms on massive, dense datasets. In contrast to existing dense (iterative) data analysis methods, which are oblivious to the platform, we introduce, for the first time, scalable data transformation and mapping algorithms that enable optimizing for the underlying computing platform's costs and constraints. The cost is defined by the number of arithmetic and (within-platform) message-passing operations incurred by the variable updates in each iteration, while the constraints are set by the available memory resources. RankMap's transformation scalably factorizes data into an ensemble of lower-dimensional subspaces, while its mapping schedules the flow of iterative computation on the transformed data onto the pertinent computing machine. We show a trade-off between the desired level of accuracy for the learning algorithm and the achieved efficiency. RankMap provides two APIs, one matrix-based and one graph-based, which facilitate automated adoption of the framework for performing several contemporary iterative learning applications optimized to the platform. To demonstrate the utility of RankMap, we solve sparse recovery and power iteration problems on various real-world datasets with up to 1.8 billion non-zeros. Our evaluations are performed on Amazon EC2 and IBM iDataPlex platforms using up to 244 cores. The results demonstrate up to two orders of magnitude improvement in memory usage, execution speed, and bandwidth compared with the best reported prior work.
RankMap's implementation is here: https://github.com/azalia/RankMap