Logistic regression classifier trained in 91.5 seconds, 46× faster than the best previously reported result
“A growing number of small and medium enterprises rely on machine learning as part of their everyday business.”
—Celestine Dünner, IBM scientist
We have developed an efficient, scalable machine-learning library that enables very fast training of generalized linear models. We have demonstrated that our library can remove training time as a bottleneck for machine-learning workloads, paving the way for a range of new applications: more agile development, faster and more fine-grained exploration of the hyper-parameter space, scaling to massive datasets, and frequent retraining of models to adapt to events as they occur.
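To make the hyper-parameter point concrete, the sketch below runs the kind of grid search that fast training makes practical. It uses scikit-learn's GridSearchCV with its stock LogisticRegression as a stand-in estimator; nothing here is Snap ML's own API, and the dataset and parameter grid are purely illustrative.

```python
# A minimal sketch of a hyper-parameter sweep, using scikit-learn's stock
# LogisticRegression as a stand-in estimator (not Snap ML's API).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative synthetic dataset.
X, y = make_classification(n_samples=20_000, n_features=50, random_state=0)

# Every grid point costs one full training run, so a faster trainer directly
# multiplies how much of this space can be explored in a given time budget.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=200), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```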
“Cloud resources are typically billed by the hour, so the time required to train machine-learning models is directly related to outgoing costs.”
—Thomas Parnell, IBM scientist
Our library, called Snap Machine Learning (Snap ML), combines recent advances in machine-learning systems and algorithms in a nested manner to reflect the hierarchical architecture of modern distributed systems. This allows us to leverage available network, memory and heterogeneous compute resources effectively. On a terabyte-scale publicly available dataset for click-through-rate prediction in computational advertising, we demonstrate the training of a logistic regression classifier in 1.53 min.
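For a sense of what training with Snap ML looks like in practice, here is a minimal sketch assuming the library's scikit-learn-style Python bindings: the `snapml` package name and the `LogisticRegression` parameters reflect its published interface rather than anything stated in this article, and the synthetic data is a stand-in for a real click-through-rate dataset.

```python
# Minimal training sketch, assuming the snapml package exposes a
# scikit-learn-style LogisticRegression (import path and parameters are
# assumptions, not confirmed by this article).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from snapml import LogisticRegression  # assumed import path

# Synthetic stand-in for a click-through-rate dataset.
X, y = make_classification(n_samples=100_000, n_features=100, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=100)  # assumed parameter name
clf.fit(X_train, y_train)
print("test accuracy:", (clf.predict(X_test) == y_test).mean())
```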
The three main features that distinguish Snap ML are:
- Distributed training: We built our system as a data-parallel framework, enabling us to scale out and train on massive datasets that exceed the memory capacity of a single machine, which is crucial for large-scale applications (the first sketch after this list illustrates the idea).
- GPU acceleration: We implemented specialized solvers designed to leverage the massively parallel architecture of GPUs while respecting data locality in GPU memory to avoid large data-transfer overheads. To make this approach scalable, we take advantage of recent developments in heterogeneous learning to achieve GPU acceleration even when only a small fraction of the data fits in the accelerator memory.
- Sparse data structures: Many machine-learning datasets are sparse, so we employ new optimizations for the algorithms used in our system when they operate on sparse data structures (the second sketch after this list combines GPU training with sparse input).
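To illustrate the data-parallel idea behind the distributed-training feature, here is a conceptual sketch using mpi4py, not Snap ML's actual distributed interface: each worker holds a shard of the data and computes a local gradient, and an allreduce averages the gradients, so no single machine ever needs the full dataset.

```python
# Conceptual data-parallel logistic regression sketch (NOT Snap ML's API).
# Run with, e.g.:  mpirun -np 4 python data_parallel_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each worker generates (or would load) only its own shard of the data.
rng = np.random.default_rng(seed=rank)
n_local, d = 10_000, 20
X = rng.normal(size=(n_local, d))
w_true = np.arange(d, dtype=float)
y = (X @ w_true + rng.normal(size=n_local) > 0).astype(float)

w = np.zeros(d)
lr = 0.5
for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))      # local predictions
    local_grad = X.T @ (p - y) / n_local    # local logistic-loss gradient
    grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, grad, op=MPI.SUM)  # sum shards' gradients
    w -= lr * grad / size                   # step with the averaged gradient

if rank == 0:
    print("first weights:", np.round(w[:5], 3))
```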
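The second sketch combines the GPU and sparse-data features in a single call, again assuming the `snapml` package's scikit-learn-style interface; the `use_gpu` and `device_ids` parameter names follow the library's published API but are assumptions as far as this article is concerned.

```python
# GPU training on sparse input, assuming snapml's scikit-learn-style API
# (use_gpu and device_ids are assumed parameter names).
import numpy as np
import scipy.sparse as sp
from snapml import LogisticRegression  # assumed import path

rng = np.random.default_rng(0)
# ~0.1% non-zeros, typical of one-hot encoded click-through-rate features.
X = sp.random(100_000, 10_000, density=0.001, format="csr", random_state=0)
w_true = rng.normal(size=X.shape[1])
y = (X @ w_true > 0).astype(np.float32)

# CSR input is consumed directly, so host memory scales with the number of
# non-zeros rather than with n_samples * n_features.
clf = LogisticRegression(use_gpu=True, device_ids=[0])  # train on GPU 0
clf.fit(X, y)
print("train accuracy:", (clf.predict(X) == y).mean())
```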