STRADS-AP: Simplifying distributed machine learning programming without introducing a new programming model

Jin Kyu Kim, Abutalib Aghayev, Garth A. Gibson, Eric P. Xing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

It is a daunting task for a data scientist to convert sequential code for a Machine Learning (ML) model, published by an ML researcher, to a distributed framework that runs on a cluster and operates on massive datasets. The process of fitting the sequential code to an appropriate programming model and data abstractions determined by the framework of choice requires significant engineering and cognitive effort. Furthermore, inherent constraints of frameworks sometimes lead to inefficient implementations, delivering suboptimal performance. We show that it is possible to achieve automatic and efficient distributed parallelization of familiar sequential ML code by making a few mechanical changes to it while hiding the details of concurrency control, data partitioning, task parallelization, and fault-tolerance. To this end, we design and implement a new distributed ML framework, STRADS-Automatic Parallelization (AP), and demonstrate that it simplifies distributed ML programming significantly, while outperforming a popular data-parallel framework with a non-familiar programming model, and achieving performance comparable to an ML-specialized framework.

Original language: English (US)
Title of host publication: Proceedings of the 2019 USENIX Annual Technical Conference, USENIX ATC 2019
Publisher: USENIX Association
Pages: 207-221
Number of pages: 15
ISBN (Electronic): 9781939133038
State: Published - 2019
Event: 2019 USENIX Annual Technical Conference, USENIX ATC 2019 - Renton, United States
Duration: Jul 10 2019 to Jul 12 2019

Publication series

Name: Proceedings of the 2019 USENIX Annual Technical Conference, USENIX ATC 2019

Conference

Conference: 2019 USENIX Annual Technical Conference, USENIX ATC 2019
Country/Territory: United States
City: Renton
Period: 7/10/19 to 7/12/19

All Science Journal Classification (ASJC) codes

  • General Computer Science
