Robust coreset construction for distributed machine learning

Hanlin Lu, Ming Ju Li, Ting He, Shiqiang Wang, Vijaykrishnan Narayanan, Kevin S. Chan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations


Motivated by the need to solve machine learning problems over distributed datasets, we explore the use of coresets to reduce the communication overhead. A coreset is a summary of the original dataset in the form of a small weighted set in the same sample space. Compared to other data summaries, a coreset has the advantage that it can be used as a proxy for the original dataset. However, existing coreset construction algorithms are each tailor-made for a specific machine learning problem. Thus, to solve different machine learning problems, one has to collect coresets of different types, which defeats the purpose of saving communication overhead. We resolve this dilemma by developing robust coreset construction algorithms based on k-means/median clustering that give a provably good approximation for a broad range of machine learning problems with sufficiently continuous cost functions. Through evaluations on diverse datasets and machine learning problems, we verify the robust performance of the proposed algorithms.
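The abstract does not spell out the construction itself, so the following is only a minimal sketch of the general idea, under the assumption that a clustering-based coreset can be formed by taking k-means centers as the summary points and the cluster sizes as their weights. The function names (`kmeans_coreset`, `weighted_cost`) and all parameters are hypothetical and are not taken from the paper; plain Lloyd iterations stand in for whatever clustering procedure the authors actually use.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def kmeans_coreset(points, k, iters=20, seed=0):
    """Illustrative coreset: k-means centers weighted by cluster size.

    Assumption (not from the paper): Lloyd's algorithm with random
    initialization; an empty cluster keeps its previous center.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                centers[j] = [
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                ]
    # Weight of each center = number of original points assigned to it.
    weights = [0] * k
    for p in points:
        weights[nearest(p, centers)] += 1
    return centers, weights


def weighted_cost(centers, weights, x):
    """Evaluate a sum-of-squared-distances cost at x on the weighted summary."""
    return sum(w * sq_dist(c, x) for c, w in zip(centers, weights))
```

In a distributed setting, each node would send only the `k` weighted points instead of its full dataset, and the receiver would evaluate its cost function on the weighted summary, for example with `weighted_cost`, as a proxy for the cost on the original data. How well this proxy approximates a given learning problem is exactly what the paper analyzes; the sketch makes no such guarantee.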

Original language: English (US)
Title of host publication: 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728109626
State: Published - Dec 2019
Event: 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Waikoloa, United States
Duration: Dec 9 2019 to Dec 13 2019

Publication series

Name: 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Proceedings


Conference: 2019 IEEE Global Communications Conference, GLOBECOM 2019
Country/Territory: United States

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Signal Processing
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Media Technology
  • Health Informatics

