TY - GEN
T1 - Robust coreset construction for distributed machine learning
AU - Lu, Hanlin
AU - Li, Ming-Ju
AU - He, Ting
AU - Wang, Shiqiang
AU - Narayanan, Vijaykrishnan
AU - Chan, Kevin S.
PY - 2019/12
Y1 - 2019/12
N2 - Motivated by the need to solve machine learning problems over distributed datasets, we explore the use of coresets to reduce the communication overhead. A coreset is a summary of the original dataset in the form of a small weighted set in the same sample space. Compared to other data summaries, a coreset has the advantage that it can be used as a proxy for the original dataset. However, existing coreset construction algorithms are each tailor-made for a specific machine learning problem; thus, to solve different machine learning problems, one has to collect coresets of different types, defeating the purpose of saving communication overhead. We resolve this dilemma by developing robust coreset construction algorithms based on k-means/median clustering that give a provably good approximation for a broad range of machine learning problems with sufficiently continuous cost functions. Through evaluations on diverse datasets and machine learning problems, we verify the robust performance of the proposed algorithms.
AB - Motivated by the need to solve machine learning problems over distributed datasets, we explore the use of coresets to reduce the communication overhead. A coreset is a summary of the original dataset in the form of a small weighted set in the same sample space. Compared to other data summaries, a coreset has the advantage that it can be used as a proxy for the original dataset. However, existing coreset construction algorithms are each tailor-made for a specific machine learning problem; thus, to solve different machine learning problems, one has to collect coresets of different types, defeating the purpose of saving communication overhead. We resolve this dilemma by developing robust coreset construction algorithms based on k-means/median clustering that give a provably good approximation for a broad range of machine learning problems with sufficiently continuous cost functions. Through evaluations on diverse datasets and machine learning problems, we verify the robust performance of the proposed algorithms.
UR - http://www.scopus.com/inward/record.url?scp=85081965994&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081965994&partnerID=8YFLogxK
U2 - 10.1109/GLOBECOM38437.2019.9013625
DO - 10.1109/GLOBECOM38437.2019.9013625
M3 - Conference contribution
T3 - 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Proceedings
BT - 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Global Communications Conference, GLOBECOM 2019
Y2 - 9 December 2019 through 13 December 2019
ER -