Optimizing CPU Performance for Recommendation Systems At-Scale

Rishabh Jain, Vrushabh Sanghavi, Kiwan Maeng, Scott Cheng, Samvit Kaul, Adwait Jog, Vishwas Kalagi, Meena Arunachalam, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das

Research output: Chapter in Book/Report/Conference proceedingConference contribution

6 Scopus citations

Abstract

Deep Learning Recommendation Models (DLRMs) are very popular in personalized recommendation systems and are a major contributor to the data-center AI cycles. Due to the high computational and memory bandwidth needs of DLRMs, specifically the embedding stage in DLRM inferences, both CPUs and GPUs are used for hosting such workloads. This is primarily because of the heavy irregular memory accesses in the embedding stage of computation that leads to significant stalls in the CPU pipeline. As the model and parameter sizes keep increasing with newer recommendation models, the computational dominance of the embedding stage also grows, thereby, bringing into question the suitability of CPUs for inference. In this paper, we first quantify the cause of irregular accesses and their impact on caches and observe that off-chip memory access is the main contributor to high latency. Therefore, we exploit two well-known techniques: (1) Software prefetching, to hide the memory access latency suffered by the demand loads and (2) Overlapping computation and memory accesses, to reduce CPU stalls via hyperthreading to minimize the overall execution time. We evaluate our work on a single-core and 24-core configuration with the latest recommendation models and recently released production traces. Our integrated techniques speed up the inference by up to 1.59x, and on average by 1.4x.

Original languageEnglish (US)
Title of host publicationISCA 2023 - Proceedings of the 2023 50th Annual International Symposium on Computer Architecture
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1078-1092
Number of pages15
ISBN (Electronic)9798400700958
DOIs
StatePublished - Jun 17 2023
Event50th Annual International Symposium on Computer Architecture, ISCA 2023 - Orlando, United States
Duration: Jun 17 2023Jun 21 2023

Publication series

NameProceedings - International Symposium on Computer Architecture
ISSN (Print)1063-6897

Conference

Conference50th Annual International Symposium on Computer Architecture, ISCA 2023
Country/TerritoryUnited States
CityOrlando
Period6/17/236/21/23

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Optimizing CPU Performance for Recommendation Systems At-Scale'. Together they form a unique fingerprint.

Cite this