RobinHood: Tail latency-aware caching - Dynamically reallocating from cache-rich to cache-poor

Daniel S. Berger, Benjamin Berg, Timothy Zhu, Mor Harchol-Balter, Siddhartha Sen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

44 Scopus citations

Abstract

Tail latency is of great importance in user-facing web services. However, maintaining low tail latency is challenging, because a single request to a web application server results in multiple queries to complex, diverse backend services (databases, recommender systems, ad systems, etc.). A request is not complete until all of its queries have completed. We analyze a Microsoft production system and find that backend query latencies vary by more than two orders of magnitude across backends and over time, resulting in high request tail latencies. We propose a novel solution for maintaining low request tail latency: repurpose existing caches to mitigate the effects of backend latency variability, rather than just caching popular data. Our solution, RobinHood, dynamically reallocates cache resources from the cache-rich (backends which don't affect request tail latency) to the cache-poor (backends which affect request tail latency). We evaluate RobinHood with production traces on a 50-server cluster with 20 different backend systems. Surprisingly, we find that RobinHood can directly address tail latency even if working sets are much larger than the cache size. In the presence of load spikes, RobinHood meets a 150ms P99 goal 99.7% of the time, whereas the next best policy meets this goal only 70% of the time.
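The reallocation idea in the abstract can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the `tail_fraction` and `tax` parameters, and the rule of redistributing in proportion to how often a backend was the slowest query in a tail request are all hypothetical choices made for this example.

```python
from collections import Counter


def request_latency(query_latencies):
    """A request completes only when its slowest backend query completes."""
    return max(query_latencies.values())


def blocking_counts(requests, tail_fraction=0.02):
    """Count, per backend, how often it was the slowest query in a tail request.

    `requests` is a list of dicts mapping backend name -> query latency (ms).
    `tail_fraction` is an assumed parameter: the share of slowest requests
    treated as "the tail".
    """
    ranked = sorted(requests, key=request_latency, reverse=True)
    n_tail = max(1, int(len(ranked) * tail_fraction))
    counts = Counter()
    for req in ranked[:n_tail]:
        slowest_backend = max(req, key=req.get)
        counts[slowest_backend] += 1
    return counts


def reallocate(cache_mb, counts, tax=0.01):
    """Take a small share of cache from every backend, then give the pooled
    space to the backends that blocked tail requests (the "cache-poor").

    `tax` is an assumed parameter, not a value taken from the abstract.
    """
    total = sum(counts.get(backend, 0) for backend in cache_mb)
    if total == 0:
        return dict(cache_mb)  # nothing blocked the tail; keep allocations
    pool = 0.0
    new_alloc = {}
    for backend, size in cache_mb.items():
        taken = size * tax
        new_alloc[backend] = size - taken
        pool += taken
    for backend in new_alloc:
        new_alloc[backend] += pool * counts.get(backend, 0) / total
    return new_alloc
```

With two hypothetical backends `db` and `ads`, each holding 100 MB of cache, a trace in which `ads` is always the slowest query in tail requests would shift cache from `db` to `ads` on each call to `reallocate`, while the total cache size stays constant.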

Original language: English (US)
Title of host publication: Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018
Publisher: USENIX Association
Pages: 195-212
Number of pages: 18
ISBN (Electronic): 9781939133083
State: Published - 2018
Event: 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018 - Carlsbad, United States
Duration: Oct 8 2018 → Oct 10 2018

Publication series

Name: Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018

Conference

Conference: 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018
Country/Territory: United States
City: Carlsbad
Period: 10/8/18 → 10/10/18

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Networks and Communications
  • Hardware and Architecture
