Abstract
In applications of distributed storage systems to distributed computing and implementation of key-value stores, the following property, usually referred to as consistency in distributed computing, is an important requirement: As the data stored changes, the latest version of the data must be accessible to a client that connects to the storage system. Motivated by technological trends where key-value stores are increasingly implemented in high-speed memory, an information theoretic formulation called multi-version coding is introduced in this paper in order to understand and minimize the memory overhead of consistent distributed storage. Multi-version coding is characterized by ν totally ordered versions of a message and a storage system with n servers. At each server, values corresponding to an arbitrary subset of the ν versions are received and encoded. For any subset of c servers in the storage system, the value corresponding to the latest common version or a later version, as per the total ordering, among the c servers is required to be decodable. An achievable multi-version code construction via linear coding and a converse result that shows that the construction is asymptotically tight when ν |(c-1) are provided. An implication of the converse is that there is an inevitable price, in terms of storage cost, to ensure consistency in distributed storage systems.
Original language | English (US) |
---|---|
Pages (from-to) | 4540-4561 |
Number of pages | 22 |
Journal | IEEE Transactions on Information Theory |
Volume | 64 |
Issue number | 6 |
DOIs | |
State | Published - Jun 2018 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences