Open e-commerce 1.0, five years of crowdsourced U.S. Amazon purchase histories with user demographics

Alex Berke, Dan Calacci, Robert Mahari, Takahiro Yabe, Kent Larson, Sandy Pentland

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This is a first-of-its-kind dataset containing detailed purchase histories from 5027 U.S. Amazon.com consumers, spanning 2018 through 2022, with more than 1.8 million purchases. Consumer spending data are customarily collected through government surveys to produce public datasets and statistics, which serve public agencies and researchers. Companies now collect similar data through consumers’ use of digital platforms at rates superseding data collection by public agencies. We published this dataset in an effort towards democratizing access to rich data sources routinely used by companies. The data were crowdsourced through an online survey and shared with participants’ informed consent. Data columns include order date, product code, title, price, quantity, and shipping address state. Each purchase history is linked to survey data with information about participants’ demographics, lifestyle, and health. We validate the dataset by showing expenditure correlates with public Amazon sales data (Pearson r = 0.978, p < 0.001) and conduct analyses of specific product categories, demonstrating expected seasonal trends and strong relationships to other public datasets.

Original language: English (US)
Article number: 491
Journal: Scientific Data
Volume: 11
Issue number: 1
DOIs
State: Published - Dec 2024

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Information Systems
  • Education
  • Computer Science Applications
  • Statistics, Probability and Uncertainty
  • Library and Information Sciences
