Detecting demographic bias in automatically generated personas

Joni Salminen, Bernard J. Jansen, Jung Soongyo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

17 Scopus citations

Abstract

We investigate the existence of demographic bias in automatically generated personas by producing personas from YouTube Analytics data. Despite the intended objectivity of the methodology, we find elements of bias in the data-driven personas. The bias is highest in an exact-match comparison and decreases when comparing at the age or gender level. The bias also decreases as the number of generated personas increases. For example, a smaller number of personas resulted in underrepresentation of female personas. This suggests that a higher number of personas gives a more balanced representation of the user population, whereas a smaller number increases bias. Researchers and practitioners developing data-driven personas should consider the possibility of algorithmic bias, even unintentional, in their personas by comparing the personas against the underlying raw data.
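
The comparison of personas against raw data described in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' procedure: the bias measure (total variation distance), the function names, and the sample audience shares are all assumptions introduced here. It compares the demographic distribution of a persona set with the raw audience distribution at three levels: exact match (gender and age together), gender only, and age only.

```python
from collections import Counter

def shares(items):
    """Turn a list of labels into a distribution of relative frequencies."""
    counts = Counter(items)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

def marginal(dist, index):
    """Collapse a (gender, age) distribution onto one attribute."""
    out = {}
    for key, p in dist.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out

def total_variation(p, q):
    """Total variation distance between two discrete distributions.
    Assumed bias measure for this sketch; the paper may use another."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def bias_report(raw, personas):
    """Compare persona demographics with raw audience shares at three levels."""
    persona_dist = shares(personas)
    return {
        "exact": total_variation(raw, persona_dist),
        "gender": total_variation(marginal(raw, 0), marginal(persona_dist, 0)),
        "age": total_variation(marginal(raw, 1), marginal(persona_dist, 1)),
    }

# Hypothetical raw audience shares keyed by (gender, age group).
raw = {
    ("female", "18-24"): 0.25,
    ("female", "25-34"): 0.25,
    ("male", "18-24"): 0.25,
    ("male", "25-34"): 0.25,
}

# Hypothetical small persona set in which female personas are underrepresented.
personas = [("male", "18-24"), ("male", "25-34"), ("female", "18-24")]

report = bias_report(raw, personas)
print(report)
```

With this measure, the exact-match gap can never be smaller than the gender-only or age-only gap, because collapsing a distribution onto one attribute cannot increase total variation distance. That matches the direction of the pattern the abstract reports, although the paper's own metric may differ.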

Original language: English (US)
Title of host publication: CHI EA 2019 - Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450359719
State: Published - May 2, 2019
Event: 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019 - Glasgow, United Kingdom
Duration: May 4, 2019 to May 9, 2019

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019
Country/Territory: United Kingdom
City: Glasgow
Period: 5/4/19 to 5/9/19

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Graphics and Computer-Aided Design
