Finding a Choice in a Haystack: Automatic Extraction of Opt-Out Statements from Privacy Policy Text

Vinayshekhar Bannihatti Kumar, Roger Iyengar, Namita Nisal, Yuanyuan Feng, Hana Habib, Peter Story, Sushain Cherivirala, Margaret Hagan, Lorrie Cranor, Shomir Wilson, Florian Schaub, Norman Sadeh

Research output: Chapter in Book/Report/Conference proceedingConference contribution

54 Scopus citations

Abstract

Website privacy policies sometimes provide users the option to opt-out of certain collections and uses of their personal data. Unfortunately, many privacy policies bury these instructions deep in their text, and few web users have the time or skill necessary to discover them. We describe a method for the automated detection of opt-out choices in privacy policy text and their presentation to users through a web browser extension. We describe the creation of two corpora of opt-out choices, which enable the training of classifiers to identify opt-outs in privacy policies. Our overall approach for extracting and classifying opt-out choices combines heuristics to identify commonly found opt-out hyperlinks with supervised machine learning to automatically identify less conspicuous instances. Our approach achieves a precision of 0.93 and a recall of 0.9. We introduce Opt-Out Easy, a web browser extension designed to present available opt-out choices to users as they browse the web. We evaluate the usability of our browser extension with a user study. We also present results of a large-scale analysis of opt-outs found in the text of thousands of the most popular websites.

Original languageEnglish (US)
Title of host publicationThe Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
PublisherAssociation for Computing Machinery, Inc
Pages1943-1954
Number of pages12
ISBN (Electronic)9781450370233
DOIs
StatePublished - Apr 20 2020
Event29th International World Wide Web Conference, WWW 2020 - Taipei, Taiwan, Province of China
Duration: Apr 20 2020Apr 24 2020

Publication series

NameThe Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020

Conference

Conference29th International World Wide Web Conference, WWW 2020
Country/TerritoryTaiwan, Province of China
CityTaipei
Period4/20/204/24/20

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Software

Fingerprint

Dive into the research topics of 'Finding a Choice in a Haystack: Automatic Extraction of Opt-Out Statements from Privacy Policy Text'. Together they form a unique fingerprint.

Cite this