Privacy Now or Never: Large-Scale Extraction and Analysis of Dates in Privacy Policy Text

Mukund Srinath, Lee Matheson, Pranav Narayanan Venkit, Gabriela Zanfir-Fortuna, Florian Schaub, C. Lee Giles, Shomir Wilson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

The General Data Protection Regulation (GDPR) and other recent privacy laws require organizations to post their privacy policies, and place specific expectations on organizations' privacy practices. Privacy policies take the form of documents written in natural language, and one of the expectations placed upon them is that they remain up to date. To investigate legal compliance with this recency requirement at a large scale, we create a novel pipeline that includes crawling, regex-based extraction, candidate date classification, and date object creation to extract updated and effective dates from privacy policies written in English. We then analyze patterns in policy dates using four web crawls and find that only about 40% of privacy policies online contain a date, which makes it difficult to assess their regulatory compliance. We also find that updates to privacy policies are temporally concentrated around the passage of laws regulating digital privacy (such as the GDPR), and that more popular domains are more likely both to have policy dates and to update their policies regularly.
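
The extraction stages named in the abstract (regex-based extraction, candidate date classification, date object creation) can be illustrated with a minimal sketch. It is not the authors' implementation: the pattern covers only dates written as "Month D, YYYY", the classification step is replaced by a simple keyword heuristic rather than whatever classifier the paper uses, and the names `DATE_RE`, `CUES`, and `extract_policy_dates` are hypothetical.

```python
import re
from datetime import date

# Month name -> month number (assumption: only full English month names).
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Stage 1: regex-based extraction of candidate dates such as "May 25, 2018".
DATE_RE = re.compile(
    r"\b(?P<month>" + "|".join(MONTHS) + r")\s+(?P<day>\d{1,2}),?\s+(?P<year>\d{4})\b",
    re.IGNORECASE,
)

# Cue words standing in for the classification step (a heuristic assumption).
CUES = ("updated", "revised", "modified", "effective")


def extract_policy_dates(text, window=80):
    """Return (label, date) pairs for dates preceded by an update/effective cue."""
    results = []
    for m in DATE_RE.finditer(text):
        # Stage 2: classify the candidate from the words just before it,
        # restricted to the current sentence or line.
        context = text[max(0, m.start() - window):m.start()].lower()
        context = re.split(r"[.\n]", context)[-1]
        cue = max(CUES, key=context.rfind)  # cue closest to the date
        if context.rfind(cue) == -1:
            continue  # no cue nearby: not treated as a policy date
        # Stage 3: build a date object, discarding impossible dates.
        try:
            parsed = date(int(m["year"]), MONTHS[m["month"].lower()], int(m["day"]))
        except ValueError:
            continue
        results.append(("effective" if cue == "effective" else "updated", parsed))
    return results
```

For example, calling `extract_policy_dates` on "Last updated: May 25, 2018. This policy is effective as of June 1, 2018." yields one "updated" date and one "effective" date, while a date with no nearby cue word is skipped.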

Original language: English (US)
Title of host publication: DocEng 2023 - Proceedings of the 2023 ACM Symposium on Document Engineering
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400700279
State: Published - Aug 22 2023
Event: 2023 ACM Symposium on Document Engineering, DocEng 2023 - Limerick, Ireland
Duration: Aug 22 2023 to Aug 25 2023

Publication series

Name: DocEng 2023 - Proceedings of the 2023 ACM Symposium on Document Engineering

Conference

Conference: 2023 ACM Symposium on Document Engineering, DocEng 2023
Country/Territory: Ireland
City: Limerick
Period: 8/22/23 to 8/25/23

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Information Systems
  • Software
