TY - GEN
T1 - PrivaSeer
T2 - 21st International Conference on Web Engineering, ICWE 2021
AU - Srinath, Mukund
AU - Sundareswara, Soundarya Nurani
AU - Giles, C. Lee
AU - Wilson, Shomir
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - Web privacy policies are used by organisations to disclose their privacy practices to users on the web. However, users often do not read privacy policies because they are too long, time-consuming, or complicated. Attempts to simplify privacy policies using natural language processing have achieved some success, but they face limitations of scalability and generalization. While this places an onus on researchers and policy regulators to protect users against unfair privacy practices, they often lack a large-scale collection of policies with which to study the state of internet privacy. To remedy this bottleneck, we present PrivaSeer, the first privacy policy search engine. PrivaSeer indexes 1,400,318 English-language website privacy policies and can be used to search privacy policies by text query and several search facets. Results can be ranked by PageRank, query-based document relevance, and the probability that a document is a privacy policy. Results can also be filtered by readability, vagueness, industry, and mentions of tracking technology, self-regulatory bodies, or regulations and cross-border agreements in the policy text. PrivaSeer allows legal experts, researchers, and policy regulators to discover trends and anomalies in privacy policies at scale. In this paper we present the search interface, ranking technique, and filtering techniques for PrivaSeer. We create two indexes of privacy policies: one that includes the supplementary non-policy content present on privacy policy web pages and one that excludes it. We evaluate the functionality of PrivaSeer by comparing ranking techniques on these two indexes.
UR - http://www.scopus.com/inward/record.url?scp=85111146700&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111146700&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-74296-6_22
DO - 10.1007/978-3-030-74296-6_22
M3 - Conference contribution
AN - SCOPUS:85111146700
SN - 9783030742959
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 286
EP - 301
BT - Web Engineering - 21st International Conference, ICWE 2021, Proceedings
A2 - Brambilla, Marco
A2 - Chbeir, Richard
A2 - Frasincar, Flavius
A2 - Manolescu, Ioana
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 18 May 2021 through 21 May 2021
ER -