TY - GEN
T1 - A large-scale exploration of terms of service documents on the web
AU - Sundareswara, Soundarya Nurani
AU - Srinath, Mukund
AU - Wilson, Shomir
AU - Giles, C. Lee
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/8/16
Y1 - 2021/8/16
N2 - Terms of service documents are a common feature of organizations' websites. Although there is no blanket requirement for organizations to provide these documents, their provision often serves essential legal purposes. Users of a website are expected to agree with the contents of a terms of service document, but users tend to ignore these documents as they are often lengthy and difficult to comprehend. As a step towards understanding the landscape of these documents at a large scale, we present a first-of-its-kind terms of service corpus containing 247,212 English language terms of service documents obtained from company websites sampled from Free Company Dataset. We examine the URLs and contents of the documents and find that some websites that purport to post terms of service actually do not provide them. We analyze reasons for unavailability and determine the overall availability of terms of service in a given set of website domains. We also identify that some websites provide an agreement that combines terms of service with a privacy policy, which is often an obligatory separate document. Using topic modeling, we analyze the themes in these combined documents by comparing them with themes found in separate terms of service and privacy policies. Results suggest that such single-page agreements miss some of the most prevalent topics available in typical privacy policies and terms of service documents and that many disproportionately cover privacy policy topics as compared to terms of service topics.
UR - http://www.scopus.com/inward/record.url?scp=85113634890&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113634890&partnerID=8YFLogxK
U2 - 10.1145/3469096.3474940
DO - 10.1145/3469096.3474940
M3 - Conference contribution
AN - SCOPUS:85113634890
T3 - DocEng 2021 - Proceedings of the 2021 ACM Symposium on Document Engineering
BT - DocEng 2021 - Proceedings of the 2021 ACM Symposium on Document Engineering
PB - Association for Computing Machinery, Inc
T2 - 21st ACM Symposium on Document Engineering, DocEng 2021
Y2 - 24 August 2021 through 27 August 2021
ER -