Abstract
A significant body of research is dedicated to developing language models that can detect various types of online abuse, such as hate speech and cyberbullying. However, there is a disconnect between platform policies, which often consider the author's intention as a criterion for content moderation, and the current capabilities of detection models, which are typically not designed to capture intent. This paper examines the role of intent in the moderation of abusive content. Specifically, we review state-of-the-art detection models and benchmark training datasets to assess their ability to capture intent. We propose changes to the design and development of automated detection and moderation systems to improve alignment with ethical and policy conceptualizations of these abuses.
| Original language | English (US) |
|---|---|
| Journal | Harvard Kennedy School Misinformation Review |
| Volume | 6 |
| Issue number | 3 |
| DOIs | |
| State | Published - 2025 |
All Science Journal Classification (ASJC) codes
- Social Sciences (miscellaneous)
Title: The unappreciated role of intent in algorithmic moderation of abusive content on social media