Author profiling using texts in social networks

Iqra Ameer, Grigori Sidorov

Research output: Chapter in Book/Report/Conference proceeding › Chapter

3 Scopus citations

Abstract

Author profiling is the automatic identification of an author's demographic traits (e.g., gender, age group) from their written text. It has become an essential problem in fields such as forensic linguistics, marketing, and security. In recent years, online social platforms (e.g., Twitter, Facebook, blogs, hotel reviews) have expanded remarkably; however, it is easy to create fake profiles on them. This research aims to predict author traits on an existing benchmark corpus that covers Twitter, hotel reviews, social media, and blogs. In this chapter, the authors explore four sets of features: syntactic n-grams of part-of-speech tags, traditional n-grams of part-of-speech tags, combinations of word n-grams, and combinations of character n-grams. Word unigrams and character three-grams serve as the baseline. From the results, they conclude that performance improves when combinations of word n-grams are used.
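As a rough illustration of the surface features named in the abstract, the sketch below extracts word n-grams, character n-grams, and a combination of word n-grams of several orders. It is not the authors' implementation: the function names, the lowercasing, the whitespace tokenization, and the choice of orders 1 to 2 for the combination are all assumptions made for the example. The part-of-speech and syntactic n-gram features are not covered, since they require a tagger and a parser.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams (as tuples) from a lowercased, whitespace-tokenized text.

    Tokenization by whitespace is an assumption; the abstract does not specify it.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return overlapping character n-grams of the raw text (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def combined_word_ngram_counts(text, max_n):
    """Count word n-grams of every order from 1 to max_n in one feature bag.

    This is one plausible reading of "combinations of word n-grams".
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


sample = "great hotel great view"

# Baseline-style features: word unigrams and character three-grams.
baseline_words = Counter(word_ngrams(sample, 1))
baseline_chars = Counter(char_ngrams(sample, 3))

# Combined feature bag: unigrams plus bigrams (orders chosen for illustration).
combined = combined_word_ngram_counts(sample, 2)
```

In practice, count dictionaries like these would be turned into feature vectors and passed to a classifier; which classifier and weighting the chapter uses is not stated in the abstract.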

Original language: English (US)
Title of host publication: Handbook of Research on Natural Language Processing and Smart Service Systems
Publisher: IGI Global
Pages: 245-265
Number of pages: 21
ISBN (Electronic): 9781799847311
ISBN (Print): 9781799847304
State: Published - Oct 2 2020

All Science Journal Classification (ASJC) codes

  • General Computer Science
