Abstract
The automatic identification of an author's demographic traits (e.g., gender, age group) from their written text is termed author profiling. It has become an essential problem in fields such as linguistic forensics, marketing, and security. In recent years, online social platforms (e.g., Twitter, Facebook, blogs, hotel reviews) have expanded remarkably; however, it is easy to create fake profiles on them. This research aims to predict the traits of authors in an existing benchmark corpus built from Twitter, hotel review, social media, and blog profiles. In this chapter, the authors explored four sets of features: syntactic n-grams of part-of-speech tags, traditional n-grams of part-of-speech tags, combinations of word n-grams, and combinations of character n-grams. They used word unigrams and character three-grams as a baseline approach. After analyzing the results, they concluded that performance improves when the combination of word n-grams is used.
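The lexical feature sets named in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the whitespace tokenization, the lowercasing, the function names, and the `w1:`/`c3:` key prefixes are all assumptions made for illustration, and the part-of-speech-based features and the classifier are omitted.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams; lowercasing and whitespace splitting are assumptions."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return overlapping character n-grams, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def baseline_features(text):
    """Count word unigrams and character three-grams (the baseline in the abstract).

    Keys are prefixed (hypothetical convention) so that a word such as "the"
    and the character three-gram "the" remain distinct features.
    """
    feats = Counter(f"w1:{g}" for g in word_ngrams(text, 1))
    feats.update(f"c3:{g}" for g in char_ngrams(text, 3))
    return feats


def combined_word_ngrams(text, max_n):
    """Count word n-grams of every order from 1 up to max_n in one feature bag."""
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(word_ngrams(text, n))
    return feats


sample = "the room was clean and the staff was friendly"
baseline = baseline_features(sample)
combined = combined_word_ngrams(sample, 2)
```

Here `combined` holds both unigram and bigram counts for the sample sentence; such count dictionaries would typically be vectorized and passed to a classifier, a step the abstract does not detail.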
Original language | English (US) |
---|---|
Title of host publication | Handbook of Research on Natural Language Processing and Smart Service Systems |
Publisher | IGI Global |
Pages | 245-265 |
Number of pages | 21 |
ISBN (Electronic) | 9781799847311 |
ISBN (Print) | 9781799847304 |
State | Published - Oct 2 2020 |
All Science Journal Classification (ASJC) codes
- General Computer Science