AttriInfer: Inferring user attributes in online social networks using Markov random fields

Jinyuan Jia, Binghui Wang, Le Zhang, Neil Zhenqiang Gong

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

60 Scopus citations

Abstract

In the attribute inference problem, we aim to infer users’ private attributes (e.g., locations, sexual orientation, and interests) using their public data in online social networks. State-of-the-art methods leverage both a user’s public friends and public behaviors (e.g., page likes on Facebook, apps that the user reviewed on Google Play) to infer the user’s private attributes. However, these methods suffer from two key limitations: 1) when inferring a certain attribute for a target user from a training dataset, they leverage only the labeled users who have the attribute and ignore the label information of users who do not have the attribute; 2) they are inefficient because they infer attributes for target users one by one. As a result, they have limited accuracy and applicability in real-world social networks. In this work, we propose AttriInfer, a new method to infer user attributes in online social networks. AttriInfer can leverage both friends and behaviors, as well as the label information of training users who have an attribute and of those who do not. Specifically, we model a social network as a pairwise Markov Random Field (pMRF). Given a training dataset, which consists of some users who have a certain attribute and some users who do not, we compute the posterior probability that a target user has the attribute and use this posterior probability to infer attributes. In the basic version of AttriInfer, we use Loopy Belief Propagation (LBP) to compute the posterior probability. However, LBP does not scale to very large real-world social networks and is not guaranteed to converge. Therefore, we further optimize LBP to be scalable and guaranteed to converge. We evaluate our method and compare it with state-of-the-art methods using a real-world Google+ dataset with 5.7M users. Our results demonstrate that our method substantially outperforms state-of-the-art methods in terms of both accuracy and efficiency.
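As a rough illustration of the basic idea in the abstract (a pairwise MRF over users, with LBP used to approximate each user's posterior), the following is a minimal sketch of textbook sum-product loopy belief propagation on a binary pairwise MRF. It is not the authors' implementation and does not include their optimized, convergence-guaranteed variant. The function name `loopy_bp`, the prior values, and the `coupling` parameter (an assumed probability that two linked users share the same label) are hypothetical choices made only for this example.

```python
from collections import defaultdict


def loopy_bp(edges, priors, coupling=0.9, max_iters=50, tol=1e-6):
    """Textbook sum-product LBP on a binary pairwise MRF (illustrative only).

    edges:    list of undirected (u, v) pairs, e.g. friendships
    priors:   dict node -> prior probability that the node has the attribute
              (assumed: >0.5 for labeled positives, <0.5 for labeled
              negatives, 0.5 for unlabeled users)
    coupling: assumed probability that two linked nodes share a label
    Returns a dict node -> approximate posterior of having the attribute.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # messages[(u, v)] is the message from u to v: [m(x_v = 0), m(x_v = 1)]
    messages = {}
    for u, v in edges:
        messages[(u, v)] = [0.5, 0.5]
        messages[(v, u)] = [0.5, 0.5]

    def node_potential(u):
        p = priors.get(u, 0.5)
        return [1.0 - p, p]

    for _ in range(max_iters):
        new_messages = {}
        max_change = 0.0
        for (u, v), old in messages.items():
            # Node potential of u times all incoming messages except v's.
            pre = node_potential(u)
            for w in neighbors[u]:
                if w != v:
                    pre[0] *= messages[(w, u)][0]
                    pre[1] *= messages[(w, u)][1]
            # Marginalize over x_u using the edge potential.
            out = [0.0, 0.0]
            for xv in (0, 1):
                for xu in (0, 1):
                    psi = coupling if xu == xv else 1.0 - coupling
                    out[xv] += pre[xu] * psi
            z = out[0] + out[1]
            out = [out[0] / z, out[1] / z]
            new_messages[(u, v)] = out
            max_change = max(max_change, abs(out[1] - old[1]))
        messages = new_messages
        if max_change < tol:
            break  # messages stopped changing

    posteriors = {}
    for u in set(priors) | set(neighbors):
        belief = node_potential(u)
        for w in neighbors[u]:
            belief[0] *= messages[(w, u)][0]
            belief[1] *= messages[(w, u)][1]
        posteriors[u] = belief[1] / (belief[0] + belief[1])
    return posteriors


# Toy chain a - b - c - d: "a" is a labeled positive, "d" a labeled negative.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
toy_priors = {"a": 0.9, "b": 0.5, "c": 0.5, "d": 0.1}
toy_posteriors = loopy_bp(toy_edges, toy_priors)
print(toy_posteriors)
```

In this toy chain, the unlabeled user next to the labeled positive ends up with a posterior above 0.5 and the one next to the labeled negative ends up below 0.5, which is how both kinds of label information influence the inference. On graphs with cycles, this plain version is only approximate and may not converge, which is the limitation the abstract says the optimized method addresses.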

Original language: English (US)
Title of host publication: 26th International World Wide Web Conference, WWW 2017
Publisher: International World Wide Web Conferences Steering Committee
Pages: 1561-1569
Number of pages: 9
ISBN (Print): 9781450349130
State: Published - 2017
Event: 26th International World Wide Web Conference, WWW 2017 - Perth, Australia
Duration: Apr 3 2017 → Apr 7 2017

Publication series

Name: 26th International World Wide Web Conference, WWW 2017

Conference

Conference: 26th International World Wide Web Conference, WWW 2017
Country/Territory: Australia
City: Perth
Period: 4/3/17 → 4/7/17

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
