TY - GEN
T1 - Aggregate estimation over a microblog platform
AU - Thirumuruganathan, Saravanan
AU - Zhang, Nan
AU - Hristidis, Vagelis
AU - Das, Gautam
PY - 2014/6/1
Y1 - 2014/6/1
N2 - Microblogging platforms such as Twitter have experienced phenomenal growth in popularity in recent years, making them attractive platforms for research in diverse fields from computer science to sociology. However, most microblogging platforms impose strict access restrictions (e.g., API rate limits) that prevent scientists with limited resources (e.g., those who cannot afford the microblog-data-access subscriptions offered by GNIP et al.) from leveraging the wealth of microblogs for analytics. For example, Twitter allows only 180 queries per 15 minutes, and its search API returns only tweets posted within the last week. In this paper, we consider the novel problem of estimating aggregate queries over microblogs, e.g., "how many users mentioned the word 'privacy' in 2013?". We propose novel solutions that exploit the user-timeline information publicly available on most microblogging platforms. Theoretical analysis and extensive real-world experiments over Twitter, Google+ and Tumblr confirm the effectiveness of our proposed techniques.
UR - http://www.scopus.com/inward/record.url?scp=84904339927&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904339927&partnerID=8YFLogxK
U2 - 10.1145/2588555.2610517
DO - 10.1145/2588555.2610517
M3 - Conference contribution
AN - SCOPUS:84904339927
SN - 9781450323765
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1519
EP - 1530
BT - SIGMOD 2014 - Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014
Y2 - 22 June 2014 through 27 June 2014
ER -