TY - GEN
T1 - Discovering health-related knowledge in social media using ensembles of heterogeneous features
AU - Tuarob, Suppawong
AU - Tucker, Conrad S.
AU - Salathe, Marcel
AU - Ram, Nilam
PY - 2013
Y1 - 2013
N2 - Social media is emerging as a powerful source of communication, information dissemination and mining. Being colloquial and ubiquitous in nature makes it easier for users to express their opinions and preferences in a seamless, dynamic manner. Epidemic surveillance systems that utilize social media to detect the emergence of diseases have been proposed in the literature. These systems mostly employ traditional document classification techniques that represent a document with a bag of N-grams. However, such techniques are not optimal for social media where sparsity and noise are norms. The authors address the limitations posed by the traditional N-gram based methods and propose to use features that represent different semantic aspects of the data in combination with ensemble machine learning techniques to identify health-related messages in a heterogeneous pool of social media data. Furthermore, the results reveal significant improvement in identifying health-related social media content, which can be critical in the emergence of a novel, unknown disease epidemic.
AB - Social media is emerging as a powerful source of communication, information dissemination and mining. Being colloquial and ubiquitous in nature makes it easier for users to express their opinions and preferences in a seamless, dynamic manner. Epidemic surveillance systems that utilize social media to detect the emergence of diseases have been proposed in the literature. These systems mostly employ traditional document classification techniques that represent a document with a bag of N-grams. However, such techniques are not optimal for social media where sparsity and noise are norms. The authors address the limitations posed by the traditional N-gram based methods and propose to use features that represent different semantic aspects of the data in combination with ensemble machine learning techniques to identify health-related messages in a heterogeneous pool of social media data. Furthermore, the results reveal significant improvement in identifying health-related social media content, which can be critical in the emergence of a novel, unknown disease epidemic.
UR - http://www.scopus.com/inward/record.url?scp=84889601386&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889601386&partnerID=8YFLogxK
U2 - 10.1145/2505515.2505629
DO - 10.1145/2505515.2505629
M3 - Conference contribution
AN - SCOPUS:84889601386
SN - 9781450322638
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1685
EP - 1690
BT - CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
T2 - 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Y2 - 27 October 2013 through 1 November 2013
ER -