Cross-language domain adaptation for classifying crisis-related short messages

Muhammad Imran, Prasenjit Mitra, Jaideep Srivastava

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs such as supplies or infrastructure damage. Given labeled data, supervised machine learning can help classify these messages. Scarcity of labeled data causes poor performance in machine training. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data from past events. Do labeled tweets in a different language help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that the past labels are useful when both source and target events are of the same type (e.g., both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful; however, for different languages (e.g., Italian and English), the performance decreased.

Original language: English (US)
Title of host publication: ISCRAM 2016 Conference Proceedings - 13th International Conference on Information Systems for Crisis Response and Management
Editors: Pedro Antunes, Victor Amadeo Banuls Silvera, Joao Porto de Albuquerque, Kathleen Ann Moore, Andrea H. Tapia
Publisher: Information Systems for Crisis Response and Management, ISCRAM
ISBN (Electronic): 9788460879848
State: Published - 2016
Event: 13th International Conference on Information Systems for Crisis Response and Management, ISCRAM 2016 - Rio de Janeiro, Brazil
Duration: May 22 2016 to May 25 2016

Publication series

Name: Proceedings of the International ISCRAM Conference
ISSN (Electronic): 2411-3387

Other

Other: 13th International Conference on Information Systems for Crisis Response and Management, ISCRAM 2016
Country/Territory: Brazil
City: Rio de Janeiro
Period: 5/22/16 to 5/25/16

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
  • Electrical and Electronic Engineering
