Fast Sentence Classification using Word Co-occurrence Graphs

Ashirbad Mishra, Shad Kirmani, Kamesh Madduri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We consider a supervised classification problem of categorizing e-commerce products based on just the words in the title. If done in real time, the categorization can greatly benefit sellers by enabling them to offer immediate feedback. We present a deterministic algorithm that constructs weighted word co-occurrence graphs from the listing/item titles. We empirically evaluate this algorithm on two publicly available product listing datasets, Etsy and Amazon. Our method's accuracy is comparable to that of a supervised classifier constructed using the fastText library. Inference with our model is up to 2.9× faster than with the fastText classifier, and training times are small. Training and inference scale well to big datasets, supporting large-scale classification on millions of listings. We perform a detailed analysis and provide insights into our method and the product categorization task.
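
The abstract does not spell out how the graphs are built or scored, so the following Python sketch is only one possible reading of "weighted word co-occurrence graphs", not the authors' algorithm. It assumes one graph per category, edge weights equal to the number of training titles in which two words co-occur, and a normalized sum of matching edge and node weights as the category score. The function names `tokenize`, `train`, and `classify`, the scoring rule, and the toy titles are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(title):
    # Assumed preprocessing: lowercase, split on whitespace, deduplicate.
    # Sorting makes every word pair canonical, i.e. (a, b) with a < b.
    return sorted(set(title.lower().split()))


def train(labeled_titles):
    """Build one weighted co-occurrence graph per category (assumed design).

    edges[cat][(w1, w2)] counts titles in `cat` containing both words;
    nodes[cat][w] counts titles in `cat` containing the word w.
    """
    edges = defaultdict(Counter)
    nodes = defaultdict(Counter)
    for title, cat in labeled_titles:
        words = tokenize(title)
        nodes[cat].update(words)
        edges[cat].update(combinations(words, 2))
    return edges, nodes


def classify(title, edges, nodes):
    """Return the category with the highest normalized edge + node score."""
    words = tokenize(title)
    pairs = list(combinations(words, 2))
    best_cat, best_score = None, -1.0
    # Iterating categories in sorted order keeps tie-breaking deterministic.
    for cat in sorted(nodes):
        edge_total = sum(edges[cat].values()) or 1
        node_total = sum(nodes[cat].values()) or 1
        score = sum(edges[cat][p] for p in pairs) / edge_total
        score += sum(nodes[cat][w] for w in words) / node_total
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


# Hypothetical toy data, not taken from the Etsy or Amazon datasets.
toy_data = [
    ("handmade silver ring", "jewelry"),
    ("silver hoop earrings", "jewelry"),
    ("wooden cutting board", "kitchen"),
    ("ceramic coffee mug", "kitchen"),
]
edges, nodes = train(toy_data)
print(classify("silver ring handmade", edges, nodes))
print(classify("wooden coffee mug", edges, nodes))
```

In this sketch both training and inference reduce to counter updates and lookups, with no iterative optimization and no randomness, which is the kind of property that would make a graph-based classifier deterministic and quick to train. How the published method weights, prunes, or scores its graphs may differ.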

Original language: English (US)
Title of host publication: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
Editors: Wei Ding, Chang-Tien Lu, Fusheng Wang, Liping Di, Kesheng Wu, Jun Huan, Raghu Nambiar, Jundong Li, Filip Ilievski, Ricardo Baeza-Yates, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 620-629
Number of pages: 10
ISBN (Electronic): 9798350362480
DOIs
State: Published - 2024
Event: 2024 IEEE International Conference on Big Data, BigData 2024 - Washington, United States
Duration: Dec 15 2024 - Dec 18 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024

Conference

Conference: 2024 IEEE International Conference on Big Data, BigData 2024
Country/Territory: United States
City: Washington
Period: 12/15/24 - 12/18/24

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Modeling and Simulation
