Accelerating Substructure Similarity Search for Formula Retrieval

Wei Zhong, Shaurya Rohatgi, Jian Wu, C. Lee Giles, Richard Zanibbi

Research output: Chapter in Book/Report/Conference proceedingConference contribution

24 Scopus citations


Formula retrieval systems using substructure matching are effective, but suffer from slow retrieval times caused by the complexity of structure matching. We present a specialized inverted index and rank-safe dynamic pruning algorithm for faster substructure retrieval. Formulas are indexed from their Operator Tree (OPT) representations. Our model is evaluated using the NTCIR-12 Wikipedia Formula Browsing Task and a new formula corpus produced from Math StackExchange posts. Our approach preserves the effectiveness of structure matching while allowing queries to be executed in real-time.

Original languageEnglish (US)
Title of host publicationAdvances in Information Retrieval - 42nd European Conference on IR Research, ECIR 2020, Proceedings
EditorsJoemon M. Jose, Emine Yilmaz, João Magalhães, Flávio Martins, Pablo Castells, Nicola Ferro, Mário J. Silva
Number of pages14
ISBN (Print)9783030454388
StatePublished - 2020
Event42nd European Conference on IR Research, ECIR 2020 - Lisbon, Portugal
Duration: Apr 14 2020Apr 17 2020

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12035 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349


Conference42nd European Conference on IR Research, ECIR 2020

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science


Dive into the research topics of 'Accelerating Substructure Similarity Search for Formula Retrieval'. Together they form a unique fingerprint.

Cite this