Interval Parsing Grammars for File Format Parsing

Jialun Zhang, Greg Morrisett, Gang Tan

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars since their specifications include context-sensitive patterns such as the random access pattern and the type-length-value pattern. We propose a new grammar mechanism called Interval Parsing Grammars IPGs) for file format specifications. An IPG attaches to every nonterminal/terminal an interval, which specifies the range of input the nonterminal/terminal consumes. By connecting intervals and attributes, the context-sensitive patterns in file formats can be well handled. In this paper, we formalize IPGs' syntax as well as its semantics, and its semantics naturally leads to a parser generator that generates a recursive-descent parser from an IPG. In general, IPGs are declarative, modular, and enable termination checking. We have used IPGs to specify a number of file formats including ZIP, ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the generated parsers.

Original languageEnglish (US)
Article number150
JournalProceedings of the ACM on Programming Languages
Volume7
DOIs
StatePublished - Jun 6 2023

All Science Journal Classification (ASJC) codes

  • Software
  • Safety, Risk, Reliability and Quality

Cite this