Abstract
File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars because their specifications include context-sensitive patterns such as the random access pattern and the type-length-value pattern. We propose a new grammar mechanism called Interval Parsing Grammars (IPGs) for file format specifications. An IPG attaches to every nonterminal/terminal an interval, which specifies the range of input that the nonterminal/terminal consumes. By connecting intervals and attributes, IPGs can handle the context-sensitive patterns found in file formats. In this paper, we formalize the syntax and semantics of IPGs; the semantics naturally leads to a parser generator that generates a recursive-descent parser from an IPG. In general, IPGs are declarative, modular, and enable termination checking. We have used IPGs to specify a number of file formats, including ZIP, ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the generated parsers.
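To make the two context-sensitive patterns named in the abstract concrete, the following is a minimal hand-written sketch in Python. It is not IPG notation and is not taken from the paper; the toy file layout (a 4-byte big-endian offset header followed, at that offset, by records with a 1-byte tag and a 2-byte big-endian length) and the function names `parse_tlv`, `parse_records`, and `parse_file` are assumptions made purely for illustration. Each function receives an explicit interval `[lo, hi)` of the input, loosely mirroring the idea that every nonterminal is tied to the range of input it consumes.

```python
import struct


def parse_tlv(data: bytes, lo: int, hi: int) -> dict:
    """Parse one type-length-value record inside the interval [lo, hi)."""
    if hi - lo < 3:
        raise ValueError("interval too small for a TLV header")
    tag = data[lo]
    # The length field determines how much input the value consumes:
    # this dependency is what a context-free grammar cannot express.
    (length,) = struct.unpack_from(">H", data, lo + 1)
    start = lo + 3
    end = start + length
    if end > hi:
        raise ValueError("value exceeds the enclosing interval")
    return {"tag": tag, "value": data[start:end], "end": end}


def parse_records(data: bytes, lo: int, hi: int) -> list:
    """Parse consecutive TLV records that exactly fill [lo, hi)."""
    records = []
    pos = lo
    # Every record consumes at least 3 bytes, so pos strictly increases
    # and the loop terminates.
    while pos < hi:
        rec = parse_tlv(data, pos, hi)
        records.append(rec)
        pos = rec["end"]
    return records


def parse_file(data: bytes) -> list:
    """Random access: a header field says where the record area starts."""
    if len(data) < 4:
        raise ValueError("input too small for the offset header")
    (offset,) = struct.unpack_from(">I", data, 0)
    if not 4 <= offset <= len(data):
        raise ValueError("offset points outside the file")
    # Jump to the interval [offset, len(data)) instead of reading sequentially.
    return parse_records(data, offset, len(data))


if __name__ == "__main__":
    sample = (
        struct.pack(">I", 6)                        # records start at byte 6
        + b"\x00\x00"                               # padding skipped by the jump
        + bytes([1]) + struct.pack(">H", 2) + b"hi"  # tag 1, length 2
        + bytes([2]) + struct.pack(">H", 0)          # tag 2, empty value
    )
    for record in parse_file(sample):
        print(record["tag"], record["value"])
```

In this sketch the interval bounds are threaded through by hand and checked with explicit comparisons; the abstract's claim is that an IPG lets such bounds be stated declaratively in the grammar, with the recursive-descent parser generated from it.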
| Original language | English (US) |
|---|---|
| Article number | 150 |
| Journal | Proceedings of the ACM on Programming Languages |
| Volume | 7 |
| State | Published - Jun 6 2023 |
All Science Journal Classification (ASJC) codes
- Software
- Safety, Risk, Reliability and Quality
Fingerprint
Dive into the research topics of 'Interval Parsing Grammars for File Format Parsing'. Together they form a unique fingerprint.