TY - JOUR
T1 - Interval Parsing Grammars for File Format Parsing
AU - Zhang, Jialun
AU - Morrisett, Greg
AU - Tan, Gang
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/6/6
Y1 - 2023/6/6
N2 - File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars since their specifications include context-sensitive patterns such as the random access pattern and the type-length-value pattern. We propose a new grammar mechanism called Interval Parsing Grammars IPGs) for file format specifications. An IPG attaches to every nonterminal/terminal an interval, which specifies the range of input the nonterminal/terminal consumes. By connecting intervals and attributes, the context-sensitive patterns in file formats can be well handled. In this paper, we formalize IPGs' syntax as well as its semantics, and its semantics naturally leads to a parser generator that generates a recursive-descent parser from an IPG. In general, IPGs are declarative, modular, and enable termination checking. We have used IPGs to specify a number of file formats including ZIP, ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the generated parsers.
AB - File formats specify how data is encoded for persistent storage. They cannot be formalized as context-free grammars since their specifications include context-sensitive patterns such as the random access pattern and the type-length-value pattern. We propose a new grammar mechanism called Interval Parsing Grammars IPGs) for file format specifications. An IPG attaches to every nonterminal/terminal an interval, which specifies the range of input the nonterminal/terminal consumes. By connecting intervals and attributes, the context-sensitive patterns in file formats can be well handled. In this paper, we formalize IPGs' syntax as well as its semantics, and its semantics naturally leads to a parser generator that generates a recursive-descent parser from an IPG. In general, IPGs are declarative, modular, and enable termination checking. We have used IPGs to specify a number of file formats including ZIP, ELF, GIF, PE, and part of PDF; we have also evaluated the performance of the generated parsers.
UR - http://www.scopus.com/inward/record.url?scp=85161961905&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85161961905&partnerID=8YFLogxK
U2 - 10.1145/3591264
DO - 10.1145/3591264
M3 - Article
AN - SCOPUS:85161961905
SN - 2475-1421
VL - 7
JO - Proceedings of the ACM on Programming Languages
JF - Proceedings of the ACM on Programming Languages
M1 - 150
ER -