TY - GEN
T1 - Learning Canonical Representations for Scene Graph to Image Generation
AU - Herzig, Roei
AU - Bar, Amir
AU - Xu, Huijuan
AU - Chechik, Gal
AU - Darrell, Trevor
AU - Globerson, Amir
N1 - Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. In this work, we show that one limitation of current methods is their inability to capture semantic equivalence in graphs. We present a novel model that addresses these issues by learning canonical graph representations from the data, resulting in improved image generation for complex visual scenes (The project page is available at https://roeiherz.github.io/CanonicalSg2Im/). Our model demonstrates improved empirical performance on large scene graphs, robustness to noise in the input scene graph, and generalization on semantically equivalent graphs. Finally, we show improved performance of the model on three different benchmarks: Visual Genome, COCO, and CLEVR.
AB - Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. In this work, we show that one limitation of current methods is their inability to capture semantic equivalence in graphs. We present a novel model that addresses these issues by learning canonical graph representations from the data, resulting in improved image generation for complex visual scenes (The project page is available at https://roeiherz.github.io/CanonicalSg2Im/). Our model demonstrates improved empirical performance on large scene graphs, robustness to noise in the input scene graph, and generalization on semantically equivalent graphs. Finally, we show improved performance of the model on three different benchmarks: Visual Genome, COCO, and CLEVR.
UR - http://www.scopus.com/inward/record.url?scp=85097271476&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097271476&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-58574-7_13
DO - 10.1007/978-3-030-58574-7_13
M3 - Conference contribution
AN - SCOPUS:85097271476
SN - 9783030585730
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 210
EP - 227
BT - Computer Vision – ECCV 2020 - 16th European Conference, 2020, Proceedings
A2 - Vedaldi, Andrea
A2 - Bischof, Horst
A2 - Brox, Thomas
A2 - Frahm, Jan-Michael
PB - Springer Science and Business Media Deutschland GmbH
T2 - 16th European Conference on Computer Vision, ECCV 2020
Y2 - 23 August 2020 through 28 August 2020
ER -