TY - JOUR
T1 - Incorporating simulated spatial context information improves the effectiveness of contrastive learning models
AU - Zhu, Lizhen
AU - Wang, James Z.
AU - Lee, Wonseuk
AU - Wyble, Brad
N1 - Publisher Copyright: © 2024 The Authors
PY - 2024/5/10
Y1 - 2024/5/10
AB - Visual learning often occurs in a specific context, where an agent acquires skills through exploration and tracking of its location in a consistent environment. The historical spatial context of the agent provides a similarity signal for self-supervised contrastive learning. We present a unique approach, termed environmental spatial similarity (ESS), that complements existing contrastive learning methods. Using images from simulated, photorealistic environments as an experimental setting, we demonstrate that ESS outperforms traditional instance discrimination approaches. Moreover, sampling additional data from the same environment substantially improves accuracy and provides new augmentations. ESS allows remarkable proficiency in room classification and spatial prediction tasks, especially in unfamiliar environments. This learning paradigm has the potential to enable rapid visual learning in agents operating in new environments with unique visual characteristics. Potentially transformative applications span from robotics to space exploration. Our proof of concept demonstrates improved efficiency over methods that rely on extensive, disconnected datasets.
UR - http://www.scopus.com/inward/record.url?scp=85188791417&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85188791417&partnerID=8YFLogxK
U2 - 10.1016/j.patter.2024.100964
DO - 10.1016/j.patter.2024.100964
M3 - Article
C2 - 38800363
AN - SCOPUS:85188791417
SN - 2666-3899
VL - 5
JO - Patterns
JF - Patterns
IS - 5
M1 - 100964
ER -