Project Details
Description
Modern computer vision models are trained on sets of images numbering in the billions, yet they are still far less robust than the visual systems of small children, who have a much smaller range of visual experiences. This project will use our understanding of how infants experience the world in their first years of life to develop new methods of training artificial intelligence programs to decode the information they receive from a camera. One advantage that children have over computers is that they experience the visual world as a journey through space, rather than as a series of randomly collected, unrelated images. Children thus have a way to evaluate the similarity of two visual scenes based on their vantage point for each scene. The investigators will generate highly realistic scenes modeled on the perspective of a young child moving through a house, and these scenes will be used to develop a computer algorithm that learns to recognize objects, surfaces, and other visual concepts. The work will provide new insights into improving computer vision for real-world problems, a field undergoing rapid growth due to its applications in areas including household robots, assistive robots, and self-driving cars. The project will support interdisciplinary graduate and postdoctoral training, as well as the production of widely accessible STEM educational resources through Neuromatch, a summer school that emerged during the pandemic as a way to reach students at minimal cost and with a low carbon footprint.

The investigators will develop a critical theory of visual learning, inspired by how human children learn, with the potential to reshape the fundamentals of learning in computer vision and machine learning. The research hypothesizes that a key ingredient in human visual learning is spatiotemporal contiguity: images of the world are experienced in sequence as a child moves through space. The project has two components aimed at developing a new algorithm for visual learning based on human learning. First, a data set will be created using ray tracing to generate sequences of photorealistic images in a way similar to how a child would experience them. These images will then be coupled with recent innovations in self-supervised deep learning to determine how spatiotemporal image sequences can augment computer vision, using image classification and other tasks as tests. The resulting algorithm will produce artificial neural networks that respond to visual patterns. Those responses can be compared with the responses of neural networks in the human brain, as measured through fMRI, to determine through representational-similarity analysis whether the sequence-learning mechanism is a better approximation of human visual learning than state-of-the-art computer vision methods. Moreover, this analysis technique can be used as a searchlight to highlight the regions of the brain that are most similar to the newly developed artificial neural networks, which helps determine how different brain areas contribute to visual learning. Students supported by this project will conduct research at the interface between psychology and computer science, and the project will also contribute to the development of STEM educational resources.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
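The abstract names two technical ingredients without specifying implementations: a self-supervised objective built on spatiotemporal contiguity, and representational-similarity analysis (RSA) against fMRI responses. The sketches below are illustrative readings of those ideas, not the investigators' actual method. The first is a minimal contrastive objective in PyTorch that treats temporally adjacent frames from a rendered walk through a scene as positive pairs; all names here (`temporal_contrastive_loss`, `encoder`, the frame variables) are hypothetical:

```python
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(z_t, z_tnext, temperature=0.1):
    """InfoNCE-style loss: the embedding of frame i at time t should match
    the embedding of the temporally adjacent frame i at time t+1; all other
    frames in the batch serve as negatives."""
    z_t = F.normalize(z_t, dim=1)          # (batch, dim) embeddings at time t
    z_tnext = F.normalize(z_tnext, dim=1)  # embeddings one step later
    logits = z_t @ z_tnext.T / temperature  # scaled cosine similarities
    targets = torch.arange(z_t.size(0), device=z_t.device)  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Hypothetical usage: encode two adjacent rendered frames with a shared
# image encoder, then minimize the loss over batches of sequences.
# z_t, z_tnext = encoder(frames_t), encoder(frames_t_plus_1)
# loss = temporal_contrastive_loss(z_t, z_tnext)
```

The second is a hedged sketch of RSA as commonly practiced: build representational dissimilarity matrices (RDMs) from model activations and from fMRI voxel responses to the same stimuli, then correlate them. A searchlight variant would repeat this comparison for activations drawn from small spheres of voxels across the brain. Variable names are again illustrative:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Representational dissimilarity matrix (condensed upper triangle):
    1 - Pearson correlation between responses to each pair of stimuli.
    `responses` has shape (n_stimuli, n_units_or_voxels)."""
    return pdist(responses, metric="correlation")

def rsa_score(model_activations, voxel_responses):
    """Spearman correlation between model and brain RDMs; higher means the
    model's representational geometry better matches the brain region's."""
    rho, _ = spearmanr(rdm(model_activations), rdm(voxel_responses))
    return rho
```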
| Status | Active |
| --- | --- |
| Effective start/end date | 9/15/22 → 8/31/25 |
Funding
- National Science Foundation: $497,009.00