Behavioral studies show that the unisensory representations underlying within-modal visual and haptic object recognition are strikingly similar in their view- and size-sensitivity and in their integration of structural and surface properties. However, the basis for these attributes differs in each modality, indicating that while these representations are functionally similar, they are not identical. Imaging studies reveal bisensory, visuo-haptic object selectivity, notably in the lateral occipital complex and the intraparietal sulcus, which suggests a shared representation of objects. Such a multisensory representation could underlie visuo-haptic cross-modal object recognition. In this chapter, we compare visual and haptic within-modal object recognition and trace a progression from functionally similar but separate unisensory representations to a shared multisensory representation that underlies both cross-modal object recognition and view-independence, regardless of modality. We outline, and provide evidence for, a model of multisensory object recognition in which representations are flexibly accessible via top-down or bottom-up processing, with the choice of route influenced by object familiarity and by individual preference along the object-spatial continuum of mental imagery.